Abstract
This project implements the Debevec & Malik (1997) high dynamic range imaging pipeline from scratch in NumPy. The goal is to take a bracketed set of photographs (the same scene captured at multiple exposure times) and recover a single linear radiance map that captures detail across the full dynamic range of the scene, from deep shadows to bright highlights. The recovered map is then compressed into a displayable 8-bit image using two tone mapping operators.
The camera's inverse response function and per-pixel log-irradiance are recovered jointly from the bracketed exposures via a weighted least-squares system. Two practical extensions are implemented beyond the standard formulation: a joint iterative solver that estimates unknown exposure times directly from pixel data when EXIF metadata is absent or unreliable, and a chained RANSAC alignment strategy for handheld sequences where camera motion between shots must be corrected before merging. The chained approach, which matches only adjacent exposure pairs rather than each image directly to the reference, is necessary for high dynamic range scenes where the darkest and brightest images share almost no visible content.
The pipeline is evaluated on a variety of scenes ranging from 3-exposure tripod sequences to an 11-exposure handheld cave sequence spanning approximately 13 stops. Two tone mapping operators are compared: a global Reinhard (2002) operator implemented from scratch, and the Mantiuk (2006) gradient-domain operator accessed via OpenCV. Reinhard produces clean, natural-looking results on most scenes; Mantiuk has an advantage on scenes with extreme local contrast, though at the cost of some global contrast and occasional halo artifacts. Which operator is better genuinely depends on the scene.
Code Libraries
External Libraries
- NumPy: all core numerical computation: least-squares solving, matrix operations, array manipulation throughout the pipeline
- Pillow (PIL): image loading and saving; EXIF metadata extraction for shutter speed values
- Matplotlib: all visualisation: exposure plots, response curves, radiance maps, tone-mapped outputs
- scikit-image: Harris corner detection, BRIEF descriptor extraction, RANSAC homography fitting, CLAHE preprocessing, and projective transform warping used in alignment_ransac.py
- OpenCV (cv2): Mantiuk (2006) tone mapping operator via cv2.createTonemapMantiuk; all other pipeline stages are implemented from scratch without OpenCV
All libraries above are included in the standard Anaconda distribution except OpenCV. Install with: pip install opencv-python
My Libraries (mylibs/)
- debevec_malik.py: camera response curve solver (Szeliski Eq. 10.7), joint iterative exposure estimator, per-channel response solving with ramp correction, HDR radiance map builder with saturated-pixel fallback (Eq. 10.10)
- alignment_ransac.py: chained RANSAC alignment for handheld brackets: CLAHE preprocessing, Harris + BRIEF feature matching between adjacent exposure pairs, homography composition to a common reference frame
- tone_mapping.py: global Reinhard (2002) tone mapping implemented from scratch; Mantiuk (2006) wrapper around OpenCV with NaN/Inf handling
Setup
import sys
import os
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath('hdr_project.ipynb')), 'mylibs'))
import math
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from IPython.display import display
from alignment_ransac import align_chained
from debevec_malik import (solve_joint_response_and_exposure,
                           solve_per_channel_curves, build_radiance_map)
from tone_mapping import tone_map_reinhard, tone_map_mantiuk
def load_images_and_exposures(folder):
    """Load all JPEGs/PNGs from folder (sorted by name) and read EXIF exposure times."""
    files = sorted(f for f in os.listdir(folder)
                   if f.lower().endswith(('.jpg', '.jpeg', '.png')))
    images, exposures = [], []
    for f in files:
        path = os.path.join(folder, f)
        pil = Image.open(path)
        images.append(np.array(pil.convert('RGB'), dtype=np.uint8))
        exif = pil._getexif()  # flattened EXIF dict; tag 33434 = ExposureTime
        exp = exif.get(33434) if exif else None
        exposures.append(float(exp) if exp else None)
    return images, exposures
Configuration
Select an image folder below and adjust the solver parameters as needed. N_PIXELS controls how many pixels are sampled for the response curve least-squares system (more = better coverage of the 0-255 range). LAMBDA is the smoothness regularisation weight.
# ===== CHOOSE YOUR FOLDER =====
# Standard (3 exposures, tripod):
# 'images/austin' - cityscape at dawn
# 'images/tenerife' - landscape with bright sun
# 'images/pecos-river' - canyon / river scene
# 'images/santorini' - night city scene
# 'images/blimp' - blimp
# 'images/chinese-garden' - garden scene
# 'images/pond' - pond scene
# 'images/room' - indoor room
# 'images/cave' - dark cave interior (4 exposures, high DR)
# 'images/wadi-rum-sunset' - desert sunset
#
# Handheld (3 exposures, needs RANSAC alignment):
# 'images/door' - indoor door scene
# 'images/light' - interior lighting scene
# 'images/apartment' - apartment scene
# 'images/colour-windows' - colour windows scene
# 'images/crossing' - rail crossing scene
# 'images/e3' - Engineering 3 scene
# 'images/eit' - Centre for Environmental and Information Technology scene
# 'images/mc' - Math and Computing scene
# 'images/qnc' - Quantum Nano Centre scene
#
# High dynamic range (11 exposures):
# 'images/agia-sofia' - cave with bright exterior opening (~13 stops)
IMAGE_FOLDER = 'images/agia-sofia'
# ==============================
N_PIXELS = 1000 # pixels sampled for response curve fitting
LAMBDA = 100 # smoothness regularisation weight
Pipeline Walkthrough
The cells below run the full pipeline on the selected scene, showing intermediate outputs at each stage.
Step 1: Load Images and Exposure Times
Images are loaded from the selected folder sorted by filename. For each file, EXIF tag 33434 (shutter speed) is read to recover the exposure time $t_j$. If all images carry valid EXIF, those values are used directly and the estimated values from Step 2 are shown only as a sanity check. If any file is missing EXIF, all exposure times are estimated by the joint solver.
Challenge: EXIF data is sometimes absent or stores rounded values (e.g. $1/N$ instead of the true shutter speed). The pipeline handles both cases without any manual configuration.
images, exif_exposures = load_images_and_exposures(IMAGE_FOLDER)
has_exif = all(e is not None for e in exif_exposures)
ref_idx_init = len(images) // 2
if has_exif:
    print('EXIF exposures found:', [round(e, 5) for e in exif_exposures])
else:
    print('No EXIF data - exposure times will be estimated from pixels.')
print(f'Loaded {len(images)} images, shape {images[0].shape}')
cols = 2
rows = math.ceil(len(images) / cols)
fig, axes = plt.subplots(rows, cols, figsize=(20, 10 * rows))
axes = np.array(axes).reshape(-1)
for i, img in enumerate(images):
    label = f't={exif_exposures[i]:.5f}s' if has_exif else f'Image {i}'
    axes[i].imshow(img); axes[i].set_title(label); axes[i].axis('off')
for ax in axes[len(images):]:
    ax.axis('off')
plt.tight_layout(); plt.show()
EXIF exposures found: [0.125, 0.00063, 0.00025, 0.06667, 0.03333, 0.01667, 0.008, 0.004, 0.0025, 0.00156, 0.001]
Loaded 11 images, shape (535, 870, 3)
Step 2: Camera Response and Exposure Estimation
The camera maps scene irradiance $E$ and exposure time $t$ to pixel value $z$ via an unknown nonlinear function $f$: $$z = f(E \cdot t)$$ We want the inverse log-response $g = \log(f^{-1})$, so that: $$g(z_{ij}) = \log E_i + \log t_j$$
This is solved as a weighted least-squares system (Szeliski Eq. 10.7). The hat weighting function $w(z) = \min(z,255-z)$ down-weights near-saturated and near-black pixels where the response is unreliable. Two constraints make the system well-posed:
- Anchor: $g(128) = 0$ - removes the global shift ambiguity
- Smoothness: $\lambda \cdot w(k) \cdot g''(k) = 0$ for $k = 1,\ldots,254$ - penalises a jagged curve
Since exposure times may be unknown, the solver alternates between estimating $g$ and refining $\log t_j$:
- Initial estimate: $\log(t_b / t_a) \approx \dfrac{\sum_i w(z_{ai})\,w(z_{bi})\,\log(z_{bi}/z_{ai})}{\sum_i w(z_{ai})\,w(z_{bi})}$, chained over adjacent pairs sorted by brightness
- Iterate: solve $g\rightarrow$ refine $\log t_j$ using $g(z_{bi}) - g(z_{ai})$ differences $\rightarrow$ blend 50/50 $\rightarrow$ repeat until $\max|\Delta \log t| < 10^{-4}$
Why adjacent-pair chaining? Comparing images 10 stops apart leaves almost no pixels with valid hat-weights in both frames. Adjacent pairs (1-2 stops apart) always share sufficient overlap regardless of total dynamic range, which is critical for the 11-exposure agia-sofia sequence.
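As a concrete illustration, here is a minimal NumPy sketch of that initial pairwise estimate. The helper name initial_log_exposures and the (P, N) sample layout are assumptions for illustration, not the exact mylibs code; it leans on the rough approximation that the response is near-linear, which the iterate step above then corrects.
import numpy as np

def initial_log_exposures(samples):
    """Hypothetical sketch: chain weighted pairwise brightness ratios into
    initial log-exposure offsets.

    samples: (P, N) array of the same P sampled pixel values across N images,
    with columns sorted from darkest to brightest exposure.
    """
    w = lambda z: np.minimum(z, 255.0 - z)            # hat weight, zero at 0 and 255
    P, N = samples.shape
    log_t = np.zeros(N)
    for a in range(N - 1):
        za = samples[:, a].astype(np.float64)
        zb = samples[:, a + 1].astype(np.float64)
        valid = (za > 0) & (zb > 0)                   # keep log() finite
        ww = w(za[valid]) * w(zb[valid])              # joint hat weight
        step = np.sum(ww * np.log(zb[valid] / za[valid])) / np.sum(ww)
        log_t[a + 1] = log_t[a] + step                # chain adjacent pairs
    return log_t                                      # relative to the darkest image
Note that no pair ever compares frames more than one bracket apart, which is exactly the property the paragraph above relies on.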
print(f'Running joint exposure solver (reference = image {ref_idx_init}) ...')
g_lum, log_t_estimated = solve_joint_response_and_exposure(
    images, n_pixels=N_PIXELS, lam=LAMBDA, ref_idx=ref_idx_init, verbose=True)
x = np.arange(len(images))
bar_w = 0.35
fig, ax = plt.subplots(figsize=(max(6, len(images) * 1.2), 4))
if has_exif:
    log_t_exif = np.log(np.array(exif_exposures, dtype=np.float64))
    log_t_exif_rel = log_t_exif - log_t_exif[ref_idx_init]
    print(f"{'Image':>6} {'Estimated (stops)':>18} {'EXIF truth (stops)':>18} {'Error (stops)':>14}")
    print('-' * 62)
    errors = []
    for j in range(len(images)):
        est = log_t_estimated[j] / np.log(2)
        true = log_t_exif_rel[j] / np.log(2)
        err = abs(est - true)
        errors.append(err)
        print(f'{j:>6} {est:>18.4f} {true:>18.4f} {err:>14.4f}')
    print(f'\nMean absolute error: {np.mean(errors):.4f} stops')
    ax.bar(x - bar_w/2, log_t_estimated / np.log(2), bar_w,
           label='Estimated', color='steelblue')
    ax.bar(x + bar_w/2, log_t_exif_rel / np.log(2), bar_w,
           label='EXIF truth', color='coral')
    ax.set_title('Estimated vs EXIF exposure ratios'); ax.legend()
    log_t = log_t_exif_rel
    print('\nUsing EXIF exposure times for radiance map.')
else:
    print('No EXIF data - using estimated exposure times.')
    for j in range(len(images)):
        print(f'  Image {j}: {log_t_estimated[j]/np.log(2):+.3f} stops')
    ax.bar(x, log_t_estimated / np.log(2), color='steelblue', label='Estimated')
    ax.set_title('Estimated exposure ratios (no EXIF ground truth)')
    log_t = log_t_estimated
ax.set_xlabel('Image index')
ax.set_ylabel('Exposure ratio (stops, relative to reference)')
ax.set_xticks(x); plt.tight_layout(); plt.show()
Running joint exposure solver (reference = image 5) ...
Image Estimated (stops) EXIF truth (stops) Error (stops)
--------------------------------------------------------------
0 2.1455 2.9069 0.7614
1 -3.1464 -4.7370 1.5906
2 -3.9996 -6.0589 2.0593
3 1.4226 2.0000 0.5774
4 0.7025 1.0000 0.2975
5 0.0000 0.0000 0.0000
6 -0.7022 -1.0589 0.3567
7 -1.3840 -2.0589 0.6749
8 -1.8196 -2.7370 0.9174
9 -2.2214 -3.4150 1.1936
10 -2.6887 -4.0589 1.3702
Mean absolute error: 0.8908 stops
Using EXIF exposure times for radiance map.
Step 3: Handheld Alignment
For handheld brackets, each image is taken at a slightly different camera position. Before merging, all images must be warped into a common reference frame via homography alignment.
Approach - chained RANSAC (Random Sample Consensus): CLAHE (Contrast Limited Adaptive Histogram Equalisation)-enhanced Harris + BRIEF (Binary Robust Independent Elementary Features) features are matched between each adjacent pair in exposure order (sorted by $\log t$), not directly to the reference. Per-step projective homographies are composed to reach the reference frame (written here for an image $k$ below the reference, with the rightmost transform applied first): $$H_{k \to \text{ref}} = H_{\text{ref}-1 \to \text{ref}} \circ \cdots \circ H_{k \to k+1}$$
CLAHE is applied before feature extraction so that near-black regions in over-exposed frames and near-white regions in under-exposed frames still reveal detectable corners.
Why chained matching instead of direct-to-reference? In scenes that have a wide dynamic range, the darkest and brightest images share almost no visible content: corners in a dark cave wall are completely invisible in the over-exposed image, and vice versa. Matching only adjacent pairs (1-2 stops apart) ensures sufficient overlap at every step regardless of the total dynamic range.
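A minimal sketch of one adjacent-pair step with scikit-image follows. pairwise_homography is a hypothetical helper name, and the detector thresholds are illustrative defaults rather than the tuned values in alignment_ransac.py.
from skimage.color import rgb2gray
from skimage.exposure import equalize_adapthist
from skimage.feature import BRIEF, corner_harris, corner_peaks, match_descriptors
from skimage.measure import ransac
from skimage.transform import ProjectiveTransform, warp

def pairwise_homography(img_a, img_b):
    """Hypothetical sketch: estimate H mapping img_a's frame onto img_b's."""
    keypoints, descriptors = [], []
    for img in (img_a, img_b):
        gray = equalize_adapthist(rgb2gray(img))   # CLAHE: corners survive in dark/bright frames
        kp = corner_peaks(corner_harris(gray), min_distance=5, threshold_rel=0.01)
        extractor = BRIEF()
        extractor.extract(gray, kp)
        keypoints.append(kp[extractor.mask])       # keep keypoints with valid descriptors
        descriptors.append(extractor.descriptors)
    matches = match_descriptors(descriptors[0], descriptors[1], cross_check=True)
    src = keypoints[0][matches[:, 0]][:, ::-1]     # (row, col) -> (x, y)
    dst = keypoints[1][matches[:, 1]][:, ::-1]
    model, inliers = ransac((src, dst), ProjectiveTransform,
                            min_samples=4, residual_threshold=2, max_trials=1000)
    return model

# Composing per-step transforms up the chain (for an image k below the
# reference); skimage composes with `+`, applying the left transform first:
#   H_total = H_k_to_k1 + H_k1_to_k2 + ... + H_refm1_to_ref
#   aligned_k = warp(img_k, H_total.inverse)       # resample k into the reference frame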
print(f'Aligning {len(images)} images (chained RANSAC) ...')
images_aligned, ref_idx, step_info, sort_order = align_chained(
    images, log_t, verbose=True, show_plots=True)
Aligning 11 images (chained RANSAC) ...
Exposure order (original indices): [2, 1, 10, 9, 8, 7, 6, 5, 4, 3, 0]
Reference: sorted[5] = original image 7
Adjacent-pair RANSAC:
sorted[0->1] (orig 2->1): (99 kp, 99 kp), 79 matches, 74 inliers
sorted[1->2] (orig 1->10): (99 kp, 95 kp), 76 matches, 73 inliers
sorted[2->3] (orig 10->9): (95 kp, 98 kp), 74 matches, 66 inliers
sorted[3->4] (orig 9->8): (98 kp, 102 kp), 73 matches, 69 inliers
sorted[4->5] (orig 8->7): (102 kp, 109 kp), 64 matches, 57 inliers
sorted[5->6] (orig 7->6): (109 kp, 190 kp), 76 matches, 67 inliers
sorted[6->7] (orig 6->5): (190 kp, 283 kp), 158 matches, 151 inliers
sorted[7->8] (orig 5->4): (283 kp, 332 kp), 223 matches, 211 inliers
sorted[8->9] (orig 4->3): (332 kp, 341 kp), 259 matches, 243 inliers
sorted[9->10] (orig 3->0): (341 kp, 351 kp), 268 matches, 253 inliers
Warping to reference frame:
sorted[0] (orig 2): warped (5/5 step(s) succeeded)
sorted[1] (orig 1): warped (4/4 step(s) succeeded)
sorted[2] (orig 10): warped (3/3 step(s) succeeded)
sorted[3] (orig 9): warped (2/2 step(s) succeeded)
sorted[4] (orig 8): warped (1/1 step(s) succeeded)
sorted[5] (orig 7): reference
sorted[6] (orig 6): warped (1/1 step(s) succeeded)
sorted[7] (orig 5): warped (2/2 step(s) succeeded)
sorted[8] (orig 4): warped (3/3 step(s) succeeded)
sorted[9] (orig 3): warped (4/4 step(s) succeeded)
sorted[10] (orig 0): warped (5/5 step(s) succeeded)
Step 4: Per-Channel Response Curves
With $\log t$ converged from Step 2, we now solve for $g_R$, $g_G$, $g_B$ independently on the aligned images. Each channel's response curve is recovered by minimising the same weighted least-squares objective (Szeliski Eq. 10.7):
$$\sum_{i=1}^{P} \sum_{j=1}^{N} \left[ w(z_{ij}^c) \left( g_c(z_{ij}^c) - \log E_i^c - \log t_j \right) \right]^2 + \lambda \sum_{k=1}^{254} \left[ w(k)\, g_c''(k) \right]^2$$
where $w(z) = \min(z, 255-z)$ is the hat weighting function, $P$ is the number of sampled pixels, and $N$ is the number of exposures. Two constraints make the system well-posed: an anchor $g_c(128) = 0$ to remove the global shift ambiguity, and the smoothness term penalising the second derivative $g_c''(k) = g_c(k-1) - 2g_c(k) + g_c(k+1)$.
The joint solver in Step 2 estimated $g$ on luminance only (faster, avoids a $3\times$ larger system). Solving per-channel here gives independent $g_R$, $g_G$, $g_B$ curves, which is necessary because real cameras have different spectral sensitivities in each channel and a single grayscale curve cannot accurately model all three. After solving, a small linear ramp correction is applied over $z = 128,\ldots,255$ to force the three curves to meet at $z = 255$, preventing inter-channel drift near saturation from producing colour errors on bright surfaces.
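A minimal sketch of this system for one channel follows, directly translating the objective and constraints above. solve_g and its dense lstsq call are illustrative, not the exact solve_per_channel_curves internals.
import numpy as np

def solve_g(Z, log_t, lam=100.0):
    """Hypothetical sketch: recover g for one channel from sampled values Z (P, N)."""
    P, N = Z.shape
    w = lambda z: float(np.minimum(z, 255 - z))   # hat weight
    A = np.zeros((P * N + 1 + 254, 256 + P))      # unknowns: g(0..255), log E_1..P
    b = np.zeros(A.shape[0])
    row = 0
    for i in range(P):                            # data terms: w(z)[g(z) - log E_i] = w(z) log t_j
        for j in range(N):
            z = int(Z[i, j])
            A[row, z] = w(z)
            A[row, 256 + i] = -w(z)
            b[row] = w(z) * log_t[j]
            row += 1
    A[row, 128] = 1.0                             # anchor: g(128) = 0
    row += 1
    for k in range(1, 255):                       # smoothness: lam * w(k) * g''(k) = 0
        wk = lam * w(k)
        A[row, k - 1], A[row, k], A[row, k + 1] = wk, -2 * wk, wk
        row += 1
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    return x[:256]                                # g(z); x[256:] holds the log E_i
The dense formulation stays tractable only because just N_PIXELS pixels are sampled; a sparse solver would scale better if more pixels were used.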
g_curves = solve_per_channel_curves(
    images_aligned, log_t, n_pixels=N_PIXELS, lam=LAMBDA)
print('Solved R, G, B response curves.')
plt.figure(figsize=(6, 4))
z = np.arange(256)
# enforce monotonicity on the luma curve for display
plt.plot(np.maximum.accumulate(g_lum), z, 'k--', lw=2, label='Luma (joint solve)')
for g, color, name in zip(g_curves, ['r', 'g', 'b'], ['R', 'G', 'B']):
    plt.plot(g, z, color=color, label=name)
plt.xlabel('log irradiance g(z)'); plt.ylabel('pixel value z')
plt.title('Recovered camera response curves')
plt.legend(); plt.tight_layout(); plt.show()
Solved R, G, B response curves.
import cv2
# Validate per-channel curves against OpenCV's cv2.createCalibrateDebevec
# (ascontiguousarray: cv2 rejects the negative-stride view from [..., ::-1])
images_bgr = [np.ascontiguousarray(img[:, :, ::-1]) for img in images_aligned]
times_cv = np.exp(log_t).astype(np.float32)
calibrate = cv2.createCalibrateDebevec()
cv_response = calibrate.process(images_bgr, times=times_cv)
# cv_response shape: (256, 1, 3), BGR channel order, linear values
z = np.arange(256)
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for i, (name, cv_ch, color) in enumerate(zip(['Red', 'Green', 'Blue'],
                                             [2, 1, 0], ['red', 'green', 'blue'])):
    g_ours = g_curves[i] - g_curves[i][128]
    g_cv = np.log(cv_response[:, 0, cv_ch].astype(np.float64) + 1e-10)
    g_cv -= g_cv[128]
    axes[i].plot(z, g_ours, color=color, linewidth=2, label='Ours')
    axes[i].plot(z, g_cv, '--', color=color, linewidth=2, alpha=0.6, label='OpenCV')
    axes[i].set_title(name); axes[i].set_xlabel('pixel value z')
    axes[i].set_ylabel('g(z) (anchored at z=128)'); axes[i].legend()
plt.suptitle('Response curve comparison: our implementation vs OpenCV CalibrateDebevec', fontsize=12)
plt.tight_layout(); plt.show()
Our curves are generally smoother than OpenCV's (the degree depends on the source images) because we apply a stronger second-order smoothness regularisation. In the least-squares system, we add a penalty of the form $\lambda \cdot w(k) \cdot g''(k) = 0$ for every $k$ from 1 to 254, with $\lambda = $ LAMBDA (set to 100 by default). This penalises any curvature in $g$, pulling the solution toward a smooth shape. The smoothness terms are also multiplied by the hat weight $w(k)$, which concentrates the penalty most strongly around $z = 128$ and lets it taper near the extremes. OpenCV's createCalibrateDebevec uses the same Debevec-Malik formulation but with a lower default smoothness weight, which is why its curves track the sample data more closely at the cost of more oscillation.
Step 5: Build the HDR Radiance Map
With the response curve $g$ recovered in Step 2, we can now apply the Debevec & Malik weighted merge to reconstruct a linear HDR radiance map. Each pixel's log-irradiance is estimated by a hat-weighted average across all aligned exposures (Szeliski Eq. 10.10): $$\log E_i = \frac{\displaystyle\sum_j w(z_{ij})\bigl[g(z_{ij}) - \log t_j\bigr]}{\displaystyle\sum_j w(z_{ij})}$$
This is computed independently per colour channel. The plot below shows the recovered log-radiance map alongside the darkest and brightest input exposures.
Challenge - black sun artifact: Pixels saturated in every exposure have $\sum_j w = 0$ (all hat-weights are zero). The naive fallback of $\log E = 0$ assigns minimum radiance, rendering bright specular highlights (sun discs, river reflections) as solid black circles.
Fix: Pixels with $z \geq 250$ in the darkest exposure are assigned the maximum plausible radiance extrapolated from the response curve at saturation: $$\log E_{\text{sat}} = g(255) - \log t_{\text{darkest}}$$
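A minimal sketch of this merge for one channel follows; merge_channel is an illustrative stand-in for build_radiance_map, assuming the same hat weights and fallback rule described above.
import numpy as np

def merge_channel(channel_imgs, log_t, g):
    """Hypothetical sketch: hat-weighted log-irradiance merge with saturation fallback.

    channel_imgs: list of N aligned (H, W) uint8 arrays for one colour channel;
    g: (256,) recovered log inverse response for that channel.
    """
    stack = np.stack(channel_imgs).astype(np.int32)         # (N, H, W)
    log_t = np.asarray(log_t, dtype=np.float64)
    w = np.minimum(stack, 255 - stack).astype(np.float64)   # hat weights, zero at 0/255
    num = (w * (g[stack] - log_t[:, None, None])).sum(axis=0)
    den = w.sum(axis=0)
    log_E = np.where(den > 0, num / np.maximum(den, 1e-12), 0.0)   # Eq. 10.10
    darkest = stack[np.argmin(log_t)]                       # shortest exposure
    saturated = (den == 0) & (darkest >= 250)               # bright in every frame
    log_E[saturated] = g[255] - log_t.min()                 # max plausible radiance
    return np.exp(log_E)                                    # linear radiance channel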
hdr_channels = []
for c, g in enumerate(g_curves):
    channel = [img[:, :, c] for img in images_aligned]
    hdr_channels.append(build_radiance_map(channel, log_t, g))
hdr = np.stack(hdr_channels, axis=-1)
print(f'HDR radiance map shape: {hdr.shape}, range: [{hdr.min():.4f}, {hdr.max():.4f}]')
log_lum = np.log2(0.299*hdr[:,:,0] + 0.587*hdr[:,:,1] + 0.114*hdr[:,:,2] + 1e-6)
dr_stops = log_lum.max() - log_lum.min()
if has_exif:
    exposures_arr = np.array(exif_exposures, dtype=np.float64)
    darkest_idx = int(np.argmin(exposures_arr))
    brightest_idx = int(np.argmax(exposures_arr))
    dark_label = f'Darkest input (t={exposures_arr[darkest_idx]:.5f}s)'
    bright_label = f'Brightest input (t={exposures_arr[brightest_idx]:.5f}s)'
else:
    darkest_idx = int(np.argmin(log_t))
    brightest_idx = int(np.argmax(log_t))
    dark_label = f'Darkest input (est. {log_t[darkest_idx]/np.log(2):.2f} stops)'
    bright_label = f'Brightest input (est. {log_t[brightest_idx]/np.log(2):.2f} stops)'
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
im = axes[0].imshow(log_lum, cmap='nipy_spectral')
plt.colorbar(im, ax=axes[0], label='log2 radiance (stops)')
axes[0].set_title(f'HDR radiance map (log2 luminance)\n'
                  f'range: {log_lum.min():.1f} to {log_lum.max():.1f} stops '
                  f'(delta = {dr_stops:.1f} stops)')
axes[0].axis('off')
axes[1].imshow(images_aligned[darkest_idx])
axes[1].set_title(dark_label); axes[1].axis('off')
axes[2].imshow(images_aligned[brightest_idx])
axes[2].set_title(bright_label); axes[2].axis('off')
plt.suptitle(f'Radiance map contains {dr_stops:.1f} stops of dynamic range')
plt.tight_layout(); plt.show()
print(f'Dynamic range: {dr_stops:.1f} stops')
HDR radiance map shape: (535, 870, 3), range: [0.0052, 184.1326]
Dynamic range: 14.2 stops
Step 6a: Global Reinhard Tone Mapping
The HDR radiance map has a linear dynamic range far exceeding a display's ~8 stops. Tone mapping compresses it to 8-bit LDR. The global Reinhard (2002) operator works on luminance $L_w = 0.299R + 0.587G + 0.114B$, the standard NTSC luma coefficients used across computer vision and broadcast video (ITU-R BT.601):
\begin{aligned} &1.\ \text{Log-average scene key: } L_{\text{avg}} = \exp\!\left(\frac{1}{N}\sum_i \log(L_{w,i} + \varepsilon)\right) \\ &2.\ \text{Scale to middle grey: } L_m = L_w \cdot \frac{0.18}{L_{\text{avg}}} \\ &3.\ \text{Soft asymptotic clip: } L_d = \frac{L_m\!\left(1 + \dfrac{L_m}{L_{\text{white}}^2}\right)}{1 + L_m} \\ &4.\ \text{Scale RGB by } L_d / L_w \text{ to preserve colour ratios} \end{aligned}
Remaining limitation: A single global tone curve cannot simultaneously render a dark cave interior and a bright exterior window. High-DR scenes will clip to white in the brightest regions - this is a fundamental limitation of global operators, addressed by Mantiuk below.
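For reference, a minimal sketch of the four steps above. reinhard_global is an illustrative stand-in for tone_map_reinhard, with $L_{\text{white}}$ assumed to default to the maximum scaled luminance (so the brightest pixel maps exactly to white).
import numpy as np

def reinhard_global(hdr, key=0.18, eps=1e-6):
    """Hypothetical sketch of the global Reinhard (2002) operator."""
    Lw = 0.299 * hdr[..., 0] + 0.587 * hdr[..., 1] + 0.114 * hdr[..., 2]  # BT.601 luma
    L_avg = np.exp(np.mean(np.log(Lw + eps)))          # 1. log-average scene key
    Lm = Lw * (key / L_avg)                            # 2. scale to middle grey
    L_white = Lm.max()                                 # burn-out point (assumed default)
    Ld = Lm * (1 + Lm / L_white**2) / (1 + Lm)         # 3. soft asymptotic clip
    out = hdr * (Ld / np.maximum(Lw, eps))[..., None]  # 4. preserve colour ratios
    return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)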
ldr_reinhard = tone_map_reinhard(hdr)
fig, axes = plt.subplots(len(images), 2, figsize=(14, 6 * len(images)))
if len(images) == 1:
    axes = np.array([axes])
for i, img in enumerate(images):
    label = f't={exif_exposures[i]:.5f}s' if has_exif else f'Image {i}'
    axes[i, 0].imshow(img); axes[i, 0].set_title(label); axes[i, 0].axis('off')
    axes[i, 1].imshow(ldr_reinhard)
    axes[i, 1].set_title('Reinhard (2002)'); axes[i, 1].axis('off')
plt.suptitle('Global Reinhard tone mapping - each original paired with HDR result')
plt.tight_layout(); plt.show()
fig, ax = plt.subplots(figsize=(16, 12))
ax.imshow(ldr_reinhard); ax.set_title('Reinhard (2002) result'); ax.axis('off')
plt.tight_layout(); plt.show()