Stereo Matching
Computer Vision
CSE576, Spring 2005
Richard Szeliski

Stereo Matching
Given two or more images of the same scene or object, compute a representation of its shape
What are some possible applications?

Face modeling
From one stereo pair to a 3D head model








[Frédéric Devernay, INRIA]

Z-keying: mix live and synthetic
Takeo Kanade, CMU  (Stereo Machine)

Virtualized Reality™
[Takeo Kanade et al., CMU]
collect video from 50+ streams
reconstruct 3D model sequences
steerable version used for Super Bowl XXXV “EyeVision”
http://www.cs.cmu.edu/afs/cs/project/VirtualizedR/www/VirtualizedR.html

View Interpolation
Given two images with correspondences, morph (warp and cross-dissolve) between them [Chen & Williams, SIGGRAPH’93]





input     depth image     novel view
 [Matthies,Szeliski,Kanade’88]

More view interpolation
Spline-based depth map





input depth image novel view
[Szeliski & Kang ‘95]

Video view interpolation

View Morphing
Morph between pair of images using epipolar geometry [Seitz & Dyer, SIGGRAPH’96]

Additional applications?
Real-time people tracking (systems from Point Grey Research and SRI)
“Gaze” correction for video conferencing [Ott,Lewis,Cox InterChi’93]
Other ideas?

Stereo Matching
Given two or more images of the same scene or object, compute a representation of its shape
What are some possible representations?
depth maps
volumetric models
3D surface models
planar (or offset) layers

Stereo Matching
What are some possible algorithms?
match “features” and interpolate
match edges and interpolate
match all pixels with windows (coarse-fine)
use optimization:
iterative updating
dynamic programming
energy minimization (regularization, stochastic)
graph algorithms

Outline (remainder of lecture)
Image rectification
Matching criteria
Local algorithms (aggregation)
iterative updating
Optimization algorithms:
energy (cost) formulation & Markov Random Fields
mean-field, stochastic, and graph algorithms
Multi-View stereo & occlusions

Stereo: epipolar geometry
Match features along epipolar lines

Stereo: epipolar geometry
For two images (or images with collinear camera centers), we can find epipolar lines
Epipolar lines are the projections of the pencil of planes passing through the camera centers
Rectification: warping the input images (perspective transformation) so that epipolar lines are horizontal

Rectification
Project each image onto the same plane, which is parallel to the baseline (the line through the camera centers and the epipoles)
Resample lines (and shear/stretch) to place lines in correspondence and minimize distortion
[Zhang and Loop, MSR-TR-99-21]

Rectification

Rectification

Matching criteria
Raw pixel values (correlation)
Band-pass filtered images [Jones & Malik 92]
“Corner”-like features [Zhang, …]
Edges [many people…]
Gradients [Seitz 89;  Scharstein 94]
Rank statistics [Zabih & Woodfill 94]

Finding correspondences
apply feature matching criterion (e.g., correlation or Lucas-Kanade) at all pixels simultaneously
search only over epipolar lines (many fewer candidate positions)

Image registration (revisited)
How do we determine correspondences?
block matching or SSD (sum squared differences)


d is the disparity (horizontal motion)
How big should the neighborhood be?
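A minimal sketch of SSD block matching for a single pixel, assuming rectified grayscale images stored as NumPy arrays. The function name, the window parameter half_win, and the sign convention (a left-image pixel at column x appears at column x - d in the right image) are assumptions made for illustration, not taken from the slides. The caller is assumed to pick (x, y) far enough from the border that the window fits in the left image.

```python
import numpy as np

def ssd_disparity(left, right, x, y, half_win, max_disp):
    """Return (disparity, cost) minimizing the SSD for pixel (x, y) of the left image.

    The SSD is summed over a (2*half_win + 1)^2 neighborhood and evaluated
    only along the same scanline, i.e. along the epipolar line.
    """
    y0, y1 = y - half_win, y + half_win + 1
    x0, x1 = x - half_win, x + half_win + 1
    patch_l = left[y0:y1, x0:x1].astype(np.float64)
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        if x0 - d < 0:
            # The shifted window would leave the right image.
            break
        patch_r = right[y0:y1, x0 - d:x1 - d].astype(np.float64)
        cost = np.sum((patch_l - patch_r) ** 2)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost
```

Increasing half_win trades detail for robustness, which is the question the slide raises.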

Neighborhood size
Smaller neighborhood: more details
Larger neighborhood:  fewer isolated mistakes
        w = 3 w = 20

Stereo: certainty modeling
Compute certainty map from correlations
    input   depth map       certainty map

Plane Sweep Stereo
Sweep family of planes through volume

Plane Sweep Stereo
For each depth plane
compute composite (mosaic) image — mean
compute error image — variance
convert to confidence and aggregate spatially
Select winning depth at each pixel

Plane sweep stereo
Re-order (pixel / disparity) evaluation loops






Original order:
for every pixel
  for every disparity
    compute cost
Re-ordered (plane sweep):
for every disparity
  for every pixel
    compute cost

Stereo matching framework
For every disparity, compute raw matching costs


Why use a robust function?
occlusions, other outliers
Can also use alternative match criteria
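One possible way to compute raw matching costs in disparity-major order is sketched below. It uses a truncated absolute difference as the robust function; that choice, the cap value trunc, and the convention that left column x matches right column x - d are assumptions for illustration, since the slide's own formula did not survive extraction.

```python
import numpy as np

def raw_cost_volume(left, right, max_disp, trunc=20.0):
    """Build cost[d, y, x] = min(|left[y, x] - right[y, x - d]|, trunc).

    Truncation limits the influence of occlusions and other outliers.
    Pixels with no counterpart in the right image keep the cap value.
    """
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape
    cost = np.full((max_disp + 1, h, w), trunc)
    for d in range(max_disp + 1):
        # One whole-image operation per disparity (the re-ordered loops).
        diff = np.abs(left[:, d:] - right[:, :w - d])
        cost[d, :, d:] = np.minimum(diff, trunc)
    return cost
```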

Stereo matching framework
Aggregate costs spatially
Here, we are using a box filter (efficient moving average implementation)
Can also use weighted average, [non-linear] diffusion…
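A sketch of box-filter aggregation over a cost volume laid out as cost[d, y, x] (an assumed layout). Instead of a literal moving average it uses a summed-area table, a swapped-in technique that likewise keeps the work per pixel independent of the window size. Windows are clipped at the image border, so border sums cover fewer pixels.

```python
import numpy as np

def box_aggregate(cost, radius):
    """Sum each disparity slice over a (2*radius + 1)^2 window."""
    D, h, w = cost.shape
    # Summed-area table with a leading row and column of zeros.
    sat = np.zeros((D, h + 1, w + 1))
    sat[:, 1:, 1:] = cost.cumsum(axis=1).cumsum(axis=2)
    ys = np.arange(h)
    xs = np.arange(w)
    y0 = np.clip(ys - radius, 0, h)
    y1 = np.clip(ys + radius + 1, 0, h)
    x0 = np.clip(xs - radius, 0, w)
    x1 = np.clip(xs + radius + 1, 0, w)
    # Window sum from four table lookups per pixel.
    return (sat[:, y1][:, :, x1] - sat[:, y0][:, :, x1]
            - sat[:, y1][:, :, x0] + sat[:, y0][:, :, x0])
```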

Stereo matching framework
Choose winning disparity at each pixel
Interpolate to sub-pixel accuracy
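The two steps above can be sketched as follows, assuming an aggregated cost volume cost[d, y, x] with at least three disparity levels. Sub-pixel refinement here fits a parabola through the winning cost and its two neighbors; this is one common choice and is an assumption, since the slides do not say which interpolation is used.

```python
import numpy as np

def winner_take_all(cost):
    """Return integer and sub-pixel disparity maps from cost[d, y, x]."""
    D = cost.shape[0]
    d_int = np.argmin(cost, axis=0)
    # Clamp so that the neighbors d - 1 and d + 1 exist.
    dc = np.clip(d_int, 1, D - 2)
    yy, xx = np.indices(d_int.shape)
    c0 = cost[dc - 1, yy, xx]
    c1 = cost[dc, yy, xx]
    c2 = cost[dc + 1, yy, xx]
    denom = c0 - 2.0 * c1 + c2
    # Vertex of the parabola through the three samples.
    # Flat or non-convex fits get no correction.
    safe = np.where(denom > 1e-12, denom, 1.0)
    offset = np.where(denom > 1e-12, 0.5 * (c0 - c2) / safe, 0.0)
    # Winners at the ends of the disparity range are left unrefined.
    interior = (d_int >= 1) & (d_int <= D - 2)
    d_sub = np.where(interior, dc + offset, d_int.astype(np.float64))
    return d_int, d_sub
```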

Traditional Stereo Matching
Advantages:
gives detailed surface estimates
fast algorithms based on moving averages
sub-pixel disparity estimates and confidence
Limitations:
narrow baseline ⇒ noisy estimates
fails in textureless areas
gets confused near occlusion boundaries

Feature-based stereo
Match “corner” (interest) points
Interpolate complete solution

Data interpolation
Given a sparse set of 3D points, how do we interpolate to a full 3D surface?
Scattered data interpolation [Nielson93]
triangulate
put onto a grid and fill (use pyramid?)
place a kernel function over each data point
minimize an energy function

Energy minimization
1-D example:  approximating splines

Relaxation
Iteratively improve a solution by locally minimizing the energy: relax to solution
Earliest application: WWII numerical simulations

Relaxation
How can we get the best solution?
Differentiate energy function, set to 0
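A small 1-D illustration of relaxation, under an assumed energy E(f) = sum_i w_i (f_i - d_i)^2 + lam * sum_i (f_{i+1} - f_i)^2 (a first-order smoothness term; the slide's own formula is not recoverable). Setting dE/df_i = 0 gives f_i as a weighted average of its data value and its neighbors, and Gauss-Seidel sweeps apply that update in place. The names relax_1d, weights, and lam are hypothetical, and at least two samples are assumed.

```python
import numpy as np

def relax_1d(data, weights, lam=1.0, iters=500):
    """Minimize the assumed 1-D energy by repeated local updates."""
    n = len(data)
    f = np.zeros(n)
    for _ in range(iters):
        for i in range(n):
            # Solve dE/df_i = 0 with the neighbors held fixed.
            num = weights[i] * data[i]
            den = weights[i]
            if i > 0:
                num += lam * f[i - 1]
                den += lam
            if i < n - 1:
                num += lam * f[i + 1]
                den += lam
            f[i] = num / den
    return f
```

With data known only at the two endpoints (zero weight elsewhere), the relaxed solution approaches a straight line between them, which is the expected minimizer of a first-order smoothness term.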

Dynamic programming
Evaluate best cumulative cost at each pixel

Dynamic programming
1-D cost function

Dynamic programming
Disparity space image and min. cost path
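A simplified sketch of finding the minimum-cost path through the disparity space image of one scanline. It assumes a cost array dsi[x, d] and a linear penalty lam * |d_x - d_(x-1)| on disparity changes. It omits the explicit occlusion states used in published scanline methods such as [Intille & Bobick], so it should be read as an illustration of the recurrence only.

```python
import numpy as np

def scanline_dp(dsi, lam=1.0):
    """Return the disparity path minimizing
    sum_x dsi[x, d_x] + lam * sum_x |d_x - d_(x-1)|."""
    n, D = dsi.shape
    disp = np.arange(D)
    jump = lam * np.abs(disp[:, None] - disp[None, :])  # jump[d_prev, d]
    total = np.empty((n, D))          # best cumulative cost at each (x, d)
    back = np.zeros((n, D), dtype=int)
    total[0] = dsi[0]
    for x in range(1, n):
        cand = total[x - 1][:, None] + jump             # cand[d_prev, d]
        back[x] = np.argmin(cand, axis=0)
        total[x] = dsi[x] + cand.min(axis=0)
    # Backtrack from the cheapest final state.
    path = np.empty(n, dtype=int)
    path[-1] = int(np.argmin(total[-1]))
    for x in range(n - 1, 0, -1):
        path[x - 1] = back[x, path[x]]
    return path
```

Because each scanline is solved independently, nothing ties neighboring rows together, which is one source of the horizontal streaks seen in scanline results.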

Dynamic programming
Sample result (note horizontal streaks)
[Intille & Bobick]

Dynamic programming
Can we apply this trick in 2D as well?

Graph cuts
Solution technique for general 2D problem

Graph cuts
α-β swap
α-expansion
modify smoothness penalty based on edges
compute best possible match within integer disparity

Graph cuts
Two different kinds of moves:

Bayesian inference
Formulate as statistical inference problem
Prior model $p_P(d)$
Measurement model $p_M(I_L, I_R \mid d)$
Posterior model
$p(d \mid I_L, I_R) \propto p_P(d)\, p_M(I_L, I_R \mid d)$
Maximum a posteriori (MAP) estimate:
maximize $p(d \mid I_L, I_R)$

Markov Random Field
Probability distribution on disparity field d(x,y)
Enforces smoothness or coherence on field

Measurement model
Likelihood of intensity correspondence
Corresponds to Gaussian noise for quadratic ρ

MAP estimate
Maximize posterior likelihood
Equivalent to regularization (energy minimization with smoothness constraints)
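The equivalence can be written out by taking the negative logarithm of the posterior. The sketch below assumes that both the prior and the measurement model are exponentials of summed penalties (Gibbs form). The penalty names $\rho_P$ and $\rho_M$, the neighborhood $\mathcal{N}$, and the sign convention for the disparity are assumptions for illustration.

```latex
\begin{align*}
p(d \mid I_L, I_R) &\propto p_P(d)\, p_M(I_L, I_R \mid d), \\
p_P(d) \propto e^{-E_P(d)}, &\qquad p_M(I_L, I_R \mid d) \propto e^{-E_M(d)}, \\
-\log p(d \mid I_L, I_R) &= E_P(d) + E_M(d) + \text{const}, \\
E_P(d) &= \sum_{(x,y)} \sum_{(x',y') \in \mathcal{N}(x,y)}
          \rho_P\bigl(d(x,y) - d(x',y')\bigr), \\
E_M(d) &= \sum_{(x,y)} \rho_M\bigl(I_L(x + d(x,y),\, y) - I_R(x,y)\bigr).
\end{align*}
```

Maximizing the posterior is therefore the same as minimizing $E_P + E_M$: the smoothness term comes from the prior and the data term from the measurement model. With quadratic $\rho_M$ the data term is the negative log of a Gaussian noise model.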

Why Bayesian estimation?
Principled way of determining cost function
Explicit model of noise and prior knowledge
Admits a wider variety of optimization algorithms:
gradient descent (local minimization)
stochastic optimization (Gibbs Sampler)
mean-field optimization
graph theoretic (actually deterministic) [Zabih]
[loopy] belief propagation
large stochastic flips [Swendsen-Wang]

Depth Map Results
Input image Sum Abs Diff
Mean field Graph cuts

Traditional stereo
Advantages:
works very well in non-occluded regions
Disadvantages:
restricted to two images (not really: multiple views can be used)
gets confused in occluded regions
can’t handle mixed pixels

Multi-View Stereo
…rest of this material not covered in this lecture…

Stereo Reconstruction
Steps
Calibrate cameras
Rectify images
Compute disparity
Estimate depth
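The last step can be sketched for the rectified case, where depth follows from disparity as Z = f * B / d. The parameter names focal_px (focal length in pixels) and baseline (distance between camera centers) are assumptions for illustration.

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline):
    """Convert disparities (pixels) to depths (same units as baseline).

    Non-positive disparities are mapped to infinite depth.
    """
    d = np.asarray(disparity, dtype=np.float64)
    with np.errstate(divide="ignore"):
        return np.where(d > 0, focal_px * baseline / d, np.inf)
```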

Choosing the Baseline
What’s the optimal baseline?
Too small:  large depth error
Too large:  difficult search problem
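The first half of the trade-off can be made concrete with a first-order error estimate. Differentiating Z = f * B / d gives |dZ| ≈ Z^2 / (f * B) * |dd|, so for a fixed disparity error the depth error shrinks in proportion to the baseline. This is a sketch under the rectified pinhole model, and the function and parameter names are hypothetical.

```python
def depth_error(depth, focal_px, baseline, disp_error_px=1.0):
    """First-order depth uncertainty for a given disparity error (pixels)."""
    return depth ** 2 / (focal_px * baseline) * disp_error_px
```

The cost of a wider baseline does not appear in this formula: the disparity search range grows and the two views look less alike.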

Effect of Baseline on Estimation

Multibaseline Stereo
Basic Approach
Choose a reference view
Use your favorite stereo algorithm BUT
replace two-view SSD with SSD over all baselines
Limitations
Must choose a reference view
Visibility: select which frames to match
[Kang, Szeliski, Chai, CVPR’01]

Epipolar-Plane Images [Bolles 87]
http://www.graphics.lcs.mit.edu/~aisaksen/projects/drlf/epi/

Volumetric Stereo

Voxel Coloring

Reconstruction from Silhouettes

Volume Intersection
Reconstruction Contains the True Scene
But is generally not the same
In the limit get visual hull

Voxel Volume Intersection
Color a voxel black if it lies inside the silhouette in every image
$O(MN^3)$, for $M$ images and $N^3$ voxels
Don’t have to search $2^{N^3}$ possible scenes!
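A minimal sketch of voxel volume intersection, assuming calibrated 3x4 projection matrices and boolean silhouette images indexed as [row, column]. The function name and argument layout are hypothetical. Each voxel is projected once into each image, which gives the O(MN^3) cost quoted above.

```python
import numpy as np

def volume_intersection(silhouettes, projections, grid_points):
    """Return a boolean mask of voxels that fall inside every silhouette.

    silhouettes: list of M boolean images (True = object).
    projections: list of M 3x4 camera matrices.
    grid_points: (K, 3) array of voxel centers.
    """
    homog = np.hstack([grid_points, np.ones((len(grid_points), 1))])
    occupied = np.ones(len(grid_points), dtype=bool)
    for sil, P in zip(silhouettes, projections):
        uvw = homog @ P.T
        in_front = uvw[:, 2] > 0
        w = np.where(in_front, uvw[:, 2], 1.0)   # avoid dividing by <= 0
        u = np.round(uvw[:, 0] / w).astype(int)
        v = np.round(uvw[:, 1] / w).astype(int)
        h, wid = sil.shape
        inside = in_front & (u >= 0) & (u < wid) & (v >= 0) & (v < h)
        hit = np.zeros(len(grid_points), dtype=bool)
        hit[inside] = sil[v[inside], u[inside]]
        # A voxel survives only if every image sees it in the silhouette.
        occupied &= hit
    return occupied
```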

Properties of Volume Intersection
Pros
Easy to implement, fast
Accelerated via octrees [Szeliski 1993]
Cons
No concavities
Reconstruction is not photo-consistent
Requires identification of silhouettes

Voxel Coloring Approach

Depth Ordering:  visit occluders first!

Compatible Camera Configurations

Calibrated Image Acquisition
Calibrated Turntable
360° rotation (21 images)

Voxel Coloring Results (Video)

Space Carving Algorithm

Space Carving Algorithm
The Basic Algorithm is Unwieldy
Complex update procedure
Alternative:  Multi-Pass Plane Sweep
Efficient, can use texture-mapping hardware
Converges quickly in practice
Easy to implement

Multi-Pass Plane Sweep
Sweep plane in each of 6 principal directions
Consider cameras on only one side of plane
Repeat until convergence

Results:  African Violet

Results:  Hand

Other Approaches

Summary
Applications
Image rectification
Matching criteria
Local algorithms (aggregation & diffusion)
Optimization algorithms
energy (cost) formulation & Markov Random Fields
mean-field;  dynamic programming; stochastic; graph algorithms
Multi-View stereo
visibility, occlusion-ordered sweeps

Bibliography
D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1):7-42, May 2002.
R. Szeliski. Stereo algorithms and representations for image-based rendering. In British Machine Vision Conference (BMVC'99), volume 2, pages 314-328, Nottingham, England, September 1999.
G. M. Nielson, Scattered Data Modeling, IEEE Computer Graphics and Applications, 13(1), January 1993, pp. 60-70.
S. B. Kang, R. Szeliski, and J. Chai. Handling occlusions in dense multi-view stereo. In CVPR'2001, vol. I, pages 103-110, December 2001.
Y. Boykov, O. Veksler, and Ramin Zabih, Fast Approximate Energy Minimization via Graph Cuts, Unpublished manuscript, 2000.
A.F. Bobick and S.S. Intille. Large occlusion stereo. International Journal of Computer Vision, 33(3), September 1999, pp. 181-200.
D. Scharstein and R. Szeliski. Stereo matching with nonlinear diffusion. International Journal of Computer Vision, 28(2):155-174, July 1998.

Bibliography
Volume Intersection
Martin & Aggarwal, “Volumetric description of objects from multiple views”, Trans. Pattern Analysis and Machine Intelligence,  5(2), 1983, pp. 150-158.
Szeliski, “Rapid Octree Construction from Image Sequences”, Computer Vision, Graphics, and Image Processing: Image Understanding, 58(1), 1993, pp. 23-32.
Voxel Coloring and Space Carving
Seitz & Dyer, “Photorealistic Scene Reconstruction by Voxel Coloring”, Proc. Computer Vision and Pattern Recognition (CVPR), 1997, pp. 1067-1073.
Seitz & Kutulakos, “Plenoptic Image Editing”,  Proc. Int. Conf. on Computer Vision (ICCV), 1998, pp. 17-24.
Kutulakos & Seitz, “A Theory of Shape by Space Carving”,  Proc. ICCV, 1998, pp. 307-314.

Bibliography
Related References
Bolles, Baker, and Marimont, “Epipolar-Plane Image Analysis: An Approach to Determining Structure from Motion”, International Journal of Computer Vision, vol 1, no 1, 1987, pp. 7-55.
Faugeras & Keriven, “Variational principles, surface evolution, PDE's, level set methods and the stereo problem", IEEE Trans. on Image Processing, 7(3), 1998, pp. 336-344.
Szeliski & Golland, “Stereo Matching with Transparency and Matting”, Proc. Int. Conf. on Computer Vision (ICCV), 1998, pp. 517-524.
Roy & Cox, “A Maximum-Flow Formulation of the N-camera Stereo Correspondence Problem”, Proc. ICCV, 1998, pp. 492-499.
Fua & Leclerc, “Object-centered surface reconstruction:  Combining multi-image stereo and shading", International Journal of Computer Vision, 16, 1995, pp. 35-56.
Narayanan, Rander, & Kanade, “Constructing Virtual Worlds Using Dense Stereo”, Proc. ICCV, 1998, pp. 3-10.