1
Stereo Matching
  • Computer Vision
    CSE576, Spring 2005
    Richard Szeliski
2
Stereo Matching
  • Given two or more images of the same scene or object, compute a representation of its shape






  • What are some possible applications?
3
Face modeling
  • From one stereo pair to a 3D head model








    [Frédéric Devernay, INRIA]
4
Z-keying: mix live and synthetic
  • Takeo Kanade, CMU  (Stereo Machine)
5
Virtualized Reality™
  • [Takeo Kanade et al., CMU]
    • collect video from 50+ streams
    • reconstruct 3D model sequences
    • steerable version used for
      Super Bowl XXXV “eye vision”
  • http://www.cs.cmu.edu/afs/cs/project/VirtualizedR/www/VirtualizedR.html
6
View Interpolation
  • Given two images with correspondences, morph (warp and cross-dissolve) between them [Chen & Williams, SIGGRAPH’93]







  • Figure panels: input, depth image, novel view
     [Matthies, Szeliski, Kanade ’88]
7
More view interpolation
  • Spline-based depth map





    Figure panels: input, depth image, novel view
  • [Szeliski & Kang ’95]
8
Video view interpolation
9
 
10
View Morphing
  • Morph between pair of images using epipolar geometry [Seitz & Dyer, SIGGRAPH’96]
11
Additional applications?
  • Real-time people tracking (systems from Point Grey Research and SRI)
  • “Gaze” correction for video conferencing [Ott, Lewis, Cox, InterChi’93]
  • Other ideas?
12
Stereo Matching
  • Given two or more images of the same scene or object, compute a representation of its shape


  • What are some possible representations?
    • depth maps
    • volumetric models
    • 3D surface models
    • planar (or offset) layers
13
Stereo Matching
  • What are some possible algorithms?
    • match “features” and interpolate
    • match edges and interpolate
    • match all pixels with windows (coarse-fine)
    • use optimization:
      • iterative updating
      • dynamic programming
      • energy minimization (regularization, stochastic)
      • graph algorithms
14
Outline (remainder of lecture)
  • Image rectification
  • Matching criteria
  • Local algorithms (aggregation)
    • iterative updating
  • Optimization algorithms:
    • energy (cost) formulation & Markov Random Fields
    • mean-field, stochastic, and graph algorithms
  • Multi-View stereo & occlusions
15
Stereo: epipolar geometry
  • Match features along epipolar lines
16
Stereo: epipolar geometry
  • for two images (or images with collinear camera centers), can find epipolar lines
  • epipolar lines are the projection of the pencil of planes passing through the centers
  • Rectification:  warping the input images (perspective transformation) so that epipolar lines are horizontal
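A minimal sketch of why rectification simplifies the search, assuming a fundamental matrix F (a symbol not used on the slides) that relates homogeneous pixel coordinates by x'ᵀ F x = 0. For an ideal rectified pair, F takes the form below up to scale, and the epipolar line of any pixel is the horizontal line through the same row. The function and constant names are hypothetical.

```python
def epipolar_line(F, x, y):
    """Return (a, b, c) with a*x' + b*y' + c = 0: the line F @ (x, y, 1)."""
    p = (x, y, 1.0)
    return tuple(sum(F[i][j] * p[j] for j in range(3)) for i in range(3))

# Assumed fundamental matrix of an ideal rectified pair (up to scale).
F_RECTIFIED = [[0.0, 0.0, 0.0],
               [0.0, 0.0, -1.0],
               [0.0, 1.0, 0.0]]

a, b, c = epipolar_line(F_RECTIFIED, 120.0, 45.0)
# a == 0, so the line is horizontal; b*y' + c = 0 gives the matching row.
row = -c / b
```

With this F, a pixel in row 45 of one image can only match pixels in row 45 of the other, which is what makes a 1-D disparity search possible.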
17
Rectification
  • Project each image onto the same plane, which is parallel to the baseline (the line joining the camera centers, which passes through the epipoles)
  • Resample lines (and shear/stretch) to place lines in correspondence, and minimize distortion
  • [Zhang and Loop, MSR-TR-99-21]
18
Rectification
19
Rectification
20
Matching criteria
  • Raw pixel values (correlation)
  • Band-pass filtered images [Jones & Malik 92]
  • “Corner” like features [Zhang, …]
  • Edges [many people…]
  • Gradients [Seitz 89;  Scharstein 94]
  • Rank statistics [Zabih & Woodfill 94]
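One concrete reading of the rank-statistics criterion is the rank transform: replace each pixel by the number of window neighbors that are darker than it, then match the transformed images. The sketch below assumes an image stored as a list of rows and a window radius `w`; both the function name and the choice to leave border pixels at 0 are illustrative assumptions.

```python
def rank_transform(img, w=1):
    """Count, for each interior pixel, the window neighbors with a smaller value."""
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for y in range(w, height - w):
        for x in range(w, width - w):
            center = img[y][x]
            out[y][x] = sum(
                1
                for dy in range(-w, w + 1)
                for dx in range(-w, w + 1)
                if img[y + dy][x + dx] < center
            )
    return out
```

Because only the ordering of intensities matters, the result is unchanged by any increasing gain or offset applied to the image, which is the appeal of this criterion when the two cameras differ in exposure.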
21
Finding correspondences
  • apply feature matching criterion (e.g., correlation or Lucas-Kanade) at all pixels simultaneously
  • search only over epipolar lines (many fewer candidate positions)
22
Image registration (revisited)
  • How do we determine correspondences?
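    • block matching or SSD (sum of squared differences) over a neighborhood N(x, y):
      E(x, y; d) = Σ_{(x', y') ∈ N(x, y)} [I_L(x' + d, y') − I_R(x', y')]²
      d is the disparity (horizontal motion)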
  • How big should the neighborhood be?
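The SSD criterion can be sketched as follows, assuming rectified images stored as lists of rows, a square window of radius `w`, and the convention that a pixel at x in the right image appears at x + d in the left image. The function names and the exhaustive search over 0..max_d are illustrative choices, not taken from the slides.

```python
def ssd_cost(left, right, x, y, d, w):
    """SSD between the window at (x, y) in `right` and at (x + d, y) in `left`."""
    total = 0.0
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            diff = left[y + dy][x + dx + d] - right[y + dy][x + dx]
            total += diff * diff
    return total

def best_disparity(left, right, x, y, max_d, w):
    """Return the integer disparity in 0..max_d with the lowest SSD."""
    costs = [ssd_cost(left, right, x, y, d, w) for d in range(max_d + 1)]
    return min(range(max_d + 1), key=costs.__getitem__)
```

The caller is responsible for keeping the window inside both images, that is, w <= x, x + w + max_d < width, and w <= y < height - w. The radius `w` is the neighborhood size the next slide discusses.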
23
Neighborhood size
  • Smaller neighborhood: more details
  • Larger neighborhood:  fewer isolated mistakes
  • Figure panels: w = 3, w = 20
24
Stereo: certainty modeling
  • Compute certainty map from correlations
  • Figure panels: input, depth map, certainty map
25
Plane Sweep Stereo
  • Sweep family of planes through volume
26
Plane Sweep Stereo
  • For each depth plane
    • compute composite (mosaic) image — mean
    • compute error image — variance
    • convert to confidence and aggregate spatially
  • Select winning depth at each pixel
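A minimal sketch of the per-plane steps, assuming the images have already been warped onto each depth plane so that `warped[k][v][p]` is the intensity of pixel p in view v on plane k (this data layout is an assumption). Spatial aggregation of the error image is omitted to keep the example short.

```python
def plane_sweep(warped):
    """Return, per pixel, the index of the depth plane with the lowest variance."""
    n_pixels = len(warped[0][0])
    winners = []
    for p in range(n_pixels):
        errors = []
        for plane in warped:
            samples = [view[p] for view in plane]
            mean = sum(samples) / len(samples)  # composite (mosaic) value
            variance = sum((s - mean) ** 2 for s in samples) / len(samples)
            errors.append(variance)             # error image value
        winners.append(min(range(len(errors)), key=errors.__getitem__))
    return winners
```

When a pixel lies on the correct plane, all views agree and the variance is low; on a wrong plane the views sample different scene points and the variance rises.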
27
Plane sweep stereo
  • Re-order (pixel / disparity) evaluation loops






    Original order:              Re-ordered:
    for every pixel              for every disparity
      for every disparity          for every pixel
        compute cost                 compute cost
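The two loop orders fill the same cost table; only the traversal differs. The sketch below assumes a single scanline and an absolute-difference cost (both illustrative choices), with the convention that right[x] matches left[x + d].

```python
INF = float("inf")

def match(left, right, x, d):
    """Absolute difference of right[x] and left[x + d]; infinite if out of range."""
    return abs(left[x + d] - right[x]) if x + d < len(left) else INF

def costs_pixel_major(left, right, max_d):
    cost = [[0.0] * (max_d + 1) for _ in right]
    for x in range(len(right)):          # for every pixel
        for d in range(max_d + 1):       #   for every disparity
            cost[x][d] = match(left, right, x, d)
    return cost

def costs_disparity_major(left, right, max_d):
    cost = [[0.0] * (max_d + 1) for _ in right]
    for d in range(max_d + 1):           # for every disparity
        for x in range(len(right)):      #   for every pixel
            cost[x][d] = match(left, right, x, d)
    return cost
```

The disparity-major order treats each disparity as one whole-image operation (shift, subtract, filter), which is what makes the plane-sweep view and moving-average aggregation convenient.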
28
Stereo matching framework
  • For every disparity, compute raw matching costs
      E_0(x, y; d) = ρ(I_L(x + d, y) − I_R(x, y))
    Why use a robust function?
    • occlusions, other outliers
  • Can also use alternative match criteria
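As an illustration of a robust function, the sketch below uses a truncated quadratic, which is one common choice and an assumption here (the slide does not name a specific ρ). The threshold value and function names are likewise hypothetical.

```python
def rho_truncated_quadratic(diff, threshold):
    """Quadratic for small differences, constant beyond `threshold`."""
    return min(diff * diff, threshold * threshold)

def raw_cost(left, right, x, y, d, threshold=20.0):
    """Robust per-pixel matching cost for disparity d (right[x] vs. left[x + d])."""
    return rho_truncated_quadratic(left[y][x + d] - right[y][x], threshold)
```

Capping the penalty means an occluded or otherwise outlying pixel contributes a bounded amount to the aggregated cost instead of dominating the window.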
29
Stereo matching framework
  • Aggregate costs spatially
  • Here, we are using a box filter
    (efficient moving average
    implementation)
  • Can also use weighted average,
    [non-linear] diffusion…
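A box filter can be evaluated in time independent of the window size by using running sums. The sketch below is a 1-D version built on prefix sums, with the window clipped at the borders; the border handling and function name are assumptions. A 2-D box filter can apply the same pass along rows and then along columns.

```python
def box_filter_1d(values, w):
    """Mean over [i - w, i + w], clipped to the array, in O(n) total time."""
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)    # prefix[i] = sum of values[:i]
    out = []
    for i in range(len(values)):
        lo = max(0, i - w)
        hi = min(len(values) - 1, i + w)
        out.append((prefix[hi + 1] - prefix[lo]) / (hi - lo + 1))
    return out
```

Applied to one disparity slice of the cost volume, this turns raw per-pixel costs into window-aggregated costs.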
30
Stereo matching framework
  • Choose winning disparity at each pixel
  • Interpolate to sub-pixel accuracy
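A minimal sketch of the last two steps for one pixel, assuming `costs[d]` holds the aggregated cost for each integer disparity. Sub-pixel refinement here fits a parabola through the winning cost and its two neighbors, which is one common approach and an assumption about what the slide intends.

```python
def winner_take_all(costs):
    """Return the lowest-cost disparity, refined by a parabola fit when possible."""
    d = min(range(len(costs)), key=costs.__getitem__)
    if 0 < d < len(costs) - 1:
        c_minus, c0, c_plus = costs[d - 1], costs[d], costs[d + 1]
        curvature = c_minus - 2.0 * c0 + c_plus
        if curvature > 0:
            # Vertex of the parabola through the three samples.
            return d + 0.5 * (c_minus - c_plus) / curvature
    return float(d)
```

The curvature term also gives a rough confidence measure: a flat cost curve (small curvature) indicates an ambiguous match.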
31
Traditional Stereo Matching
  • Advantages:
    • gives detailed surface estimates
    • fast algorithms based on moving averages
    • sub-pixel disparity estimates and confidence
  • Limitations:
    • narrow baseline ⇒ noisy estimates
    • fails in textureless areas
    • gets confused near occlusion boundaries
32
Feature-based stereo
  • Match “corner” (interest) points
  • Interpolate complete solution
33
Data interpolation
  • Given a sparse set of 3D points, how do we interpolate to a full 3D surface?
  • Scattered data interpolation [Nielson93]
  • triangulate
  • put onto a grid and fill (use pyramid?)
  • place a kernel function over each data point
  • minimize an energy function
34
Energy minimization
  • 1-D example:  approximating splines
35
Relaxation
  • Iteratively improve a solution by locally minimizing the energy: relax to solution






  • Earliest application: WWII numerical simulations
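A minimal sketch of relaxation on a 1-D problem, assuming the energy E(f) = Σ c_i (f_i − d_i)² + λ Σ (f_{i+1} − f_i)², where c_i is 1 at samples that have data and 0 elsewhere. This energy, the Gauss-Seidel sweep order, and the names below are assumptions made for illustration. Each update sets one unknown to the value that minimizes the energy with its neighbors held fixed.

```python
def relax(data, lam=1.0, iters=200):
    """Gauss-Seidel relaxation; `data[i]` is a number or None where no sample exists."""
    n = len(data)
    f = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            num, den = 0.0, 0.0
            if data[i] is not None:      # data term pulls f[i] toward data[i]
                num += data[i]
                den += 1.0
            if i > 0:                    # smoothness term with left neighbor
                num += lam * f[i - 1]
                den += lam
            if i < n - 1:                # smoothness term with right neighbor
                num += lam * f[i + 1]
                den += lam
            f[i] = num / den
    return f
```

For example, `relax([0.0, None, None, None, 4.0])` fills the three missing samples with values that vary linearly between the two end values, which are themselves pulled part of the way toward each other by the smoothness term.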
36
Relaxation
  • How can we get the best solution?
  • Differentiate energy function, set to 0
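As a worked instance, assume the discrete 1-D energy below, where d_i are data values, c_i is 1 where a sample exists and 0 elsewhere, and λ weights smoothness. This particular form and its symbols are assumptions, not taken from the slide. Setting the derivative with respect to one interior unknown to zero gives a linear equation in f_i and its two neighbors:

```latex
\begin{align}
E(f) &= \sum_i c_i\,(f_i - d_i)^2 + \lambda \sum_i (f_{i+1} - f_i)^2 \\
\frac{\partial E}{\partial f_i} &= 2 c_i\,(f_i - d_i)
   + 2\lambda\,(f_i - f_{i-1}) + 2\lambda\,(f_i - f_{i+1}) = 0 \\
\Rightarrow\quad f_i &= \frac{c_i\, d_i + \lambda\,(f_{i-1} + f_{i+1})}{c_i + 2\lambda}
\end{align}
```

Collecting these equations for all i yields a sparse (tridiagonal) linear system; solving it directly gives the global minimum, and sweeping the last line repeatedly over i is the relaxation scheme.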
37
Dynamic programming
  • Evaluate best cumulative cost at each pixel


38
Dynamic programming
  • 1-D cost function
39
Dynamic programming
  • Disparity space image and min. cost path
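A minimal sketch of scanline dynamic programming, assuming `costs[x][d]` is the matching cost of disparity d at pixel x and a linear smoothness penalty `lam * |d - d'|` between neighboring pixels. The penalty form and the function name are assumptions, and occlusion handling is omitted.

```python
def dp_scanline(costs, lam):
    """Return the minimum-cost disparity path through the disparity space image."""
    n_disp = len(costs[0])
    best = list(costs[0])                # best cumulative cost ending at each d
    back = []                            # back[x - 1][d] = best predecessor of d at x
    for x in range(1, len(costs)):
        new_best, pointers = [], []
        for d in range(n_disp):
            prev = min(range(n_disp), key=lambda p: best[p] + lam * abs(d - p))
            new_best.append(costs[x][d] + best[prev] + lam * abs(d - prev))
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    d = min(range(n_disp), key=best.__getitem__)
    path = [d]
    for pointers in reversed(back):      # trace the path back to the first pixel
        d = pointers[d]
        path.append(d)
    path.reverse()
    return path
```

With `lam = 0` each pixel is decided independently; a larger `lam` suppresses isolated outliers along the scanline. Because each scanline is solved on its own, nothing couples neighboring rows, which is the source of horizontal streaks in the results.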
40
Dynamic programming
  • Sample result
     (note horizontal
     streaks)
  • [Intille & Bobick]
41
Dynamic programming
  • Can we apply this trick in 2D as well?
42
Graph cuts
  • Solution technique for general 2D problem
43
Graph cuts
  • α-β swap
  • α-expansion
  • modify smoothness penalty based on edges
  • compute best possible match within integer disparity


44
Graph cuts
  • Two different kinds of moves:
45
Bayesian inference
  • Formulate as statistical inference problem
  • Prior model p_P(d)
  • Measurement model p_M(I_L, I_R | d)
  • Posterior model
  • p(d | I_L, I_R) ∝ p_P(d) p_M(I_L, I_R | d)
  • Maximum a Posteriori (MAP) estimate:
  • maximize p(d | I_L, I_R)


46
Markov Random Field
  • Probability distribution on disparity field d(x,y)







  • Enforces smoothness or coherence on field
47
Measurement model
  • Likelihood of intensity correspondence





  • Corresponds to Gaussian noise for quadratic ρ
48
MAP estimate
  • Maximize posterior likelihood






  • Equivalent to regularization (energy minimization with smoothness constraints)
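The equivalence can be made explicit under assumed forms for the two models: a Gibbs (MRF) prior with a penalty ρ_P on neighboring disparity differences, and a measurement model with a penalty ρ_M on intensity differences. The symbols ρ_P, ρ_M, E_data, and E_smooth are assumptions introduced for this sketch.

```latex
\begin{align}
p_P(d) &\propto \exp\Big(-\sum_{x,y} \big[\rho_P(d_{x+1,y} - d_{x,y})
          + \rho_P(d_{x,y+1} - d_{x,y})\big]\Big) \\
p_M(I_L, I_R \mid d) &\propto \exp\Big(-\sum_{x,y}
          \rho_M\big(I_L(x + d_{x,y},\, y) - I_R(x, y)\big)\Big) \\
-\log p(d \mid I_L, I_R) &= \underbrace{\sum_{x,y} \rho_M(\cdot)}_{E_{\text{data}}}
          + \underbrace{\sum_{x,y} \big[\rho_P(\cdot) + \rho_P(\cdot)\big]}_{E_{\text{smooth}}}
          + \text{const}
\end{align}
```

Maximizing the posterior is therefore the same as minimizing E_data + E_smooth, and quadratic penalties recover classical regularization.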
49
Why Bayesian estimation?
  • Principled way of determining cost function
  • Explicit model of noise and prior knowledge
  • Admits a wider variety of optimization algorithms:
    • gradient descent (local minimization)
    • stochastic optimization (Gibbs Sampler)
    • mean-field optimization
    • graph theoretic (actually deterministic) [Zabih]
    • [loopy] belief propagation
    • large stochastic flips [Swendsen-Wang]
50
Depth Map Results

  • Figure panels: input image, sum of absolute differences, mean field, graph cuts
51
Traditional stereo
  • Advantages:
    • works very well in non-occluded regions
  • Disadvantages:
    • restricted to two images (not)
    • gets confused in occluded regions
    • can’t handle mixed pixels
52
Multi-View Stereo
  • …rest of this material not
    covered in this lecture…
53
Stereo Reconstruction
  • Steps
    • Calibrate cameras
    • Rectify images
    • Compute disparity
    • Estimate depth
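For a rectified pair, the last step follows from similar triangles: Z = f · B / d, with focal length f in pixels, baseline B in scene units, and disparity d in pixels. The sketch below assumes this pinhole model; the function and parameter names are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline):
    """Depth along the optical axis, in the same units as `baseline`."""
    if disparity_px <= 0:
        return float("inf")              # zero disparity: point at infinity
    return focal_px * baseline / disparity_px
```

For instance, with f = 400 pixels and B = 0.5 m, a disparity of 20 pixels corresponds to a depth of 10 m.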
54
Choosing the Baseline
  • What’s the optimal baseline?
    • Too small:  large depth error
    • Too large:  difficult search problem
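The trade-off can be quantified under the pinhole relation Z = f · B / d (an assumption, with f in pixels). Differentiating gives a depth error of roughly Z² · δd / (f · B) for a disparity error δd, while the disparity to be searched grows linearly with B. Names below are hypothetical.

```python
def depth_error(depth, focal_px, baseline, disparity_error_px=1.0):
    """First-order depth uncertainty caused by a given disparity error."""
    return depth * depth * disparity_error_px / (focal_px * baseline)

def disparity_for_depth(depth, focal_px, baseline):
    """Disparity in pixels of a point at `depth`; sets the required search range."""
    return focal_px * baseline / depth
```

Doubling the baseline halves the depth error at a given depth, but it also doubles the disparity of the nearest point, so the matcher must search twice as many candidates (and sees larger appearance changes between views).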
55
Effect of Baseline on Depth Estimation
56
 
57
Multibaseline Stereo
  • Basic Approach
    • Choose a reference view
    • Use your favorite stereo algorithm BUT
      • replace two-view SSD with SSD over all baselines

  • Limitations
    • Must choose a reference view
    • Visibility: select which frames to match
      [Kang, Szeliski, Chai, CVPR’01]
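A minimal 1-D sketch of the basic approach, assuming each extra view sees a scene point shifted by `baseline * inv_depth` pixels relative to the reference scanline (integer shifts only). Parameterizing the search by inverse depth lets one candidate be scored against every view at once. Function names and the data layout are assumptions, and visibility is ignored.

```python
def sssd(ref, views, baselines, x, inv_depth, w):
    """Sum over all views of the SSD for a window centered at x in `ref`."""
    total = 0.0
    for view, b in zip(views, baselines):
        shift = b * inv_depth            # disparity of this view for this candidate
        for dx in range(-w, w + 1):
            diff = view[x + dx + shift] - ref[x + dx]
            total += diff * diff
    return total

def best_inverse_depth(ref, views, baselines, x, candidates, w):
    """Return the candidate inverse depth with the lowest summed SSD."""
    return min(candidates, key=lambda z: sssd(ref, views, baselines, x, z, w))
```

A match that is ambiguous for one baseline (for example, on a repetitive texture) is usually not ambiguous for all of them, so the summed cost tends to have a sharper minimum.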
58
Epipolar-Plane Images [Bolles 87]
  • http://www.graphics.lcs.mit.edu/~aisaksen/projects/drlf/epi/


59
Volumetric Stereo
60
Voxel Coloring
61
 
62
Reconstruction from Silhouettes
63
Volume Intersection
  • Reconstruction Contains the True Scene
    • But is generally not the same
    • In the limit get visual hull
64
Voxel Volume Intersection
  • Color voxel black if in silhouette in every image
    • O(MN^3), for M images, N^3 voxels
    • Don’t have to search 2^(N^3) possible scenes!
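A minimal sketch of voxel volume intersection on an N × N × N grid. Each view is assumed to be a pair of a projection function, mapping a voxel index (i, j, k) to a pixel (u, v), and a binary silhouette indexed as `silhouette[v][u]`. The orthographic projections in the usage note are an assumption made to keep the example small.

```python
def volume_intersection(n, views):
    """Keep the voxels that project inside the silhouette in every view."""
    kept = set()
    for i in range(n):
        for j in range(n):
            for k in range(n):
                inside_all = True
                for project, silhouette in views:   # M tests per voxel
                    u, v = project(i, j, k)
                    if not silhouette[v][u]:
                        inside_all = False
                        break
                if inside_all:
                    kept.add((i, j, k))
    return kept
```

With two orthographic views on a 4 × 4 × 4 grid, one along the k axis (`lambda i, j, k: (i, j)`) and one along the i axis (`lambda i, j, k: (j, k)`), and a centered 2 × 2 square as both silhouettes, the result is the 2 × 2 × 2 block of central voxels. Each voxel is tested once per image, which gives the O(MN^3) cost.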
65
Properties of Volume Intersection
  • Pros
    • Easy to implement, fast
    • Accelerated via octrees [Szeliski 1993]

  • Cons
    • No concavities
    • Reconstruction is not photo-consistent
    • Requires identification of silhouettes
66
 
67
Voxel Coloring Approach
68
Depth Ordering:  visit occluders first!
69
Compatible Camera Configurations
70
Calibrated Image Acquisition
  • Calibrated Turntable
  • 360° rotation (21 images)
71
Voxel Coloring Results (Video)
72
 
73
Space Carving Algorithm
74
Space Carving Algorithm
  • The Basic Algorithm is Unwieldy
    • Complex update procedure


  • Alternative:  Multi-Pass Plane Sweep
    • Efficient, can use texture-mapping hardware
    • Converges quickly in practice
    • Easy to implement


75
Multi-Pass Plane Sweep
    • Sweep plane in each of 6 principal directions
    • Consider cameras on only one side of plane
    • Repeat until convergence
76
Results:  African Violet
77
Results:  Hand
78
Other Approaches
79
Summary
  • Applications
  • Image rectification
  • Matching criteria
  • Local algorithms (aggregation & diffusion)
  • Optimization algorithms
    • energy (cost) formulation & Markov Random Fields
    • mean-field;  dynamic programming; stochastic; graph algorithms
  • Multi-View stereo
    • visibility, occlusion-ordered sweeps
80
Bibliography
  • D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1):7-42, May 2002.
  • R. Szeliski. Stereo algorithms and representations for image-based rendering. In British Machine Vision Conference (BMVC'99), volume 2, pages 314-328, Nottingham, England, September 1999.
  • G. M. Nielson, Scattered Data Modeling, IEEE Computer Graphics and Applications, 13(1), January 1993, pp. 60-70.
  • S. B. Kang, R. Szeliski, and J. Chai. Handling occlusions in dense multi-view stereo. In CVPR'2001, vol. I, pages 103-110, December 2001.
  • Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. Unpublished manuscript, 2000.
  • A. F. Bobick and S. S. Intille. Large occlusion stereo. International Journal of Computer Vision, 33(3):181-200, September 1999.
  • D. Scharstein and R. Szeliski. Stereo matching with nonlinear diffusion. International Journal of Computer Vision, 28(2):155-174, July 1998.
81
Bibliography
  • Volume Intersection
    • Martin & Aggarwal, “Volumetric descriptions of objects from multiple views”, Trans. Pattern Analysis and Machine Intelligence, 5(2), 1983, pp. 150-158.
    • Szeliski, “Rapid Octree Construction from Image Sequences”, Computer Vision, Graphics, and Image Processing: Image Understanding, 58(1), 1993, pp. 23-32.
  • Voxel Coloring and Space Carving
    • Seitz & Dyer, “Photorealistic Scene Reconstruction by Voxel Coloring”, Proc. Computer Vision and Pattern Recognition (CVPR), 1997, pp. 1067-1073.
    • Seitz & Kutulakos, “Plenoptic Image Editing”,  Proc. Int. Conf. on Computer Vision (ICCV), 1998, pp. 17-24.
    • Kutulakos & Seitz, “A Theory of Shape by Space Carving”,  Proc. ICCV, 1998, pp. 307-314.
82
Bibliography
  • Related References
    • Bolles, Baker, and Marimont, “Epipolar-Plane Image Analysis: An Approach to Determining Structure from Motion”, International Journal of Computer Vision, vol 1, no 1, 1987, pp. 7-55.
    • Faugeras & Keriven, “Variational principles, surface evolution, PDE's, level set methods and the stereo problem", IEEE Trans. on Image Processing, 7(3), 1998, pp. 336-344.
    • Szeliski & Golland, “Stereo Matching with Transparency and Matting”, Proc. Int. Conf. on Computer Vision (ICCV), 1998, 517-524.
    • Roy & Cox, “A Maximum-Flow Formulation of the N-camera Stereo Correspondence Problem”, Proc. ICCV, 1998, pp. 492-499.
    • Fua & Leclerc, “Object-centered surface reconstruction:  Combining multi-image stereo and shading", International Journal of Computer Vision, 16, 1995, pp. 35-56.
    • Narayanan, Rander, & Kanade, “Constructing Virtual Worlds Using Dense Stereo”, Proc. ICCV, 1998, pp. 3-10.