Stereo Matching
Computer Vision
CSE576, Spring 2005
Richard Szeliski
Stereo Matching
  Given two or more images of the same scene or object, compute a representation of its shape
  What are some possible applications?
Face modeling
  From one stereo pair to a 3D head model [Frédéric Devernay, INRIA]
Z-keying: mix live and synthetic
  Takeo Kanade, CMU (Stereo Machine)
Virtualized Reality™
  [Takeo Kanade et al., CMU]
  collect video from 50+ streams
  reconstruct 3D model sequences
  steerable version used for Super Bowl XXXV “EyeVision”
  http://www.cs.cmu.edu/afs/cs/project/VirtualizedR/www/VirtualizedR.html
View Interpolation
  Given two images with correspondences, morph (warp and cross-dissolve) between them [Chen & Williams, SIGGRAPH’93]
  [Figure panels: input, depth image, novel view] [Matthies, Szeliski, Kanade ’88]
More view interpolation
  Spline-based depth map [Szeliski & Kang ’95]
  [Figure panels: input, depth image, novel view]
Video view interpolation
View Morphing
  Morph between pair of images using epipolar geometry [Seitz & Dyer, SIGGRAPH’96]
Additional applications?
  Real-time people tracking (systems from Pt. Gray Research and SRI)
  “Gaze” correction for video conferencing [Ott, Lewis, Cox, InterCHI’93]
  Other ideas?
Stereo Matching
  Given two or more images of the same scene or object, compute a representation of its shape
  What are some possible representations?
    depth maps
    volumetric models
    3D surface models
    planar (or offset) layers
Stereo Matching
  What are some possible algorithms?
    match “features” and interpolate
    match edges and interpolate
    match all pixels with windows (coarse-fine)
    use optimization:
      iterative updating
      dynamic programming
      energy minimization (regularization, stochastic)
      graph algorithms
Outline (remainder of lecture)
  Image rectification
  Matching criteria
  Local algorithms (aggregation)
    iterative updating
  Optimization algorithms:
    energy (cost) formulation & Markov Random Fields
    mean-field, stochastic, and graph algorithms
  Multi-View stereo & occlusions
Stereo: epipolar geometry
  Match features along epipolar lines
Stereo: epipolar geometry
  For two images (or images with collinear camera centers), we can find epipolar lines
  Epipolar lines are the projection of the pencil of planes passing through the camera centers
  Rectification: warping the input images (perspective transformation) so that epipolar lines are horizontal
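As a minimal sketch of what "epipolar lines are horizontal" means, the snippet below assumes a fundamental matrix `F` that maps a point in one image to its epipolar line in the other. The names `epipolar_line` and `F_RECTIFIED` are illustrative and not from the lecture; `F_RECTIFIED` is the fundamental matrix of an idealized rectified pair (pure horizontal translation), up to scale.

```python
import numpy as np

def epipolar_line(F, x):
    """Return the epipolar line l = F x as (a, b, c), meaning a*u + b*v + c = 0."""
    x_h = np.array([x[0], x[1], 1.0])  # homogeneous pixel coordinates
    return F @ x_h

# Assumed idealized case: cameras differ only by a translation along the x axis,
# so F is the cross-product matrix of t = (1, 0, 0).
F_RECTIFIED = np.array([[0.0, 0.0, 0.0],
                        [0.0, 0.0, -1.0],
                        [0.0, 1.0, 0.0]])

line = epipolar_line(F_RECTIFIED, (120.0, 45.0))
# line is (0, -1, 45): all points with v = 45, i.e. the same scanline.
```

After rectification, the correspondence search for a pixel therefore reduces to a 1-D search along its own row.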
Rectification
  Project each image onto the same plane, which is parallel to the baseline (the line joining the camera centers)
  Resample lines (and shear/stretch) to place lines in correspondence and minimize distortion
  [Loop and Zhang, MSR-TR-99-21]
Matching criteria
  Raw pixel values (correlation)
  Band-pass filtered images [Jones & Malik 92]
  “Corner”-like features [Zhang, …]
  Edges [many people…]
  Gradients [Seitz 89; Scharstein 94]
  Rank statistics [Zabih & Woodfill 94]
Finding correspondences
  apply feature matching criterion (e.g., correlation or Lucas-Kanade) at all pixels simultaneously
  search only over epipolar lines (many fewer candidate positions)
Image registration (revisited)
  How do we determine correspondences?
    block matching or SSD (sum of squared differences), where d is the disparity (horizontal motion)
  How big should the neighborhood be?
Neighborhood size
  Smaller neighborhood: more details
  Larger neighborhood: fewer isolated mistakes
  [Figure panels: w = 3, w = 20]
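The window-size trade-off can be explored with a small block-matching sketch. This is an illustration under stated assumptions, not the lecture's implementation: the images are rectified grayscale float arrays, a left pixel at column x is assumed to match the right pixel at column x - d, and the helper names `box_sum` and `ssd_disparity` are made up for this example.

```python
import numpy as np

def box_sum(a, w):
    """Sum of `a` over a (2w+1) x (2w+1) window, with edge replication."""
    k = 2 * w + 1
    p = np.pad(a, w, mode="edge")
    # Integral image with a leading row and column of zeros.
    s = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    s[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)
    return s[k:, k:] - s[:-k, k:] - s[k:, :-k] + s[:-k, :-k]

def ssd_disparity(left, right, max_disp, w):
    """Winner-take-all disparity map for the left image from windowed SSD."""
    h, wid = left.shape
    costs = np.empty((max_disp + 1, h, wid))
    for d in range(max_disp + 1):
        # Large cost where the right image has no pixel at column x - d.
        diff2 = np.full((h, wid), 1e6)
        diff2[:, d:] = (left[:, d:] - right[:, :wid - d]) ** 2
        costs[d] = box_sum(diff2, w)
    return costs.argmin(axis=0)

# Synthetic check: a random texture shifted by 3 pixels.
rng = np.random.default_rng(0)
left = rng.random((20, 40))
right = np.roll(left, -3, axis=1)
disp = ssd_disparity(left, right, max_disp=6, w=2)
```

Re-running with a small `w` and a large `w` on a real image pair should reproduce the behaviour described above: small windows keep detail but produce isolated errors, large windows smooth the errors away along with fine structure.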
Stereo: certainty modeling
  Compute certainty map from correlations
  [Figure panels: input, depth map, certainty map]
Plane Sweep Stereo
  Sweep family of planes through volume
Plane Sweep Stereo
  For each depth plane
    compute composite (mosaic) image: mean
    compute error image: variance
    convert to confidence and aggregate spatially
  Select winning depth at each pixel
Plane sweep stereo
  Re-order (pixel / disparity) evaluation loops:
    for every pixel, for every disparity: compute cost
    for every disparity, for every pixel: compute cost
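The two loop orders compute the same cost volume; the disparity-major order simply lets each disparity be handled as one whole-image operation. The sketch below illustrates this with an absolute-difference cost. The function names and the border clamping are assumptions made for the example.

```python
import numpy as np

def cost_pixel_major(left, right, max_disp):
    """For every pixel, for every disparity: compute cost."""
    h, w = left.shape
    cost = np.zeros((h, w, max_disp + 1))
    for y in range(h):
        for x in range(w):
            for d in range(max_disp + 1):
                xr = max(x - d, 0)  # clamp at the image border (assumption)
                cost[y, x, d] = abs(left[y, x] - right[y, xr])
    return cost

def cost_disparity_major(left, right, max_disp):
    """For every disparity, for every pixel: one shifted image per disparity."""
    h, w = left.shape
    cost = np.zeros((h, w, max_disp + 1))
    cols = np.arange(w)
    for d in range(max_disp + 1):
        shifted = right[:, np.maximum(cols - d, 0)]  # whole image at once
        cost[:, :, d] = np.abs(left - shifted)
    return cost
```

In the disparity-major form, each slice `cost[:, :, d]` is an ordinary image, so spatial aggregation can be applied to it with standard image filters.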
Stereo matching framework
  For every disparity, compute raw matching costs
  Why use a robust function?
    occlusions, other outliers
  Can also use alternative match criteria
Stereo matching framework
  Aggregate costs spatially
  Here, we are using a box filter (efficient moving average implementation)
  Can also use weighted average, [non-linear] diffusion…
Stereo matching framework
  Choose winning disparity at each pixel
  Interpolate to sub-pixel accuracy
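The three framework steps (raw cost, aggregation, winner selection with sub-pixel refinement) can be strung together as follows. This is a sketch under assumptions: a truncated absolute difference stands in for the robust function, the sub-pixel step fits a parabola through the three costs around the winner, and `box_filter`, `stereo_match`, `r`, and `trunc` are illustrative names. It requires `max_disp >= 2`.

```python
import numpy as np

def box_filter(a, r):
    """Mean over a (2r+1)^2 window via running sums (edge-replicated)."""
    k = 2 * r + 1
    p = np.pad(a, r, mode="edge")
    s = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    s[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)
    return (s[k:, k:] - s[:-k, k:] - s[k:, :-k] + s[:-k, :-k]) / k ** 2

def stereo_match(left, right, max_disp, r=2, trunc=20.0):
    h, w = left.shape
    cols = np.arange(w)
    cost = np.empty((max_disp + 1, h, w))
    for d in range(max_disp + 1):
        shifted = right[:, np.maximum(cols - d, 0)]
        raw = np.minimum(np.abs(left - shifted), trunc)  # 1. robust raw cost
        cost[d] = box_filter(raw, r)                     # 2. spatial aggregation
    d0 = cost.argmin(axis=0)                             # 3. winner-take-all
    # 4. Sub-pixel: vertex of the parabola through costs at d0-1, d0, d0+1.
    dc = np.clip(d0, 1, max_disp - 1)
    yy, xx = np.mgrid[0:h, 0:w]
    c_m, c_0, c_p = cost[dc - 1, yy, xx], cost[dc, yy, xx], cost[dc + 1, yy, xx]
    denom = c_m - 2.0 * c_0 + c_p
    curved = denom > 1e-12
    offset = np.where(curved, 0.5 * (c_m - c_p) / np.where(curved, denom, 1.0), 0.0)
    interior = (d0 >= 1) & (d0 <= max_disp - 1)
    return np.where(interior, d0 + np.clip(offset, -0.5, 0.5), d0.astype(float))
```

Swapping `box_filter` for a weighted average or a diffusion step changes only step 2; the rest of the pipeline stays the same.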
Traditional Stereo Matching
  Advantages:
    gives detailed surface estimates
    fast algorithms based on moving averages
    sub-pixel disparity estimates and confidence
  Limitations:
    narrow baseline ⇒ noisy estimates
    fails in textureless areas
    gets confused near occlusion boundaries
Feature-based stereo
  Match “corner” (interest) points
  Interpolate complete solution
Data interpolation
  Given a sparse set of 3D points, how do we interpolate to a full 3D surface?
  Scattered data interpolation [Nielson93]
    triangulate
    put onto a grid and fill (use pyramid?)
    place a kernel function over each data point
    minimize an energy function
Energy minimization
  1-D example: approximating splines
Relaxation
  Iteratively improve a solution by locally minimizing the energy: relax to solution
  Earliest application: WWII numerical simulations
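A minimal 1-D relaxation sketch follows. The slide's energy is not preserved in this text, so the code assumes a common choice: a squared data term at the samples that have measurements plus a first-order (membrane) smoothness term weighted by `lam`. Each update sets one sample to the value that minimizes the energy with its neighbors held fixed (Gauss-Seidel). All names are illustrative.

```python
import numpy as np

def energy(f, z, has_data, lam):
    """Assumed energy: sum over data of (f - z)^2 plus lam * sum of (f[i+1] - f[i])^2."""
    data = np.sum(has_data * (f - z) ** 2)
    smooth = lam * np.sum(np.diff(f) ** 2)
    return data + smooth

def relax(z, has_data, lam=1.0, sweeps=200):
    """Gauss-Seidel relaxation: repeatedly minimize the energy one sample at a time."""
    f = np.zeros(len(z), dtype=float)
    n = len(f)
    for _ in range(sweeps):
        for i in range(n):
            nbrs = [f[j] for j in (i - 1, i + 1) if 0 <= j < n]
            c = 1.0 if has_data[i] else 0.0
            # Setting dE/df[i] = 0 with the neighbors fixed gives this weighted mean.
            f[i] = (c * z[i] + lam * sum(nbrs)) / (c + lam * len(nbrs))
    return f

z = np.array([0.0, 0.0, 0.0, 0.0, 4.0])
has_data = np.array([True, False, False, False, True])
f = relax(z, has_data)
```

Because the fit is approximating rather than interpolating, the result does not pass exactly through the data values; raising or lowering `lam` trades data fidelity against smoothness.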
Relaxation
  How can we get the best solution?
  Differentiate energy function, set to 0
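For a quadratic energy, setting the derivative to zero yields a linear system that can be solved directly. The sketch below assumes a 1-D energy with a squared data term at measured samples and a first-order smoothness term weighted by `lam` (the lecture's exact energy is not preserved here); `solve_direct` is an illustrative name.

```python
import numpy as np

def solve_direct(z, has_data, lam=1.0):
    """Solve dE/df = 0 for E = sum_data (f - z)^2 + lam * sum (f[i+1] - f[i])^2."""
    n = len(z)
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i in range(n):
        if has_data[i]:               # data term contributes to the diagonal and rhs
            A[i, i] += 1.0
            b[i] += z[i]
        if i + 1 < n:                 # smoothness term between samples i and i + 1
            A[i, i] += lam
            A[i + 1, i + 1] += lam
            A[i, i + 1] -= lam
            A[i + 1, i] -= lam
    return np.linalg.solve(A, b)      # requires at least one data sample

z = np.array([0.0, 0.0, 0.0, 0.0, 4.0])
has_data = np.array([True, False, False, False, True])
f_best = solve_direct(z, has_data)
```

The matrix is tridiagonal in 1-D, so a banded or sparse solver would be the natural choice at scale; the dense solve here is only for brevity.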
Dynamic programming
  Evaluate best cumulative cost at each pixel
Dynamic programming
  Disparity space image and min. cost path
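The minimum-cost path through a disparity space image can be found one scanline at a time with a Viterbi-style recursion. The sketch below is simplified: it assumes a linear penalty on disparity changes between neighboring pixels and has no explicit occlusion states, unlike the formulations in the stereo literature. `scanline_dp` and `smooth` are illustrative names.

```python
import numpy as np

def scanline_dp(dsi, smooth=1.0):
    """dsi[x, d] is the matching cost at pixel x and disparity d.
    Returns the disparity path minimizing matching cost plus smooth * |jump|."""
    n, m = dsi.shape
    d = np.arange(m)
    jump = smooth * np.abs(d[:, None] - d[None, :])   # jump[d_prev, d_cur]
    total = np.empty((n, m))                          # best cumulative cost
    back = np.zeros((n, m), dtype=int)                # best predecessor
    total[0] = dsi[0]
    for x in range(1, n):
        cand = total[x - 1][:, None] + jump           # cand[d_prev, d_cur]
        back[x] = cand.argmin(axis=0)
        total[x] = dsi[x] + cand.min(axis=0)
    path = np.empty(n, dtype=int)
    path[-1] = total[-1].argmin()
    for x in range(n - 1, 0, -1):                     # trace the path backwards
        path[x - 1] = back[x, path[x]]
    return path

# One ambiguous pixel (x = 3) where the per-pixel minimum is at the wrong disparity.
dsi = np.ones((6, 4))
dsi[:, 2] = 0.0
dsi[3, 2] = 0.5
dsi[3, 0] = 0.0
path = scanline_dp(dsi)
```

Since every scanline is solved independently, nothing ties neighboring rows together, which is one way to read the horizontal streaks typical of scanline results.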
Dynamic programming
  Sample result (note horizontal streaks) [Intille & Bobick]
Dynamic programming
  Can we apply this trick in 2D as well?
Graph cuts
  Solution technique for general 2D problem
Graph cuts
  α-β swap
  expansion
  modify smoothness penalty based on edges
  compute best possible match within integer disparity
Graph cuts
  Two different kinds of moves:
Bayesian inference
  Formulate as statistical inference problem
  Prior model $p_P(d)$
  Measurement model $p_M(I_L, I_R \mid d)$
  Posterior model
    $p(d \mid I_L, I_R) \propto p_P(d)\, p_M(I_L, I_R \mid d)$
  Maximum a posteriori (MAP) estimate:
    maximize $p(d \mid I_L, I_R)$
Markov Random Field
  Probability distribution on disparity field d(x, y)
  Enforces smoothness or coherence on field
Measurement model
  Likelihood of intensity correspondence
  Corresponds to Gaussian noise for quadratic ρ
MAP estimate
  Maximize posterior likelihood
  Equivalent to regularization (energy minimization with smoothness constraints)
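The equations on the MRF, measurement-model, and MAP slides are not preserved in this text. The block below gives one standard way to write them. The specific forms, the neighborhood $\mathcal{N}$, the penalty functions $\rho_P$ and $\rho_M$, and the sign convention for the disparity are assumptions for illustration, not a reconstruction of the original slides.

```latex
\begin{aligned}
p_P(d) &\propto \exp\Big(-\sum_{(x,y)} \sum_{(x',y') \in \mathcal{N}(x,y)}
          \rho_P\big(d(x,y) - d(x',y')\big)\Big) \\
p_M(I_L, I_R \mid d) &\propto \exp\Big(-\sum_{(x,y)}
          \rho_M\big(I_L(x,y) - I_R(x - d(x,y),\, y)\big)\Big) \\
-\log p(d \mid I_L, I_R) &= \underbrace{\sum_{(x,y)}
          \rho_M\big(I_L(x,y) - I_R(x - d(x,y),\, y)\big)}_{E_{\text{data}}(d)}
          + \underbrace{\sum_{(x,y)} \sum_{(x',y') \in \mathcal{N}(x,y)}
          \rho_P\big(d(x,y) - d(x',y')\big)}_{E_{\text{smooth}}(d)}
          + \text{const}
\end{aligned}
```

Maximizing the posterior is therefore the same as minimizing $E_{\text{data}} + E_{\text{smooth}}$. With a quadratic $\rho_M$ the data term is the negative log of a Gaussian noise model.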
Why Bayesian estimation?
  Principled way of determining cost function
  Explicit model of noise and prior knowledge
  Admits a wider variety of optimization algorithms:
    gradient descent (local minimization)
    stochastic optimization (Gibbs Sampler)
    mean-field optimization
    graph theoretic (actually deterministic) [Zabih]
    [loopy] belief propagation
    large stochastic flips [Swendsen-Wang]
Depth Map Results
  [Figure panels: Input image, Sum Abs Diff, Mean field, Graph cuts]
Traditional stereo
  Advantages:
    works very well in non-occluded regions
  Disadvantages:
    restricted to two images (not really)
    gets confused in occluded regions
    can’t handle mixed pixels
Multi-View Stereo
  …rest of this material not covered in this lecture…
Stereo Reconstruction
  Steps
    Calibrate cameras
    Rectify images
    Compute disparity
    Estimate depth
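The last step above, estimating depth from disparity, reduces to Z = f B / d for a rectified pair. The sketch assumes the focal length is given in pixels and the baseline in metres; `depth_from_disparity` and its parameter names are illustrative.

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified pair; zero disparity maps to infinity."""
    d = np.asarray(disparity, dtype=float)
    depth = np.full(d.shape, np.inf)
    valid = d > 0
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth

depth = depth_from_disparity([10.0, 0.0], focal_px=500.0, baseline_m=0.1)
# depth[0] is 5.0 (metres); depth[1] is inf (point at infinity).
```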
Choosing the Baseline
  What’s the optimal baseline?
    Too small: large depth error
    Too large: difficult search problem
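Both sides of the baseline trade-off follow from Z = f B / d. Differentiating gives a first-order depth error of Z² |Δd| / (f B), which shrinks as the baseline grows, while the disparity range to search, f B / Z_min, grows with it. The sketch below tabulates the two quantities; the function names and the quarter-pixel matching error are assumptions for illustration.

```python
def depth_error(depth, focal_px, baseline, disp_error_px=0.25):
    """First-order depth uncertainty: Z^2 * |delta d| / (f * B)."""
    return depth ** 2 * disp_error_px / (focal_px * baseline)

def search_range(min_depth, focal_px, baseline):
    """Largest disparity (pixels) to search for scene points no closer than min_depth."""
    return focal_px * baseline / min_depth

for baseline in (0.05, 0.1, 0.4):
    print(baseline,
          depth_error(5.0, 500.0, baseline),
          search_range(1.0, 500.0, baseline))
```

Doubling the baseline halves the depth error at a given depth and doubles the number of candidate disparities.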
Effect of Baseline on Estimation
Multibaseline Stereo
  Basic Approach
    Choose a reference view
    Use your favorite stereo algorithm BUT
      replace two-view SSD with SSD over all baselines
  Limitations
    Must choose a reference view
    Visibility: select which frames to match [Kang, Szeliski, Chai, CVPR’01]
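"SSD over all baselines" can be sketched on a single scanline. Since disparity scales with baseline, the costs from different views are only comparable when indexed by a shared depth-like parameter. The sketch assumes integer baselines and an integer level `k` such that view `i` is shifted by `baselines[i] * k` pixels relative to the reference. The function name `sssd` and this discretization are assumptions for illustration, and no visibility handling is included.

```python
import numpy as np

def sssd(ref, views, baselines, num_levels):
    """Per-pixel sum of squared differences over all views, indexed by level k.
    View i is assumed to be shifted by baselines[i] * k pixels (integers)."""
    n = len(ref)
    cols = np.arange(n)
    total = np.zeros((num_levels, n))
    for view, b in zip(views, baselines):
        for k in range(num_levels):
            src = np.clip(cols - b * k, 0, n - 1)  # matching column in this view
            total[k] += (ref - view[src]) ** 2
    return total

# Synthetic scanline at level k = 2, seen from two baselines.
rng = np.random.default_rng(0)
ref = rng.random(40)
baselines = [1, 2]
views = [np.roll(ref, -b * 2) for b in baselines]
best = sssd(ref, views, baselines, num_levels=4).argmin(axis=0)
```

The reference view is treated specially here, which is exactly the first limitation listed above.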
Epipolar-Plane Images [Bolles 87]
  http://www.graphics.lcs.mit.edu/~aisaksen/projects/drlf/epi/
Volumetric Stereo
Voxel Coloring
Reconstruction from Silhouettes
Volume Intersection
  Reconstruction Contains the True Scene
    But is generally not the same
    In the limit get visual hull
Voxel Volume Intersection
  Color voxel black if in silhouette in every image
    $O(MN^3)$, for M images, $N^3$ voxels
    Don’t have to search $2^{N^3}$ possible scenes!
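The voxel test above can be sketched in a few lines if the cameras are idealized. The code below assumes orthographic views along the three grid axes, so projecting a voxel just drops one coordinate; real systems would use calibrated perspective projections instead. `carve` and the silhouette dictionary layout are illustrative.

```python
import numpy as np

def carve(silhouettes, n):
    """silhouettes maps an axis (0, 1, or 2) to the n x n boolean mask seen when
    looking along that axis (orthographic cameras, a simplifying assumption)."""
    occupied = np.ones((n, n, n), dtype=bool)
    for axis, mask in silhouettes.items():
        # A voxel survives only if it projects inside this silhouette.
        occupied &= np.expand_dims(mask, axis=axis)
    return occupied

# Synthetic scene: a sphere, with silhouettes rendered along each axis.
n = 16
g = np.arange(n) - (n - 1) / 2.0
xx, yy, zz = np.meshgrid(g, g, g, indexing="ij")
truth = xx ** 2 + yy ** 2 + zz ** 2 <= 6.0 ** 2
sil = {axis: truth.any(axis=axis) for axis in range(3)}
hull = carve(sil, n)
```

Each image touches every voxel once, which gives the O(MN³) cost. The result contains the true shape but is larger than it, as noted on the Volume Intersection slide.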
Properties of Volume Intersection
  Pros
    Easy to implement, fast
    Accelerated via octrees [Szeliski 1993]
  Cons
    No concavities
    Reconstruction is not photo-consistent
    Requires identification of silhouettes
Voxel Coloring Approach
Depth Ordering: visit occluders first!
Compatible Camera Configurations
Calibrated Image Acquisition
  Calibrated Turntable
  360° rotation (21 images)
Voxel Coloring Results (Video)
Space Carving Algorithm
  The Basic Algorithm is Unwieldy
    Complex update procedure
  Alternative: Multi-Pass Plane Sweep
    Efficient, can use texture-mapping hardware
    Converges quickly in practice
    Easy to implement
Multi-Pass Plane Sweep
  Sweep plane in each of 6 principal directions
  Consider cameras on only one side of plane
  Repeat until convergence
Results: African Violet
Results: Hand
Other Approaches
Summary
  Applications
  Image rectification
  Matching criteria
  Local algorithms (aggregation & diffusion)
  Optimization algorithms
    energy (cost) formulation & Markov Random Fields
    mean-field; dynamic programming; stochastic; graph algorithms
  Multi-View stereo
    visibility, occlusion-ordered sweeps
Bibliography
  D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1):7-42, May 2002.
  R. Szeliski. Stereo algorithms and representations for image-based rendering. In British Machine Vision Conference (BMVC'99), volume 2, pages 314-328, Nottingham, England, September 1999.
  G. M. Nielson. Scattered data modeling. IEEE Computer Graphics and Applications, 13(1):60-70, January 1993.
  S. B. Kang, R. Szeliski, and J. Chai. Handling occlusions in dense multi-view stereo. In CVPR'2001, vol. I, pages 103-110, December 2001.
  Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. Unpublished manuscript, 2000.
  A. F. Bobick and S. S. Intille. Large occlusion stereo. International Journal of Computer Vision, 33(3):181-200, September 1999.
  D. Scharstein and R. Szeliski. Stereo matching with nonlinear diffusion. International Journal of Computer Vision, 28(2):155-174, July 1998.
Bibliography
  Volume Intersection
    Martin & Aggarwal, “Volumetric description of objects from multiple views”, IEEE Trans. Pattern Analysis and Machine Intelligence, 5(2), 1983, pp. 150-158.
    Szeliski, “Rapid Octree Construction from Image Sequences”, Computer Vision, Graphics, and Image Processing: Image Understanding, 58(1), 1993, pp. 23-32.
  Voxel Coloring and Space Carving
    Seitz & Dyer, “Photorealistic Scene Reconstruction by Voxel Coloring”, Proc. Computer Vision and Pattern Recognition (CVPR), 1997, pp. 1067-1073.
    Seitz & Kutulakos, “Plenoptic Image Editing”, Proc. Int. Conf. on Computer Vision (ICCV), 1998, pp. 17-24.
    Kutulakos & Seitz, “A Theory of Shape by Space Carving”, Proc. ICCV, 1999, pp. 307-314.
Bibliography
  Related References
    Bolles, Baker, and Marimont, “Epipolar-Plane Image Analysis: An Approach to Determining Structure from Motion”, International Journal of Computer Vision, 1(1), 1987, pp. 7-55.
    Faugeras & Keriven, “Variational principles, surface evolution, PDE's, level set methods and the stereo problem”, IEEE Trans. on Image Processing, 7(3), 1998, pp. 336-344.
    Szeliski & Golland, “Stereo Matching with Transparency and Matting”, Proc. Int. Conf. on Computer Vision (ICCV), 1998, pp. 517-524.
    Roy & Cox, “A Maximum-Flow Formulation of the N-camera Stereo Correspondence Problem”, Proc. ICCV, 1998, pp. 492-499.
    Fua & Leclerc, “Object-centered surface reconstruction: Combining multi-image stereo and shading”, International Journal of Computer Vision, 16, 1995, pp. 35-56.
    Narayanan, Rander, & Kanade, “Constructing Virtual Worlds Using Dense Stereo”, Proc. ICCV, 1998, pp. 3-10.