|
1
|
- Computer Vision
CSE576, Spring 2005
Richard Szeliski
|
|
2
|
- Given two or more images of the same scene or object, compute a
representation of its shape
- What are some possible applications?
|
|
3
|
- From one stereo pair to a 3D head model
[Frédéric Devernay, INRIA]
|
|
4
|
- Takeo Kanade, CMU (Stereo Machine)
|
|
5
|
- [Takeo Kanade et al., CMU]
- collect video from 50+ streams
- reconstruct 3D model sequences
- steerable version used for
Super Bowl XXXV “EyeVision”
- http://www.cs.cmu.edu/afs/cs/project/VirtualizedR/www/VirtualizedR.html
|
|
6
|
- Given two images with correspondences, morph (warp and cross-dissolve)
between them [Chen & Williams, SIGGRAPH’93]
- Figure panels: input, depth image, novel view
[Matthies, Szeliski, Kanade ’88]
|
|
7
|
- Spline-based depth map [Szeliski & Kang ’95]
- Figure panels: input, depth image, novel view
|
|
8
|
|
|
9
|
|
|
10
|
- Morph between pair of images using epipolar geometry [Seitz & Dyer,
SIGGRAPH’96]
|
|
11
|
- Real-time people tracking (systems from Point Grey Research and SRI)
- “Gaze” correction for video conferencing [Ott,Lewis,Cox InterChi’93]
- Other ideas?
|
|
12
|
- Given two or more images of the same scene or object, compute a
representation of its shape
- What are some possible representations?
- depth maps
- volumetric models
- 3D surface models
- planar (or offset) layers
|
|
13
|
- What are some possible algorithms?
- match “features” and interpolate
- match edges and interpolate
- match all pixels with windows (coarse-fine)
- use optimization:
- iterative updating
- dynamic programming
- energy minimization (regularization, stochastic)
- graph algorithms
|
|
14
|
- Image rectification
- Matching criteria
- Local algorithms (aggregation)
- Optimization algorithms:
- energy (cost) formulation & Markov Random Fields
- mean-field, stochastic, and graph algorithms
- Multi-View stereo & occlusions
|
|
15
|
- Match features along epipolar lines
|
|
16
|
- For two images (or images with collinear camera centers), we can find
epipolar lines
- Epipolar lines are the projections of the pencil of planes passing
through the camera centers
- Rectification: warping the input images (perspective transformation)
so that epipolar lines are horizontal
|
|
17
|
- Project each image onto the same plane, which is parallel to the baseline (the line joining the two camera centers)
- Resample lines (and shear/stretch) to place lines in correspondence, and
minimize distortion
- [Zhang and Loop, MSR-TR-99-21]
|
|
18
|
|
|
19
|
|
|
20
|
- Raw pixel values (correlation)
- Band-pass filtered images [Jones & Malik 92]
- “Corner” like features [Zhang, …]
- Edges [many people…]
- Gradients [Seitz 89; Scharstein 94]
- Rank statistics [Zabih & Woodfill 94]
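As one illustration of these criteria, the rank statistic replaces each pixel by the number of window neighbors that are darker than it, so the result does not change when a constant is added to the image. The following is a minimal sketch, assuming a grayscale image stored as a list of rows; the name `rank_transform` and the `radius` parameter are illustrative and not taken from the slides.

```python
def rank_transform(image, radius=1):
    """Replace each pixel by the count of window neighbors darker than it.

    Neighbors that fall outside the image are skipped (an assumption
    made for this sketch).
    """
    height, width = len(image), len(image[0])
    ranks = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = image[y][x]
            count = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        if image[ny][nx] < center:
                            count += 1
            ranks[y][x] = count
    return ranks
```

Matching can then run on the rank images instead of the raw intensities.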
|
|
21
|
- apply feature matching criterion (e.g., correlation or Lucas-Kanade) at all
pixels simultaneously
- search only over epipolar lines (many fewer candidate positions)
|
|
22
|
- How do we determine correspondences?
- block matching or SSD (sum of squared differences),
where d is the disparity (horizontal motion)
- How big should the neighborhood be?
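A minimal sketch of SSD block matching, assuming rectified grayscale images stored as lists of rows. The sign convention (a pixel at x in the left image matches x - d in the right image), the half-width `w`, and the function names are assumptions for illustration.

```python
def ssd_cost(left, right, x, y, d, w):
    """Sum of squared differences over a (2w+1) x (2w+1) window.

    Compares left[y'][x'] with right[y'][x' - d]; this sign convention
    is an assumption. Samples that fall outside either image are skipped.
    """
    height, width = len(left), len(left[0])
    total = 0.0
    for yy in range(max(0, y - w), min(height, y + w + 1)):
        for xx in range(max(0, x - w), min(width, x + w + 1)):
            if 0 <= xx - d < width:
                diff = left[yy][xx] - right[yy][xx - d]
                total += diff * diff
    return total


def best_disparity(left, right, x, y, max_d, w):
    """Return the disparity in [0, max_d] with the lowest SSD cost."""
    costs = [ssd_cost(left, right, x, y, d, w) for d in range(max_d + 1)]
    return costs.index(min(costs))
```

Because out-of-image samples are skipped, costs near the image border sum fewer terms; a fuller implementation would normalize by the number of valid samples.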
|
|
23
|
- Smaller neighborhood: more details
- Larger neighborhood: fewer isolated mistakes
- Figure panels: w = 3, w = 20
|
|
24
|
- Compute certainty map from correlations
- Figure panels: input, depth map, certainty map
|
|
25
|
- Sweep family of planes through volume
|
|
26
|
- For each depth plane
- compute composite (mosaic) image — mean
- compute error image — variance
- convert to confidence and aggregate spatially
- Select winning depth at each pixel
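The per-pixel part of these steps can be sketched as follows. This assumes the images have already been warped onto each depth plane, so that each plane supplies one intensity sample per input image at the pixel in question. The spatial aggregation step is omitted, and the function name is hypothetical.

```python
def sweep_pixel(samples_per_depth):
    """Pick the depth plane whose warped samples agree best at one pixel.

    samples_per_depth[k] holds the intensities that the input images
    contribute to this pixel when warped onto depth plane k (the warping
    itself is assumed to have been done already).
    Returns (winning plane index, mean intensity on that plane).
    """
    best_index, best_mean, best_var = None, None, float("inf")
    for k, samples in enumerate(samples_per_depth):
        mean = sum(samples) / len(samples)            # composite image value
        var = sum((s - mean) ** 2 for s in samples) / len(samples)  # error value
        if var < best_var:
            best_index, best_mean, best_var = k, mean, var
    return best_index, best_mean
```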
|
|
27
|
- Re-order (pixel / disparity) evaluation loops
- original order: for every pixel, for every disparity, compute cost
- re-ordered: for every disparity, for every pixel, compute cost
|
|
28
|
- For every disparity, compute raw matching costs
- Why use a robust function?
- occlusions, other outliers
- Can also use alternative match criteria
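A minimal sketch of raw cost computation for one scanline at a single disparity. The truncated absolute difference used here is only one possible robust function, and the threshold value, the sign convention (x in the left row matches x - d in the right row), and the function name are assumptions.

```python
def raw_costs(left_row, right_row, d, truncation=20):
    """Per-pixel matching cost at disparity d for one scanline.

    Uses a truncated absolute difference, so that a single occluded or
    outlier pixel contributes at most `truncation` to later sums.
    Pixels whose match falls outside the row get the truncation value.
    """
    costs = []
    for x, value in enumerate(left_row):
        if 0 <= x - d < len(right_row):
            costs.append(min(abs(value - right_row[x - d]), truncation))
        else:
            costs.append(truncation)
    return costs
```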
|
|
29
|
- Aggregate costs spatially
- Here, we are using a box filter (efficient moving average implementation)
- Can also use weighted average, [non-linear] diffusion…
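A minimal sketch of the moving-average idea in one dimension, assuming the costs for one disparity along one scanline are given as a list. A running (prefix) sum makes the work per pixel independent of the window size. Clipping the window at the row ends is an assumption of this sketch.

```python
def box_filter_1d(costs, radius):
    """Moving average over a window of 2*radius+1 samples.

    A prefix sum gives each window total with one subtraction. Windows
    are clipped at the row ends and averaged over the valid samples.
    """
    n = len(costs)
    prefix = [0.0]
    for c in costs:
        prefix.append(prefix[-1] + c)
    out = []
    for x in range(n):
        lo, hi = max(0, x - radius), min(n, x + radius + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out
```

A 2-D box filter can be obtained by applying the same function along rows and then along columns.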
|
|
30
|
- Choose winning disparity at each pixel
- Interpolate to sub-pixel accuracy
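A minimal sketch of both steps for a single pixel. The sub-pixel estimate fits a parabola through the minimum cost and its two neighbors. This is a common choice, but the slides do not say which interpolation is meant, so treat it as an assumption.

```python
def winner_take_all(costs):
    """Return (integer disparity, sub-pixel disparity) for one pixel.

    costs[d] is the aggregated cost at integer disparity d. The sub-pixel
    value is the vertex of the parabola through the minimum and its two
    neighbors; at the ends of the range the integer value is returned.
    """
    d = min(range(len(costs)), key=costs.__getitem__)
    if d == 0 or d == len(costs) - 1:
        return d, float(d)          # no neighbor on one side
    c0, c1, c2 = costs[d - 1], costs[d], costs[d + 1]
    denom = c0 - 2 * c1 + c2
    if denom == 0:
        return d, float(d)          # flat cost curve, nothing to refine
    return d, d + 0.5 * (c0 - c2) / denom
```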
|
|
31
|
- Advantages:
- gives detailed surface estimates
- fast algorithms based on moving averages
- sub-pixel disparity estimates and confidence
- Limitations:
- narrow baseline ⇒ noisy estimates
- fails in textureless areas
- gets confused near occlusion boundaries
|
|
32
|
- Match “corner” (interest) points
- Interpolate complete solution
|
|
33
|
- Given a sparse set of 3D points, how do we interpolate to a full 3D
surface?
- Scattered data interpolation [Nielson93]
- triangulate
- put onto a grid and fill (use pyramid?)
- place a kernel function over each data point
- minimize an energy function
|
|
34
|
- 1-D example: approximating splines
|
|
35
|
- Iteratively improve a solution by locally minimizing the energy: relax
to solution
- Earliest application: WWII numerical simulations
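A minimal sketch of relaxation on a 1-D interpolation problem, assuming a simple membrane energy: squared error at the measured samples plus a weighted sum of squared first differences. The energy form, the `smoothness` weight, and the function name are assumptions; the input is assumed to contain at least two samples.

```python
def relax_1d(data, smoothness=1.0, sweeps=500):
    """Gauss-Seidel relaxation of a 1-D membrane energy.

    data[i] is a measured value, or None where no measurement exists.
    Assumed energy:
        sum over measured i of (f[i] - data[i])**2
        + smoothness * sum over i of (f[i+1] - f[i])**2
    Each update sets f[i] to the value that minimizes the energy with
    its neighbors held fixed.
    """
    n = len(data)
    f = [0.0 if v is None else float(v) for v in data]
    for _ in range(sweeps):
        for i in range(n):
            neighbors = [f[j] for j in (i - 1, i + 1) if 0 <= j < n]
            weight = 0.0 if data[i] is None else 1.0
            target = 0.0 if data[i] is None else data[i]
            f[i] = (weight * target + smoothness * sum(neighbors)) / (
                weight + smoothness * len(neighbors)
            )
    return f
```

With measurements only at the two ends, the result is a straight line that approximates, rather than interpolates, the end values.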
|
|
36
|
- How can we get the best solution?
- Differentiate energy function, set to 0
|
|
37
|
- Evaluate best cumulative cost at each pixel
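A minimal sketch of scanline dynamic programming, assuming a table of matching costs per pixel and disparity and a linear smoothness penalty on disparity changes. The penalty form and the function name are assumptions, and explicit occlusion handling is left out.

```python
def scanline_dp(cost, penalty=1.0):
    """Minimum-cost disparity path along one scanline.

    cost[x][d] is the matching cost of disparity d at pixel x. The
    smoothness term penalty * |d - d_prev| is an assumed, simple choice.
    """
    n, levels = len(cost), len(cost[0])
    best = [list(cost[0])]      # best cumulative cost at each (x, d)
    back = [[0] * levels]       # predecessor disparity for backtracking
    for x in range(1, n):
        row, ptr = [], []
        for d in range(levels):
            prev = min(range(levels),
                       key=lambda p: best[x - 1][p] + penalty * abs(d - p))
            row.append(cost[x][d] + best[x - 1][prev] + penalty * abs(d - prev))
            ptr.append(prev)
        best.append(row)
        back.append(ptr)
    # Backtrack from the cheapest final state.
    d = min(range(levels), key=lambda p: best[-1][p])
    path = [d]
    for x in range(n - 1, 0, -1):
        d = back[x][d]
        path.append(d)
    return path[::-1]
```

Each scanline is solved independently of its neighbors, which is one source of horizontal streaks in results from this kind of method.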
|
|
38
|
|
|
39
|
- Disparity space image and min. cost path
|
|
40
|
- Sample result (note horizontal streaks)
- [Intille & Bobick]
|
|
41
|
- Can we apply this trick in 2D as well?
|
|
42
|
- Solution technique for general 2D problem
|
|
43
|
- α-β swap
- expansion
- modify smoothness penalty based on edges
- compute best possible match within integer disparity
|
|
44
|
- Two different kinds of moves:
|
|
45
|
- Formulate as statistical inference problem
- Prior model $p_P(d)$
- Measurement model $p_M(I_L, I_R \mid d)$
- Posterior model
- $p_M(d \mid I_L, I_R) \propto p_P(d)\, p_M(I_L, I_R \mid d)$
- Maximum a Posteriori (MAP) estimate:
- maximize $p_M(d \mid I_L, I_R)$
|
|
46
|
- Probability distribution on disparity field d(x,y)
- Enforces smoothness or coherence on field
|
|
47
|
- Likelihood of intensity correspondence
- Corresponds to Gaussian noise for quadratic $\rho$
|
|
48
|
- Maximize posterior likelihood
- Equivalent to regularization (energy minimization with smoothness
constraints)
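The equivalence can be made explicit by taking the negative logarithm of the product of prior and measurement model. The sketch below assumes Gibbs-form distributions, a first-order smoothness prior with weight $\lambda$, a penalty function $\rho$, and the convention that pixel $(x, y)$ in $I_L$ matches $(x - d, y)$ in $I_R$. None of these specifics are stated on the slides.

```latex
\begin{aligned}
p_M(I_L, I_R \mid d) &\propto \exp\Bigl(-\sum_{x,y} \rho\bigl(I_L(x,y) - I_R(x - d(x,y),\, y)\bigr)\Bigr) \\
p_P(d) &\propto \exp\Bigl(-\lambda \sum_{x,y} \bigl[(d(x+1,y) - d(x,y))^2 + (d(x,y+1) - d(x,y))^2\bigr]\Bigr) \\
E(d) &= -\log\bigl[p_P(d)\, p_M(I_L, I_R \mid d)\bigr] + \text{const} \\
     &= \sum_{x,y} \rho\bigl(I_L(x,y) - I_R(x - d(x,y),\, y)\bigr)
      + \lambda \sum_{x,y} \bigl[(d(x+1,y) - d(x,y))^2 + (d(x,y+1) - d(x,y))^2\bigr]
\end{aligned}
```

Maximizing the posterior is then the same as minimizing $E(d)$: the first sum is the data term and the second is the smoothness (regularization) term.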
|
|
49
|
- Principled way of determining cost function
- Explicit model of noise and prior knowledge
- Admits a wider variety of optimization algorithms:
- gradient descent (local minimization)
- stochastic optimization (Gibbs Sampler)
- mean-field optimization
- graph theoretic (actually deterministic) [Zabih]
- [loopy] belief propagation
- large stochastic flips [Swendsen-Wang]
|
|
50
|
- Figure panels: input image, sum of absolute differences, mean field, graph cuts
|
|
51
|
- Advantages:
- works very well in non-occluded regions
- Disadvantages:
- restricted to two images (not)
- gets confused in occluded regions
- can’t handle mixed pixels
|
|
52
|
- …rest of this material not covered in this lecture…
|
|
53
|
- Steps
- Calibrate cameras
- Rectify images
- Compute disparity
- Estimate depth
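For the last step, a rectified pair with focal length f (in pixels) and baseline B gives depth Z = f B / d from disparity d. A minimal sketch follows; the function and parameter names are illustrative.

```python
def depth_from_disparity(disparity, focal_length_px, baseline):
    """Depth along the optical axis for a rectified pair: Z = f * B / d.

    disparity and focal_length_px are in pixels; the result has the
    units of baseline. Zero disparity corresponds to a point at infinity.
    """
    if disparity <= 0:
        return float("inf")
    return focal_length_px * baseline / disparity
```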
|
|
54
|
- What’s the optimal baseline?
- Too small: large depth error
- Too large: difficult search problem
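The trade-off follows from the rectified-stereo relation between depth $Z$, focal length $f$, baseline $B$, and disparity $d$. A first-order sketch, where $\delta d$ denotes the disparity measurement error (a symbol introduced here for illustration):

```latex
Z = \frac{f B}{d}
\quad\Longrightarrow\quad
\delta Z \approx \left|\frac{\partial Z}{\partial d}\right| \delta d
= \frac{f B}{d^{2}}\,\delta d
= \frac{Z^{2}}{f B}\,\delta d
```

For a fixed disparity error, the depth error shrinks as $B$ grows, while the disparity range $d = fB/Z$ that must be searched grows in proportion to $B$.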
|
|
55
|
|
|
56
|
|
|
57
|
- Basic Approach
- Choose a reference view
- Use your favorite stereo algorithm BUT
- replace two-view SSD with SSD over all baselines
- Limitations
- Must choose a reference view
- Visibility: select which frames to match [Kang, Szeliski, Chai, CVPR’01]
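A minimal 1-D sketch of summing SSD over all baselines, assuming rectified scanlines and integer baselines chosen so that inverse-depth step k produces a disparity of k times the baseline in each view. The function name, the parameterization, and the sign convention are assumptions for illustration.

```python
def sssd_costs(reference, others, baselines, x, max_step, w=1):
    """Sum of SSDs over all baselines for one pixel of a scanline.

    others[i] is a rectified scanline whose baseline relative to the
    reference is baselines[i]; inverse-depth step k is assumed to give
    a disparity of k * baselines[i] pixels in that view.
    Returns one summed cost per step k in [0, max_step].
    """
    costs = []
    for k in range(max_step + 1):
        total = 0.0
        for row, b in zip(others, baselines):
            d = k * b
            for xx in range(x - w, x + w + 1):
                if 0 <= xx < len(reference) and 0 <= xx - d < len(row):
                    diff = reference[xx] - row[xx - d]
                    total += diff * diff
        costs.append(total)
    return costs
```

All views vote for the same inverse-depth step, so a match that is ambiguous along one baseline can be resolved by the others.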
|
|
58
|
- http://www.graphics.lcs.mit.edu/~aisaksen/projects/drlf/epi/
|
|
59
|
|
|
60
|
|
|
61
|
|
|
62
|
|
|
63
|
- Reconstruction Contains the True Scene
- But is generally not the same
- In the limit get visual hull
|
|
64
|
- Color voxel black if in silhouette in every image
- $O(MN^3)$, for M images, $N^3$ voxels
- Don’t have to search $2^{N^3}$ possible scenes!
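A minimal sketch of volume intersection, assuming each view is supplied as a projection function plus a set of silhouette pixels. That interface, and the function name `carve`, are hypothetical. Each voxel is tested once against each image, which gives the linear-in-images, linear-in-voxels cost.

```python
def carve(n, views):
    """Keep the voxels of an n x n x n grid that lie inside every silhouette.

    views is a list of (project, silhouette) pairs, where project maps a
    voxel (x, y, z) to a pixel (u, v) and silhouette is a set of pixels.
    Each voxel is tested once per image, giving O(M * n**3) work.
    """
    kept = set()
    for x in range(n):
        for y in range(n):
            for z in range(n):
                if all(project((x, y, z)) in silhouette
                       for project, silhouette in views):
                    kept.add((x, y, z))
    return kept
```

For example, two orthographic views along the z and x axes could be passed as `(lambda v: (v[0], v[1]), sil_z)` and `(lambda v: (v[1], v[2]), sil_x)`.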
|
|
65
|
- Pros
- Easy to implement, fast
- Accelerated via octrees [Szeliski 1993]
- Cons
- No concavities
- Reconstruction is not photo-consistent
- Requires identification of silhouettes
|
|
66
|
|
|
67
|
|
|
68
|
|
|
69
|
|
|
70
|
- Calibrated Turntable
- 360° rotation (21 images)
|
|
71
|
|
|
72
|
|
|
73
|
|
|
74
|
- The Basic Algorithm is Unwieldy
- Alternative: Multi-Pass Plane Sweep
- Efficient, can use texture-mapping hardware
- Converges quickly in practice
- Easy to implement
|
|
75
|
- Sweep plane in each of 6 principal directions
- Consider cameras on only one side of plane
- Repeat until convergence
|
|
76
|
|
|
77
|
|
|
78
|
|
|
79
|
- Applications
- Image rectification
- Matching criteria
- Local algorithms (aggregation & diffusion)
- Optimization algorithms
- energy (cost) formulation & Markov Random Fields
- mean-field; dynamic programming; stochastic; graph algorithms
- Multi-View stereo
- visibility, occlusion-ordered sweeps
|
|
80
|
- D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense
two-frame stereo correspondence algorithms. International Journal of
Computer Vision, 47(1):7-42, May 2002.
- R. Szeliski. Stereo algorithms and representations for image-based
rendering. In British Machine Vision Conference (BMVC'99), volume 2,
pages 314-328, Nottingham, England, September 1999.
- G. M. Nielson, Scattered Data Modeling, IEEE Computer Graphics and
Applications, 13(1), January 1993, pp. 60-70.
- S. B. Kang, R. Szeliski, and J. Chai. Handling occlusions in dense
multi-view stereo. In CVPR'2001, vol. I, pages 103-110, December 2001.
- Y. Boykov, O. Veksler, and Ramin Zabih, Fast Approximate Energy
Minimization via Graph Cuts, Unpublished manuscript, 2000.
- A.F. Bobick and S.S. Intille. Large occlusion stereo. International
Journal of Computer Vision, 33(3):181-200, September 1999.
- D. Scharstein and R. Szeliski. Stereo matching with nonlinear diffusion.
International Journal of Computer Vision, 28(2):155-174, July 1998.
|
|
81
|
- Volume Intersection
- Martin & Aggarwal, “Volumetric description of objects from multiple
views”, Trans. Pattern Analysis and Machine Intelligence, 5(2), 1983, pp. 150-158.
- Szeliski, “Rapid Octree Construction from Image Sequences”, Computer
Vision, Graphics, and Image Processing: Image Understanding, 58(1),
1993, pp. 23-32.
- Voxel Coloring and Space Carving
- Seitz & Dyer, “Photorealistic Scene Reconstruction by Voxel
Coloring”, Proc. Computer Vision and Pattern Recognition (CVPR), 1997,
pp. 1067-1073.
- Seitz & Kutulakos, “Plenoptic Image Editing”, Proc. Int. Conf. on Computer Vision
(ICCV), 1998, pp. 17-24.
- Kutulakos & Seitz, “A Theory of Shape by Space Carving”, Proc. ICCV, 1998, pp. 307-314.
|
|
82
|
- Related References
- Bolles, Baker, and Marimont, “Epipolar-Plane Image Analysis: An
Approach to Determining Structure from Motion”, International Journal
of Computer Vision, vol 1, no 1, 1987, pp. 7-55.
- Faugeras & Keriven, “Variational principles, surface evolution,
PDE's, level set methods and the stereo problem”, IEEE Trans. on
Image Processing, 7(3), 1998, pp. 336-344.
- Szeliski & Golland, “Stereo Matching with Transparency and
Matting”, Proc. Int. Conf. on Computer Vision (ICCV), 1998, pp. 517-524.
- Roy & Cox, “A Maximum-Flow Formulation of the N-camera Stereo
Correspondence Problem”, Proc. ICCV, 1998, pp. 492-499.
- Fua & Leclerc, “Object-centered surface reconstruction: Combining multi-image stereo and
shading”, International Journal of Computer Vision, 16, 1995, pp.
35-56.
- Narayanan, Rander, & Kanade, “Constructing Virtual Worlds Using
Dense Stereo”, Proc. ICCV, 1998, pp. 3-10.
|