The SFM Problem | ||
Reconstruct scene geometry and camera motion from two or more images |
Step 1: Track Features | |||
Detect good features | |||
corners, line segments | |||
Find correspondences between frames | |||
Lucas & Kanade-style motion estimation | |||
window-based correlation |
Step 2: Estimate Motion and Structure | ||
Simplified projection model, e.g., [Tomasi 92] | ||
2 or 3 views at a time [Hartley 00] |
Step 3: Refine Estimates | ||
“Bundle adjustment” in photogrammetry |
Step 4: Recover Surfaces | ||
Image-based triangulation [Morris 00, Baillard 99] | ||
Silhouettes [Fitzgibbon 98] | ||
Stereo [Pollefeys 99] |
Problem | ||
Find correspondence between n features in f images | ||
Issues | ||
What’s a feature? | ||
What does it mean to “correspond”? | ||
How can correspondence be reliably computed? |
What’s a good feature? |
Recall Lucas-Kanade equation: |
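The equation itself did not survive on this slide. In its standard translational form, Lucas-Kanade solves a 2×2 linear system whose matrix is built from image-gradient products summed over the feature window, and Shi & Tomasi (“Good Features to Track”, in the references) rate a feature by the smaller eigenvalue of that matrix. Below is a minimal sketch of that score, assuming NumPy. The function name `min_eigenvalue_score` and the synthetic 9×9 patches are illustrative and not from the slides.

```python
import numpy as np

def min_eigenvalue_score(patch):
    """Smaller eigenvalue of the 2x2 gradient matrix summed over a patch."""
    patch = np.asarray(patch, dtype=float)
    Iy, Ix = np.gradient(patch)              # derivatives along rows (y) and columns (x)
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    return np.linalg.eigvalsh(A)[0]          # eigvalsh returns eigenvalues in ascending order

# Synthetic patches (assumed test data, for illustration only)
flat = np.ones((9, 9))                            # no texture
edge = np.tile(np.arange(9.0), (9, 1))            # intensity varies in x only
yy, xx = np.mgrid[0:9, 0:9]
corner = ((xx > 4) & (yy > 4)).astype(float)      # bright quadrant: gradients in x and y

scores = {name: min_eigenvalue_score(p)
          for name, p in [("flat", flat), ("edge", edge), ("corner", corner)]}
```

The flat patch and the pure edge both give a (near-)zero smaller eigenvalue, so the 2×2 system is ill-conditioned there. Only the corner constrains motion in both directions.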
Correspondence Problem | ||
Given feature patch F in frame H, find best match in frame I |
Feature may change shape over time | ||
Need a distortion model (e.g., affine) to really make this work |
So far we’ve only considered two frames | ||
Basic extension to f frames | ||
1. Select features in first frame | ||
2. Given feature in frame i, compute position/deformation in frame i+1 | ||
3. Select more features if needed | ||
4. i = i + 1 | ||
5. If i < f, go to step 2 |
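The loop above can be sketched for a single feature with a translation-only Lucas-Kanade step. This is an assumption-laden toy rather than the tracker from the slides: the frames are synthetic Gaussian blobs, the per-frame motion is sub-pixel, there is no deformation model, and the "select more features" step is omitted. The names `make_frame` and `lk_step` are hypothetical.

```python
import numpy as np

def make_frame(cx, cy, size=41, sigma=5.0):
    """Synthetic frame: one smooth blob centred at (cx, cy)."""
    yy, xx = np.mgrid[0:size, 0:size].astype(float)
    return np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * sigma ** 2))

def lk_step(cur, nxt, pos, radius=7):
    """One translational Lucas-Kanade update for a window centred near pos = (x, y)."""
    x0, y0 = int(round(pos[0])), int(round(pos[1]))
    win = (slice(y0 - radius, y0 + radius + 1), slice(x0 - radius, x0 + radius + 1))
    Iy, Ix = np.gradient(cur)                # spatial derivatives of the current frame
    It = nxt - cur                           # temporal derivative
    gx, gy, gt = Ix[win].ravel(), Iy[win].ravel(), It[win].ravel()
    A = np.array([[gx @ gx, gx @ gy], [gx @ gy, gy @ gy]])
    b = -np.array([gx @ gt, gy @ gt])
    return np.asarray(pos, dtype=float) + np.linalg.solve(A, b)

# Blob moves (0.4, 0.2) pixels per frame over 6 frames (assumed motion)
true_path = [(18.0 + 0.4 * i, 20.0 + 0.2 * i) for i in range(6)]
frames = [make_frame(cx, cy) for cx, cy in true_path]

track = [true_path[0]]                       # step 1: feature selected in the first frame
for i in range(len(frames) - 1):             # steps 2, 4, 5: track frame i -> i+1, advance i
    track.append(tuple(lk_step(frames[i], frames[i + 1], track[-1])))
```

Because each position is estimated from the previous estimate, errors accumulate over frames. Real trackers typically iterate the update, use image pyramids, and re-detect features when tracks are lost.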
Idea | |||
Can get better performance if we know something about the way points move | |||
Most approaches assume constant velocity | |||
or constant acceleration | |||
Use above to predict position in next frame, initialize search |
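As a small illustration of the prediction idea, the sketch below extrapolates the next position from the most recent track points under either assumption. The function name `predict_next` and its `model` parameter are hypothetical. The result would only be used to initialize the correspondence search, not as the final position.

```python
import numpy as np

def predict_next(track, model="velocity"):
    """Extrapolate the next feature position from past positions.

    track: sequence of (x, y) positions, oldest first (at least 2 entries).
    """
    p = np.asarray(track, dtype=float)
    if model == "velocity" or len(p) < 3:
        # constant velocity: repeat the last displacement
        return p[-1] + (p[-1] - p[-2])
    # constant acceleration: repeat the last displacement plus its change
    return 3 * p[-1] - 3 * p[-2] + p[-3]

linear = predict_next([(0, 0), (1, 2), (2, 4)])                          # steady motion
curved = predict_next([(0, 0), (1, 1), (4, 4)], model="acceleration")    # positions follow t^2
```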
Kalman Filtering (http://www.cs.unc.edu/~welch/kalman/ ) | ||
Updates feature state and Gaussian uncertainty model | ||
Get better prediction, confidence estimate | ||
CONDENSATION (http://www.dai.ed.ac.uk/CVonline/LOCAL_COPIES/ISARD1/condensation.html ) | ||
Also known as “particle filtering” | ||
Updates probability distribution over all possible states | ||
Can cope with multiple hypotheses |
Treat tracking problem as a Markov process | |||
Estimate p(x_t | z_t, x_{t-1}) | |||
probability of being in state x_t given measurement z_t and previous state x_{t-1} | |||
Combine Markov assumption with Bayes Rule | |||
Kalman filtering: assume p(x) is a Gaussian
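Under the Gaussian assumption, the predict/update cycle has a closed form. The following is a minimal sketch for one feature coordinate with a constant-velocity state [position, velocity]. The matrices `F`, `H`, `Q`, `R`, the noise levels, and the simulated measurements are all assumed values chosen for illustration.

```python
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # constant-velocity motion model (dt = 1 frame)
H = np.array([[1.0, 0.0]])               # the sensor measures position only
Q = 1e-4 * np.eye(2)                     # assumed process noise covariance
R = np.array([[0.09]])                   # assumed measurement noise covariance (std 0.3)

def kalman_step(x, P, z):
    """One predict + update cycle; returns the new state mean and covariance."""
    x_pred = F @ x                               # predict state
    P_pred = F @ P @ F.T + Q                     # predict uncertainty
    S = H @ P_pred @ H.T + R                     # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)          # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)        # correct with the measurement
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

rng = np.random.default_rng(0)
x, P = np.zeros(2), 1000.0 * np.eye(2)           # vague initial belief
for t in range(1, 21):
    z = np.array([2.0 * t + rng.normal(0.0, 0.3)])   # feature moving 2 px/frame, noisy
    x, P = kalman_step(x, P, z)
```

After the loop, `x` holds the estimated position and velocity and `P` the Gaussian uncertainty. `F @ x` gives the predicted position for the next frame, and `P` indicates how wide the search window should be.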
Key | ||
s = x (position) | ||
o = z (sensor) |
Modeling probabilities with samples
Allocate samples according to probability | ||
Higher probability—more samples |
Prediction: | ||
draw new samples from the PDF | ||
use the motion model to move the samples |
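A sample-based version of the same cycle can be sketched in one dimension. This is a generic bootstrap particle filter under assumed Gaussian motion and measurement noise, not the CONDENSATION implementation itself. The function names and all numeric settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(particles, weights, velocity=1.0, motion_std=0.5):
    """Draw new samples from the weighted PDF, then move them with the motion model."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx] + velocity + rng.normal(0.0, motion_std, len(particles))

def measure(particles, z, meas_std=1.0):
    """Re-weight each sample by an assumed Gaussian likelihood p(z | x)."""
    w = np.exp(-0.5 * ((z - particles) / meas_std) ** 2)
    return w / w.sum()

n = 2000
particles = rng.uniform(0.0, 100.0, n)       # initially the state could be anywhere
weights = np.full(n, 1.0 / n)
true_x = 20.0
for _ in range(10):
    true_x += 1.0                            # the target moves 1 unit per step
    z = true_x + rng.normal(0.0, 1.0)        # noisy sensor reading
    particles = predict(particles, weights)
    weights = measure(particles, z)

estimate = float(np.sum(weights * particles))   # posterior mean
```

Because the distribution is represented by samples rather than a single Gaussian, several clusters of particles can survive at once, which is how multiple hypotheses are carried.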
Monte Carlo robot localization
Particle Filters [Fox, Dellaert, Thrun and collaborators] |
Training a tracker |
Red: smooth drawing | |
Green: scribble | |
Blue: pause |
The SFM Problem | |||
Reconstruct scene geometry and camera positions from two or more images | |||
Assume | |||
Pixel correspondence | |||
via tracking | |||
Projection model | |||
classic methods are orthographic | |||
newer methods use perspective | |||
practically any model is possible with bundle adjustment | |||
SFM under orthographic projection
Trick | ||
Choose scene origin to be centroid of 3D points | ||
Choose image origins to be centroid of 2D points | ||
Allows us to drop the camera translation: |
Shape by factorization [Tomasi & Kanade, 92]
Singular value decomposition (SVD)
SVD decomposes any m×n matrix A as A = UΣVᵀ | |||
Properties | |||
Σ is a diagonal matrix containing the square roots of the eigenvalues of AᵀA | |||
known as “singular values” of A | |||
diagonal entries are sorted from largest to smallest | |||
columns of U are eigenvectors of AAᵀ | |||
columns of V are eigenvectors of AᵀA | |||
If A is rank-deficient (e.g., has rank 3) | |||
only first 3 singular values are nonzero | |||
we can throw away all but first 3 columns of U and V | |||
Choose M’ = U’, S’ = Σ’V’ᵀ |
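The centroid trick and the rank-3 truncation can be sketched with NumPy on synthetic data. The scene, cameras, and variable names below are assumptions for illustration. Rows of `W` are interleaved per frame (x row, then y row), which differs from some presentations but does not affect the rank argument.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_frames = 20, 6
S_true = rng.normal(size=(3, n_points))                # assumed 3D points

rows = []
for _ in range(n_frames):
    Rot, _ = np.linalg.qr(rng.normal(size=(3, 3)))     # random orthonormal 3x3 matrix
    t = rng.normal(size=(2, 1))                        # image-plane translation
    rows.append(Rot[:2] @ S_true + t)                  # orthographic projection
W = np.vstack(rows)                                    # 2f x n measurement matrix

# Centroid trick: subtracting each row's mean removes the camera translation.
W_centered = W - W.mean(axis=1, keepdims=True)

U, sigma, Vt = np.linalg.svd(W_centered, full_matrices=False)
M_hat = U[:, :3]                                       # M' = U'
S_hat = np.diag(sigma[:3]) @ Vt[:3]                    # S' = Sigma' V'^T

# Squared singular values equal the largest eigenvalues of W^T W.
eig = np.sort(np.linalg.eigvalsh(W_centered.T @ W_centered))[::-1]
```

With noise-free data the fourth singular value is numerically zero and `M_hat @ S_hat` reproduces `W_centered`. The factors are still only defined up to an invertible 3×3 matrix, which is what the metric constraints resolve.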
Shape by factorization [Tomasi & Kanade, 92]
Orthographic Camera | ||
Rows of P are orthonormal: PPᵀ = I | ||
Weak Perspective Camera | ||
Rows of P are orthogonal and of equal length: PPᵀ = s²I for a scale factor s | ||
Enforcing “Metric” Constraints | ||
Compute a 3×3 matrix A such that rows of M = M’A have these properties (then S = A⁻¹S’) | ||
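For the orthographic case, one common way to find A is to solve linearly for the symmetric matrix Q = AAᵀ and then take a Cholesky factor. The sketch below assumes that approach and noise-free synthetic data. The names `sym_row` and `metric_upgrade` and the mixing matrix `B` are hypothetical, and the weak perspective case (unknown per-frame scale) is not handled.

```python
import numpy as np

def sym_row(a, b):
    """Coefficients of a^T Q b in the 6 unknowns (Q00, Q01, Q02, Q11, Q12, Q22)."""
    return np.array([a[0] * b[0], a[0] * b[1] + a[1] * b[0], a[0] * b[2] + a[2] * b[0],
                     a[1] * b[1], a[1] * b[2] + a[2] * b[1], a[2] * b[2]])

def metric_upgrade(M_hat):
    """Find A so that each frame's two rows of M_hat @ A are orthonormal."""
    eqs, rhs = [], []
    for i in range(0, M_hat.shape[0], 2):
        a, b = M_hat[i], M_hat[i + 1]
        eqs += [sym_row(a, a), sym_row(b, b), sym_row(a, b)]
        rhs += [1.0, 1.0, 0.0]               # unit length, unit length, orthogonal
    q, *_ = np.linalg.lstsq(np.array(eqs), np.array(rhs), rcond=None)
    Q = np.array([[q[0], q[1], q[2]],
                  [q[1], q[3], q[4]],
                  [q[2], q[4], q[5]]])
    return np.linalg.cholesky(Q)             # A with A A^T = Q; requires Q positive definite

# Synthetic check: orthonormal row pairs, distorted by an assumed invertible B
rng = np.random.default_rng(0)
M_true = np.vstack([np.linalg.qr(rng.normal(size=(3, 3)))[0][:2] for _ in range(6)])
B = np.array([[2.0, 0.3, 0.0], [0.1, 1.0, 0.4], [0.0, 0.2, 1.5]])
M_hat = M_true @ np.linalg.inv(B)

A = metric_upgrade(M_hat)
M = M_hat @ A
```

The recovered `M` matches `M_true` only up to a global rotation, since any rotation of A leaves AAᵀ unchanged. With noisy data Q may fail to be positive definite, in which case the Cholesky step raises an error and a different strategy is needed.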
Independently Moving Objects | |
Perspective Projection | |
Outlier Rejection | |
Subspace Constraints | |
SFM Without Correspondence |
Extending factorization to perspective
Several Recent Approaches | |||
[Christy 96]; [Triggs 96]; [Han 00]; [Mahamud 01] | |||
Initialize with ortho/weak perspective model then iterate | |||
Christy & Horaud | |||
Derive expression for weak perspective as a perspective projection plus a correction term: | |||
Basic procedure: | |||
Run Tomasi-Kanade with weak perspective | |||
Solve for ei (different for each row of M) | |||
Add correction term to W, solve again (until convergence) |
3D → 2D mapping | |||
a function of intrinsics K, extrinsics R & t | |||
measurement affected by noise | |||
Log likelihood of K, R, t given {(u_i, v_i)} | |||
Negative log likelihood minimized via nonlinear least squares regression | |||
called “Bundle Adjustment” | |||
e.g., Levenberg-Marquardt | |||
described in Press et al., Numerical Recipes |
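With Gaussian measurement noise, maximizing the likelihood amounts to minimizing the sum of squared reprojection errors. The sketch below is a deliberately tiny instance, not a production bundle adjuster: two cameras with identity rotation and a known focal length (both assumptions), camera 1 fixed at the origin, and the x-baseline of camera 2 fixed to 1 to remove the scale ambiguity. It uses a hand-written Levenberg-Marquardt loop with a finite-difference Jacobian, and all names and values are illustrative.

```python
import numpy as np

def project(points, t, focal=500.0):
    """Pinhole projection with R = I: (u, v) = focal * (X + t)[:2] / (X + t)[2]."""
    cam = points + t
    return focal * cam[:, :2] / cam[:, 2:3]

def residuals(params, obs1, obs2):
    """Reprojection errors; params = [t2_y, t2_z, X1, Y1, Z1, X2, ...]."""
    t2 = np.array([1.0, params[0], params[1]])      # x-baseline fixed to set the scale
    pts = params[2:].reshape(-1, 3)
    r1 = project(pts, np.zeros(3)) - obs1
    r2 = project(pts, t2) - obs2
    return np.concatenate([r1.ravel(), r2.ravel()])

def numeric_jacobian(fun, p, eps=1e-6):
    r0 = fun(p)
    J = np.empty((r0.size, p.size))
    for k in range(p.size):
        dp = np.zeros_like(p)
        dp[k] = eps
        J[:, k] = (fun(p + dp) - r0) / eps          # forward difference
    return J

def levenberg_marquardt(fun, p, iters=50, lam=1e-3):
    cost = np.sum(fun(p) ** 2)
    for _ in range(iters):
        r, J = fun(p), numeric_jacobian(fun, p)
        H = J.T @ J
        step = np.linalg.solve(H + lam * np.diag(np.diag(H)), -J.T @ r)
        new_cost = np.sum(fun(p + step) ** 2)
        if new_cost < cost:                         # accept step, trust the model more
            p, cost, lam = p + step, new_cost, lam * 0.5
        else:                                       # reject step, increase damping
            lam *= 10.0
    return p, cost

rng = np.random.default_rng(0)
pts_true = np.column_stack([rng.uniform(-1, 1, 8), rng.uniform(-1, 1, 8),
                            rng.uniform(4, 8, 8)])
t2_true = np.array([1.0, 0.1, -0.2])
obs1 = project(pts_true, np.zeros(3)) + rng.normal(0, 0.1, (8, 2))   # noisy pixels
obs2 = project(pts_true, t2_true) + rng.normal(0, 0.1, (8, 2))

p_true = np.concatenate([t2_true[1:], pts_true.ravel()])
p0 = p_true + rng.normal(0, 0.05, p_true.size)      # perturbed initial estimate
fun = lambda p: residuals(p, obs1, obs2)
initial_cost = np.sum(fun(p0) ** 2)
p_opt, final_cost = levenberg_marquardt(fun, p0)
```

Real systems also optimize rotations and intrinsics and exploit the sparse structure of the Jacobian. The dense finite-difference version here only scales to toy problems.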
Film industry is a heavy consumer | |||
composite live footage with 3D graphics | |||
known as “match move” | |||
Commercial products | |||
2D3 | |||
http://www.2d3.com/ | |||
RealViz | |||
http://www.realviz.com/ | |||
Show video | |||
Problem | ||
requires good tracked features as input | ||
Can we use SFM to help track points? | ||
basic idea: recall form of Lucas-Kanade equation: | ||
with n points in f frames, we can stack into a big matrix |
C. Baillard & A. Zisserman, “Automatic Reconstruction of Planar Models from Multiple Views”, Proc. Computer Vision and Pattern Recognition Conf. (CVPR 99) 1999, pp. 559-565. | ||
S. Christy & R. Horaud, “Euclidean shape and motion from multiple perspective views by affine iterations”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(10):1098-1104, November 1996 (ftp://ftp.inrialpes.fr/pub/movi/publications/rec-affiter-long.ps.gz ) | ||
A.W. Fitzgibbon, G. Cross, & A. Zisserman, “Automatic 3D Model Construction for Turn-Table Sequences”, SMILE Workshop, 1998. | ||
M. Han & T. Kanade, “Creating 3D Models with Uncalibrated Cameras”, Proc. IEEE Computer Society Workshop on the Application of Computer Vision (WACV2000), 2000. | ||
R. Hartley & A. Zisserman, “Multiple View Geometry in Computer Vision”, Cambridge Univ. Press, 2000. | ||
R. Hartley, “Euclidean Reconstruction from Uncalibrated Views”, In Applications of Invariance in Computer Vision, Springer-Verlag, 1994, pp. 237-256. | ||
M. Isard and A. Blake, “CONDENSATION -- conditional density propagation for visual tracking”, Int. Journal of Computer Vision, 29(1), 1998, pp. 5-28. (ftp://ftp.robots.ox.ac.uk/pub/ox.papers/VisualDynamics/ijcv98.ps.gz ) | ||
S. Mahamud, M. Hebert, Y. Omori and J. Ponce, “Provably-Convergent Iterative Methods for Projective Structure from Motion”, Proc. Conf. on Computer Vision and Pattern Recognition (CVPR 01), 2001. (http://www.cs.cmu.edu/~mahamud/cvpr-2001b.pdf ) | ||
D. Morris & T. Kanade, “Image-Consistent Surface Triangulation”, Proc. Computer Vision and Pattern Recognition Conf. (CVPR 00), 2000, pp. 332-338. | ||
M. Pollefeys, R. Koch & L. Van Gool, “Self-Calibration and Metric Reconstruction in spite of Varying and Unknown Internal Camera Parameters”, Int. J. of Computer Vision, 32(1), 1999, pp. 7-25. | ||
J. Shi and C. Tomasi, “Good Features to Track”, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 94), 1994, pp. 593-600 (http://www.cs.washington.edu/education/courses/cse590ss/01wi/notes/good-features.pdf ) | ||
C. Tomasi & T. Kanade, “Shape and Motion from Image Streams Under Orthography: A Factorization Method”, Int. Journal of Computer Vision, 9(2), 1992, pp. 137-154. | ||
B. Triggs, “Factorization methods for projective structure and motion”, Proc. Computer Vision and Pattern Recognition Conf. (CVPR 96), 1996, pp. 845-851. | ||
M. Irani, “Multi-Frame Optical Flow Estimation Using Subspace Constraints”, IEEE International Conference on Computer Vision (ICCV), 1999 (http://www.wisdom.weizmann.ac.il/~irani/abstracts/flow_iccv99.html ) | ||