Structure from motion

Reconstruct

Scene geometry

Camera motion

Structure from motion

The SFM Problem

Reconstruct scene geometry and camera motion from two or more images

Step 1: Track Features

Detect good features

corners, line segments

Find correspondences between frames

Lucas & Kanade-style motion estimation

window-based correlation

Step 2: Estimate Motion and Structure

Simplified projection model, e.g., [Tomasi 92]

2 or 3 views at a time [Hartley 00]

Step 3: Refine Estimates

“Bundle adjustment” in photogrammetry

Step 4: Recover Surfaces

Image-based triangulation [Morris 00, Baillard 99]

Silhouettes [Fitzgibbon 98]

Stereo [Pollefeys 99]

Feature tracking

Problem

Find correspondence between n features in f images

Issues

What’s a feature?

What does it mean to “correspond”?

How can correspondence be reliably computed?

Feature detection

What’s a good feature?

Good features to track

Recall Lucas-Kanade equation:
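In its standard form, the Lucas-Kanade system has on its left-hand side the 2x2 matrix of windowed gradient products (sum of Ix^2, sum of IxIy; sum of IxIy, sum of Iy^2). Following [Shi & Tomasi 94], a window is good to track when this matrix is well conditioned, i.e., when its smaller eigenvalue is large. Below is a minimal NumPy sketch of that score. The names `box_sum` and `min_eig_score` and the window size are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def box_sum(a, h):
    """Sum of `a` over a (2h+1) x (2h+1) window centred on each pixel (zero padded)."""
    padded = np.pad(a, h, mode="constant")
    out = np.zeros_like(a)
    for dy in range(2 * h + 1):
        for dx in range(2 * h + 1):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def min_eig_score(image, half_window=2):
    """Smaller eigenvalue of the windowed 2x2 gradient matrix at every pixel."""
    img = np.asarray(image, dtype=float)
    Iy, Ix = np.gradient(img)          # axis 0 = rows (y), axis 1 = columns (x)
    a = box_sum(Ix * Ix, half_window)  # sum of Ix^2
    b = box_sum(Ix * Iy, half_window)  # sum of Ix*Iy
    c = box_sum(Iy * Iy, half_window)  # sum of Iy^2
    # Closed-form smaller eigenvalue of [[a, b], [b, c]].
    return 0.5 * (a + c) - np.sqrt(0.25 * (a - c) ** 2 + b ** 2)
```

Corners score high because both eigenvalues are large. Straight edges score near zero because the gradient varies in only one direction, and flat regions score zero.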

Feature correspondence

Correspondence Problem

Given feature patch F in frame H, find best match in frame I
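One simple way to define "best match" is the sum of squared differences (SSD) between the patch and each candidate window, searched exhaustively in a small neighbourhood. The sketch below assumes pure translation and no change in brightness. The function name `best_match` and the `search_radius` parameter are hypothetical.

```python
import numpy as np

def best_match(patch, frame, center, search_radius=5):
    """Return ((row, col), ssd) of the window in `frame` closest to `patch`.

    `center` is the patch centre in the previous frame; candidates whose
    window would fall outside `frame` are skipped.
    """
    ph, pw = patch.shape
    hy, hx = ph // 2, pw // 2
    cy, cx = center
    best_pos, best_ssd = None, np.inf
    for y in range(cy - search_radius, cy + search_radius + 1):
        for x in range(cx - search_radius, cx + search_radius + 1):
            top, left = y - hy, x - hx
            if top < 0 or left < 0:
                continue
            if top + ph > frame.shape[0] or left + pw > frame.shape[1]:
                continue
            window = frame[top:top + ph, left:left + pw]
            ssd = np.sum((window - patch) ** 2)
            if ssd < best_ssd:
                best_pos, best_ssd = (y, x), ssd
    return best_pos, best_ssd
```

Lucas & Kanade-style estimation replaces this exhaustive search with a gradient-based solve, which is cheaper but needs a good initial guess.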

Feature distortion

Feature may change shape over time

Need a distortion model to really make this work

Tracking over many frames

So far we’ve only considered two frames

Basic extension to f frames

1. Select features in first frame

2. Given feature in frame i, compute position/deformation in i+1

3. Select more features if needed

4. i = i + 1

5. If i < f, go to step 2

Incorporating dynamics

Idea

Can get better performance if we know something about the way points move

Most approaches assume constant velocity

or constant acceleration

Use above to predict position in next frame, initialize search

Modeling uncertainty

Kalman Filtering (http://www.cs.unc.edu/~welch/kalman/ )

Updates feature state and Gaussian uncertainty model

Get better prediction, confidence estimate
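As an illustration, here is a minimal sketch of one Kalman predict/update cycle for a single image coordinate under a constant-velocity model. The state layout (position, velocity), the noise levels `q` and `r`, and the function name `kalman_step` are all assumptions chosen for the example.

```python
import numpy as np

def kalman_step(x, P, z, dt=1.0, q=1e-2, r=1.0):
    """One predict/update cycle. x = [position, velocity], P = 2x2 covariance,
    z = length-1 array holding the measured position."""
    F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity motion model
    H = np.array([[1.0, 0.0]])             # only position is measured
    Q = q * np.eye(2)                      # process noise (assumed)
    R = np.array([[r]])                    # measurement noise (assumed)

    # Predict: where should the feature be in the next frame?
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q

    # Update: blend the prediction with the measurement.
    innovation = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ innovation
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new
```

The predicted position `x_pred[0]` can initialize the correspondence search, and `P_pred[0, 0]` gives a rough size for the search window.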

CONDENSATION (http://www.dai.ed.ac.uk/CVonline/LOCAL_COPIES/ISARD1/condensation.html )

Also known as “particle filtering”

Updates probability distribution over all possible states

Can cope with multiple hypotheses

Probabilistic Tracking

Treat tracking problem as a Markov process

Estimate p(x_t | z_t, x_{t-1})

probability of being in state x_t given measurement z_t and previous state x_{t-1}

Combine Markov assumption with Bayes Rule

Kalman filtering: assume p(x) is a Gaussian

Key

s = x (position)

o = z (sensor)
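Written out, combining the Markov assumption with Bayes' rule gives the usual recursive filtering update. This is the textbook form, stated here as a reminder; the notation z_{1:t} for all measurements up to time t is an assumption of this note.

```latex
p(x_t \mid z_{1:t}) \;\propto\;
  \underbrace{p(z_t \mid x_t)}_{\text{measurement}}
  \int \underbrace{p(x_t \mid x_{t-1})}_{\text{motion model}}\;
       \underbrace{p(x_{t-1} \mid z_{1:t-1})}_{\text{previous posterior}}\; dx_{t-1}
```

The Kalman filter evaluates this in closed form when every term is Gaussian. CONDENSATION approximates it with samples.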

Modeling probabilities with samples

Allocate samples according to probability

Higher probability—more samples

CONDENSATION [Isard & Blake]

Prediction:

draw new samples from the PDF

use the motion model to move the samples
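A minimal sketch of one such iteration for a 1D state follows. It adds a re-weighting step using a Gaussian measurement likelihood. The random-walk motion model, the noise levels, and the function name `condensation_step` are illustrative assumptions rather than the model used by Isard & Blake.

```python
import numpy as np

def condensation_step(samples, weights, z, rng, motion_std=1.0, meas_std=1.0):
    """One resample / predict / re-weight cycle for a 1D state."""
    n = len(samples)
    # 1. Draw new samples from the current PDF (more weight, more copies).
    drawn = samples[rng.choice(n, size=n, p=weights)]
    # 2. Use the motion model to move the samples (random walk assumed here).
    moved = drawn + rng.normal(0.0, motion_std, size=n)
    # 3. Re-weight each sample by how well it explains the measurement z.
    w = np.exp(-0.5 * ((z - moved) / meas_std) ** 2) + 1e-300
    return moved, w / w.sum()
```

Because the samples can cluster around several states at once, the representation copes with multiple hypotheses in a way a single Gaussian cannot.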

Monte Carlo robot localization

Particle Filters [Fox, Dellaert, Thrun and collaborators]

CONDENSATION Contour Tracking

Training a tracker

CONDENSATION Contour Tracking

Red: smooth drawing

Green: scribble

Blue: pause

Structure from motion

The SFM Problem

Reconstruct scene geometry and camera positions from two or more images

Assume

Pixel correspondence

via tracking

Projection model

classic methods are orthographic

newer methods use perspective

practically any model is possible with bundle adjustment

SFM under orthographic projection

Trick

Choose scene origin to be centroid of 3D points

Choose image origins to be centroid of 2D points

Allows us to drop the camera translation:
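A short derivation of why the translation drops out, using the notation of [Tomasi & Kanade, 92] as an assumption: i_f and j_f are the image axes of camera f, t_f is its position, and s_p is the p-th scene point.

```latex
% Orthographic projection of point p in frame f
u_{fp} = \mathbf{i}_f^\top(\mathbf{s}_p - \mathbf{t}_f), \qquad
v_{fp} = \mathbf{j}_f^\top(\mathbf{s}_p - \mathbf{t}_f)

% Scene origin at the centroid: \tfrac{1}{n}\sum_p \mathbf{s}_p = 0, so the image centroid is
\bar{u}_f = \tfrac{1}{n}\sum_p u_{fp} = -\,\mathbf{i}_f^\top \mathbf{t}_f, \qquad
\bar{v}_f = -\,\mathbf{j}_f^\top \mathbf{t}_f

% Subtracting the image centroid removes the translation
\tilde{u}_{fp} = u_{fp} - \bar{u}_f = \mathbf{i}_f^\top \mathbf{s}_p, \qquad
\tilde{v}_{fp} = v_{fp} - \bar{v}_f = \mathbf{j}_f^\top \mathbf{s}_p
```

Stacking all frames and points gives a 2f x n measurement matrix W = M S, with M of size 2f x 3 (motion) and S of size 3 x n (shape), so W has rank at most 3.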

Shape by factorization [Tomasi & Kanade, 92]

Singular value decomposition (SVD)

SVD decomposes any m×n matrix A as A = U Σ V^T

Properties

Σ is a diagonal matrix containing the square roots of the eigenvalues of A^TA

known as “singular values” of A

diagonal entries are sorted from largest to smallest

columns of U are eigenvectors of AA^T

columns of V are eigenvectors of A^TA

If A is rank-deficient (e.g., has rank 3)

only first 3 singular values are nonzero

we can throw away all but first 3 columns of U and V

Choose M’ = U’, S’ = Σ’V’^T
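The rank-3 truncation can be sketched in a few lines of NumPy. The function name `factorize` and the row layout of W (one row per image coordinate per frame, one column per tracked point) are assumptions made for the example. The result is only determined up to an invertible 3x3 matrix, which the metric constraints resolve later.

```python
import numpy as np

def factorize(W):
    """Rank-3 factorization of a 2f x n measurement matrix W into M' (2f x 3)
    and S' (3 x n), so that M' @ S' approximates the centred measurements."""
    # Move each image origin to the centroid of its tracked 2D points.
    W_centered = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W_centered, full_matrices=False)
    # Keep only the three largest singular values and their vectors.
    M = U[:, :3]
    S = np.diag(s[:3]) @ Vt[:3, :]
    return M, S
```

With noisy tracks the remaining singular values are small rather than zero, and the truncation gives the closest rank-3 matrix in the least-squares sense.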

Shape by factorization [Tomasi & Kanade, 92]

Metric constraints

Orthographic Camera

Rows of P are orthonormal:

Weak Perspective Camera

Rows of P are orthogonal and have equal norm:

Enforcing “Metric” Constraints

Compute A such that rows of M have these properties
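For the orthographic case, one common way to find A is to note that the constraints are linear in the symmetric matrix Q = A A^T, solve for Q by least squares, and then factor it. The sketch below assumes the first f rows of the affine motion matrix hold the i-axes and the last f rows the j-axes. The name `metric_upgrade` and the use of a Cholesky factor are choices made for illustration.

```python
import numpy as np

def metric_upgrade(M_affine):
    """Return M = M_affine @ A whose row pairs (i_f, j_f) are orthonormal.
    Assumes rows 0..f-1 are i-axes and rows f..2f-1 are j-axes."""
    f = M_affine.shape[0] // 2
    I_rows, J_rows = M_affine[:f], M_affine[f:]

    def coeffs(a, b):
        # Coefficients of a^T Q b in the unknowns (Q00, Q01, Q02, Q11, Q12, Q22).
        return np.array([a[0] * b[0],
                         a[0] * b[1] + a[1] * b[0],
                         a[0] * b[2] + a[2] * b[0],
                         a[1] * b[1],
                         a[1] * b[2] + a[2] * b[1],
                         a[2] * b[2]])

    rows, rhs = [], []
    for i, j in zip(I_rows, J_rows):
        rows += [coeffs(i, i), coeffs(j, j), coeffs(i, j)]
        rhs += [1.0, 1.0, 0.0]   # |i| = 1, |j| = 1, i . j = 0
    q = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
    Q = np.array([[q[0], q[1], q[2]],
                  [q[1], q[3], q[4]],
                  [q[2], q[4], q[5]]])
    # Any A with A A^T = Q works; Cholesky requires Q to be positive definite.
    A = np.linalg.cholesky(Q)
    return M_affine @ A
```

The shape matrix must be corrected by the inverse, S = A^{-1} S', so that the product M S is unchanged. With noisy data Q may fail to be positive definite, in which case the Cholesky step raises an error and a different factorization is needed.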

Factorization with noisy data

Many extensions

Independently Moving Objects

Perspective Projection

Outlier Rejection

Subspace Constraints

SFM Without Correspondence

Extending factorization to perspective

Several Recent Approaches

[Christy 96]; [Triggs 96]; [Han 00]; [Mahamud 01]

Initialize with ortho/weak perspective model then iterate

Christy & Horaud

Derive expression for weak perspective as a perspective projection plus a correction term:

Basic procedure:

Run Tomasi-Kanade with weak perspective

Solve for e_i (different for each row of M)

Add correction term to W, solve again (until convergence)

Bundle adjustment

3D → 2D mapping

a function of intrinsics K, extrinsics R & t

measurement affected by noise

Log likelihood of K, R, t given {(u_i, v_i)}

Its negative is minimized via nonlinear least squares regression

called “Bundle Adjustment”

e.g., Levenberg-Marquardt

described in Press et al., Numerical Recipes
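Under independent Gaussian noise, the negative log likelihood is, up to constants, the sum of squared reprojection errors. The sketch below runs a basic Levenberg-Marquardt loop on that error for a deliberately tiny problem: one camera with known focal length and identity rotation, known 3D points, and only the translation refined. Full bundle adjustment optimizes all cameras and all points jointly. The function names and the finite-difference Jacobian are assumptions made to keep the example short.

```python
import numpy as np

def project(points, f, t):
    """Pinhole projection with focal length f, identity rotation, translation t."""
    cam = points + t
    return f * cam[:, :2] / cam[:, 2:3]

def refine_translation(points, observed, f, t0, iters=30, lam=1e-3, eps=1e-6):
    """Levenberg-Marquardt on the reprojection error, translation only."""
    t = np.asarray(t0, dtype=float)

    def residual(t):
        return (project(points, f, t) - observed).ravel()

    cost = np.sum(residual(t) ** 2)
    for _ in range(iters):
        r = residual(t)
        # Forward-difference Jacobian of the residual with respect to t.
        J = np.empty((r.size, t.size))
        for k in range(t.size):
            dt = np.zeros_like(t)
            dt[k] = eps
            J[:, k] = (residual(t + dt) - r) / eps
        # Damped normal equations: (J^T J + lam I) step = -J^T r.
        step = np.linalg.solve(J.T @ J + lam * np.eye(t.size), -J.T @ r)
        new_cost = np.sum(residual(t + step) ** 2)
        if new_cost < cost:
            t, cost, lam = t + step, new_cost, lam * 0.5  # accept, damp less
        else:
            lam *= 10.0                                   # reject, damp more
    return t
```

Small damping makes the step behave like Gauss-Newton, while large damping turns it into a short gradient-descent step. That trade-off is what makes the method robust far from the solution.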

Match Move

Film industry is a heavy consumer

composite live footage with 3D graphics

known as “match move”

Commercial products

2D3

http://www.2d3.com/

RealViz

http://www.realviz.com/

Show video

Closing the loop

Problem

requires good tracked features as input

Can we use SFM to help track points?

basic idea: recall form of Lucas-Kanade equation:

with n points in f frames, we can stack into a big matrix

References

C. Baillard & A. Zisserman, “Automatic Reconstruction of Planar Models from Multiple Views”, Proc. Computer Vision and Pattern Recognition Conf. (CVPR 99) 1999, pp. 559-565.

S. Christy & R. Horaud, “Euclidean shape and motion from multiple perspective views by affine iterations”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(10):1098-1104, November 1996 (ftp://ftp.inrialpes.fr/pub/movi/publications/rec-affiter-long.ps.gz )

A.W. Fitzgibbon, G. Cross, & A. Zisserman, “Automatic 3D Model Construction for Turn-Table Sequences”, SMILE Workshop, 1998.

M. Han & T. Kanade, “Creating 3D Models with Uncalibrated Cameras”, Proc. IEEE Computer Society Workshop on the Application of Computer Vision (WACV2000), 2000.

R. Hartley & A. Zisserman, “Multiple View Geometry in Computer Vision”, Cambridge Univ. Press, 2000.

R. Hartley, “Euclidean Reconstruction from Uncalibrated Views”, In Applications of Invariance in Computer Vision, Springer-Verlag, 1994, pp. 237-256.

M. Isard & A. Blake, “CONDENSATION -- conditional density propagation for visual tracking”, Int. Journal of Computer Vision, 29(1), 1998, pp. 5-28. (ftp://ftp.robots.ox.ac.uk/pub/ox.papers/VisualDynamics/ijcv98.ps.gz )

S. Mahamud, M. Hebert, Y. Omori & J. Ponce, “Provably-Convergent Iterative Methods for Projective Structure from Motion”, Proc. Computer Vision and Pattern Recognition Conf. (CVPR 01), 2001. (http://www.cs.cmu.edu/~mahamud/cvpr-2001b.pdf )

D. Morris & T. Kanade, “Image-Consistent Surface Triangulation”, Proc. Computer Vision and Pattern Recognition Conf. (CVPR 00), 2000, pp. 332-338.

M. Pollefeys, R. Koch & L. Van Gool, “Self-Calibration and Metric Reconstruction in spite of Varying and Unknown Internal Camera Parameters”, Int. J. of Computer Vision, 32(1), 1999, pp. 7-25.

J. Shi and C. Tomasi, “Good Features to Track”, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 94), 1994, pp. 593-600 (http://www.cs.washington.edu/education/courses/cse590ss/01wi/notes/good-features.pdf )

C. Tomasi & T. Kanade, “Shape and Motion from Image Streams Under Orthography: A Factorization Method”, Int. Journal of Computer Vision, 9(2), 1992, pp. 137-154.

B. Triggs, “Factorization methods for projective structure and motion”, Proc. Computer Vision and Pattern Recognition Conf. (CVPR 96), 1996, pp. 845-851.

M. Irani, “Multi-Frame Optical Flow Estimation Using Subspace Constraints”, IEEE International Conference on Computer Vision (ICCV), 1999 (http://www.wisdom.weizmann.ac.il/~irani/abstracts/flow_iccv99.html )

