Global Alignment and Structure from Motion

Computer Vision
CSE576, Spring 2005
Richard Szeliski
Today's lecture

- Rotational alignment ("3D stitching") [Project 3]
  - pairwise alignment (Procrustes)
  - global alignment (linearized least squares)
- Calibration
  - camera matrix (Direct Linear Transform)
  - non-linear least squares
  - separating intrinsics and extrinsics
  - focal length and optical center
Today's lecture

- Structure from Motion
  - triangulation and pose
  - two-frame methods
  - factorization
  - bundle adjustment
  - robust statistics
Global rotational alignment
Fully Automated Panoramic Stitching [Project 3]

AutoStitch [Brown & Lowe '03]

Stitch a panoramic image from an arbitrary collection of photographs (known focal length):

- Extract and (pairwise) match features
- Estimate pairwise rotations using RANSAC
- Add to the stitch and re-run global alignment
- Warp the images to a sphere and blend
3D Rotation Model

Projection equations:

- Project from the image to a 3D ray:
  (x0, y0, z0) = (u0 - uc, v0 - vc, f)
- Rotate the ray by the camera motion:
  (x1, y1, z1) = R01 (x0, y0, z0)
- Project back into the new (source) image:
  (u1, v1) = (f x1/z1 + uc, f y1/z1 + vc)
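These three steps can be written in a few lines; a minimal numpy sketch (the function name and argument layout are illustrative):

```python
import numpy as np

def rotate_pixel(u0, v0, f, uc, vc, R01):
    """Map a pixel through the 3D rotation model:
    image -> 3D ray, rotate the ray, project back to the image."""
    # Project from image to a 3D ray: (x0, y0, z0) = (u0 - uc, v0 - vc, f)
    p0 = np.array([u0 - uc, v0 - vc, f], dtype=float)
    # Rotate the ray by the camera motion
    p1 = R01 @ p0
    # Project back into the new image
    u1 = f * p1[0] / p1[2] + uc
    v1 = f * p1[1] / p1[2] + vc
    return u1, v1
```

With R01 = I the pixel maps to itself; a rotation about the optical axis spins pixels around the optical center (uc, vc).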
Pairwise alignment

Absolute orientation [Arun et al., PAMI 1987] [Horn et al., JOSA A 1988]; the Procrustes algorithm [Golub & Van Loan]

Given two sets of matching points, compute R with pi' = R pi, using 3D rays:

- pi = N(xi, yi, zi) = N(ui - uc, vi - vc, f)
- A = Σi pi pi'T = Σi pi piT RT = U S VT = (U S UT) RT
- VT = UT RT
- R = V UT
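As a sketch, the closed-form solution might look like this (with the usual det(R) sign guard against reflections, which the slide omits):

```python
import numpy as np

def procrustes_rotation(p, p_prime):
    """Absolute orientation: find R with p' ~= R p.
    p, p_prime: (n, 3) arrays of matched unit rays."""
    # Correlation matrix A = sum_i p_i p_i'^T (as a single matrix product)
    A = p.T @ p_prime
    U, S, Vt = np.linalg.svd(A)        # A = U S V^T
    R = Vt.T @ U.T                     # R = V U^T
    if np.linalg.det(R) < 0:           # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R
```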
Pairwise alignment

RANSAC loop:

- Select two feature pairs (at random):
  pi = N(ui - uc, vi - vc, f), pi' = N(ui' - uc, vi' - vc, f), i = 0, 1
- Compute the outer product matrix A = Σi pi pi'T
- Compute R using the SVD: A = U S VT, R = V UT
- Compute the inliers, where f |pi' - R pi| < ε
- Keep the largest set of inliers
- Re-compute the least-squares SVD estimate using all of the inliers, i = 0..n
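The loop above can be sketched as follows (a minimal numpy sketch; parameter defaults such as eps and iters are illustrative, not from the slide):

```python
import numpy as np

def rotation_from_pairs(p, pp):
    """R = V U^T from the SVD of A = sum_i p_i p_i'^T."""
    U, _, Vt = np.linalg.svd(p.T @ pp)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R

def ransac_rotation(p, pp, f, eps=2.0, iters=200, seed=0):
    """RANSAC loop from the slide: two ray pairs per sample, then a
    least-squares re-fit on the largest inlier set.
    p, pp: (n, 3) arrays of matched unit rays; f: focal length."""
    rng = np.random.default_rng(seed)
    n = len(p)
    best = np.zeros(n, dtype=bool)
    for _ in range(iters):
        i = rng.choice(n, size=2, replace=False)    # two pairs at random
        R = rotation_from_pairs(p[i], pp[i])
        # inliers where f * |p' - R p| < eps (approximate pixel error)
        err = f * np.linalg.norm(pp - p @ R.T, axis=1)
        inl = err < eps
        if inl.sum() > best.sum():
            best = inl
    # re-compute the least-squares estimate on all inliers
    return rotation_from_pairs(p[best], pp[best]), best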
Automatic stitching

- Match all pairs and keep the good ones (# inliers > threshold)
- Sort the pairs by strength (# inliers)
- Add in the next strongest match (and other relevant matches) to the current stitch
- Perform global alignment
Incremental selection & addition

(diagram: images [1]..[5] added incrementally along with their pairwise matches)

- [3]
- [4]: (3,4) (4,3)
- [2]: (2,4) (4,2), (2,3) (3,2)
- [1]: (1,2) (2,1), (1,4) (4,1), (1,3) (3,1)
- [5]: (5,3) (3,5), (4,5) (5,4)
Global alignment

Task: compute a globally consistent set of rotations {Ri} such that
Rj pij ≈ Rk pik, i.e. min Σ |Rj pij - Rk pik|2

- Initialize the "first" frame: Ri = I
- Multiply each "next" frame by its pairwise rotation Rij
- Globally update all of the current {Ri}

Q: How do we parameterize and update the {Ri}?
Parameterizing rotations

How do we parameterize R and ΔR?

- Euler angles: a bad idea (singularities, order-dependent)
- quaternions: 4-vectors on the unit sphere
- use an incremental rotation R(I + ΔR)
- update with the Rodrigues formula
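A minimal numpy sketch of the Rodrigues formula and the incremental update (the composition order, new rotation on the left, is one common convention):

```python
import numpy as np

def rodrigues(omega):
    """Rodrigues' formula: 3-vector omega -> rotation matrix exp([omega]x)."""
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        return np.eye(3)
    a = omega / theta
    K = np.array([[0, -a[2], a[1]],
                  [a[2], 0, -a[0]],
                  [-a[1], a[0], 0]])   # [a]x, the cross-product matrix
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def update_rotation(R, omega):
    """Incremental update R <- exp([omega]x) R, which to first order
    equals (I + [omega]x) R, and stays exactly on the rotation group."""
    return rodrigues(omega) @ R
```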
Global alignment

Least-squares solution of min Σ |Rj pij - Rk pik|2, or Rj pij - Rk pik = 0

- Use the linearized update:
  (I + [ωj]×) Rj pij - (I + [ωk]×) Rk pik = 0
  or
  [qij]× ωj - [qik]× ωk = qij - qik,  with qij = Rj pij
- Estimate the least-squares solution over the {ωi}
- Iterate a few times (updating the {Ri})
Iterative focal length adjustment

(Optional) [Szeliski & Shum '97; MSR-TR-03]

- Simplest approach:
  arg minf Σ |Rj pij - Rk pik|2
- More complex approach:
  full bundle adjustment (op. cit. & later in the talk)
Camera Calibration

Camera calibration

Determine the camera parameters from known 3D points or calibration object(s):

- internal or intrinsic parameters, such as focal length, optical center, aspect ratio: what kind of camera?
- external or extrinsic (pose) parameters: where is the camera?

How can we do this?
Camera calibration – approaches

Possible approaches:

- linear regression (least squares)
- non-linear optimization
- vanishing points
- multiple planar patterns
- panoramas (rotational motion)
Image formation equations
Calibration matrix

Is this form of K good enough?

- non-square pixels (digital video)
- skew
- radial distortion
Camera matrix

Fold the intrinsic calibration matrix K and the extrinsic pose parameters (R, t) together into a single 3×4 camera matrix:

M = K [R | t]

(put a 1 in the lower right-hand corner for 11 d.o.f.)
Camera matrix calibration

Directly estimate the 11 unknowns in the M matrix using known 3D points (Xi, Yi, Zi) and measured feature positions (ui, vi)

Linear regression:

- Bring the denominator over and solve the (over-determined) set of linear equations. How?
- Least squares (pseudo-inverse)

Is this good enough?
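The "bring the denominator over" step can be sketched directly; here the homogeneous system is solved with the SVD rather than the pseudo-inverse (both are standard):

```python
import numpy as np

def dlt_camera_matrix(X, uv):
    """Estimate the 3x4 camera matrix M from n >= 6 known 3D points X (n, 3)
    and measured image positions uv (n, 2).  Each point gives two equations
    linear in the 12 entries of M once the denominator is brought over."""
    rows = []
    for (x, y, z), (u, v) in zip(X, uv):
        P = [x, y, z, 1.0]
        rows.append(P + [0.0] * 4 + [-u * c for c in P])
        rows.append([0.0] * 4 + P + [-v * c for c in P])
    A = np.array(rows)
    _, _, Vt = np.linalg.svd(A)
    M = Vt[-1].reshape(3, 4)         # null vector = least singular vector
    return M / M[2, 3]               # fix the scale (1 in the lower right)
```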
Levenberg-Marquardt

Iterative non-linear least squares [Press '92]

- Linearize the measurement equations
- Substitute into the log-likelihood equation: a quadratic cost function in Δm
- Solve for the minimum via the normal equations A Δm = b, with approximate Hessian A = JTJ and gradient of the error b = -JTr
Levenberg-Marquardt

What if it doesn't converge?

- Multiply the diagonal by (1 + λ), and increase λ until it does
- Halve the step size Δm (my favorite)
- Use line search
- Other ideas?

Uncertainty analysis: covariance Σ = A-1

- Is maximum likelihood the best idea?
- How do we start in the vicinity of the global minimum?
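A minimal Levenberg-Marquardt sketch on a generic residual function, using the "multiply the diagonal by (1 + λ)" rule from the slide (the numeric Jacobian and the λ schedule are illustrative choices):

```python
import numpy as np

def levenberg_marquardt(residual, m, iters=50, lam=1e-3):
    """Solve min |r(m)|^2 by repeatedly solving the damped normal equations
    (J^T J + lam * diag(J^T J)) dm = -J^T r, increasing lam when a step
    fails to reduce the cost and shrinking it when the step succeeds."""
    def jac(m, h=1e-7):
        r0 = residual(m)
        J = np.empty((len(r0), len(m)))
        for j in range(len(m)):          # forward-difference Jacobian
            dm = np.zeros_like(m); dm[j] = h
            J[:, j] = (residual(m + dm) - r0) / h
        return r0, J
    for _ in range(iters):
        r, J = jac(m)
        A = J.T @ J                      # approximate Hessian
        g = J.T @ r                      # gradient of the error
        while True:
            dm = np.linalg.solve(A + lam * np.diag(np.diag(A)), -g)
            if np.sum(residual(m + dm) ** 2) < np.sum(r ** 2):
                m = m + dm               # accept the step, relax the damping
                lam = max(lam * 0.1, 1e-12)
                break
            lam *= 10.0                  # reject the step, increase damping
            if lam > 1e12:
                return m                 # converged (no improving step left)
    return m
```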
Camera matrix calibration

Advantages:

- very simple to formulate and solve
- can recover K [R | t] from M using QR decomposition [Golub & Van Loan 96]

Disadvantages:

- doesn't directly compute the internal parameters
- more unknowns than true degrees of freedom
- need a separate camera matrix for each new view
Separate intrinsics / extrinsics

New feature measurement equations

- Use non-linear minimization
- Standard technique in photogrammetry, computer vision, and computer graphics
- [Tsai 87] – also estimates k1 (radial distortion); freeware @ CMU:
  http://www.cs.cmu.edu/afs/cs/project/cil/ftp/html/v-source.html
- [Bogart 91] – View Correlation
Intrinsic/extrinsic calibration

Advantages:

- can solve for more than one camera pose at a time
- potentially fewer degrees of freedom

Disadvantages:

- more complex update rules
- need a good initialization (e.g., recover K [R | t] from M)
Vanishing Points

Determine the focal length f and optical center (uc, vc) from the image of a cube's (or building's) vanishing points [Caprile '90][Antone & Teller '00]

Vanishing point calibration

Advantages:

- only need to see vanishing points (e.g., architecture, table, …)

Disadvantages:

- not that accurate
- need rectahedral object(s) in the scene
Multi-plane calibration

Use several images of a planar target held at unknown orientations [Zhang 99]

- Compute the plane homographies Hk
- Solve for K-TK-1 from the Hk's:
  - 1 plane if only f is unknown
  - 2 planes if (f, uc, vc) are unknown
  - 3+ planes for the full K
- Code available from Zhang and OpenCV
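A sketch of the "solve for K-TK-1" step, assuming the plane homographies Hk = K [r1 r2 t] are already known: each plane contributes two linear constraints on the symmetric matrix B = K-TK-1, and K is recovered from B by a Cholesky factorization.

```python
import numpy as np

def v_ij(H, i, j):
    """Constraint row for h_i^T B h_j, with B packed as
    b = (B11, B12, B22, B13, B23, B33) and h_i = i-th column of H."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0],
                     hi[0] * hj[1] + hi[1] * hj[0],
                     hi[1] * hj[1],
                     hi[2] * hj[0] + hi[0] * hj[2],
                     hi[2] * hj[1] + hi[1] * hj[2],
                     hi[2] * hj[2]])

def intrinsics_from_homographies(Hs):
    """Solve for B = K^-T K^-1 from 3+ plane homographies, then recover K.
    Each H gives h1^T B h2 = 0 and h1^T B h1 = h2^T B h2."""
    V = []
    for H in Hs:
        V.append(v_ij(H, 0, 1))
        V.append(v_ij(H, 0, 0) - v_ij(H, 1, 1))
    _, _, Vt = np.linalg.svd(np.array(V))
    b = Vt[-1]                            # null vector, up to scale and sign
    B = np.array([[b[0], b[1], b[3]],
                  [b[1], b[2], b[4]],
                  [b[3], b[4], b[5]]])
    if B[0, 0] < 0:                       # pick the positive-definite sign
        B = -B
    L = np.linalg.cholesky(B)             # B = L L^T with L = K^-T
    K = np.linalg.inv(L.T)
    return K / K[2, 2]
```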
Rotational motion

Use pure rotation (of a large scene) to estimate f:

- estimate f from the pairwise homographies
- re-estimate f from the 360° "gap"
- optimize over all of the {K, Rj} parameters
  [Stein 95; Hartley '97; Shum & Szeliski '00; Kang & Weiss '99]

This is the most accurate way to get f, short of surveying distant points.
Pose estimation and triangulation

Pose estimation

Once the internal camera parameters are known, we can compute the camera pose [Tsai 87] [Bogart 91]

Application: superimpose 3D graphics onto video

How do we initialize (R, t)?
Pose estimation

Previous initialization techniques:

- vanishing points [Caprile 90]
- planar pattern [Zhang 99]

Other possibilities:

- Through-the-Lens Camera Control [Gleicher 92]: differential update
- 3+ point "linear methods": [DeMenthon 95][Quan 99][Ameller 00]
Triangulation

Problem: given some points in correspondence across two or more images (taken from calibrated cameras), {(uj, vj)}, compute the 3D location X

Method I: intersect the viewing rays in 3D, i.e. minimize Σj |X - (Cj + sj Vj)|2, where

- X is the unknown 3D point
- Cj is the optical center of camera j
- Vj is the viewing ray for pixel (uj, vj)
- sj is the unknown distance along Vj

Advantage: geometrically intuitive
Triangulation

Method II: solve linear equations in X

- advantage: very simple

Method III: non-linear minimization

- advantage: most accurate (minimizes image-plane error)
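Method II can be sketched in a few lines: for camera matrix P and measurement (u, v), the rows u·P3 - P1 and v·P3 - P2 (Pi = i-th row of P) are linear in the homogeneous point X, so stacking all views gives a homogeneous system solved with the SVD.

```python
import numpy as np

def triangulate_linear(Ps, uvs):
    """Linear triangulation: Ps is a list of 3x4 camera matrices,
    uvs the matching list of (u, v) measurements."""
    A = []
    for P, (u, v) in zip(Ps, uvs):
        A.append(u * P[2] - P[0])    # u * (row 3) - (row 1)
        A.append(v * P[2] - P[1])    # v * (row 3) - (row 2)
    _, _, Vt = np.linalg.svd(np.array(A))
    X = Vt[-1]                       # homogeneous null vector
    return X[:3] / X[3]
```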
Structure from Motion

Structure from motion

Given many points in correspondence across several images, {(uij, vij)}, simultaneously compute the 3D locations xi and the camera (or motion) parameters (K, Rj, tj)

Two main variants: calibrated and uncalibrated (sometimes associated with Euclidean and projective reconstructions)
Structure from motion

How many points do we need to match?

- 2 frames:
  (R, t): 5 dof + 3n point locations ≤ 4n point measurements ⇒ n ≥ 5
- k frames:
  6(k–1) - 1 + 3n ≤ 2kn
- always want to use many more
Two-frame methods

Two main variants:

- Calibrated: "essential matrix" E; use ray directions (xi, xi')
- Uncalibrated: "fundamental matrix" F

[Hartley & Zisserman 2000]
Essential matrix

Co-planarity constraint:

- x' ≈ R x + t
- [t]× x' ≈ [t]× R x
- x'T [t]× x' ≈ x'T [t]× R x  (the left-hand side is 0, since [t]× x' ⊥ x')
- x'T E x = 0, with E = [t]× R

Solve for E using least squares (SVD):

- t is the least singular vector of E
- R is obtained from the other two singular vectors
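The linear estimate of E can be sketched as follows (the full recovery of R from the remaining singular vectors is omitted; here only t is extracted, as the left singular vector of E with the smallest singular value, since tT E = tT [t]× R = 0):

```python
import numpy as np

def essential_eight_point(x, xp):
    """Linear estimate of E from n >= 8 matched unit rays x, xp (n, 3)
    using x'^T E x = 0.  Each correspondence gives one equation that is
    linear in the 9 entries of E."""
    A = np.stack([np.outer(b, a).ravel() for a, b in zip(x, xp)])
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)        # null vector, up to scale and sign
    U, S, _ = np.linalg.svd(E)
    t = U[:, 2]                     # least (left) singular vector of E
    return E, t
```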
Fundamental matrix

The camera calibrations are unknown:

- x'T F x = 0, with F = [e]× H = K'-T [t]× R K-1
- Solve for F using least squares (SVD)
  - re-scale the (xi, xi') so that |xi| ≈ 1/2 [Hartley]
- e (the epipole) is still the least singular vector of F
- H is obtained from the other two singular vectors
- "plane + parallax" (projective) reconstruction
- use self-calibration to determine K [Pollefeys]
Multi-frame Structure from Motion

Factorization [Tomasi & Kanade, IJCV 92]
Structure [from] Motion

Given a set of feature tracks, estimate the 3D structure and the 3D (camera) motion.

Assumption: orthographic projection

Tracks: (ufp, vfp), f: frame, p: point

Subtract out the mean 2D position…

- ufp = ifT sp   (if: rotation, sp: position)
- vfp = jfT sp
Measurement equations

- ufp = ifT sp   (if: rotation, sp: position)
- vfp = jfT sp

Stack them up…

W = R S, with

- R = (i1, …, iF, j1, …, jF)T
- S = (s1, …, sP)
Factorization

W = R2F×3 S3×P

SVD:

- W = U Λ VT   (Λ must be rank 3)
- W' = (U Λ1/2)(Λ1/2 VT) = U' V'

Make R orthogonal with a 3×3 matrix Q:

- R = U' Q, S = Q-1 V'
- solve for Q from the metric constraints ifT QQT if = 1, jfT QQT jf = 1, ifT QQT jf = 0 (here if, jf denote the rows of U')
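The rank-3 factorization step can be sketched as follows (this gives the affine reconstruction W = U' V' only; the metric upgrade via Q from the orthonormality constraints is omitted):

```python
import numpy as np

def factorize(W):
    """Tomasi-Kanade factorization step: W (2F x P, mean-subtracted)
    is split into R' S' with a rank-3 truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    Rp = U[:, :3] * np.sqrt(s[:3])            # R' = U Lambda^{1/2}
    Sp = np.sqrt(s[:3])[:, None] * Vt[:3]     # S' = Lambda^{1/2} V^T
    return Rp, Sp
```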
Results
Extensions

- Paraperspective [Poelman & Kanade, PAMI 97]
- Sequential Factorization [Morita & Kanade, PAMI 97]
- Factorization under perspective [Christy & Horaud, PAMI 96] [Sturm & Triggs, ECCV 96]
- Factorization with Uncertainty [Anandan & Irani, IJCV 2002]
Bundle Adjustment

What makes this non-linear minimization hard?

- many more parameters: potentially slow
- poorer conditioning (high correlation)
- potentially lots of outliers
- gauge (coordinate) freedom
Lots of parameters: sparsity

Only a few entries in the Jacobian are non-zero

Sparse Cholesky (skyline)

- First used in finite element analysis
- Applied to SfM by [Szeliski & Kang 1994] (structure | motion fill-in)
Conditioning and gauge freedom

Poor conditioning:

- use a 2nd-order method
- use Cholesky decomposition

Gauge freedom:

- fix certain parameters (orientation), or
- zero out the last few rows in the Cholesky decomposition
Robust error models

Outlier rejection:

- use a robust penalty applied to each set of joint measurements
- for extremely bad data, use random sampling [RANSAC, Fischler & Bolles, CACM '81]
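A minimal sketch of the robust-penalty idea on a toy problem: a line fit where a Huber-style penalty is minimized by iteratively reweighted least squares, so gross outliers get weight sigma/|r| instead of pulling quadratically (the line-fit setting and parameter names are illustrative, not from the slide):

```python
import numpy as np

def irls_line_fit(x, y, sigma=1.0, iters=20):
    """Robust fit of y = a x + b via iteratively reweighted least squares
    with Huber weights w = min(1, sigma / |r|)."""
    A = np.column_stack([x, np.ones_like(x)])
    w = np.ones_like(x)
    for _ in range(iters):
        # weighted normal equations: (A^T W A) p = A^T W y
        p = np.linalg.solve(A.T @ (A * w[:, None]), A.T @ (w * y))
        r = y - A @ p
        w = np.minimum(1.0, sigma / np.maximum(np.abs(r), 1e-12))
    return p
```

Compared with plain least squares, the first (unweighted) iteration is badly biased by the outliers, and the reweighting pulls the fit back to the inliers.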
Structure from motion: limitations

Very difficult to reliably estimate metric structure and motion unless:

- large (x or y) rotation, or
- large field of view and depth variation

Camera calibration is important for Euclidean reconstructions

Need a good feature tracker
Bibliography

- M.-A. Ameller, B. Triggs, and L. Quan. Camera pose revisited -- new linear algorithms. http://www.inrialpes.fr/movi/people/Triggs/home.html, 2000.
- M. Antone and S. Teller. Recovering relative camera rotations in urban scenes. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'2000), volume 2, pages 282-289, Hilton Head Island, June 2000.
- S. Becker and V. M. Bove. Semiautomatic 3-D model extraction from uncalibrated 2-D camera views. In SPIE Vol. 2410, Visual Data Exploration and Analysis II, pages 447-461, San Jose, CA, February 1995. Society of Photo-Optical Instrumentation Engineers.
- R. G. Bogart. View correlation. In J. Arvo, editor, Graphics Gems II, pages 181-190. Academic Press, Boston, 1991.
- D. C. Brown. Close-range camera calibration. Photogrammetric Engineering, 37(8):855-866, 1971.
- B. Caprile and V. Torre. Using vanishing points for camera calibration. International Journal of Computer Vision, 4(2):127-139, March 1990.
- R. T. Collins and R. S. Weiss. Vanishing point calculation as a statistical inference on the unit sphere. In Third International Conference on Computer Vision (ICCV'90), pages 400-403, Osaka, Japan, December 1990. IEEE Computer Society Press.
- A. Criminisi, I. Reid, and A. Zisserman. Single view metrology. In Seventh International Conference on Computer Vision (ICCV'99), pages 434-441, Kerkyra, Greece, September 1999.
- L. de Agapito, R. I. Hartley, and E. Hayman. Linear calibration of a rotating and zooming camera. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'99), volume 1, pages 15-21, Fort Collins, June 1999.
- D. I. DeMenthon and L. S. Davis. Model-based object pose in 25 lines of code. International Journal of Computer Vision, 15:123-141, June 1995.
- M. Gleicher and A. Witkin. Through-the-lens camera control. Computer Graphics (SIGGRAPH'92), 26(2):331-340, July 1992.
- R. I. Hartley. An algorithm for self calibration from several views. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'94), pages 908-912, Seattle, Washington, June 1994. IEEE Computer Society.
- R. I. Hartley. Self-calibration of stationary cameras. International Journal of Computer Vision, 22(1):5-23, 1997.
- R. I. Hartley, E. Hayman, L. de Agapito, and I. Reid. Camera calibration and the search for infinity. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'2000), volume 1, pages 510-517, Hilton Head Island, June 2000.
- R. I. Hartley and A. Zisserman. Multiple View Geometry. Cambridge University Press, 2000.
- B. K. P. Horn. Closed-form solution of absolute orientation using unit quaternions. Journal of the Optical Society of America A, 4(4):629-642, 1987.
- S. B. Kang and R. Weiss. Characterization of errors in compositing panoramic images. Computer Vision and Image Understanding, 73(2):269-280, February 1999.
- M. Pollefeys, R. Koch, and L. Van Gool. Self-calibration and metric reconstruction in spite of varying and unknown internal camera parameters. International Journal of Computer Vision, 32(1):7-25, 1999.
- L. Quan and Z. Lan. Linear N-point camera pose determination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(8):774-780, August 1999.
- G. Stein. Accurate internal camera calibration using rotation, with analysis of sources of error. In Fifth International Conference on Computer Vision (ICCV'95), pages 230-236, Cambridge, Massachusetts, June 1995.
- C. V. Stewart. Robust parameter estimation in computer vision. SIAM Review, 41(3):513-537, 1999.
- R. Szeliski and S. B. Kang. Recovering 3D shape and motion from image streams using nonlinear least squares. Journal of Visual Communication and Image Representation, 5(1):10-28, March 1994.
- R. Y. Tsai. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal of Robotics and Automation, RA-3(4):323-344, August 1987.
- Z. Zhang. Flexible camera calibration by viewing a plane from unknown orientations. In Seventh International Conference on Computer Vision (ICCV'99), pages 666-687, Kerkyra, Greece, September 1999.