1
Global Alignment and
Structure from Motion
  • Computer Vision
    CSE576, Spring 2005
    Richard Szeliski
2
Today’s lecture
  • Rotational alignment (“3D stitching”) [Project 3]
  • pairwise alignment (Procrustes)
  • global alignment (linearized least squares)


  • Calibration
  • camera matrix (Direct Linear Transform)
    • non-linear least squares
  • separating intrinsics and extrinsics
  • focal length and optic center
3
Today’s lecture
  • Structure from Motion
  • triangulation and pose
  • two-frame methods
  • factorization
  • bundle adjustment
  • robust statistics
4
Global rotational alignment
  • Fully Automated Panoramic Stitching

  • [Project 3]
5
AutoStitch [Brown & Lowe’03]

  • Stitch panoramic image from an arbitrary collection of photographs (known focal length)
  • Extract and (pairwise) match features
  • Estimate pairwise rotations using RANSAC
  • Add to stitch and re-run global alignment
  • Warp images to sphere and blend
6
3D Rotation Model
  • Projection equations
  • Project from image to 3D ray
  • (x0,y0,z0) = (u0-uc,v0-vc,f)
  • Rotate the ray by camera motion
  • (x1,y1,z1) = R01 (x0,y0,z0)
  • Project back into new (source) image
  • (u1,v1) = (f x1/z1 + uc, f y1/z1 + vc)
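The three projection steps above can be sketched in a few lines of NumPy. This is only an illustration under the slide's assumptions (known focal length f and optical center (uc, vc), pure rotation); the function name `warp_pixel` is made up for this example.

```python
import numpy as np

def warp_pixel(u0, v0, R01, f, uc, vc):
    """Map pixel (u0, v0) into the second image under a pure rotation R01."""
    # Project from image to 3D ray: (x0, y0, z0) = (u0 - uc, v0 - vc, f)
    ray0 = np.array([u0 - uc, v0 - vc, f], dtype=float)
    # Rotate the ray by the camera motion
    x1, y1, z1 = R01 @ ray0
    # Project back into the new image
    return f * x1 / z1 + uc, f * y1 / z1 + vc
```

With R01 equal to the identity the pixel maps to itself; a rotation by θ about the y axis moves the central pixel horizontally by f tan θ.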


7
Pairwise alignment
  • Absolute orientation [Arun et al., PAMI 1987] [Horn et al., JOSA A 1988], Procrustes Algorithm [Golub & VanLoan]


  • Given two sets of matching points, compute R
  • pi’ = R pi     with 3D rays
  • pi = N(xi,yi,zi) = N(ui-uc,vi-vc,f)
  • A = Σi pi pi’^T = Σi pi pi^T R^T = U S V^T = (U S U^T) R^T
  • V^T = U^T R^T
  • R = V U^T
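A minimal NumPy sketch of this SVD-based estimate follows. The function name `rotation_from_rays` is hypothetical, and the determinant check that rules out a reflection is an added safeguard that does not appear on the slide.

```python
import numpy as np

def rotation_from_rays(p, p_prime):
    """Least-squares rotation R with p_prime[i] ~= R @ p[i].

    p, p_prime: (n, 3) arrays of matching 3D rays.
    """
    A = p.T @ p_prime                 # A = sum_i p_i p_i'^T
    U, S, Vt = np.linalg.svd(A)       # A = U S V^T
    V = Vt.T
    # Assumption (not on the slide): flip the last axis if V U^T is a reflection.
    d = np.sign(np.linalg.det(V @ U.T))
    return V @ np.diag([1.0, 1.0, d]) @ U.T
```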
8
Pairwise alignment
  • RANSAC loop:
  • Select two feature pairs (at random)
    pi = N(ui-uc,vi-vc,f ),  pi’ = N(ui’-uc,vi’-vc,f ), i=0,1
  • Compute outer product matrix A = Σi pi pi’^T
  • Compute R using SVD, A = U S V^T,    R = V U^T
  • Compute inliers where  f |pi’ - R pi| < ε
  • Keep largest set of inliers
  • Re-compute least-squares SVD estimate on all of the inliers, i=0..n
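The loop above might look as follows in NumPy. This is a sketch, not the AutoStitch implementation: the names `fit_rotation` and `ransac_rotation`, the trial count, and the default threshold are assumptions chosen for illustration.

```python
import numpy as np

def fit_rotation(p, q):
    """SVD rotation estimate with q[i] ~= R @ p[i] (reflection guarded)."""
    U, _, Vt = np.linalg.svd(p.T @ q)
    V = Vt.T
    d = np.sign(np.linalg.det(V @ U.T))
    return V @ np.diag([1.0, 1.0, d]) @ U.T

def ransac_rotation(p, q, f, eps=2.0, n_trials=200, seed=0):
    """p, q: (n, 3) unit rays; f: focal length in pixels; eps: pixel threshold."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(p), dtype=bool)
    for _ in range(n_trials):
        # Select two feature pairs at random and fit R to them
        idx = rng.choice(len(p), size=2, replace=False)
        R = fit_rotation(p[idx], q[idx])
        # Inliers satisfy f |p_i' - R p_i| < eps
        inliers = f * np.linalg.norm(q - p @ R.T, axis=1) < eps
        if inliers.sum() > best.sum():
            best = inliers            # keep the largest inlier set
    # Re-compute the least-squares estimate on all inliers
    return fit_rotation(p[best], q[best]), best
```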
9
Automatic stitching
  • Match all pairs and keep the good ones
    (# inliers > threshold)
  • Sort pairs by strength (# inliers)
  • Add in next strongest match (and other relevant matches) to current stitch
  • Perform global alignment



10
Incremental selection & addition
  • [3]
  • [4] (3,4) (4,3)
  • [2] (2,4) (4,2)
  •      (2,3) (3,2)
  • [1] (1,2) (2,1)
  •      (1,4) (4,1) (1,3) (3,1)
  • [5] (5,3) (3,5)
  •       (4,5) (5,4)
11
Global alignment
  • Task:  Compute globally consistent set of rotations {Ri} such that
    Rj pij ≈ Rk pik  or  min |Rj pij - Rk pik|²


  • Initialize “first” frame Ri = I
  • Multiply “next” frame by pairwise rotation Rij
  • Globally update all of the current {Ri}


  • Q: How to parameterize and update the {Ri} ?
12
Parameterizing rotations
  • How do we parameterize R and ΔR?
    • Euler angles:  bad idea
    • quaternions: 4-vectors on unit sphere
    • use incremental rotation R(I + ΔR)




    • update with Rodrigues' formula
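For reference, a small NumPy sketch of the rotation-vector update. It assumes the rotation is encoded as a 3-vector ω whose direction is the axis and whose length is the angle; the helper names `skew` and `rodrigues` are chosen for this example.

```python
import numpy as np

def skew(w):
    """Cross-product matrix [w]x, so that skew(w) @ v == np.cross(w, v)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy],
                     [wz, 0.0, -wx],
                     [-wy, wx, 0.0]])

def rodrigues(w):
    """Rotation matrix for the rotation vector w (axis * angle)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)              # negligible rotation
    K = skew(np.asarray(w, dtype=float) / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
```

For small ω this reduces to I + [ω]×, the first-order form used when linearizing the alignment equations.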
13
Global alignment
  • Least-squares solution of
    min |Rj pij - Rk pik|²  or  Rj pij - Rk pik = 0
  • Use the linearized update
    (I + [ωj]×) Rj pij - (I + [ωk]×) Rk pik = 0
  • or
  • [qij]× ωj - [qik]× ωk = qij - qik,  qij = Rj pij
  • Estimate least square solution over {ωi}
  • Iterate a few times (updating the {Ri})
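One iteration of this linearized least-squares update could be sketched as below. Assumptions made for the example: matches are given as tuples (j, k, p_ij, p_ik), frame 0 is held fixed to remove the global rotation ambiguity, and each frame is updated as Rj ← R(ωj) Rj. The function names are hypothetical.

```python
import numpy as np

def skew(w):
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]])

def rodrigues(w):
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = skew(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def global_align_step(rotations, matches):
    """One linearized update of all rotations except frame 0."""
    n = len(rotations)
    rows, rhs = [], []
    for j, k, p_j, p_k in matches:
        q_j, q_k = rotations[j] @ p_j, rotations[k] @ p_k
        block = np.zeros((3, 3 * n))
        block[:, 3 * j:3 * j + 3] = skew(q_j)     # [q_ij]x w_j
        block[:, 3 * k:3 * k + 3] = -skew(q_k)    # -[q_ik]x w_k
        rows.append(block)
        rhs.append(q_j - q_k)                     # = q_ij - q_ik
    A = np.vstack(rows)[:, 3:]                    # drop frame 0 (held fixed)
    b = np.concatenate(rhs)
    w = np.linalg.lstsq(A, b, rcond=None)[0].reshape(n - 1, 3)
    return [rotations[0]] + [rodrigues(w[i - 1]) @ rotations[i]
                             for i in range(1, n)]
```

Calling `global_align_step` a few times in a row corresponds to the "iterate a few times" bullet.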
14
Iterative focal length adjustment
  • (Optional) [Szeliski & Shum’97; MSR-TR-03]


  • Simplest approach:
  • arg min_f  f |Rj pij - Rk pik|²


  • More complex approach:
  • full bundle adjustment (op. cit. & later in talk)
15
Camera Calibration
16
Camera calibration
  • Determine camera parameters from known 3D points or calibration object(s)
  • internal or intrinsic parameters such as focal length, optical center, aspect ratio:
    what kind of camera?
  • external or extrinsic (pose)
    parameters:
    where is the camera?
  • How can we do this?
17
Camera calibration – approaches
  • Possible approaches:
  • linear regression (least squares)
  • non-linear optimization
  • vanishing points
  • multiple planar patterns
  • panoramas (rotational motion)
18
Image formation equations


19
Calibration matrix
  • Is this form of K good enough?
  • non-square pixels (digital video)
  • skew
  • radial distortion


20
Camera matrix
  • Fold intrinsic calibration matrix K and extrinsic pose parameters (R,t) together into a
    camera matrix
  • M = K [R | t ]





  • (put 1 in lower r.h. corner for 11 d.o.f.)
21
Camera matrix calibration
  • Directly estimate 11 unknowns in the M matrix using known 3D points (Xi,Yi,Zi) and measured feature positions (ui,vi)
22
Camera matrix calibration
  • Linear regression:
    • Bring denominator over, solve set of (over-determined) linear equations.  How?
    • Least squares (pseudo-inverse)
    • Is this good enough?
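A sketch of this linear regression in NumPy, assuming the normalization m34 = 1 (the "1 in the lower right-hand corner" convention) and at least six non-coplanar points. The function name `calibrate_camera_matrix` is invented for this example.

```python
import numpy as np

def calibrate_camera_matrix(X, uv):
    """Estimate the 3x4 camera matrix M (with M[2, 3] = 1).

    X:  (n, 3) known 3D points (Xi, Yi, Zi)
    uv: (n, 2) measured feature positions (ui, vi)
    """
    rows, rhs = [], []
    for (Xi, Yi, Zi), (u, v) in zip(X, uv):
        # u * (m31 X + m32 Y + m33 Z + 1) = m11 X + m12 Y + m13 Z + m14
        rows.append([Xi, Yi, Zi, 1, 0, 0, 0, 0, -u * Xi, -u * Yi, -u * Zi])
        # v * (m31 X + m32 Y + m33 Z + 1) = m21 X + m22 Y + m23 Z + m24
        rows.append([0, 0, 0, 0, Xi, Yi, Zi, 1, -v * Xi, -v * Yi, -v * Zi])
        rhs += [u, v]
    # Over-determined linear system solved in the least-squares sense
    m, *_ = np.linalg.lstsq(np.array(rows, dtype=float),
                            np.array(rhs, dtype=float), rcond=None)
    return np.append(m, 1.0).reshape(3, 4)
```

Note that this minimizes an algebraic error rather than the image-plane error, which is one reason the linear answer is usually only a starting point.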
23
Levenberg-Marquardt
  • Iterative non-linear least squares [Press’92]
    • Linearize measurement equations
    • Substitute into log-likelihood equation:  quadratic cost function in Δm
24
Levenberg-Marquardt
  • Iterative non-linear least squares [Press’92]
    • Solve for minimum


      Hessian:

      error:


25
Levenberg-Marquardt
  • What if it doesn’t converge?
    • Multiply diagonal by (1 + λ), increase λ until it does
    • Halve the step size Δm (my favorite)
    • Use line search
    • Other ideas?
  • Uncertainty analysis:  covariance Σ = A^-1
  • Is maximum likelihood the best idea?
  • How to start in vicinity of global minimum?
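A compact sketch of the damped iteration, using the "multiply the diagonal by (1 + λ)" strategy from the list above. It assumes the Gauss-Newton approximation A = JᵀJ for the Hessian and user-supplied `residual` and `jacobian` callables; the function name, the factor of 10, and the iteration count are illustrative choices rather than anything prescribed by the slides.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, m0, n_iters=50, lam=1e-3):
    """Minimize sum(residual(m)**2) over the parameter vector m."""
    m = np.asarray(m0, dtype=float)
    r = residual(m)
    cost = r @ r
    for _ in range(n_iters):
        J = jacobian(m)
        A = J.T @ J                                # approximate Hessian
        g = -J.T @ r                               # right-hand side
        A_damped = A + lam * np.diag(np.diag(A))   # diagonal times (1 + lam)
        dm = np.linalg.solve(A_damped, g)
        r_new = residual(m + dm)
        cost_new = r_new @ r_new
        if cost_new < cost:                        # accept step, relax damping
            m, r, cost = m + dm, r_new, cost_new
            lam = max(lam / 10.0, 1e-12)
        else:                                      # reject step, damp harder
            lam *= 10.0
    return m
```

For example, fitting y = a exp(b x) uses the residual `m[0] * np.exp(m[1] * x) - y` and a two-column Jacobian holding the derivatives with respect to a and b.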
26
Camera matrix calibration
  • Advantages:
    • very simple to formulate and solve
    • can recover K [R | t] from M using QR decomposition [Golub & VanLoan 96]
  • Disadvantages:
    • doesn't directly compute internal parameters
    • more unknowns than true degrees of freedom
    • need a separate camera matrix for each new view
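The recovery of K [R | t] from M mentioned under the advantages can be sketched with NumPy alone. This example builds the needed triangular-times-orthogonal split of the left 3x3 block from `np.linalg.qr` applied to a row-reversed, transposed copy; the function name and the sign conventions (positive diagonal of K, det R = +1) are assumptions made here.

```python
import numpy as np

def decompose_camera_matrix(M):
    """Split a 3x4 camera matrix M ~ K [R | t] into K, R, t."""
    P = np.fliplr(np.eye(3))                  # reverses row/column order
    Q_, R_ = np.linalg.qr((P @ M[:, :3]).T)   # QR of the flipped transpose
    K = P @ R_.T @ P                          # upper triangular factor
    R = P @ Q_.T                              # orthogonal factor
    S = np.diag(np.sign(np.diag(K)))          # make the diagonal of K positive
    K, R = K @ S, S @ R
    t = np.linalg.solve(K, M[:, 3])
    if np.linalg.det(R) < 0:                  # M is only defined up to sign
        R, t = -R, -t
    return K / K[2, 2], R, t
```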
27
Separate intrinsics / extrinsics
  • New feature measurement equations



  • Use non-linear minimization
  • Standard technique in photogrammetry, computer vision, computer graphics
    • [Tsai 87] – also estimates k1 (freeware @ CMU)
      http://www.cs.cmu.edu/afs/cs/project/cil/ftp/html/v-source.html
    • [Bogart 91] – View Correlation
28
Intrinsic/extrinsic calibration
  • Advantages:
    • can solve for more than one camera pose at a time
    • potentially fewer degrees of freedom
  • Disadvantages:
    • more complex update rules
    • need a good initialization (recover K [R | t] from M)
29
Vanishing Points
  • Determine focal length f and optical center (uc,vc) from image of cube’s
    (or building’s)
    vanishing points
    [Caprile ’90][Antone & Teller ’00]
30
Vanishing point calibration
  • Advantages:
    • only need to see vanishing points
      (e.g., architecture, table, …)
  • Disadvantages:
    • not that accurate
    • need rectahedral object(s) in scene
31
Multi-plane calibration
  • Use several images of planar target held at unknown orientations [Zhang 99]
    • Compute plane homographies



    • Solve for K^-T K^-1 from Hk’s
      • 1 plane if only f unknown
      • 2 planes if (f,uc,vc) unknown
      • 3+ planes for full K
    • Code available from Zhang and OpenCV
32
Rotational motion
  • Use pure rotation (large scene) to estimate f
    • estimate f from pairwise homographies
    • re-estimate f from 360° “gap”
    • optimize over all {K,Rj} parameters
      [Stein 95; Hartley ’97; Shum & Szeliski ’00; Kang & Weiss ’99]



  • Most accurate way to get f, short of surveying distant points
33
Pose estimation and triangulation
34
Pose estimation
  • Once the internal camera parameters are known, can compute camera pose




    • [Tsai87] [Bogart91]
  • Application: superimpose 3D graphics onto video


  • How do we initialize (R,t)?
35
Pose estimation
  • Previous initialization techniques:
    • vanishing points [Caprile 90]
    • planar pattern [Zhang 99]
  • Other possibilities
    • Through-the-Lens Camera Control [Gleicher92]: differential update
    • 3+ point “linear methods”:
    • [DeMenthon 95][Quan 99][Ameller 00]
36
Triangulation

  • Problem:  Given some points in correspondence across two or more images (taken from calibrated cameras), {(uj,vj)}, compute the 3D location X
37
Triangulation
  • Method I: intersect viewing rays in 3D, minimize:
    • X is the unknown 3D point
    • Cj is the optical center of camera j
    • Vj is the viewing ray for pixel (uj,vj)
    • sj is unknown distance along Vj
  • Advantage: geometrically intuitive
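One way to realize Method I, assuming the cost is the summed squared distance Σj |Cj + sj Vj − X|² between X and each viewing ray (the slide's own formula was not preserved in this outline). Eliminating each sj in closed form leaves a 3x3 linear system in X. The function name is hypothetical.

```python
import numpy as np

def triangulate_rays(centers, directions):
    """Point X closest (in least squares) to all rays C_j + s_j V_j."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for C, V in zip(centers, directions):
        v = np.asarray(V, dtype=float)
        v = v / np.linalg.norm(v)
        P = np.eye(3) - np.outer(v, v)   # projects onto the plane normal to V_j
        A += P
        b += P @ np.asarray(C, dtype=float)
    # Normal equations: sum_j P_j X = sum_j P_j C_j
    return np.linalg.solve(A, b)
```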
38
Triangulation
  • Method II: solve linear equations in X
    • advantage: very simple


  • Method III: non-linear minimization
    • advantage: most accurate (image plane error)
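A common form of the linear approach in Method II is sketched below, assuming each camera is described by a known 3x4 camera matrix Mj. Each measurement (uj, vj) contributes two homogeneous linear equations in X, which are solved with an SVD. The function name and this particular formulation are assumptions; the slide's equations did not survive in this outline.

```python
import numpy as np

def triangulate_linear(Ms, uvs):
    """Linear triangulation from camera matrices Ms and pixels uvs."""
    rows = []
    for M, (u, v) in zip(Ms, uvs):
        rows.append(u * M[2] - M[0])     # u * (row 3 . X) - row 1 . X = 0
        rows.append(v * M[2] - M[1])     # v * (row 3 . X) - row 2 . X = 0
    _, _, Vt = np.linalg.svd(np.array(rows))
    Xh = Vt[-1]                          # homogeneous solution (null vector)
    return Xh[:3] / Xh[3]
```

The result minimizes an algebraic error, so it is often used to initialize the non-linear minimization of Method III.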
39
Structure from Motion
40
Structure from motion
  • Given many points in correspondence across several images, {(uij,vij)}, simultaneously compute the 3D location xi and camera (or motion) parameters (K, Rj, tj)




  • Two main variants: calibrated, and uncalibrated (sometimes associated with Euclidean and projective reconstructions)
41
Structure from motion
  • How many points do we need to match?
  • 2 frames:
    • (R,t): 5 dof + 3n point locations ≤
    • 4n point measurements ⇒
    • n ≥ 5
  • k frames:
    • 6(k–1) – 1 + 3n ≤ 2kn
  • always want to use many more
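Solving the k-frame inequality for n makes the counting argument concrete. This is a worked rearrangement of the bound above, valid for k ≥ 2:

```latex
6(k-1) - 1 + 3n \le 2kn
\;\Longrightarrow\; n\,(2k - 3) \ge 6k - 7
\;\Longrightarrow\; n \ge \frac{6k - 7}{2k - 3}
% k = 2: n \ge 5 (the two-frame case above)
% k = 3: n \ge 11/3, so n \ge 4
% k \to \infty: the bound approaches 3
```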
42
Two-frame methods
  • Two main variants:
  • Calibrated: “Essential matrix” E
      use ray directions (xi, xi’ )
  • Uncalibrated: “Fundamental matrix” F


  • [Hartley & Zisserman 2000]
43
Essential matrix
  • Co-planarity constraint:
  •   x’ ≈ R x + t
  • [t]× x’ ≈ [t]× R x
  •   x’^T [t]× x’ ≈ x’^T [t]× R x
  •       x’^T E x = 0  with E = [t]× R
  • Solve for E using least squares (SVD)
  • t is the least singular vector of E
  • R obtained from the other two s.v.s
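A sketch of the linear estimate, assuming at least eight correspondences given as ray directions in general position. Each pair contributes one row of a homogeneous system in the nine entries of E. The translation direction is then read off as the left singular vector of E with the smallest singular value, up to sign. The function name is hypothetical, and recovering R (which requires resolving a four-fold ambiguity) is not shown.

```python
import numpy as np

def estimate_essential(x, x_prime):
    """Estimate E (up to scale) from rays satisfying x'^T E x = 0.

    x, x_prime: (n, 3) arrays of matching ray directions, n >= 8.
    Returns E and the unit translation direction t (sign ambiguous).
    """
    # Row for one match: coefficients of E (row-major) in x'^T E x
    A = np.array([np.kron(xp, xi) for xi, xp in zip(x, x_prime)])
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)          # least-squares null vector of A
    U, _, _ = np.linalg.svd(E)
    t = U[:, 2]                       # t^T E = 0 since E = [t]x R
    return E, t
```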
44
Fundamental matrix
  • Camera calibrations are unknown
  • x’^T F x = 0 with F = [e]× H = K’^-T [t]× R K^-1
  • Solve for F using least squares (SVD)
    • re-scale (xi, xi’ ) so that |xi|≈1/2  [Hartley]
  • e (epipole) is still the least singular vector of F
  • H obtained from the other two s.v.s
  • “plane + parallax” (projective) reconstruction
  • use self-calibration to determine K [Pollefeys]


45
Multi-frame Structure from Motion
46
Factorization
  • [Tomasi & Kanade, IJCV 92]
47
Structure [from] Motion
  • Given a set of feature tracks,
    estimate the 3D structure and 3D (camera) motion.
  • Assumption: orthographic projection


  • Tracks:  (ufp,vfp), f: frame, p: point
  • Subtract out mean 2D position…
  • ufp = if^T sp    (if: rotation,  sp: position)
  • vfp = jf^T sp
48
Measurement equations
  • Measurement equations
  • ufp = if^T sp    (if: rotation,  sp: position)
  • vfp = jf^T sp
  • Stack them up…
  • W = R S
  • R = (i1,…,iF, j1,…,jF)^T
  • S = (s1,…,sP)


49
Factorization
  • W = R_{2F×3} S_{3×P}
  • SVD
  • W = U Λ V,  Λ must be rank 3
  • W’ = (U Λ^1/2)(Λ^1/2 V) = U’ V’
  • Make R orthogonal
  • R = U’ Q,  S = Q^-1 V’
  • ûf^T Q Q^T ûf = 1 …  (ûf^T: a row of U’)
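The factorization steps can be sketched as follows. Assumptions in this example: W stacks all u rows above all v rows (shape 2F x P), the corrective matrix Q multiplies U’ on the right so that the shapes agree, and Q is obtained by solving linearly for the symmetric matrix L = Q Qᵀ from the unit-norm and orthogonality constraints, followed by a Cholesky factorization (which requires L to come out positive definite). The function name is hypothetical.

```python
import numpy as np

def factorize(W):
    """Orthographic factorization W ~= R S with metric constraints on R."""
    W = W - W.mean(axis=1, keepdims=True)        # subtract mean 2D position
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U3 = U[:, :3] * np.sqrt(s[:3])               # U' = U sqrt(Lambda), rank 3
    V3 = np.sqrt(s[:3])[:, None] * Vt[:3]        # V' = sqrt(Lambda) V
    F = W.shape[0] // 2

    def coeffs(a, b):
        # Coefficients of the 6 unique entries of symmetric L in a^T L b
        return [a[0] * b[0], a[0] * b[1] + a[1] * b[0],
                a[0] * b[2] + a[2] * b[0], a[1] * b[1],
                a[1] * b[2] + a[2] * b[1], a[2] * b[2]]

    A, rhs = [], []
    for f in range(F):
        i, j = U3[f], U3[F + f]
        A += [coeffs(i, i), coeffs(j, j), coeffs(i, j)]
        rhs += [1.0, 1.0, 0.0]                   # |i_f| = |j_f| = 1, i_f . j_f = 0
    l = np.linalg.lstsq(np.array(A), np.array(rhs), rcond=None)[0]
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    Q = np.linalg.cholesky(L)                    # L = Q Q^T
    return U3 @ Q, np.linalg.solve(Q, V3)        # R = U' Q, S = Q^-1 V'
```

The recovered R and S are still only defined up to a common 3D rotation (or reflection), which is usually fixed by aligning with the first frame.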
50
Results
  • Look at paper figures…
51
Extensions
  • Paraperspective
  •  [Poelman & Kanade, PAMI 97]
  • Sequential Factorization
  •  [Morita & Kanade, PAMI 97]
  • Factorization under perspective
  •  [Christy & Horaud, PAMI 96]
  •  [Sturm & Triggs, ECCV 96]
  • Factorization with Uncertainty
  •  [Anandan & Irani, IJCV 2002]
52
Bundle Adjustment
  • What makes this non-linear minimization hard?
    • many more parameters: potentially slow
    • poorer conditioning (high correlation)
    • potentially lots of outliers
    • gauge (coordinate) freedom
53
Lots of parameters: sparsity
  • Only a few entries in Jacobian are non-zero
54
Sparse Cholesky (skyline)
  • First used in finite element analysis
  • Applied to SfM by [Szeliski & Kang 1994]






       [Figure labels: structure | motion, fill-in]
55
Conditioning and gauge freedom
  • Poor conditioning:
    • use 2nd order method
    • use Cholesky decomposition





  • Gauge freedom
    • fix certain parameters (orientation)  or
    • zero out last few rows in Cholesky decomposition
56
Robust error models
  • Outlier rejection
    • use robust penalty applied
      to each set of joint
      measurements
    • for extremely bad data, use random sampling [RANSAC, Fischler & Bolles, CACM’81]
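To illustrate the effect of a robust penalty, here is a small sketch on a toy problem (fitting a line) rather than on bundle adjustment itself. It uses iteratively reweighted least squares with Huber weights; the choice of the Huber penalty, the threshold `delta`, and the function name are assumptions made for this example, not something the slides specify.

```python
import numpy as np

def robust_line_fit(x, y, delta=1.0, n_iters=20):
    """Fit y ~= m[0] * x + m[1] with a Huber penalty via reweighting."""
    A = np.column_stack([x, np.ones_like(x)])
    w = np.ones_like(y)
    for _ in range(n_iters):
        sw = np.sqrt(w)
        # Weighted least squares with the current weights
        m, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        r = np.abs(A @ m - y)
        # Huber weights: 1 inside the threshold, delta / |r| outside
        w = np.where(r <= delta, 1.0, delta / np.maximum(r, 1e-12))
    return m
```

Large residuals are down-weighted instead of being squared, so a few gross outliers pull the fit far less than in plain least squares. For extremely bad data the random-sampling approach is still the safer choice.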
57
Structure from motion: limitations
  • Very difficult to reliably estimate metric
    structure and motion unless:
    • large (x or y) rotation or
    • large field of view and depth variation
  • Camera calibration important for Euclidean reconstructions
  • Need good feature tracker
58
Bibliography
  • M.-A. Ameller, B. Triggs, and L. Quan.
  • Camera pose revisited -- new linear algorithms.
  • http://www.inrialpes.fr/movi/people/Triggs/home.html, 2000.


  • M. Antone and S. Teller.
  • Recovering relative camera rotations in urban scenes.
  • In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'2000), volume 2, pages 282--289, Hilton Head Island, June 2000.


  • S. Becker and V. M. Bove.
  • Semiautomatic 3-D model extraction from uncalibrated 2-D camera views.
  • In SPIE Vol. 2410, Visual Data Exploration and Analysis II, pages 447--461, San Jose, CA, February 1995. Society of Photo-Optical Instrumentation Engineers.


  • R. G. Bogart.
  • View correlation.
  • In J. Arvo, editor, Graphics Gems II, pages 181--190. Academic Press, Boston, 1991.
59
Bibliography
  • D. C. Brown.
  • Close-range camera calibration.
  • Photogrammetric Engineering, 37(8):855--866, 1971.


  • B. Caprile and V. Torre.
  • Using vanishing points for camera calibration.
  • International Journal of Computer Vision, 4(2):127--139, March 1990.


  • R. T. Collins and R. S. Weiss.
  • Vanishing point calculation as a statistical inference on the unit sphere.
  • In Third International Conference on Computer Vision (ICCV'90), pages 400--403, Osaka, Japan, December 1990. IEEE Computer Society Press.


  • A. Criminisi, I. Reid, and A. Zisserman.
  • Single view metrology.
  • In Seventh International Conference on Computer Vision (ICCV'99), pages 434--441, Kerkyra, Greece, September 1999.
60
Bibliography
  • L. de Agapito, R. I. Hartley, and E. Hayman.
  • Linear calibration of a rotating and zooming camera.
  • In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'99), volume 1, pages 15--21, Fort Collins, June 1999.


  • D. I. DeMenthon and L. S. Davis.
  • Model-based object pose in 25 lines of code.
  • International Journal of Computer Vision, 15:123--141, June 1995.


  • M. Gleicher and A. Witkin.
  • Through-the-lens camera control.
  • Computer Graphics (SIGGRAPH'92), 26(2):331--340, July 1992.


  • R. I. Hartley.
  • An algorithm for self calibration from several views.
  • In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'94), pages 908--912, Seattle, Washington, June 1994. IEEE Computer Society.
61
Bibliography
  • R. I. Hartley.
  • Self-calibration of stationary cameras.
  • International Journal of Computer Vision, 22(1):5--23, 1997.


  • R. I. Hartley, E. Hayman, L. de Agapito, and I. Reid.
  • Camera calibration and the search for infinity.
  • In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'2000), volume 1, pages 510--517, Hilton Head Island, June 2000.


  • R. I. Hartley and A. Zisserman.
  • Multiple View Geometry.
  • Cambridge University Press, 2000.


  • B. K. P. Horn.
  • Closed-form solution of absolute orientation using unit quaternions.
  • Journal of the Optical Society of America A, 4(4):629--642, 1987.
62
Bibliography
  • S. B. Kang and R. Weiss.
  • Characterization of errors in compositing panoramic images.
  • Computer Vision and Image Understanding, 73(2):269--280, February 1999.


  • M. Pollefeys, R. Koch, and L. Van Gool.
  • Self-calibration and metric reconstruction in spite of varying and unknown internal camera parameters.
  • International Journal of Computer Vision, 32(1):7--25, 1999.


  • L. Quan and Z. Lan.
  • Linear N-point camera pose determination.
  • IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(8):774--780, August 1999.


  • G. Stein.
  • Accurate internal camera calibration using rotation, with analysis of sources of error.
  • In Fifth International Conference on Computer Vision (ICCV'95), pages 230--236, Cambridge, Massachusetts, June 1995.
63
Bibliography
  • C. V. Stewart.
  • Robust parameter estimation in computer vision.
  • SIAM Review, 41(3):513--537, 1999.


  • R. Szeliski and S. B. Kang.
  • Recovering 3D shape and motion from image streams using nonlinear least squares.
  • Journal of Visual Communication and Image Representation, 5(1):10--28, March 1994.


  • R. Y. Tsai.
  • A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses.
  • IEEE Journal of Robotics and Automation, RA-3(4):323--344, August 1987.


  • Z. Zhang.
  • Flexible camera calibration by viewing a plane from unknown orientations.
  • In Seventh International Conference on Computer Vision (ICCV'99), pages 666--687, Kerkyra, Greece, September 1999.