1
Motion estimation
  • Computer Vision
    CSE576, Spring 2005
    Richard Szeliski
2
Why estimate visual motion?
  • Visual Motion can be annoying
    • Camera instabilities, jitter
    • Measure it; remove it (stabilize)
  • Visual Motion indicates dynamics in the scene
    • Moving objects, behavior
    • Track objects and analyze trajectories
  • Visual Motion reveals spatial layout
    • Motion parallax
3
Today’s lecture
  • Motion estimation
  • image warping (skip: see handout)
  • patch-based motion (optic flow)
  • parametric (global) motion
  • application: image morphing
  • advanced: layered motion models
4
Readings
  • Bergen et al.  Hierarchical model-based motion estimation. ECCV’92,  pp. 237–252.
  • Szeliski, R.  Image Alignment and Stitching:  A Tutorial, MSR-TR-2004-92, Sec. 3.4 & 3.5.
  • Shi, J. and Tomasi, C. (1994). Good features to track. In CVPR’94, pp. 593–600.
  • Baker, S. and Matthews, I. (2004). Lucas-Kanade 20 years on: A unifying framework.  IJCV, 56(3), 221–255.
5
Patch-based motion estimation
6
Classes of Techniques
  • Feature-based methods
    • Extract visual features (corners, textured areas) and track them over multiple frames
    • Sparse motion fields, but possibly robust tracking
    • Especially suitable when image motion is large (tens of pixels)

  • Direct methods
    • Directly recover image motion from spatio-temporal image brightness variations
    • Global motion parameters directly recovered without an intermediate feature motion calculation
    • Dense motion fields, but more sensitive to appearance variations
    • Suitable for video and when image motion is small (< 10 pixels)
7
Patch matching (revisited)
  • How do we determine correspondences?
    • block matching or SSD (sum of squared differences)
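The SSD search can be sketched in a few lines (a minimal illustration; the function and parameter names are mine, not from the lecture):

```python
import numpy as np

def ssd_match(patch, image, cx, cy, radius):
    """Exhaustive block matching: slide `patch` over `image` within
    +/- radius of the nominal top-left corner (cx, cy) and return the
    displacement (du, dv) with the smallest sum of squared differences."""
    h, w = patch.shape
    best, best_uv = np.inf, (0, 0)
    for dv in range(-radius, radius + 1):
        for du in range(-radius, radius + 1):
            y, x = cy + dv, cx + du
            if y < 0 or x < 0 or y + h > image.shape[0] or x + w > image.shape[1]:
                continue  # candidate window falls off the image
            window = image[y:y + h, x:x + w].astype(float)
            ssd = np.sum((window - patch) ** 2)
            if ssd < best:
                best, best_uv = ssd, (du, dv)
    return best_uv
```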
8
The Brightness Constraint
  • Brightness Constancy Equation:
9
The Brightness Constraint
  • Brightness Constancy Equation:
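The equation on this slide was an image that did not survive conversion; the standard brightness constancy equation it showed is:

```latex
I(x + u,\; y + v,\; t + 1) = I(x, y, t)
```

i.e., a pixel keeps its brightness as it moves by (u, v) between frames.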
10
Gradient Constraint (or the Optical Flow Constraint)
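The constraint shown here follows from a first-order Taylor expansion of brightness constancy:

```latex
I_x u + I_y v + I_t \approx 0
\quad\Longleftrightarrow\quad
\nabla I \cdot \mathbf{u} + I_t \approx 0
```

One equation per pixel in two unknowns, hence the aperture problem discussed a few slides on.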
11
Patch Translation [Lucas-Kanade]
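Lucas-Kanade minimizes the summed squared optical-flow constraint over a patch, which gives a 2x2 linear system. A minimal single-step sketch (no windowing or pyramid; names are illustrative):

```python
import numpy as np

def lucas_kanade_step(I0, I1):
    """One Lucas-Kanade iteration for a single patch under pure
    translation: build and solve the 2x2 normal equations from the
    constraint Ix*u + Iy*v + It = 0 summed over the patch."""
    Iy, Ix = np.gradient(I0.astype(float))   # np.gradient returns d/dy first
    It = I1.astype(float) - I0.astype(float)
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(A, b)  # estimated (u, v)
```

When A is (near-)singular, the patch suffers from the aperture problem, which is exactly the certainty question raised on the next slide.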
12
Local Patch Analysis
  • How certain are the motion estimates?
13
The Aperture Problem
14
SSD Surface – Textured area
15
SSD Surface – Edge
16
SSD Surface – Homogeneous area
17
Iterative Refinement
  • Estimate velocity at each pixel using one iteration of Lucas and Kanade estimation
  • Warp one image toward the other using the estimated flow field
    • (easier said than done)
  • Refine estimate by repeating the process
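These steps can be sketched for the toy case of a single global translation. The bilinear warp and all names here are illustrative, not from the lecture; note that, as the slides suggest, derivatives are taken on one image while the other is warped:

```python
import numpy as np

def shift_bilinear(img, u, v):
    """Resample img so its pattern moves by (u, v), using bilinear
    interpolation with clamped borders (the 'warp' step)."""
    h, w = img.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    xs = np.clip(x - u, 0, w - 1)
    ys = np.clip(y - v, 0, h - 1)
    x0, y0 = np.floor(xs).astype(int), np.floor(ys).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    fx, fy = xs - x0, ys - y0
    top = img[y0, x0] * (1 - fx) + img[y0, x1] * fx
    bot = img[y1, x0] * (1 - fx) + img[y1, x1] * fx
    return top * (1 - fy) + bot * fy

def iterative_translation(I0, I1, n_iters=8):
    """Iterative refinement: warp I1 back by the current estimate,
    take one Lucas-Kanade step on the residual, accumulate."""
    u = v = 0.0
    Iy, Ix = np.gradient(I0)                 # gradients of the fixed image
    for _ in range(n_iters):
        I1w = shift_bilinear(I1, -u, -v)     # undo the motion estimated so far
        It = I1w - I0
        A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                      [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
        b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
        du, dv = np.linalg.solve(A, b)
        u, v = u + du, v + dv
    return u, v
```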
18
Optical Flow: Iterative Estimation
19
Optical Flow: Iterative Estimation
20
Optical Flow: Iterative Estimation
21
Optical Flow: Iterative Estimation
22
Optical Flow: Iterative Estimation
  • Some Implementation Issues:
    • Warping is not easy (ensure that errors in warping are smaller than the estimate refinement)
    • Warp one image, take derivatives of the other so you don’t need to re-compute the gradient after each iteration.
    • Often useful to low-pass filter the images before motion estimation (for better derivative estimation, and linear approximations to image intensity)
23
Optical Flow: Aliasing
26
Parametric motion estimation
27
Global (parametric) motion models
  • 2D Models:
    • Affine
    • Quadratic
    • Planar projective transform (Homography)

  • 3D Models:
    • Instantaneous camera motion models
    • Homography + epipole
    • Plane + parallax


28
Motion models
29
Example:  Affine Motion
  • Substituting into the B.C. Equation:
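In symbols (using the usual six-parameter convention, which the slide's lost equation presumably followed):

```latex
% affine motion model
u(x,y) = a_1 + a_2 x + a_3 y, \qquad v(x,y) = a_4 + a_5 x + a_6 y
% substituted into the brightness constancy (optical flow) constraint:
I_x (a_1 + a_2 x + a_3 y) + I_y (a_4 + a_5 x + a_6 y) + I_t = 0
```

Each pixel contributes one linear equation in the six unknowns, so the global motion is recovered by least squares over the whole image.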
30
Other 2D Motion Models
31
3D Motion Models
32
Patch matching (revisited)
  • How do we determine correspondences?
    • block matching or SSD (sum of squared differences)
33
Correlation and SSD
  • For larger displacements, do template matching
    • Define a small area around a pixel as the template
    • Match the template against each pixel within a search area in the next image
    • Use a match measure such as correlation, normalized correlation, or sum of squared differences
    • Choose the maximum (or minimum) as the match
    • Refine to a sub-pixel estimate (Lucas-Kanade)
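Of the match measures above, normalized correlation is the one that is invariant to gain and offset changes between frames. A minimal sketch (function name is mine):

```python
import numpy as np

def ncc(patch, window):
    """Normalized cross-correlation between two equal-sized arrays:
    +1 for a perfect match up to brightness gain/offset, near 0 when
    the patterns are unrelated."""
    p = patch - patch.mean()
    w = window - window.mean()
    denom = np.sqrt((p * p).sum() * (w * w).sum())
    return float((p * w).sum() / denom) if denom > 0 else 0.0
```

Unlike SSD, you choose the *maximum* of this score over the search area.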
34
Discrete Search vs. Gradient Based
35
Shi-Tomasi feature tracker
  • Find good features (min eigenvalue of 2×2 Hessian)
  • Use Lucas-Kanade to track with pure translation
  • Use affine registration with first feature patch
  • Terminate tracks whose dissimilarity gets too large
  • Start new tracks when needed
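The "good feature" test can be sketched directly: the smaller eigenvalue of the 2×2 gradient matrix is large only where the patch is textured in both directions, zero on edges and flat regions (this is also why the SSD surfaces a few slides back behave as they do). Names here are illustrative:

```python
import numpy as np

def min_eig_score(patch):
    """Shi-Tomasi cornerness: the smaller eigenvalue of the 2x2
    gradient matrix [[sum Ix^2, sum IxIy], [sum IxIy, sum Iy^2]],
    via the closed form for a symmetric 2x2 matrix [[a, b], [b, c]]."""
    gy, gx = np.gradient(patch.astype(float))
    a, b, c = (gx * gx).sum(), (gx * gy).sum(), (gy * gy).sum()
    return 0.5 * ((a + c) - np.sqrt((a - c) ** 2 + 4.0 * b ** 2))
```

A flat patch scores 0 (no gradients), a pure edge also scores 0 (aperture problem), and only a textured patch scores well.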
36
Tracking results
37
Tracking - dissimilarity
38
Tracking results
39
Correlation Window Size
  • Small windows lead to more false matches
  • Large windows are better this way, but…
    • Neighboring flow vectors will be more correlated (since the template windows have more in common)
    • Flow resolution also lower (same reason)
    • More expensive to compute


  • Small windows are good for local search:
    more detailed and less smooth (noisy?)
  • Large windows good for global search:
    less detailed and smoother
40
Robust Estimation
  • Noise distributions are often non-Gaussian, having much heavier tails.  Noise samples from the tails are called outliers.
  • Sources of outliers (multiple motions):
    • specularities / highlights
    • jpeg artifacts / interlacing / motion blur
    • multiple motions (occlusion boundaries, transparency)
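Robust estimation replaces the quadratic penalty of least squares with a function that grows slowly for large residuals, so outliers stop dominating the fit. For example, with the Lorentzian used by Black & Anandan (cited in the bibliography; notation assumed):

```latex
E(u, v) = \sum_{i} \rho\!\left( I_{x,i}\, u + I_{y,i}\, v + I_{t,i} \right),
\qquad
\rho(x) = \log\!\left( 1 + \tfrac{1}{2} (x / \sigma)^{2} \right)
```

Setting ρ(x) = x² recovers ordinary Lucas-Kanade least squares.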


41
Robust Estimation
42
Robust Estimation
43
Robust Estimation
44
Image Morphing
45
Image Warping – non-parametric
  • Specify more detailed warp function




  • Examples:
    • splines
    • triangles
    • optical flow (per-pixel motion)

46
Image Warping – non-parametric
  • Move control points to specify spline warp
47
Image Morphing
  • How can we in-between two images?
    • Cross-dissolve








      (all examples from [Gomes et al.’99])
48
Image Morphing
  • How can we in-between two images?
    • Warp then cross-dissolve = morph
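With t ∈ [0, 1] the in-betweening parameter and W the warp fields between the two images (notation assumed, not from the slides), a morph can be written as:

```latex
M_t = (1 - t)\,\mathrm{warp}\!\left(I_0,\; t\, W_{0 \to 1}\right)
    + t\,\mathrm{warp}\!\left(I_1,\; (1 - t)\, W_{1 \to 0}\right)
```

A plain cross-dissolve is the degenerate case where both warp fields are zero.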
49
Warp specification
  • How can we specify the warp?
    • Specify corresponding points
      • interpolate to a complete warping function
      • [Nielson, Scattered Data Modeling, IEEE CG&A’93]
50
Warp specification
  • How can we specify the warp?
    • Specify corresponding vectors
      • interpolate to a complete warping function
51
Warp specification
  • How can we specify the warp?
    • Specify corresponding vectors
      • interpolate [Beier & Neely, SIGGRAPH’92]
52
Warp specification
  • How can we specify the warp?
    • Specify corresponding spline control points
      • interpolate to a complete warping function
53
Final Morph Result
54
Layered Scene Representations
55
Motion representations
  • How can we describe this scene?
56
Block-based motion prediction
  • Break image up into square blocks
  • Estimate translation for each block
  • Use this to predict next frame, code difference  (MPEG-2)
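The three steps above can be sketched as follows (a toy sketch of MPEG-style motion compensation; block size, search radius, and names are my choices):

```python
import numpy as np

def block_motion_predict(prev, curr, bs=8, radius=4):
    """For each bs x bs block of `curr`, search `prev` within
    +/- radius for the best SSD match and paste it into the
    predicted frame; the coder would then transmit the motion
    vectors plus the (small) prediction difference."""
    h, w = curr.shape
    pred = np.zeros_like(curr)
    for by in range(0, h - bs + 1, bs):
        for bx in range(0, w - bs + 1, bs):
            block = curr[by:by + bs, bx:bx + bs].astype(float)
            best, best_patch = np.inf, None
            for dv in range(-radius, radius + 1):
                for du in range(-radius, radius + 1):
                    y, x = by + dv, bx + du
                    if y < 0 or x < 0 or y + bs > h or x + bs > w:
                        continue  # candidate falls off the frame
                    cand = prev[y:y + bs, x:x + bs].astype(float)
                    ssd = ((cand - block) ** 2).sum()
                    if ssd < best:
                        best, best_patch = ssd, cand
            pred[by:by + bs, bx:bx + bs] = best_patch
    return pred
```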
57
Layered motion
  • Break image sequence up into “layers”:







  • Describe each layer’s motion
58
Layered motion
  • Advantages:
    • can represent occlusions / disocclusions
    • each layer’s motion can be smooth
    • video segmentation for semantic processing
  • Difficulties:
    • how do we determine the correct number of layers?
    • how do we assign pixels?
    • how do we model the motion?
59
Layers for video summarization
60
Background modeling (MPEG-4)
  • Convert masked images into a background sprite for layered video coding





61
What are layers?
  • [Wang & Adelson, 1994]
  • intensities
  • alphas
  • velocities
62
How do we form them?
63
How do we estimate the layers?
  • compute coarse-to-fine flow
  • estimate affine motion in blocks (regression)
  • cluster with k-means
  • assign pixels to best fitting affine region
  • re-estimate affine motions in each region…
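The per-block regression step above can be sketched as a least-squares fit of the six affine parameters to flow samples (function name and parameter ordering are mine):

```python
import numpy as np

def affine_params(x, y, u, v):
    """Fit u = a1 + a2*x + a3*y and v = a4 + a5*x + a6*y by least
    squares to flow samples (u, v) at positions (x, y); returns
    the 6-vector [a1..a6]. These vectors are what get clustered
    with k-means in the next step of the pipeline."""
    A = np.stack([np.ones_like(x), x, y], axis=1)
    ax, *_ = np.linalg.lstsq(A, u, rcond=None)
    ay, *_ = np.linalg.lstsq(A, v, rcond=None)
    return np.concatenate([ax, ay])
```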
64
Layer synthesis
  • For each layer:
    • stabilize the sequence with the affine motion
    • compute the median value at each pixel
  • Determine occlusion relationships
65
Results
66
Bibliography
  • L. Williams. Pyramidal parametrics.  Computer Graphics, 17(3):1--11, July 1983.


  • L. G. Brown. A survey of image registration techniques.  Computing Surveys, 24(4):325--376, December 1992.


  • C. D. Kuglin and D. C. Hines. The phase correlation image alignment method.  In IEEE 1975 Conference on Cybernetics and Society, pages 163--165, New York, September 1975.


  • J. Gomes, L. Darsa, B. Costa, and L. Velho. Warping and Morphing of Graphical Objects.  Morgan Kaufmann, 1999.


  • T. Beier and S. Neely. Feature-based image metamorphosis. Computer Graphics (SIGGRAPH'92), 26(2):35--42, July 1992.
67
Bibliography
  • J. R. Bergen, P. Anandan, K. J. Hanna, and R. Hingorani.  Hierarchical model-based motion estimation. In ECCV’92,  pp. 237–252, Italy, May 1992.


  • M. J. Black and P. Anandan. The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. Comp. Vis. Image Understanding, 63(1):75–104, 1996.


  • Shi, J. and Tomasi, C. (1994). Good features to track. In CVPR’94, pages 593–600, IEEE Computer Society, Seattle.


  • Baker, S. and Matthews, I. (2004). Lucas-Kanade 20 years on: A unifying framework: Part 1: The quantity approximated, the warp update rule, and the gradient descent approximation.  IJCV, 56(3), 221–255.
68
Bibliography
  • H. S. Sawhney and S. Ayer. Compact representation of videos through dominant multiple motion estimation. IEEE Trans. Patt. Anal. Mach. Intel., 18(8):814–830, Aug. 1996.


  • Y. Weiss. Smoothness in layers: Motion segmentation using nonparametric mixture estimation. In CVPR’97, pp. 520–526, June 1997.


  • J. Y. A. Wang and E. H. Adelson.  Representing moving images with layers.  IEEE Transactions on Image Processing, 3(5):625--638, September 1994.


69
Bibliography
  • Y. Weiss and E. H. Adelson.  A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models.  In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'96), pages 321--326, San Francisco, California, June 1996.
  • P. R. Hsu, P. Anandan, and S. Peleg.  Accurate computation of optical flow by using layered motion representations.  In Twelfth International Conference on Pattern Recognition (ICPR'94), pages 743--746, Jerusalem, Israel, October 1994. IEEE Computer Society Press.
70
Bibliography
  • T. Darrell and A. Pentland.  Cooperative robust estimation using layers of support.  IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(5):474--487, May 1995.
  • S. X. Ju, M. J. Black, and A. D. Jepson.  Skin and bones: Multi-layer, locally affine, optical flow and regularization with transparency.  In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'96), pages 307--314, San Francisco, California, June 1996.
  • M. Irani, B. Rousso, and S. Peleg.  Computing occluding and transparent motions.  International Journal of Computer Vision, 12(1):5--16, January 1994.
  • M.-C. Lee et al.  A layered video object coding system using sprite and affine motion model.  IEEE Transactions on Circuits and Systems for Video Technology, 7(1):130--145, February 1997.
71
Bibliography
  • S. Baker, R. Szeliski, and P. Anandan.  A layered approach to stereo reconstruction.  In IEEE CVPR'98, pages 434--441, Santa Barbara, June 1998.
  • R. Szeliski, S. Avidan, and P. Anandan.  Layer extraction from multiple images containing reflections and transparency.  In IEEE CVPR'2000, volume 1, pages 246--253, Hilton Head Island, June 2000.
  • J. Shade, S. Gortler, L.-W. He, and R. Szeliski.  Layered depth images.  In Computer Graphics (SIGGRAPH'98) Proceedings, pages 231--242, Orlando, July 1998. ACM SIGGRAPH.
  • S. Laveau and O. D. Faugeras.  3-d scene representation as a collection of images.  In Twelfth International Conference on Pattern Recognition (ICPR'94), volume A, pages 689--691, Jerusalem, Israel, October 1994. IEEE Computer Society Press.
  • P. H. S. Torr, R. Szeliski, and P. Anandan.  An integrated Bayesian approach to layer extraction from image sequences.  In Seventh ICCV'99, pages 983--990, Kerkyra, Greece, September 1999.