Motion estimation
Computer Vision
CSE576, Spring 2005
Richard Szeliski

Why estimate visual motion?
Visual Motion can be annoying
Camera instabilities, jitter
Measure it; remove it (stabilize)
Visual Motion indicates dynamics in the scene
Moving objects, behavior
Track objects and analyze trajectories
Visual Motion reveals spatial layout
Motion parallax

Today’s lecture
Motion estimation
image warping (skip: see handout)
patch-based motion (optic flow)
parametric (global) motion
application: image morphing
advanced: layered motion models

Readings
Bergen et al.  Hierarchical model-based motion estimation. ECCV’92,  pp. 237–252.
Szeliski, R.  Image Alignment and Stitching:  A Tutorial, MSR-TR-2004-92, Sec. 3.4 & 3.5.
Shi, J. and Tomasi, C. (1994). Good features to track. In CVPR’94, pp. 593–600.
Baker, S. and Matthews, I. (2004). Lucas-Kanade 20 years on: A unifying framework.  IJCV, 56(3), 221–255.

Patch-based motion estimation

Classes of Techniques
Feature-based methods
Extract visual features (corners, textured areas) and track them over multiple frames
Sparse motion fields, but possibly robust tracking
Especially suitable when image motion is large (tens of pixels)
Direct methods
Directly recover image motion from spatio-temporal image brightness variations
Global motion parameters directly recovered without an intermediate feature motion calculation
Dense motion fields, but more sensitive to appearance variations
Suitable for video and when image motion is small (< 10 pixels)

Patch matching (revisited)
How do we determine correspondences?
block matching or SSD (sum squared differences)

The Brightness Constraint
Brightness Constancy Equation:
I(x, y, t) = I(x + u(x, y), y + v(x, y), t + 1)
Assume each pixel keeps its brightness as it moves; u(x, y) and v(x, y) are the unknown flow components.

Gradient Constraint (or the Optical Flow Constraint)
Linearizing brightness constancy for small motion gives
Ix u + Iy v + It ≈ 0
One equation in two unknowns per pixel: only the flow component along the gradient is constrained.

Patch Translation [Lucas-Kanade]
Assume a single translation (u, v) over an image patch and minimize
E(u, v) = Σ [Ix u + Iy v + It]²
Setting ∂E/∂u = ∂E/∂v = 0 gives a 2×2 linear system for (u, v).
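The Lucas-Kanade patch solve can be sketched in a few lines of NumPy. This is a minimal illustration, not the slides' implementation: the spatial and temporal gradients are given directly rather than computed from image pairs, and the function name is assumed.

```python
import numpy as np

def lucas_kanade_patch(Ix, Iy, It):
    """Solve the 2x2 Lucas-Kanade normal equations for one patch.

    Ix, Iy: spatial gradients over the patch; It: temporal difference.
    Returns the translation (u, v) minimizing sum (Ix*u + Iy*v + It)^2.
    """
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    # Fails if A is singular: a homogeneous patch (the aperture problem).
    return np.linalg.solve(A, b)

# Synthetic check: gradients exactly consistent with motion (u, v) = (1.0, -0.5).
rng = np.random.default_rng(0)
Ix = rng.standard_normal((5, 5))
Iy = rng.standard_normal((5, 5))
It = -(Ix * 1.0 + Iy * (-0.5))  # brightness constancy: Ix*u + Iy*v + It = 0
u, v = lucas_kanade_patch(Ix, Iy, It)
```

When the gradient matrix is well-conditioned (textured patch), the recovery is exact; on an edge or flat region the system is (near-)singular, which is exactly the aperture problem discussed next.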

Local Patch Analysis
How certain are the motion estimates?

The Aperture Problem

SSD Surface – Textured area

SSD Surface – Edge

SSD Surface – Homogeneous area

Iterative Refinement
Estimate velocity at each pixel using one iteration of Lucas and Kanade estimation
Warp one image toward the other using the estimated flow field
(easier said than done)
Refine estimate by repeating the process
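The estimate/warp/repeat loop above can be sketched in 1D, where warping reduces to linear interpolation. This is a toy global-translation case (assumed helper names), not the full per-pixel flow field:

```python
import numpy as np

def warp_1d(signal, shift):
    """Backward warp: sample signal at x + shift via linear interpolation."""
    x = np.arange(len(signal), dtype=float)
    return np.interp(x + shift, x, signal)

def iterative_translation(I0, I1, n_iters=10):
    """Iteratively refine a single global 1D translation between I0 and I1.

    Each iteration: warp I1 toward I0 with the current estimate, solve the
    1D brightness-constancy least squares for the residual, accumulate.
    """
    Ix = np.gradient(I0)        # derivatives of the fixed image (computed once)
    d = 0.0
    for _ in range(n_iters):
        I1w = warp_1d(I1, d)    # warp toward I0 with the current estimate
        It = I1w - I0           # residual temporal difference
        d += -np.sum(Ix * It) / np.sum(Ix * Ix)  # 1D Lucas-Kanade update
    return d

# Signal shifted by 1.7 samples; the loop should recover the shift.
I0 = np.sin(np.linspace(0, 4 * np.pi, 400))
I1 = warp_1d(I0, -1.7)          # I1(x) = I0(x - 1.7)
d_est = iterative_translation(I0, I1)
```

Re-warping at each iteration is what lets the linearized (gradient) constraint handle a shift larger than the range where the Taylor expansion is accurate.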

Optical Flow: Iterative Estimation

Optical Flow: Iterative Estimation

Optical Flow: Iterative Estimation

Optical Flow: Iterative Estimation

Optical Flow: Iterative Estimation
Some Implementation Issues:
Warping is not easy (ensure that errors in warping are smaller than the estimate refinement)
Warp one image, take derivatives of the other so you don’t need to re-compute the gradient after each iteration.
Often useful to low-pass filter the images before motion estimation (for better derivative estimation, and linear approximations to image intensity)
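The benefit of low-pass filtering before differentiation can be checked numerically on a toy 1D signal. `gaussian_kernel` is an assumed helper, not part of the slides:

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """1D Gaussian kernel normalized to sum to 1 (assumed helper)."""
    radius = radius or int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

# Finite differences on raw noisy samples are dominated by noise;
# low-pass filtering first gives a usable derivative estimate.
rng = np.random.default_rng(1)
x = np.linspace(0, 2 * np.pi, 500)
noisy = np.sin(x) + 0.05 * rng.standard_normal(x.size)

raw_deriv = np.gradient(noisy, x)
smooth = np.convolve(noisy, gaussian_kernel(sigma=5), mode="same")
smooth_deriv = np.gradient(smooth, x)

true_deriv = np.cos(x)
err_raw = np.mean((raw_deriv - true_deriv) ** 2)
err_smooth = np.mean((smooth_deriv - true_deriv) ** 2)
```

Differentiation amplifies high-frequency noise, so even mild smoothing cuts the derivative error by orders of magnitude; it also makes the local linear model of intensity a better fit.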

Optical Flow: Aliasing

Parametric motion estimation

Global (parametric) motion models
2D Models:
Affine
Quadratic
Planar projective transform (Homography)
3D Models:
Instantaneous camera motion models
Homography+epipole
Plane+Parallax

Motion models

Example:  Affine Motion
u(x, y) = a1 + a2 x + a3 y
v(x, y) = a4 + a5 x + a6 y
Substituting into the B.C. Equation:
Ix (a1 + a2 x + a3 y) + Iy (a4 + a5 x + a6 y) + It ≈ 0
Each pixel contributes one linear constraint in the six global unknowns; solve by least squares over all pixels.
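With an affine model u = a1 + a2 x + a3 y, v = a4 + a5 x + a6 y substituted into the brightness-constancy constraint, each pixel contributes one row of a linear system in the six parameters. A sketch with synthetic gradients (names assumed for illustration):

```python
import numpy as np

def affine_flow(Ix, Iy, It, xs, ys):
    """Least-squares affine motion from the brightness-constancy constraint.

    Each pixel gives one linear equation Ix*u + Iy*v + It = 0, where
    u = a1 + a2*x + a3*y and v = a4 + a5*x + a6*y.
    """
    A = np.stack([Ix, Ix * xs, Ix * ys, Iy, Iy * xs, Iy * ys], axis=1)
    a, *_ = np.linalg.lstsq(A, -It, rcond=None)
    return a

# Synthetic check: an It field exactly consistent with known affine motion.
rng = np.random.default_rng(2)
n = 200
xs, ys = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
Ix, Iy = rng.standard_normal(n), rng.standard_normal(n)
a_true = np.array([0.5, 0.1, -0.2, -0.3, 0.05, 0.15])
u = a_true[0] + a_true[1] * xs + a_true[2] * ys
v = a_true[3] + a_true[4] * xs + a_true[5] * ys
It = -(Ix * u + Iy * v)
a_est = affine_flow(Ix, Iy, It, xs, ys)
```

Because the six parameters are global, the system is heavily over-determined (one equation per pixel), which is what makes direct parametric estimation robust to per-pixel noise.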

Other 2D Motion Models

3D Motion Models

Patch matching (revisited)
How do we determine correspondences?
block matching or SSD (sum squared differences)

Correlation and SSD
For larger displacements, do template matching
Define a small area around a pixel as the template
Match the template against each pixel within a search area in next image.
Use a match measure such as correlation, normalized correlation, or sum-of-squares difference
Choose the maximum (correlation) or minimum (SSD) as the match
Refine to a sub-pixel estimate (Lucas-Kanade)
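The template-matching steps above can be sketched as an exhaustive SSD search with a parabolic sub-pixel fit standing in for the Lucas-Kanade refinement (function names assumed):

```python
import numpy as np

def ssd_search(template, image, ty, tx, search=4):
    """Exhaustive SSD search for a template around (ty, tx) in the next image.

    Returns the displacement minimizing the sum of squared differences, with
    a sub-pixel correction from a parabola fit through the SSD minimum.
    """
    h, w = template.shape
    best, best_d, ssd = None, (0, 0), {}
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            patch = image[ty + dy: ty + dy + h, tx + dx: tx + dx + w]
            s = np.sum((patch - template) ** 2)
            ssd[(dy, dx)] = s
            if best is None or s < best:
                best, best_d = s, (dy, dx)
    dy, dx = best_d

    def subpix(m, c, p):  # vertex of the parabola through (-1, m), (0, c), (1, p)
        denom = m - 2 * c + p
        return 0.0 if denom == 0 else 0.5 * (m - p) / denom

    sy = subpix(ssd[(dy - 1, dx)], ssd[(dy, dx)], ssd[(dy + 1, dx)]) if abs(dy) < search else 0.0
    sx = subpix(ssd[(dy, dx - 1)], ssd[(dy, dx)], ssd[(dy, dx + 1)]) if abs(dx) < search else 0.0
    return dy + sy, dx + sx

# Random texture shifted by (3, -2); the search should find that displacement.
rng = np.random.default_rng(3)
img = rng.standard_normal((64, 64))
img2 = np.roll(np.roll(img, 3, axis=0), -2, axis=1)
template = img[20:28, 30:38]
dy_est, dx_est = ssd_search(template, img2, 20, 30, search=4)
```

Discrete search handles large displacements; the local quadratic (or Lucas-Kanade) step then recovers the fractional part, which is why the two are combined in practice.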

Discrete Search vs. Gradient Based

Shi-Tomasi feature tracker
Find good features (min eigenvalue of the 2×2 gradient second-moment matrix)
Use Lucas-Kanade to track with pure translation
Use affine registration with first feature patch
Terminate tracks whose dissimilarity gets too large
Start new tracks when needed
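The "good features" score of step one can be computed in closed form: the smaller eigenvalue of the 2×2 matrix of summed gradient products in each window. A NumPy sketch (helper names assumed; a real tracker would also apply non-maximum suppression):

```python
import numpy as np

def box_sum(a, win):
    """Sum over a win x win neighborhood via separable 1D convolutions."""
    k = np.ones(win)
    a = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, a)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, a)

def min_eigenvalue_map(I, win=7):
    """Shi-Tomasi score: smaller eigenvalue of the 2x2 matrix
    [[sum Ix^2, sum IxIy], [sum IxIy, sum Iy^2]] in each window."""
    Iy, Ix = np.gradient(I)
    A = box_sum(Ix * Ix, win)
    B = box_sum(Ix * Iy, win)
    C = box_sum(Iy * Iy, win)
    # Closed-form smaller eigenvalue of a symmetric 2x2 matrix.
    return (A + C) / 2 - np.sqrt(((A - C) / 2) ** 2 + B ** 2)

# A single bright quadrant: its corner should score high, its edges near zero.
img = np.zeros((40, 40))
img[20:, 20:] = 1.0
lam = min_eigenvalue_map(img)
```

A large minimum eigenvalue means the window constrains motion in both directions, so Lucas-Kanade tracking will be well-conditioned there; edges and flat regions score near zero.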

Tracking results

Tracking - dissimilarity

Tracking results

Correlation Window Size
Small windows lead to more false matches
Large windows are better this way, but…
Neighboring flow vectors will be more correlated (since the template windows have more in common)
Flow resolution also lower (same reason)
More expensive to compute
Small windows are good for local search:
more detailed and less smooth (noisy?)
Large windows good for global search:
less detailed and smoother

Robust Estimation
Noise distributions are often non-Gaussian, having much heavier tails.  Noise samples from the tails are called outliers.
Sources of outliers (multiple motions):
specularities / highlights
jpeg artifacts / interlacing / motion blur
multiple motions (occlusion boundaries, transparency)
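Robust estimation replaces the quadratic penalty with one that saturates for large residuals, typically solved by iteratively reweighted least squares (IRLS). A toy 1D-translation sketch using the Geman-McClure penalty, one common choice in this literature (as in Black & Anandan); the setup and names are assumed for illustration:

```python
import numpy as np

def robust_translation(Ix, It, sigma=0.5, n_iters=20):
    """IRLS with the Geman-McClure penalty rho(r) = r^2 / (r^2 + sigma^2).

    Estimates a single horizontal translation u from Ix*u + It = 0,
    downweighting pixels with large residuals (e.g., a second motion).
    """
    u = -np.sum(Ix * It) / np.sum(Ix * Ix)          # least-squares init
    for _ in range(n_iters):
        r = Ix * u + It
        w = sigma ** 2 / (r ** 2 + sigma ** 2) ** 2  # weight rho'(r)/r, up to scale
        u = -np.sum(w * Ix * It) / np.sum(w * Ix * Ix)
    return u

# 80% of pixels move with u = 2; 20% follow a second motion with u = -3.
rng = np.random.default_rng(4)
n = 500
Ix = rng.uniform(0.5, 1.5, n)               # keep gradients away from zero
It = -Ix * 2.0 + 0.01 * rng.standard_normal(n)
outliers = rng.random(n) < 0.2
It[outliers] = -Ix[outliers] * (-3.0)       # the second (outlier) motion
u_ls = -np.sum(Ix * It) / np.sum(Ix * Ix)   # plain least squares: biased
u_rob = robust_translation(Ix, It)
```

Plain least squares averages the two motions; the robust estimator locks onto the dominant one because the outliers' influence saturates instead of growing quadratically.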

Robust Estimation

Robust Estimation

Robust Estimation

Image Morphing

Image Warping – non-parametric
Specify more detailed warp function
Examples:
splines
triangles
optical flow (per-pixel motion)

Image Warping – non-parametric
Move control points to specify spline warp

Image Morphing
How can we in-between two images?
Cross-dissolve
(all examples from [Gomes et al.’99])

Image Morphing
How can we in-between two images?
Warp then cross-dissolve = morph
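The warp-then-cross-dissolve idea can be illustrated in 1D with a pure-translation correspondence (a toy stand-in for a dense morph warp; helper names assumed):

```python
import numpy as np

def warp_1d(signal, shift):
    """Backward warp: sample signal at x + shift via linear interpolation."""
    x = np.arange(len(signal), dtype=float)
    return np.interp(x + shift, x, signal)

def morph(A, B, d, t):
    """Warp-then-cross-dissolve for a translation d between A and B,
    i.e. B(x) = A(x - d). Both images are warped to time t, then blended."""
    A_t = warp_1d(A, -t * d)        # A advanced to time t
    B_t = warp_1d(B, (1 - t) * d)   # B pulled back to time t
    return (1 - t) * A_t + t * B_t

# A bump that moves 60 samples from A to B.
x = np.linspace(0, 1, 300)
A = np.exp(-((x - 0.3) / 0.05) ** 2)
B = warp_1d(A, -60.0)               # B(x) = A(x - 60): bump near 0.5
M = morph(A, B, 60.0, 0.5)          # morph: one full-height bump halfway
X = 0.5 * A + 0.5 * B               # plain cross-dissolve: two half-height bumps
```

The comparison makes the point of the slide: a plain cross-dissolve ghosts the feature into two faint copies, while warping first moves the feature to its in-between position and preserves its contrast.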

Warp specification
How can we specify the warp?
Specify corresponding points
interpolate to a complete warping function
[Nielson, Scattered Data Modeling, IEEE CG&A’93]
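Interpolating sparse point correspondences into a dense warp field is a scattered-data problem. A simple inverse-distance-weighted (Shepard) sketch, one of the basic schemes in that family; this is an illustrative stand-in, not a specific method from the slides:

```python
import numpy as np

def idw_warp_field(src_pts, dst_pts, grid_shape, p=2.0, eps=1e-8):
    """Inverse-distance-weighted interpolation of sparse point displacements
    into a dense (H, W, 2) warp field. src_pts/dst_pts are (N, 2) in (y, x)."""
    disp = dst_pts - src_pts                      # displacement at each control point
    H, W = grid_shape
    ys, xs = np.mgrid[0:H, 0:W]
    grid = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    d2 = ((grid[:, None, :] - src_pts[None]) ** 2).sum(-1)
    w = 1.0 / (d2 ** (p / 2) + eps)               # closer control points weigh more
    w /= w.sum(axis=1, keepdims=True)             # normalize per grid point
    return (w @ disp).reshape(H, W, 2)

# Four control points with varying displacements over a 16x16 grid.
src = np.array([[2.0, 2.0], [2.0, 12.0], [12.0, 2.0], [12.0, 12.0]])
dst = src + np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 1.0]])
field = idw_warp_field(src, dst, (16, 16))
```

Near a control point the field reproduces that point's displacement; smoother families (splines, radial basis functions) trade this simplicity for better-behaved interpolation between points.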

Warp specification
How can we specify the warp?
Specify corresponding vectors
interpolate to a complete warping function

Warp specification
How can we specify the warp?
Specify corresponding vectors
interpolate [Beier & Neely, SIGGRAPH’92]

Warp specification
How can we specify the warp?
Specify corresponding spline control points
interpolate to a complete warping function

Final Morph Result

Layered Scene Representations

Motion representations
How can we describe this scene?

Block-based motion prediction
Break image up into square blocks
Estimate translation for each block
Use this to predict next frame, code difference  (MPEG-2)
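The block-based prediction scheme can be sketched end to end: estimate a translation per block by SSD search, then assemble the predicted frame from displaced blocks of the previous one (a toy version; real codecs add residual coding, half-pel search, and rate control):

```python
import numpy as np

def block_motion_predict(prev, curr, bs=8, search=4):
    """MPEG-style prediction: per-block translation by exhaustive SSD search,
    then a predicted frame built from displaced blocks of `prev`."""
    H, W = curr.shape
    pred = np.zeros_like(curr)
    for by in range(0, H, bs):
        for bx in range(0, W, bs):
            block = curr[by:by + bs, bx:bx + bs]
            best, best_patch = np.inf, None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + bs > H or x + bs > W:
                        continue                     # skip out-of-frame candidates
                    cand = prev[y:y + bs, x:x + bs]
                    s = np.sum((cand - block) ** 2)
                    if s < best:
                        best, best_patch = s, cand
            pred[by:by + bs, bx:bx + bs] = best_patch
    return pred

# A globally shifted random texture; interior blocks should predict exactly.
rng = np.random.default_rng(5)
prev = rng.standard_normal((32, 32))
curr = np.roll(prev, (2, -1), axis=(0, 1))
pred = block_motion_predict(prev, curr)
residual = curr - pred
```

The coder then transmits only the per-block motion vectors plus the (small) residual, which is the source of the compression gain.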

Layered motion
Break the image sequence up into “layers”
Describe each layer’s motion

Layered motion
Advantages:
can represent occlusions / disocclusions
each layer’s motion can be smooth
video segmentation for semantic processing
Difficulties:
how do we determine the correct number?
how do we assign pixels?
how do we model the motion?

Layers for video summarization

Background modeling (MPEG-4)
Convert masked images into a background sprite for layered video coding

What are layers?
[Wang & Adelson, 1994]
intensities
alphas
velocities

How do we form them?

How do we estimate the layers?
compute coarse-to-fine flow
estimate affine motion in blocks (regression)
cluster with k-means
assign pixels to best fitting affine region
re-estimate affine motions in each region…
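The clustering step in this pipeline can be sketched with k-means on per-block motion parameters, using 2D translations as a stand-in for the full six-parameter affine models (a toy illustration with assumed names):

```python
import numpy as np

def kmeans(X, k, n_iters=20):
    """Plain k-means on the rows of X, seeded with a farthest-point pair
    (deterministic initialization for this sketch)."""
    far = np.argmax(((X - X[0]) ** 2).sum(axis=1))
    centers = np.stack([X[0], X[far]])[:k].astype(float)
    for _ in range(n_iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Toy: per-block translations drawn from two motions ("layers") plus noise.
rng = np.random.default_rng(6)
motions = np.array([[2.0, 0.0], [-1.0, 3.0]])
true_layer = rng.integers(0, 2, 100)
X = motions[true_layer] + 0.1 * rng.standard_normal((100, 2))
labels, centers = kmeans(X, 2)
```

In the full algorithm the clustered motions seed the layer assignment, and the affine parameters are then re-fit within each region, alternating until the segmentation stabilizes.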

Layer synthesis
For each layer:
stabilize the sequence with the affine motion
compute median value at each pixel
Determine occlusion relationships
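The median step can be checked on a toy stabilized stack: once a layer's motion is compensated, its pixels are constant across frames, so a per-pixel median rejects transient occluders (the stack below is synthetic, with a fixed background value and a wandering occluder):

```python
import numpy as np

# Toy stabilized layer stack: background value 1.0, but each frame has a
# small "foreground occluder" (value 5.0) in a random place. As long as a
# pixel is occluded in fewer than half the frames, the median recovers it.
rng = np.random.default_rng(7)
frames = np.ones((9, 16, 16))
for f in frames:
    y, x = rng.integers(0, 14, 2)
    f[y:y + 2, x:x + 2] = 5.0      # occluder moves from frame to frame

layer = np.median(frames, axis=0)  # robust layer image
mean_img = np.mean(frames, axis=0)  # for comparison: the mean keeps ghosts
```

The mean leaves faint ghosts wherever the occluder passed, while the median is unaffected, which is why median compositing is the standard choice for this step.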

Results

Bibliography
L. Williams. Pyramidal parametrics.  Computer Graphics, 17(3):1--11, July 1983.
L. G. Brown. A survey of image registration techniques.  Computing Surveys, 24(4):325--376, December 1992.
C. D. Kuglin and D. C. Hines. The phase correlation image alignment method.  In IEEE 1975 Conference on Cybernetics and Society, pages 163--165, New York, September 1975.
J. Gomes, L. Darsa, B. Costa, and L. Velho. Warping and Morphing of Graphical Objects.  Morgan Kaufmann, 1999.
T. Beier and S. Neely. Feature-based image metamorphosis. Computer Graphics (SIGGRAPH'92), 26(2):35--42, July 1992.

Bibliography
J. R. Bergen, P. Anandan, K. J. Hanna, and R. Hingorani.  Hierarchical model-based motion estimation. In ECCV’92,  pp. 237–252, Italy, May 1992.
M. J. Black and P. Anandan. The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. Comp. Vis. Image Understanding, 63(1):75–104, 1996.
Shi, J. and Tomasi, C. (1994). Good features to track. In CVPR’94, pages 593–600, IEEE Computer Society, Seattle.
Baker, S. and Matthews, I. (2004). Lucas-Kanade 20 years on: A unifying framework: Part 1: The quantity approximated, the warp update rule, and the gradient descent approximation.  IJCV, 56(3), 221–255.

Bibliography
J. Y. A. Wang and E. H. Adelson.  Representing moving images with layers.  IEEE Transactions on Image Processing, 3(5):625--638, September 1994.

Bibliography
Y. Weiss and E. H. Adelson.  A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models.  In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'96), pages 321--326, San Francisco, California, June 1996.
Y. Weiss.  Smoothness in layers: Motion segmentation using nonparametric mixture estimation.  In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'97), pages 520--526, San Juan, Puerto Rico, June 1997.
P. R. Hsu, P. Anandan, and S. Peleg.  Accurate computation of optical flow by using layered motion representations.  In Twelfth International Conference on Pattern Recognition (ICPR'94), pages 743--746, Jerusalem, Israel, October 1994. IEEE Computer Society Press.

Bibliography
T. Darrell and A. Pentland.  Cooperative robust estimation using layers of support.  IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(5):474--487, May 1995.
S. X. Ju, M. J. Black, and A. D. Jepson.  Skin and bones: Multi-layer, locally affine, optical flow and regularization with transparency.  In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'96), pages 307--314, San Francisco, California, June 1996.
M. Irani, B. Rousso, and S. Peleg.  Computing occluding and transparent motions.  International Journal of Computer Vision, 12(1):5--16, January 1994.
H. S. Sawhney and S. Ayer.  Compact representation of videos through dominant multiple motion estimation.  IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):814--830, August 1996.
M.-C. Lee et al.  A layered video object coding system using sprite and affine motion model.  IEEE Transactions on Circuits and Systems for Video Technology, 7(1):130--145, February 1997.

Bibliography
S. Baker, R. Szeliski, and P. Anandan.  A layered approach to stereo reconstruction.  In IEEE CVPR'98, pages 434--441, Santa Barbara, June 1998.
R. Szeliski, S. Avidan, and P. Anandan.  Layer extraction from multiple images containing reflections and transparency.  In IEEE CVPR'2000, volume 1, pages 246--253, Hilton Head Island, June 2000.
J. Shade, S. Gortler, L.-W. He, and R. Szeliski.  Layered depth images.  In Computer Graphics (SIGGRAPH'98) Proceedings, pages 231--242, Orlando, July 1998. ACM SIGGRAPH.
S. Laveau and O. D. Faugeras.  3-d scene representation as a collection of images.  In Twelfth International Conference on Pattern Recognition (ICPR'94), volume A, pages 689--691, Jerusalem, Israel, October 1994. IEEE Computer Society Press.
P. H. S. Torr, R. Szeliski, and P. Anandan.  An integrated Bayesian approach to layer extraction from image sequences.  In Seventh ICCV'99, pages 983--990, Kerkyra, Greece, September 1999.