Motion estimation

Computer Vision
CSE576, Spring 2005
Richard Szeliski
Why estimate visual motion?

- Visual motion can be annoying
  - camera instabilities, jitter
  - measure it; remove it (stabilize)
- Visual motion indicates dynamics in the scene
  - moving objects, behavior
  - track objects and analyze trajectories
- Visual motion reveals spatial layout
  - motion parallax
Today's lecture

- Motion estimation
  - image warping (skip: see handout)
  - patch-based motion (optic flow)
  - parametric (global) motion
  - application: image morphing
  - advanced: layered motion models
Readings

- Bergen, J. R. et al. Hierarchical model-based motion estimation. In ECCV'92, pp. 237–252.
- Szeliski, R. Image Alignment and Stitching: A Tutorial. MSR-TR-2004-92, Sec. 3.4 & 3.5.
- Shi, J. and Tomasi, C. (1994). Good features to track. In CVPR'94, pp. 593–600.
- Baker, S. and Matthews, I. (2004). Lucas-Kanade 20 years on: A unifying framework. IJCV, 56(3), 221–255.
Patch-based motion estimation
Classes of Techniques

- Feature-based methods
  - extract visual features (corners, textured areas) and track them over multiple frames
  - sparse motion fields, but possibly robust tracking
  - suitable especially when image motion is large (tens of pixels)
- Direct methods
  - directly recover image motion from spatio-temporal image brightness variations
  - global motion parameters recovered directly, without an intermediate feature motion calculation
  - dense motion fields, but more sensitive to appearance variations
  - suitable for video and when image motion is small (< 10 pixels)
Patch matching (revisited)

- How do we determine correspondences?
  - block matching or SSD (sum of squared differences)
The Brightness Constraint

- Brightness Constancy Equation: a point keeps its intensity as it moves,
  I(x + u, y + v, t + 1) = I(x, y, t)

Gradient Constraint (or the Optical Flow Constraint)

- Linearizing the brightness constancy equation gives one constraint per pixel:
  Ix·u + Iy·v + It = 0

Patch Translation [Lucas-Kanade]
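Summing the gradient constraint over a patch that is assumed to translate rigidly gives a 2×2 linear system, which is the Lucas-Kanade step. A minimal NumPy sketch (function and variable names are ours, not from the slides):

```python
import numpy as np

def lucas_kanade_translation(I0, I1):
    """One Lucas-Kanade step: assume the whole patch translates by (u, v)
    and solve the normal equations of Ix*u + Iy*v + It = 0 over the patch."""
    Iy, Ix = np.gradient(I0.astype(float))   # spatial gradients (rows = y, cols = x)
    It = I1.astype(float) - I0               # temporal difference
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    # A is singular for a homogeneous patch (the aperture problem)
    return np.linalg.solve(A, b)

# Toy check: a smooth pattern shifted by a known sub-pixel amount.
Y, X = np.mgrid[0:40, 0:40].astype(float)
pattern = lambda x, y: np.sin(0.2 * x) + np.sin(0.15 * y)
I0 = pattern(X, Y)
I1 = pattern(X - 0.5, Y - 0.25)   # scene moved by (u, v) = (0.5, 0.25)
u, v = lucas_kanade_translation(I0, I1)
```

When the patch lies on a single edge, A becomes (near-)singular and only the normal component of motion is recoverable, which is exactly the aperture problem discussed next.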
Local Patch Analysis

- How certain are the motion estimates?

The Aperture Problem

SSD Surface – Textured area

SSD Surface – Edge

SSD Surface – Homogeneous area
Iterative Refinement

- Estimate velocity at each pixel using one iteration of Lucas-Kanade estimation
- Warp one image toward the other using the estimated flow field
  - (easier said than done)
- Refine the estimate by repeating the process
Optical Flow: Iterative Estimation
- Some implementation issues:
  - warping is not easy (ensure that errors in warping are smaller than the estimate refinement)
  - warp one image, take derivatives of the other so you don't need to re-compute the gradient after each iteration
  - often useful to low-pass filter the images before motion estimation (for better derivative estimation, and linear approximations to image intensity)
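The estimate-warp-refine loop, with the note above applied (derivatives taken once from the fixed image), can be sketched for a single translating patch as follows. This is our illustrative code, not the course's reference implementation:

```python
import numpy as np

def translate_bilinear(img, dx, dy):
    """Warp: bilinearly resample img shifted forward by (dx, dy), edge-clamped."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    xs = np.clip(xs - dx, 0, w - 1)
    ys = np.clip(ys - dy, 0, h - 1)
    x0 = np.floor(xs).astype(int); y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1); y1 = np.minimum(y0 + 1, h - 1)
    fx, fy = xs - x0, ys - y0
    return ((1 - fy) * (1 - fx) * img[y0, x0] + (1 - fy) * fx * img[y0, x1] +
            fy * (1 - fx) * img[y1, x0] + fy * fx * img[y1, x1])

def iterative_translation(I0, I1, iters=8, margin=4):
    """Estimate -> warp -> refine. Gradients are taken once from I1 (the
    fixed image), so they need not be recomputed after each iteration."""
    Iy, Ix = np.gradient(I1)
    s = np.s_[margin:-margin, margin:-margin]     # ignore warp edge artifacts
    A = np.array([[np.sum(Ix[s] * Ix[s]), np.sum(Ix[s] * Iy[s])],
                  [np.sum(Ix[s] * Iy[s]), np.sum(Iy[s] * Iy[s])]])
    d = np.zeros(2)
    for _ in range(iters):
        I0w = translate_bilinear(I0, d[0], d[1])  # warp I0 toward I1
        It = I1 - I0w                             # residual difference
        b = -np.array([np.sum(Ix[s] * It[s]), np.sum(Iy[s] * It[s])])
        d += np.linalg.solve(A, b)                # refine the estimate
    return d

Y, X = np.mgrid[0:50, 0:50].astype(float)
pattern = lambda x, y: np.sin(0.2 * x) + np.sin(0.15 * y)
I0 = pattern(X, Y)
I1 = pattern(X - 2.0, Y - 1.0)   # shift too large for one linearized step
d = iterative_translation(I0, I1)
```

Each iteration removes most of the remaining displacement, so a few iterations recover shifts well beyond the single-step linearization range.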
Optical Flow: Aliasing
Parametric motion estimation
Global (parametric) motion models

- 2D models:
  - affine
  - quadratic
  - planar projective transform (homography)
- 3D models:
  - instantaneous camera motion models
  - homography + epipole
  - plane + parallax
Motion models
Example: Affine Motion

- Substituting the affine motion model into the Brightness Constancy equation:
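Written out (the standard six-parameter form, which the slide shows as a figure), the affine flow and its substitution into the linearized brightness constancy constraint $I_x u + I_y v + I_t = 0$ are:

```latex
u(x,y) = a_1 + a_2 x + a_3 y, \qquad v(x,y) = a_4 + a_5 x + a_6 y
\quad\Longrightarrow\quad
I_x\,(a_1 + a_2 x + a_3 y) + I_y\,(a_4 + a_5 x + a_6 y) + I_t \approx 0
```

Each pixel contributes one linear constraint on the six parameters, so the whole region can be solved by least squares with no per-pixel aperture problem.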
Other 2D Motion Models
3D Motion Models
Patch matching (revisited)

- How do we determine correspondences?
  - block matching or SSD (sum of squared differences)
Correlation and SSD

- For larger displacements, do template matching:
  - define a small area around a pixel as the template
  - match the template against each pixel within a search area in the next image
  - use a match measure such as correlation, normalized correlation, or sum-of-squared differences
  - choose the maximum (or minimum) as the match
  - refine to a sub-pixel estimate (Lucas-Kanade)
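The exhaustive SSD search over a discrete displacement window can be sketched in a few lines (our naming; a real tracker would also do the sub-pixel Lucas-Kanade refinement mentioned above):

```python
import numpy as np

def ssd_search(template, next_img, top, left, radius):
    """Match `template` (taken at (top, left) in the current frame) against
    every integer displacement within +/-radius in the next frame; return
    the (dy, dx) minimizing the sum of squared differences."""
    h, w = template.shape
    best_ssd, best = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > next_img.shape[0] or x + w > next_img.shape[1]:
                continue                       # candidate window out of bounds
            diff = template - next_img[y:y + h, x:x + w]
            ssd = np.sum(diff * diff)
            if ssd < best_ssd:
                best_ssd, best = ssd, (dy, dx)
    return best

rng = np.random.default_rng(0)
frame0 = rng.random((40, 40))
frame1 = np.roll(frame0, (3, -2), axis=(0, 1))   # scene shifted by (dy, dx) = (3, -2)
template = frame0[10:18, 10:18]
d = ssd_search(template, frame1, 10, 10, 5)
```

The cost grows with the square of the search radius, which is why gradient-based methods are preferred once the motion is small.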
Discrete Search vs. Gradient Based
Shi-Tomasi feature tracker

- Find good features (minimum eigenvalue of the 2×2 Hessian)
- Use Lucas-Kanade to track with pure translation
- Use affine registration with the first feature patch
- Terminate tracks whose dissimilarity gets too large
- Start new tracks when needed
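The "good features" score, the minimum eigenvalue of the 2×2 gradient matrix (the "Hessian" on the slide), can be computed in closed form. A sketch under our naming, using a simple windowed sum:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def shi_tomasi_score(img, window=7):
    """Minimum eigenvalue of [[sum Ix^2, sum IxIy], [sum IxIy, sum Iy^2]]
    over each window: high only where the patch has strong gradients in
    two directions, i.e. at a corner."""
    Iy, Ix = np.gradient(img.astype(float))
    def box(arr):                                  # window sums
        return sliding_window_view(arr, (window, window)).sum(axis=(2, 3))
    a, b, c = box(Ix * Ix), box(Ix * Iy), box(Iy * Iy)
    # closed-form smaller eigenvalue of a symmetric 2x2 matrix
    return (a + c) / 2 - np.sqrt(((a - c) / 2) ** 2 + b ** 2)

# A single bright quadrant: corner at (15, 15), step edges along its sides.
img = np.zeros((30, 30))
img[:15, :15] = 1.0
S = shi_tomasi_score(img)       # S[i, j] scores the window centred at (i+3, j+3)
corner, edge, flat = S[11, 11], S[11, 2], S[2, 2]
```

On an edge one eigenvalue is large but the other is near zero, so the score correctly rejects it; only the corner survives a threshold.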
Tracking results
Tracking - dissimilarity
Tracking results
Correlation Window Size

- Small windows lead to more false matches
- Large windows are better this way, but…
  - neighboring flow vectors will be more correlated (since the template windows have more in common)
  - flow resolution is also lower (same reason)
  - more expensive to compute
- Small windows are good for local search: more detailed and less smooth (noisy?)
- Large windows are good for global search: less detailed and smoother
Robust Estimation

- Noise distributions are often non-Gaussian, having much heavier tails. Noise samples from the tails are called outliers.
- Sources of outliers (multiple motions):
  - specularities / highlights
  - JPEG artifacts / interlacing / motion blur
  - multiple motions (occlusion boundaries, transparency)
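The standard remedy is to replace the quadratic penalty with a robust one and solve by iteratively reweighted least squares. A 1-D toy illustration with a Geman-McClure-style weight (our choice of estimator and names; a robust flow method such as Black and Anandan's reweights brightness-constancy residuals in the same spirit):

```python
import numpy as np

def robust_location(samples, sigma=1.0, iters=20):
    """Iteratively reweighted least squares with weight
    w = 1 / (1 + (r/sigma)^2)^2, so residuals deep in the tails
    (outliers) get almost no weight."""
    mu = float(np.median(samples))              # robust initialization
    for _ in range(iters):
        r = samples - mu
        w = 1.0 / (1.0 + (r / sigma) ** 2) ** 2
        mu = float(np.sum(w * samples) / np.sum(w))
    return mu

# Inlier motions near 1.0 plus two gross outliers (e.g. a second motion).
data = np.array([0.9, 1.0, 1.1, 0.95, 1.05, 1.02, 0.98, 10.0, 12.0])
ls = data.mean()                # least squares: dragged toward the outliers
rob = robust_location(data)     # stays with the dominant motion
```

The same reweighting idea lets a dominant-motion estimator ignore pixels belonging to a second motion or an occlusion boundary.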
Image Morphing
Image Warping – non-parametric

- Specify a more detailed warp function
- Examples:
  - splines
  - triangles
  - optical flow (per-pixel motion)
Image Warping – non-parametric

- Move control points to specify a spline warp
Image Morphing

- How can we in-between two images?
  - cross-dissolve (all examples from [Gomes et al. '99])
Image Morphing

- How can we in-between two images?
  - warp, then cross-dissolve = morph
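The difference between a plain cross-dissolve and warp-then-cross-dissolve shows up even in 1-D with a single global shift standing in for the full correspondence field (a toy of ours, not the deck's examples):

```python
import numpy as np

def warp1d(signal, shift):
    """Resample a 1-D signal shifted forward by `shift` (linear interpolation)."""
    x = np.arange(len(signal), dtype=float)
    return np.interp(x - shift, x, signal)

def morph1d(a, b, shift_ab, t):
    """Warp-then-cross-dissolve: warp a forward by t*shift, warp b backward
    by (1-t)*shift, then blend. At time t the feature sits t of the way
    from its position in a to its position in b."""
    return (1 - t) * warp1d(a, t * shift_ab) + t * warp1d(b, -(1 - t) * shift_ab)

x = np.arange(64, dtype=float)
a = np.exp(-0.5 * ((x - 20.0) / 3.0) ** 2)   # feature at position 20
b = np.exp(-0.5 * ((x - 30.0) / 3.0) ** 2)   # same feature at position 30
mid = morph1d(a, b, 10.0, 0.5)               # one bump, halfway at 25
ghost = 0.5 * (a + b)                        # plain cross-dissolve: double image
```

The un-warped blend keeps both bumps (ghosting), while the morph moves a single bump smoothly from 20 to 30 as t goes from 0 to 1.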
Warp specification

- How can we specify the warp?
  - specify corresponding points
  - interpolate to a complete warping function [Nielson, Scattered Data Modeling, IEEE CG&A'93]
Warp specification

- How can we specify the warp?
  - specify corresponding vectors
  - interpolate to a complete warping function
Warp specification

- How can we specify the warp?
  - specify corresponding vectors
  - interpolate [Beier & Neely, SIGGRAPH'92]
Warp specification

- How can we specify the warp?
  - specify corresponding spline control points
  - interpolate to a complete warping function
Final Morph Result
Layered Scene Representations
Motion representations

- How can we describe this scene?
Block-based motion prediction

- Break the image up into square blocks
- Estimate a translation for each block
- Use this to predict the next frame, and code the difference (MPEG-2)
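The three steps above can be sketched directly: per-block exhaustive SSD search, motion-compensated prediction, and a residual that a codec would transform-code. This is an illustration of the idea (our naming), not MPEG-2 itself:

```python
import numpy as np

def block_predict(prev, cur, bs=8, radius=4):
    """For each bs x bs block of `cur`, find the best integer translation
    into `prev` by exhaustive SSD search and copy that patch into the
    motion-compensated prediction. Returns (prediction, residual)."""
    h, w = cur.shape
    pred = np.empty_like(cur)
    for by in range(0, h, bs):
        for bx in range(0, w, bs):
            block = cur[by:by + bs, bx:bx + bs]
            best_err, best_patch = np.inf, prev[by:by + bs, bx:bx + bs]
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + bs > h or x + bs > w:
                        continue
                    cand = prev[y:y + bs, x:x + bs]
                    err = np.sum((block - cand) ** 2)
                    if err < best_err:
                        best_err, best_patch = err, cand
            pred[by:by + bs, bx:bx + bs] = best_patch
    return pred, cur - pred

rng = np.random.default_rng(1)
prev = rng.random((32, 32))
cur = np.roll(prev, (2, 1), axis=(0, 1))   # whole frame shifted by (2, 1)
pred, residual = block_predict(prev, cur)
```

Interior blocks are predicted exactly, so the residual (what actually gets coded) carries far less energy than the raw frame difference.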
Layered motion

- Break the image sequence up into "layers"
- Describe each layer's motion
Layered motion

- Advantages:
  - can represent occlusions / disocclusions
  - each layer's motion can be smooth
  - video segmentation for semantic processing
- Difficulties:
  - how do we determine the correct number of layers?
  - how do we assign pixels?
  - how do we model the motion?
Layers for video summarization
Background modeling (MPEG-4)

- Convert masked images into a background sprite for layered video coding
What are layers?

- [Wang & Adelson, 1994]
  - intensities
  - alphas
  - velocities
How do we form them?
How do we estimate the layers?

- Compute coarse-to-fine flow
- Estimate affine motion in blocks (regression)
- Cluster with k-means
- Assign pixels to the best-fitting affine region
- Re-estimate affine motions in each region…
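The regression and clustering steps can be sketched as follows. We feed a synthetic flow field rather than computing coarse-to-fine flow, and all names are ours (an illustration of the Wang-Adelson-style pipeline, not their code):

```python
import numpy as np

def fit_affine(xs, ys, u, v):
    """Least-squares affine motion u = a1 + a2*x + a3*y, v = a4 + a5*x + a6*y
    from flow samples in one block (the regression step)."""
    A = np.column_stack([np.ones_like(xs), xs, ys])
    pu = np.linalg.lstsq(A, u, rcond=None)[0]
    pv = np.linalg.lstsq(A, v, rcond=None)[0]
    return np.concatenate([pu, pv])

def kmeans(pts, k, iters=10, seed=0):
    """Plain k-means on the 6-D affine parameter vectors (the clustering step)."""
    rng = np.random.default_rng(seed)
    centers = pts[rng.choice(len(pts), size=k, replace=False)].copy()
    labels = np.zeros(len(pts), dtype=int)
    for _ in range(iters):
        dist = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pts[labels == j].mean(axis=0)
    return labels

# Synthetic flow: left half translates by (1, 0), right half by (0, 1).
h = w = 32; bs = 8
Y, X = np.mgrid[0:h, 0:w].astype(float)
U = np.where(X < 16, 1.0, 0.0)
V = np.where(X < 16, 0.0, 1.0)
params, is_left = [], []
for by in range(0, h, bs):
    for bx in range(0, w, bs):
        blk = np.s_[by:by + bs, bx:bx + bs]
        params.append(fit_affine(X[blk].ravel(), Y[blk].ravel(),
                                 U[blk].ravel(), V[blk].ravel()))
        is_left.append(bx < 16)
params = np.array(params); is_left = np.array(is_left)
labels = kmeans(params, k=2)
```

In the full pipeline the cluster motions would then be used to reassign individual pixels and the affine fits re-estimated per region, iterating to convergence.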
Layer synthesis

- For each layer:
  - stabilize the sequence with the affine motion
  - compute the median value at each pixel
- Determine occlusion relationships
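Once a layer's frames are stabilized, the per-pixel temporal median rejects pixels that belong to other layers in a minority of frames. A sketch with a toy static background and moving occluders (our construction):

```python
import numpy as np

def median_layer(stabilized):
    """Per-pixel median over time of frames already stabilized by the
    layer's affine motion; transient occluders are voted out."""
    return np.median(np.stack(stabilized, axis=0), axis=0)

# A static 16x16 background seen in 5 frames, each with a small occluder
# in a different place; the temporal median recovers the background.
background = np.add.outer(np.arange(16.0), np.arange(16.0))
frames = []
for i in range(5):
    f = background.copy()
    f[3 * i:3 * i + 3, 3 * i:3 * i + 3] = 99.0   # disjoint occluders
    frames.append(f)
clean = median_layer(frames)
```

The median works because each pixel is occluded in only a minority of frames; with heavier occlusion one would track per-pixel support masks as well.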
Results
Bibliography

- L. Williams. Pyramidal parametrics. Computer Graphics, 17(3):1–11, July 1983.
- L. G. Brown. A survey of image registration techniques. Computing Surveys, 24(4):325–376, December 1992.
- C. D. Kuglin and D. C. Hines. The phase correlation image alignment method. In IEEE 1975 Conference on Cybernetics and Society, pages 163–165, New York, September 1975.
- J. Gomes, L. Darsa, B. Costa, and L. Velho. Warping and Morphing of Graphical Objects. Morgan Kaufmann, 1999.
- T. Beier and S. Neely. Feature-based image metamorphosis. Computer Graphics (SIGGRAPH'92), 26(2):35–42, July 1992.
Bibliography

- J. R. Bergen, P. Anandan, K. J. Hanna, and R. Hingorani. Hierarchical model-based motion estimation. In ECCV'92, pp. 237–252, Italy, May 1992.
- M. J. Black and P. Anandan. The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. Comp. Vis. Image Understanding, 63(1):75–104, 1996.
- J. Shi and C. Tomasi. Good features to track. In CVPR'94, pages 593–600, Seattle, 1994.
- S. Baker and I. Matthews. Lucas-Kanade 20 years on: A unifying framework: Part 1: The quantity approximated, the warp update rule, and the gradient descent approximation. IJCV, 56(3):221–255, 2004.
Bibliography

- H. S. Sawhney and S. Ayer. Compact representation of videos through dominant multiple motion estimation. IEEE Trans. Patt. Anal. Mach. Intell., 18(8):814–830, August 1996.
- Y. Weiss. Smoothness in layers: Motion segmentation using nonparametric mixture estimation. In CVPR'97, pp. 520–526, San Juan, June 1997.
- J. Y. A. Wang and E. H. Adelson. Representing moving images with layers. IEEE Transactions on Image Processing, 3(5):625–638, September 1994.
Bibliography

- Y. Weiss and E. H. Adelson. A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models. In CVPR'96, pages 321–326, San Francisco, June 1996.
- P. R. Hsu, P. Anandan, and S. Peleg. Accurate computation of optical flow by using layered motion representations. In ICPR'94, pages 743–746, Jerusalem, October 1994.
Bibliography

- T. Darrell and A. Pentland. Cooperative robust estimation using layers of support. IEEE Trans. Patt. Anal. Mach. Intell., 17(5):474–487, May 1995.
- S. X. Ju, M. J. Black, and A. D. Jepson. Skin and bones: Multi-layer, locally affine, optical flow and regularization with transparency. In CVPR'96, pages 307–314, San Francisco, June 1996.
- M. Irani, B. Rousso, and S. Peleg. Computing occluding and transparent motions. International Journal of Computer Vision, 12(1):5–16, January 1994.
- M.-C. Lee et al. A layered video object coding system using sprite and affine motion model. IEEE Trans. Circuits and Systems for Video Technology, 7(1):130–145, February 1997.
Bibliography

- S. Baker, R. Szeliski, and P. Anandan. A layered approach to stereo reconstruction. In CVPR'98, pages 434–441, Santa Barbara, June 1998.
- R. Szeliski, S. Avidan, and P. Anandan. Layer extraction from multiple images containing reflections and transparency. In CVPR'2000, volume 1, pages 246–253, Hilton Head Island, June 2000.
- J. Shade, S. Gortler, L.-W. He, and R. Szeliski. Layered depth images. In Computer Graphics (SIGGRAPH'98) Proceedings, pages 231–242, Orlando, July 1998. ACM SIGGRAPH.
- S. Laveau and O. D. Faugeras. 3-D scene representation as a collection of images. In ICPR'94, volume A, pages 689–691, Jerusalem, October 1994.
- P. H. S. Torr, R. Szeliski, and P. Anandan. An integrated Bayesian approach to layer extraction from image sequences. In ICCV'99, pages 983–990, Kerkyra, Greece, September 1999.