|
1
|
- Computer Vision
CSE576, Spring 2005
Richard Szeliski
|
|
2
|
- Visual Motion can be annoying
- Camera instabilities, jitter
- Measure it; remove it (stabilize)
- Visual Motion indicates dynamics in the scene
- Moving objects, behavior
- Track objects and analyze trajectories
- Visual Motion reveals spatial layout
|
|
3
|
- Motion estimation
- image warping (skip: see handout)
- patch-based motion (optic flow)
- parametric (global) motion
- application: image morphing
- advanced: layered motion models
|
|
4
|
- Bergen et al. Hierarchical model-based motion estimation. ECCV’92, pp. 237–252.
- Szeliski, R. Image Alignment and Stitching: A Tutorial, MSR-TR-2004-92, Sec. 3.4 & 3.5.
- Shi, J. and Tomasi, C. (1994). Good features to track. In CVPR’94, pp. 593–600.
- Baker, S. and Matthews, I. (2004). Lucas-Kanade 20 years on: A unifying framework. IJCV, 56(3), 221–255.
|
|
5
|
|
|
6
|
- Feature-based methods
- Extract visual features (corners, textured areas) and track them over
multiple frames
- Sparse motion fields, but possibly robust tracking
- Especially suitable when image motion is large (tens of pixels)
- Direct methods
- Directly recover image motion from spatio-temporal image brightness
variations
- Global motion parameters directly recovered without an intermediate
feature motion calculation
- Dense motion fields, but more sensitive to appearance variations
- Suitable for video and when image motion is small (< 10 pixels)
|
|
7
|
- How do we determine correspondences?
- block matching or SSD (sum of squared differences)
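As a concrete sketch of SSD block matching (toy single-channel images as lists of lists; all function and variable names are illustrative, not from the lecture):

```python
# A minimal sketch of SSD block matching on toy grayscale images.

def ssd(patch_a, patch_b):
    """Sum of squared differences between two equal-sized patches."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(patch_a, patch_b)
               for a, b in zip(row_a, row_b))

def extract(img, y, x, size):
    """Cut a size x size patch with top-left corner (y, x)."""
    return [row[x:x + size] for row in img[y:y + size]]

def best_match(img1, img2, y, x, size, search):
    """Find the (dy, dx) within +/-search pixels that minimizes SSD."""
    template = extract(img1, y, x, size)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + size > len(img2) or xx + size > len(img2[0]):
                continue  # candidate window would fall outside the image
            cost = ssd(template, extract(img2, yy, xx, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]
```

The exhaustive search over the window is what makes the later slides' search-window size trade-offs matter.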
|
|
8
|
- Brightness Constancy Equation:
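The equation the slide title refers to is the standard brightness constancy constraint; in common notation (reconstructed, since the slide's formula is image-only in the extracted text):

```latex
% Brightness constancy: a pixel keeps its intensity as it moves
I(x+u,\, y+v,\, t+1) = I(x,\, y,\, t)

% First-order Taylor expansion gives the optical flow constraint
I_x u + I_y v + I_t = 0
\quad\Longleftrightarrow\quad
\nabla I \cdot (u, v) + I_t = 0
```

This is one equation in two unknowns (u, v) per pixel — the aperture problem — which is why Lucas-Kanade aggregates the constraint over a window.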
|
|
9
|
- Brightness Constancy Equation:
|
|
10
|
|
|
11
|
|
|
12
|
- How certain are the motion estimates?
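One standard way to quantify certainty (the Lucas-Kanade view) is via the eigenvalues of the 2×2 matrix of summed gradient products over the window; a sketch with illustrative names:

```python
# Confidence of a Lucas-Kanade estimate from the eigenvalues of the
# 2x2 gradient matrix M = sum over window of [Ix^2, Ix*Iy; Ix*Iy, Iy^2].
# Both eigenvalues large -> corner-like patch, reliable estimate;
# one small eigenvalue -> aperture problem (edge); both small -> textureless.
import math

def gradient_matrix(ix, iy):
    """Accumulate M from lists of per-pixel gradients Ix, Iy over a window."""
    a = sum(gx * gx for gx in ix)                # sum of Ix^2
    b = sum(gx * gy for gx, gy in zip(ix, iy))   # sum of Ix*Iy
    c = sum(gy * gy for gy in iy)                # sum of Iy^2
    return a, b, c

def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first."""
    mean = (a + c) / 2.0
    delta = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean + delta, mean - delta
```

For example, a window with gradients only in x has a zero minimum eigenvalue — the motion along the edge is unrecoverable.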
|
|
13
|
|
|
14
|
|
|
15
|
|
|
16
|
|
|
17
|
- Estimate velocity at each pixel using one iteration of Lucas and Kanade
estimation
- Warp one image toward the other using the estimated flow field
- Refine estimate by repeating the process
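The estimate/warp/refine loop can be made concrete in a runnable 1-D toy version, assuming a pure-translation model g(x) ≈ f(x − d) (signal shape and function names are made up for illustration):

```python
import math

def gaussian_signal(n, center, sigma=3.0):
    """A smooth test signal: a Gaussian bump centered at `center`."""
    return [math.exp(-((x - center) ** 2) / (2 * sigma ** 2)) for x in range(n)]

def sample(sig, x):
    """Linear interpolation with clamping at the borders (the 'warp')."""
    x = min(max(x, 0.0), len(sig) - 1.0)
    i = int(x)
    frac = x - i
    j = min(i + 1, len(sig) - 1)
    return sig[i] * (1 - frac) + sig[j] * frac

def estimate_shift(f, g, iterations=10):
    """Iteratively estimate d such that g(x) ~= f(x - d)."""
    d = 0.0
    n = len(f)
    for _ in range(iterations):
        h = [sample(g, x + d) for x in range(n)]  # warp g toward f
        # central-difference derivative of f (the un-warped image)
        fp = [(f[min(x + 1, n - 1)] - f[max(x - 1, 0)]) / 2.0 for x in range(n)]
        num = sum(fp[x] * (f[x] - h[x]) for x in range(n))
        den = sum(v * v for v in fp)
        if den == 0:
            break
        d += num / den                            # one Lucas-Kanade step; repeat
    return d
```

Each pass warps `g` by the current estimate and solves the linearized constraint again, so the residual shift shrinks toward zero — the same structure the 2-D iterative refinement uses.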
|
|
18
|
|
|
19
|
|
|
20
|
|
|
21
|
|
|
22
|
- Some Implementation Issues:
- Warping is not easy (ensure that errors in warping are smaller than the
estimate refinement)
- Warp one image, take derivatives of the other so you don’t need to
re-compute the gradient after each iteration.
- Often useful to low-pass filter the images before motion estimation
(for better derivative estimation, and linear approximations to image
intensity)
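For the low-pass filtering step, a separable binomial (1-2-1) kernel is a common cheap choice; a sketch on toy list-of-lists images with replicated borders (one possible filter, not necessarily the one used in the lecture):

```python
# Separable 1-2-1 binomial smoothing, applied before differentiation.

def smooth_rows(img):
    """Convolve each row with the kernel [1, 2, 1] / 4, replicating borders."""
    out = []
    for row in img:
        n = len(row)
        out.append([(row[max(x - 1, 0)] + 2 * row[x] + row[min(x + 1, n - 1)]) / 4.0
                    for x in range(n)])
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def smooth(img):
    """Apply the 1-2-1 kernel horizontally, then vertically."""
    return transpose(smooth_rows(transpose(smooth_rows(img))))
```

Separability keeps the cost linear in kernel size, and the mild blur makes the brightness-constancy linearization hold over larger displacements.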
|
|
23
|
|
|
24
|
|
|
25
|
|
|
26
|
|
|
27
|
- 2D Models:
- Affine
- Quadratic
- Planar projective transform (Homography)
- 3D Models:
- Instantaneous camera motion models
- Homography+epipole
- Plane+Parallax
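The 2D models above can be written as per-pixel flow functions; a sketch of two of them (parameterization conventions assumed, e.g. the homography's lower-right entry fixed to 1):

```python
# 2D parametric motion models as flow functions (illustrative naming).

def affine_flow(params, x, y):
    """Affine model: u = a1 + a2*x + a3*y, v = a4 + a5*x + a6*y (6 parameters)."""
    a1, a2, a3, a4, a5, a6 = params
    return a1 + a2 * x + a3 * y, a4 + a5 * x + a6 * y

def homography_flow(h, x, y):
    """Planar projective transform with 8 parameters (h[8] fixed to 1)."""
    denom = h[6] * x + h[7] * y + 1.0
    xp = (h[0] * x + h[1] * y + h[2]) / denom
    yp = (h[3] * x + h[4] * y + h[5]) / denom
    return xp - x, yp - y   # flow = displacement of the mapped point
```

The affine model is linear in its parameters (hence easy least squares); the homography's division by `denom` is what lets it model out-of-plane rotation of a planar scene.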
|
|
28
|
|
|
29
|
- Substituting into the B.C. Equation:
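For the affine model, the substitution works out as follows (standard derivation in common notation, reconstructed since the slide's formula is image-only):

```latex
% Affine (6-parameter) motion model
u(x,y) = a_1 + a_2 x + a_3 y, \qquad
v(x,y) = a_4 + a_5 x + a_6 y

% Substituting into the brightness constancy constraint I_x u + I_y v + I_t = 0:
I_x (a_1 + a_2 x + a_3 y) + I_y (a_4 + a_5 x + a_6 y) + I_t = 0
```

Each pixel contributes one linear equation in the six unknowns a_1…a_6, so the global motion can be solved by least squares over the whole region.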
|
|
30
|
|
|
31
|
|
|
32
|
- How do we determine correspondences?
- block matching or SSD (sum of squared differences)
|
|
33
|
- For larger displacements, do template matching
- Define a small area around a pixel as the template
- Match the template against each pixel within a search area in next
image.
- Use a match measure such as correlation, normalized correlation, or
sum-of-squares difference
- Choose the maximum (or minimum) as the match
- Sub-pixel estimate (Lucas-Kanade)
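Normalized correlation is the usual choice of match measure when the two frames may differ in brightness or contrast; a minimal sketch on flattened patch vectors (illustrative only):

```python
import math

def ncc(t, w):
    """Normalized cross-correlation of two equal-length patch vectors.

    Returns a value in [-1, 1]; invariant to gain and offset changes,
    unlike raw SSD. Maximize this over the search area to find the match.
    """
    mt = sum(t) / len(t)
    mw = sum(w) / len(w)
    num = sum((a - mt) * (b - mw) for a, b in zip(t, w))
    den = math.sqrt(sum((a - mt) ** 2 for a in t) *
                    sum((b - mw) ** 2 for b in w))
    return num / den if den else 0.0
```

After locating the best integer-pixel match, the Lucas-Kanade step mentioned above refines it to sub-pixel precision.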
|
|
34
|
|
|
35
|
- Find good features (min eigenvalue of 2×2 Hessian)
- Use Lucas-Kanade to track with pure translation
- Use affine registration with first feature patch
- Terminate tracks whose dissimilarity gets too large
- Start new tracks when needed
|
|
36
|
|
|
37
|
|
|
38
|
|
|
39
|
- Small windows lead to more false matches
- Large windows are better this way, but…
- Neighboring flow vectors will be more correlated (since the template
windows have more in common)
- Flow resolution also lower (same reason)
- More expensive to compute
- Small windows are good for local search:
more detailed and less smooth (noisy?)
- Large windows good for global search:
less detailed and smoother
|
|
40
|
- Noise distributions are often non-Gaussian, with much heavier tails. Noise samples from the tails are called outliers.
- Sources of outliers (multiple motions):
- specularities / highlights
- jpeg artifacts / interlacing / motion blur
- multiple motions (occlusion boundaries, transparency)
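A tiny numerical illustration of why heavy tails matter (the residual values and truncation threshold are made up): with a quadratic penalty a single outlier dominates the total cost, whereas a robust, truncated quadratic caps its influence.

```python
# Quadratic (least-squares) vs. truncated quadratic (robust) penalties.

def quadratic(r):
    return r * r

def truncated_quadratic(r, tau=2.0):
    """Cap the penalty at tau^2 so outliers cannot dominate the fit."""
    return min(r * r, tau * tau)

residuals = [0.1, -0.2, 0.15, 50.0]   # last residual is an outlier
ls_cost = sum(quadratic(r) for r in residuals)
robust_cost = sum(truncated_quadratic(r) for r in residuals)
```

Under least squares the single outlier contributes 2500 of the cost and would drag the motion estimate toward it; the truncated penalty limits it to 4.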
|
|
41
|
|
|
42
|
|
|
43
|
|
|
44
|
|
|
45
|
- Specify more detailed warp function
- Examples:
- splines
- triangles
- optical flow (per-pixel motion)
|
|
46
|
- Move control points to specify spline warp
|
|
47
|
- How can we in-between two images?
- Cross-dissolve
(all examples from [Gomes et al.’99])
|
|
48
|
- How can we in-between two images?
- Warp then cross-dissolve = morph
|
|
49
|
- How can we specify the warp?
- Specify corresponding points
- interpolate to a complete warping function
- [Nielson, Scattered Data Modeling, IEEE CG&A’93]
|
|
50
|
- How can we specify the warp?
- Specify corresponding vectors
- interpolate to a complete warping function
|
|
51
|
- How can we specify the warp?
- Specify corresponding vectors
- interpolate [Beier & Neely, SIGGRAPH’92]
|
|
52
|
- How can we specify the warp?
- Specify corresponding spline control points
- interpolate to a complete warping function
|
|
53
|
|
|
54
|
|
|
55
|
- How can we describe this scene?
|
|
56
|
- Break image up into square blocks
- Estimate translation for each block
- Use this to predict next frame, code difference (MPEG-2)
|
|
57
|
- Break image sequence up into “layers”:
- Describe each layer’s motion
|
|
58
|
- Advantages:
- can represent occlusions / disocclusions
- each layer’s motion can be smooth
- video segmentation for semantic processing
- Difficulties:
- how do we determine the correct number?
- how do we assign pixels?
- how do we model the motion?
|
|
59
|
|
|
60
|
- Convert masked images into a background sprite for layered video coding
|
|
61
|
- [Wang & Adelson, 1994]
- intensities
- alphas
- velocities
|
|
62
|
|
|
63
|
- compute coarse-to-fine flow
- estimate affine motion in blocks (regression)
- cluster with k-means
- assign pixels to best fitting affine region
- re-estimate affine motions in each region…
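The clustering step can be sketched with a bare-bones k-means; for brevity this toy version clusters scalar per-block translations rather than full six-parameter affine vectors (all names illustrative):

```python
# k-means on per-block motion parameters (1-D toy version).

def kmeans_1d(values, centers, iterations=20):
    for _ in range(iterations):
        # assignment step: each value goes to its nearest center
        groups = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        # update step: move each center to the mean of its group
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers
```

Blocks whose motion parameters land in the same cluster become a candidate layer; the layer motions are then re-estimated from their assigned pixels, as the slide describes.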
|
|
64
|
- For each layer:
- stabilize the sequence with the affine motion
- compute median value at each pixel
- Determine occlusion relationships
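The per-pixel median step can be sketched directly (toy list-of-lists frames; assumes the frames have already been stabilized into alignment by the affine motion):

```python
# Per-pixel median across stabilized frames: transient foreground pixels
# appear in only a few frames, so the median recovers the background layer.
import statistics

def median_composite(frames):
    """frames: list of equal-sized 2-D images (lists of lists)."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[statistics.median(f[y][x] for f in frames) for x in range(w)]
            for y in range(h)]
```

The median is itself a robust estimator, so a foreground object passing through a pixel in a minority of frames does not corrupt the layer's intensity map.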
|
|
65
|
|
|
66
|
- L. Williams. Pyramidal parametrics. Computer Graphics, 17(3):1–11, July 1983.
- L. G. Brown. A survey of image registration techniques. Computing Surveys, 24(4):325–376, December 1992.
- C. D. Kuglin and D. C. Hines. The phase correlation image alignment method. In IEEE 1975 Conference on Cybernetics and Society, pages 163–165, New York, September 1975.
- J. Gomes, L. Darsa, B. Costa, and L. Velho. Warping and Morphing of Graphical Objects. Morgan Kaufmann, 1999.
- T. Beier and S. Neely. Feature-based image metamorphosis. Computer Graphics (SIGGRAPH’92), 26(2):35–42, July 1992.
|
|
67
|
- J. R. Bergen, P. Anandan, K. J. Hanna, and R. Hingorani. Hierarchical model-based motion estimation. In ECCV’92, pages 237–252, Italy, May 1992.
- M. J. Black and P. Anandan. The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. Comp. Vis. Image Understanding, 63(1):75–104, 1996.
- J. Shi and C. Tomasi. Good features to track. In CVPR’94, pages 593–600, IEEE Computer Society, Seattle, 1994.
- S. Baker and I. Matthews. Lucas-Kanade 20 years on: A unifying framework: Part 1: The quantity approximated, the warp update rule, and the gradient descent approximation. IJCV, 56(3):221–255, 2004.
|
|
68
|
- H. S. Sawhney and S. Ayer. Compact representation of videos through dominant multiple motion estimation. IEEE Trans. Patt. Anal. Mach. Intel., 18(8):814–830, August 1996.
- Y. Weiss. Smoothness in layers: Motion segmentation using nonparametric mixture estimation. In CVPR’97, pages 520–526, June 1997.
- J. Y. A. Wang and E. H. Adelson. Representing moving images with layers. IEEE Transactions on Image Processing, 3(5):625–638, September 1994.
|
|
69
|
- Y. Weiss and E. H. Adelson. A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models. In CVPR’96, pages 321–326, San Francisco, California, June 1996.
- Y. Weiss. Smoothness in layers: Motion segmentation using nonparametric mixture estimation. In CVPR’97, pages 520–526, San Juan, Puerto Rico, June 1997.
- P. R. Hsu, P. Anandan, and S. Peleg. Accurate computation of optical flow by using layered motion representations. In ICPR’94, pages 743–746, Jerusalem, Israel, October 1994. IEEE Computer Society Press.
|
70
|
- T. Darrell and A. Pentland. Cooperative robust estimation using layers of support. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(5):474–487, May 1995.
- S. X. Ju, M. J. Black, and A. D. Jepson. Skin and bones: Multi-layer, locally affine, optical flow and regularization with transparency. In CVPR’96, pages 307–314, San Francisco, California, June 1996.
- M. Irani, B. Rousso, and S. Peleg. Computing occluding and transparent motions. International Journal of Computer Vision, 12(1):5–16, January 1994.
- H. S. Sawhney and S. Ayer. Compact representation of videos through dominant multiple motion estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):814–830, August 1996.
- M.-C. Lee et al. A layered video object coding system using sprite and affine motion model. IEEE Transactions on Circuits and Systems for Video Technology, 7(1):130–145, February 1997.
|
|
71
|
- S. Baker, R. Szeliski, and P. Anandan. A layered approach to stereo reconstruction. In CVPR’98, pages 434–441, Santa Barbara, June 1998.
- R. Szeliski, S. Avidan, and P. Anandan. Layer extraction from multiple images containing reflections and transparency. In CVPR’2000, volume 1, pages 246–253, Hilton Head Island, June 2000.
- J. Shade, S. Gortler, L.-W. He, and R. Szeliski. Layered depth images. In Computer Graphics (SIGGRAPH’98) Proceedings, pages 231–242, Orlando, July 1998. ACM SIGGRAPH.
- S. Laveau and O. D. Faugeras. 3-D scene representation as a collection of images. In ICPR’94, volume A, pages 689–691, Jerusalem, Israel, October 1994. IEEE Computer Society Press.
- P. H. S. Torr, R. Szeliski, and P. Anandan. An integrated Bayesian approach to layer extraction from image sequences. In Seventh ICCV’99, pages 983–990, Kerkyra, Greece, September 1999.
|