|
1
|
- Computer Vision
CSE576, Spring 2005
Richard Szeliski
|
|
2
|
- Image alignment and stitching
- motion models
- cylindrical and spherical warping
- point-based alignment
- global alignment
- automated stitching (recognizing panoramas)
- ghost and parallax removal
- compositing and blending
|
|
3
|
- Szeliski & Shum, SIGGRAPH'97 (Sections 1-4)
- Szeliski, Image Alignment and Stitching, MSR-TR-2004-92 (Sections 2, 4, 5)
- Brown & Lowe, Recognizing Panoramas, ICCV'2003
|
|
4
|
|
|
5
|
- What happens when we take two images with a camera and try to align them?
- translation?
- rotation?
- scale?
- affine?
- perspective?
- … see interactive demo (VideoMosaic)
|
|
6
|
|
|
7
|
|
|
8
|
- Perspective projection of a plane
- Lots of names for this:
- homography, texture-map, collineation, planar projective map
- Modeled as a 2D warp using homogeneous coordinates
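Concretely, the warp is a 3×3 matrix applied to pixels in homogeneous coordinates, followed by a divide. A minimal sketch (the function name is ours, not from the slides):

```python
import numpy as np

def warp_point(H, u, v):
    """Apply a 3x3 homography H to pixel (u, v) in homogeneous coordinates."""
    x, y, w = H @ np.array([u, v, 1.0])  # lift to (u, v, 1), multiply
    return x / w, y / w                  # divide out the homogeneous scale
```

For a pure translation the bottom row is (0, 0, 1) and the divide is a no-op; the projective effects come entirely from a nonzero bottom row.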
|
|
9
|
- 8-parameter generalization of affine motion
- works for pure rotation or planar surfaces
- Limitations:
- local minima
- slow convergence
- difficult to control interactively
|
|
10
|
- Directly optimize rotation and focal length
- Advantages:
- ability to build full-view panoramas
- easier to control interactively
- more stable and accurate estimates
|
|
11
|
|
|
12
|
- Projection equations
- Project from image to 3D ray
- (x0,y0,z0) = (u0-uc,v0-vc,f)
- Rotate the ray by camera motion
- (x1,y1,z1) = R01 (x0,y0,z0)
- Project back into new (source) image
- (u1,v1) = (fx1/z1+uc,fy1/z1+vc)
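The three projection steps above translate directly into code. A sketch (the function name and argument order are ours):

```python
import numpy as np

def rotate_pixel(u0, v0, R01, f, uc, vc):
    """Map pixel (u0, v0) into the rotated view: lift to a 3D ray,
    rotate by the camera motion R01, then reproject."""
    ray = np.array([u0 - uc, v0 - vc, f])  # (x0, y0, z0)
    x1, y1, z1 = R01 @ ray                 # (x1, y1, z1) = R01 (x0, y0, z0)
    return f * x1 / z1 + uc, f * y1 / z1 + vc
```

With the identity rotation the pixel maps to itself, which is a useful sanity check when wiring this into a warper.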
|
|
13
|
- [Szeliski & Shum, SIGGRAPH’97]
- [Szeliski, MSR-TR-2004-92]
|
|
14
|
|
|
15
|
- Blend together several overlapping images into one seamless mosaic (composite)
|
|
16
|
- Convert masked images into a background sprite for content-based coding
|
|
17
|
- Direct method:
- Use generalization of affine motion model [Szeliski & Shum '97]
- Feature-based method:
- Compute feature-based correspondences [Lowe ICCV'99; Schmid ICCV'98; Brown & Lowe ICCV'2003]
- Compute R from correspondences (absolute orientation)
|
|
18
|
|
|
19
|
- What if you want a 360° field of view?
|
|
20
|
- Steps
- Reproject each image onto a cylinder
- Blend
- Output the resulting mosaic
|
|
21
|
- Map image to cylindrical or spherical coordinates
|
|
22
|
- Map 3D point (X,Y,Z) onto cylinder
|
|
23
|
- Given focal length f and image center (xc,yc)
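One common form of this mapping takes the ray through each pixel, converts it to an angle and a height on the cylinder, and scales by f so the image center is unchanged. A sketch (names are ours; the slides give only f and the image center):

```python
import numpy as np

def cylindrical_coords(x, y, f, xc, yc):
    """Map image pixel (x, y) to unrolled-cylinder coordinates,
    given focal length f and image center (xc, yc)."""
    theta = np.arctan2(x - xc, f)          # angle around the cylinder axis
    h = (y - yc) / np.hypot(x - xc, f)     # height on the unit cylinder
    return f * theta + xc, f * h + yc      # scale by f, recenter
```

At the image center theta and h are both zero, so the center pixel stays put; distortion grows toward the image edges, which is what produces the characteristic "bow-tie" shape of cylindrically warped images.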
|
|
24
|
- Given focal length f and image center (xc,yc)
|
|
25
|
- Rotate image before placing on unrolled sphere
|
|
26
|
- Correct for “bending” in wide field of view lenses
|
|
27
|
- Extreme “bending” in ultra-wide fields of view
|
|
28
|
- Get each pixel I0(u0) from its corresponding location u1 = h(u0) in I1(u1)
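This inverse-warping rule, combined with bilinear resampling, might be sketched as follows (a slow pure-Python loop, kept simple for clarity; names are ours):

```python
import numpy as np

def inverse_warp(I1, h, shape):
    """Build I0 by pulling each output pixel from u1 = h(u0) in I1,
    using bilinear interpolation. h maps (u, v) -> (x, y) in I1."""
    H, W = shape
    I0 = np.zeros(shape)
    for v in range(H):
        for u in range(W):
            x, y = h(u, v)
            x0, y0 = int(np.floor(x)), int(np.floor(y))
            if 0 <= x0 < I1.shape[1] - 1 and 0 <= y0 < I1.shape[0] - 1:
                a, b = x - x0, y - y0  # fractional offsets
                I0[v, u] = ((1 - a) * (1 - b) * I1[y0, x0]
                            + a * (1 - b) * I1[y0, x0 + 1]
                            + (1 - a) * b * I1[y0 + 1, x0]
                            + a * b * I1[y0 + 1, x0 + 1])
    return I0
```

Pulling pixels from the source (rather than pushing) guarantees every output pixel gets a value, with no holes or overlaps.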
|
|
29
|
- Align the images over each other
- camera pan ↔ translation on cylinder!
- Blend the images together (demo)
|
|
30
|
- Take pictures on a tripod (or handheld)
- Warp images to spherical coordinates
- Extract features
- Align neighboring pairs using RANSAC
- Write out list of neighboring translations
- Correct for drift
- Read in warped images and blend them
- Crop the result and import into a viewer
|
|
31
|
|
|
32
|
|
|
33
|
|
|
34
|
|
|
35
|
- Stitch pairs together, blend, then crop
|
|
36
|
- Error accumulation
- small (vertical) errors accumulate over time
- apply correction so that the sum = 0 (for a 360° panorama)
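One simple way to enforce "sum = 0" is to subtract the mean vertical error from every pairwise translation, spreading the accumulated drift evenly over all pairs (a sketch; the slides do not prescribe this exact scheme):

```python
import numpy as np

def remove_drift(translations):
    """Given per-pair (dx, dy) translations around a 360-degree pan,
    subtract the mean vertical error so the dy components sum to zero."""
    t = np.asarray(translations, dtype=float)
    t[:, 1] -= t[:, 1].mean()  # distribute the residual evenly
    return t
```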
|
|
37
|
|
|
38
|
|
|
39
|
- Register all pairwise overlapping images
- Use a 3D rotation model (one R per image)
- Use feature based registration of unwarped images
- Discover which images overlap other images using feature selection (RANSAC)
- Chain together inter-frame rotations
- Optimize all R estimates together (next time)
|
|
40
|
- Projection equations
- Project from image to 3D ray
- (x0,y0,z0) = (u0-uc,v0-vc,f)
- Rotate the ray by camera motion
- (x1,y1,z1) = R01 (x0,y0,z0)
- Project back into new (source) image
- (u1,v1) = (fx1/z1+uc,fy1/z1+vc)
|
|
41
|
- Procrustes algorithm [Golub & Van Loan]; absolute orientation [Arun et al., PAMI 1987; Horn et al., JOSA A 1988]
- Given two sets of matching 3D rays, compute R such that pi' = R pi
- pi = (xi,yi,zi) = (ui-uc,vi-vc,f)
- A = Σi pi pi'T = Σi pi piT RT = (U S UT) RT = U S VT
- so VT = UT RT, i.e. R = V UT
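The derivation above translates almost line-for-line into code, with one standard addition the slides omit: a determinant check to rule out reflections (a sketch; the function name is ours):

```python
import numpy as np

def absolute_orientation(p, p_prime):
    """Find rotation R with p_prime[i] ~= R @ p[i].
    p, p_prime: (N, 3) arrays of matched 3D rays, one per row."""
    A = p.T @ p_prime              # A = sum_i p_i p_i'^T
    U, S, Vt = np.linalg.svd(A)    # A = U S V^T
    R = Vt.T @ U.T                 # R = V U^T
    if np.linalg.det(R) < 0:       # guard against an improper rotation
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R
```

With noisy correspondences this gives the least-squares-optimal rotation; with exact ones it recovers R to machine precision.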
|
|
42
|
|
|
43
|
|
|
44
|
|
|
45
|
- Matthew Brown & David Lowe
- ICCV’2003
|
|
46
|
|
|
47
|
|
|
48
|
|
|
49
|
|
|
50
|
|
|
51
|
|
|
52
|
|
|
53
|
- Feature detection and description
- more uniform point density
- Fast matching (hash table)
- RANSAC filtering of matches
- Intensity-based verification
- Incremental bundle adjustment
- [Brown, Szeliski, Winder, CVPR’05]
|
|
54
|
|
|
55
|
|
|
56
|
|
|
57
|
|
|
58
|
|
|
59
|
- Test on 100s of examples…
|
|
60
|
- Test on 100s of examples…
- …still too many failures (5-10%) for a consumer application
|
|
61
|
|
|
62
|
|
|
63
|
- Moving objects: large areas of disagreement
|
|
64
|
- Accidental alignment
- repeated / similar regions
- Failed alignments
- moving objects / parallax
- low overlap
- “feature-less” regions (more variety?)
- No 100% reliable algorithm?
|
|
65
|
- Tune the feature detector
- Tune the feature matcher (cost metric)
- Tune the RANSAC stage (motion model)
- Tune the verification stage
- Use “higher-level” knowledge
- e.g., typical camera motions
- → Sounds like a big “learning” problem
- Need a large training/test data set (panoramas)
|
|
66
|
|
|
67
|
- Use local optic flow to compensate for small motions [Shum & Szeliski, ICCV'98]
|
|
68
|
- Use local optic flow to compensate for radial distortion [Shum & Szeliski, ICCV'98]
|
|
69
|
- Weight each image proportionally to its distance from the edge (distance map [Danielsson, CVGIP 1980])
- Cut out the appropriate region from each image, then blend together
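Distance-map ("feather") weighting can be sketched with a Euclidean distance transform, here via `scipy.ndimage.distance_transform_edt` (an assumption of ours; the slides name only the distance map itself):

```python
import numpy as np
from scipy import ndimage

def feather_blend(images, masks):
    """Blend warped images, weighting each pixel by its distance to the
    nearest edge of that image's valid region (its distance map)."""
    acc = np.zeros_like(images[0], dtype=float)
    wsum = np.zeros_like(acc)
    for img, mask in zip(images, masks):
        w = ndimage.distance_transform_edt(mask)  # 0 at edges, large inside
        acc += w * img
        wsum += w
    return acc / np.maximum(wsum, 1e-12)  # avoid divide-by-zero off-mosaic
```

Near a seam each image's weight falls smoothly to zero, so exposure differences fade across the overlap instead of appearing as a hard edge.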
|
|
70
|
- Select only one image in regions-of-difference using weighted vertex cover [Uyttendaele et al., CVPR'01]
|
|
71
|
- Select only one image in regions-of-difference using weighted vertex cover [Uyttendaele et al., CVPR'01]
|
|
72
|
- Select only one image per output pixel, using spatial continuity
- Blend across seams using gradient continuity (“Poisson blending”) [Agarwala et al., SG'2004]
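In one dimension, gradient-domain ("Poisson") blending reduces to a small linear system: match the source's second differences inside the seam region while pinning the boundary values from the composite. A toy sketch of the idea (not the 2D method of the paper):

```python
import numpy as np

def poisson_blend_1d(base, src, lo, hi):
    """Recompute base[lo:hi] so its gradients match src's, keeping
    base[lo-1] and base[hi] fixed as boundary values."""
    n = hi - lo
    # Discrete Poisson equation: 2x[i] - x[i-1] - x[i+1] = 2s[i] - s[i-1] - s[i+1]
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.array([2 * src[lo + i] - src[lo + i - 1] - src[lo + i + 1]
                  for i in range(n)])
    b[0] += base[lo - 1]   # known boundary values move to the right-hand side
    b[-1] += base[hi]
    out = base.copy()
    out[lo:hi] = np.linalg.solve(A, b)
    return out
```

Because only gradients of the source are kept, any constant offset between the two signals is absorbed smoothly across the region instead of showing up as a step at the seam.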
|
|
73
|
- Photomontage [Agarwala et al., SG’2004]
- Interactively blend different images: group portraits
|
|
74
|
- Photomontage [Agarwala et al., SG’2004]
- Interactively blend different images: focus settings
|
|
75
|
- Photomontage [Agarwala et al., SG’2004]
- Interactively blend different images: people’s faces
|
|
76
|
- Tracking a subject
- Repeated (best) shots
- Multiple exposures
- “Infer” what photographer wants?
|