Image Alignment and Stitching
Computer Vision
CSE576, Spring 2005
Richard Szeliski

Today’s lecture
Image alignment and stitching
motion models
cylindrical and spherical warping
point-based alignment
global alignment
automated stitching (recognizing panoramas)
ghost and parallax removal
compositing and blending

Readings
Szeliski & Shum, SIGGRAPH'97
(Sections 1-4).
Szeliski, Image Alignment and Stitching, MSR-TR-2004-92 (Sections 2, 4, 5).
Recognizing Panoramas, Brown & Lowe, ICCV’2003

Motion models

Motion models
What happens when we take two images with a camera and try to align them?
translation?
rotation?
scale?
affine?
perspective?
… see interactive demo (VideoMosaic)

Motion models

Motion models

Homographies
Perspective projection of a plane
Lots of names for this:
homography, texture-map, collineation, planar projective map
Modeled as a 2D warp using homogeneous coordinates
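The 2D warp in homogeneous coordinates can be sketched as follows: lift each point (x, y) to (x, y, 1), multiply by the 3×3 homography, and divide by the third coordinate. A minimal sketch (the translation example is illustrative, not from the slides):

```python
import numpy as np

def apply_homography(H, pts):
    """Warp 2D points by a 3x3 homography using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    # lift (x, y) -> (x, y, 1)
    hom = np.hstack([pts, np.ones((pts.shape[0], 1))])
    # apply the 3x3 map, then divide by the third (homogeneous) coordinate
    warped = hom @ H.T
    return warped[:, :2] / warped[:, 2:3]

# A pure translation is a (trivial) homography:
H = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])
warped = apply_homography(H, [[0.0, 0.0], [1.0, 1.0]])
```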

Plane perspective mosaics
8-parameter generalization of affine motion
works for pure rotation or planar surfaces
Limitations:
local minima
slow convergence
difficult to control interactively

Rotational mosaics
Directly optimize rotation and focal length
Advantages:
ability to build full-view panoramas
easier to control interactively
more stable and accurate estimates

3D → 2D Perspective Projection

3D Rotation Model
Projection equations
Project from image to 3D ray
(x0,y0,z0) = (u0-uc,v0-vc,f)
Rotate the ray by camera motion
(x1,y1,z1) = R01 (x0,y0,z0)
Project back into new (source) image
(u1,v1) = (fx1/z1+uc,fy1/z1+vc)
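The three projection steps above translate directly into code. A minimal sketch of the slide's equations (variable names follow the slide; the sample pixel values are illustrative):

```python
import numpy as np

def rotate_and_reproject(u0, v0, uc, vc, f, R01):
    """Map pixel (u0, v0) through the 3D rotation model."""
    # 1. project from image to 3D ray
    ray0 = np.array([u0 - uc, v0 - vc, f])
    # 2. rotate the ray by the camera motion
    x1, y1, z1 = R01 @ ray0
    # 3. project back into the new (source) image
    return f * x1 / z1 + uc, f * y1 / z1 + vc

# with the identity rotation, each pixel maps to itself
u1, v1 = rotate_and_reproject(100.0, 50.0, 320.0, 240.0, 500.0, np.eye(3))
```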

Image Mosaics (Stitching)
[Szeliski & Shum, SIGGRAPH’97]
[Szeliski, MSR-TR-2004-92]

Image Mosaics (Stitching)

Image Mosaics (stitching)
Blend together several overlapping images into one seamless mosaic (composite)



Mosaics for Video Coding
Convert masked images into a background sprite for content-based coding

Establishing correspondences
Direct method:
Use generalization of affine motion model
[Szeliski & Shum ’97]
Feature-based method
Compute feature-based correspondence
[Lowe ICCV’99; Schmid ICCV’98; Brown & Lowe ICCV’2003]
Compute R from correspondences
(absolute orientation)

Stitching demo

Panoramas
What if you want a 360° field of view?

Cylindrical panoramas
Steps
Reproject each image onto a cylinder
Blend
Output the resulting mosaic

Cylindrical Panoramas
Map image to cylindrical or spherical coordinates
need known focal length

Cylindrical projection
Map 3D point (X,Y,Z) onto cylinder

Cylindrical warping
Given focal length f and image center (xc,yc)
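The warp equations themselves were shown as a figure; a common parameterization maps a pixel to an angle θ around the cylinder and a height h, then scales back by f. A sketch under that assumption (sample values illustrative):

```python
import numpy as np

def cylindrical_coords(x, y, xc, yc, f):
    """Map image pixel (x, y) to cylindrical coordinates, scaled back
    to pixels -- one common parameterization (an assumption here, since
    the slide's formulas were in a figure)."""
    theta = np.arctan2(x - xc, f)        # angle around the cylinder axis
    h = (y - yc) / np.hypot(x - xc, f)   # height on the unit cylinder
    return f * theta + xc, f * h + yc

# the image center maps to itself
xp, yp = cylindrical_coords(320.0, 240.0, 320.0, 240.0, 500.0)
```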

Spherical warping
Given focal length f and image center (xc,yc)

3D rotation
Rotate image before placing on unrolled sphere

Radial distortion
Correct for “bending” in wide field of view lenses

Fisheye lens
Extreme “bending” in ultra-wide fields of view

Inverse Warping
Get each pixel I0(u0) from its corresponding location u1 = h(u0) in I1(u1)
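A minimal sketch of inverse warping with bilinear resampling (the half-pixel-shift warp at the end is an illustrative example, not from the slides):

```python
import numpy as np

def inverse_warp(I1, h, out_shape):
    """Fill each output pixel I0(u0) by sampling I1 at u1 = h(u0),
    using bilinear interpolation; out-of-range samples stay zero."""
    H, W = out_shape
    I0 = np.zeros(out_shape)
    rows, cols = I1.shape
    for v0 in range(H):
        for u0 in range(W):
            u1, v1 = h(u0, v0)
            iu, iv = int(np.floor(u1)), int(np.floor(v1))
            if 0 <= iu < cols - 1 and 0 <= iv < rows - 1:
                a, b = u1 - iu, v1 - iv
                I0[v0, u0] = ((1 - a) * (1 - b) * I1[iv, iu]
                              + a * (1 - b) * I1[iv, iu + 1]
                              + (1 - a) * b * I1[iv + 1, iu]
                              + a * b * I1[iv + 1, iu + 1])
    return I0

# shifting by half a pixel averages horizontal neighbors
I1 = np.array([[0.0, 1.0], [0.0, 1.0]])
I0 = inverse_warp(I1, lambda u, v: (u + 0.5, float(v)), (2, 2))
```

Inverse (rather than forward) warping guarantees every output pixel gets a value, with no holes.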

Image Stitching
Align the images over each other
camera pan ↔ translation on cylinder!
Blend the images together  (demo)

Project 2 – image stitching
Take pictures on a tripod (or handheld)
Warp images to spherical coordinates
Extract features
Align neighboring pairs using RANSAC
Write out list of neighboring translations
Correct for drift
Read in warped images and blend them
Crop the result and import into a viewer

Matching features

RAndom SAmple Consensus

RAndom SAmple Consensus

Least squares fit
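The RANSAC-then-least-squares idea in these slides can be illustrated with the simplest motion model, a 2D translation (the model choice and point data here are illustrative assumptions): repeatedly sample a minimal set of matches, count inliers, then fit on the largest inlier set.

```python
import numpy as np

def ransac_translation(p0, p1, n_iters=100, thresh=1.0, seed=0):
    """Estimate a 2D translation t with p1 ~ p0 + t from noisy matches:
    sample one match per hypothesis, count inliers, then do a
    least-squares fit on the largest inlier set (a minimal sketch)."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(p0), dtype=bool)
    for _ in range(n_iters):
        i = rng.integers(len(p0))            # minimal sample: one match
        t = p1[i] - p0[i]
        resid = np.linalg.norm(p1 - (p0 + t), axis=1)
        inliers = resid < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # least-squares fit for a translation = mean displacement of inliers
    return (p1[best_inliers] - p0[best_inliers]).mean(axis=0)

p0 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
p1 = p0 + [2.0, 3.0]
p1[3] = [50.0, 50.0]                         # one gross outlier
t = ransac_translation(p0, p1)
```

The outlier is rejected by the consensus test, so it never contaminates the final fit.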

Assembling the panorama
Stitch pairs together, blend, then crop

Problem:  Drift
Error accumulation
small (vertical) errors accumulate over time
apply a correction so that the offsets sum to zero (for a 360° pan)
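The correction above is just a redistribution of the accumulated error. A minimal sketch for per-pair vertical offsets (the sample offsets are illustrative):

```python
import numpy as np

def correct_drift(dy):
    """For a closed 360-degree pan, subtract the mean per-pair offset
    so the corrected vertical translations sum to zero."""
    dy = np.asarray(dy, dtype=float)
    return dy - dy.sum() / len(dy)

# four pairwise vertical offsets that should close the loop
dy = correct_drift([1.0, 2.0, 1.0, 0.0])
```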

Full-view (360° spherical) panoramas

Full-view Panoramas

Global alignment
Register all pairwise overlapping images
Use a 3D rotation model (one R per image)
Use feature based registration of unwarped images
Discover which images overlap other images using feature selection (RANSAC)
Chain together inter-frame rotations
Optimize all R estimates together (next time)

3D Rotation Model
Projection equations
Project from image to 3D ray
(x0,y0,z0) = (u0-uc,v0-vc,f)
Rotate the ray by camera motion
(x1,y1,z1) = R01 (x0,y0,z0)
Project back into new (source) image
(u1,v1) = (fx1/z1+uc,fy1/z1+vc)

Absolute orientation
[Arun et al., PAMI 1987] [Horn et al., JOSA A 1988]
Procrustes Algorithm [Golub & VanLoan]
Given two sets of matching points, compute R
pi’ = R pi     with 3D rays
pi = (xi,yi,zi) = (ui-uc,vi-vc,f)
A = Σi pi pi'T = (Σi pi piT) RT = (U S UT) RT
SVD:  A = U S VT  ⇒  VT = UT RT
R = V UT
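The Procrustes solution above is a few lines of linear algebra. A sketch following the slide's derivation (the reflection guard and the sample rotation are additions, not from the slide):

```python
import numpy as np

def absolute_orientation(p, q):
    """Procrustes fit of a rotation R with q_i ~ R p_i from matched
    3D rays: A = sum_i p_i q_i^T = U S V^T, then R = V U^T."""
    A = p.T @ q                       # 3x3, equals sum_i outer(p_i, q_i)
    U, S, Vt = np.linalg.svd(A)
    R = Vt.T @ U.T
    # guard against a reflection (det = -1) from degenerate data
    if np.linalg.det(R) < 0:
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R

# recover a 90-degree rotation about z from matched rays
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
p = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0], [1.0, 2.0, 3.0]])
q = p @ Rz.T                          # q_i = Rz p_i
R = absolute_orientation(p, q)
```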

Stitching demo

Texture Mapped Model (sphere)

Texture Mapped Model (cubical)

Recognizing Panoramas
Matthew Brown & David Lowe
ICCV’2003

Recognizing Panoramas

Finding the panoramas

Finding the panoramas

Finding the panoramas

Finding the panoramas

Fully automated 2D stitching

Get your own copy!

System components
Feature detection and description
more uniform point density
Fast matching (hash table)
RANSAC filtering of matches
Intensity-based verification
Incremental bundle adjustment
[Brown, Szeliski, Winder, CVPR’05]

Probabilistic Feature Matching

RANSAC motion model

RANSAC motion model

RANSAC motion model

Probabilistic model for verification

How well does this work?
Test on 100s of examples…

How well does this work?
Test on 100s of examples…
…still too many failures (5-10%)
for consumer application

Matching Mistakes: False Positive

Matching Mistakes: False Positive

Matching Mistakes: False Negative
Moving objects: large areas of disagreement

Matching Mistakes
Accidental alignment
repeated / similar regions
Failed alignments
moving objects / parallax
low overlap
“feature-less” regions
(more variety?)
No 100% reliable algorithm?

How can we fix these?
Tune the feature detector
Tune the feature matcher (cost metric)
Tune the RANSAC stage (motion model)
Tune the verification stage
Use “higher-level” knowledge
e.g., typical camera motions
→ Sounds like a big “learning” problem
Need a large training/test data set (panoramas)

Deghosting and blending
(optional material)

Local alignment (deghosting)
Use local optic flow to compensate for small motions [Shum & Szeliski, ICCV’98]

Local alignment (deghosting)
Use local optic flow to compensate for radial distortion [Shum & Szeliski, ICCV’98]

Image feathering
Weight each image proportional to its distance from the edge
(distance map [Danielsson, CVGIP 1980])
Cut out the appropriate region from each image and then blend together
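The feathering idea above can be sketched end to end: compute a distance-to-edge map per image mask, then take the distance-weighted average. The two-pass city-block distance transform below is a simple stand-in for the Danielsson algorithm cited on the slide (an assumption for illustration):

```python
import numpy as np

def distance_map(mask):
    """Two-pass city-block distance to the nearest zero pixel
    (or image border) -- a simple chamfer-style approximation."""
    rows, cols = mask.shape
    d = np.where(mask, rows + cols, 0).astype(float)
    for i in range(rows):                          # forward pass
        for j in range(cols):
            if d[i, j]:
                up = d[i - 1, j] if i else 0
                left = d[i, j - 1] if j else 0
                d[i, j] = min(d[i, j], up + 1, left + 1)
    for i in range(rows - 1, -1, -1):              # backward pass
        for j in range(cols - 1, -1, -1):
            if d[i, j]:
                down = d[i + 1, j] if i < rows - 1 else 0
                right = d[i, j + 1] if j < cols - 1 else 0
                d[i, j] = min(d[i, j], down + 1, right + 1)
    return d

def feather_blend(images, masks):
    """Average the images, weighting each pixel by its mask's
    distance to the nearest edge (feathering)."""
    w = np.array([distance_map(m) for m in masks])
    total = w.sum(axis=0)
    total[total == 0] = 1.0                        # avoid division by zero
    return (w * np.array(images)).sum(axis=0) / total

# two constant images with full overlap blend to their average
imgs = [np.zeros((3, 3)), np.full((3, 3), 10.0)]
masks = [np.ones((3, 3), dtype=bool)] * 2
out = feather_blend(imgs, masks)
```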

Region-based de-ghosting
Select only one image in regions-of-difference using weighted vertex cover
[Uyttendaele et al., CVPR’01]

Region-based de-ghosting
Select only one image in regions-of-difference using weighted vertex cover
[Uyttendaele et al., CVPR’01]

Cutout-based de-ghosting
Select only one image per output pixel, using spatial continuity
Blend across seams using gradient continuity (“Poisson blending”)



[Agarwala et al., SG’2004]

Cutout-based compositing
Photomontage [Agarwala et al., SG’2004]
Interactively blend different images:
group portraits

Cutout-based compositing
Photomontage [Agarwala et al., SG’2004]
Interactively blend different images:
focus settings

Cutout-based compositing
Photomontage [Agarwala et al., SG’2004]
Interactively blend different images:
people’s faces

Final thought:  What is a “panorama”?
Tracking a subject
Repeated (best) shots
Multiple exposures
“Infer” what photographer wants?