Computer Vision (CSE 455), Winter 2012

Project 2: Panorama Mosaic Stitching

Synopsis

In this component, you will use the feature detection and matching component to combine a series of photographs into a 360° panorama. Your software will automatically align the photographs (determine their overlap and relative positions) and then blend the resulting photos into a single seamless panorama. You will then be able to view the resulting panorama inside an interactive Web viewer.

Skeleton Code

You'll need the following files for this section:

  1. The "Panorama" project skeleton and its solution executable
  2. The "Panorama" image set for testing. It will also go in your artifact.

Taking the Pictures

Each group will check out a panorama kit (camera, tripod, and Kaidan head). Remember to bring extra batteries with you, because these cameras drain them.

Taking photos

Take a series of photos with a digital camera mounted on a tripod. Please read this web page explaining how to use the equipment before you go out to shoot. Then you should borrow the Kaidan head that lets you make precise rotations and the Canon PowerShot A10 camera. For best results, overlap each image by 50% with the previous one, and keep the camera level using the levelers on the Kaidan head.

Also take a series of images with a handheld camera. You can use your own or the Canon PowerShot A10 camera that you signed up for. The Canon camera has a "stitch assist" mode that helps you overlap your images correctly; note that this mode only works in regular landscape mode.

Make sure the images are right side up (rotate the images by 90° if you took them in landscape mode), and reduce them to a more workable size (480x640 recommended). You can use external software such as Photoshop or GIMP to do this. Alternatively, you can set the camera to 640x480 resolution from the start by following the steps below:

  1. Turn the mode dial on the back of the camera to one of the 3 shooting modes--auto (camera icon), manual (camera icon + M) or stitch assist (overlaid rectangles).
  2. Press MENU button.
  3. Press the left/right arrow to choose Resolution, then press SET.
  4. Press the left/right arrow and choose S (640x480).
  5. Press MENU again.

Camera Parameters

The following focal lengths are valid only if the camera is zoomed out all the way:

Camera                                Resolution   Focal length (pixels)   k1         k2
"Panorama" test images                384x512      595.00000               -0.15000   0.00000
Canon PowerShot A10, tag CS30012716   480x640      678.21239               -0.21001   0.26169
Canon PowerShot A10, tag CS30012717   480x640      677.50487               -0.20406   0.23276
Canon PowerShot A10, tag CS30012718   480x640      676.48417               -0.20845   0.25624
Canon PowerShot A10, tag CS30012927   480x640      671.16649               -0.19270   0.30098
Canon PowerShot A10, tag CS30012928   480x640      674.82258               -0.21528   0.30098
Canon PowerShot A10, tag CS30012929   480x640      674.79106               -0.21483   0.32286

If you are using your own camera, you have to estimate the focal length. The simplest way to do this is through the EXIF tags of the images, as described by Noah Snavely (a previous TA). Alternatively, you can use a camera calibration toolkit to get more precise focal length and radial distortion coefficients. Finally, Brett Allen describes one creative way to measure rough focal length using just a book and a box.
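As a rough illustration of the EXIF approach, the focal length in pixels can be approximated from the focal length in millimetres, the physical sensor (CCD) width, and the image width in pixels. The sketch below assumes you have looked up the sensor width for your camera model yourself, since it is usually not stored in the EXIF tags; the function name is hypothetical and not part of the skeleton code.

```cpp
#include <cassert>

// Approximate focal length in pixels from EXIF-style quantities.
//   focalMm       : focal length in millimetres (EXIF FocalLength tag)
//   sensorWidthMm : physical sensor width in millimetres (assumption: looked
//                   up for the camera model, not read from the image)
//   imageWidthPx  : number of pixels along the image side that corresponds
//                   to the sensor width (the long side for most cameras)
double focalLengthInPixels(double focalMm, double sensorWidthMm, int imageWidthPx)
{
    assert(focalMm > 0.0 && sensorWidthMm > 0.0 && imageWidthPx > 0);
    return focalMm / sensorWidthMm * imageWidthPx;
}
```

For example, a made-up camera with a 5.0 mm focal length and a 5.0 mm wide sensor gives 640 pixels for an image whose long side is 640 pixels. This estimate ignores radial distortion, so treat it as a starting point only.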

Image formatting

To Do

Note: The skeleton code includes an image library, ImageLib, that is fairly general and complex. It is NOT necessary for you to peek extensively into this library! We have created some notes for you here.

  1. Warp each image into spherical coordinates. (file: WarpSpherical.cpp, routine: warpSphericalField)

    [TODO] Compute the inverse map to warp the image by filling in the skeleton code in the warpSphericalField routine to:

    1. convert the given spherical image coordinate into the corresponding planar image coordinate using the coordinate transformation equation from the lecture notes
    2. apply radial distortion using the equation from the lecture notes

    You will have to use the focal length f estimates for the half-resolution images provided above. You can either take pictures and save them in small files or save them in large files and reduce them afterwards. If you use a different image size, remember to scale f according to the image size.
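The inverse map described in this step might look roughly like the sketch below. It assumes the usual spherical projection (angles obtained by dividing the offset from the image centre by f) and a two-coefficient polynomial radial distortion model; check both against the equations in the lecture notes before relying on it. Point2, sphericalToPlanar, and scaleFocalLength are hypothetical names, not part of the skeleton or ImageLib.

```cpp
#include <cassert>
#include <cmath>

struct Point2 { double x, y; };  // hypothetical helper type

// Map a pixel (xs, ys) of the spherical image back to the planar source image.
// (xc, yc) is the image centre, f the focal length in pixels, k1 and k2 the
// radial distortion coefficients. Assumes |theta| < 90 degrees so that the
// ray points in front of the camera (zh > 0).
Point2 sphericalToPlanar(double xs, double ys, double xc, double yc,
                         double f, double k1, double k2)
{
    assert(f > 0.0);
    // 1. Spherical pixel -> angles -> unit direction on the sphere.
    double theta = (xs - xc) / f;
    double phi = (ys - yc) / f;
    double xh = std::sin(theta) * std::cos(phi);
    double yh = std::sin(phi);
    double zh = std::cos(theta) * std::cos(phi);
    // 2. Project onto the plane z = 1 (normalized image coordinates).
    double xn = xh / zh;
    double yn = yh / zh;
    // 3. Apply radial distortion in normalized coordinates.
    double r2 = xn * xn + yn * yn;
    double d = 1.0 + k1 * r2 + k2 * r2 * r2;
    // 4. Convert back to pixel coordinates.
    return { f * xn * d + xc, f * yn * d + yc };
}

// Scale a focal length calibrated at one image width to another image width.
double scaleFocalLength(double f, int calibratedWidth, int actualWidth)
{
    assert(calibratedWidth > 0 && actualWidth > 0);
    return f * actualWidth / calibratedWidth;
}
```

With the "Panorama" test image parameters (f = 595, centre at (192, 256)), the centre pixel maps to itself, and doubling the image width from 384 to 768 doubles f to 1190.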

  2. Compute the alignment of the images in pairs. (file: FeatureAlign.cpp, routines: alignPair, countInliers, and leastSquaresFit)

    To do this, you will have to implement a feature-based translational motion estimation. The skeleton for this code is provided in FeatureAlign.cpp. The main routines that you will be implementing are:

    int alignPair(const FeatureSet &f1, const FeatureSet &f2, const vector<FeatureMatch> &matches, MotionModel m, float f, int nRANSAC, double RANSACthresh, CTransform3x3& M);

    int countInliers(const FeatureSet &f1, const FeatureSet &f2, const vector<FeatureMatch> &matches, MotionModel m, float f, CTransform3x3 M, double RANSACthresh, vector<int> &inliers);

    int leastSquaresFit(const FeatureSet &f1, const FeatureSet &f2, const vector<FeatureMatch> &matches, MotionModel m, float f, const vector<int> &inliers, CTransform3x3& M);

    alignPair takes two feature sets, f1 and f2, the list of feature matches obtained from the feature detection and matching component (described in the first part of the project), and a motion model (described below), and estimates an inter-image transform matrix M. For this project, the enum MotionModel only takes on the value eTranslate.

    alignPair uses RANSAC (RANdom SAmple Consensus): it pulls out a minimal set of feature matches (one match for this project), estimates the corresponding motion (alignment), and then invokes countInliers to count how many of the feature matches agree with the current motion estimate. After repeated trials, the inliers of the motion estimate with the largest number of inliers are used to compute a least squares estimate for the motion, which is then returned in M.

    countInliers computes the number of matches whose distance, after applying the current motion estimate, is below RANSACthresh. It also returns a list of inlier match ids.

    leastSquaresFit computes a least squares estimate for the translation using all of the matches previously identified as inliers. It returns the resulting translation estimate in the last column of M.

    [TODO] You will have to fill in the missing code in alignPair to:

    • Randomly select a valid matching pair and compute the translation between the two feature locations.
    • Call countInliers to count how many matches agree with this estimate.
    • Repeat the above random selection nRANSAC times and keep the estimate with the largest number of inliers.
    • Write the body of countInliers to count the number of feature matches whose SSD distance after applying the estimated transform (i.e. the distance from the transformed feature to its matched position in the other image) is below the threshold. Don't forget to create the list of inlier ids.
    • Write the body of leastSquaresFit, which for the simple translational case is just the average displacement between the matching feature positions.
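The translational RANSAC loop described in this step can be sketched as follows. This is not the skeleton's interface: Vec2 and PointMatch are hypothetical stand-ins for the FeatureSet and match types, the translation is returned directly instead of being written into a CTransform3x3, and the direction of the transform (image 1 to image 2) is an assumption. The residual is measured as a Euclidean distance.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-ins for the skeleton's feature and match types.
struct Vec2 { double x, y; };
struct PointMatch { Vec2 p1, p2; };  // p1 in image 1, its match p2 in image 2

// Count matches whose residual under translation t is below thresh and
// record their indices in `inliers`.
int countInliersSketch(const std::vector<PointMatch>& matches, Vec2 t,
                       double thresh, std::vector<int>& inliers)
{
    inliers.clear();
    for (std::size_t i = 0; i < matches.size(); ++i) {
        double dx = matches[i].p1.x + t.x - matches[i].p2.x;
        double dy = matches[i].p1.y + t.y - matches[i].p2.y;
        if (std::sqrt(dx * dx + dy * dy) < thresh)
            inliers.push_back(static_cast<int>(i));
    }
    return static_cast<int>(inliers.size());
}

// For pure translation, the least squares fit is the mean displacement
// over the inlier matches.
Vec2 leastSquaresSketch(const std::vector<PointMatch>& matches,
                        const std::vector<int>& inliers)
{
    assert(!inliers.empty());
    Vec2 sum = {0.0, 0.0};
    for (int id : inliers) {
        sum.x += matches[id].p2.x - matches[id].p1.x;
        sum.y += matches[id].p2.y - matches[id].p1.y;
    }
    return { sum.x / inliers.size(), sum.y / inliers.size() };
}

// RANSAC: sample one match per trial, keep the largest inlier set,
// then refine with a least squares fit over that set.
Vec2 alignPairSketch(const std::vector<PointMatch>& matches, int nRANSAC,
                     double thresh, unsigned seed = 0)
{
    assert(!matches.empty() && nRANSAC > 0 && thresh > 0.0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, matches.size() - 1);
    std::vector<int> best, current;
    for (int trial = 0; trial < nRANSAC; ++trial) {
        const PointMatch& m = matches[pick(rng)];
        Vec2 t = { m.p2.x - m.p1.x, m.p2.y - m.p1.y };
        if (countInliersSketch(matches, t, thresh, current) >
            static_cast<int>(best.size()))
            best = current;
    }
    return leastSquaresSketch(matches, best);
}
```

Because a single match fully determines a translation, each trial needs only one sample, and the sampled match always counts as its own inlier, so the best inlier set is never empty.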
  3. Stitch and crop the resulting aligned images. (file: BlendImages.cpp, routines: BlendImages, AccumulateBlend, NormalizeBlend)

    [TODO] Given the warped images and their relative displacements, figure out how large the final stitched image will be and the absolute displacement of each image in the panorama (BlendImages).

    [TODO] Then, resample each image to its final location and blend it with its neighbors (AccumulateBlend, NormalizeBlend). Try a simple feathering function as your weighting function (see the mosaics lecture slide on "feathering"); this is a simple 1-D version of the distance map described in Szeliski & Shum '97. For extra credit, you can try other blending functions or figure out some way to compensate for exposure differences. In NormalizeBlend, remember to set the alpha channel of the resultant panorama to opaque!

    [TODO] Crop the resulting image so that the left and right edges join seamlessly (BlendImages). The horizontal extent can be computed in the previous blending routine, since the first image occurs at both the left and right end of the stitched sequence (draw the "cut" line halfway through this image). Apply a linear warp to the mosaic to remove any vertical "drift" between the first and last image. This warp, of the form y' = y + ax, should transform the y coordinates of the mosaic such that the first image has the same y-coordinate on both the left and right end. Calculate the value of a needed to perform this transformation.
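The blending and drift-correction steps above can be illustrated with the small sketch below. It is only one possible reading: the feathering weight is a linear ramp over a hypothetical blendWidth parameter, pixels are stored as doubles in the range 0 to 255, uncovered pixels are written as opaque black, and none of the names match the skeleton's AccumulateBlend or NormalizeBlend signatures.

```cpp
#include <algorithm>
#include <cassert>

// 1-D feathering weight for column x of an image `width` pixels wide:
// 0 at the left and right edges, rising linearly to 1 over blendWidth pixels.
double featherWeight(int x, int width, double blendWidth)
{
    assert(width > 0 && x >= 0 && x < width && blendWidth > 0.0);
    double distToEdge = std::min(x, width - 1 - x);
    return std::min(1.0, distToEdge / blendWidth);
}

struct Accum { double r, g, b, w; };  // weighted colour sums and total weight
struct Rgba { double r, g, b, a; };   // channels in [0, 255]

// Add one weighted source pixel into the accumulator.
void accumulatePixel(Accum& acc, double r, double g, double b, double weight)
{
    acc.r += weight * r;
    acc.g += weight * g;
    acc.b += weight * b;
    acc.w += weight;
}

// Divide by the total weight and force the alpha channel to opaque.
// Pixels that received no weight are written as opaque black.
Rgba normalizePixel(const Accum& acc)
{
    if (acc.w <= 0.0) return {0.0, 0.0, 0.0, 255.0};
    return {acc.r / acc.w, acc.g / acc.w, acc.b / acc.w, 255.0};
}

// Slope a of the warp y' = y + a*x that brings the two copies of the first
// image, located at (xLeft, yLeft) and (xRight, yRight) in the mosaic, to
// the same y' value: yLeft + a*xLeft == yRight + a*xRight.
double driftSlope(double xLeft, double yLeft, double xRight, double yRight)
{
    assert(xRight != xLeft);
    return (yLeft - yRight) / (xRight - xLeft);
}
```

For instance, if the first image sits at y = 10 at x = 0 and its repeat sits at y = 30 at x = 1000, the slope is a = -0.02, which shifts the right end up by 20 pixels so that both copies land at y' = 10.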

Creating the Panorama

Use the program you wrote to warp, align, and stitch images into the resulting panorama.

You may also refer to the file stitch2.txt provided along with the skeleton code for the appropriate command line syntax. This command-line interface allows you to debug each stage of the program independently.

Debugging Guidelines

You can use the test results included in the "Panorama" set to check whether your program is running correctly. Comparing your output to that of the sample solution is also a good way of debugging your program.

Testing the warping routines:

Testing the alignment routines:

Testing the blending routines:

Extra Credit

Here is a list of suggestions for extending the program for extra credit. You are encouraged to come up with your own extensions. We're always interested in seeing new, unanticipated ways to use this program!