CSE455 Winter 2008 Project 2

Computer Vision (CSE 455), Winter 2008

Project 2: Panorama Mosaic Stitching Using Feature Detection and Matching

Assigned: Thursday, January 24, 2008
Due: Wednesday, February 6, 2008 (by 11:59pm)
Demo: Thursday, February 7, 2008
Artifact Due: Friday, February 8, 2008 (by 11:59pm)

Project Head TA: Ryan Kaminsky (send your questions here first!)
Project Secondary TA: Noah Snavely

Overall Synopsis

In this project, you will implement a system to combine a series of photographs into a 360° panorama (see panorama below). Your software will detect discriminating features in the images, find the best matching features in the other images, automatically align the photographs (determine their overlap and relative positions) and then blend the resulting photos into a single seamless panorama. You will then be able to view the resulting panorama inside an interactive Web viewer. To start your project, you will be supplied with some test images and skeleton code you can use as the basis of your project and instructions on how to use the viewer.

Because this project is more extensive you can work in groups of two. Use this link to register your group on the grouper tool.

Panorama by Loren Meritt

This project can be thought of as two major components:

Feature Detection and Matching
Panorama Mosaic Stitching

The project will consist of a pipeline of EXEs that will operate on images or intermediate results to produce the final panorama output. A complete description of both components follows, but first we'll describe what to turn in.

What to Turn In

In addition to your source code and executables, turn in a web page describing your approach and results. In particular:

Feature Detection and Matching

Describe your feature descriptor in enough detail that someone could implement it from your write-up
Explain why you made the major design choices that you did
Report the performance on the provided benchmark image sets
Compare the performance of the simple window descriptor, your feature descriptor, and SIFT features (provided)
Describe strengths and weaknesses
Take some images yourself and show the performance (include some pictures on your web page!)
Describe any extra credit items that you did (if applicable)

Panorama Mosaic Stitching

This portion of the web page should contain the following:

At least three panoramas: (1) the test sequence, (2) one from the Kaidan head, and (3) one from a hand-held sequence. Each panorama should be shown as (1) a low-res inlined image on the web page, (2) a link that you can click on to show the full-resolution .jpg file, AND (3) embedded in a viewer as described above.
A short description of what worked well and what didn’t. If you tried several variants or did something non-standard, please describe this as well.
Describe any extra credit.

The web-page should be placed in the project2/artifact directory along with all the images in JPEG format. If you are unfamiliar with HTML you can use any web-page editor such as FrontPage, Word, or Visual Studio 7.0 to make your web-page. The KompoZer HTML editor is easy to use and highly recommended. Here are some webpage design tips.

Feature Detection and Matching Synopsis

In this component, you will write code to detect discriminating features in an image and find the best matching features in other images. Because features should be reasonably invariant to translation, rotation, illumination, and scale, you'll use the Multi-Scale Oriented Patch (MOPS) descriptor and you'll evaluate its performance on a suite of benchmark images. As part of the extra credit you'll have the option of creating your own feature descriptors. If there are enough entries we'll rank the performance of features that students in the class come up with, and compare them with the current state-of-the-art.

For the second part of the assignment, you will apply your features to automatically stitch images into a panorama.

To help you visualize the results and debug your program, we provide a working user interface that displays detected features and best matches in other images. We also provide sample feature files that were generated using SIFT, the current best of breed technique in the vision community, for comparison.

Description

This component has three parts: feature detection, description, and matching.

Feature detection

In this step, you will identify points of interest in the image using the Harris corner detection method. The steps are as follows (see the lecture slides/readings for more details). For each point in the image, consider a window of pixels around that point. Compute the Harris matrix H for that point, defined as

H = \sum_{p} w_p \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}

where the summation is over all pixels p in the window, I_x and I_y are the x and y derivatives of the image at p, and w_p is the weight given to pixel p. The weights should be chosen to be circularly symmetric (for rotation invariance). A common choice is to use a 3x3 or 5x5 Gaussian mask.

Note that H is a 2x2 matrix. To find interest points, first compute the corner strength function

Once you've computed c for every point in the image, choose points where c is above a threshold. You also want c to be a local maximum in at least a 3x3 neighborhood.
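
The detection steps above can be sketched as follows. This is only an illustration under stated assumptions: the corner strength formula is not reproduced in this text, so the sketch assumes the common choice c = det(H) / trace(H) (substitute the formula from lecture if it differs). GrayImage, cornerStrength, and findInterestPoints are hypothetical names, not part of the skeleton code, and the gradients use simple central differences with a 3x3 Gaussian-like mask as the window weights.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal grayscale image (NOT the skeleton's image class).
struct GrayImage {
    int width, height;
    std::vector<float> data;  // row-major, width * height values
    GrayImage(int w, int h)
        : width(w), height(h), data(static_cast<std::size_t>(w) * h, 0.0f) {}
    // Read a pixel, clamping coordinates to the image border.
    float at(int x, int y) const {
        x = x < 0 ? 0 : (x >= width ? width - 1 : x);
        y = y < 0 ? 0 : (y >= height ? height - 1 : y);
        return data[static_cast<std::size_t>(y) * width + x];
    }
    float& ref(int x, int y) {
        return data[static_cast<std::size_t>(y) * width + x];
    }
};

// Corner strength at every pixel, ASSUMING c = det(H) / trace(H).
GrayImage cornerStrength(const GrayImage& img) {
    // 3x3 circularly symmetric weights (sum to 16).
    static const float mask[3][3] = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};
    GrayImage c(img.width, img.height);
    for (int y = 0; y < img.height; ++y) {
        for (int x = 0; x < img.width; ++x) {
            float a = 0.0f, b = 0.0f, d = 0.0f;  // H = [a b; b d]
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int px = x + dx, py = y + dy;
                    // Central-difference image derivatives at (px, py).
                    const float ix = 0.5f * (img.at(px + 1, py) - img.at(px - 1, py));
                    const float iy = 0.5f * (img.at(px, py + 1) - img.at(px, py - 1));
                    const float w = mask[dy + 1][dx + 1] / 16.0f;
                    a += w * ix * ix;
                    b += w * ix * iy;
                    d += w * iy * iy;
                }
            }
            const float trace = a + d;
            c.ref(x, y) = trace > 1e-6f ? (a * d - b * b) / trace : 0.0f;
        }
    }
    return c;
}

struct InterestPoint { int x, y; float strength; };

// Keep pixels above the threshold that are strict maxima of their 3x3 neighborhood.
std::vector<InterestPoint> findInterestPoints(const GrayImage& c, float threshold) {
    std::vector<InterestPoint> points;
    for (int y = 1; y + 1 < c.height; ++y) {
        for (int x = 1; x + 1 < c.width; ++x) {
            const float v = c.at(x, y);
            if (v <= threshold) continue;
            bool isMax = true;
            for (int dy = -1; dy <= 1 && isMax; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    if ((dx != 0 || dy != 0) && c.at(x + dx, y + dy) >= v) {
                        isMax = false;
                        break;
                    }
            if (isMax) points.push_back(InterestPoint{x, y, v});
        }
    }
    return points;
}
```

A reasonable threshold depends on the intensity range of your images, so expect to tune it on the benchmark sets.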

Feature description

Now that you've identified points of interest, the next step is to come up with a descriptor for the feature centered at each interest point. This descriptor will be the representation you'll use to compare features in different images to see if they match.

For this project you'll use the MOPS descriptor.

<Insert MOPS information here>.

Feature matching

Now that you've detected and described your features, the next step is to write code to match them, i.e., given a feature in one image, find the best matching feature in one or more other images. This part of the feature detection and matching component is mainly designed to help you test out your feature descriptor. You will implement a more sophisticated feature matching mechanism in the second component when you do the actual image alignment for the panorama.

The simplest approach is the following: write a procedure that compares two features and outputs a score saying how well they match. For example, you could simply sum the absolute value of differences between the descriptor elements. Use this to compute the best match between one feature and a set of other features by evaluating the score for every candidate match. You can optionally explore faster matching algorithms for extra credit.

Your routine should return NULL if there is no good match in the other image(s). This requires that you make a binary decision as to whether a match is good or not. Implement two methods to solve this problem:

1. use a threshold on the match score
2. compute (score of the best feature match)/(score of the second best feature match), and threshold that
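
A minimal sketch of the simple matcher with both acceptance tests might look like the following. Descriptor, MatchTest, sadScore, and bestMatch are hypothetical names chosen for illustration (the skeleton defines its own feature types), and the function returns -1 where your routine would return NULL.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical descriptor type: a flat vector of floats.
using Descriptor = std::vector<float>;

// Sum of absolute differences between descriptor elements (lower is better).
float sadScore(const Descriptor& a, const Descriptor& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i)
        sum += std::fabs(a[i] - b[i]);
    return sum;
}

enum class MatchTest { ScoreThreshold, RatioTest };

// Returns the index of the best candidate, or -1 if no match is good enough.
int bestMatch(const Descriptor& query, const std::vector<Descriptor>& candidates,
              MatchTest test, float threshold) {
    const float inf = std::numeric_limits<float>::infinity();
    int best = -1;
    float bestScore = inf, secondScore = inf;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const float s = sadScore(query, candidates[i]);
        if (s < bestScore) {
            secondScore = bestScore;
            bestScore = s;
            best = static_cast<int>(i);
        } else if (s < secondScore) {
            secondScore = s;
        }
    }
    if (best < 0) return -1;

    if (test == MatchTest::ScoreThreshold)
        return bestScore <= threshold ? best : -1;

    // Ratio test: best score divided by second-best score.
    // A second-best score of zero means two perfect matches, which is ambiguous.
    // With a single candidate the second-best score is infinite and the ratio is 0.
    if (secondScore <= 0.0f) return -1;
    return (bestScore / secondScore) <= threshold ? best : -1;
}
```

The ratio test rejects a match when the runner-up is almost as good as the winner, which is typical of repeated structure such as windows or bricks.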

Testing

Now you're ready to go! Using the UI and skeleton code that we provide, you can load in a set of images, view the detected features, and visualize the feature matches that your algorithm computes.

We are providing a set of benchmark images to be used to test the performance of your algorithm as a function of different types of controlled variation (i.e., rotation, scale, illumination, perspective, blurring). For each of these images, we know the correct transformation and can therefore measure the accuracy of each of your feature matches. This is done using a routine that we supply in the skeleton code.

You should also test the matching against the images you will take for your panorama (described in next component).

Skeleton Code

Follow these steps to get started quickly:

Install FLTK.
If you unzip FLTK to somewhere other than C:\, you'll have to change the project settings to look for the include and library files in the correct location. If you're using Linux, you don't need to download FLTK, since you can just use the libraries in uns/lib/.
Download the skeleton code here.
This code should work under both Windows and Linux.
Download some image sets: graf, leuven, bikes, wall
Included with these images are some SIFT feature files and image database files.

After compiling and linking the skeleton code, you will have an executable cse455. This can be run in several ways:

cse455
with no command line options starts the GUI. Inside the GUI, you can load a query image and its corresponding feature file, as well as an image database file, and search the database for the image which best matches the query features. You can use the mouse buttons to select a subset of the features to use in the query.

Until you write your feature matching routine, the features are matched by minimizing the Euclidean distance between feature vectors.
cse455 computeFeatures imagefile featurefile [featuretype]
uses your feature detection routine to compute the features for imagefile, and writes them to featurefile. featuretype specifies which of your types of features (if you choose to implement another feature for extra credit) to compute.
cse455 testMatch featurefile1 featurefile2 homographyfile [matchtype]
uses your feature matching routine to match the features in featurefile1 with the features in featurefile2. homographyfile contains the correct transformation between the points in the two images, specified by a 3-by-3 matrix. matchtype specifies which of your (at least two) types of matching algorithms to use.
cse455 testSIFTMatch featurefile1 featurefile2 homographyfile [matchtype]
is the same as above, but uses the SIFT file format.
cse455 benchmark imagedir [featuretype matchtype]
tests your feature finding and matching for all of the images in one of the four above sets. imagedir is the directory containing the image (and homography) files. This command will return the average pixel error when matching the first image in the set with each of the other five images.
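
To make the pixel error concrete, here is a rough sketch of how a match can be scored against a known 3-by-3 homography. It assumes the matrix is stored row-major and maps points from the first image into the second; Point2, PointPair, applyHomography, and averagePixelError are hypothetical names, and the routine supplied in the skeleton code may differ in its details.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2 { double x, y; };

// A matched pair: p1 in the first image, p2 in the second (hypothetical type).
struct PointPair { Point2 p1, p2; };

// Apply a row-major 3x3 homography to a point, including the perspective divide.
Point2 applyHomography(const double h[9], Point2 p) {
    const double w = h[6] * p.x + h[7] * p.y + h[8];
    return Point2{(h[0] * p.x + h[1] * p.y + h[2]) / w,
                  (h[3] * p.x + h[4] * p.y + h[5]) / w};
}

// Average Euclidean distance between where the homography says p1 should land
// and where the matcher says it landed (p2).
double averagePixelError(const double h[9], const std::vector<PointPair>& matches) {
    if (matches.empty()) return 0.0;
    double total = 0.0;
    for (const PointPair& m : matches) {
        const Point2 expected = applyHomography(h, m.p1);
        total += std::hypot(expected.x - m.p2.x, expected.y - m.p2.y);
    }
    return total / static_cast<double>(matches.size());
}
```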

To Do

We have given you a number of classes and methods to help get you started. The only code you need to write is for your feature detection methods and your feature matching methods. Then, you should modify computeFeatures and matchFeatures in the file features.cpp to call the methods you have written.

Extra Credit

Here is a list of suggestions for extending the program for extra credit. You are encouraged to come up with your own extensions as well!

Use a fast search algorithm to speed up the matching process. You can use code from the web or write your own (with extra credit proportional to effort). Some possibilities in rough order of difficulty: k-d trees (code available here), wavelet indexing (approach from lecture), locality-sensitive hashing.
Try implementing a better feature descriptor. You can define it however you want, but you should design it to be robust to changes in position, orientation, and illumination. You are welcome to use techniques described in lecture (e.g., detecting dominant orientations, using image pyramids), or come up with your own ideas.
Make your feature detector scale invariant.
Implement a method that outperforms the above ratio test for deciding if a feature is a valid match.
Implement sub-pixel refinement of feature positions (see Rick's CVPR 05 paper)
Implement adaptive non-maximum suppression (see Rick's CVPR 05 paper)
Try the same idea on video (maybe a final project idea?)

Panorama Mosaic Stitching Synopsis

In this component, you will use the feature detection and matching component to combine a series of photographs into a 360° panorama. Your software will automatically align the photographs (determine their overlap and relative positions) and then blend the resulting photos into a single seamless panorama. You will then be able to view the resulting panorama inside an interactive Web viewer. To start this component, you will be supplied with some test images and skeleton code that will guide you.

Getting Things to Run
Taking the Pictures
ToDo
Creating the Panorama
Debugging
Extra Credit
Panorama Links

Getting Things to Run

Running the sample solution

Project2.exe is a command line program that requires arguments to work properly. Thus you need to run it from the command line, or from a shortcut to the executable that has the arguments specified in the "Target" field of the shortcut properties.

Running from the command line

To run from the command line, click the Windows Start button and select "Run". Then enter "cmd" in the "Run" dialog and click "OK". A command window will pop up where you can type DOS commands. Use the DOS "cd" (change directory) command to navigate to the directory where Project2.exe is located. Then type "project2" followed by your arguments. If you do not supply any arguments, the program will print out information on what arguments it expects.

Running from a shortcut

Another way to pass arguments to a program is to create a shortcut to it. To create a shortcut, right-click on the executable and drag to the location where you wish to place the shortcut. A menu will pop up when you let go of the mouse button. From the menu, select "Create Shortcut Here". Now right-click on the shortcut you've created and select "Properties". In the properties dialog select the "Shortcut" tab and add your arguments after the text in the "Target" field. Your arguments must be outside of the quotation marks and separated with spaces.

Running the skeleton program

You can run the skeleton program from inside Visual Studio, just like you could with the last project. However, you will need to tell Visual Studio what arguments to pass. Here's how:

Select the "ImageLib" project in the Solution Explorer (do NOT select the "project2" project; for some reason this won't work).
From the "Project" menu choose "Properties" to bring up the "Property Pages" dialog.
Select the "Debugging" Property page.
Enter your arguments in the "Command Arguments" field.
Click "Ok".
Now when you execute your program from within Visual Studio the arguments you entered will be passed to it automatically.

Taking the Pictures

You will be checking out equipment (camera, tripod, and Kaidan head) in pairs of groups (four individuals total). Each group is responsible for writing all code on their own, but only one artifact need be turned in per group. Remember to bring extra batteries with you; these cameras drain batteries quickly.

Skip this step for the test data. Its camera parameters can be found in the sample commands in stitch2.txt, which is provided along with the skeleton code.

Take a series of photos with a digital camera mounted on a tripod. Here is a web page explaining how to use the equipment. Please read it before you go out to shoot. Then you should borrow the Kaidan head that lets you make precise rotations and the Canon PowerShot A10 camera for this purpose. For best results, overlap each image by 50% with the previous one, and keep the camera level using the levelers on the Kaidan head.
Also take a series of images with a handheld camera. You can use your own or use the Canon PowerShot A10 camera that you signed up for. If you are using the Canon camera, it has a “stitch assist” mode you can use to overlap your images correctly, which only works in regular landscape mode. If you are using your own camera, you have to estimate the focal length (Brett Allen describes one creative way to measure rough focal length using just a book and a box, or alternatively use a camera calibration toolkit to get precise focal length and radial distortion coefficients). Noah Snavely (one of your TAs) also describes a way to determine the parameters here. The parameters for the class cameras are given below. The following focal length is valid only if the camera is zoomed out all the way.

Camera	resolution	focal length	k1	k2
Canon Powershot A10, tag CS30012716	480x640	678.21239 pixels	-0.21001	0.26169
Canon Powershot A10, tag CS30012717	480x640	677.50487 pixels	-0.20406	0.23276
Canon Powershot A10, tag CS30012718	480x640	676.48417 pixels	-0.20845	0.25624
Canon Powershot A10, tag CS30012927	480x640	671.16649 pixels	-0.19270	0.14168
Canon Powershot A10, tag CS30012928	480x640	674.82258 pixels	-0.21528	0.30098
Canon Powershot A10, tag CS30012929	480x640	674.79106 pixels	-0.21483	0.32286
test images	384x512	595 pixels	-0.15	0.0

Make sure the images are right side up (rotate the images by 90° if you took them in landscape mode), and reduce them to a more workable size (480x640 recommended). You can use external software such as Photoshop or the Microsoft Photo Editor to do this. Or you may want to set the camera to 640x480 resolution from the start, by following the steps below:

Turn the mode dial on the back of the camera to one of the 3 shooting modes--auto (camera icon), manual (camera icon + M) or stitch assist (overlaid rectangles).
Press MENU button.
Press the left/right arrow to choose Resolution, then press SET.
Press the left/right arrow and choose S (640x480).
Press MENU again.

(Note: If you are using the skeleton software, save your images in (TrueVision) Targa format (.tga), since this is the only format the skeleton software can currently read. Also make sure the aspect ratio of the image (width vs. height) is either 4:3 or 3:4 (480x640 will do) which is the only aspect ratio supported by the skeleton software.)

ToDo

Note: The skeleton code includes an image library, ImageLib, that is fairly general and complex. It is NOT necessary for you to peek extensively into this library! We have created some notes for you here.

Warp each image into spherical coordinates. (file: WarpSpherical.cpp, routine: warpSphericalField)

[TODO] Compute the inverse map to warp the image by filling in the skeleton code in the warpSphericalField routine to:

convert the given spherical image coordinate into the corresponding planar image coordinate using the coordinate transformation equation from the lecture notes
apply radial distortion using the equation from the lecture notes

(Note: You will have to use the focal length f estimates for the half-resolution images provided above. You can either take pictures and save them in small files, or save them in large files and reduce them afterwards. If you use a different image size, remember to scale f in proportion to the image size.)
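
The inverse map can be sketched as below. The equations are the standard spherical projection and polynomial radial distortion model, written from memory rather than copied from the lecture notes, so check them against the notes before relying on them. Vec2, sphericalToPlanar, and scaleFocalLength are hypothetical names; the sketch assumes the optical center is the image center and that f is given in pixels.

```cpp
#include <cmath>

struct Vec2 { double x, y; };

// Map a pixel (xs, ys) of the spherical (output) image to the corresponding
// pixel in the planar (input) image. Assumes the optical center is the image
// center and that f is the focal length in pixels.
Vec2 sphericalToPlanar(double xs, double ys, int width, int height,
                       double f, double k1, double k2) {
    // Spherical angles for this output pixel.
    const double theta = (xs - 0.5 * width) / f;
    const double phi = (ys - 0.5 * height) / f;

    // Point on the unit sphere.
    const double xh = std::sin(theta) * std::cos(phi);
    const double yh = std::sin(phi);
    const double zh = std::cos(theta) * std::cos(phi);

    // Project onto the normalized image plane z = 1.
    const double xn = xh / zh;
    const double yn = yh / zh;

    // Apply radial distortion: scale by 1 + k1 r^2 + k2 r^4.
    const double r2 = xn * xn + yn * yn;
    const double scale = 1.0 + k1 * r2 + k2 * r2 * r2;

    // Back to pixel coordinates in the planar image.
    return Vec2{0.5 * width + f * scale * xn, 0.5 * height + f * scale * yn};
}

// Scale a focal length calibrated at one image width to another image width.
double scaleFocalLength(double f, int calibratedWidth, int actualWidth) {
    return f * static_cast<double>(actualWidth) / static_cast<double>(calibratedWidth);
}
```

For example, a focal length of 600 pixels calibrated on 480-pixel-wide images becomes 300 pixels if you shrink the images to 240 pixels wide.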

Compute the alignment of the images in pairs. (file: FeatureAlign.cpp, routines: alignPair, countInliers, and leastSquaresFit)

To do this, you will have to implement a feature-based translational motion estimation. The skeleton for this code is provided in FeatureAlign.cpp. The main routines that you will be implementing are:

int alignPair(const FeatureSet &f1, const FeatureSet &f2, const vector<FeatureMatch> &matches, MotionModel m, float f, int nRANSAC, double RANSACthresh, CTransform3x3& M);

int countInliers(const FeatureSet &f1, const FeatureSet &f2, const vector<FeatureMatch> &matches, MotionModel m, float f, CTransform3x3 M, double RANSACthresh, vector<int> &inliers);

int leastSquaresFit(const FeatureSet &f1, const FeatureSet &f2, const vector<FeatureMatch> &matches, MotionModel m, float f, const vector<int> &inliers, CTransform3x3& M);

alignPair takes two feature sets, f1 and f2, the list of feature matches obtained from the feature detection and matching component (described above), and a motion model (described below), and estimates an inter-image transform matrix M. It is therefore similar to the evaluateMatch function in Project 1, except that now the transformation is being computed rather than evaluated. For this project, the enum MotionModel only takes on the value eTranslate.

alignPair uses RANSAC (RANdom SAmple Consensus) to pull out a minimal set of feature matches (one match for this project), estimates the corresponding motion (alignment), and then invokes countInliers to count how many of the feature matches agree with the current motion estimate. After repeated trials, the motion estimate with the largest number of inliers is used to compute a least squares estimate for the motion, which is then returned in M.

countInliers is similar to evaluateMatch except that rather than computing the average Euclidean distance, it counts the matches whose distance is below RANSACthresh. It also returns a list of inlier match ids.

leastSquaresFit computes a least squares estimate for the translation using all of the matches previously estimated as inliers. It returns the resulting translation estimate in the last column of M.

[TODO] You will have to fill in the missing code in alignPair to:

Randomly select a valid matching pair and compute the translation between the two feature locations.
Call countInliers to count how many matches agree with this estimate.
Repeat the above random selection nRANSAC times and keep the estimate with the largest number of inliers.
Write the body of countInliers to count the number of feature matches where the Euclidean distance after applying the estimated transform is below the threshold. (Use the code in evaluateMatch as a guide, and don’t forget to create the list of inlier ids.)
Write the body of leastSquaresFit, which for the simple translational case is just the average displacement between the matching feature positions.
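
The steps above can be sketched for the translation-only case as follows. This is a simplified illustration, not the skeleton's interface: it reuses the names alignPair, countInliers, and leastSquaresFit for readability, but Match and Translation are hypothetical stand-ins for FeatureSet, FeatureMatch, and CTransform3x3, and the seed parameter exists only to make the sketch repeatable.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Point2 { double x, y; };
struct Match { Point2 p1, p2; };        // hypothetical: matched feature positions
struct Translation { double dx, dy; };  // hypothetical stand-in for the matrix M

// Count matches whose error under translation t is below thresh; record their ids.
int countInliers(const std::vector<Match>& matches, Translation t, double thresh,
                 std::vector<int>& inliers) {
    inliers.clear();
    for (std::size_t i = 0; i < matches.size(); ++i) {
        const double ex = matches[i].p1.x + t.dx - matches[i].p2.x;
        const double ey = matches[i].p1.y + t.dy - matches[i].p2.y;
        if (std::hypot(ex, ey) < thresh) inliers.push_back(static_cast<int>(i));
    }
    return static_cast<int>(inliers.size());
}

// For pure translation, the least squares fit is the mean inlier displacement.
Translation leastSquaresFit(const std::vector<Match>& matches,
                            const std::vector<int>& inliers) {
    Translation t{0.0, 0.0};
    if (inliers.empty()) return t;
    for (int id : inliers) {
        t.dx += matches[id].p2.x - matches[id].p1.x;
        t.dy += matches[id].p2.y - matches[id].p1.y;
    }
    t.dx /= static_cast<double>(inliers.size());
    t.dy /= static_cast<double>(inliers.size());
    return t;
}

// RANSAC with a one-match minimal set, followed by a least squares refit.
Translation alignPair(const std::vector<Match>& matches, int nRANSAC,
                      double thresh, unsigned seed = 0) {
    if (matches.empty()) return Translation{0.0, 0.0};
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, matches.size() - 1);
    std::vector<int> bestInliers, inliers;
    for (int trial = 0; trial < nRANSAC; ++trial) {
        const Match& m = matches[pick(rng)];
        const Translation t{m.p2.x - m.p1.x, m.p2.y - m.p1.y};
        const int n = countInliers(matches, t, thresh, inliers);
        if (static_cast<std::size_t>(n) > bestInliers.size()) bestInliers = inliers;
    }
    return leastSquaresFit(matches, bestInliers);
}
```

Because a single match fully determines a translation, each RANSAC trial is cheap, and a few hundred trials are usually enough to hit an inlier even when many matches are wrong.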

Stitch and crop the resulting aligned images. (file: BlendImages.cpp, routines: BlendImages, AccumulateBlend, NormalizeBlend)

[TODO] Given the warped images and their relative displacements, figure out how large the final stitched image will be and their absolute displacements in the panorama (BlendImages).
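
One way to picture this step is to accumulate the pairwise displacements into absolute positions and then take the bounding box. The sketch below assumes each relative displacement gives the offset of the next image's origin from the previous image's origin (check the sign convention your alignment step actually produces) and that all warped images share one size. Offset, PanoLayout, and layoutPanorama are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Offset { double x, y; };

struct PanoLayout {
    std::vector<Offset> positions;  // top-left corner of each image in the panorama
    int width;
    int height;
};

// relative[i] is ASSUMED to be the offset of image i+1's origin from image i's
// origin. All images are assumed to share the same size imgW x imgH.
PanoLayout layoutPanorama(const std::vector<Offset>& relative, int imgW, int imgH) {
    PanoLayout out;
    out.positions.push_back(Offset{0.0, 0.0});
    for (const Offset& d : relative) {
        const Offset prev = out.positions.back();
        out.positions.push_back(Offset{prev.x + d.x, prev.y + d.y});
    }

    // Bounding box of all image origins.
    double minX = 0.0, minY = 0.0, maxX = 0.0, maxY = 0.0;
    for (const Offset& p : out.positions) {
        minX = std::min(minX, p.x);
        maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y);
        maxY = std::max(maxY, p.y);
    }

    // Shift so that the smallest origin sits at (0, 0).
    for (Offset& p : out.positions) {
        p.x -= minX;
        p.y -= minY;
    }
    out.width = static_cast<int>(std::ceil(maxX - minX)) + imgW;
    out.height = static_cast<int>(std::ceil(maxY - minY)) + imgH;
    return out;
}
```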

[TODO] Then, resample each image to its final location and blend it with its neighbors (AccumulateBlend, NormalizeBlend). Try a simple feathering function as your weighting function (see mosaics lecture slide on "feathering") (this is a simple 1-D version of the distance map described in [Szeliski & Shum]). For extra credit, you can try other blending functions or figure out some way to compensate for exposure differences. In NormalizeBlend, remember to set the alpha channel of the resultant panorama to opaque!

[TODO] Crop the resulting image to make the left and right edges seam perfectly (BlendImages). The horizontal extent can be computed in the previous blending routine since the first image occurs at both the left and right end of the stitched sequence (draw the “cut” line halfway through this image). Use a linear warp to the mosaic to remove any vertical “drift” between the first and last image. This warp, of the form y' = y + ax, should transform the y coordinates of the mosaic such that the first image has the same y-coordinate on both the left and right end. Calculate the value of 'a' needed to perform this transformation.

Creating the Panorama

Use the above program you wrote to warp/align/stitch images into the resulting panorama.

To remove the radial distortion and warp the image input1.tga into spherical coordinates with focal length = 600 and radial distortion coefficients k1=-0.21 and k2=0.25 (project2 is the name of the program):

project2 sphrWarp input1.tga warp1.tga 600 -0.21 0.25

Then, use the feature detecting and matching component to compute the features in the warped images. To align two feature sets warp1.f and warp2.f using 200 iterations of RANSAC with an outlier threshold distance of 1 pixel:

project2 alignPair warp1.f warp2.f 200 1

Run the previous step for all adjacent pairs of images and save the output into a separate file pairlist.txt, which may look like this:

    warp1.tga warp2.tga 213.49 -5.12
    warp2.tga warp3.tga 208.19 2.82
    ......
    warp9.tga warp1.tga 194.76 -3.88

Then stitch the images into the final panorama pano.tga:

project2 blendPairs pairlist.txt pano.tga

You may also refer to the file stitch2.txt, provided along with the skeleton code, for the appropriate command line syntax. This command-line interface allows you to debug each stage of the program independently.

Convert your resulting image to a JPEG and paste it on a Web page along with code to run the interactive viewer. Click here for instructions on how to do this.

Debugging Guidelines

You can use the test results included in the images/ folder to check whether your program is running correctly. Comparing your output to that of the sample solution is also a good way of debugging your program.

Testing the warping routines:

In the images/ folder in the skeleton code, a few example warped images are provided for test purposes. The camera parameters used for these examples can be found in the sample command file stitch2.txt. See if your program produces the same output.
You may also test with different input images and/or camera parameter values by comparing the results with those of the sample solution.

Testing the alignment routines:

A few example alignment results are provided in the file pairlist2/4.txt. The corresponding shell commands can be found in stitch2/4.txt.
To test alignPair only, try passing in an image that has been cropped with two different rectangles (and maybe rotated by a tiny amount, say 2 degrees).

Testing the blending routines:

An example panorama is included in the images/ folder. Compare the resulting panorama with this image.
You may also test with other panoramas by running the sample solution on different inputs.

Extra Credit

Here is a list of suggestions for extending the program for extra credit. You are encouraged to come up with your own extensions. We're always interested in seeing new, unanticipated ways to use this program!

Although the feature-based aligner gives sub-pixel motion estimation (because of least squares), the motion vectors are rounded to integers when blending the images into the mosaic in BlendImages.cpp. Try to blend images with sub-pixel localization.
Exposure differences between images sometimes cause brightness fluctuations in the final mosaic. Try to get rid of this artifact.
Try shooting a sequence with some objects moving. What did you do to remove “ghosted” versions of the objects?
Try a sequence in which the same person appears multiple times, as in this example.
Implement a better blending technique, e.g., pyramid blending, Poisson image blending, or graph cuts.

Panorama Links

Panoramas.dk: weekly archive of full-screen, high-quality panoramas worldwide
VR Seattle: Seattle & Washington panorama
Peru Panoramas
A complete set of test images

Last modified on April 11, 2005