version 2, modified 4/7/05 (changes from original version are marked in red)
Assigned: Thursday, March 31, 2005
Due: Wednesday,
April 13, 2005 (by 11:59pm)
In this project, you will write code to detect discriminating features in an image and find the best matching features in other images. Your features should be reasonably invariant to translation, rotation, illumination, and scale, and you'll evaluate their performance on a suite of benchmark images. We'll rank the performance of features that students in the class come up with, and compare them with the current state-of-the-art.
In project 2, you will apply your features to automatically stitch images into a panorama.
To help you visualize the results and debug your program, we provide a working user interface that displays detected features and best matches in other images. We also provide sample feature files that were generated using SIFT, the current best of breed technique in the vision community, for comparison.
The project has three parts: feature detection, description, and matching..
In this step, you will identify points of interest in the image using the Harris corner detection method. The steps are as follows (see the lecture slides/readings for more details) For each point in the image, consider a window of pixels around that point. Compute the Harris matrix H for that point, defined as
where the summation is over all pixels p in the window.
The weights should
be chosen to be circularly symmetric (for rotation invariance). A common choice
is to use a 3x3 or
5x5 Gaussian mask.
Note that H is a 2x2 matrix. To find interest points, first compute the corner strength function
Once you've computed c for every point in the image, choose points where c is above a threshold. You also want c to be a local maximum in at least a 3x3 neighborhood.
Now that you've identified points of interest, the next step is to come up with a descriptor for the feature centered at each interest point. This descriptor will be the representation you'll use to compare features in different images to see if they match.
For starters, try using a small square window (say 5x5) as the feature descriptor. This should be very easy to implement and should work well when the images you're comparing are related by a translation.
Next, try implementing a better feature descriptor. You can define it however you want, but you should design it to be robust to changes in position, orientation, and illumination. You are welcome to use techniques described in lecture (e.g., detecting dominant orientations, using image pyramids), or come up with your own ideas.
Now that you've detected and described your features, the next step is to write code to match them, i.e., given a feature in one image, find the best matching feature in one or more other images.
The simplest approach is the following: write a procedure that compares two features and outputs a score saying how well they match. For example, you could simply sum the absolute value of differences between the descriptor elements. Use this to compute the best match between one feature and a set of other features by evaluating the score for every candidate match. You can optionally explore faster matching algorithms for extra credit.
Your routine should return NULL if there is no good match in the other image(s). This requires that you make a binary decision as to whether a match is good or not. Implement two methods to solve this problem:
1. use a threshold on the match score
2. compute (score of the best feature match)/(score of the second best
feature match), and threshold that
Now you're ready to go! Using the UI and skeleton code that we provide, you can load in a set of images, view the detected features, and visualize the feature matches that your algorithm computes.
We are providing a set of benchmark images to be used to test the performance of your algorithm as a function of different types of controlled variation (i.e., rotation, scale, illumination, perspective, blurring). For each of these images, we know the correct transformation and can therefore measure the accuracy of each of your feature matches. This is done using a routine that we supply in the skeleton code.
You should also go out and take some photos of your own to see how well your approach works on more interesting data sets. For example, you could take images of a few different objects (e.g., books, offices, buildings, etc.) and see if it can "recognize" new images.
Follow these steps to get started quickly:
Install FLTK.
If you unzip FLTK to somewhere other than C:\, you'll have to change the
project settings to look for the include and library files in the correct
location. If you're using Linux, you don't need to download FLTK, since
you can just use the libraries in uns/lib/.
Download the skeleton code here.
This code should work under both Windows and Linux.
Download some image sets: graf, leuven, bikes, wall
Included with these images are some SIFT feature files and image
database files.
After compiling and linking the skeleton code, you will have an executable cse576. This can be run in several ways:
cse576
with no command line options starts the GUI.
Inside the GUI, you can load a query image and its corresponding feature
file, as well as an image database file, and search the database for the
image which best matches the query features. You can use the mouse
buttons to select a subset of the features to use in the query.
Until you write your feature matching routine, the features are matched by minimizing the Euclidean distance between feature vectors.
cse576 computeFeatures imagefile featurefile
[featuretype]
uses your feature detection routine to compute the
features for imagefile, and writes them to
featurefile. featuretype specifies which of your
(at least two) types of features to compute.
cse576 testMatch featurefile1 featurefile2 homographyfile
[matchtype]
uses your feature matching routine to match the
features in featurefile1 with the features in
featurefile2. homographyfile contains the
correct transformation between the points in the two images, specified by
a 3-by-3 matrix. matchtype specifies which of your (at least
two) types of matching algorithms to use.
cse576 testSIFTMatch featurefile1 featurefile2 homographyfile
[matchtype]
is the same as above, but uses the SIFT file
format.
cse576 benchmark imagedir [featuretype matchtype]
tests
your feature finding and matching for all of the images in one of the four
above sets. imagedir is the directory containing the image
(and homography) files. This command will return the average pixel error
when matching the first image in the set with each of the other five
images.
We have given you a number of classes and methods to help get you started. The only code you need to write is for your feature detection methods and your feature matching methods. Then, you should modify computeFeatures and matchFeatures in the file features.cpp to call the methods you have written.
In addition to your source code and executable, turn in a web page describing your approach and results. In particular:
We'll tabulate the best performing features and present them to the class.
The web-page should be placed in the project1/artifact directory along with all the images in JPEG format. If you are unfamiliar with HTML you can use any web-page editor such as FrontPage, Word, or Visual Studio 7.0 to make your web-page. Here are some tips
Here is a list of suggestions for extending the program for extra credit. You are encouraged to come up with your own extensions as well!
Last modified on April 11, 2005