Computer Vision (CSE 576), Spring 2006

Project 2:  Feature Detection and Matching

Due:  Monday, May 1. Like last time, the hour of turn-in doesn't really matter so long as I have it the next day.


Synopsis

In this project, you will write code to detect discriminating features in an image and find the best matching features in other images.  Your features should be reasonably invariant to translation, rotation, and illumination, and (to a lesser degree) scale, and you'll evaluate their performance on a suite of benchmark images.  Scale is less important because it's a lot harder - however, any descriptor that is invariant to the other factors will be slightly scale invariant as well. You're not expected to implement SIFT!

To help you visualize the results and debug your program, we provide a working user interface that displays detected features and best matches in other images.  We also provide sample feature files that were generated using SIFT, the current best of breed technique in the vision community, for comparison.

Description

The project has three parts:  feature detection, description, and matching.

Feature detection

In this step, you will identify points of interest in the image using the Harris corner detection method.  (The description of the Harris matrix which was previously here was confusing to many students. It is being replaced on April 19.)

For each point in the image, consider a window of pixels around that point.  Compute the Harris matrix M for that point, defined as

M = Sum over (x,y) of  w(x,y) [ I_x^2    I_x I_y
                                I_x I_y  I_y^2  ]
where the summation is over all pixels (x,y) in the window, and I_x and I_y are the partial derivatives of the image in x and y.  The weights w(x,y) should be chosen to be circularly symmetric (for rotation invariance).  A common choice is to use a 3x3 or 5x5 Gaussian mask.
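
As a concrete illustration, here is a minimal sketch of accumulating M at a single pixel. It assumes grayscale intensities, precomputed gradient images ix and iy (e.g., from a simple finite-difference or Sobel filter) stored as width x height float arrays in row-major order, a 5x5 Gaussian mask gauss, and a pixel at least two pixels from the image border; all of these names are placeholders, not part of the skeleton code.

// Sketch: accumulate the 2x2 Harris matrix M = [a b; b c] at pixel (px, py)
// over a 5x5 window, weighting each term by a Gaussian mask.
struct HarrisMatrix { double a, b, c; };

HarrisMatrix computeHarrisMatrix(const float *ix, const float *iy, int width,
                                 int px, int py, const double gauss[5][5]) {
    HarrisMatrix M = { 0.0, 0.0, 0.0 };
    for (int dy = -2; dy <= 2; dy++) {
        for (int dx = -2; dx <= 2; dx++) {
            double w  = gauss[dy + 2][dx + 2];
            double gx = ix[(py + dy) * width + (px + dx)];
            double gy = iy[(py + dy) * width + (px + dx)];
            M.a += w * gx * gx;   // sum of w * Ix^2
            M.b += w * gx * gy;   // sum of w * Ix*Iy
            M.c += w * gy * gy;   // sum of w * Iy^2
        }
    }
    return M;
}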

More detailed notes on computing the Harris matrix, and on the Harris detector in general, are provided in these lecture notes, in ppt and pdf. Here are also some additional notes on the Harris math presented in class.

To find interest points, first compute the corner strength function

c(M) = determinant(M)/trace(M)

Alternatively, you can calculate the corner response R:
R = det(M) - k (trace(M))^2
This is somewhat trickier, however, because you have to select a value for k, typically between 0.04 and 0.06.

Once you've computed c or R for every point in the image, choose points where c or R is above a threshold.  You also want each chosen point to be a local maximum in at least a 3x3 neighborhood.
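
For example, given the 2x2 matrix entries a, b, c (so M = [a b; b c]), a sketch of the strength function and the 3x3 local-maximum test might look like the following; the threshold value and the guard against a near-zero trace are assumptions you will want to tune.

// Sketch: corner strength c(M) = det(M)/trace(M) for M = [a b; b c].
double cornerStrength(double a, double b, double c) {
    double det   = a * c - b * b;
    double trace = a + c;
    return (trace > 1e-10) ? det / trace : 0.0;   // avoid dividing by ~0
}

// Sketch: is (x, y) above threshold and a 3x3 local maximum in the
// precomputed strength map `strength` (row-major, width x height)?
// Assumes (x, y) is at least one pixel from the image border.
bool isInterestPoint(const double *strength, int width, int x, int y,
                     double threshold) {
    double v = strength[y * width + x];
    if (v < threshold) return false;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++)
            if (strength[(y + dy) * width + (x + dx)] > v)
                return false;   // some neighbor is stronger
    return true;
}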

Feel free to implement other features as well if you want.

Feature description

Now that you've identified points of interest, the next step is to come up with a descriptor for the feature centered at each interest point.  This descriptor will be the representation you'll use to compare features in different images to see if they match.

For starters, try using a small square window (say 5x5) as the feature descriptor.  This should be very easy to implement and should work well when the images you're comparing are related by a translation.
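
A sketch of that simple descriptor, assuming a row-major grayscale image img (a placeholder name, like the others here) and an interest point at least two pixels from the border, could be as little as:

#include <vector>

// Sketch: a 25-element descriptor made of the raw intensities in the
// 5x5 window centered on the interest point (px, py).
std::vector<double> windowDescriptor(const float *img, int width, int px, int py) {
    std::vector<double> desc;
    desc.reserve(25);
    for (int dy = -2; dy <= 2; dy++)
        for (int dx = -2; dx <= 2; dx++)
            desc.push_back(img[(py + dy) * width + (px + dx)]);
    return desc;
}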

Next, try implementing a better feature descriptor.  You can define it however you want, but you should design it to be robust to changes in position, orientation (i.e., rotation), and illumination.  You are welcome to use techniques described in lecture (e.g., detecting dominant orientations, using image pyramids), or come up with your own ideas.
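
For instance, one simple (and only approximate) way to get a dominant orientation is to sum the gradient vectors over the window around the interest point and take the angle of the sum; you would then rotate (or resample) the descriptor window by that angle before sampling it. This sketch reuses the hypothetical gradient arrays ix and iy from the detection step:

#include <cmath>

// Sketch: estimate a dominant orientation (in radians) at (px, py) by
// summing gradient vectors over a 5x5 window.
double dominantOrientation(const float *ix, const float *iy, int width,
                           int px, int py) {
    double sumX = 0.0, sumY = 0.0;
    for (int dy = -2; dy <= 2; dy++) {
        for (int dx = -2; dx <= 2; dx++) {
            sumX += ix[(py + dy) * width + (px + dx)];
            sumY += iy[(py + dy) * width + (px + dx)];
        }
    }
    return std::atan2(sumY, sumX);   // angle of the summed gradient
}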

Feature matching

Now that you've detected and described your features, the next step is to write code to match them, i.e., given a feature in one image, find the best matching feature in one or more other images.

The simplest approach is the following:  write a procedure that compares two features and outputs a score saying how well they match.  For example, you could simply sum the absolute value of differences between the descriptor elements.  Use this to compute the best match between one feature and a set of other features by evaluating the score for every candidate match.  You can optionally explore faster matching algorithms for extra credit.

Your routine should return NULL or some other indicator of failure if there is no good match in the other image(s).  This requires that you make a binary decision as to whether a match is good or not.  There are two methods you might use to solve this problem: 

1.  use a threshold on the match score
2.  compute (score of the best feature match)/(score of the second best feature match), and threshold on that
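
For example, here is a sketch of a brute-force matcher that uses the sum of absolute differences as its score and method 2 (the ratio test) to reject ambiguous matches. It makes no assumption about descriptor length; the 0.8 ratio threshold and the use of plain std::vector<double> descriptors are assumptions, not part of the skeleton code.

#include <cmath>
#include <vector>

// Sketch: sum of absolute differences between two descriptors of equal length.
double matchScore(const std::vector<double> &a, const std::vector<double> &b) {
    double score = 0.0;
    for (size_t i = 0; i < a.size(); i++)
        score += std::fabs(a[i] - b[i]);   // lower is better
    return score;
}

// Sketch: return the index of the best match in `candidates`, or -1 if the
// best/second-best ratio test rejects it (i.e., the match is ambiguous).
int findBestMatch(const std::vector<double> &f,
                  const std::vector< std::vector<double> > &candidates,
                  double ratioThreshold /* e.g. 0.8 */) {
    int best = -1;
    double bestScore = 1e30, secondScore = 1e30;
    for (size_t i = 0; i < candidates.size(); i++) {
        double s = matchScore(f, candidates[i]);
        if (s < bestScore) {
            secondScore = bestScore;
            bestScore   = s;
            best        = (int)i;
        } else if (s < secondScore) {
            secondScore = s;
        }
    }
    if (best < 0 || bestScore / secondScore > ratioThreshold)
        return -1;   // no sufficiently distinctive match
    return best;
}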

Note: at least one of your matching routines should be general, so that it makes no assumptions about the length of the feature descriptor (which is likely to be just a vector). This way, you'll be able to directly compare your feature detection and description with SIFT's (we provide files of SIFT features for each benchmark image). If you would like, consider also making a smarter feature matcher which makes some assumptions about the feature descriptor, such as which elements matter more.

Testing

Now you're ready to go!  Using the UI and skeleton code that we provide, or your own Matlab code, you can load in a set of images, view the detected features, and visualize the feature matches that your algorithm computes. Matlab users may want to scope out the C++ code for tips on comparing the features.

We are providing a set of benchmark images to be used to test the performance of your algorithm as a function of different types of controlled variation (i.e., rotation, scale, illumination, perspective, blurring).  For each of these images, we know the correct transformation and can therefore measure the accuracy of each of your feature matches.  This is done using a routine that we supply in the skeleton code.

If you like, consider taking some photos of your own to see how well your approach works on more interesting data sets.

Everybody

  1. Download some image sets: graf, leuven, bikes, wall
    Included with these images are

    • SIFT feature files, with extension .key
    • database files used by the skeleton code, .kdb
    • homography files, containing the correct transformation between each pair of images, named HXtoYp where X and Y are the indices of the images. For example, the transformation between img2.ppm and img3.ppm is found in H2to3p.

C++ Users

  1. Get access to FLTK. (Here are local copies for windows and linux, but see below about using it on linux.) **disclaimer** the skeleton code was designed to be used with FLTK-1. It has not been tested with FLTK-2.
    On Windows you'll need to install it yourself. (If you unzip FLTK to somewhere other than C:\, you'll have to change the project settings to look for the include and library files in the correct location.) If you're using Linux on one of the CSE department machines, you don't need to download FLTK, since you can just use the libraries in /uns/lib/. The Makefile included in the zip expects to find fltk in /uns. If you already have it or are installing it in /usr/local, here are fltk installation instructions and an alternate Makefile for the skeleton code.

  2. Download the C++ skeleton code here.
    This code should work under both Windows and Linux.

After compiling and linking the skeleton code, you will have an executable cse576. This can be run in several ways:

Matlab users

Sorry, you're a little more on your own here; you'll probably find it useful to look at the testing methods provided in the skeleton code. The key element of the testing (and what you need in order to produce the results we're asking for) is to take two images, generate lists of feature descriptors for them (and probably dump those lists to files), match them, and, for each feature match, see how far off it is from the actual transformation. (In order to do this, features will need to record where in the image they are, as well as the rest of the descriptor.) Return the average Euclidean distance between the matched feature points and the actual transformed positions. This will be the main metric of evaluation.

Here h is the 3x3 matrix describing the image homography (the real transformation between the first image and the second), stored row-major in a 9-element array. Here's how to apply it to the coordinates of a pixel or feature:

// Transform the point (x, y) by the homography h (a 3x3 matrix stored
// row-major in a 9-element array), writing the result to (xNew, yNew).
void applyHomography(double x, double y, double &xNew, double &yNew, double h[9]) {
	// Homogeneous scale factor from the third row of h.
	double d = h[6]*x + h[7]*y + h[8];

	xNew = (h[0]*x + h[1]*y + h[2]) / d;
	yNew = (h[3]*x + h[4]*y + h[5]) / d;
}
To say this again in a different way: once you've figured out the best match for each feature in the first image to a feature in the second image (or decided there is no good match), test your detection and matching as follows. For each matched feature in the first image, measure the distance between the matched feature's location in the second image and the true transformed location of the original feature. Report the average of these distances for this pair of images.
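
Putting that together with applyHomography above, a sketch of the evaluation for one image pair might look like the following; the parallel coordinate arrays are an assumption about how you store your matches, and features your matcher rejected should simply be skipped (or penalized, if you prefer).

#include <cmath>

// Sketch: average Euclidean distance between the matched positions in the
// second image and the true transformed positions of the features from the
// first image, over n matched pairs.
double averageMatchError(const double *x1, const double *y1,   // feature positions in image 1
                         const double *x2, const double *y2,   // matched positions in image 2
                         int n, double h[9]) {
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        double tx, ty;
        applyHomography(x1[i], y1[i], tx, ty, h);   // where the match should land
        double dx = x2[i] - tx, dy = y2[i] - ty;
        total += std::sqrt(dx * dx + dy * dy);
    }
    return (n > 0) ? total / n : 0.0;
}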

To Do

The only code you need to write is for your feature detection methods and your feature matching methods. C++ users should fill in the skeleton classes, then modify computeFeatures and matchFeatures in the file features.cpp to call the methods you have written.

What to Turn In

In addition to your source code and executable, turn in a report describing your approach and results.  In particular:

This report can be a Word document, pdf document or webpage. Email me (Lillie) your writeup and your source code by Monday May 1. (Whenever.) Please include "576 project 2" in the subject of the email. Please include your name in the name of your writeup file.

Extensions

For those who would like to challenge themselves, here is a list of suggestions for extending the program for a small amount of extra credit. You are encouraged to come up with your own extensions as well!