CSEP 576 Computer Vision  Winter 2008

Project 2:  Feature Detection and Matching

Date released: Monday, January 28, 2008

Date due: Sunday, February 17, 2008 11:59pm
Late policy: 5% off per day late until Wednesday 02/20/2008

Download Indri's slides (results from her project)


Synopsis

In this project, you will write code to detect discriminating features in an image and find the best matching features in other images.  Your features should be reasonably invariant to translation, rotation, and illumination, and (to a lesser degree) scale, and you'll evaluate their performance on a suite of benchmark images.  Scale is less important because it's a lot harder to achieve; however, any descriptor that is invariant to the other factors will be slightly scale invariant as well. You're not expected to implement SIFT!

To help you visualize the results and debug your program, we provide a working user interface that displays detected features and best matches in other images.  We also provide sample feature files that were generated using SIFT, the current best of breed technique in the vision community, for comparison.

Description

The project has three parts:  feature detection, description, and matching.
The Harris operator and the descriptors work on grayscale images, so the input images will need to be converted; the skeleton code provides the function ConvertToGrey in features.cpp for this.

Feature detection

In this step, you will identify points of interest in the image using the Harris corner detection method.  

For each point in the image, consider a 5x5 window of pixels around that point.  Compute the Harris matrix M for that point, defined as

    M = Σ_{(x_k, y_k) in W} w(x_k, y_k) [ I_x^2     I_x I_y ]
                                        [ I_x I_y   I_y^2   ]

where the summation is over all pixels (x_k, y_k) in the window W centered at location (x, y).  For the weights w(x_k, y_k) you can use a 5x5 Gaussian mask.

I_x and I_y denote the x- and y-gradients of a pixel, which you can approximate using the Sobel operator.

To find interest points, first compute the corner response R:

    R = det(M) - k (trace(M))^2

(Try k = 0.05.)

Once you've computed R for every point in the image, choose points where R is above a threshold (try threshold = 0.001).  You also want each chosen point to be a local maximum of R in at least a 3x3 neighborhood.
The formulas for the trace and determinant of M can be found on slide 23 of the Interest Operator lecture.
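
To make these steps concrete, here is a minimal sketch of the Harris response computation in C++. It is not part of the skeleton code: the function name is made up, the gradients Ix and Iy are assumed to be precomputed (e.g., with the Sobel operator) and stored as row-major float arrays, and the Gaussian weights are left unnormalized, which only rescales R (and hence the threshold).

    #include <cmath>
    #include <vector>

    // Harris response R at (x, y). Ix and Iy are precomputed x/y gradients,
    // stored row-major with size width * height. (Illustrative sketch.)
    float harrisResponse(const std::vector<float>& Ix,
                         const std::vector<float>& Iy,
                         int width, int height, int x, int y, float k = 0.05f)
    {
        // Entries of the 2x2 Harris matrix M = [a b; b c].
        float a = 0.0f, b = 0.0f, c = 0.0f;
        for (int dy = -2; dy <= 2; ++dy) {           // 5x5 window
            for (int dx = -2; dx <= 2; ++dx) {
                int xx = x + dx, yy = y + dy;
                if (xx < 0 || xx >= width || yy < 0 || yy >= height)
                    continue;                        // skip out-of-bounds pixels
                // Gaussian-shaped weight w(x_k, y_k), sigma = 1.
                float w = std::exp(-(dx * dx + dy * dy) / 2.0f);
                float ix = Ix[yy * width + xx];
                float iy = Iy[yy * width + xx];
                a += w * ix * ix;
                b += w * ix * iy;
                c += w * iy * iy;
            }
        }
        float det = a * c - b * b;                   // det(M)
        float trace = a + c;                         // trace(M)
        return det - k * trace * trace;              // R = det(M) - k * trace(M)^2
    }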

Feature description

Now that you've identified points of interest, the next step is to come up with a descriptor for the feature centered at each interest point.  This descriptor will be the representation you'll use to compare features in different images to see if they match.

1. Simple descriptor
Use a small square window (say 5x5) of pixel intensities as the feature descriptor.  This should be very easy to implement and should work well when the images you're comparing are related by a translation. You can test this simple descriptor on the images in the easy dataset (http://cat.middlebury.edu/stereo/data.html).
You might also want to normalize for brightness by dividing the descriptor vector by its length (the square root of the sum of the squares of its entries).
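For example, a minimal normalization sketch in C++ (the function name and the flat-vector representation of the window are illustrative assumptions):

    #include <cmath>
    #include <vector>

    // Scale a descriptor to unit length so that multiplying the image
    // brightness by a constant leaves the descriptor unchanged. (Sketch.)
    void normalizeDescriptor(std::vector<float>& d)
    {
        float len = 0.0f;
        for (float v : d) len += v * v;              // sum of squares
        len = std::sqrt(len);
        if (len > 1e-6f)                             // avoid dividing by zero
            for (float& v : d) v /= len;
    }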

2. Advanced descriptor
You want the descriptor to be robust to changes in position, orientation (i.e., rotation), and illumination. One way to do this is to detect the dominant orientation of the pixels in a square window and describe the window relative to that orientation. We will implement a very simple version of SIFT.
Given a square window (say 9x9), compute the gradient magnitude and orientation of each pixel in the window. The magnitude and direction of a pixel's gradient can be calculated as follows:

    m(x, y) = sqrt(I_x^2 + I_y^2)
    θ(x, y) = atan2(I_y, I_x)
Create a histogram of the gradient directions of all the pixels in the window (say a 36-bin histogram, so that each bin covers a 10-degree increment). Accumulate the histogram bins by the magnitude of the pixel gradient.
The dominant orientation of the window then corresponds to the bin with the highest count in the histogram.
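Here is a minimal sketch of the histogram step in C++, reusing the gradient-array conventions from the Harris sketch above; the function name, the radius-4 (9x9) window, and returning the bin center are illustrative assumptions.

    #include <cmath>
    #include <vector>

    // Dominant gradient orientation in a window: vote into a 36-bin
    // histogram weighted by gradient magnitude, return the winning bin's
    // center angle in radians. (Illustrative sketch.)
    float dominantOrientation(const std::vector<float>& Ix,
                              const std::vector<float>& Iy,
                              int width, int height, int x, int y,
                              int radius = 4)
    {
        const float PI = 3.14159265f;
        float hist[36] = {0};
        for (int dy = -radius; dy <= radius; ++dy) {
            for (int dx = -radius; dx <= radius; ++dx) {
                int xx = x + dx, yy = y + dy;
                if (xx < 0 || xx >= width || yy < 0 || yy >= height)
                    continue;
                float ix = Ix[yy * width + xx], iy = Iy[yy * width + xx];
                float mag = std::sqrt(ix * ix + iy * iy);
                float ang = std::atan2(iy, ix);            // in (-pi, pi]
                if (ang < 0) ang += 2 * PI;                // map to [0, 2*pi)
                int bin = (int)(ang / (2 * PI) * 36) % 36; // 10-degree bins
                hist[bin] += mag;                          // magnitude-weighted vote
            }
        }
        int best = 0;
        for (int i = 1; i < 36; ++i)
            if (hist[i] > hist[best]) best = i;
        return (best + 0.5f) * (2 * PI / 36);              // bin center
    }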
Rotate the descriptor window according to the dominant direction with the rotation matrix

    [ u ]   [ cos θ   -sin θ ] [ x ]
    [ v ] = [ sin θ    cos θ ] [ y ]

The goal is to find the pixel location (u, v) that corresponds to pixel location (x, y) rotated by the dominant direction, where the angle θ is measured counterclockwise.
Note that the interest point, which is the center of the descriptor window, has to be at the origin (0, 0) when doing the rotation. So you will first have to translate the descriptor window so that the interest point is at the origin, rotate the window by the dominant direction, and then translate the window back to its original position.
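
A minimal sketch of this translate-rotate-translate mapping (the function name and signature are illustrative):

    #include <cmath>

    // Map (x, y) to (u, v) by rotating counterclockwise by theta about the
    // interest point (cx, cy): translate so the interest point is at the
    // origin, rotate, then translate back. (Illustrative sketch.)
    void rotateAboutCenter(float x, float y, float cx, float cy, float theta,
                           float& u, float& v)
    {
        float tx = x - cx, ty = y - cy;              // translate to origin
        u = cx + std::cos(theta) * tx - std::sin(theta) * ty;
        v = cy + std::sin(theta) * tx + std::cos(theta) * ty;
    }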

However, after the rotation the sample grid may not be perfectly aligned with the pixel grid. To find values between the pixel samples, use bilinear interpolation: the value at a point p is a weighted average of the values of its four closest pixels:

    f(x, y) = (1 - a)(1 - b) f(i, j) + a (1 - b) f(i+1, j)
            + (1 - a) b f(i, j+1) + a b f(i+1, j+1)

where i = floor(x), j = floor(y), a = x - i, and b = y - j.
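A minimal bilinear-interpolation sketch, assuming a row-major float image and a sample point that stays at least one pixel inside the image border (names are illustrative):

    #include <cmath>
    #include <vector>

    // Sample a row-major float image at a non-integer location (x, y) by
    // bilinearly blending its four nearest pixels. Assumes (x, y) is at
    // least one pixel inside the image border. (Illustrative sketch.)
    float bilinear(const std::vector<float>& img, int width, float x, float y)
    {
        int i = (int)std::floor(x), j = (int)std::floor(y);
        float a = x - i, b = y - j;                  // fractional offsets in [0, 1)
        return (1 - a) * (1 - b) * img[j * width + i]
             +      a  * (1 - b) * img[j * width + i + 1]
             + (1 - a) *      b  * img[(j + 1) * width + i]
             +      a  *      b  * img[(j + 1) * width + i + 1];
    }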

Feature matching

Now that you've detected and described your features, the next step is to write code to match them, i.e., given a feature in one image, find the best matching feature in one or more other images.

The skeleton code provided finds the SSD (sum of squared differences) between all feature descriptors in a pair of images. The code declares a match between each feature and its best match (nearest neighbor) in the second image.

For each feature in the first image, use SSD to find the best match (or no match) in the second image. The idea here is to find a good threshold for deciding whether a match exists. To do this, compute the ratio (score of the best feature match) / (score of the second-best feature match), and threshold on that.
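
A minimal sketch of SSD matching with the ratio test in C++; the function names, the descriptor representation, and the 0.7 default threshold are illustrative assumptions, not values from the skeleton code.

    #include <limits>
    #include <vector>

    // SSD between two descriptors of equal length.
    float ssd(const std::vector<float>& a, const std::vector<float>& b)
    {
        float s = 0.0f;
        for (size_t i = 0; i < a.size(); ++i) {
            float d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    // Return the index of the accepted match in 'candidates', or -1 for
    // "no match": the best match is kept only when it is clearly better
    // than the runner-up. (Illustrative sketch; tune maxRatio yourself.)
    int ratioTestMatch(const std::vector<float>& query,
                       const std::vector<std::vector<float> >& candidates,
                       float maxRatio = 0.7f)
    {
        float best = std::numeric_limits<float>::max();
        float second = best;
        int bestIdx = -1;
        for (size_t i = 0; i < candidates.size(); ++i) {
            float s = ssd(query, candidates[i]);
            if (s < best)        { second = best; best = s; bestIdx = (int)i; }
            else if (s < second) { second = s; }
        }
        return (second > 0.0f && best / second < maxRatio) ? bestIdx : -1;
    }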

Testing

Now you're ready to go!  Using the UI and skeleton code that we provide, or your own Matlab code, you can load in a set of images, view the detected features, and visualize the feature matches that your algorithm computes. Matlab users may want to scope out the C++ code for tips on comparing the features.

We are providing a set of benchmark images to be used to test the performance of your algorithm as a function of different types of controlled variation (i.e., rotation, scale, illumination, perspective, blurring).  For each of these images, we know the correct transformation and can therefore measure the accuracy of each of your feature matches.  This is done using a routine that we supply in the skeleton code.

Everybody

  1. Download some image sets: leuven, bikes, wall
    Included with these images are

    • SIFT feature files, with extension .key
    • database files used by the skeleton code, .kdb
    • homography files, containing the correct transformation between each pair of images, named HXtoYp, where X and Y are the indices of the images. For example, the transformation between img2.ppm and img3.ppm is found in H2to3p.
  2. Easier translation sequences can be found at http://cat.middlebury.edu/stereo/data.html. This dataset can be used to test your simple descriptor, which is invariant only to translation.

C++ Users

Running the program

After compiling and linking the skeleton code, you will have an executable cse576. This can be run in several ways:

          To use the GUI: the GUI visualizes the locations of your feature points and the matches between images. You will first need to compute your Harris features (using the computeFeatures argument on the command line) and save them to a file; only then can you load the feature file and visualize it in the GUI. You can also use the mouse buttons to select a subset of the features to use in the query, and then perform the query. Until you write your feature matching routine, the features are matched by minimizing the Euclidean distance between feature vectors.

Matlab Users

You are a little more on your own here; you will probably find it useful to look at the skeleton code. The key element of testing is to take two images, generate lists of feature descriptors for them, and match them. For each feature match, see how far off the actual transformation is.  Look at the routine that applies the transformation matrix (applyHomography in the C++ code).
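
For reference, applying a 3x3 homography H to a point (x, y) amounts to a matrix-vector product in homogeneous coordinates followed by division by the third coordinate. A minimal sketch under that assumption (the function name and row-major layout are illustrative; the skeleton's applyHomography is the authoritative version):

    // Apply a 3x3 homography H (row-major) to (x, y): multiply in
    // homogeneous coordinates, then divide by the third coordinate.
    // (Illustrative sketch.)
    void applyHomographyPoint(const float H[9], float x, float y,
                              float& xp, float& yp)
    {
        float w = H[6] * x + H[7] * y + H[8];        // homogeneous scale
        xp = (H[0] * x + H[1] * y + H[2]) / w;
        yp = (H[3] * x + H[4] * y + H[5]) / w;
    }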

What to Turn In

Download the .doc template for the report here. You are free to use other text-processing tools like LaTeX, etc.; however, make sure that your report has the same sections.
The grading guidelines for project 2 are here.

In addition to your source code and executable, turn in a report describing your approach and results.

This report can be a Word document or a PDF document.
Email Indri your writeup and source code by Sunday, February 17, 2008, 11:59pm. Zip up your report, source code, and images into a file with your name as the file name, e.g., JohnDoe.zip.