Project 1: Feature Matching

Eric Holk
CSE 576 - Computer Vision
April 16, 2008

Description of Feature Descriptor

The process starts by using a Harris Corner Detector to find interest points. The Harris Corner Detector simply returns every point whose Harris response is a local maximum in a 3x3 window. It does no additional thresholding, and thus normally reports 15,000 to 20,000 features per image.
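
The following Python sketch illustrates this step, assuming a grayscale image as input. The function name, the smoothing sigma for the second-moment matrix, and the use of scipy are illustrative assumptions, not the project's actual code.

    import numpy as np
    from scipy import ndimage

    def harris_local_maxima(gray, sigma=1.0, k=0.04):
        """Return (row, col, response) for every pixel that is a 3x3 local maximum."""
        # Image gradients.
        Ix = ndimage.sobel(gray, axis=1, mode='reflect')
        Iy = ndimage.sobel(gray, axis=0, mode='reflect')

        # Smoothed products form the entries of the second-moment matrix.
        Sxx = ndimage.gaussian_filter(Ix * Ix, sigma)
        Syy = ndimage.gaussian_filter(Iy * Iy, sigma)
        Sxy = ndimage.gaussian_filter(Ix * Iy, sigma)

        # Harris response: det(M) - k * trace(M)^2.
        response = (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2

        # Keep every pixel that equals the maximum of its 3x3 neighborhood;
        # no further thresholding is applied.
        local_max = (response == ndimage.maximum_filter(response, size=3))
        rows, cols = np.nonzero(local_max)
        return [(r, c, response[r, c]) for r, c in zip(rows, cols)]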

Since this is clearly far more features than are needed, they are filtered using an adaptive non-maximal suppression (ANMS) algorithm based on the one described in the MOPS paper. The idea is that for each point i identified as a corner, the algorithm finds the suppression radius ri: the distance from i to the nearest corner point whose Harris value is larger than the value at i. The interest points are then sorted by this radius, and the list is truncated so that only the n points with the largest radii are kept, where n is the desired number of features for a given image.
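
A minimal, brute-force sketch of this selection is shown below. The O(n^2) search and the function name are illustrative assumptions; the only behavior taken from the description above is that each point's radius is the distance to the nearest stronger point, and the n points with the largest radii are kept.

    import numpy as np

    def anms(points, n):
        """points: list of (row, col, harris_value); returns the top-n after ANMS."""
        pts = np.array([(r, c) for r, c, _ in points], dtype=float)
        vals = np.array([v for _, _, v in points])

        radii = np.full(len(points), np.inf)
        for i in range(len(points)):
            stronger = vals > vals[i]          # points that would suppress point i
            if np.any(stronger):
                d = np.linalg.norm(pts[stronger] - pts[i], axis=1)
                radii[i] = d.min()             # distance to nearest stronger point

        # Keep the n points with the largest suppression radii.
        keep = np.argsort(-radii)[:n]
        return [points[i] for i in keep]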

Once the interest points have been selected, the program creates a feature descriptor for each one. It starts by extracting a 16x16 window centered on the interest point. This window is weighted by a two-dimensional Gaussian with a standard deviation of 8/3, so that the pixels at the edge of the window count much less than the center pixels.
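
A sketch of the windowing step is below. Boundary handling is omitted for brevity, and the helper names are assumptions; only the 16x16 size and the 8/3 standard deviation come from the description above.

    import numpy as np

    def gaussian_window(size=16, sigma=8.0 / 3.0):
        """2D Gaussian weights centered on a size x size patch."""
        coords = np.arange(size) - (size - 1) / 2.0
        xx, yy = np.meshgrid(coords, coords)
        return np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))

    def weighted_patch(gray, row, col, size=16):
        """Extract the patch around (row, col) and apply the Gaussian weights.
        Assumes the window lies entirely inside the image."""
        half = size // 2
        patch = gray[row - half:row + half, col - half:col + half]
        return patch * gaussian_window(size)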

Next, the gradient at each point in the 16x16 patch is calculated, and the resulting gradient image is blurred with a 5x5 Gaussian. The sum of all the blurred gradients is used to assign a direction to the patch. Ideally, the patch would be rotated by this direction to provide rotation invariance, but this rotation is not currently implemented.
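
The orientation step might look like the sketch below. The gaussian_filter call with sigma=1.0 is a stand-in assumption for the 5x5 Gaussian kernel; the dominant direction is simply the angle of the summed, blurred gradient, as described above.

    import numpy as np
    from scipy import ndimage

    def patch_orientation(patch):
        """Return the dominant gradient direction (radians) of a 16x16 patch."""
        gy, gx = np.gradient(patch)                  # per-pixel gradient components
        gx = ndimage.gaussian_filter(gx, sigma=1.0)  # blur each component
        gy = ndimage.gaussian_filter(gy, sigma=1.0)
        return np.arctan2(gy.sum(), gx.sum())        # angle of the summed gradient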

Finally, the 16x16 window is blurred with a 5x5 Gaussian and every other row and column is discarded. The mean and standard deviation of the luminance values in the resulting 8x8 window are calculated and used to normalize the feature descriptor for changes in illumination.
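
The final descriptor step can be sketched as follows; again the sigma=1.0 blur is an illustrative stand-in for the 5x5 kernel, and the small epsilon guards against division by zero in flat patches.

    import numpy as np
    from scipy import ndimage

    def build_descriptor(patch16):
        """Blur, downsample to 8x8, and normalize for illumination."""
        blurred = ndimage.gaussian_filter(patch16, sigma=1.0)  # stand-in for the 5x5 blur
        small = blurred[::2, ::2]                              # discard every other row/column
        return (small - small.mean()) / (small.std() + 1e-8)   # zero mean, unit std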

Design Rationale

There are several important decisions in the design of this feature descriptor.

The first of these was the decision to use ANMS. This was because it automatically finds a good threshold for Harris corner values. I found that it was difficult to find a single threshold that would work for all of the provided test images. A given threshold might produce 6000 features on one image, and only 30 on another. Clearly, we do not want too many features, but it is impossible to get good matching results without a sufficient number of features. With ANMS, you specify the number of features you want, and it automatically finds a good set of features which are relatively evenly distributed around the image.

Using ANMS motivated a few other design decisions. For example, the radius threshold selected by the algorithm was normally around 20 pixels. For this reason, I chose to build each feature from a 16x16 window. This window is large enough to reduce false positives, but small enough to keep neighboring features from overlapping too much.

The feature descriptor was ultimately blurred and downsampled to an 8x8 window, following the MOPS paper. This makes the feature robust to a few more changes, such as noise in the image or small differences in where the feature center point was placed. Intuitively, downsampling the descriptor keeps the overall shape of the feature rather than its specific details, which makes it more likely to find good matches.

Performance

ROC Curves

[ROC curves for the Yosemite and Graf image pairs]

Harris Images

[Harris response images for Yosemite1.jpg and graf1.ppm]

Average Area-Under-Curve

Benchmark   Average AUC
graf        0.505859
leuven      0.586786
bikes       0.559912
wall        0.608455

Analysis

Overall, my feature descriptor performs far worse than SIFT. The four main benchmark tests each seem to test a particular aspect of feature matching. The Wall set is mostly images related by translation, and as expected, my algorithm does best there. Leuven and Bikes are primarily testing changes in illumination. My algorithm handles these reasonably well, which shows that the intensity normalization step is working well.

For the Graf set, my algorithm does about the same as random guessing. These scores would probably improve greatly by introducing a proper rotation normalization step.

Additional Images

I chose two photos I took near Mt. St. Helens in the summer of 2006. The images overlap, but they were taken at different focal lengths, which shows how my algorithm performs under changes in scale. Below are the images with the detected features marked on them.