Chandrika Jayant
Vision Project # 1
4/14/05
FEATURE DETECTION, DESCRIPTION, AND MATCHING
Feature Descriptor
My first feature descriptor is a simple 5 x 5 window descriptor. At each pixel, I
convert the r, g, b values to a single grey value using the equation in the skeleton code Convert.cpp.
The center pixel of the window (the feature pixel) has a data vector, in which I store
the grey intensity values of all the neighboring pixels in the window -- this vector is the feature
descriptor.
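The idea above can be sketched in a few lines. This is an illustrative Python sketch, not the project's C++ code; the grey-conversion weights are the common luminance coefficients and are an assumption (the skeleton's Convert.cpp may use different ones), and the function names are made up for the example.

```python
def to_grey(r, g, b):
    # Assumed luminance weights; the actual equation lives in Convert.cpp.
    return 0.299 * r + 0.587 * g + 0.114 * b

def window_descriptor(grey, x, y, half=2):
    # Collect the grey values of the 5 x 5 window centred on (x, y).
    # `grey` is a 2-D list of intensities; (x, y) is assumed to be far
    # enough from the border that the whole window fits.
    return [grey[y + dy][x + dx]
            for dy in range(-half, half + 1)
            for dx in range(-half, half + 1)]

# Toy 7 x 7 image: the descriptor at (3, 3) has 25 entries,
# and its middle entry is the feature pixel itself.
img = [[float(r * 7 + c) for c in range(7)] for r in range(7)]
desc = window_descriptor(img, 3, 3)
```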
My second feature descriptor is a simplified version of SIFT, and does not
deal with scale invariance. I build 8 bins for the descriptor at each feature
pixel, used as follows. The feature pixel has a principal gradient direction and
magnitude, calculated from its Harris matrix: the magnitude is the
largest eigenvalue of the 2 x 2 matrix, and the direction is
x = [u, v], where x is the corresponding eigenvector. Each pixel in the
feature's window has an individual gradient, which I have already
calculated earlier. It is [Ix, Iy], where Ix is the x intensity gradient and Iy is the y intensity
gradient; I calculate Ix as Intensity(x+1, y) - Intensity(x-1, y), and similarly for Iy.
I want the angle between each neighboring pixel's gradient direction and the
feature's principal gradient direction. To find the angle of a pixel,
I take arctan(v/u), where its gradient vector is [u, v]. I take the difference
between the two angles and place it in the appropriate one of the 8 bins, which divide
0 to 360 degrees equally. Into each bin I add the magnitudes of the gradients that
fall into it. These 8 bins are the feature's descriptor.
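The binning step can be sketched as follows. This is an illustrative Python sketch under assumptions, not the author's implementation: it takes the principal direction as a precomputed angle rather than deriving it from the Harris eigenvector, and it uses atan2 instead of arctan(v/u) to sidestep the u = 0 case. Function names are invented for the example.

```python
import math

def gradient(grey, x, y):
    # Central differences, as in the text: Ix = I(x+1, y) - I(x-1, y).
    ix = grey[y][x + 1] - grey[y][x - 1]
    iy = grey[y + 1][x] - grey[y - 1][x]
    return ix, iy

def sift_like_descriptor(grey, x, y, principal_angle, half=2):
    # 8-bin histogram of window gradient orientations, measured relative
    # to the feature's principal direction; each gradient adds its
    # magnitude to the bin its relative angle falls into.
    bins = [0.0] * 8
    bin_width = 2 * math.pi / 8
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            ix, iy = gradient(grey, x + dx, y + dy)
            angle = math.atan2(iy, ix)  # avoids dividing by u = 0
            rel = (angle - principal_angle) % (2 * math.pi)
            bins[int(rel / bin_width) % 8] += math.hypot(ix, iy)
    return bins

# Toy check: on a horizontal ramp every gradient points along +x,
# so all the mass lands in bin 0 when the principal angle is 0.
ramp = [[float(x) for x in range(9)] for _ in range(9)]
bins = sift_like_descriptor(ramp, 4, 4, 0.0)
```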
Major Design Choices
- Feature Detector
I use the Harris feature detector discussed in class, with a 5 x 5 window.
The only real decision I had to make here was how to calculate the corner strength.
There was the way written in the paper, det/trace, and the way discussed in class,
det - (k*trace^2). In some cases the harmonic version performed better than the Harris
version, but in others the ranking flipped, so I just chose one. There seems to be
no official verdict on which performs best, as far as I know. I called something a
"feature" when it was a maximum in a 20 x 20 window (to space out my features) and
its corner strength was over a threshold. To find this threshold I simply tested out a bunch
of values until I could eyeball what I considered a good number of features -- not too
many but not too few.
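The two corner-strength formulas above can be written out directly. A minimal Python sketch, assuming the entries of the (smoothed) Harris matrix H = [[Ix^2, IxIy], [IxIy, Iy^2]] have already been accumulated over the window; k = 0.04 is a conventional choice, not a value stated in the report.

```python
def corner_strengths(ix2, iy2, ixy, k=0.04):
    # det and trace of the 2 x 2 Harris matrix.
    det = ix2 * iy2 - ixy * ixy
    trace = ix2 + iy2
    harris = det - k * trace * trace            # det - k*trace^2 (class version)
    harmonic = det / trace if trace else 0.0    # det/trace (paper version)
    return harris, harmonic

# Example: an isotropic corner-like response with ix2 = iy2 = 2, ixy = 0.
h, hm = corner_strengths(2.0, 2.0, 0.0)
```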
Here is an example of what features my detector picks up:
- Feature Descriptor
I described the 2 feature descriptors above. The first one was suggested in the homework
assignment, and the 2nd seemed natural for me to try out since we didn't need to do
scale invariance, but I had read the SIFT paper and wanted to test out a simplified version
of their idea.
- Feature Matcher
I implemented the 2 feature matching algorithms mentioned in the assignment. The
second one works better because it bases its threshold on the ratio of the
score of the best match to the score of the 2nd-best match.
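The ratio-based matcher can be sketched as below. This is an illustrative Python sketch, assuming SSD (sum of squared differences) as the matching score and 0.8 as the ratio threshold; the report does not state the author's actual score function or threshold.

```python
def best_ratio_match(desc, candidates, ratio=0.8):
    # SSD matching with the ratio test: accept the best candidate only
    # if its score is below `ratio` times the second-best score.
    # Returns the index of the accepted match, or None if ambiguous.
    def ssd(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    scores = sorted((ssd(desc, c), i) for i, c in enumerate(candidates))
    (best, idx), (second, _) = scores[0], scores[1]
    return idx if best < ratio * second else None

# A clear winner is accepted; two near-identical candidates are rejected.
clear = best_ratio_match([1.0, 1.0], [[1.0, 1.1], [5.0, 5.0], [9.0, 9.0]])
ambiguous = best_ratio_match([1.0, 1.0], [[2.0, 2.0], [2.0, 2.1]])
```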
Test Results
- Simple Window Descriptor:
This descriptor works well for translational changes between images. I used this
descriptor minimally, since the other one covered translational variation as well
as orientation and illumination variation.
- More Advanced Feature Descriptor:
My average error with this descriptor on the graf images was between 280 and
340, compared with around 350 for the simpler window descriptor. SIFT,
on the graf images, was around 170, obviously better and much more sophisticated.
On the bike images, SIFT performed similarly, around 150. My methods did worse here:
the more advanced feature descriptor ranged from 315 to 370, and the simple
descriptor went up to 470. The bike images were noisier, which may have something to do with it.
To improve my results, if I had more time I would have incorporated blurring,
which I think would have helped a lot. First, I would slightly blur the image
to reduce noise (before taking any pixel intensity gradients).
It might also help to blur the features while looking for matches.
Detecting more features might also have helped, but I didn't want to
detect too many and have the features become meaningless.
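The pre-blurring idea can be sketched with a separable Gaussian. An illustrative Python sketch, not part of the project: the kernel size and sigma are arbitrary choices, and a full blur would apply `blur_row` to every row and then to every column.

```python
import math

def gaussian_kernel(sigma=1.0, radius=2):
    # 1-D Gaussian weights, normalised to sum to 1.
    vals = [math.exp(-(i * i) / (2 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    s = sum(vals)
    return [v / s for v in vals]

def blur_row(row, kernel):
    # Convolve one row with the kernel, clamping indices at the borders.
    r = len(kernel) // 2
    out = []
    for x in range(len(row)):
        acc = 0.0
        for k, w in enumerate(kernel):
            xx = min(max(x + k - r, 0), len(row) - 1)
            acc += w * row[xx]
        out.append(acc)
    return out

# Sanity check: blurring a constant row leaves it unchanged.
kernel = gaussian_kernel()
flat = blur_row([5.0] * 7, kernel)
```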
Strengths and Weaknesses
My code works well with translational variance, which is much easier to deal
with. My code performs decently under rotational and illumination variance,
although there are quite a few outliers even though the major features are matched.
The simplified SIFT procedure has the advantage of being much less computationally
expensive, and may be good for finding major features, but falls short (at least
in my implementation) for robust matching.
Results on My Own Photos
I had good results on the photos I took. The matching is shown below: I tried
to match the cat's face from 2 different angles.