Chandrika Jayant
Vision Project # 1
4/14/05
FEATURE DETECTION, DESCRIPTION, AND MATCHING
Feature Descriptor
My first feature descriptor is a simple 5 x 5 window descriptor. At each pixel, I
convert the r, g, b values to a single grey value using the equation in the skeleton code Convert.cpp.
The center pixel of the window (the feature pixel) has a data vector, in which I store
the grey intensity values of all the neighboring pixels in the window -- this vector is the feature
descriptor.
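The idea above can be sketched in a few lines. This is an illustrative Python sketch, not the project's C++ code; the grey-conversion weights are the common luminance coefficients and are an assumption (the skeleton's Convert.cpp may use different ones), and the function names are made up for the example.

```python
def to_grey(r, g, b):
    # Assumed luminance weights; the actual equation lives in Convert.cpp.
    return 0.299 * r + 0.587 * g + 0.114 * b

def window_descriptor(grey, x, y, half=2):
    # Collect the grey values of the 5 x 5 window centred on (x, y).
    # `grey` is a 2-D list of intensities; (x, y) is assumed to be far
    # enough from the border that the whole window fits.
    return [grey[y + dy][x + dx]
            for dy in range(-half, half + 1)
            for dx in range(-half, half + 1)]

# Toy 7 x 7 image: the descriptor at (3, 3) has 25 entries,
# and its middle entry is the feature pixel itself.
img = [[float(r * 7 + c) for c in range(7)] for r in range(7)]
desc = window_descriptor(img, 3, 3)
```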
My second feature descriptor is a simplified version of SIFT, and does not
deal with scale invariance. I build 8 bins for the descriptor at each feature
pixel, used as follows. The feature pixel has a principal gradient direction and
magnitude, calculated from its Harris matrix: the magnitude is the
largest eigenvalue of the 2 x 2 matrix, and the direction is
x = [u, v], where x is the corresponding eigenvector. Each pixel in the
feature's window has an individual gradient, which I have already
calculated earlier. It is [Ix, Iy], where Ix is the x intensity gradient and Iy is the y intensity
gradient; I calculate Ix as Intensity(x+1, y) - Intensity(x-1, y), and similarly for Iy.
I want the angle between each neighboring pixel's gradient direction and the
feature's principal gradient direction. To find the angle of a pixel,
I take arctan(v/u), where its gradient vector is [u, v]. I take the difference
between the two angles and place it in the appropriate one of the 8 bins, which divide
0 to 360 degrees equally. Into each bin I add the magnitudes of the gradients that
fall into it. These 8 bins are the feature's descriptor.
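The binning step can be sketched as follows. This is an illustrative Python sketch under assumptions, not the author's implementation: it takes the principal direction as a precomputed angle rather than deriving it from the Harris eigenvector, and it uses atan2 instead of arctan(v/u) to sidestep the u = 0 case. Function names are invented for the example.

```python
import math

def gradient(grey, x, y):
    # Central differences, as in the text: Ix = I(x+1, y) - I(x-1, y).
    ix = grey[y][x + 1] - grey[y][x - 1]
    iy = grey[y + 1][x] - grey[y - 1][x]
    return ix, iy

def sift_like_descriptor(grey, x, y, principal_angle, half=2):
    # 8-bin histogram of window gradient orientations, measured relative
    # to the feature's principal direction; each gradient adds its
    # magnitude to the bin its relative angle falls into.
    bins = [0.0] * 8
    bin_width = 2 * math.pi / 8
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            ix, iy = gradient(grey, x + dx, y + dy)
            angle = math.atan2(iy, ix)  # avoids dividing by u = 0
            rel = (angle - principal_angle) % (2 * math.pi)
            bins[int(rel / bin_width) % 8] += math.hypot(ix, iy)
    return bins

# Toy check: on a horizontal ramp every gradient points along +x,
# so all the mass lands in bin 0 when the principal angle is 0.
ramp = [[float(x) for x in range(9)] for _ in range(9)]
bins = sift_like_descriptor(ramp, 4, 4, 0.0)
```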
Major Design Choices
- Feature Detector
I use the Harris feature detector discussed in class, with a 5 x 5 window.
The only real decision I had to make here was how to calculate the corner strength.
There was the way written in the paper, det/trace, and the way discussed in class,
det - (k*trace^2). In some cases the harmonic version performed better than the Harris
version, but in others the ranking flipped, so I just chose one. There seems to be
no official verdict on which performs best, as far as I know. I called something a
"feature" when it was a maximum in a 20 x 20 window (to space out my features) and
its corner strength was over a threshold. To find this threshold I simply tested out a bunch
of values until I could eyeball what I considered a good number of features -- not too
many but not too few.
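The two corner-strength formulas above can be written out directly. A minimal Python sketch, assuming the entries of the (smoothed) Harris matrix H = [[Ix^2, IxIy], [IxIy, Iy^2]] have already been accumulated over the window; k = 0.04 is a conventional choice, not a value stated in the report.

```python
def corner_strengths(ix2, iy2, ixy, k=0.04):
    # det and trace of the 2 x 2 Harris matrix.
    det = ix2 * iy2 - ixy * ixy
    trace = ix2 + iy2
    harris = det - k * trace * trace            # det - k*trace^2 (class version)
    harmonic = det / trace if trace else 0.0    # det/trace (paper version)
    return harris, harmonic

# Example: an isotropic corner-like response with ix2 = iy2 = 2, ixy = 0.
h, hm = corner_strengths(2.0, 2.0, 0.0)
```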
Here is an example of what features my detector picks up:
- Feature Descriptor
I described the 2 feature descriptors above. The first one was suggested in the homework
assignment, and the 2nd seemed natural for me to try out since we didn't need to do
scale invariance, but I had read the SIFT paper and wanted to test out a simplified version
of their idea.
- Feature Matcher
I implemented the 2 feature matching algorithms mentioned in the assignment. The
second one works better because it bases its threshold on the ratio of the
score of the best match to the score of the 2nd-best match.
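The ratio-based matcher can be sketched as below. This is an illustrative Python sketch, assuming SSD (sum of squared differences) as the matching score and 0.8 as the ratio threshold; the report does not state the author's actual score function or threshold.

```python
def best_ratio_match(desc, candidates, ratio=0.8):
    # SSD matching with the ratio test: accept the best candidate only
    # if its score is below `ratio` times the second-best score.
    # Returns the index of the accepted match, or None if ambiguous.
    def ssd(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    scores = sorted((ssd(desc, c), i) for i, c in enumerate(candidates))
    (best, idx), (second, _) = scores[0], scores[1]
    return idx if best < ratio * second else None

# A clear winner is accepted; two near-identical candidates are rejected.
clear = best_ratio_match([1.0, 1.0], [[1.0, 1.1], [5.0, 5.0], [9.0, 9.0]])
ambiguous = best_ratio_match([1.0, 1.0], [[2.0, 2.0], [2.0, 2.1]])
```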
Test Results
- Simple Window Descriptor:
This descriptor works well for translational changes between images. I used this
descriptor minimally, since the other one covered translational variation as well
as orientation and illumination variation.
- More Advanced Feature Descriptor:
My average error with this descriptor on the graf images was between 280 and
340, compared with around 350 for the simpler window descriptor. SIFT,
on the graf images, was around 170, obviously better and much more sophisticated.
On the bike images, SIFT performed similarly, around 150. My methods did worse here:
the more advanced feature descriptor ranged from 315 to 370, and the simple
descriptor went up to 470. The bike images were noisier, which may have something to do with it.
To improve my results, if I had more time I would have incorporated blurring,
which I think would have helped a lot. First, I would slightly blur the image
to reduce noise (before taking any pixel intensity gradients).
It might also help to blur the features while looking for matches.
Detecting more features might also have helped, but I didn't want to
detect too many and have the features become meaningless.
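The pre-blurring idea can be sketched with a separable Gaussian. An illustrative Python sketch, not part of the project: the kernel size and sigma are arbitrary choices, and a full blur would apply `blur_row` to every row and then to every column.

```python
import math

def gaussian_kernel(sigma=1.0, radius=2):
    # 1-D Gaussian weights, normalised to sum to 1.
    vals = [math.exp(-(i * i) / (2 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    s = sum(vals)
    return [v / s for v in vals]

def blur_row(row, kernel):
    # Convolve one row with the kernel, clamping indices at the borders.
    r = len(kernel) // 2
    out = []
    for x in range(len(row)):
        acc = 0.0
        for k, w in enumerate(kernel):
            xx = min(max(x + k - r, 0), len(row) - 1)
            acc += w * row[xx]
        out.append(acc)
    return out

# Sanity check: blurring a constant row leaves it unchanged.
kernel = gaussian_kernel()
flat = blur_row([5.0] * 7, kernel)
```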
Strengths and Weaknesses
My code works well with translational variance, which is much easier to deal
with. My code performs decently under rotational and illumination variance,
although there are quite a few outliers even though the major features are matched.
The simplified SIFT procedure has the advantage of being much less computationally
expensive, and may be good for finding major features, but falls short (at least
in my implementation) for robust matching.
Results on My Own Photos
I had good results on the photos I took. The matching is shown below: I tried
to match the cat's face from 2 different angles.