Feature Detection
For feature detection, I used a Harris corner detector over a circular
Gaussian window. The actual function I used as a measure of the
"cornerness" is the following:
det(M) - 0.2 (trace(M))^2
as given in the class notes. The value of 0.2 was chosen after
examining the effects of different coefficients using gnuplot.
This value was chosen because I felt it was a good trade-off between
accentuating "corner-like" regions and discounting "edge-like"
regions.
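To make this concrete, here is a minimal sketch of the cornerness
computation, assuming the entries of the 2x2 Harris matrix M (the
Gaussian-weighted sums of the gradient products Ix*Ix, Ix*Iy, and
Iy*Iy over the circular window) have already been accumulated; the
function name is illustrative, not from the original code:

/* Cornerness from the 2x2 Harris matrix M = [[a, b], [b, c]],
 * where a, b, c are the Gaussian-weighted sums of Ix*Ix, Ix*Iy,
 * and Iy*Iy over the circular window. The coefficient 0.2 is the
 * value discussed above. */
double cornerness(double a, double b, double c)
{
    double det   = a * c - b * b;
    double trace = a + c;
    return det - 0.2 * trace * trace;
}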
Feature Descriptor
The simple feature descriptor was simply a 5x5 block of pixels taken
from the image around the point of interest indicated by the feature
detector. My more advanced feature descriptor data structure is
defined as follows:
typedef struct _invariantDescriptor {
    /* Summarizing weights, intended as a hash for faster matching. */
    double topWeight;
    double leftWeight;
    double crossWeight;
    /* The normalized patch itself. */
    double patch[PATCH_WIDTH][PATCH_WIDTH];
    /* Cardinal direction and amplitude transform used to normalize. */
    double xdir;
    double ydir;
    double normScale;
    double normOffset;
} invariantDescriptor;
For this descriptor, I took a 5x5 patch around the point of interest
indicated by the feature detector and converted it to greyscale.
Because hue compensation and white-balance considerations can be quite
complicated, I thought this to be an efficient way to normalize for
color.
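The writeup does not pin down which greyscale conversion was used; a
common choice is the standard luma weighting, sketched here as an
assumption:

/* Standard luma conversion (ITU-R BT.601 weights). The exact
 * weighting used in the project is not specified, so treat these
 * coefficients as an assumption. */
double toGrey(double r, double g, double b)
{
    return 0.299 * r + 0.587 * g + 0.114 * b;
}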
The first section, the weights, consists of three values that are
computed and intended to be used as a hash for a faster matching
algorithm. They are calculated as the fraction of the total patch
value found in the top half, the left half, and the top-left corner
plus the bottom-right corner of the patch of pixels.
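As a sketch of how these three fractions could be computed from the
patch (the helper name, and the handling of the middle row and column
for an odd PATCH_WIDTH, are assumptions of mine):

/* Fraction of the total patch intensity found in the top half,
 * the left half, and the top-left plus bottom-right quadrants.
 * PATCH_WIDTH is assumed odd (5 here), so excluding the middle
 * row/column is one plausible convention, not necessarily the
 * original's. */
void computeWeights(invariantDescriptor *d)
{
    double total = 0.0, top = 0.0, left = 0.0, cross = 0.0;
    int x, y, half = PATCH_WIDTH / 2;

    for (y = 0; y < PATCH_WIDTH; y++)
        for (x = 0; x < PATCH_WIDTH; x++) {
            double v = d->patch[y][x];
            total += v;
            if (y < half) top += v;
            if (x < half) left += v;
            if ((x < half && y < half) || (x > half && y > half))
                cross += v;
        }

    if (total > 0.0) {
        d->topWeight   = top / total;
        d->leftWeight  = left / total;
        d->crossWeight = cross / total;
    }
}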
The second section is the real descriptor itself. A window of the
image is taken at the location of an interesting point. This is
then multiplied by a circular mask to reduce the problems of having a
square patch when trying to make the feature rotation invariant.
The values in the patch are then shifted and scaled so that the
minimum value falls on 0 and the maximum on 1. From this, the
cardinal direction is computed. First, we assume the origin is the
middle of the patch. We then do something analogous to computing the
moment of inertia around the origin, as in the following, where x and
y are the pixel coordinates:
for (x = 0; x < PATCH_WIDTH; x++)
    for (y = 0; y < PATCH_WIDTH; y++)
    {
        /* Shift coordinates so the origin is the middle of the patch. */
        double cx = x - (PATCH_WIDTH - 1) / 2.0;
        double cy = y - (PATCH_WIDTH - 1) / 2.0;
        xComponent += cx * Pixel(x, y);
        yComponent += cy * Pixel(x, y);
    }
This gives a vector that is used as a cardinal direction. In
order to "normalize" for rotation, the patch
is rotated using bilinear subsampling so that the cardinal direction
is coincident with the positive x axis.
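A sketch of this rotation step, under assumed helper names: each
destination pixel is mapped back through the inverse rotation and
sampled bilinearly, so that the cardinal direction (xdir, ydir) ends
up along the positive x axis. Treating out-of-window samples as 0 is
one plausible convention, not necessarily the original's:

#include <math.h>

/* Rotate the patch so that the direction (xdir, ydir) aligns with
 * the positive x axis, sampling the source patch bilinearly. */
void rotatePatch(const double src[PATCH_WIDTH][PATCH_WIDTH],
                 double dst[PATCH_WIDTH][PATCH_WIDTH],
                 double xdir, double ydir)
{
    double angle = atan2(ydir, xdir);   /* patch is rotated by -angle */
    double ca = cos(angle), sa = sin(angle);
    double c = (PATCH_WIDTH - 1) / 2.0; /* patch center */
    int x, y;

    for (y = 0; y < PATCH_WIDTH; y++)
        for (x = 0; x < PATCH_WIDTH; x++) {
            /* Map the destination pixel back into the source. */
            double dx = x - c, dy = y - c;
            double sx = ca * dx - sa * dy + c;
            double sy = sa * dx + ca * dy + c;
            int x0 = (int)floor(sx), y0 = (int)floor(sy);
            double fx = sx - x0, fy = sy - y0;
            double v = 0.0;

            if (x0 >= 0 && y0 >= 0 &&
                x0 + 1 < PATCH_WIDTH && y0 + 1 < PATCH_WIDTH)
                v = (1 - fx) * (1 - fy) * src[y0][x0]
                  + fx * (1 - fy)       * src[y0][x0 + 1]
                  + (1 - fx) * fy       * src[y0 + 1][x0]
                  + fx * fy             * src[y0 + 1][x0 + 1];

            dst[y][x] = v;
        }
}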
Because it is a small patch, it is largely translation invariant.
The amplitude normalization provides intensity invariance. The
rotation is an attempt at providing rotational invariance, though the
bilinear subsampling is not optimal and introduces a large number of
aliasing artifacts, especially relative to the small window
chosen. I attempted to make it scale invariant using image
pyramids, but was unable to get that working properly in the
timeframe of this project.
The third section is simply a record of the transform used to
normalize this feature. This information could later be used by a
matching algorithm to construct an overall transform from one feature
to its match, but due to time constraints, this was not implemented.
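For illustration only, assuming normScale and normOffset record the
amplitude transform such that original = normalized * normScale +
normOffset, the stored values could be used as follows; these helpers
are hypothetical and, as noted, were never implemented:

#include <math.h>

/* Recover the original (pre-normalization) intensity of a patch
 * value, assuming the convention described above. */
double unnormalize(const invariantDescriptor *d, double v)
{
    return v * d->normScale + d->normOffset;
}

/* Angle of the original cardinal direction before rotation. */
double originalAngle(const invariantDescriptor *d)
{
    return atan2(d->ydir, d->xdir);
}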
Design Rationale
I had three main goals in
designing a feature descriptor, and these are reflected in the three
sections of the descriptor above. They were:
- Provide for an efficient way to search for matches by providing
summarizing metrics which retain the nearness property for similar
patches.
- Provide an easily compared light-, translation-, and
rotation-invariant description of the region of interest that allows
for fine discrimination.
- Keep the information removed in the normalization-for-invariance
step, so that it may later be combined to provide for more global
matching, such as deducing the overall transform between two complete
images.
I hope that the first and third goals were achieved, though they
remain untested. The descriptor performs reasonably well with regard
to the second goal, though its main weakness is in the
rotation-invariance portion. Aliasing effects from the bilinear
sampling were not mitigated, so the normalizing rotation added noise
which could vary quite wildly with the amount of rotation. Despite
this, the matching still worked fairly well in the face of odd
rotations, as can be seen in the image below:
Also, more features are not always better. While no feature is
entirely worthless, having too many can be just as bad. Here I
demonstrate this with a picture of a lizard with many "corners." The
one-to-one matching is not very good here, and it would be better to
take a more global view. In this case, it would help a lot to smooth
the texture of the scales in order to see more coarse-grained
corners, such as those made by toes and claws.
Benchmarks
Image Set | Average Error (pixels)
graf      | 292.537856
wall      | 265.383286
leuven    | 416.085338
I was not able to obtain any benchmarks for the bikes image set.
The images were quite blurry, and the implementation of the image
pyramid was not working as of the running of the benchmarks.
Thus, the corner detector concluded there were no features.
Comparisons
The simple window descriptor's performance was quite abysmal. It
would match 100% if you used the same image twice, but with any
modification it would not find good matches. Pure translations
would work fine, but any change in brightness, perspective, scale, or
rotation would completely throw it off.
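For reference, comparing two simple window descriptors amounts to a
distance such as the sum of squared differences over the 5x5 blocks;
the exact distance used is not stated above, so this is a sketch:

/* Sum of squared differences between two 5x5 window descriptors;
 * smaller is a better match. The helper name is illustrative. */
double windowSSD(const double a[5][5], const double b[5][5])
{
    double sum = 0.0;
    int x, y;
    for (y = 0; y < 5; y++)
        for (x = 0; x < 5; x++) {
            double diff = a[y][x] - b[y][x];
            sum += diff * diff;
        }
    return sum;
}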
I tweaked the thresholds of the matching algorithm for my feature
descriptor, so those thresholds did not work quite as well for
SIFT. Using the absolute threshold in the matching algorithm, SIFT
was able to achieve an average distance of 445.8386496 pixels between
the described match and the actual match. Using the relative
threshold, SIFT reported there were no matches that were good enough,
and thus failed completely for the threshold I used. This performance
is fairly poor, but I imagine it could be improved by tweaking the
thresholds for use with the SIFT feature sets.
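To clarify the two strategies: the absolute threshold accepts the
best match when its distance falls below a fixed cutoff, while the
relative threshold accepts it only when it is clearly better than the
second-best candidate (a ratio test). A minimal sketch, with names
and parameters assumed:

/* Absolute threshold: accept the best match only if its distance
 * is below a fixed cutoff. */
int acceptAbsolute(double bestDist, double threshold)
{
    return bestDist < threshold;
}

/* Relative threshold: accept only if the best match is clearly
 * better than the runner-up (distance ratio test). */
int acceptRelative(double bestDist, double secondBestDist, double ratio)
{
    return bestDist < ratio * secondBestDist;
}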
For those interested, the lizard data set is available
here.