CSE576 - Computer Vision

Project 1 - Feature Detection/Matching

by Stephen Friedman


Feature Detection

For feature detection, I used a Harris corner detector over a circular Gaussian window.  The actual function I used as a measure of the "cornerness" is the following:
  det(M) - 0.2*(trace(M))^2
as given in the class notes.  The value of 0.2 was chosen after examining the effects of different coefficients using gnuplot.  This value was chosen because I felt it gave a good trade-off between accentuating "corner-like" regions and discounting "edge-like" regions.
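As a sketch, evaluating this measure at a pixel amounts to the following, where the inputs are the Gaussian-weighted sums of gradient products that form the entries of M (the function and variable names here are illustrative):

  /* Entries of the structure tensor M, accumulated over the circular
   * Gaussian window centered on the pixel:
   *   sumIxIx = sum of w*Ix*Ix, sumIyIy = sum of w*Iy*Iy, sumIxIy = sum of w*Ix*Iy,
   * where Ix, Iy are the image gradients and w is the Gaussian weight. */
  double cornerness(double sumIxIx, double sumIyIy, double sumIxIy)
  {
      double det   = sumIxIx * sumIyIy - sumIxIy * sumIxIy;  /* det(M)   */
      double trace = sumIxIx + sumIyIy;                      /* trace(M) */
      return det - 0.2 * trace * trace;                      /* det(M) - 0.2*(trace(M))^2 */
  }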

Feature Descriptor

The simple feature descriptor was just a 5x5 block of pixels from the image around the point of interest indicated by the feature detector. My more advanced feature descriptor data structure is defined as follows:
  typedef struct _invariantDescriptor {
      double topWeight;
      double leftWeight;
      double crossWeight;

      double patch[PATCH_WIDTH][PATCH_WIDTH];

      double xdir;
      double ydir;
      double normScale;
      double normOffset;
  } invariantDescriptor;
For this descriptor, I took a 5x5 patch around the point of interest indicated by the feature detector and converted it to grey scale. Because hue compensation and white-balance considerations can be quite complicated, I thought this was an efficient way to normalize for color.
The first section, the weights, consists of three values that are computed and intended to be used as a hash for a faster matching algorithm.  They are calculated as the fraction of the total patch value found in the top half, the left half, and the top-left corner plus the bottom-right corner of the patch of pixels.
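A sketch of how these fractions might be computed from the grey-scale patch is shown below; the function name and the exact half/corner conventions are illustrative rather than the exact ones used in the project code:

  #define PATCH_WIDTH 5

  /* Sketch: fill in the three hash weights for a grey-scale patch.
   * topWeight   = fraction of total intensity in the top half,
   * leftWeight  = fraction in the left half,
   * crossWeight = fraction in the top-left plus bottom-right corners.
   * With an odd patch width the middle row/column falls outside both halves. */
  void computeWeights(const double patch[PATCH_WIDTH][PATCH_WIDTH],
                      double *topWeight, double *leftWeight, double *crossWeight)
  {
      double total = 0.0, top = 0.0, left = 0.0, cross = 0.0;
      int half = PATCH_WIDTH / 2;

      for (int y = 0; y < PATCH_WIDTH; y++)
          for (int x = 0; x < PATCH_WIDTH; x++) {
              double v = patch[y][x];
              total += v;
              if (y < half) top += v;                      /* top half */
              if (x < half) left += v;                     /* left half */
              if ((x < half && y < half) ||
                  (x >= PATCH_WIDTH - half && y >= PATCH_WIDTH - half))
                  cross += v;                              /* TL + BR corners */
          }

      if (total > 0.0) {
          *topWeight   = top / total;
          *leftWeight  = left / total;
          *crossWeight = cross / total;
      } else {
          *topWeight = *leftWeight = *crossWeight = 0.0;
      }
  }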

The second section is the real descriptor itself.  A window of the image is taken at the location of an interesting point.  This is then multiplied by a circular mask to reduce the problems of having a square patch when trying to make the feature rotation invariant.  The values in the patch are then shifted and scaled so that the minimum value falls at 0 and the maximum at 1. From this, the cardinal direction is computed.  First we assume the origin is the middle of the patch.  We then compute something analogous to a moment of inertia around that origin, as in the following, where x and y are the pixel coordinates within the patch:
double xComponent = 0.0, yComponent = 0.0;
double center = (PATCH_WIDTH - 1) / 2.0;   /* origin at the middle of the patch */
for (x = 0; x < PATCH_WIDTH; x++)
        for (y = 0; y < PATCH_WIDTH; y++)
        {
            /* intensity-weighted offset from the patch center */
            xComponent += (x - center) * Pixel(x, y);
            yComponent += (y - center) * Pixel(x, y);
        }
This gives a vector that is used as a cardinal direction.  In order to "normalize" for rotation, the patch
is rotated using bilinear subsampling so that the cardinal direction is coincident with the positive x axis.
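A rough sketch of what this resampling step might look like is shown below; the helper names, the inverse-mapping convention, and the handling of samples that fall outside the patch are illustrative rather than the exact choices made in the project code:

  #include <math.h>

  #define PATCH_WIDTH 5

  /* Bilinearly sample src at a fractional location (fx, fy); samples that
   * fall outside the patch are treated as 0. */
  static double bilinearSample(const double src[PATCH_WIDTH][PATCH_WIDTH],
                               double fx, double fy)
  {
      int x0 = (int)floor(fx), y0 = (int)floor(fy);
      double ax = fx - x0, ay = fy - y0;
      double v = 0.0;
      for (int dy = 0; dy <= 1; dy++)
          for (int dx = 0; dx <= 1; dx++) {
              int x = x0 + dx, y = y0 + dy;
              if (x < 0 || y < 0 || x >= PATCH_WIDTH || y >= PATCH_WIDTH) continue;
              v += (dx ? ax : 1.0 - ax) * (dy ? ay : 1.0 - ay) * src[y][x];
          }
      return v;
  }

  /* Rotate src into dst so that the cardinal direction (xdir, ydir)
   * becomes the positive x axis; rotation is about the patch center. */
  void rotateToCardinal(const double src[PATCH_WIDTH][PATCH_WIDTH],
                        double dst[PATCH_WIDTH][PATCH_WIDTH],
                        double xdir, double ydir)
  {
      double angle = atan2(ydir, xdir);
      double c = cos(angle), s = sin(angle);
      double cx = (PATCH_WIDTH - 1) / 2.0, cy = (PATCH_WIDTH - 1) / 2.0;

      for (int y = 0; y < PATCH_WIDTH; y++)
          for (int x = 0; x < PATCH_WIDTH; x++) {
              /* Map each destination pixel back into the source frame. */
              double sx = cx + c * (x - cx) - s * (y - cy);
              double sy = cy + s * (x - cx) + c * (y - cy);
              dst[y][x] = bilinearSample(src, sx, sy);
          }
  }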

Because it is a small patch, it is largely translation invariant.  The amplitude normalization provides intensity invariance.  The rotation is an attempt at providing rotational invariance, though the bilinear subsampling is not optimal and introduces a large amount of aliasing artifacts, especially relative to the small window chosen.  I attempted to make it scale invariant using image pyramids, but was unable to get that working properly in the timeframe of this project.
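For reference, one level of a simple image pyramid (blur and decimate by two) might look like the sketch below; this uses a box filter for brevity, whereas a Gaussian blur would be the more usual choice, and it is only an illustration of the idea rather than the code attempted in the project:

  #include <stdlib.h>

  /* Sketch of one pyramid level: average each 2x2 block of source pixels
   * to produce an image at half the resolution. */
  double *downsampleHalf(const double *img, int width, int height,
                         int *outWidth, int *outHeight)
  {
      *outWidth  = width / 2;
      *outHeight = height / 2;
      double *out = malloc((size_t)(*outWidth) * (*outHeight) * sizeof(double));

      for (int y = 0; y < *outHeight; y++)
          for (int x = 0; x < *outWidth; x++) {
              int sx = 2 * x, sy = 2 * y;
              double sum = img[sy * width + sx] + img[sy * width + sx + 1]
                         + img[(sy + 1) * width + sx] + img[(sy + 1) * width + sx + 1];
              out[y * (*outWidth) + x] = sum / 4.0;   /* box-filtered, decimated pixel */
          }
      return out;
  }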

The third section is simply a record of the transform used to normalize this feature.  This information could later be used in a matching algorithm to construct an overall transform from one feature to a matched one, but due to time constraints this was not implemented.

Design Rationale

I had three main goals in designing a feature descriptor, and these are reflected in the three sections of the descriptor above.  They were:
  1. To allow fast matching, via the hash weights in the first section.
  2. To make the descriptor invariant to translation, intensity, rotation, and (ideally) scale, via the normalized patch in the second section.
  3. To record the normalizing transform so that a matching algorithm could later recover an overall transform between matched features.
I hope that the first and third goals were achieved, though they remain untested.  The descriptor performs reasonably well with respect to the second goal, though its main weakness is the rotation-invariance step.  Aliasing effects from the bilinear sampling were not mitigated, so the rotation used to normalize the patch added noise that could vary quite wildly with the amount of rotation.  Despite this, the matching still worked fairly well in the face of odd rotations, as can be seen in the image below:

Picture of UI showing matches accomplished in the face of rotation.


Also, more features are not always better.  While having none at all is worthless, having too many can be just as bad.  Here I demonstrate this with a picture of a lizard with many "corners."  The one-to-one matching is not very good here, and it would be better to take a more global view.  In this case, it would help a lot to smooth the texture of the scales in order to see more coarse-grained corners, such as those made by the toes and claws.
A lizard with very corner-like skin.

Benchmarks


  Image Set    Average Error (pixels)
  graf         292.537856
  wall         265.383286
  leuven       416.085338

I was not able to obtain any benchmarks for the bikes image set.  The images were quite blurry, and the implementation of the image pyramid was not working when the benchmarks were run.  Thus, the corner detector concluded there were no features.

Comparisons

The simple window descriptor's performance was quite abysmal.  It would match 100% if you used the same image twice, but with any modification it would not find good matches.  Pure translations worked fine, but any change in brightness, perspective, scale, or rotation would completely throw it off.
I tweaked the thresholds of the matching algorithm for my feature descriptor, so those thresholds did not work quite as well for SIFT.  Using the absolute threshold for the matching algorithm, SIFT achieved an average distance of 445.8386496 pixels between the predicted match location and the actual location.  Using the relative threshold, SIFT reported that there were no matches good enough, and thus failed completely at the threshold I used.  This performance is fairly poor, but I imagine it could be improved by tweaking the thresholds for use with the SIFT feature sets.
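To clarify the two strategies being compared, the sketch below shows an absolute distance threshold versus a relative (ratio) threshold on the best and second-best candidate distances; the function names and threshold parameters are illustrative only, not the values used in the project:

  /* bestDist and secondDist are the distances (e.g. SSD) from a descriptor
   * to its closest and second-closest candidates in the other image. */
  int acceptAbsolute(double bestDist, double absThreshold)
  {
      return bestDist < absThreshold;                       /* absolute threshold */
  }

  int acceptRelative(double bestDist, double secondDist, double ratioThreshold)
  {
      if (secondDist <= 0.0) return 0;
      return (bestDist / secondDist) < ratioThreshold;      /* ratio test */
  }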



For those interested, the lizard data set is available here.