Peter Henry
CSE576 Computer Vision
Project 1

Feature Detector

I implemented the Harris corner detector as my only feature detector, associated with command line option 1. I first convolve the original grayscale image with the Sobel X and Y filters. I then create three images, (I_x * I_x), (I_x * I_y), and (I_y * I_y), storing the appropriate products of the Sobel responses. Each of these three images is convolved with a 5x5 Gaussian kernel (implemented using the equivalent 7x1 separable kernel). Finally, I compute the ratio of the determinant to the trace of the Harris matrix and store this as the Harris response. To find local maxima, I require that a point be the maximum in its 3x3 window and above a threshold of 0.01. Following are Harris response images for "graf/img1.ppm" and "yosemite/yosemite1.jpg":
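The detection pipeline above can be sketched as follows. This is an illustration, not the project code; the Gaussian `sigma` and the use of SciPy filters are assumptions, while the 3x3 local-maximum window and the 0.01 threshold come from the description above.

```python
import numpy as np
from scipy.ndimage import sobel, gaussian_filter, maximum_filter

def harris_corners(gray, sigma=1.0, threshold=0.01):
    """Harris response as det/trace of the smoothed structure tensor."""
    # Gradients from Sobel X and Y filters.
    Ix = sobel(gray, axis=1, mode="nearest")
    Iy = sobel(gray, axis=0, mode="nearest")
    # Smooth the three product images with a Gaussian.
    Ixx = gaussian_filter(Ix * Ix, sigma)
    Ixy = gaussian_filter(Ix * Iy, sigma)
    Iyy = gaussian_filter(Iy * Iy, sigma)
    det = Ixx * Iyy - Ixy * Ixy
    trace = Ixx + Iyy
    response = det / (trace + 1e-12)
    # Keep points that are the maximum in their 3x3 window
    # and whose response exceeds the threshold.
    local_max = response == maximum_filter(response, size=3)
    return np.argwhere(local_max & (response > threshold))
```

On a synthetic image of a bright square, this returns points near the square's corners and nothing along its edges, since the det/trace ratio is small when only one gradient direction is strong.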



Feature Descriptor

I first implemented the simple 5x5 grayscale image patch descriptor, the results of which are included in the plots of the results section. For my more general feature descriptor, I took inspiration from "Multi-Image Matching using Multi-Scale Oriented Patches" by Brown, Szeliski, and Winder. I continued to use my single-scale Harris corner detector, but for my descriptor, I do the following:

An important note is that I do not generate patches for features whose image patch would fall outside the image. These patches would have reduced matchability due to the missing information, so this descriptor will not generate features along the edges of an image.
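A MOPS-style descriptor of the kind described above can be sketched as follows. The patch size, sample spacing, and blur `sigma` here are illustrative assumptions, as is estimating the dominant orientation from the smoothed gradient at the feature point; only the overall recipe (blurred, oriented, bias/gain-normalized patch) is taken from the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel, map_coordinates

def mops_descriptor(gray, y, x, patch=8, spacing=5, sigma=4.5):
    """Sample a bias/gain-normalized patch from a blurred image,
    rotated to the dominant gradient orientation at (y, x)."""
    blur = gaussian_filter(gray, sigma)
    gx = sobel(blur, axis=1)[y, x]
    gy = sobel(blur, axis=0)[y, x]
    theta = np.arctan2(gy, gx)  # dominant orientation estimate
    c, s = np.cos(theta), np.sin(theta)
    # Sampling grid centered on the feature, rotated by theta.
    offs = (np.arange(patch) - (patch - 1) / 2.0) * spacing
    u, v = np.meshgrid(offs, offs)
    ys = y + (s * u + c * v)
    xs = x + (c * u - s * v)
    vals = map_coordinates(blur, [ys.ravel(), xs.ravel()], order=1)
    # Bias/gain normalization: zero mean, unit variance.
    vals = vals - vals.mean()
    return vals / (vals.std() + 1e-12)
```

The bias/gain normalization at the end is what provides the illumination invariance discussed later: adding a constant to the image or scaling its intensities leaves the descriptor unchanged.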

Results

ROC Plot for graf (my descriptor):
AUC for SSD matching: 0.632751
AUC for Ratio matching: 0.767705


ROC Plot for graf (simple 5x5 descriptor):
AUC for SSD matching: 0.653630
AUC for Ratio matching: 0.657511


ROC Plot for yosemite (my descriptor):
AUC for SSD matching: 0.950955
AUC for Ratio matching: 0.976500


ROC Plot for yosemite (simple 5x5 descriptor):
AUC for SSD matching: 0.781920
AUC for Ratio matching: 0.894587


Benchmarking results (all use ratio matching):

bikes:
average error: 324.398513 pixels
average AUC: 0.653120

graf:
average error: 294.377062 pixels
average AUC: 0.542049

leuven:
average error: 165.489341 pixels
average AUC: 0.768668

wall:
average error: 278.718948 pixels
average AUC: 0.664109
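The ratio matching used in these benchmarks can be sketched as follows: each descriptor in the first image is matched to its nearest neighbor (by SSD) in the second image, and the match is kept only if the best distance is sufficiently smaller than the second-best. The 0.8 cutoff is an illustrative assumption, not the value used in the project.

```python
import numpy as np

def ratio_match(desc1, desc2, max_ratio=0.8):
    """Nearest-neighbor matching with the ratio test."""
    matches = []
    for i, d in enumerate(desc1):
        ssd = np.sum((desc2 - d) ** 2, axis=1)  # SSD to all candidates
        order = np.argsort(ssd)
        best, second = order[0], order[1]
        # Accept only if the best match clearly beats the runner-up.
        if ssd[best] < max_ratio * ssd[second]:
            matches.append((i, best))
    return matches
```

The ratio test explains why ratio matching outperforms plain SSD matching in the plots above: it rejects ambiguous features that match several candidates almost equally well.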

Following is the result of detecting features in a test photo. Because I have no ground truth for my own images, it is difficult to measure the usefulness of these features.


Discussion

My feature descriptor is reasonably invariant with respect to translation, orientation (rotations about the camera view ray), and illumination. It is also readily understood and efficiently implemented.

One weakness of my method is the fixed threshold for determining Harris corner points. The threshold I selected generates 14779 feature points for "wall/img1.ppm" but only 86 feature points for "bikes/img6.ppm". The wall image has a great many locally distinctive features, while the bikes image is quite blurry and yields very few. This suggests that an adaptive threshold would substantially even out the number of features per image and would eliminate the guesswork involved in choosing my threshold. An adaptive scheme could also be used to spread the feature points spatially across the image.
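One way to realize the adaptive, spatially spreading selection suggested above is the adaptive non-maximal suppression (ANMS) of the MOPS paper: keep the points with the largest suppression radius, i.e. the distance to the nearest point of strictly larger response. This is a sketch of that idea, not part of the implemented project; the `n_keep` value is an arbitrary assumption.

```python
import numpy as np

def anms(points, responses, n_keep=500):
    """Keep the n_keep points with the largest suppression radius."""
    pts = np.asarray(points, dtype=float)
    resp = np.asarray(responses, dtype=float)
    radii = np.full(len(pts), np.inf)
    for i in range(len(pts)):
        stronger = resp > resp[i]  # points that suppress point i
        if stronger.any():
            d2 = np.sum((pts[stronger] - pts[i]) ** 2, axis=1)
            radii[i] = np.sqrt(d2.min())
    # The globally strongest point has infinite radius and ranks first.
    return np.argsort(-radii)[:n_keep]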

My feature points are not scale invariant, with the only invariance coming from the blurring before sampling of the image patch. This may explain the weak performance on the graf dataset, in which many feature points change depth between images. Using multi-scale harris features could alleviate this problem.

Though my feature points attempt to be invariant to rotations around an axis parallel to the view ray, they are not invariant to planar rotations around other axes. Again, the blurred, large image patch makes an attempt at invariance to small rotations of this type, large transformations between images will foil my descriptor.

Finally, it is worth noting that my descriptor uses only grayscale values, meaning that intensity equivalent patches of different colors will be indistinguishable.

Possible Improvements

The first thing to do would be to implement an adaptive threshold. Also, the size of the gaussians used for determining dominant orientation and for patch sampling should be revisited and tweaked with more care.