Project One

Doug Cole

Feature Selection

Feature selection was done using the basic Harris filter as discussed in class. The Harris response was calculated at each point in the image; these values were then sorted, and the fifty strongest features from each image were kept.
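The selection step above can be sketched roughly as follows. This is a minimal illustration, not the submitted code: the function names, the box-mean smoothing of the structure tensor, and the k = 0.05 constant are my assumptions, since the report does not specify them.

```python
import numpy as np

def box_mean(a, r):
    """Mean filter with a (2r+1) x (2r+1) box, edge-padded (assumed smoothing)."""
    p = np.pad(a, r, mode="edge")
    out = np.zeros_like(a, dtype=float)
    h, w = a.shape
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy: dy + h, dx: dx + w]
    return out / (2 * r + 1) ** 2

def harris_response(img, k=0.05, r=1):
    """Harris response det(M) - k * trace(M)^2 at every pixel."""
    iy, ix = np.gradient(img.astype(float))
    sxx = box_mean(ix * ix, r)   # smoothed structure-tensor entries
    syy = box_mean(iy * iy, r)
    sxy = box_mean(ix * iy, r)
    return sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2

def top_features(img, n=50):
    """(row, col) of the n pixels with the strongest Harris response."""
    resp = harris_response(img)
    flat = np.argsort(resp, axis=None)[::-1][:n]   # sort flattened responses
    return np.column_stack(np.unravel_index(flat, resp.shape))
```

On a synthetic image containing a bright square, the strongest responses cluster at the square's corners, while edges score negative and flat regions score near zero.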

Feature Descriptor

Although I tried several different feature descriptors, I ended up sticking with a simple window descriptor, as it outperformed all the other methods I tried. The other descriptors I tried used the eigenvalues of the Harris filter and binned gradients from a window around the feature, similar to the basic idea of SIFT. For improved performance I made the following modifications to the simple window descriptor:

All pixels were kept in RGB rather than converted to grayscale; no testing was done in grayscale. Smoothing was applied globally before finding features, but of course it becomes part of the feature as well. Several sizes were tried for the feature window, as well as for the areas used to compute the mean and the gradient used to rotate the feature. The best results were obtained with a 3x3 feature window, with the gradient orientation found over a 5x5 window and the local mean and standard deviation found over a 7x7 window.
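The descriptor described above can be sketched as follows for a single channel (the report keeps all three RGB channels). The helper names are mine, the nearest-neighbor sampling of the rotated grid is an assumption (the report does not state the interpolation used), and the orientation estimate here is a crude stand-in for the binned version described later.

```python
import numpy as np

def window_descriptor(img, row, col, win=3, grad=5, norm=7):
    """Rotated, locally normalized window descriptor (single-channel sketch)."""
    img = img.astype(float)

    def patch(r):
        return img[row - r: row + r + 1, col - r: col + r + 1]

    # Dominant orientation from gradients over the grad x grad window
    g = patch(grad // 2)
    gy, gx = np.gradient(g)
    theta = np.arctan2(gy.sum(), gx.sum())   # crude orientation estimate

    # Sample the win x win window on a grid rotated by -theta
    half = win // 2
    ys, xs = np.mgrid[-half: half + 1, -half: half + 1].astype(float)
    c, s = np.cos(-theta), np.sin(-theta)
    rr = np.rint(row + ys * c - xs * s).astype(int)   # nearest-neighbor sampling
    cc = np.rint(col + ys * s + xs * c).astype(int)
    desc = img[rr, cc].ravel()

    # Normalize by local mean and standard deviation over the norm x norm window
    n = patch(norm // 2)
    return (desc - n.mean()) / (n.std() + 1e-8)
```

With the default sizes this yields a 9-element vector per channel; the mean/std normalization is what makes the descriptor robust to the lighting changes in sets like "leuven".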

All of these choices were made by testing against the image sets and attempting to minimize the average error. This forced many compromises: for example, the "leuven" images can be matched significantly better by removing a global mean from the image, but doing so hurts performance on the other image sets. In other cases features were kept for robustness: not rotating the features improves the "wall" and "leuven" accuracy, but rotation was kept both because it improves accuracy on the "graf" image set and because it should improve robustness on future image sets.

Feature rotation was done somewhat differently than described by Lowe, so I will explain it in more detail here. Lowe recommends binning each gradient angle weighted by its strength (the Euclidean magnitude of the x and y derivatives), choosing the largest bin, fitting a polynomial to that bin and its two closest neighbors, and taking the maximum of the polynomial as the correct angle. I tried using the maximum bin, but was unable to implement the polynomial fit due to time constraints. As a simple substitute I took the average of each bin's corresponding angle weighted by its value; I found this to work better than simply choosing the largest bin.
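The binned-average substitute described above might look like this. The 36-bin count is my assumption, and I use a circular (vector) mean of the bin centers to handle angle wraparound, which the report does not spell out.

```python
import numpy as np

def dominant_angle(gx, gy, nbins=36):
    """Dominant gradient orientation of a patch.

    Each pixel's gradient angle is binned, weighted by gradient magnitude;
    instead of Lowe's polynomial fit around the peak bin, the bin angles
    are averaged, weighted by their histogram values (circular mean).
    """
    mag = np.hypot(gx, gy)                         # gradient strength
    ang = np.arctan2(gy, gx)                       # angles in [-pi, pi]
    hist, edges = np.histogram(ang, bins=nbins,
                               range=(-np.pi, np.pi), weights=mag)
    centers = (edges[:-1] + edges[1:]) / 2
    # Weighted circular mean of the bin-center angles
    return np.arctan2((hist * np.sin(centers)).sum(),
                      (hist * np.cos(centers)).sum())
```

When all gradients in a patch point the same way, the estimate lands in that direction's bin, so the error is bounded by half a bin width.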

Feature Matching

Feature matching was done using the ratio of the Euclidean distance to the best-matching feature divided by the Euclidean distance to the second-best feature. After testing, a threshold value of 0.5 was selected as optimal. As can be seen in the graph on page 20 of the Lowe paper discussed in class, this value retains about half of the correct matches while admitting almost none of the incorrect matches (this assumes my features have a similar probability density function, which, as the results show, isn't really true).
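The ratio test above can be sketched as follows; the brute-force distance computation is my assumption, but the accept rule (best distance over second-best distance below 0.5) is as described.

```python
import numpy as np

def ratio_match(desc1, desc2, thresh=0.5):
    """Match rows of desc1 to rows of desc2 using the distance-ratio test:
    accept a match only if best_dist / second_best_dist < thresh."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)   # Euclidean distances
        order = np.argsort(dists)
        best, second = dists[order[0]], dists[order[1]]
        if best / (second + 1e-12) < thresh:
            matches.append((i, order[0]))           # (index in 1, index in 2)
    return matches
```

A feature with one clearly closest counterpart passes the test, while a feature that is nearly equidistant to two candidates is rejected as ambiguous.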

Results

Image Set    Average Error (pixels)
graf         170.8
leuven       186.6
wall         324.7
bikes        300.0
Results using the benchmark function are listed in the table above. Clearly there is room for improvement. As an example, using the SIFT feature set provided on graf img5.ppm versus img1.ppm, SIFT had an error of 15 pixels, whereas my feature detector and descriptor had an error of 100 pixels. One obvious area for improvement would be to search for features across different possible feature sizes. Another is the feature descriptor: many better descriptors exist beyond the simple window. But in many situations my program does quite well; comparing against SIFT feature matching on graf img5.ppm versus img1.ppm, the SIFT features had an error of 249 pixels, whereas my algorithm had an error of 255 pixels.