Project 1: Feature Detection and Matching

Susumu Harada

Feature Descriptor

I implemented several descriptors to handle translation, rotation, and changes in illumination, but in the end only the translation descriptor proved reliable (the illumination handling is built into it). In all of the discussion that follows, I am working with a grayscale version of the original image. The grayscale conversion uses the predefined method in Convert.cpp, although I had to change two places in cse576.cpp where the image object for the original photo is initialized so that it uses 4 bands instead of 3 and ConvertToGray will accept it.

Local window
- I first convolved the grayscale image with a 7x7 Gaussian with a covariance of 2 to obtain a blurred image.
- I then used a 41x41 window around the feature point in question, within which I subsampled every four pixels (at window offsets (4,4), (8,4), and so on) from the blurred grayscale image created in the previous step. Each sampled value was weighted linearly by its distance from the center point of the window, so that samples farther away were given lower weights. The resulting 10x10 matrix was used as the descriptor for the region surrounding the feature point.
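
The sampling step above can be sketched roughly as follows. This is a minimal Python illustration rather than the project's C++ code: the function name `window_descriptor`, the list-of-rows image layout, and the exact weighting (1 at the center, falling linearly to 0 at the window corner) are all assumptions, since the report only says the weights decrease linearly with distance.

```python
import math

def window_descriptor(blurred, cx, cy, half=20, step=4):
    """Sample a (2*half+1)-wide window around (cx, cy) every `step` pixels.

    `blurred` is a list of rows of grayscale floats (already blurred).
    Returns a flat list of weighted samples, or None near the image border.
    """
    height, width = len(blurred), len(blurred[0])
    if cx - half < 0 or cy - half < 0 or cx + half >= width or cy + half >= height:
        return None
    max_dist = math.hypot(half, half)  # distance from center to a window corner
    descriptor = []
    # Window offsets 4, 8, ..., 40 give 10 samples per axis (10x10 in total).
    for wy in range(step, 2 * half + 1, step):
        for wx in range(step, 2 * half + 1, step):
            dx, dy = wx - half, wy - half
            # Assumed weighting: 1 at the center, linear falloff to 0 at the corner.
            weight = 1.0 - math.hypot(dx, dy) / max_dist
            descriptor.append(weight * blurred[cy + dy][cx + dx])
    return descriptor
```

With the default parameters the result has 100 entries, matching the 10x10 descriptor described above.
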
Illumination invariance
- To handle differing illumination for the same feature across different images, I normalized the feature vector (currently the local window) to unit length. This worked very nicely on the leuven test set, which was not triggering any features beyond the third image when I used a version of the code without the illumination invariance handling.
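
As a small illustration of the normalization above, here is a hedged Python sketch (the name `normalize_unit_length` and the `eps` guard are assumptions for illustration). Dividing by the vector length cancels a uniform multiplicative change in brightness; it does not by itself cancel an additive offset.

```python
import math

def normalize_unit_length(vec, eps=1e-12):
    """Scale `vec` so that its Euclidean length is 1."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm < eps:
        # An all-zero (flat black) window cannot be normalized; return a copy.
        return list(vec)
    return [v / norm for v in vec]

# The same patch seen at three times the brightness normalizes to the same vector.
dark = normalize_unit_length([1.0, 2.0, 2.0])
bright = normalize_unit_length([3.0, 6.0, 6.0])
```
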
Rotation invariance
- I attempted to implement rotation invariance in several ways. The first approach was to determine the principal gradient of the feature window (I used a 15x15 window, as motivated by Lowe), calculate the angular offset of the gradient at each pixel in this window from the principal gradient, and add the magnitude of that gradient to whichever of 36 bins corresponds to that angular offset (I tried to follow Lowe's implementation, although I did not make use of the descriptor array concept). I determined the principal gradient from the Hessian matrix by solving for the larger eigenvalue to get the principal component.
- The second attempt was to use the same notion as the local window, except rotating the window to align with the principal gradient and then subsampling from the original image using bilinear interpolation. I used a 15x15 window and used the Hessian matrix to calculate the orientation of the principal gradient and map the rotated window coordinates onto the original image. Disappointingly, it performed worse than the regular local window, although it was better than the other rotation invariance method described just above.
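
The two ingredients of the first approach (a dominant orientation from a symmetric 2x2 matrix, and a 36-bin histogram of gradient magnitudes by angular offset) can be sketched as below. This is an illustrative Python sketch under stated assumptions: the names `principal_angle` and `orientation_histogram` are hypothetical, the matrix is taken as [[a, b], [b, c]], and gradients are passed as a plain list of (gx, gy) pairs rather than read from a 15x15 window.

```python
import math

def principal_angle(a, b, c):
    """Angle of the eigenvector with the larger eigenvalue of [[a, b], [b, c]]."""
    # For a symmetric 2x2 matrix the principal axis satisfies tan(2t) = 2b / (a - c).
    return 0.5 * math.atan2(2.0 * b, a - c)

def orientation_histogram(gradients, reference_angle, bins=36):
    """Accumulate gradient magnitudes into bins by offset from reference_angle."""
    hist = [0.0] * bins
    bin_width = 2.0 * math.pi / bins
    for gx, gy in gradients:
        magnitude = math.hypot(gx, gy)
        offset = (math.atan2(gy, gx) - reference_angle) % (2.0 * math.pi)
        # The extra modulo guards against offset rounding up to exactly 2*pi.
        hist[int(offset / bin_width) % bins] += magnitude
    return hist
```

One caveat worth keeping in mind with this kind of sketch: an eigenvector only fixes an axis, so the angle it yields is ambiguous by 180 degrees.
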

Design Choices

For the Harris Corner Detector, I decided to use the corner response measure outlined in the handout (det(M)/Trace(M)) as opposed to the one in the paper (det(M)-k*Trace(M)^2), and it made quite a significant difference in the number of insignificant features that were showing up. For thresholding the corner response measure, I ended up using 0.003 after manually inspecting the values of the features I was seeing at various parts of the image (of course, I changed this threshold quite a bit as I modified my Gaussian and derivative filters, among other things). From an implementation perspective, I chose to use the provided image libraries, which allowed for cleaner code, at the cost of time spent debugging the errors that were present in them.
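
To make the two response measures concrete, here is a minimal Python sketch that computes both from the entries of M = [[ixx, ixy], [ixy, iyy]]. The function name `corner_responses`, the default k = 0.04, and the `eps` guard for flat regions are assumptions for illustration, not values taken from the project.

```python
def corner_responses(ixx, ixy, iyy, k=0.04, eps=1e-12):
    """Return (det/trace, det - k*trace^2) for M = [[ixx, ixy], [ixy, iyy]]."""
    det = ixx * iyy - ixy * ixy
    trace = ixx + iyy
    # In a perfectly flat region the trace is zero; report no response there.
    ratio = det / trace if trace > eps else 0.0
    harris = det - k * trace * trace
    return ratio, harris

corner = corner_responses(1.0, 0.0, 1.0)  # strong gradients in both directions
edge = corner_responses(1.0, 0.0, 0.0)    # gradient in one direction only
```

In this sketch an ideal edge scores exactly zero under det/trace but slightly negative under the k-based measure, so the two measures call for different thresholds.
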

I also used local maxima thresholding to keep the number of immediately adjacent features low. Once I have the list of features that passed the corner response threshold test, I sort them in descending order of response and, starting from the top, accept a feature only if it is farther than 10 pixels away from every feature already accepted.
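
The greedy selection described above can be sketched in a few lines of Python. The function name `suppress_nearby` and the (x, y, response) tuple layout are assumptions made for this illustration.

```python
def suppress_nearby(features, min_dist=10.0):
    """Greedily keep the strongest features that are > min_dist apart.

    `features` is a list of (x, y, response) tuples.
    """
    accepted = []
    # Visit candidates from the strongest response to the weakest.
    for x, y, response in sorted(features, key=lambda f: f[2], reverse=True):
        far_enough = all(
            (x - ax) ** 2 + (y - ay) ** 2 > min_dist ** 2
            for ax, ay, _ in accepted
        )
        if far_enough:
            accepted.append((x, y, response))
    return accepted

kept = suppress_nearby([(0, 0, 0.9), (5, 0, 0.8), (20, 0, 0.5), (10, 0, 0.7)])
```

Here the candidates at x = 5 and x = 10 are dropped because they lie within 10 pixels of the stronger feature at the origin. The check is quadratic in the number of features, which is fine for a sketch but may matter for large feature sets.
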

It was interesting that binning the relative gradient orientation around a feature point led to noticeably degraded performance. I expected it to perform equally well for handling translation, but the features were matching at quite a dispersed set of locations. I have not figured out whether this is simply due to an incorrect implementation, or whether there is indeed something about the detected features that makes them look like likely matches under this descriptor.

Performance

Below is a summary of the performance of the simple window descriptor, my own descriptor, and SIFT features. Surprisingly, the scores came out much worse than I had expected for SIFT, and also for my own descriptor, given that in the UI its features seem to get matched up much more accurately than with the simple window descriptor. (I could not compare against the graf data set since the benchmark returned infinity for both of my descriptors.) However, I am fairly certain that these numbers are not "correct" and that I have a bug somewhere in my code, as I have heard that SIFT should be on the order of 10 to 20 pixels of error.

Average Error (pixels)	Simple Window Descriptor	My Own	SIFT
graf	293.9	314.5	255.4
leuven	376.7	204.9	129.81
wall	375.7	232.8	268.8

Table 1: Benchmark results for the absolute thresholding feature matching routine.

Average Error (pixels)	Simple Window Descriptor	My Own	SIFT
graf	308.7	319.4	286.5
leuven	417.9	377.1	363.6
wall	385.2	336.2	348.9

Table 2: Benchmark results for the ratio thresholding feature matching routine.

Strengths and Weaknesses

My corner detector seems to be working quite well, hitting almost all the corners and not detecting spurious features in flat regions or along straight edges. I started out using a simple [-1 0 1] filter for my derivatives, but found that it was picking up many diagonal edges. After I switched to the Sobel filter, the spurious edges went away.
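
For reference, the two horizontal derivative filters mentioned above can be compared with a short Python sketch. The names `CENTRAL_X`, `SOBEL_X`, and `filter_at` are hypothetical, and the [-1 0 1] filter is padded to 3x3 only so that both kernels share one helper.

```python
# Horizontal derivative kernels: plain central difference and Sobel.
CENTRAL_X = [[0, 0, 0], [-1, 0, 1], [0, 0, 0]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

def filter_at(image, kernel, x, y):
    """Correlate a 3x3 kernel with `image` at the interior pixel (x, y)."""
    return sum(
        kernel[j][i] * image[y + j - 1][x + i - 1]
        for j in range(3)
        for i in range(3)
    )

# A horizontal intensity ramp: every row is 0, 1, 2, 3, 4.
ramp = [[float(x) for x in range(5)] for _ in range(5)]
central = filter_at(ramp, CENTRAL_X, 2, 2)
sobel = filter_at(ramp, SOBEL_X, 2, 2)
```

The Sobel kernel is the same difference combined with a [1 2 1] smoothing across three rows, so it averages out some row-to-row noise. Whether that smoothing is what removed the spurious responses here is only a plausible guess.
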

I have not tweaked the threshold for the feature matching based on the ratio of the best two matches.
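
For context, a ratio-based matcher of the kind referred to above might look like the following Python sketch. The use of the sum of squared differences as the distance, the names `ssd` and `match_ratio`, and the 0.8 cutoff are all assumptions for illustration. The cutoff in particular is a placeholder, not the value used in the project.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match_ratio(desc, candidates, max_ratio=0.8):
    """Return the index of the best candidate, or None if it is ambiguous."""
    if len(candidates) < 2:
        return None
    dists = sorted((ssd(desc, c), i) for i, c in enumerate(candidates))
    (best, best_index), (second, _) = dists[0], dists[1]
    if second == 0.0:
        # Two perfect matches cannot be told apart.
        return None
    # Accept only when the best match is clearly better than the runner-up.
    return best_index if best / second < max_ratio else None

clear = match_ratio([1.0, 0.0], [[1.0, 0.1], [0.0, 1.0], [5.0, 5.0]])
ambiguous = match_ratio([1.0, 0.0], [[1.0, 0.1], [1.0, -0.1]])
```

Lowering `max_ratio` rejects more ambiguous matches at the cost of fewer matches overall, which is the trade-off that tuning this threshold would explore.
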

My Own Images

I was surprised that my features were able to match the rotated stadium fairly well, even though I do not have a robust method for handling rotation invariance. As expected, the tree leaves area did not match so well, though I had expected the tree tops on the horizon to match better.