Project 1 - Features

Feature Descriptor

For my feature descriptor, I chose to focus on rotational and photometric invariance. To achieve rotational invariance, I compute the horizontal and vertical gradients of the grayscale image by applying a 5x5 Gaussian blur followed by convolution with the horizontal and vertical Sobel kernels. Then, for each Harris local maximum, the corresponding points in these gradient images are read and used to compute the direction of the gradient (via arctangent).
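A minimal sketch of this orientation step, assuming numpy/scipy rather than the project's own convolution code (the function name and sigma are my own choices):

```python
import numpy as np
from scipy import ndimage

def keypoint_orientations(gray, keypoints, sigma=1.0):
    """Estimate a dominant gradient direction at each keypoint:
    Gaussian blur, Sobel derivatives, then atan2 of the gradient
    components at each (row, col) keypoint."""
    blurred = ndimage.gaussian_filter(gray.astype(float), sigma)  # ~5x5 Gaussian
    ix = ndimage.sobel(blurred, axis=1)  # horizontal gradient
    iy = ndimage.sobel(blurred, axis=0)  # vertical gradient
    return [np.arctan2(iy[r, c], ix[r, c]) for (r, c) in keypoints]
```

For an image whose intensity increases to the right, the returned angle is 0 radians, matching the "positive gradient points right" convention used below.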

An 11x11 patch is taken as a sub-image from the grayscale image and is rotated in the opposite direction of this gradient so that the positive gradient always points in the direction of 0 radians (i.e., to the right). This is accomplished with a 3x3 transformation equivalent to a translation moving the center of the 11x11 patch to the origin, followed by the rotation, followed by the inverse translation to return the patch center to its original location.
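The translate-rotate-translate composition can be sketched as a single 3x3 homogeneous matrix (a hypothetical helper, not the assignment's code):

```python
import numpy as np

def rotate_about_center_matrix(cx, cy, theta):
    """3x3 homogeneous transform: translate the patch center (cx, cy)
    to the origin, rotate by -theta (to cancel the gradient direction),
    then translate back."""
    t_neg = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    c, s = np.cos(-theta), np.sin(-theta)
    rot = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)
    t_pos = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=float)
    return t_pos @ rot @ t_neg
```

By construction the patch center is a fixed point of the transform, which is exactly why the two translations bracket the rotation.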

The 11x11 patch was chosen to be larger than the eventual 7x7 feature patch because of the black corners that result from rotating the image. If left in the descriptor, these would throw off the signature of the patch and undo the goal of rotational invariance. Cropping to the center of this sub-image to get the 7x7 feature patch avoids the issue.

Next, steps are taken to compensate for photometric variations:

  1. The mean of the patch values is calculated and subtracted from all the values so they become zero-mean, removing the effect of brightness (bias) changes.
  2. Next, the square root of the sum of the squares of these values is calculated, and all values are divided by it so the patch has unit norm, removing the effect of contrast (gain) changes.
  3. Lastly, the patch values are weighted with 7x7 Gaussian weights, on the theory that de-emphasizing the patch corners will help with rotational invariance. From limited testing, it is not clear whether the benefits of this outweigh the increased number of false positives.

The final feature vector is the 49 adjusted, rotated intensity values, read out row by row.
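The three normalization steps can be sketched as follows (the Gaussian sigma and function name are my own assumptions, not values from the assignment):

```python
import numpy as np

def normalize_patch(patch, apply_gaussian=True):
    """Photometric normalization of a 7x7 patch, per the steps above:
    zero-mean, unit norm, then optional 7x7 Gaussian weighting."""
    v = patch.astype(float).ravel()
    v -= v.mean()                      # step 1: brightness (bias) invariance
    norm = np.sqrt((v ** 2).sum())
    if norm > 0:
        v /= norm                      # step 2: contrast (gain) invariance
    if apply_gaussian:
        xs = np.arange(7) - 3
        g = np.exp(-(xs[:, None] ** 2 + xs[None, :] ** 2) / (2 * 2.0 ** 2))
        v *= g.ravel()                 # step 3: down-weight patch corners
    return v  # 49-element feature vector, row-major
```

After steps 1 and 2 the vector is exactly zero-mean with unit length, so two patches differing only by exposure or contrast produce the same descriptor.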

Design Decisions

Outside of the design decisions already discussed above, there were a couple of issues related to the edges of images. First, any Harris values detected along the edge of the image should be treated with suspicion because of the "hallucination" process that goes on at image edges during filtering. The convolution code provided in this project appears to use zero padding, so the Harris detector will easily find a dominant edge along the image border itself and need only find another strong edge intersecting it for a corner point to be detected. Indeed, many such points do show up in the harris.tga files. This could have been easily addressed in the computeLocalMaxima() nested for-loops, but I chose instead to drop these values later, in the compute-features code. A similar problem occurs when calculating the dominant orientation of the feature patch, except that the window for that step is even larger, so dropping points there takes care of both sets of edge issues.
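The border rejection described above could look something like this (a hypothetical helper; the margin would be sized to cover the larger of the two windows mentioned):

```python
def drop_border_points(points, shape, margin):
    """Discard (row, col) keypoints whose descriptor or orientation
    window would overlap the zero-padded image border."""
    h, w = shape
    return [(r, c) for (r, c) in points
            if margin <= r < h - margin and margin <= c < w - margin]
```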

A local maximum in computeLocalMaxima() was chosen from a 7x7 window rather than the suggested 3x3. This allowed a higher threshold to be used while keeping the number of feature points reasonable. It also means that, for a fixed number of feature points, the features are more spread out over the image rather than tightly clustered.
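The 7x7 non-maximum suppression can be sketched with a maximum filter (assuming scipy; the project's actual computeLocalMaxima() uses nested loops instead):

```python
import numpy as np
from scipy import ndimage

def local_maxima_7x7(harris, threshold):
    """Return (row, col) points that are the maximum of their 7x7
    neighborhood and exceed the response threshold."""
    dilated = ndimage.maximum_filter(harris, size=7, mode="constant")
    mask = (harris == dilated) & (harris > threshold)
    return list(zip(*np.nonzero(mask)))
```

A weaker response within 3 pixels of a stronger one is suppressed, which is what spreads the surviving features out across the image.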

Plots

Yosemite ROC
We're getting many more true positives than false positives for all the methods. Interestingly, the simple feature does slightly better than the more complex one with rotational and photometric invariance, probably because the exposures are similar and a simple translation matches up points. Also of interest, the ratio test doesn't seem to make much of a difference for these images.
Graf ROC
I did all my testing on Yosemite and it shows here. Even with tweaking my thresholds, I was only able to do better than random guessing with the invariant method using the ratio test. I think the big killer here is the scale difference between the images; I didn't get to adding that type of invariance. I also suspect that the feature detector is not working entirely as it should, as the harris.tga image looks more like the output of an edge detector than a corner detector. It would therefore make sense that the ratio test is needed, as this image has the potential for lots of false positives.

Harris Images

Yosemite Harris
Graf Harris
This seems to be finding edges rather than corners, which caused issues later. It is something I unfortunately did not catch until the final day because I had been testing exclusively with Yosemite.

AUC

  wall    0.813725
  graf    0.673872
  bikes   *
  leuven  0.626842 **

  *  didn't find enough matches with either the ratio test or SSD
  ** this is the non-ratio result; the ratio result was under 0.5

Strengths and Weaknesses

As mentioned above, I did almost all of my testing on Yosemite. If I started again, I would be more equitable in the images I tested on. My algorithms fall apart on Graf, mostly because of the lack of scale invariance, but I suspect also because of a bug in my feature detection code. The math in the latter all looks right to me, except that I had to multiply the determinant by a rather large constant to get it to override the trace. I think this may have led to a detector that accepts edges rather than just corners. For a photograph with high-frequency information, such as Yosemite, this problem went unnoticed. For a painting with less high-frequency information, the flaw became an issue, as non-corner edges are more likely to be mismatched.
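For reference, the standard Harris response weights the trace, not the determinant, with a small constant k (typically 0.04 to 0.06); scaling the determinant up instead would let a single large eigenvalue (an edge) pass. A sketch of the standard form, not the project's code:

```python
def harris_response(ixx, ixy, iyy, k=0.04):
    """Standard Harris response R = det(M) - k * trace(M)^2, where M is
    the windowed structure tensor [[ixx, ixy], [ixy, iyy]]. Corners
    (two large eigenvalues) score high; edges (one large eigenvalue)
    score near zero or negative."""
    det = ixx * iyy - ixy ** 2
    trace = ixx + iyy
    return det - k * trace ** 2
```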

In Yosemite, at first I didn't think the ratio test was working as well as it should, but I now think this is because the image does not have as many repeating patterns as you'd expect in, say, an architectural structure. It is therefore less likely to save us from false positives, but it can still impact the number of true positives found. Similarly, the two images are close in exposure, so the photometric invariance didn't help but could cause more false positives.

The next step for improving the feature descriptor would be to switch to an approach that is scale- and blur-invariant. This would fix the abysmal results on Graf. A histogram-binning approach over gradient magnitudes in given gradient directions, as used in the SIFT algorithm, would be a good choice. To implement this effectively, a revamped feature detector would also be required, since such an algorithm relies on the descriptor being computed at the scale at which the feature was detected as a maximum.

My second priority would be to ensure a more even distribution of feature points across an image. One idea I have for this, not suggested in the reading, would be to run the feature detection over copies of the image with different gamma adjustments. For instance, a gamma of 0.4 would flatten the highlights of an image and increase the contrast of the shadows and mid-tones.

Other photos

Path1 Path2