CSE576: Project 1

Michael Cafarella

April 14, 2005


Introduction

This document contains information about my project 1 handin.


Feature descriptor

My first feature descriptor is simply a 3x3 window of pixels around each feature found by the Harris detector. I do not do any pyramiding for this basic feature.
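
As a sketch, the core of this descriptor is just a raw pixel copy. The snippet below uses a plain row-major float buffer rather than the project's image class, and the names are illustrative, not the actual ones in features.cpp:

    #include <vector>

    // Copy the 3x3 neighborhood around (x, y) into a 9-element descriptor.
    // 'img' is a row-major grayscale buffer of width w; the caller is
    // assumed to have already rejected features too close to the border.
    std::vector<float> simpleWindowDescriptor(const std::vector<float>& img,
                                              int w, int x, int y) {
        std::vector<float> desc;
        desc.reserve(9);
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++)
                desc.push_back(img[(y + dy) * w + (x + dx)]);
        return desc;
    }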

The second feature is a MOPS-style descriptor. We again use the Harris detector to find features in the image. We then take the image one step higher in the image pyramid as input to the descriptor; that makes the exact feature location slightly less critical, since we are effectively sampling over a larger area. We also apply a 5x5 Gaussian blur to this image so that high-frequency changes are less important.
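
The blur step might look like the following separable sketch. The exact kernel weights I used may differ, and the real code goes through the image library's convolve operator rather than this hand-rolled version:

    #include <algorithm>
    #include <vector>

    // Separable 5x5 Gaussian blur using the binomial kernel [1 4 6 4 1]/16.
    // Border pixels are handled by clamping coordinates to the image edge.
    void gaussianBlur5x5(std::vector<float>& img, int w, int h) {
        const float k[5] = {1/16.f, 4/16.f, 6/16.f, 4/16.f, 1/16.f};
        std::vector<float> tmp(img.size());
        // Horizontal pass.
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                float s = 0;
                for (int i = -2; i <= 2; i++) {
                    int xi = std::min(std::max(x + i, 0), w - 1);
                    s += k[i + 2] * img[y * w + xi];
                }
                tmp[y * w + x] = s;
            }
        // Vertical pass.
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                float s = 0;
                for (int i = -2; i <= 2; i++) {
                    int yi = std::min(std::max(y + i, 0), h - 1);
                    s += k[i + 2] * tmp[yi * w + x];
                }
                img[y * w + x] = s;
            }
    }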

We then compute the dominant gradient direction at the feature point. This tells us the "orientation" of the feature, so we can grab a patch of pixels in an orientation-invariant way. (This will allow us to rotate the image and still retain useful features.) We then grab an 8x8 patch of pixels, rotated according to this orientation.
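
In outline, the orientation and sampling steps look roughly like this. The sketch simplifies in two ways that the real code should not: it estimates orientation from a single central-difference gradient (running it on the blurred pyramid image helps), and it uses nearest-neighbor rather than bilinear sampling:

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Estimate the dominant gradient direction at (x, y), then sample an
    // 8x8 patch whose axes are rotated into that orientation.
    std::vector<float> orientedPatch(const std::vector<float>& img,
                                     int w, int h, int x, int y) {
        // Gradient via central differences.
        float gx = 0.5f * (img[y * w + x + 1] - img[y * w + x - 1]);
        float gy = 0.5f * (img[(y + 1) * w + x] - img[(y - 1) * w + x]);
        float theta = std::atan2(gy, gx);   // feature orientation
        float c = std::cos(theta), s = std::sin(theta);

        std::vector<float> patch;
        patch.reserve(64);
        for (int v = -4; v < 4; v++)
            for (int u = -4; u < 4; u++) {
                // Rotate the sampling grid by theta around the feature point.
                int sx = (int)std::lround(x + c * u - s * v);
                int sy = (int)std::lround(y + s * u + c * v);
                sx = std::min(std::max(sx, 0), w - 1);
                sy = std::min(std::max(sy, 0), h - 1);
                patch.push_back(img[sy * w + sx]);
            }
        return patch;
    }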

Finally, we normalize the patch of pixels. We compute the mean and subtract it from every pixel value, so that the new mean of the patch is zero. We then compute the standard deviation and divide each pixel by it. These normalizations allow us to match patches even when brightness varies considerably.
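
The normalization itself is straightforward; here is a sketch, with a guard against flat (zero-variance) patches that the description above glosses over:

    #include <cmath>
    #include <vector>

    // Bias/gain normalization: shift the patch to zero mean, then scale it
    // to unit standard deviation, so matching is insensitive to brightness
    // and contrast changes.
    void normalizePatch(std::vector<float>& patch) {
        float mean = 0;
        for (float p : patch) mean += p;
        mean /= patch.size();

        float var = 0;
        for (float& p : patch) {
            p -= mean;          // new mean is zero
            var += p * p;
        }
        float stddev = std::sqrt(var / patch.size());
        if (stddev > 1e-6f)     // skip scaling for flat patches
            for (float& p : patch) p /= stddev;
    }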

We perform feature detection and descriptor generation at five pyramid levels of the image. That means we should be able to gather features across a wide range of scales in the image.
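
A minimal sketch of producing the next pyramid level, assuming the blur above has already been applied; the actual code relies on the image library's pyramid operation rather than this:

    #include <vector>

    // Build the next pyramid level: after blurring (see gaussianBlur5x5),
    // keep every other pixel in each direction to halve the resolution.
    std::vector<float> downsampleHalf(const std::vector<float>& img,
                                      int w, int h, int& ow, int& oh) {
        ow = w / 2;
        oh = h / 2;
        std::vector<float> out(ow * oh);
        for (int y = 0; y < oh; y++)
            for (int x = 0; x < ow; x++)
                out[y * ow + x] = img[(2 * y) * w + (2 * x)];
        return out;
    }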

In order to work around an image library bug, I also need to adjust the intensity of pyramided images. The convolve operator (invoked via the pyramid operation) tends to dim an image substantially. I compute the average intensity difference between a convolved image and the original, then readjust the convolved image accordingly. Almost all of my changes are contained in features.cpp.
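
A sketch of the workaround, assuming an additive correction (a multiplicative gain would be the obvious alternative):

    #include <vector>

    // Work around the dimming introduced by the convolve operator: measure
    // the mean intensity of the original and the convolved image, then add
    // the difference back to every convolved pixel.
    void matchMeanIntensity(const std::vector<float>& original,
                            std::vector<float>& convolved) {
        float meanOrig = 0, meanConv = 0;
        for (float p : original)  meanOrig += p;
        for (float p : convolved) meanConv += p;
        meanOrig /= original.size();
        meanConv /= convolved.size();

        float diff = meanOrig - meanConv;
        for (float& p : convolved) p += diff;   // restore original brightness
    }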


Performance

On the walltest benchmark, using output from "CSE576 testMatch" (matching image 1 against each of images 2 through 6):

                          Image 2   Image 3   Image 4   Image 5   Image 6
    Simple window         263.524   242.447   273.425   300.381   295.217
    Feature descriptor    310.112   299.895   319.949   310.675   335.929

Discussion of performance

It's hard to say why my feature descriptor does fairly poorly. I made sure that I computed the patch orientation correctly, and double-checked the normalization as well. It is possible to see some "clusters" of best matches that suggest I'm in the right ballpark, but this phenomenon is not as clear as I'd hoped.

It's clear from the user interface that my feature detector is finding the correct orientation most of the time, and that the features appear on interesting corner and border locations.

I might investigate further the average intensity of feature patches, and whether normalization is doing its job correctly. I'd also like to experiment with taking image patches at varying scales of the pyramid and with varying levels of Gaussian blur.

Simple window feature: [figure omitted]

Standard feature: [figure omitted]

SIFT: [figure omitted]

On my images

Results for my images were very similar to those for the test images. The simple window feature works OK, probably because my image sequence involves mainly translation. However, the more complicated MOPS feature again had disappointing performance. Here are the images I took:

[images omitted]

Some queries:

[images omitted]

Extra Credit

For extra credit I implemented multi-scale invariance (through the pyramid of images), and a UI addition that indicates the orientation and scale of each feature. This last bit of code is similar to the feature visualization Rick Szeliski uses in his paper.