CSE576 (Spring 2005) Project 1

Harsha V. Madhyastha

Objective: Devise mechanisms to detect, describe and match features in images

We convert the image to grayscale before performing any operations on it.
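A minimal sketch of this step, assuming the standard luminance weights (the report does not say which conversion was used):

    import numpy as np

    def to_grayscale(rgb):
        # Weighted sum of the R, G, B channels using the standard
        # luminance weights (an assumption; the report does not state
        # which conversion it uses).
        return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114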

Detecting interest points

We use the Harris corner measure to detect interest points. Given an image, we compute the gradient along the X and Y directions at every point. The gradient along a direction is a simple forward difference: the value of the pixel under consideration is subtracted from that of the pixel adjacent to it along that direction. We then compute the 2x2 moments matrix M, whose entries are sums of gradient products over a small window around the point: M(0,0) = sum of Ix * Ix, M(0,1) = M(1,0) = sum of Ix * Iy, and M(1,1) = sum of Iy * Iy, where Ix and Iy are the gradients along the X and Y directions, respectively. (The summation over a window is essential: taken at a single pixel, M is rank one and its determinant is identically zero.) The Harris corner measure is then computed as determinant(M) divided by trace(M). We identify all points for which this measure is greater than some threshold; based on our empirical observations, a threshold of 1.5 works well. Finally, the points of interest are those whose Harris corner measure is above this threshold and constitutes a local maximum in the 21x21 window around the point. This is essentially the idea of non-maximal suppression from Brown and Szeliski's CVPR'05 paper on multi-scale oriented patches.
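The following sketch reflects our reading of the detector stage above; the size of the summation window is an assumption, as the report does not state it:

    import numpy as np
    from scipy.ndimage import maximum_filter, uniform_filter

    def harris_interest_points(gray, threshold=1.5, window=3, nms_size=21):
        gray = gray.astype(float)
        # Forward differences along X and Y, as described in the text.
        Ix = np.zeros_like(gray)
        Iy = np.zeros_like(gray)
        Ix[:, :-1] = gray[:, 1:] - gray[:, :-1]
        Iy[:-1, :] = gray[1:, :] - gray[:-1, :]
        # Entries of the moments matrix M, summed over a window x window
        # neighbourhood (window size is an assumption). uniform_filter
        # averages, so multiply back by the window area to get sums.
        Ixx = uniform_filter(Ix * Ix, size=window) * window ** 2
        Iyy = uniform_filter(Iy * Iy, size=window) * window ** 2
        Ixy = uniform_filter(Ix * Iy, size=window) * window ** 2
        # Harris corner measure: det(M) / trace(M).
        harris = (Ixx * Iyy - Ixy * Ixy) / (Ixx + Iyy + 1e-12)
        # Non-maximal suppression: keep points above the threshold that
        # are also the maximum of their nms_size x nms_size window.
        local_max = harris == maximum_filter(harris, size=nms_size)
        ys, xs = np.nonzero((harris > threshold) & local_max)
        return list(zip(xs.tolist(), ys.tolist()))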

Feature Descriptor

We first implemented an extremely simple feature descriptor: for each point of interest identified as above, we simply store the values of the pixels in the 9x9 window centered on that point. Matching two images with this descriptor can be expected to work reasonably well when one image is a translation of the other. However, it clearly will not cope with changes in intensity, rotation, changes in scale, and so on.
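A sketch of this simple descriptor (skipping points too close to the image border is an assumption; the report does not say how they are handled):

    import numpy as np

    def simple_descriptor(gray, points, radius=4):
        # 9x9 window of raw pixel values centred on each interest point,
        # flattened into an 81-dimensional vector.
        descs = []
        h, w = gray.shape
        for x, y in points:
            if radius <= x < w - radius and radius <= y < h - radius:
                patch = gray[y - radius:y + radius + 1, x - radius:x + radius + 1]
                descs.append(patch.flatten())
        return descs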

The second feature descriptor we implemented aims to address two of the transformations that the simple descriptor cannot handle: changes in intensity and rotation. Before tackling either, we first weed out noise by applying a smoothing filter to the image. The filter kernel is a 3x3 Gaussian, with a weight of 1/16 at each of the four corners, 1/4 at the center, and 1/8 elsewhere. To account for intensity changes, we subtract the mean value from each pixel and divide the result by the standard deviation, which ensures that the image we are handling has mean 0 and standard deviation 1.
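A sketch of the smoothing and normalization steps, using the 3x3 kernel given above:

    import numpy as np
    from scipy.ndimage import convolve

    # 3x3 Gaussian: 1/16 at the corners, 1/4 at the centre, 1/8 elsewhere.
    GAUSS_3x3 = np.array([[1, 2, 1],
                          [2, 4, 2],
                          [1, 2, 1]], dtype=float) / 16.0

    def smooth_and_normalize(gray):
        smoothed = convolve(gray.astype(float), GAUSS_3x3)
        # Zero mean and unit standard deviation, for invariance to
        # affine changes in intensity.
        return (smoothed - smoothed.mean()) / smoothed.std()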

Simply storing the 9x9 window of pixel values around the point of interest would still be susceptible to error under rotation. So, instead, we consider a 9x9 window whose orientation is decided by the direction of the gradient at the point of interest: we compute the gradient direction there and rotate the axes so that the Y axis increases along the direction of the gradient. The 9x9 window is then sampled with respect to the new X and Y axes.
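A sketch of the oriented window sampling; nearest-neighbour sampling of the rotated window is an assumption, as the report does not specify how off-grid pixels are handled:

    import numpy as np

    def oriented_descriptor(img, x, y, radius=4):
        h, w = img.shape
        # Gradient direction at the interest point, via forward
        # differences as in the detector.
        gx = img[y, min(x + 1, w - 1)] - img[y, x]
        gy = img[min(y + 1, h - 1), x] - img[y, x]
        theta = np.arctan2(gy, gx)
        c, s = np.cos(theta), np.sin(theta)
        desc = np.empty((2 * radius + 1) ** 2)
        i = 0
        for v in range(-radius, radius + 1):
            for u in range(-radius, radius + 1):
                # Map (u, v) in the rotated frame to image coordinates;
                # the new Y axis (+v) points along the gradient.
                px = int(round(x + u * s + v * c))
                py = int(round(y - u * c + v * s))
                px = min(max(px, 0), w - 1)
                py = min(max(py, 0), h - 1)
                desc[i] = img[py, px]
                i += 1
        return desc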

Matching features

Given a feature in one image, we employ two algorithms for determining whether a matching feature exists in the other image. In either algorithm, the distance metric used is simply the Euclidean distance between the two descriptor vectors.
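As a sketch only: the two matching algorithms are not detailed above, so the following assumes a common pairing for this kind of assignment, a nearest-neighbour matcher with a distance threshold and a ratio test against the second-best match. The function and parameter names (match_features, max_dist, max_ratio) are hypothetical:

    import numpy as np

    def match_features(descs1, descs2, max_dist=None, max_ratio=None):
        # For each descriptor in image 1, find its nearest and
        # second-nearest neighbours in image 2 by Euclidean distance,
        # then accept the match if the best distance is below max_dist
        # (algorithm 1) or if best/second-best is below max_ratio
        # (algorithm 2).
        matches = []
        for i, d1 in enumerate(descs1):
            dists = np.array([np.linalg.norm(d1 - d2) for d2 in descs2])
            j = int(np.argmin(dists))
            best = dists[j]
            second = np.partition(dists, 1)[1] if len(dists) > 1 else np.inf
            if max_dist is not None and best >= max_dist:
                continue
            if max_ratio is not None and best / (second + 1e-12) >= max_ratio:
                continue
            matches.append((i, j, best))
        return matches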

Benchmark tests

Benchmark set    Simple feature descriptor    Complex feature descriptor
bikes            302                          383
graf             287                          298
leuven           331                          145
wall             223                          219

(Lower scores are better.)

Based on the above benchmark results, it is hard to conclude that the complex feature descriptor is really better than the simple one; it may in fact be worse. The two descriptors perform comparably on the graf and wall datasets, whereas on the leuven dataset (which varies illumination) the complex descriptor performs considerably better. On the other hand, the simple descriptor outdoes the complex one on the bikes dataset (which varies focus). So it appears that our descriptor's handling of intensity changes paid off, but that it does not handle changes in focus or rotational transformations well. It is also questionable whether the results are good on the whole, irrespective of the descriptor used.

Strengths and Weaknesses

The strengths of our feature descriptor, as outlined previously, are:

- Gaussian smoothing reduces the effect of noise in the image.
- Normalizing the image to zero mean and unit standard deviation makes the descriptor invariant to affine changes in intensity.
- Orienting the sampling window along the gradient direction is intended to make the descriptor insensitive to rotation.

Here are some of the weaknesses of our feature descriptor, which could be the cause of the poor performance observed in the benchmark tests:

- It does nothing to handle changes in scale; the 9x9 sampling window has a fixed size.
- The window orientation is derived from the gradient at a single pixel, which is sensitive to noise, so the handling of rotation is fragile.
- Beyond a single 3x3 smoothing pass, nothing is done to cope with blur, which likely explains the result on the bikes dataset.

Test images

[Two photographs of a clock in different orientations, with the matched features overlaid]

We took a picture of a clock in two different orientations to see whether the feature matching manages to match up the corresponding numbers as well as the hands of the clock. It does not look like that turned out all that well! (For this test only, we determined points of interest based on local maxima in a 7x7 neighborhood rather than the 21x21 neighborhood used in our benchmark tests.)