CSE576: Project 1
Jeffrey Herron
Submitted 4/18/2013

1 Introduction:
This project's purpose is to demonstrate the effectiveness of different feature detection, description, and matching methods. Two different descriptors were implemented. The first was a simple five by five pixel window around any detected corner. The second was a 15 by 15 window of pixels rotated to a normalized axis with illumination-invariant pixel values. Furthermore, two different matching methods were used to analyze the performance of these descriptors. The first was the standard Sum of Squared Differences (SSD) approach; the second, which utilized a ratio metric, was compared against the SSD results. Several images were used for performance measurement by examining the Receiver Operating Characteristic (ROC) curve.
2 Feature Descriptors and Design Decisions:
As mentioned above, the first feature descriptor consisted of a 5 pixel by 5 pixel window around any point in the image where the Harris value exceeded a certain threshold. This threshold was dynamically adjusted to ensure a minimum number of detected features while also capping the feature count, and hence the memory allocated. The method consisted of the following steps:
1) Convert the image to grayscale.
2) Compute the Harris value for every point.
3) Locate local maxima above a certain threshold within every 3 by 3 window.
4) If there are too many or too few features detected, update the threshold and repeat step 3. If the threshold would become negative, proceed, because all possible features have been found.
5) For every maximum found in step 3, save the 5 by 5 window centered on the pixel of interest.
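To make steps 2-4 concrete, here is a minimal C++ sketch of the Harris response and the dynamic threshold loop. This is an illustration rather than the exact code in Features.exe; the flat float-image layout, the Harris k constant, and the starting threshold and step size are all placeholder assumptions.

    // Sketch of steps 2-4 (simplified; not the exact Features.exe code).
    #include <vector>

    // Harris response at one pixel, given the structure-tensor sums
    // Ixx, Iyy, Ixy accumulated over a small window around it.
    float harrisResponse(float Ixx, float Iyy, float Ixy, float k = 0.04f)
    {
        float det = Ixx * Iyy - Ixy * Ixy; // determinant of the 2x2 tensor
        float trace = Ixx + Iyy;           // trace of the tensor
        return det - k * trace * trace;    // classic Harris corner score
    }

    // Count 3x3 local maxima of the Harris image that exceed a threshold.
    int countFeatures(const std::vector<float>& harris, int w, int h, float thresh)
    {
        int count = 0;
        for (int y = 1; y < h - 1; ++y)
            for (int x = 1; x < w - 1; ++x) {
                float v = harris[y * w + x];
                if (v < thresh) continue;
                bool isMax = true;
                for (int dy = -1; dy <= 1 && isMax; ++dy)
                    for (int dx = -1; dx <= 1; ++dx)
                        if (harris[(y + dy) * w + (x + dx)] > v) { isMax = false; break; }
                if (isMax) ++count;
            }
        return count;
    }

    // Step 4: adapt the threshold until the feature count lands in
    // [minFeat, maxFeat]; the starting value and step are placeholders.
    float adaptThreshold(const std::vector<float>& harris, int w, int h,
                         int minFeat, int maxFeat)
    {
        float thresh = 0.01f;      // hypothetical starting threshold
        const float step = 0.001f; // hypothetical adjustment step
        for (int iter = 0; iter < 1000; ++iter) {
            int n = countFeatures(harris, w, h, thresh);
            if (n > maxFeat) {
                thresh += step;           // too many features: raise the bar
            } else if (n < minFeat) {
                thresh -= step;           // too few features: lower it
                if (thresh < 0.0f) break; // all possible features found
            } else {
                break;                    // count is within the target range
            }
        }
        return thresh;
    }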
From this simple scheme, we can already tell that this feature will not be very effective. It can only account for simple translational changes and has very little rotational, scale, or illumination invariance. To compute features in this way, the following command should be executed in the directory containing the .exe:
Features computeFeatures srcImage featurefile 3
The second feature descriptor I implemented was significantly more complex and accounts for rotational and illumination invariance, but does not account for scale changes. This primarily came down to problems implementing an image pyramid scheme as discussed in the lecture notes. As before, the windows were centered on points in the image where the Harris values exceeded a dynamic threshold. However, unlike before, the pixel values in the window were divided by the average pixel value to get a degree of illumination invariance. Furthermore, the window was rotated such that all features would have the same primary corner orientation. This allows points of interest to be rotated in the plane and still be successfully matched. Breaking this method into steps:
1) Steps 1-4 are the same as above.
5) Filter the image gradients to get less noisy gradient images.
6) Calculate the dominant corner orientation by using the atan2 function and the filtered x and y gradient images.
7) Take a 31 by 31 window around every point of interest.
8) Rotate the window by the angle of the corner orientation.
9) Take the center 15 by 15 pixel window around the point of interest.
10) Find the average pixel illumination within the window.
11) Divide every pixel by the average pixel illumination.
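To make steps 5-11 concrete, here is a simplified C++ sketch of the descriptor construction. Rather than rotating a 31 by 31 patch and cropping its center as described above, this illustration samples the 15 by 15 window directly along rotated axes, which achieves the same result; the bilinear sampler, the flat image layout, and passing in pre-smoothed gradients are assumptions made for brevity.

    // Sketch of steps 5-11 (simplified; not the exact Features.exe code).
    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Bilinearly sample a grayscale float image at (x, y), clamped to bounds.
    float sampleBilinear(const std::vector<float>& img, int w, int h, float x, float y)
    {
        x = std::max(0.0f, std::min(x, (float)(w - 2)));
        y = std::max(0.0f, std::min(y, (float)(h - 2)));
        int x0 = (int)x, y0 = (int)y;
        float fx = x - x0, fy = y - y0;
        float a = img[y0 * w + x0],       b = img[y0 * w + x0 + 1];
        float c = img[(y0 + 1) * w + x0], d = img[(y0 + 1) * w + x0 + 1];
        return (1 - fy) * ((1 - fx) * a + fx * b) + fy * ((1 - fx) * c + fx * d);
    }

    // Build a 15 by 15 descriptor around (cx, cy). gx and gy are the
    // filtered gradients at the corner; atan2(gy, gx) gives the dominant
    // orientation (step 6), the rotated sampling covers steps 7-9, and
    // the division by the mean covers steps 10-11.
    std::vector<float> buildDescriptor(const std::vector<float>& img, int w, int h,
                                       int cx, int cy, float gx, float gy)
    {
        const int half = 7;               // 15 by 15 window
        float angle = std::atan2(gy, gx); // dominant corner orientation
        float ca = std::cos(angle), sa = std::sin(angle);
        std::vector<float> desc;
        desc.reserve(15 * 15);
        float sum = 0.0f;
        for (int v = -half; v <= half; ++v)
            for (int u = -half; u <= half; ++u) {
                // Rotate the (u, v) offset into image coordinates so every
                // descriptor is sampled along the same normalized axes.
                float x = cx + u * ca - v * sa;
                float y = cy + u * sa + v * ca;
                float p = sampleBilinear(img, w, h, x, y);
                desc.push_back(p);
                sum += p;
            }
        float mean = sum / desc.size();
        if (mean > 0.0f)
            for (float& p : desc) p /= mean; // illumination normalization
        return desc;
    }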
In this way, we have a descriptor significantly more robust to rotational and illumination variations. It should be noted that, since I am only averaging the illumination across a window, I expect my features to be more robust against localized illumination changes, such as shadows cast by potentially moving objects. To compute features in this way, the following command should be executed in the directory containing the .exe:
Features computeFeatures srcImage featurefile 2
3 Descriptor Performance:
To test both of these descriptors, we first need to examine our Harris values. From there, we will examine the ROC curves, with our primary interest being the area under the curve (AUC). Furthermore, results from the benchmark tests will also be presented.
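As a brief reminder, the AUC is simply the area under the ROC curve. Below is a minimal sketch of how it can be computed from sampled (false positive rate, true positive rate) points with the trapezoidal rule; this is my own illustration, not the code behind the ROC command.

    // Sketch: AUC from ROC points sorted by ascending false positive rate.
    #include <utility>
    #include <vector>

    float rocAuc(const std::vector<std::pair<float, float>>& roc)
    {
        float auc = 0.0f;
        for (size_t i = 1; i < roc.size(); ++i) {
            float dx = roc[i].first - roc[i - 1].first;                // FPR step
            float avgTpr = 0.5f * (roc[i].second + roc[i - 1].second); // mean TPR
            auc += dx * avgTpr;                                        // trapezoid
        }
        return auc;
    }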
3.1 Harris Results
First, I would like to show two images and their corresponding Harris value images. The first is a picture from Yosemite, followed by the corresponding Harris values, and finally an image showing the local maxima above the threshold:
This same analysis was done on a second set of images featuring some graffiti:
As can be seen from the above images, we are successfully identifying points of interest that correspond to corner areas. In the graffiti set particularly, the corners of the repeated blue structures are immediately obvious in the Harris images.
3.2 ROC Test Results
A standard method of comparing the results of different descriptors and matching algorithms is to examine their ROC curves. We examined the ROC curves for matching two image pairs with both descriptors outlined above. We matched each of the feature files using both SSD and ratio matching. Furthermore, we were provided with the SIFT ROC curves for the image sets. This comes to a total of six ROC curves to examine for each image pair. Note that I decided to use MATLAB to plot the ROC curves from the raw data, primarily as a cosmetic choice.
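For clarity, the two matching rules are sketched below. SSD matching scores each candidate pair by the sum of squared differences between descriptor vectors and keeps the nearest neighbor; ratio matching instead scores the best match by the ratio of the best to the second-best SSD, where a lower ratio means a more distinctive match. This is an illustrative sketch, not the exact Features.exe code.

    // Sketch of SSD and ratio matching over descriptor vectors.
    #include <limits>
    #include <vector>

    float ssd(const std::vector<float>& a, const std::vector<float>& b)
    {
        float s = 0.0f;
        for (size_t i = 0; i < a.size(); ++i) {
            float d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    // Return the index of the best match among the candidates; "score" is
    // the best SSD, or the best/second-best ratio when useRatio is true.
    int bestMatch(const std::vector<float>& query,
                  const std::vector<std::vector<float>>& candidates,
                  bool useRatio, float& score)
    {
        float best = std::numeric_limits<float>::max(), second = best;
        int bestIdx = -1;
        for (size_t i = 0; i < candidates.size(); ++i) {
            float d = ssd(query, candidates[i]);
            if (d < best)        { second = best; best = d; bestIdx = (int)i; }
            else if (d < second) { second = d; }
        }
        score = useRatio ? (second > 0.0f ? best / second : 1.0f) : best;
        return bestIdx;
    }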
The Yosemite image pair has only a translational relationship, so we should expect good performance from each of the descriptors. Here are the two images:
First, the ROC curves for the Yosemite pair with the simple five by five pixel feature descriptor using both SSD and ratio matching:
SSD AUC: 0.931849 Ratio AUC: 0.897590
Secondly, the ROC curves for the Yosemite pair with the custom 15 pixel wide feature descriptor using both SSD and ratio matching:
SSD AUC: 0.956124 Ratio AUC: 0.972140
Finally, the ROC curves for the Yosemite pair with the SIFT feature descriptor using both SSD and ratio matching:
AUC values not given, but are ~1.
Next we will examine the ROC curves for the graffiti images, which include a perspective warp due to a change of viewpoint. This will be very hard for both the 5 by 5 descriptor and my custom descriptor to match, because there is no rotational invariance built into the 5 by 5, and my custom descriptor is only invariant to in-plane rotations. Here are the two images:
First, the ROC curves for the graffiti pair with the simple five by five pixel feature descriptor using both SSD and ratio matching:
SSD AUC: 0.588724 Ratio AUC: 0.686437
Secondly, the ROC curves for the graffiti pair with the custom 15 pixel wide feature descriptor using both SSD and ratio matching:
SSD AUC: 0.769828 Ratio AUC: 0.863898
Finally, the ROC curves for the graffiti pair with the SIFT feature descriptor using both SSD and ratio matching:
AUC values not given, but are ~0.95.
In almost every case, ratio matching performed better; the one exception was the Yosemite 5 by 5 descriptor. The simple 5 by 5 descriptor's weaknesses really show through when comparing it against my custom descriptor and the SIFT descriptor: in every case it was worse than either of the two. However, the custom descriptor cannot compare against the performance shown by the SIFT descriptor. In the graffiti images, the weakness of not having any form of perspective invariance led to significantly worse results.
3.3 Benchmark Results
To test these two descriptors against even more images and better understand their weaknesses, I utilized the benchmark feature of the Features.exe program. The average AUC values for each image set are shown in the following table:
Image Set | 5 by 5, SSD AUC   | 5 by 5, Ratio AUC | Custom, SSD AUC | Custom, Ratio AUC
bikes     | 0.34715           | 0.562370          | 0.459305        | 0.694549
graf      | 0.39117           | 0.607535          | 0.687815        | 0.641050
leuven    | Error (=1.#IND00) | Error (=1.#IND00) | 0.765866        | 0.861175
wall      | 0.532012          | 0.634676          | 0.704273        | 0.831373
As a quick aside, I have no idea why the five by five descriptor has an error output; I've attempted to debug this to no avail. (The =1.#IND00 output is the MSVC runtime's printout for an indeterminate NaN, which suggests a division by zero or a similar invalid floating-point operation somewhere in the computation.) It would seem that using the ROC command with these settings also returns this output.
Once again, we generally see that ratio performance is better than SSD. Each image set has a different kind of variation between the images we may want to match: the bike images get progressively blurrier, the graf images have a perspective and rotational change, the leuven images get considerably darker, and the wall images have some shift but also highly repetitive structures. These results will be examined more closely in the next section.
4 Additional Images and Performance:
To further test my feature matching, I took some additional images of items within my home and then placed them within scenes. As a side note, since I know my methods have no scale invariance, I pre-scaled each of the item images to be approximately the same size as the item within the scene. First, I took a picture of a DVD case and then placed it amongst other DVD cases. The two originals are:
After pre-scaling the images so the DVD case would be approximately the same size, I performed feature matching on it. First, using the simple descriptor:
Then I used my custom descriptor:
Qualitatively, the second image has far more correct matches and fewer false positives. Note that both of these results are shown within the Features.exe GUI, which does not allow threshold setting.
I also took some pictures of my cats in different rooms and attempted feature matching. Note that there are illumination, pose, and rotational changes; however, the size is approximately the same, so I did not pre-scale. The originals are:
First, the results of the simple descriptor matching:
Second, the results of the custom descriptor matching:
Qualitatively, both are pretty bad. Part of the problem is that, despite the stripes, few features are found on the cat's face, so there are fewer features to match. While both have a high ratio of false positives to true positives, it seems as though more correct matches were made with the custom descriptor, but at this point it is hard to tell whether that is a fluke.
5 Conclusions, Algorithm Strengths and Weaknesses:
The performance test results give us a glimpse into the strengths of the different descriptors created in this project. The five by five window around potential corners is fine for translations; however, in cases involving blurring, rotation, perspective shifts, contrast changes, or repeating structures, performance drops dramatically, in some cases below chance levels, depending on the matching method. While it has a small memory footprint and is easy to program, it is simply not useful for real-world use.
The custom descriptor I created for this project performed significantly better than the five by five descriptor. Having some level of rotational and illumination invariance, as well as significantly larger dimensions, allowed it to be matched far more accurately. However, it still had problems with the blurred images and the perspective shifts. This definitely gives me an appreciation for the SIFT descriptor and how robust it is across the wide variety of changes that can exist between two images of the same item or scene.
Furthermore, it is interesting to see just how dramatic the results are after a change in the matching function. Ratio matching was almost always better than SSD alone, except in the cases where both were doing pretty poorly anyway.
Overall, this was a very interesting project, and it is helpful to know what is actually going on in the lower layers of image recognition libraries. That being said, I believe in the future I will stick to using more established feature descriptor libraries such as SIFT rather than attempting to build my own. Building a robust feature descriptor is hard, but in the case of this project it was really fun and interesting.