Computer Vision
CSE 576 Spring 2013
Project 1: Feature Detection and Matching
Kevin Huang
Feature Descriptor Description
In this project, features are first detected within an image by computing the Harris matrix, and from it a corner strength, at each pixel location. The image is first smoothed by convolving it with a 3x3 Gaussian mask. In the smoothed image, the Harris matrix at each pixel of interest is computed as a weighted sum, over a 3x3 window centered on that pixel, of the outer product of the image gradient with itself. This is summarized as

H = \sum_{p \in W} w_p \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}

where the summation is taken over a 3x3 window W, w_p is the Gaussian-smoothed pixel value used as the window weight, and I_x and I_y represent the x and y gradient values of the original image at that location, respectively.
The corner strength at each point is then defined as the determinant of the Harris matrix divided by its trace. In this way, relative corner strengths are computed across each image, thresholded, and isolated to local maxima; features are thus detected. Here we show two images with their associated corner-strength images.
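As a rough illustration of the pipeline just described, here is a minimal sketch in Python/NumPy. It assumes a grayscale floating-point image; the helper names, the Sobel gradients, and the uniform 3x3 summation window are stand-ins, not the actual project code.

```python
import numpy as np
from scipy import ndimage

def harris_corner_strength(image, sigma=1.0, window=3):
    """Corner strength c = det(H) / trace(H) at every pixel (sketch)."""
    smoothed = ndimage.gaussian_filter(image.astype(float), sigma)
    Ix = ndimage.sobel(smoothed, axis=1)            # x gradient
    Iy = ndimage.sobel(smoothed, axis=0)            # y gradient
    # Sum the gradient products over a small window around each pixel;
    # a uniform window stands in for the Gaussian-weighted 3x3 sum.
    Sxx = ndimage.uniform_filter(Ix * Ix, window)
    Syy = ndimage.uniform_filter(Iy * Iy, window)
    Sxy = ndimage.uniform_filter(Ix * Iy, window)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det / (trace + 1e-10)                    # avoid division by zero

def detect_features(strength, threshold=0.01, nms_size=5):
    """Threshold the corner-strength image and keep only local maxima."""
    local_max = strength == ndimage.maximum_filter(strength, nms_size)
    ys, xs = np.nonzero(local_max & (strength > threshold))
    return list(zip(xs, ys))
```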
Following this, a feature descriptor (a vector that characterizes a given feature) must be computed at each detected feature. The intent is to be able to use these descriptors to match features in a given image to corresponding points in another. This project consists of two types of feature descriptors: the required simple 5x5 window and one designed by the student. We begin with the former.
Simple 5x5 Window
A portion of this project mandates the implementation of a simple, 5x5 window feature descriptor. In other words, for any given detected feature, its descriptor is defined simply by the 5x5 window of pixels centered about the feature pixel.
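As a rough sketch (not the project code), assuming a NumPy grayscale image and a feature far enough from the border, extracting such a descriptor could look like:

```python
import numpy as np

def simple_descriptor(image, x, y, half=2):
    """5x5 window of raw pixel values centered on the feature, flattened."""
    patch = image[y - half:y + half + 1, x - half:x + half + 1]
    return patch.astype(float).ravel()
```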
Designed Descriptor
The designed descriptor is of size 39x39. Additionally, for any given detected feature, the descriptor window is rotated about the feature point by the angle of the feature-point gradient measured from the horizontal axis. Finally, the window mean is subtracted from each pixel to normalize the intensity values within the descriptor.
Additional options (included in the source code but commented out) are dividing the descriptor by the standard deviation within the window, adding 39x39 windows taken from scaled-down versions of the original image (2, 4 and 8 times smaller), and adding features from blurred versions of the original image (a 5x5 Gaussian filter applied 1, 2 and 3 times).
Major Design Decisions
While the 5x5 simple window descriptor using Harris features is sufficient for simple translations, we would like the more sophisticated feature descriptor to be robust to variations in rotation, illumination, contrast, resolution and scale, none of which the simple 5x5 window handles. The following design decisions were made to address these issues:
39x39 Window
A larger window size can reduce false positives in matching, but a window that is too large may result in no matches at all. An edge length of 39 pixels was determined empirically.
Rotation Invariance
Recall that each detected feature is in some sense a corner. The gradient at this corner should characterize the direction of the feature. Therefore, a larger window (79x79) about the feature is taken and rotated. A window at least √2 times greater in edge length is necessary to ensure no loss of information under rotation. The rotation angle is chosen as the angle between the horizontal axis and the gradient. A 39x39 window is then taken from within the rotated 79x79 window.
Illumination Invariance
Because images are often taken under different lighting, simply subtracting the window mean intensity from each pixel normalizes the descriptor values and accommodates illumination variation.
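Putting the three decisions above together, here is a minimal sketch of the descriptor extraction, assuming SciPy/NumPy and a precomputed gradient angle at the feature; the function name, interpolation mode, and boundary handling are illustrative, not the actual project code.

```python
import numpy as np
from scipy import ndimage

def designed_descriptor(image, x, y, angle_deg, big=79, small=39):
    """Rotate a 79x79 window by the gradient angle, keep the central
    39x39 block, and subtract the window mean (sketch)."""
    half = big // 2
    window = image[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    # Rotate about the window center so the gradient direction lines up
    # with the horizontal axis of the descriptor.
    rotated = ndimage.rotate(window, -angle_deg, reshape=False, mode='nearest')
    c, s = big // 2, small // 2
    patch = rotated[c - s:c + s + 1, c - s:c + s + 1]
    return (patch - patch.mean()).ravel()   # illumination normalization
```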
Extra Credit/Variants Design Decisions
Contrast Invariance
The lighting situation of a scene can affect contrast as well as intensity. To account for this, each pixel value within the descriptor is divided by the standard deviation of the window intensities. This normalizes the spread within each descriptor, and helps to address variations in contrast.
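In symbols, with \mu and \sigma the mean and standard deviation of the intensities in the window, each descriptor entry d_i becomes:

d_i' = \frac{d_i - \mu}{\sigma}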
Resolution Invariance
In order to address potential variation in resolution, new features are detected, in the same way described above, in images obtained by applying a 5x5 Gaussian mask to the original image once, twice and three times. These features are then added to the feature database in the hope of matching to blurrier images.
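A sketch of this option, reusing the harris_corner_strength and detect_features helpers sketched earlier; the sigma value is a stand-in for the 5x5 Gaussian mask actually used.

```python
from scipy import ndimage

def blurred_feature_sets(image, levels=3):
    """Detect features in progressively blurred copies of the image
    (the Gaussian applied 1, 2 and 3 times) and pool them together."""
    features = []
    blurred = image.astype(float)
    for _ in range(levels):
        blurred = ndimage.gaussian_filter(blurred, sigma=1.0)
        strength = harris_corner_strength(blurred)
        features.extend(detect_features(strength))
    return features
```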
Scale Invariance
We first build an image pyramid. In other words, the original image is scaled down by a factor of 2 three times. In each scaling, the resultant image is stored and features are detected within the scaled down image. These features are then added to the feature file for the original image and can potentially be matched to scaled versions of features.
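A corresponding sketch for the pyramid option, again reusing the earlier helpers; mapping the detected coordinates back to the original image's frame is an assumption about how the pooled features are stored, not something stated in the report.

```python
from scipy import ndimage

def pyramid_feature_sets(image, levels=3):
    """Detect features in images downsampled by 2, 4 and 8 and pool them."""
    features = []
    scaled = image.astype(float)
    for level in range(1, levels + 1):
        scaled = ndimage.zoom(scaled, 0.5)        # halve each edge length
        strength = harris_corner_strength(scaled)
        # Scale coordinates back up to the original image's frame.
        features.extend((x * 2 ** level, y * 2 ** level)
                        for x, y in detect_features(strength))
    return features
```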
Performance Analysis
Once features and their associated descriptors are found, we can build an ROC curve, which relates the false positive rate to the true positive rate of matching. The more quickly the true positive rate reaches 100% relative to the false positive rate, the more accurate the descriptor and matching method. This can be captured by the area under the ROC curve (AUC).
First we compare the 5x5 descriptor, the designed 39x39 descriptor, and the SIFT descriptor using SSD matching and ratio matching for two pairs of images. We do so by building ROC curves and reporting the area under each.
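For reference, here is a minimal sketch of the two matching schemes being compared, assuming the descriptors are stacked as NumPy arrays with one row per feature; this illustrates SSD and ratio scoring in general, not the evaluation harness actually used.

```python
import numpy as np

def match_scores(desc1, desc2, use_ratio=True):
    """For each descriptor in desc1, find its best match in desc2.
    The score is either the raw SSD or the ratio of the best to the
    second-best SSD; sweeping a threshold over the scores gives the ROC."""
    matches = []
    for i, d in enumerate(desc1):
        ssd = np.sum((desc2 - d) ** 2, axis=1)    # distance to every candidate
        order = np.argsort(ssd)
        best, second = order[0], order[1]
        score = ssd[best] / ssd[second] if use_ratio else ssd[best]
        matches.append((i, int(best), float(score)))
    return matches
```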
Graffiti Test Pair (graf1.ppm and graf2.ppm)
The AUC scores are as follows:
Descriptor | Matching Algorithm | AUC
5x5 | SSD | 0.606621
5x5 | Ratio | 0.719107
KHuang 39x39 | SSD | 0.838416
KHuang 39x39 | Ratio | 0.857111
SIFT | SSD | 0.932023
SIFT | Ratio | 0.967779
Yosemite Test Pair (Yosemite1.jpg and Yosemite2.jpg)
The AUC scores are as follows:
Descriptor | Matching Algorithm | AUC
5x5 | SSD | 0.931941
5x5 | Ratio | 0.913883
KHuang 39x39 | SSD | 0.866351
KHuang 39x39 | Ratio | 0.888072
SIFT | SSD | 0.994692
SIFT | Ratio | 0.995494
Benchmark Sets
Next, the average AUC scores for 4 different benchmark sets were computed. For each set, img1.ppm was matched with img2.ppm, img3.ppm, img4.ppm, img5.ppm, and img6.ppm using the designed descriptor with no options enabled, i.e. designed only for translation, rotation and illumination invariance. The matching method was the ratio method. For each benchmark set, img1.ppm and img6.ppm are shown to illustrate the desired transformation(s). Here are the resultant AUC scores for the benchmark sets:
graf
img1.ppm img2.ppm AUC: 0.857111
img1.ppm img3.ppm AUC: 0.763059
img1.ppm img4.ppm AUC: 0.736913
img1.ppm img5.ppm AUC: 0.700797
img1.ppm img6.ppm AUC: 0.338889
mean AUC: 0.6793538
leuven
img1.ppm img2.ppm AUC: 0.850348
img1.ppm img3.ppm AUC: 0.761289
img1.ppm img4.ppm AUC: 0.878917
img1.ppm img5.ppm AUC: 0.654488
img1.ppm img6.ppm AUC: 0.000000
mean AUC: 0.6290084
bikes
*Note: for the bikes benchmark set, the threshold for detecting features was lowered.
img1.ppm img2.ppm AUC: 0.854745
img1.ppm img3.ppm AUC: 0.837248
img1.ppm img4.ppm AUC: 0.581030
img1.ppm img5.ppm AUC: 0.928272
img1.ppm img6.ppm AUC: 0.598672
mean AUC: 0.7599934
wall
img1.ppm img2.ppm AUC: 0.882735
img1.ppm img3.ppm AUC: 0.867456
img1.ppm img4.ppm AUC: 0.793124
img1.ppm img5.ppm AUC: 0.869582
img1.ppm img6.ppm AUC: 0.990304
mean AUC: 0.8806402
Here are the average AUCs for both descriptors and both matching methods:
Descriptor | Benchmark Set | Matching Algorithm | Average AUC
5x5 | graf | SSD | 0.399176
5x5 | graf | Ratio | 0.624436
5x5 | bikes | SSD | 0.151271
5x5 | bikes | Ratio | 0.503483
5x5 | leuven | SSD | 0
5x5 | leuven | Ratio | 0
5x5 | wall | SSD | 0.549966
5x5 | wall | Ratio | 0.684650
KHuang 39x39 | graf | SSD | 0.5955206
KHuang 39x39 | graf | Ratio | 0.6793538
KHuang 39x39 | bikes | SSD | 0.7014158
KHuang 39x39 | bikes | Ratio | 0.7599934
KHuang 39x39 | leuven | SSD | 0.2684986
KHuang 39x39 | leuven | Ratio | 0.6290084
KHuang 39x39 | wall | SSD | 0.7203364
KHuang 39x39 | wall | Ratio | 0.8806402
Note that in all cases, the designed 39x39 descriptor performed better than the simple 5x5. Moreover, in each case, the ratio feature matcher performed at least as well as SSD.
It should be noted that the ./Features benchmark and ./Features roc commands yield different results. Therefore, most of the tabulated numbers above were calculated by running ./Features roc for each image pair and computing the averages by hand.
Discussion
Image Pair Tests
When testing the graf image pair, it is clear that, for each descriptor type, the ratio method performed better than SSD. Moreover, we see that the designed descriptor performed better than the simple one, but worse than SIFT. This is potentially due to the rotation between the two images: the simple descriptor does not take rotation into account, while the designed descriptor and SIFT both do. In addition, there appears to be a bit of scaling between the two images, something the designed descriptor does not account for.
graf Benchmark
For the graf benchmark set, we observe that the key components are rotation and scale. There seems to be skewing as well. The first four matches have relatively high accuracy, with AUC over 0.7. However, the last match has poor accuracy, with an AUC of 0.338889. While the designed descriptor can account for rotation, the image is skewed and stretched beyond what the descriptor can recognize. Furthermore, scale was not taken into account by our default descriptor.
leuven Benchmark
For the leuven benchmark, we observe that the key components are illumination and contrast. The first four matches again have relatively high accuracy, with the first three above 0.76 and the fourth at 0.654. The match to img6.ppm has an AUC of zero, meaning all false positives; this indicates that no features were matched correctly, or that no features were found in img6.ppm of leuven. This can be remedied by changing the threshold or incorporating contrast invariance.
bikes Benchmark
The bikes benchmark set includes images of roughly the same scene but at different resolutions. Our default designed descriptor matches img1.ppm well to img2.ppm, img3.ppm and img5.ppm, but poorly to img4.ppm and img6.ppm; the default descriptor is not invariant to changes in resolution.
wall Benchmark
The wall benchmark set presents images that are rotated and slightly scaled from one another. There is also a bit of skewing. However, there are many distinct corners in this brick wall, making it amenable to corner searching. Because of this, and because of its rotation invariance, our descriptor performed very well.
Strengths
From these tests, we know that our default designed descriptor is fairly invariant to rotation, as seen in the wall benchmark set. The feature detection and matching algorithm works even better when distinct and unique corners are present. To a degree, the descriptor is sufficient for differences in lighting and contrast, but performance drops off quickly past a certain point (see leuven img1 and img6). Overall it works well under different illumination settings, but may need a change in thresholding, or modifications that make it contrast invariant, to handle stronger variations.
Weaknesses
The default designed descriptor does not account for extreme variations in contrast (leuven benchmark set), scaling (graf benchmark set), or resolution (bikes benchmark set). Moreover, only one of the three available color bands is used in this feature descriptor. Perhaps adding color information around the feature would greatly increase the positive matching rate.
Extra Credit Items
In addition to making a descriptor robust to changes in position, orientation and illumination, we incorporated methods to help the descriptor be robust to contrast, resolution, and scaling. We will address each in order.
Contrast
We recognized from the leuven benchmark set that our method is not entirely robust to changes in contrast, namely between img1.ppm and img6.ppm with an AUC of 0. To that end, after subtracting the mean intensity from each descriptor, we also divided each pixel within the descriptor by the standard deviation of the window. This in effect normalizes the distribution within the descriptor, and thus makes it more robust to changes in contrast.
Using this option in the designed descriptor, we computed an ROC curve and AUC for matching leuven img1.ppm to img6.ppm.
Calculated AUC: 0.780355
As we can see, by dividing by the standard deviation, the descriptor is much more robust to changes in contrast. The AUC increased from 0 to 0.780355 for matching leuven img1.ppm to img6.ppm.
Resolution
Moreover, we saw that the default designed descriptor was not very robust to changes in resolution (bikes img1.ppm matched to img4.ppm and img6.ppm). To remedy this, a 5x5 Gaussian mask was convolved with the original image three times; after each convolution, features were detected in the convolved image and recorded. In this way, changes in resolution should affect the feature matching less. First, we calculate the ROC curve and AUC for matching img1.ppm to img4.ppm, which before the modification had an AUC of 0.581030. Doing so yields
Calculated AUC: 0.607982
And again for img1.ppm to img6.ppm
Calculated AUC: 0.734646
We see that in both cases, adding features from the Gaussian-blurred images to the feature database increased the AUC, especially in the case of img1.ppm to img6.ppm.
Scale
Differences in scale could account for the poor AUC of 0.338889 in the graf benchmark set for img1.ppm to img6.ppm. To account for scale, a pyramid of the original image was constructed. Namely, the image was made coarser and shrunk by a factor of 2 three separate times, resulting in images that were 2, 4 and 8 times smaller in edge length than the original. Features were then detected in the smaller images and added to the feature database. Recalculating for graf img1.ppm to img6.ppm with the scale option, we obtain the following ROC graph and AUC
Calculated AUC: 0.285745
Unfortunately, this modification/option does not help with the graf benchmark set. In fact, the AUC decreased after applying this change. This could be due to a miscalculation in the scaling/pyramiding algorithm that I wrote, or because there is more than just scale involved when matching img1.ppm to img6.ppm.
Extra Images
In addition to the benchmark sets given above, we applied the developed feature detection and matching algorithm to a few other images. Three other sets of images are shown below.
Fountain
In this set there is some rotation and scaling. Using the provided GUI, we detect features and attempt to match a few. It is somewhat successful.
Books
With the books set, we should observe only translation and rotation. We anticipate that our algorithm will perform well, which it does, as shown below.
Cat and Apples
Finally, we take these images, which are cluttered with many features and corners that are difficult to distinguish. We expect our algorithm won't do so well, as shown below.