Description

 

Feature Detection

 

The code uses the Harris corner detector for feature detection. The functions responsible for this are:

 

computeHarrisValues()

 

1.      Determine Gradient of Image

This is done using the Sobel operator; I implemented the convolution code myself. The results are stored in a 3-dimensional vector called matrix.

 

2.      Multiply the gradient of each pixel by its transpose

This outer product yields a 2x2 matrix at each pixel, whose entries are the products of the x and y gradient components; this is the matrix that is smoothed in step 3.

 

3.      Apply Gaussian smoothing

A 5x5 Gaussian mask is used to smooth the per-pixel matrix entries produced in step 2.

 

4.      Compute Corner-strength

This is done by dividing the determinant of the matrix resulting from step 3 by its trace, as written out in the equation below.
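In equation form (my notation, the standard harmonic-mean Harris score; the symbols are not taken from the project code): with Ix and Iy the Sobel gradients from step 1 and w the 5x5 Gaussian weights from step 3, the smoothed matrix H and the corner strength c at each pixel are

H \;=\; \sum_{u,v} w(u,v)\,
\begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix},
\qquad
c \;=\; \frac{\det H}{\operatorname{tr} H}
  \;=\; \frac{H_{11}H_{22} - H_{12}^2}{H_{11} + H_{22}}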

 

The pixels in harrisImage are then assigned their respective corner strength values. Figure 1 shows the results on a checkerboard image for each step.

 

[Three image panels: Gradient and Transposition | Gaussian Smoothing | Corner Strength]

Figure 1: computeHarrisValues result on a checkerboard image
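For concreteness, the following is a minimal C++ sketch of the four steps above. It assumes a grayscale image stored as a 2-D array of doubles and a normalized 5x5 Gaussian mask; the function names and types are illustrative, not the actual project code.

#include <vector>

using Image = std::vector<std::vector<double>>;

// Step 1: convolve with the 3x3 Sobel kernels to get per-pixel gradients.
void sobelGradients(const Image& img, Image& ix, Image& iy) {
    const int kx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    const int ky[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
    int h = img.size(), w = img[0].size();
    ix.assign(h, std::vector<double>(w, 0.0));
    iy.assign(h, std::vector<double>(w, 0.0));
    for (int y = 1; y + 1 < h; ++y)
        for (int x = 1; x + 1 < w; ++x)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    ix[y][x] += kx[dy + 1][dx + 1] * img[y + dy][x + dx];
                    iy[y][x] += ky[dy + 1][dx + 1] * img[y + dy][x + dx];
                }
}

// Steps 2-4: form the gradient products, smooth them with the 5x5
// Gaussian mask, and score each pixel by determinant over trace.
Image harrisStrength(const Image& img, const double (&gauss)[5][5]) {
    Image ix, iy;
    sobelGradients(img, ix, iy);
    int h = img.size(), w = img[0].size();
    Image c(h, std::vector<double>(w, 0.0));
    for (int y = 2; y + 2 < h; ++y)
        for (int x = 2; x + 2 < w; ++x) {
            double a = 0, b = 0, d = 0;  // smoothed Ix*Ix, Ix*Iy, Iy*Iy
            for (int dy = -2; dy <= 2; ++dy)
                for (int dx = -2; dx <= 2; ++dx) {
                    double g = gauss[dy + 2][dx + 2];
                    a += g * ix[y + dy][x + dx] * ix[y + dy][x + dx];
                    b += g * ix[y + dy][x + dx] * iy[y + dy][x + dx];
                    d += g * iy[y + dy][x + dx] * iy[y + dy][x + dx];
                }
            double tr = a + d;
            if (tr > 1e-12) c[y][x] = (a * d - b * b) / tr;  // step 4
        }
    return c;
}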

 

computeLocalMaxima()

 

1.      Threshold the image and pick only pixels that are a local maximum of corner strength in a 3x3 window

The threshold was picked by trial and error. For most images, a threshold of 0.1 seems to work. However, for very blurred images such as img6.ppm of “bikes”, the threshold needs to be reduced so that more features are extracted. This is likely because blurring reduces the sharpness of the edges, making them harder to detect.

 

Pixels that pass both the threshold and the local-maxima test are assigned the value 255 (white), while all other pixels are assigned 0 (black), as in the sketch below.
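A minimal C++ sketch of this step, assuming the corner-strength image from the previous section and the trial-and-error threshold of 0.1 (the function name is illustrative, not the project's):

#include <vector>

using Image = std::vector<std::vector<double>>;

// Keep only pixels above the threshold that are also the maximum of
// their 3x3 neighbourhood; features become 255, everything else 0.
std::vector<std::vector<unsigned char>>
localMaxima(const Image& c, double thresh = 0.1) {
    int h = c.size(), w = c[0].size();
    std::vector<std::vector<unsigned char>> out(h, std::vector<unsigned char>(w, 0));
    for (int y = 1; y + 1 < h; ++y)
        for (int x = 1; x + 1 < w; ++x) {
            if (c[y][x] < thresh) continue;
            bool isMax = true;
            for (int dy = -1; dy <= 1 && isMax; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    if (c[y + dy][x + dx] > c[y][x]) { isMax = false; break; }
            if (isMax) out[y][x] = 255;
        }
    return out;
}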

 

Figure 2 shows the result of this step applied to the corner-strength image from Figure 1.

 

Figure 2: Features detected for checkerboard image

 

Note that in Figure 2 the picked features have been narrowed down further compared with the corner-strength image from Figure 1. This is very helpful, since it indicates that the selected features are distinctive.

 

Feature Description

 

There are two implementations of the feature descriptor:

 

1.      Sum of all three channel values of each pixel in the color image over a 5x5 window around the selected feature point

This works well under translation but is variant to orientation and illumination.

 

2.      Store each channel value of each pixel in the color image over a 5x5 window around the selected feature point

This provides a more accurate descriptor and reduces the number of false positives. It is also less orientation variant, since each pixel's channel values are recorded individually instead of as a collective sum. The same property helps reduce the SSD between differently illuminated images, since an illumination change amounts to roughly a uniform scaling of the descriptor values. Henceforth, I call my descriptor the 5x5x3 descriptor. A sketch of both descriptors follows.
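A minimal C++ sketch of the two descriptors, assuming a three-channel color image and a feature point at least two pixels from the border (types and names are illustrative):

#include <array>
#include <vector>

using ColorImage = std::vector<std::vector<std::array<double, 3>>>;

// 5x5 descriptor: one value per pixel, the sum of its three channels
// (25 numbers in total).
std::vector<double> describe5x5(const ColorImage& img, int fy, int fx) {
    std::vector<double> d;
    for (int dy = -2; dy <= 2; ++dy)
        for (int dx = -2; dx <= 2; ++dx) {
            const auto& p = img[fy + dy][fx + dx];
            d.push_back(p[0] + p[1] + p[2]);
        }
    return d;
}

// 5x5x3 descriptor: every channel of every pixel kept separately
// (75 numbers in total).
std::vector<double> describe5x5x3(const ColorImage& img, int fy, int fx) {
    std::vector<double> d;
    for (int dy = -2; dy <= 2; ++dy)
        for (int dx = -2; dx <= 2; ++dx)
            for (int ch = 0; ch < 3; ++ch)
                d.push_back(img[fy + dy][fx + dx][ch]);
    return d;
}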

 

Feature Matching

 

The ratio test was used as the matching scheme: the SSD between the selected feature point and its best-matched feature point is divided by the SSD between the selected feature point and its second-best matched feature point. A low ratio therefore means the best match is clearly better than the runner-up.
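A minimal C++ sketch of the ratio test under these definitions (the helper names are mine, not the project's):

#include <cstddef>
#include <limits>
#include <vector>

// Sum of squared differences between two descriptors of equal length.
double ssd(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

// Ratio test: SSD to the best match divided by SSD to the second-best
// match; a smaller score means a less ambiguous match.
double ratioScore(const std::vector<double>& f,
                  const std::vector<std::vector<double>>& candidates) {
    double best = std::numeric_limits<double>::max(), second = best;
    for (const auto& c : candidates) {
        double s = ssd(f, c);
        if (s < best) { second = best; best = s; }
        else if (s < second) second = s;
    }
    return best / second;
}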

 

 

Harris Image

 

Yosemite

 

Figure 3: Yosemite1.jpg Harris Operator Image

 

Graf

 

Figure 4: img1.ppm Harris Operator Image

 

 

 

Performance

 

Yosemite ROC Curve

 

Graph 1: Yosemite ROC

 

Graph 1 shows that there is not much difference between the 5x5x3 descriptor and the 5x5 descriptor. This is likely because the image consists of large areas with little variation in color, so the sum of the band values per pixel in the 5x5 descriptor is nearly equivalent to taking the band values individually. We would expect a boost in the performance of the 5x5x3 descriptor when matching features in images with smaller areas of homogeneous color, such as the graf images. We also notice that SSD seems to perform better than the ratio test for this image. A possible reason is that a large chunk of the first image is missing from the second image. This causes the ratio test to rely on second-best distances which may be close in SSD but are actually physically far apart (example: snow on grass). SSD, however, considers only the best point, allowing it to rely fully on the region common to the two images, which is why it outperforms the ratio test here.

 

Graf ROC Curve

 

Graph 2: Graf ROC

 

As predicted, the 5x5x3 descriptor outperforms the 5x5 descriptor for the graf images. This is because the graf images consist of smaller regions of similar color, so a more accurate color descriptor proves more useful. Graf img2 is also a rotated version of img1, which is another reason the 5x5x3 descriptor performs better: since the full color composition of a 5x5 window is known, the distance between the window in img1 and its rotated version in img2 is smaller than if we simply take the sum of the band values of each pixel as the descriptor, i.e. the 5x5 descriptor. We also notice that the ratio test outperforms SSD for this image. This is because most of the features in img1 are also visible in img2 (as opposed to the Yosemite images), allowing the ratio test to take full advantage of its strength: tempering its best matches by careful comparison with second-best matches, and thereby knowing how ambiguous a match is.

 

Benchmark Results

 

Table 1 shows the benchmark results on four image directories. The settings used were my descriptor (the 5x5x3 descriptor) and the ratio test for matching.

 

Image Directory        AUC
Bikes                  0.353958
Graf                   0.560233
Leuven                 0.502588
Wall                   0.565314

Table 1: Benchmark Results

 

Note: There was a problem in the Bikes directory: image 1 would not match successfully to images 5 and 6. Thus, only the first four images in the Bikes folder were used to calculate the benchmark results.

 

 

Strengths and Weaknesses

 

As per the AUC results in Table 1, we see that the algorithm handles blurriness (the bikes image directory) the worst. This is likely because blurred images have fewer sharp edges and correspondingly fewer good features, so very few features are selected to begin with and the descriptors differ considerably between images. To improve performance, we would need to extract more diverse features, so that the selected features are not clustered in the few small regions that are slightly sharper than the rest of the image (for example, the headlights of the bikes). We also need a stronger descriptor that can handle blurriness. Thus, an implementation of MOPS, along with a descriptor that smooths colors and extracts gradient values, would be beneficial.

 

The other three image characteristics (illumination, rotation and repeated features) are handled comparably well by the algorithm, with illumination variance handled the least well of the three. This is because of the lack of sharply contrasting edges. However, the algorithm performs better here than with blurriness because, even in the darkened images, the edges remain more visible than when blurred. Our descriptor also stores more accurate color values and can thus cope with illumination variance better than the 5x5 descriptor. A stronger descriptor, such as one built from the color gradient values of the pixels, would be beneficial. Rotation variance could also be handled better by a stronger descriptor, perhaps one that extends the gradient descriptor suggested above with dominant-orientation detection. Repeated features could be handled better by a more diverse feature-extraction mechanism such as MOPS and by a descriptor that analyzes the color variations in a window more finely, such as SIFT. A better matching algorithm would also be beneficial.

 

We see the algorithm's strength in its accurate descriptor representation, which allows it to cope with repeated structures, illumination variance and orientation variance equally well.

 

 

Performance on Self-Taken Images

 

Figure 5: Kitchen pot detected

 

 

Figure 6: Exhaust detected