Description
Feature Detection
The code uses the Harris corner detector for feature detection. The functions responsible for this are:
computeHarrisValues()
1. Determine Gradient of Image
This is done using the Sobel operator. I have implemented the convolution code myself. The results are stored in a 3-dimensional vector called matrix.
2. Multiply the gradient of each pixel by its own transpose (an outer product, yielding a 2x2 matrix per pixel)
3. Apply Gaussian smoothing
A 5x5 Gaussian mask is used to smooth the image.
4. Compute Corner-strength
This is done by dividing the determinant of the matrix resulting from step 3 by its trace.
The pixels in harrisImage are then assigned their respective corner strength values. Figure 1 shows the results on a checkerboard image for each step.
Figure 1: computeHarrisValues results on a checkerboard image (left to right: gradient and transposition, Gaussian smoothing, corner strength)
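The four steps above can be sketched as follows. This is a minimal illustration, not the report's actual code: central differences (np.gradient) stand in for the hand-rolled Sobel convolution, and a separable 5-tap binomial filter stands in for the 5x5 Gaussian mask.

```python
import numpy as np

def harris_strength(img, eps=1e-10):
    """Corner strength c = det(H) / trace(H), where H is the smoothed
    second-moment matrix [[Ix^2, IxIy], [IxIy, Iy^2]] at each pixel."""
    # Step 1: image gradient (central differences stand in for Sobel here).
    iy, ix = np.gradient(img.astype(float))
    # Step 2: per-pixel outer product of the gradient with its transpose.
    ixx, ixy, iyy = ix * ix, ix * iy, iy * iy
    # Step 3: smooth each matrix entry with a separable 5-tap Gaussian-like mask.
    g = np.array([1.0, 4.0, 6.0, 4.0, 1.0]); g /= g.sum()
    def smooth(a):
        a = np.apply_along_axis(lambda r: np.convolve(r, g, mode="same"), 1, a)
        return np.apply_along_axis(lambda c: np.convolve(c, g, mode="same"), 0, a)
    sxx, sxy, syy = smooth(ixx), smooth(ixy), smooth(iyy)
    # Step 4: determinant divided by trace; eps guards flat regions (trace = 0).
    return (sxx * syy - sxy * sxy) / (sxx + syy + eps)
```

On a synthetic image containing a single bright square, the strength is zero in flat regions and positive at the square's corner, as expected.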
computeLocalMaxima()
1. Threshold the image and keep only pixels which are local maxima in a 3x3 window according to corner strength
The threshold is picked by trial and error. For most images, a threshold of 0.1 seems to work. However, for very blurred images such as img6.ppm of “bikes”, the threshold needs to be reduced so that more features are extracted. This is likely because blurring reduces the sharpness at the edges, making edges more difficult to detect.
Pixels which pass the threshold and local maxima test are assigned the value 255 (white) while other pixels are assigned the value 0 (black).
Figure 2 shows the results of this step after having been implemented on the corner strength image from Figure 1.
Figure 2: Features detected for checkerboard image
Note that Figure 2 shows the picked features to be further narrowed down compared to the corner strength image from Figure 1. This is helpful because it indicates the distinctiveness of the selected features.
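The thresholding and local-maximum test can be sketched as below. This is an illustrative version, assuming the corner-strength map is a 2-D float array; the 0.1 default matches the threshold reported above.

```python
import numpy as np

def local_maxima(strength, threshold=0.1):
    """Binary map: 255 where a pixel exceeds the threshold AND is the
    maximum of its 3x3 neighbourhood, 0 everywhere else."""
    h, w = strength.shape
    # Pad with -inf so border pixels compare only against real neighbours.
    padded = np.pad(strength, 1, mode="constant", constant_values=-np.inf)
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 3, x:x + 3]
            if strength[y, x] > threshold and strength[y, x] >= window.max():
                out[y, x] = 255  # feature pixel (white)
    return out
```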
Feature Description
There are two implementations of the feature descriptor:
1. Sum of all three channel values of a pixel in the colored image for a 5x5 window around the selected feature point
This works well for purely translational image pairs but is orientation- and illumination-variant.
2. Store each channel value of each pixel in the colored image for a 5x5 window around the selected feature point
This provides a more accurate descriptor which reduces the number of false positives. It is also less orientation-variant, since each pixel's channel values are recorded individually instead of as a collective sum. The same property helps keep the SSD low under differing illumination, since that variation amounts to a roughly uniform scaling of the descriptor values. Henceforth, I call my descriptor the 5x5x3 descriptor.
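The two descriptors can be sketched as below. This assumes an H x W x 3 image array and that the caller skips features closer than 2 pixels to the border; function names are illustrative, not the report's.

```python
import numpy as np

def descriptor_5x5x3(img, y, x):
    """The 5x5x3 descriptor: every channel of every pixel in a 5x5
    window around (y, x), flattened into a 75-element vector."""
    window = img[y - 2:y + 3, x - 2:x + 3, :].astype(float)
    return window.reshape(-1)  # 5 * 5 * 3 = 75 values

def descriptor_5x5_sum(img, y, x):
    """The simpler 5x5 descriptor: per-pixel sum of the three channels,
    flattened into a 25-element vector."""
    window = img[y - 2:y + 3, x - 2:x + 3, :].astype(float)
    return window.sum(axis=2).reshape(-1)  # 25 values
```

Note how the 5x5 descriptor discards information: each of its entries is the sum of three entries of the 5x5x3 descriptor.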
Feature Matching
The ratio test was used as the matching scheme: the SSD between the selected feature point and its best-matched feature point is divided by the SSD between the selected feature point and its second-best match, and the match is kept only if this ratio is small.
Harris Image
Yosemite
Figure 3: Yosemite1.jpg Harris Operator Image
Graf
Figure 4: img1.ppm Harris Operator Image
Performance
Yosemite ROC Curve
Graph 1: Yosemite ROC
Graph 1 shows that there is not much difference between the 5x5x3 descriptor and the 5x5 descriptor. This is likely because the image consists of large areas with little variation in color, causing the sum of the band values per pixel in the 5x5 descriptor to be nearly equivalent to taking the band values individually. We are likely to see a boost in the performance of the 5x5x3 descriptor when matching features in images with smaller areas of homogeneous color, such as the graf images. We also notice that SSD seems to perform better than the ratio test for this image. A possible reason is that a large chunk of the first image is missing from the second image. This causes the ratio test to rely on second-best distances which may be close in SSD but are actually physically far apart (for example, snow on grass). SSD, however, considers only the best point, allowing it to rely more fully on the region common to the two images and therefore perform better than the ratio test.
Graf ROC Curve
Graph 2: Graf ROC
As predicted, the 5x5x3 descriptor outperforms the 5x5 descriptor for the graf images. This is because the graf images consist of smaller regions of similar color, so having a more accurate color descriptor proves more useful. Graf img2 is a rotated version of img1, and this is another reason why the 5x5x3 descriptor performs better: since we know the full color composition of a 5x5 window, the distance calculated between the window in img1 and its rotated version in img2 is smaller than if we simply take the sum of the band values of each pixel as the descriptor, i.e. the 5x5 descriptor. We also notice that the ratio test outperforms SSD for this image. This is because most of the features in img1 are also visible in img2 (as opposed to the Yosemite images), allowing the ratio test to take full advantage of its strength: tempering its best matches by careful comparison with second-best matches, and thereby knowing how ambiguous a match is.
Benchmark Results
Table 1 shows the benchmark results on four image directories. The settings used my descriptor (the 5x5x3 descriptor) and the ratio test for matching.
Image Directory | AUC
Bikes           | 0.353958
Graf            | 0.560233
                | 0.502588
Wall            | 0.565314
Table 1: Benchmark Results
Note: There was a problem in the Bikes directory in that image 1 would not match to images 5 and 6 successfully. Thus, only the first 4 images in the Bikes folder were used to calculate the benchmark results.
Strengths and Weaknesses
As per the AUC results in Table 1, we see that the algorithm handles blurriness (the bikes image directory) the worst. This is likely due to there being fewer sharp edges and correspondingly fewer good features, resulting in very few features being selected to begin with and in descriptors that differ considerably between images. To improve performance, we would need to extract more diverse features so that the majority of selected features are not clustered in small regions that are slightly sharper than the rest of the image (for example, the headlights of the bikes). We would also need a stronger descriptor which can handle blurriness. Thus, an implementation of MOPS along with a descriptor which smooths colors and extracts gradient values would be beneficial.
The other three image characteristics (illumination, rotation and repeated features) are handled comparably well by the algorithm, though of the three, illumination variance is handled least well. This is because of the lack of sharply contrasting edges. However, the algorithm performs better here than with blurriness because even in the darkened images, the edges are still more visible than when blurred. Our descriptor also consists of more accurate color values and can thus cope with illumination variance better than the 5x5 descriptor. A stronger descriptor, such as one consisting of the color gradient values of the pixels, would be beneficial. Rotation variance could also be better handled with a stronger descriptor, perhaps one that extends the gradient descriptor suggested earlier by performing dominant orientation detection. Repeated features could be better handled by a more diverse feature extraction mechanism such as MOPS and a stronger descriptor which analyzes the color variations in a window in fine detail, such as SIFT. A better matching algorithm would also be beneficial.
We see the algorithm's strengths in its accurate descriptor representation, which allows it to cope with repeated structures, illumination variance and orientation variance comparably well.
Performance on self-taken images
Figure 5: Kitchen pot detected
Figure 6: Exhaust detected