Computer Vision (CSE576), Spring 2008
Project 1: Feature Detection and Matching
Overview of the Feature Detector:
I used the standard Harris corner detector as described on the project web page and in class. However, instead of using a hard threshold to detect local maxima in the Harris response of the image, I use an adaptive threshold set to µ+5*σ, where µ and σ are the mean and standard deviation of the Harris response of the image. This might adversely affect the repeatability of the detector, since the threshold now depends on the content of each image, but it performed better than the hard threshold on the provided dataset.
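The adaptive threshold can be sketched as follows. This is a minimal illustration, not the project code: the function name `detect_corners`, the 3x3 neighbourhood used for the local-maximum test, and the assumption that the Harris response is already available as a 2-D NumPy array are all mine.

```python
import numpy as np

def detect_corners(response, n_sigma=5.0):
    """Return (row, col) positions of local maxima above mean + n_sigma * std."""
    response = np.asarray(response, dtype=float)
    # Adaptive threshold: mu + 5 * sigma of the response over the whole image.
    threshold = response.mean() + n_sigma * response.std()

    # Pad with -inf so border pixels can be compared with all 8 neighbours.
    h, w = response.shape
    padded = np.pad(response, 1, mode="constant", constant_values=-np.inf)

    # A pixel is a local maximum if it is >= each of its 8 neighbours
    # (the 3x3 neighbourhood is an assumption of this sketch).
    is_max = np.ones((h, w), dtype=bool)
    for dy in range(3):
        for dx in range(3):
            if dy == 1 and dx == 1:
                continue
            is_max &= response >= padded[dy:dy + h, dx:dx + w]

    return np.argwhere(is_max & (response > threshold))
```

Because the threshold is derived from the statistics of each image, a weak peak is kept or rejected depending on how strong the rest of the response is.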
Overview of the Descriptor:
I use a variant of the SIFT descriptor, the Rotation Invariant Feature Transform (RIFT), which was first proposed for texture classification. However, the descriptor is not as popular in the field of object recognition.
RIFT is invariant to rotation, so the patch does not need to be normalized for rotation first (SIFT normalizes by detecting the dominant gradient orientation).
The above figure shows the process of building the RIFT descriptor. The region around the feature point is divided into concentric rings of equal width, and a gradient orientation histogram is built for each ring. All gradient orientations are measured relative to the line joining the point to the center. The contribution of a point is weighted by the gradient magnitude at that point and by the weighting factor exp(-3.33*r^2/R^2), where r is the distance of the point from the center and R is the radius of the outermost disc, which was chosen to be 40 pixels. Finally, all histograms are concatenated and the resulting vector is normalized to yield the final descriptor. In my implementation, I used 5 rings and quantized the orientations into 8 directions to yield a 40-dimensional descriptor. The numerical constants were determined experimentally, but finer tuning of the parameters may be possible.
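The construction described above can be sketched in a few lines of NumPy. This is a hedged illustration under several assumptions that the text does not specify: gradients come from `np.gradient`, each pixel votes into a single hard-assigned bin (no interpolation), the center pixel is skipped because its radial direction is undefined, and "normalized" is taken to mean unit L2 length. The function name `rift_descriptor` is hypothetical.

```python
import numpy as np

def rift_descriptor(img, cy, cx, radius=40, n_rings=5, n_bins=8):
    """Sketch of a RIFT-style descriptor of length n_rings * n_bins."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)  # derivatives along rows (y) and columns (x)

    # Pixel coordinates of the square that bounds the outermost disc.
    ys, xs = np.mgrid[cy - radius:cy + radius + 1, cx - radius:cx + radius + 1]
    dy, dx = ys - cy, xs - cx
    r = np.hypot(dy, dx)

    # Keep pixels inside the image and inside the disc, excluding the center.
    inside = (ys >= 0) & (ys < img.shape[0]) & (xs >= 0) & (xs < img.shape[1])
    keep = inside & (r > 0) & (r <= radius)
    ys, xs, dy, dx, r = ys[keep], xs[keep], dy[keep], dx[keep], r[keep]

    # Gradient orientation measured relative to the outward radial direction.
    g_y, g_x = gy[ys, xs], gx[ys, xs]
    rel = np.mod(np.arctan2(g_y, g_x) - np.arctan2(dy, dx), 2 * np.pi)

    # Hard assignment to a ring (equal width) and an orientation bin.
    ring = np.minimum((r / radius * n_rings).astype(int), n_rings - 1)
    obin = (rel / (2 * np.pi) * n_bins).astype(int) % n_bins

    # Weight: gradient magnitude times exp(-3.33 * r^2 / R^2).
    weight = np.hypot(g_x, g_y) * np.exp(-3.33 * r ** 2 / radius ** 2)

    hist = np.zeros((n_rings, n_bins))
    np.add.at(hist, (ring, obin), weight)  # accumulate repeated indices

    desc = hist.ravel()
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc
```

With the defaults, the result has 5 x 8 = 40 entries. Because both the gradient direction and the radial direction turn with the image, rotating the image about the feature point should leave the histogram essentially unchanged, which is the property the descriptor relies on.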
I tried improving the performance by changing the descriptor and the match strategy but none of them gave an overall improvement over the dataset. Some of the things I tried:
Strengths and Weaknesses of the descriptor:
The descriptor is rotation invariant and does not rely on finding the dominant gradient orientation, which might be error prone. However, the rotation invariance reduces the discriminative power of the descriptor: for example, the gradients in each of the rings may be shuffled by a different permutation and the descriptor would remain the same. The descriptor is much lower dimensional than the SIFT descriptor (40 vs 128), which accelerates the matching. However, the descriptor uses a window of size 80x80, so it might be a bit expensive to compute. In my opinion, this is an abnormally large window size, but the best results were obtained using this value.
The above figure shows input images and the corresponding Harris responses.
The ROC curves for the graf dataset (img1.ppm and img2.ppm)
The RIFT descriptor does relatively well but does not beat the SIFT descriptor overall. However, it does outperform the SIFT descriptor at low false positive rates. Also interesting is the fact that, unlike the SIFT descriptor, it does not benefit much from the ratio test. The window descriptor used is simply the intensity values in a square window of size 7x7 centered on the feature point.
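For reference, the window descriptor amounts to the following sketch. The function name `window_descriptor` and the edge-replicating padding for features near the image border are assumptions of this example; the text does not say how border features were handled.

```python
import numpy as np

def window_descriptor(img, row, col, size=7):
    """Flatten the size x size intensity window centered on (row, col)."""
    half = size // 2
    # Replicate edge pixels so that features near the border still get
    # a full window (an assumption of this sketch).
    padded = np.pad(np.asarray(img, dtype=float), half, mode="edge")
    # After padding, original pixel (row, col) sits at (row + half, col + half),
    # so the window starts at (row, col) in padded coordinates.
    patch = padded[row:row + size, col:col + size]
    return patch.ravel()
```

For a 7x7 window this gives a 49-dimensional vector of raw intensities, with no invariance to rotation or illumination changes.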
The ROC curves for the Yosemite dataset
The ROC curves for the Yosemite dataset show similar trends. It's interesting to note that the ratio test actually decreases the accuracy of the window descriptor here. A possible explanation is that these two images differ very little, and SSD is a more accurate measure for a highly discriminative descriptor like the window descriptor.
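The two match scores compared in these curves can be sketched as below. This is an illustration under assumptions: the function name `match_features` is hypothetical, the ratio is taken between the best and second-best SSD values (some implementations use the ratio of Euclidean distances instead), and the second image is assumed to have at least two features.

```python
import numpy as np

def match_features(desc_a, desc_b):
    """For each descriptor in desc_a, find its nearest neighbour in desc_b.

    Returns (best_index, ssd_score, ratio_score); lower scores mean
    more confident matches for both scoring schemes.
    """
    a = np.asarray(desc_a, dtype=float)
    b = np.asarray(desc_b, dtype=float)

    # Pairwise sum of squared differences, shape (len(a), len(b)).
    ssd = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

    order = np.argsort(ssd, axis=1)
    rows = np.arange(len(a))
    best = order[:, 0]
    best_ssd = ssd[rows, best]
    second_ssd = ssd[rows, order[:, 1]]

    # Ratio test: best distance relative to the second best.
    # The small floor avoids division by zero for duplicate descriptors.
    ratio = best_ssd / np.maximum(second_ssd, 1e-12)
    return best, best_ssd, ratio
```

The SSD score ranks matches by absolute distance, while the ratio score ranks them by how much better the best candidate is than the runner-up. When two images are nearly identical, correct matches already have very small SSD, so the absolute score can separate them well on its own.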