Computer Vision (CSE576), Spring 2008
Project 1: Feature Detection and Matching
Rahul Garg
Overview of the Feature Detector:
I used the standard Harris corner detector as described on the project web page and in class. However, instead of using a hard threshold to detect local maxima in the Harris response of the image, I use an adaptive threshold set to µ + 5σ, where µ and σ are the mean and standard deviation of the Harris response of the image. This might adversely affect the repeatability of the detector, since the threshold then depends on the statistics of each image, but it performed better than the hard threshold on the provided dataset.
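The adaptive thresholding step can be sketched as follows. This is a minimal illustration and not the project code: the function names, the use of the population standard deviation, the strict 3x3 local-maximum test, and the plain nested lists (chosen to avoid dependencies) are all assumptions. The Harris response itself is taken as given.

```python
from statistics import mean, pstdev


def adaptive_threshold(response, k=5.0):
    """Return mu + k * sigma over all pixels of a 2D response map.

    `response` is a list of rows. The population standard deviation is an
    assumption; the report does not say which estimator was used.
    """
    values = [v for row in response for v in row]
    return mean(values) + k * pstdev(values)


def detect_corners(response, k=5.0):
    """Return (row, col) of interior pixels that exceed the adaptive
    threshold and are strict maxima of their 3x3 neighbourhood.

    The 3x3 neighbourhood is a hypothetical choice for this sketch.
    """
    thresh = adaptive_threshold(response, k)
    height, width = len(response), len(response[0])
    corners = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            v = response[y][x]
            if v <= thresh:
                continue
            neighbours = [
                response[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
            ]
            if all(v > n for n in neighbours):
                corners.append((y, x))
    return corners
```

Because the threshold is recomputed per image, the same physical corner can pass in one image and fail in another, which is the repeatability concern mentioned above.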
Overview of the Descriptor:
I use a variant of the SIFT descriptor, the Rotation Invariant Feature Transform (RIFT), which was first proposed for texture classification in [1]. However, the descriptor is not as popular in the field of object recognition.
RIFT is invariant to rotation, so there is no need to normalize for rotation first (SIFT normalizes by detecting the dominant gradient orientation).
The above figure shows the process of building the RIFT descriptor. The region around the feature point is divided into concentric rings of equal width, and a gradient orientation histogram is built for each ring. All gradient orientations are measured relative to the line joining the point to the center. The contribution of a point is weighted by the gradient magnitude at that point and by the weighting factor exp(-3.33*r^2/R^2), where r is the distance of the point from the center and R is the radius of the outermost disc, chosen to be 40 pixels. Finally, all histograms are concatenated and the resulting vector is normalized to yield the final descriptor. In my implementation, I used 5 rings and quantized the orientations into 8 directions, yielding a 40-dimensional descriptor. The numerical constants were determined experimentally, but finer tuning of the parameters may be possible.
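A rough sketch of the descriptor construction described above, under stated assumptions: the image gradients are supplied as two nested lists `gx` and `gy`, the final normalization is to unit L2 norm (the report only says "normalized"), and the function name, hard bin assignment, and handling of the center pixel are illustrative choices, not the original implementation.

```python
import math


def rift_descriptor(gx, gy, cx, cy, radius=40, n_rings=5, n_bins=8, falloff=3.33):
    """Build a RIFT-style descriptor around the integer point (cx, cy).

    gx[y][x] and gy[y][x] hold the image gradient. Returns a list of
    n_rings * n_bins values (40 with the defaults from the report).
    """
    hist = [0.0] * (n_rings * n_bins)
    height, width = len(gx), len(gx[0])
    ring_width = radius / n_rings
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            dx, dy = x - cx, y - cy
            r = math.hypot(dx, dy)
            if r == 0 or r >= radius:
                continue  # radial direction undefined at the center; skip pixels outside the disc
            mag = math.hypot(gx[y][x], gy[y][x])
            if mag == 0:
                continue
            # Gradient orientation measured relative to the line from the center to the pixel.
            theta = (math.atan2(gy[y][x], gx[y][x]) - math.atan2(dy, dx)) % (2 * math.pi)
            bin_index = int(theta / (2 * math.pi) * n_bins) % n_bins
            ring = min(int(r / ring_width), n_rings - 1)
            # Gradient magnitude times the weighting factor exp(-3.33 * r^2 / R^2).
            weight = mag * math.exp(-falloff * r * r / (radius * radius))
            hist[ring * n_bins + bin_index] += weight
    norm = math.sqrt(sum(v * v for v in hist))  # L2 normalization is an assumption
    return [v / norm for v in hist] if norm > 0 else hist
```

Since every orientation is taken relative to the radial direction, rotating the patch about the feature point leaves each pixel's ring and relative angle unchanged up to resampling, which is why no dominant-orientation step is needed.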
Strengths and Weaknesses of the descriptor:
The descriptor is rotation invariant and does not rely on finding the dominant gradient orientation, which might be error prone. However, the rotation invariance reduces the discriminative power of the descriptor: for example, the gradients in each of the rings may be shuffled by a different permutation and the descriptor would remain the same. The descriptor is much lower dimensional than the SIFT descriptor (40 vs. 128), which accelerates matching. However, the descriptor uses a window of size 80x80, so it might be somewhat expensive to compute. In my opinion, this is an abnormally large window size, but the best results were obtained using this value.
Results:
Dataset | Average AUC | Img2 | Img3 | Img4 | Img5 | Img6 |
Graf | 0.60 | 0.90 | 0.52 | 0.57 | 0.49 | 0.52 |
Leuven | 0.74 | 0.91 | 0.79 | 0.70 | 0.67 | 0.65 |
Bikes | 0.65 | 0.94 | 0.76 | 0.59 | 0.53 | 0.45 |
Walls | 0.66 | 0.94 | 0.75 | 0.61 | 0.52 | 0.49 |
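As a quick check on the table, the Average AUC column appears consistent with the plain mean of the five per-image values, rounded to two decimals. The snippet below reproduces that arithmetic; the dictionary layout and function name are only for illustration.

```python
# Per-image AUC values copied from the table above (Img2 to Img6).
auc = {
    "Graf": [0.90, 0.52, 0.57, 0.49, 0.52],
    "Leuven": [0.91, 0.79, 0.70, 0.67, 0.65],
    "Bikes": [0.94, 0.76, 0.59, 0.53, 0.45],
    "Walls": [0.94, 0.75, 0.61, 0.52, 0.49],
}


def average_auc(values):
    """Arithmetic mean of the per-image AUC values."""
    return sum(values) / len(values)


# Round to two decimals, matching the precision used in the table.
averages = {name: round(average_auc(values), 2) for name, values in auc.items()}
```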
Figure: input images and the corresponding Harris responses.
Figure: ROC curves for the Graf dataset (img1.ppm and img2.ppm).
The RIFT descriptor does relatively well but does not beat the SIFT descriptor overall. However, it does outperform the SIFT descriptor at low false positive rates. Also interesting is that, unlike the SIFT descriptor, it does not benefit much from the ratio test. The window descriptor used is simply the intensity values in a 7x7 window centered on the feature point.
Figure: ROC curves for the Yosemite dataset.
The ROC curves for the Yosemite dataset show similar trends. It's interesting to note here that the ratio test actually decreases the accuracy of the window descriptor. A possible explanation could be that these two images have little variance and SSD is a more accurate measure for a highly discriminative descriptor like the window descriptor.
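For reference, the two match scores compared in these ROC curves can be sketched as below. This is a simplified illustration, not the project code: the function names, the brute-force search, and the tie handling when the second-best distance is zero are assumptions. The window descriptor assumes the feature lies at least 3 pixels from the image border.

```python
def window_descriptor(image, y, x, half=3):
    """Flatten the (2*half+1) x (2*half+1) intensity window centered on (y, x).

    With half=3 this is the 7x7 window descriptor; no border handling is done.
    """
    return [
        image[y + dy][x + dx]
        for dy in range(-half, half + 1)
        for dx in range(-half, half + 1)
    ]


def ssd(a, b):
    """Sum of squared differences between two equal-length descriptors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def match_features(desc1, desc2, use_ratio=False):
    """Match each descriptor in desc1 to its nearest neighbour in desc2.

    Returns (index1, index2, score) tuples; a lower score is a better match.
    The score is the SSD itself, or best SSD / second-best SSD for the ratio test.
    The ratio test needs at least two descriptors in desc2.
    """
    matches = []
    for i, d in enumerate(desc1):
        dists = sorted((ssd(d, e), j) for j, e in enumerate(desc2))
        best, j = dists[0]
        if use_ratio:
            second = dists[1][0]
            # If the second-best distance is zero the match is fully ambiguous.
            score = best / second if second > 0 else 1.0
        else:
            score = best
        matches.append((i, j, score))
    return matches
```

Sweeping a threshold over the returned scores and counting true and false matches at each setting is one way to trace out an ROC curve for either scoring rule.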