Feature Detection and Matching

Objective: Detect discriminating features in an image and find the best matching features in other images.

Feature Detection

In this case, the interest points correspond to "corners" found using a Harris corner detector (a sketch of the corner response computation is given below).
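
As a rough illustration, a minimal Harris response sketch in Python, assuming a grayscale float image, Sobel gradients, a Gaussian window, and k = 0.04; all parameter choices here are illustrative, not necessarily the ones used in this project:

    import numpy as np
    from scipy import ndimage

    def harris_response(gray, sigma=1.5, k=0.04):
        # Gradients of the (grayscale, float) image
        Ix = ndimage.sobel(gray, axis=1)
        Iy = ndimage.sobel(gray, axis=0)
        # Entries of the second-moment (Harris) matrix, smoothed by a Gaussian window
        Sxx = ndimage.gaussian_filter(Ix * Ix, sigma)
        Syy = ndimage.gaussian_filter(Iy * Iy, sigma)
        Sxy = ndimage.gaussian_filter(Ix * Iy, sigma)
        # Corner response R = det(M) - k * trace(M)^2
        return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2

    def harris_corners(gray, rel_threshold=0.01, nms_size=5):
        # Keep local maxima of the response above a fraction of the global maximum
        R = harris_response(gray)
        peaks = (R == ndimage.maximum_filter(R, size=nms_size)) & (R > rel_threshold * R.max())
        ys, xs = np.nonzero(peaks)
        return list(zip(xs.tolist(), ys.tolist()))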

Feature Descriptor

I implemented the following feature descriptors:

  1. RGB window : The feature descriptor is a vector of RGB color values taken from an N x N window centered on each corner found by the feature detector. The motivation for this descriptor is that color can be informative for discriminating between features. I used N = 5, which gives a feature vector of length 5 x 5 x 3 = 75 (a sketch of this descriptor appears after this list).
  2. Gabor wavelet features : These are the responses to Gabor filters at multiple scales and orientations. I used N = 5 scales and M = 6 orientations (0, 45, 90, 135, 180, 270 degrees). Since Gabor wavelets are composed of Gaussian functions, the response for each scale-orientation pair can be characterized by its mean and variance, giving a feature vector of size M x N x 2 = 60. Gabor wavelet features are popular in recognition applications such as texture matching.
  3. Orientation histogram : This is a simplification of the feature descriptor in David Lowe's IJCV paper [Lowe IJCV 2001]. After an interest point is detected, the dominant orientation at that point is computed as the orientation of the dominant eigenvector of the Harris matrix there. The orientations in a 5 x 5 window surrounding the point are computed from the edge gradients, and a histogram of these orientations, measured relative to the dominant orientation of the interest point, is built. During binning, each sample is weighted by its edge magnitude. I used 36 bins (360/10 degrees). The histogram forms the feature descriptor.
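
A minimal sketch of the RGB-window descriptor (item 1), assuming (x, y) corner coordinates and an H x W x 3 image array; the border handling is a guess, since the original write-up does not describe it:

    import numpy as np

    def rgb_window_descriptor(image, corners, n=5):
        # image: H x W x 3 array; corners: list of (x, y) pixel coordinates.
        # Corners too close to the border for a full n x n window are skipped.
        half = n // 2
        descriptors, kept = [], []
        for x, y in corners:
            if half <= x < image.shape[1] - half and half <= y < image.shape[0] - half:
                patch = image[y - half:y + half + 1, x - half:x + half + 1, :]
                descriptors.append(patch.astype(np.float64).reshape(-1))  # length n*n*3
                kept.append((x, y))
        return np.array(descriptors), kept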

Feature Matching

I used two measures for matching the features:

  1. 1-Nearest Neighbor (1-NN) : The Euclidean distance is computed between the query and target feature vectors, and the target location whose feature vector is closest to the query's is taken as the match.
  2. Nearest to next-nearest neighbor (1-NN/2-NN) : The ratio of the distance to the best match over the distance to the second-best match is thresholded; a small ratio indicates a distinctive, reliable match (see the sketch after this list).
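
Both measures in one sketch, assuming descriptors are stacked row-wise in numpy arrays; the 0.8 value mentioned in the comment is Lowe's commonly suggested threshold, not necessarily the one used here:

    import numpy as np

    def match_features(query_desc, target_desc, ratio_threshold=None):
        # query_desc, target_desc: arrays of shape (num_features, descriptor_len).
        # With ratio_threshold=None this is plain 1-NN matching; with a value
        # such as 0.8 it also applies the 1-NN/2-NN ratio test.
        matches = []
        for i, d in enumerate(query_desc):
            dists = np.linalg.norm(target_desc - d, axis=1)  # Euclidean distances
            order = np.argsort(dists)
            best, second = order[0], order[1]
            ratio = dists[best] / dists[second]  # small ratio => distinctive match
            if ratio_threshold is None or ratio < ratio_threshold:
                matches.append((i, int(best), float(dists[best]), float(ratio)))
        return matches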

Results

Results on benchmark datasets

The table below shows the average pixel error on the benchmark datasets graf, leuven, bikes, and wall. For the graf dataset, the first three rows correspond to the three feature descriptors tried (RGB window, orientation histogram, and Gabor wavelets, respectively); rows marked with * give the performance of SIFT features on the same data for comparison. The columns correspond to the feature matching method used. The simplest descriptor (RGB window) gave the best results, so only its performance is listed for the remaining datasets. Note also that 1-NN/2-NN matching appears to give worse results than plain 1-NN, possibly because the measure is sensitive to the threshold setting. (A sketch of the error computation appears after the table.)
Dataset   Descriptor              1-NN     1-NN/2-NN
graf      RGB window              248.00   256.00
          Orientation histogram   328.24   358.54
          Gabor wavelets          370.27   327.57
          SIFT*                   255.43   207.44
leuven    RGB window              269.40   276.04
          SIFT*                   129.81   122.31
bikes     RGB window              229.27   249.70
          SIFT*                   223.68   200.14
wall      RGB window              244.56   260.36
          SIFT*                   268.77   234.32
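
The benchmark pairs ship with ground-truth homographies, so one plausible reading of the average pixel error is the mean distance between each matched location and where the homography says it should land; a sketch under that assumption (the exact evaluation used here is not shown in the write-up):

    import numpy as np

    def average_pixel_error(matches, query_pts, target_pts, H):
        # matches: (query index, target index, distance, ratio) tuples as in the
        # matching sketch above; H: ground-truth homography mapping image-1
        # coordinates to image-2 coordinates (provided with the benchmark data).
        errors = []
        for i, j, _, _ in matches:
            x, y = query_pts[i]
            p = H @ np.array([x, y, 1.0])
            gx, gy = p[0] / p[2], p[1] / p[2]  # where the match should land
            mx, my = target_pts[j]
            errors.append(np.hypot(mx - gx, my - gy))
        return float(np.mean(errors))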

The high average pixel error can be attributed to the fact that the descriptors implemented handle only simple transformations such as translation, rotation, and minor illumination changes, and cannot be used reliably on affinely transformed images. Another factor is the low repeatability of the Harris corner measure across such affine transformations.

Results on images taken by me

The results on some images I took are given below:

Toy



The image on the right is a viewpoint transformation of the one on the left. Correct matches are labeled with green lines, incorrect ones with red lines. In spite of the simple descriptor (RGB window) and matching (1-NN) used, we still obtain correct matches.



Balcony



The image on the right is a translated version of the one on the left. Correct matches are indicated with blue lines, incorrect ones with brown. Although not all matches are displayed here, the percentage of correct matches is much higher than in the "Toy" example above.


Here, the image on the right is approximately a 30 degree rotation of the image on the left. Correct matches are shown in blue, incorrect ones in yellow. The percentage of correct matches was again fairly high compared to the "Toy" example, but lower than for the translated pair above.