Feature Detection and Matching
Objective: Detect discriminating features in an image and find the best matching features in other images.
Feature Detection
The interest points correspond to "corners" found using a Harris corner detector.
Implementation details and results for the detector are given on a separate page.
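In outline, the detector computes image gradients, forms the Gaussian-weighted Harris matrix at every pixel, evaluates the corner response det(M) - k*trace(M)^2, and keeps thresholded local maxima. Below is a minimal Python sketch of this pipeline, assuming a grayscale NumPy image; the window size, sigma, k, and threshold are illustrative defaults rather than the exact values used in the project.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, sobel

def harris_corners(gray, sigma=1.0, k=0.04, rel_thresh=0.01):
    """Return (row, col) interest points from the Harris corner measure."""
    gray = gray.astype(float)
    Ix = sobel(gray, axis=1)            # horizontal gradient
    Iy = sobel(gray, axis=0)            # vertical gradient

    # Gaussian-weighted entries of the Harris matrix (structure tensor).
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)

    # Harris response: det(M) - k * trace(M)^2.
    response = (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2

    # Keep local maxima whose response clears a relative threshold.
    is_max = response == maximum_filter(response, size=5)
    strong = response > rel_thresh * response.max()
    return np.argwhere(is_max & strong)
```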
Feature Descriptor
I implemented the following feature descriptors:
- RGB window: a vector of RGB color values taken from an N x N window centered on each corner found by the feature detector. The motivation for this descriptor is that color can be an informative cue for discriminating between features. I used N = 5, which gives a feature vector of length 5 x 5 x 3 = 75. A sketch of this descriptor appears after this list.
- Gabor wavelet features: responses to a bank of Gabor filters at multiple scales and orientations. I used N = 5 scales and M = 6 orientations (0, 45, 90, 135, 180, 270 degrees). Since a Gabor wavelet is a Gaussian-modulated sinusoid, the filter responses for each scale-orientation pair can be summarized by their mean and variance, giving a feature vector of size N x M x 2 = 60. Gabor wavelet features are popular in recognition applications such as texture matching; a sketch also appears after this list.
- Orientation histogram: a simplification of the feature descriptor in David Lowe's IJCV paper [Lowe, IJCV 2004]. After the interest points are detected, the dominant orientation at each point is computed as the orientation of the dominant eigenvector of the Harris matrix at that point. The orientations in a 5 x 5 window surrounding the point are computed from the edge gradients, and a histogram of these orientations, taken relative to the dominant orientation of the interest point, is built. During binning, each sample is weighted by the corresponding edge magnitude. I used 36 bins (360/10 degrees per bin). The resulting histogram forms the feature descriptor; see the sketch after this list.
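A minimal sketch of the first two descriptors, assuming corners lie far enough from the image border that the windows fit; the helper names, window sizes, and Gabor frequencies here are illustrative choices, not the project's exact settings.

```python
import numpy as np
from skimage.filters import gabor

def rgb_window_descriptor(img, r, c, n=5):
    """Flattened n x n RGB window centered on corner (r, c)."""
    h = n // 2
    window = img[r - h:r + h + 1, c - h:c + h + 1, :]
    return window.astype(float).ravel()         # length n * n * 3

def gabor_descriptor(gray, r, c, win=16,
                     frequencies=(0.05, 0.1, 0.2, 0.3, 0.4),            # 5 scales
                     thetas=tuple(np.deg2rad([0, 45, 90, 135, 180, 270]))):
    """Mean and variance of Gabor responses for each scale-orientation pair."""
    h = win // 2
    patch = gray[r - h:r + h, c - h:c + h].astype(float)
    feats = []
    for f in frequencies:
        for t in thetas:                         # 6 orientations
            real, _ = gabor(patch, frequency=f, theta=t)
            feats.extend([real.mean(), real.var()])
    return np.array(feats)                       # length 5 * 6 * 2 = 60
```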
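The orientation histogram can be sketched in the same spirit: the dominant orientation comes from the eigenvector of the windowed Harris matrix with the larger eigenvalue, and binning is weighted by gradient magnitude.

```python
import numpy as np
from scipy.ndimage import sobel

def orientation_histogram(gray, r, c, n=5, nbins=36):
    """Gradient-orientation histogram around (r, c), relative to the
    dominant orientation, with magnitude-weighted 10-degree bins."""
    gray = gray.astype(float)
    Ix, Iy = sobel(gray, axis=1), sobel(gray, axis=0)
    h = n // 2
    wx = Ix[r - h:r + h + 1, c - h:c + h + 1]
    wy = Iy[r - h:r + h + 1, c - h:c + h + 1]

    # Harris matrix accumulated over the window; its dominant eigenvector
    # gives the dominant orientation of the interest point.
    M = np.array([[(wx * wx).sum(), (wx * wy).sum()],
                  [(wx * wy).sum(), (wy * wy).sum()]])
    evals, evecs = np.linalg.eigh(M)
    vx, vy = evecs[:, np.argmax(evals)]
    theta0 = np.arctan2(vy, vx)

    # Edge orientations relative to theta0, weighted by edge magnitude.
    mag = np.hypot(wx, wy)
    ang = (np.arctan2(wy, wx) - theta0) % (2 * np.pi)
    hist, _ = np.histogram(ang, bins=nbins, range=(0, 2 * np.pi), weights=mag)
    return hist
```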
Feature Matching
I used two measures for matching the features:
- 1-Nearest Neighbor (1-NN): the Euclidean distance is computed between the query feature vector and every target feature vector, and the target location whose feature vector is closest to the query's is taken as the match.
- Nearest to next-nearest neighbor (1-NN/2-NN): the ratio of the distance to the best match and the distance to the second-best match is thresholded. A small ratio indicates a distinctive, and therefore reliable, match. A sketch of both measures appears after this list.
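Both measures reduce to sorting Euclidean distances between descriptor vectors. A minimal sketch, assuming descriptors are stacked as rows of NumPy arrays; the 0.8 threshold is an illustrative value, not the exact one I used.

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_features(query, target, ratio_thresh=0.8):
    """1-NN matching plus the 1-NN/2-NN ratio test.

    query: (nq, d) descriptors; target: (nt, d) descriptors, nt >= 2.
    Returns the 1-NN target index for each query feature and a boolean
    mask of matches that also pass the ratio test.
    """
    dists = cdist(query, target)               # nq x nt Euclidean distances
    order = np.argsort(dists, axis=1)
    rows = np.arange(len(query))

    best = order[:, 0]                         # 1-NN match per query
    d1 = dists[rows, best]                     # distance to best match
    d2 = dists[rows, order[:, 1]]              # distance to second-best match

    # Small best/second-best ratio = distinctive, hence reliable, match.
    passed = d1 / d2 < ratio_thresh
    return best, passed
```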
Results
Results on benchmark datasets
The table below shows the average pixel error on the benchmark datasets graf, leuven, bikes, and wall. For the graf dataset, results are given for all three feature descriptors (RGB window, orientation histogram, and Gabor wavelets). Also shown is the performance of SIFT features on these datasets. The columns correspond to the feature matching method used. The simplest feature descriptor (RGB window) was found to give the best results, so only its performance is listed for the remaining datasets. Note also that for my descriptors, 1-NN/2-NN matching gives worse results than plain 1-NN (the opposite holds for SIFT); this could be due to the sensitivity of the measure to the threshold setting.
Dataset | Descriptor | 1-NN | 1-NN/2-NN
graf | RGB window | 248.00 | 256.00
graf | Orientation histogram | 328.24 | 358.54
graf | Gabor wavelets | 370.27 | 327.57
graf | SIFT | 255.43 | 207.44
leuven | RGB window | 269.40 | 276.04
leuven | SIFT | 129.81 | 122.31
bikes | RGB window | 229.27 | 249.70
bikes | SIFT | 223.68 | 200.14
wall | RGB window | 244.56 | 260.36
wall | SIFT | 268.77 | 234.32
The high average pixel error can be attributed to the fact that the implemented features handle only simple transformations, such as translation, rotation, and minor illumination changes, and cannot be used reliably on affinely transformed images. Another factor is the low repeatability of the Harris corner measure across such affine transformations.
Results on my own images
The results on some images I took are given below:
Toy
The image on the right is related to the one on the left by a viewpoint change. Correct matches are labeled with green lines, incorrect ones with red lines. In spite
of the simple descriptor (RGB window) and matching measure (1-NN) used, we still obtain correct matches.
Balcony
The image on the right is a translated version of the one on the left. Correct matches are indicated with blue lines, incorrect ones with brown. Although not all matches are displayed here, the percentage of correct matches is much higher than in the "Toy" example above.
In the third example, the image on the right is approximately a 30-degree rotation of the image on the left. Correct matches are shown in blue, incorrect ones in yellow. Here too the percentage of correct matches was fairly high compared to the "Toy" example, but lower than for the translated version above.