Feature Detection, Description and Matching

Report by Ankit Gupta



Contents

Feature Detection
Feature Descriptors
Feature Matching
Evaluation
Extra credit work
Conclusion
References
 

Feature Detection [back to top]
                                                                                                                                         
Step 1: The Harris operator is applied to the images to generate corner points, and points with a response greater than 0.00005 are retained. The threshold was chosen so that it gives a sufficient number of interest points even for the blurriest images (like the last image in the bikes dataset). The following images show results of the operator on some images.

Graf sequence:

                                   

Yosemite sequence:
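The detection step above can be sketched as follows. This is a minimal numpy/scipy sketch under stated assumptions: harris_response and detect are hypothetical helper names, and the Sobel gradients, Gaussian window sigma, and k = 0.05 are illustrative defaults rather than the exact settings used in this project (only the 0.00005 threshold comes from the report):

```python
import numpy as np
from scipy import ndimage

def harris_response(img, sigma=1.0, k=0.05):
    """Harris corner response for a grayscale image (illustrative parameters)."""
    # Image gradients
    Ix = ndimage.sobel(img, axis=1, mode="reflect")
    Iy = ndimage.sobel(img, axis=0, mode="reflect")
    # Entries of the second-moment matrix, smoothed by a Gaussian window
    Sxx = ndimage.gaussian_filter(Ix * Ix, sigma)
    Syy = ndimage.gaussian_filter(Iy * Iy, sigma)
    Sxy = ndimage.gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace ** 2

def detect(img, thresh=5e-5):
    """Keep points whose Harris response exceeds the threshold."""
    R = harris_response(img)
    ys, xs = np.nonzero(R > thresh)
    return list(zip(ys, xs)), R
```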
                                    


Step 2: The corner points from the previous stage are filtered to retain only points whose response is a local maximum within a 5X5 patch.
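The local-maximum filtering can be sketched with a maximum filter (local_maxima is a hypothetical helper; the 5x5 window matches the patch size above):

```python
import numpy as np
from scipy import ndimage

def local_maxima(R, size=5, thresh=5e-5):
    """Keep only responses that are the maximum of their size x size window."""
    Rmax = ndimage.maximum_filter(R, size=size)
    mask = (R == Rmax) & (R > thresh)
    return np.argwhere(mask)  # array of (row, col) coordinates
```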

Step 3: Removing edge responses - Often, detected features cluster along a strong gradient such as a long running edge. We therefore estimate the ratio of principal curvatures from the Harris corner function and discard features where the ratio of the largest to the smallest curvature exceeds 12.1. This threshold was chosen based on the experiments of Lowe [1]. The following figure shows the reduction of such interest points.

4637 features before and 2938 features after edge-response filtering.
Note the removal of many features lying on image edges.


I did not observe a significant improvement from this technique, since the Harris corner operator already implicitly maximizes the smallest eigenvalue (the magnitude of change in a direction), achieving much of the same effect.
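The curvature-ratio test can be sketched as follows, using the trace/determinant form of the test from Lowe [1] (avoiding an explicit eigendecomposition). passes_edge_test is a hypothetical helper, and its inputs are assumed to be the smoothed second-moment matrix entries at the point:

```python
def passes_edge_test(Sxx, Syy, Sxy, r=12.1):
    """Reject edge-like points where the ratio of principal curvatures
    exceeds r, via Lowe's test: trace^2 / det < (r + 1)^2 / r."""
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    if det <= 0:
        return False  # curvatures of opposite sign: not a corner
    return trace * trace / det < (r + 1) ** 2 / r
```

For an isotropic corner (equal curvatures) the ratio term is 4, well under the (r+1)^2/r bound of about 14.2 for r = 12.1; a strongly elongated response fails the test.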

Step 4: Adaptive non-maximal suppression - For better matching and recognition in databases, it is good to limit the number of features in each image. I set this limit to 1200 empirically by observing performance on the provided datasets. One way to truncate features is to simply pick the first 1200 features in order of their Harris response, but this gives a poor distribution of features over the image. Instead, I implemented the adaptive non-maximal suppression scheme from MOPS [2], which chooses features based on their proximity to each other, giving a well-distributed set of features over the image. This comparison of feature distribution can be seen in the following figure.

Left: the top 1200 features by highest Harris response. Right: 1200 features chosen by adaptive non-maximal suppression.

Note that while features in the left image are concentrated in textured regions, the new suppression scheme distributes features uniformly over the image.
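The adaptive scheme can be sketched as follows. anms is a hypothetical helper, and the robustness factor c = 0.9 is an illustrative choice in the spirit of MOPS [2]: each feature's suppression radius is its distance to the nearest sufficiently stronger feature, and the n features with the largest radii are kept.

```python
import numpy as np

def anms(points, responses, n=1200, c=0.9):
    """Adaptive non-maximal suppression (MOPS-style sketch)."""
    pts = np.asarray(points, dtype=float)
    resp = np.asarray(responses, dtype=float)
    radii = np.full(len(pts), np.inf)  # global maxima keep radius = infinity
    for i in range(len(pts)):
        stronger = resp * c > resp[i]  # features sufficiently stronger than i
        if stronger.any():
            d2 = np.sum((pts[stronger] - pts[i]) ** 2, axis=1)
            radii[i] = np.sqrt(d2.min())
    order = np.argsort(-radii)  # largest suppression radius first
    return [tuple(map(int, pts[j])) for j in order[:n]]
```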

Step 5: Estimation of feature orientation - Patch-based descriptors usually handle only pure translations, since the patches are rectangular with sides aligned to the image axes. If we first estimate the feature orientation and rotate the patch accordingly before sampling its values, the descriptor becomes more robust to rotations.

This is done by sampling a 3X3 patch centered at that pixel and convolving the gradients with the Gaussian mask [ 1 2 1 | 2 4 2 | 1 2 1 ]/16, which gives an estimate of the dominant orientation in that region. The following figure shows the oriented patches.
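One reasonable reading of this step is sketched below: rather than averaging raw angles (which wraps badly at the ±180 degree boundary), the sketch smooths the gradient components with the [1 2 1 | 2 4 2 | 1 2 1]/16 mask and takes the angle of the smoothed gradient vector. dominant_orientation is a hypothetical helper:

```python
import numpy as np
from scipy import ndimage

def dominant_orientation(img, y, x):
    """Estimate the dominant gradient orientation (radians) at pixel (y, x)."""
    g = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0  # 3x3 Gaussian mask
    Ix = ndimage.sobel(img, axis=1, mode="reflect")
    Iy = ndimage.sobel(img, axis=0, mode="reflect")
    # Smooth the gradient components, then take the angle of the result
    Ix_s = ndimage.convolve(Ix, g, mode="reflect")
    Iy_s = ndimage.convolve(Iy, g, mode="reflect")
    return np.arctan2(Iy_s[y, x], Ix_s[y, x])
```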


 

Feature Descriptors  [back to top]
                                                                                                                             
Feature descriptors should ideally be sufficiently invariant to illumination changes, affine transformations, blur, etc., while remaining sufficiently discriminative. I tried various techniques to achieve this goal, with mixed results. Here I describe the idea and implementation details of each descriptor.

Color Patches in RGB space - This is the naive method: for each interest point I store a 7X7 RGB patch centered at that pixel, giving a {3X(7X7)=147} dimensional vector used for matching features. I do not orient the patches in this case, since it is meant as a trivial descriptor for baseline comparison. This method is translation invariant, but problems occur under rotation, blur, and illumination changes (or any transformation that changes the color space).

Gray Normalized patches - To counter illumination changes in the scene and other color-space modifications, I use the gray values and normalize the rotated patch by rescaling its range to [0,1]. 7X7 patches give a 49 dimensional descriptor.
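The normalization can be sketched as follows (gray_patch_descriptor is a hypothetical helper; it assumes the patch lies inside the image and that rotation to the dominant orientation has already happened upstream):

```python
import numpy as np

def gray_patch_descriptor(gray, y, x, size=7):
    """7x7 grayscale patch, contrast-normalized to the range [0, 1]."""
    h = size // 2
    patch = gray[y - h:y + h + 1, x - h:x + h + 1].astype(float)
    lo, hi = patch.min(), patch.max()
    if hi > lo:
        patch = (patch - lo) / (hi - lo)  # rescale the range to [0, 1]
    else:
        patch = np.zeros_like(patch)      # flat patch: no contrast to normalize
    return patch.ravel()                  # 49-dimensional descriptor
```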

Color Patches in YIQ space with normalized Y - I expected that using color would give additional discriminative power, and hence implemented the above version in YIQ color space. The image is first transformed from RGB to YIQ, the assumption being that illumination changes in the scene affect only the Y channel and not the chrominance values. The Y channel of the rotated patch is normalized as before, and the three channels together give a {3X(7X7)=147} dimensional descriptor.

Gray histograms - When we build vectors from sequential patch values as in the above cases, the descriptor becomes sensitive to the exact patch orientation and to minor shifts in the interest point locations, so a vector-difference approach to matching can lead to errors. Motivated by this, I created patch histograms of gray values. Histograms are attractive because they discount minor positional errors. I used an 8-bin histogram over the values of the rotated 9X9 gray patch. All entries were added with weight 1, and the histogram was then normalized to a unit-length 8 dimensional vector; the normalization counters scaling or shifts in illumination. The gray histograms turn out to be far less discriminative and much more invariant, leading to poor performance.
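The unit-normalized histogram can be sketched as follows (gray_histogram_descriptor is a hypothetical helper; it assumes patch values already lie in [0, 1]):

```python
import numpy as np

def gray_histogram_descriptor(patch, bins=8):
    """8-bin histogram of gray values in a patch, normalized to unit length."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    hist = hist.astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```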

Color histograms - Motivated by the low discriminative power of gray histograms, I tried to add discriminative power with histograms over the color values as well. I worked in YIQ space with an 8-bin histogram for each of the Y, I, and Q channels; the Y channel was normalized to counter illumination changes. The remaining details are as above, giving a 24 dimensional vector. These features still turn out to be insufficiently discriminative and overly invariant, leading to poor performance.

Gradient orientation histograms - SIFT feature descriptors are motivated by the observation that changes to image illumination and color space do not affect the gradients. Also, blurring affects the gradient magnitudes, but the directions remain similar. Further, histograms make the descriptor invariant to small positional changes. Motivated by this, I implemented gradient orientation histogram descriptors with a few different variations.


Figure showing gradient orientation histogram descriptors (courtesy David Lowe[1])

As shown in the above figure (courtesy David Lowe [1]), for each oriented interest-point patch we consider a 2X2 set of subpatches, each of size 4X4. For each subpatch we compute an 8 bin histogram of gradient orientations, with each entry weighted by the gradient magnitude at that point. Each histogram is then rotated to bring the most dominant gradient direction to the first index, which helps make the descriptor rotation invariant.

Hence for the above figure we get an {8X(2X2)=32} dimensional descriptor. Each descriptor is normalized to make the L2 norm a sensible distance metric between features. I also tried variations with 16 subpatches instead of 4, and with a single histogram over the whole patch instead of subpatches. The performance is mixed across image sets. The rationale for subpatches is to make the descriptor more discriminative than a single large histogram.
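The 2X2-subpatch variant can be sketched as follows. This is a simplified sketch: it omits the step that rotates each histogram to put the dominant bin first, and grad_orientation_descriptor is a hypothetical name. It assumes an already-oriented 8x8 patch:

```python
import numpy as np

def grad_orientation_descriptor(patch, n_sub=2, n_bins=8):
    """Magnitude-weighted gradient-orientation histograms over subpatches,
    concatenated and L2-normalized (2x2 subpatches of an 8x8 patch)."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)   # orientations in [0, 2*pi)
    bin_idx = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    side = patch.shape[0] // n_sub                # subpatch side length
    desc = []
    for i in range(n_sub):
        for j in range(n_sub):
            sl = (slice(i * side, (i + 1) * side),
                  slice(j * side, (j + 1) * side))
            hist = np.bincount(bin_idx[sl].ravel(),
                               weights=mag[sl].ravel(), minlength=n_bins)
            desc.append(hist)
    desc = np.concatenate(desc)                   # 8 x (2 x 2) = 32 dims
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc
```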
 

Feature Matching [back to top]
                    
Nearest neighbor match - This is the trivial matching approach, where the best match is decided by the minimum Euclidean distance.

Ratio test - To make the matching more discriminative, we can use as the metric the ratio of the distance from the given feature to its nearest match over the distance to its second nearest match.

Bidirectional ratio test - To make the ratio test even more discriminative, I tried a bidirectional approach. Given two feature sets, I compute the best match for the features in both directions using the simple ratio test, then consider (f1, f1') a match only if f1 matches to f1' and f1' matches back to f1. Note that the regular ratio test is only mono-directional. This approach reduced the number of false matches but did not give a statistical performance boost to the system; visually, though, the matched features are much better, as expected.
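Both matching schemes can be sketched as follows. ratio_match and bidirectional_matches are hypothetical helpers, the 0.8 ratio threshold is an illustrative default, and a brute-force distance search stands in for the kd-tree used in practice:

```python
import numpy as np

def ratio_match(desc1, desc2, thresh=0.8):
    """Ratio test: accept a match when the nearest-neighbour distance is
    sufficiently smaller than the second-nearest distance."""
    matches = {}
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[second] > 0 and dists[best] / dists[second] < thresh:
            matches[i] = int(best)
    return matches

def bidirectional_matches(desc1, desc2, thresh=0.8):
    """Keep (i, j) only if i -> j under the ratio test and j -> i as well."""
    fwd = ratio_match(desc1, desc2, thresh)
    bwd = ratio_match(desc2, desc1, thresh)
    return [(i, j) for i, j in fwd.items() if bwd.get(j) == i]
```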

I used the kd-tree based ANN library [3] to speed up queries for the ratio test.
 

Evaluation [back to top]
                                                                                                                                               
Descriptor convention -

A - RGB patches
B - Rotated illumination normalized gray scale patches
C - Rotated illumination normalized color patches
D - Illumination normalized gray value histograms for rotated patches
E - Illumination normalized color value histograms for rotated patches
F - Gradient orientation histograms (2X2 subpatch)
G - Gradient orientation histograms (4X4 subpatch)

Generally, the gradient orientation histograms work better than the other descriptors, though in some cases a descriptor gives better results with a different matching algorithm (say, the bidirectional ratio test in place of the regular ratio test). Overall, the regular ratio test gives the best results. The following table gives the AUC values on the four benchmarks for the ratio test with the various descriptors.

AUC for benchmark datasets (regular ratio test)

     Graf    Leuven  Bikes   Wall
A    0.536   0.523   0.559   0.563
B    0.562   0.609   0.547   0.579
C    0.598   0.611   0.525   0.583
D    0.507   0.543   0.510   0.482
E    0.520   0.481   0.571   0.572
F    0.583   0.563   0.590   0.564
G    0.587   0.649   0.622   0.562

Best AUC values obtained over different descriptors and matching algorithm

         AUC      Descriptor                                                Matching algorithm
Graf     0.676    Rotated illumination normalized gray scale patches (B)    Bidirectional ratio test
Leuven   0.649    Gradient orientation histograms (4X4 subpatch) (G)        Regular ratio test
Bikes    0.7442   Gradient orientation histograms (2X2 subpatch) (F)        Nearest neighbor
Wall     0.634    Gradient orientation histograms (2X2 subpatch) (F)        Nearest neighbor


Caveat - There is a small bug in the originally provided code that can yield a very high AUC value even when the descriptor is bad. This happens when there are very few actual positives (as is the case with a bad descriptor): the AUC is then governed by very few points, and the linear interpolation between them gives large areas. Also, where the ROC curve points are created, we should check whether the denominators are zero and, if so, explicitly set the quotient to zero instead of the default division-by-zero value. The above AUCs were generated with the corrected code.

Descriptor comparison

     Translation  Rotation  Blur  Illumination  Color space change  Positional inaccuracy
A    yes          no        no    no            no                  no
B    yes          yes       no    yes           yes                 no
C    yes          yes       no    yes           no                  no
D    yes          yes       no    yes           yes                 yes
E    yes          yes       no    yes           no                  yes
F    yes          yes       yes   yes           yes                 yes
G    yes          yes       yes   yes           yes                 yes

(Rows use the descriptor letters A-G from the convention above.)

A "yes" above does not mean the descriptor is 100% invariant to that transformation; it means the motivation to counter that transformation is built into the descriptor structure and that it performs reasonably well. In general, the answers may show anomalies on particular datasets under certain circumstances, but the comparison is based on the descriptor structure.

Required ROC Curves

These curves have been generated for the best performing descriptor among the set for that sequence.

Graf sequence (left) and Yosemite sequence (right)
 
More ROC Curves

A - RGB patches
B - Rotated illumination normalized gray scale patches
C - Rotated illumination normalized color patches
D - Illumination normalized gray value histograms for rotated patches
E - Illumination normalized color value histograms for rotated patches
F - Gradient orientation histograms (2X2 subpatch)
G - Gradient orientation histograms (4X4 subpatch)

ROC curves for the Graf sequence (left) and the Yosemite sequence (right)
 
Sample Matches



The above two images present a reasonably hard case, with blurring, translation, color, and orientation differences. I selected the clusters of features in the left image (origins of the yellow lines) and tried to match them. The yellow lines show the corresponding regions, and we do get some valid matches. Here the threshold for the ratio test was 0.95, and the gradient orientation (4X4 subpatch) descriptor was used. There are also some false matches due to the relatively high ratio-test threshold. The gradient orientation descriptors perform better since they sufficiently counter the differences present (as noted in the tabulated comparison of features above).

 

Extra credit work [back to top]
                                                                                                                             
1. Removing edge responses (explained earlier)
2. Adaptive non maximal suppression (explained earlier)
3. Use of kd-trees for query - I used the ANN library provided at [3].
4. Bidirectional matching function (explained earlier)


 
Conclusion [back to top]

Harris corner interest point detection is good and gives satisfactory results, but it would be nice to use a scale-invariant detector like that in SIFT [1], which detects better features than just corners. The main thing that emerges from the experiments is the delicate balance between a descriptor's invariance and its discriminative power. Different descriptors give the best performance on different sets; of the current set, the gradient orientation histogram is structurally superior and performs better in general. Another important factor is the number of features we need to extract from the image. This involves a tradeoff, and the adaptive non-maximal suppression scheme helps detect a feature set that is well distributed over the entire image. In the matching process, the ratio test definitely performs better than the nearest-neighbor SSD approach. The bidirectional ratio test does improve results but also reduces the number of matches.

The final choice of the factors discussed above usually depends on the application we want to develop. For example, using colored rotated patches should suffice for creating panoramas but not for object recognition in general environments. I observed that the types of features implemented here did not work well for recognizing objects.

The state of the art, SIFT, is a nicely engineered system that gives good results for a variety of feature detection and matching applications.


References

[1] David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
[2] M. Brown, R. Szeliski, and S. Winder, "Multi-image matching using multi-scale oriented patches," IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), volume I, pages 510-517, San Diego, CA, June 2005.
[3] David M. Mount and Sunil Arya, "ANN: A Library for Approximate Nearest Neighbor Searching," http://www.cs.umd.edu/~mount/ANN/