Contents
Feature Detection
Feature Descriptors
Feature Matching
Evaluation
Extra credit work
Conclusion
References
Step 1: The Harris operator is
applied to the images to generate corner points, and points with a
response greater than 0.00005 are retained. The threshold was chosen so
that it gives a sufficient number of descriptors even for the blurriest
of the images (like the last image in the bikes dataset). The
following images show the results of the operator on some images.
Graf sequence:
Yosemite sequence:
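The detection step can be sketched as follows. This is a minimal NumPy version; the box-filter window, central-difference gradients, and the Harris constant k = 0.05 are my assumptions for illustration, not details taken from the report:

```python
import numpy as np

def harris_response(img, k=0.05):
    """Harris corner response R = det(M) - k*trace(M)^2 at each pixel.

    img: 2-D float array in [0, 1]. The window weighting here is a plain
    3x3 box filter for brevity; a Gaussian window is the more common choice.
    """
    # Image gradients via central differences (axis 0 = rows/y, axis 1 = cols/x).
    Iy, Ix = np.gradient(img)

    def box3(a):
        # 3x3 box filter implemented by summing shifted copies of a padded array.
        p = np.pad(a, 1, mode="edge")
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0

    # Smoothed products of gradients: entries of the second-moment matrix M.
    Sxx, Syy, Sxy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2

def detect_corners(img, thresh=5e-5):
    """Keep only points whose response exceeds the report's 0.00005 threshold."""
    R = harris_response(img)
    return np.argwhere(R > thresh), R
```

On a synthetic white square, the strongest responses appear at the square's corners while flat regions stay near zero, which is the behavior the threshold exploits.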
Step 2: The corner points from
the previous stage are filtered to keep only the points whose
response function is a local maximum. A patch size of 5X5 was used
for this.
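A minimal sketch of this 5X5 local-maximum filter; the shift-and-compare implementation is my own, since the report does not specify one:

```python
import numpy as np

def local_maxima(R, size=5):
    """Boolean mask of points whose response is the maximum of a
    size x size neighbourhood (5x5 here, as in the report)."""
    r = size // 2
    # Pad with -inf so border pixels compare only against real neighbours.
    P = np.pad(R, r, mode="constant", constant_values=-np.inf)
    # Neighbourhood maximum computed by stacking shifted views of the pad.
    nbr_max = np.max(np.stack([P[i:i + R.shape[0], j:j + R.shape[1]]
                               for i in range(size) for j in range(size)]),
                     axis=0)
    return R >= nbr_max
```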
Step 3:
Removing edge responses
- Often, the detected features cluster along a strong
gradient such as a long running edge. We therefore estimate the ratio of the radii
of curvature from the Harris corner function and filter out features
where the ratio of the largest to smallest radius is greater than 12.1.
This threshold value was chosen based on the experiments conducted by
Lowe [1]. The following figure shows the reduction of such
interest points.
Left: 4637 features before filtering. Right: 2938 features after edge response filtering. Note the filtering of many features lying on image edges.
I did not observe a significant improvement from this
technique, since the Harris corner operator implicitly maximizes the
smallest eigenvalue (the magnitude of change in a direction) and thus achieves
this to some extent.
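The curvature-ratio test can be sketched using the trace/determinant identity from Lowe [1], which avoids computing the eigenvalues explicitly. The function name and the second-moment-matrix inputs here are illustrative assumptions:

```python
def keep_corner(Sxx, Syy, Sxy, r_max=12.1):
    """Edge-response test in the style of Lowe [1]: reject points where
    the ratio of principal curvatures exceeds r_max (12.1 in the report).

    Uses the identity trace(M)^2 / det(M) = (r + 1)^2 / r for eigenvalue
    ratio r, so a point passes iff trace^2/det < (r_max + 1)^2 / r_max.
    Sxx, Syy, Sxy: entries of the 2x2 second-moment matrix at the point.
    """
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    if det <= 0:          # curvatures of opposite sign: not a corner
        return False
    return trace ** 2 / det < (r_max + 1) ** 2 / r_max
```

An elongated edge (e.g. Sxx much larger than Syy) fails the test, while a roughly isotropic corner passes.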
Step 4:
Adaptive non-maximal suppression
- For better matching and recognition in databases, it is good to limit
the number of features in each image. I set this limit to 1200
empirically by observing performance on the provided datasets. One way to
truncate features is to simply pick the first 1200 features in order of their
Harris function value. But this gives a bad distribution of the
features over the image. Instead, I implemented the adaptive non-maximal
suppression of MOPS [2], which chooses features based on their proximity
to each other, giving a well-distributed set of features over the image. This
comparison of feature distribution can be seen in the following figure.
Left: the top 1200 features by highest response to the Harris operator. Right: 1200 features chosen by adaptive non-maximal suppression.
Note that while the features in the left image are concentrated in textured
regions, the suppression scheme distributes them uniformly over the image.
Step 5: Estimation of feature orientation - Usually,
patch-based descriptors can only cover pure translations, since
the patches are rectangular with sides aligned to the image
axes. If we first estimate the feature orientation and
then rotate the patch before sampling its values, the descriptor
becomes more robust to rotations.
This is done by sampling a 3X3 patch centered at the pixel
and convolving the gradient orientations with the Gaussian mask [ 1 2 1 |
2 4 2 | 1 2 1 ]/16. This gives an estimate of the dominant
orientation in that region. The following figure shows the oriented
patches.
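One way to implement this estimate is sketched below, with one deliberate deviation flagged in the comments: instead of smoothing the raw orientation angles (which wrap badly at +-pi), this sketch applies the same 3X3 Gaussian mask to the gradient vectors and takes the angle of their sum:

```python
import numpy as np

# The 3x3 Gaussian mask from the text: [1 2 1 | 2 4 2 | 1 2 1] / 16.
GAUSS_3X3 = np.array([[1, 2, 1],
                      [2, 4, 2],
                      [1, 2, 1]]) / 16.0

def dominant_orientation(Ix3, Iy3):
    """Estimate a feature's dominant orientation from the 3x3 gradient
    patches Ix3, Iy3 centered on the interest point.

    Variant note: rather than convolving the angle values themselves,
    this weights the gradient *vectors* by the Gaussian mask and takes
    the angle of the weighted sum, which avoids wrap-around artifacts.
    """
    gx = (GAUSS_3X3 * Ix3).sum()
    gy = (GAUSS_3X3 * Iy3).sum()
    return np.arctan2(gy, gx)   # radians, in (-pi, pi]
```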
Feature descriptors should ideally be
sufficiently invariant to illumination changes, affine transformations,
blur etc. At the same time they should be sufficiently discriminative.
I tried various techniques to achieve this goal, with mixed results.
Here I describe the idea and implementation details of the various
descriptors.
Color Patches in RGB space - This is the naive method where for each interest point I store a 7X7
RGB patch centered at that pixel. This {3X(7X7)=147} dimensional vector is then
used for matching features. I do not orient the patches first in this
case, since it is meant as a trivial descriptor for baseline comparisons.
This method is translation invariant, but problems occur
under rotation, blur and illumination changes (indeed, under any transformation
that changes the color space).
Gray Normalized patches - To counter illumination changes in the scene and other color
space modifications, I take the gray values and normalize the
rotated patch by rescaling its range to [0,1]. The 7X7 patches give a 49
dimensional descriptor.
Color Patches in YIQ space with normalized Y - I expected that using color would give additional discriminative
power, and hence implemented the above version in YIQ color space. The
image is first transformed from RGB into YIQ space, the assumption
being that illumination changes in the scene affect only the Y
channel and not the chrominance values. So the Y channel of the
rotated patch is normalized as before, and the three channels together
give a {3X(7X7)=147} dimensional descriptor.
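A sketch of this descriptor using the standard NTSC RGB-to-YIQ matrix; the function name and the min-max normalization of Y are my assumptions about the report's implementation:

```python
import numpy as np

# NTSC RGB -> YIQ transform; Y is luminance, I and Q carry chrominance.
RGB2YIQ = np.array([[0.299,  0.587,  0.114],
                    [0.596, -0.274, -0.322],
                    [0.211, -0.523,  0.312]])

def yiq_descriptor(patch_rgb):
    """Build the 147-dim YIQ patch descriptor described in the text:
    convert a 7x7 RGB patch to YIQ, range-normalize only the Y channel
    (assumed to absorb illumination changes), then flatten."""
    yiq = patch_rgb @ RGB2YIQ.T            # (7, 7, 3)
    y = yiq[..., 0]
    span = y.max() - y.min()
    if span > 0:                           # guard against a constant patch
        yiq[..., 0] = (y - y.min()) / span
    return yiq.reshape(-1)                 # 7*7*3 = 147 values
```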
Gray histograms - When we create vectors from sequential patch values as in the
cases above, the descriptor becomes sensitive to the exact
patch orientation and also to minor shifts in the interest point detector.
A vector-difference based matching approach can then lead to
errors. Motivated by this, I decided to create patch histograms of gray
values. Histograms are good because they discount minor
positional errors in the process. I used an 8-bin histogram over the values
of the rotated 9X9 gray patch. All histogram entries were added with a
weight of 1, and in the end the histogram was normalized to
give a unit-length 8-dimensional vector. The normalization step was
required to counter any scaling or shifts in illumination. The gray histograms turn out to be far
less discriminative and much more invariant, leading to bad
performance.
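The histogram construction described above can be sketched as follows (the function name is assumed; patch values are taken to lie in [0, 1]):

```python
import numpy as np

def gray_histogram_descriptor(patch, bins=8):
    """8-bin histogram of a (rotated) gray patch, L2-normalized to unit
    length as described in the text. Values are assumed to be in [0, 1]."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    hist = hist.astype(float)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist
```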
Color histograms - Motivated by the low discriminative power of gray histograms, I
tried to make the descriptor more discriminative by adding histograms of the color
values as well. I worked in YIQ space with an 8-bin histogram for each
of the Y, I, Q channels. The Y channel was normalized to counter
illumination changes. The remaining details are the same as above, giving
a 24-dimensional vector. These features still turn out to be too
invariant and not discriminative enough, leading to bad performance.
Gradient orientation histograms - SIFT feature descriptors build on the observation that
changes to image illumination and color space do not affect the
gradients. Also, blurring affects the gradient
magnitudes, but the directions stay similar. Further, to gain some
invariance to small positional changes, histograms are
a better idea. Motivated by this, I implemented a gradient
orientation histogram descriptor in a few slightly different variants.

Figure showing gradient orientation histogram descriptors (courtesy David Lowe [1])
As shown in the above figure (courtesy David Lowe
[1]), for each oriented interest point patch we consider a set of
2X2 subpatches, each of size 4X4. For each subpatch
we compute an 8-bin histogram of gradient orientations, with each
histogram entry weighted by the gradient magnitude at that point.
Each histogram is then rotated to bring the most dominant gradient
direction to the first index. This helps make the descriptors
rotation invariant.
Hence for the above figure, we get an 8X(2X2) = 32 dimensional descriptor.
Each descriptor is normalized to make the L2 norm a feasible distance
metric between features. I also tried variations of this descriptor
using 16 subpatches instead of 4, and using just
one single patch instead of dividing into subpatches for the histograms.
The performance is mixed across the different image sets. The rationale for using
subpatches is to make the descriptor more discriminative than a single
big histogram.
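The 2X2-subpatch variant can be sketched as follows; the binning and histogram-rolling details are my reading of the description above, not the report's exact code:

```python
import numpy as np

def grad_orientation_descriptor(Ix, Iy, n_sub=2, bins=8):
    """Split an oriented patch's gradients into n_sub x n_sub subpatches,
    build an 8-bin magnitude-weighted orientation histogram per subpatch,
    rotate each histogram so its strongest bin comes first, and
    L2-normalize the concatenation.

    Ix, Iy: gradient images of the patch (square, side divisible by n_sub).
    Returns a bins * n_sub^2 dimensional vector (32 for the 2x2 case).
    """
    mag = np.hypot(Ix, Iy)
    ang = np.arctan2(Iy, Ix)                         # angles in (-pi, pi]
    bin_idx = ((ang + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    s = Ix.shape[0] // n_sub
    hists = []
    for bi in range(n_sub):
        for bj in range(n_sub):
            sl = (slice(bi * s, (bi + 1) * s), slice(bj * s, (bj + 1) * s))
            h = np.bincount(bin_idx[sl].ravel(),
                            weights=mag[sl].ravel(), minlength=bins)
            h = np.roll(h, -int(np.argmax(h)))       # dominant bin first
            hists.append(h)
    d = np.concatenate(hists)
    n = np.linalg.norm(d)
    return d / n if n > 0 else d
```

Rolling each histogram so the dominant bin leads makes two patches that differ only by a rotation produce (approximately) the same vector, which is the rotation-invariance argument made above.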
Nearest neighbor match - This is the trivial matching approach where the best match is decided by the minimum Euclidean distance.
Ratio test - To make the
matching more discriminative, we can use as metric the ratio of the
distance from the given feature to its nearest match over the distance
to its second nearest match.
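A brute-force sketch of this test; the 0.8 threshold here is a common default rather than the report's value, and the report used a kd-tree via the ANN library [3] instead of the linear scan shown:

```python
import numpy as np

def ratio_test_match(desc1, desc2, thresh=0.8):
    """For each descriptor in desc1, find its two nearest neighbours in
    desc2 (Euclidean) and accept the best one only when the ratio
    nearest / second-nearest is below thresh.
    Returns a list of (index_in_desc1, index_in_desc2) pairs."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]               # two closest candidates
        if dists[j2] > 0 and dists[j1] / dists[j2] < thresh:
            matches.append((i, int(j1)))
    return matches
```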
Bidirectional ratio test
- To make the ratio test even more discriminative, I tried a
bidirectional approach. Given two feature sets, I compute the best
match for the features in both sets using the simple ratio
test. Then I consider (f1, f1') a match only if f1 matched to f1' and
f1' matched back to f1. Note that the regular ratio test is only
mono-directional. This approach reduced the number of false matches but did
not give a statistical performance boost to the system; visually, though, the matched features are much better, as expected.
I used the kd-tree based ANN library provided at [3] to increase the query speed for the ratio test.
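The bidirectional scheme can be sketched as below. The block is self-contained, so it repeats a one-directional ratio matcher; the names and the 0.8 threshold are illustrative assumptions:

```python
import numpy as np

def ratio_match(A, B, thresh=0.8):
    """One-directional ratio-test match: for each row of A, the best
    neighbour in B, accepted when nearest / second-nearest < thresh.
    Returns a dict {index_in_A: index_in_B}."""
    out = {}
    for i, a in enumerate(A):
        d = np.linalg.norm(B - a, axis=1)
        j1, j2 = np.argsort(d)[:2]
        if d[j2] > 0 and d[j1] / d[j2] < thresh:
            out[i] = int(j1)
    return out

def bidirectional_matches(A, B, thresh=0.8):
    """Keep (i, j) only when i -> j in the forward pass AND j -> i in
    the backward pass, as in the bidirectional scheme described above."""
    fwd = ratio_match(A, B, thresh)
    bwd = ratio_match(B, A, thresh)
    return [(i, j) for i, j in fwd.items() if bwd.get(j) == i]
```

A feature in B whose best candidate in A prefers someone else is discarded, which is exactly how the false matches get pruned.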
Descriptor convention -
A - RGB patches
B - Rotated illumination normalized gray scale patches
C - Rotated illumination normalized color patches
D - Illumination normalized gray value histograms for rotated patches
E - Illumination normalized color value histograms for rotated patches
F - Gradient orientation histograms (2X2 subpatch)
G - Gradient orientation histograms (4X4 subpatch)
Generally, the gradient orientation histograms work better than the
other descriptors, though there are cases where some descriptor gives
better results with a different matching algorithm (say, the bidirectional
ratio test in place of the regular ratio test). Overall, the regular ratio
test gives the best results. The following table gives the AUC
values on the four benchmarks for the ratio test with the various
descriptors.
AUC for benchmark datasets (regular ratio test)

| Descriptor | Graf  | Leuven | Bikes | Wall  |
| A          | 0.536 | 0.523  | 0.559 | 0.563 |
| B          | 0.562 | 0.609  | 0.547 | 0.579 |
| C          | 0.598 | 0.611  | 0.525 | 0.583 |
| D          | 0.507 | 0.543  | 0.510 | 0.482 |
| E          | 0.520 | 0.481  | 0.571 | 0.572 |
| F          | 0.583 | 0.563  | 0.590 | 0.564 |
| G          | 0.587 | 0.649  | 0.622 | 0.562 |
Best AUC values obtained over different descriptors and matching algorithms

| Dataset | AUC    | Descriptor                                             | Matching algorithm       |
| Graf    | 0.676  | Rotated illumination normalized gray scale patches (B) | Bidirectional ratio test |
| Leuven  | 0.649  | Gradient orientation histograms (4X4 subpatch) (G)     | Regular ratio test       |
| Bikes   | 0.7442 | Gradient orientation histograms (2X2 subpatch) (F)     | Nearest neighbor         |
| Wall    | 0.634  | Gradient orientation histograms (2X2 subpatch) (F)     | Nearest neighbor         |
Caveat - There
is a small bug in the originally provided code because of which one can get a very high AUC
value even if the descriptor is bad. This happens when there are
very few actual positives (which will be the case if
the descriptor is bad): the AUC is then governed by very
few points, and the linear approximation yields high areas. Also,
where the ROC curve points are created, we should check whether the
denominators of the divisions are zero and, if so, explicitly set
the quotient to zero instead of the default "division by zero" value. The
above AUCs have been generated with the corrected code.
Descriptor comparison

| Descriptor                                                         | Translation | Rotation | Blur | Illumination | Color space change | Positional inaccuracy |
| RGB patches                                                        | yes         | no       | no   | no           | no                 | no                    |
| Rotated illumination normalized gray scale patches                 | yes         | yes      | no   | yes          | yes                | no                    |
| Rotated illumination normalized color patches                      | yes         | yes      | no   | yes          | no                 | no                    |
| Illumination normalized gray value histograms for rotated patches  | yes         | yes      | no   | yes          | yes                | yes                   |
| Illumination normalized color value histograms for rotated patches | yes         | yes      | no   | yes          | no                 | yes                   |
| Gradient orientation histograms (2X2 subpatch)                     | yes         | yes      | yes  | yes          | yes                | yes                   |
| Gradient orientation histograms (4X4 subpatch)                     | yes         | yes      | yes  | yes          | yes                | yes                   |
A "yes" above does not indicate that the descriptor is 100%
invariant to that transformation. It just means that the motivation to
counter that transformation is built into the descriptor's structure and
it performs reasonably well. In general, the answers above may show
anomalies on particular datasets under certain
circumstances, but the comparison is based on the descriptor structure.
Required ROC Curves
These curves have been generated for the best performing descriptor among the set for that sequence.
Left: Graf sequence. Right: Yosemite sequence.
More ROC Curves
A - RGB patches
B - Rotated illumination normalized gray scale patches
C - Rotated illumination normalized color patches
D - Illumination normalized gray value histograms for rotated patches
E - Illumination normalized color value histograms for rotated patches
F - Gradient orientation histograms (2X2 subpatch)
G - Gradient orientation histograms (4X4 subpatch)
Left: ROC curves for the Graf sequence. Right: ROC curves for the Yosemite sequence.
Sample Matches
The above two images present a reasonably hard case, with blurring,
translation, color and orientation
differences. I selected clusters of features in the left image
(the origins of the yellow lines) and tried to match them. The yellow lines
show the corresponding regions, and we do
get some valid matches. Here the threshold for the ratio test was
0.95 and the gradient orientation (4X4 subpatch) descriptors were used.
There are also some false matches, due to the relatively high threshold
for the ratio test. The gradient orientation descriptors perform better
since they are sufficiently able to counter the differences present (as
noted in the tabulated comparison of features above).
1. Removing edge responses (explained earlier)
2. Adaptive non maximal suppression (explained earlier)
3. Use of kd-trees for query - I used the ANN library provided at [3].
4. Bidirectional matching function (explained earlier)
Harris corner interest point detection is good and gives satisfactory
results, but it would be nice to use a scale-invariant detector like
that in SIFT [1], which detects better features than plain corners.
The main thing that comes out of the experiments is the
delicate balance between a descriptor's invariance and its discriminative
power. Different descriptors give the best performance on different
sets. Of the current set, the gradient orientation histogram
is structurally superior and performs best in general. Another
important factor is the number of features that we
need to extract from the image. This involves a tradeoff, and the adaptive
non-maximal suppression scheme helps detect a feature set that is
well distributed over the entire image. In the matching process, the ratio
test definitely performs better than the nearest (SSD) neighbor approach.
The bidirectional ratio test does improve results but also reduces the
number of matches.
The final choice among the factors discussed above usually depends on the
application we want to develop. For example, rotated color
patches should suffice for creating panoramas but not for object
recognition in general environments. I observed that the types of
features implemented here did not work well for recognizing objects.
The state of the art, SIFT, is a nicely engineered system which gives good
enough results for various feature detection and matching applications.
References
[1] David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60(2), pp. 91-110, 2004.
[2] M. Brown, R. Szeliski, and S. Winder, "Multi-image matching using multi-scale oriented patches," IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), volume I, pp. 510-517, San Diego, CA, June 2005.
[3] David M. Mount and Sunil Arya, "ANN: A Library for Approximate Nearest Neighbor Searching," http://www.cs.umd.edu/~mount/ANN/