CSE 576
Computer Vision
Project 1:
Feature Detection and Matching
Mengyuan Liu
liumyzoe@uw.edu
April 18, 2013
Introduction
In this project, I developed a set of algorithms for feature detection, feature description, and feature matching, to identify discriminative features in an image and search for matching features in other images. In particular, my implementation was designed to detect features that are reasonably invariant to translation, rotation, illumination, and some scaling.
Methods
Feature Detector
For the feature detector, I tried two implementations of the Harris operator, as described in class and in the textbook.
For the simple implementation (sketched in code after this list):
• Smooth the image with a Gaussian kernel (sigma = 1).
• Compute the gradient at each pixel with the kernel [-1 0 1] in the x direction and its transpose [-1 0 1]^T in the y direction.
• For each pixel in the given image, compute the Harris matrix H as a weighted sum over all pixels in a 5×5 detector window.
• Compute the corner strength function c(H) = det(H)/tr(H) for each pixel.
• Select a pixel as a feature of interest if its corner strength is a local maximum in its 3×3 neighborhood and above a certain threshold.
• Pass the features of interest on to the descriptor stage.
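A minimal sketch of this pipeline in Python with NumPy/SciPy follows. The project itself was built on the course skeleton code, so the language, the helper names, and the relative threshold here are illustrative assumptions rather than the exact implementation.

```python
import numpy as np
from scipy import ndimage

def harris_simple(image, rel_threshold=0.01):
    """Simple Harris detector; `image` is a float grayscale array.
    Returns (row, col) feature locations and the corner-strength map."""
    # 1. Smooth with a Gaussian kernel (sigma = 1).
    smoothed = ndimage.gaussian_filter(image, sigma=1.0)

    # 2. Gradients with [-1 0 1] in x and its transpose in y.
    kernel = np.array([-1.0, 0.0, 1.0])
    ix = ndimage.convolve1d(smoothed, kernel, axis=1)
    iy = ndimage.convolve1d(smoothed, kernel, axis=0)

    # 3. Entries of the Harris matrix H, summed over a 5x5 window
    #    around each pixel (uniform weights in this sketch).
    window = np.ones((5, 5))
    sxx = ndimage.convolve(ix * ix, window)
    syy = ndimage.convolve(iy * iy, window)
    sxy = ndimage.convolve(ix * iy, window)

    # 4. Corner strength c(H) = det(H) / tr(H).
    strength = (sxx * syy - sxy * sxy) / (sxx + syy + 1e-10)

    # 5. Keep 3x3 local maxima above a threshold (relative to the
    #    strongest response; the report does not state its threshold).
    local_max = ndimage.maximum_filter(strength, size=3)
    keep = (strength == local_max) & (strength > rel_threshold * strength.max())
    return np.argwhere(keep), strength
```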
For the more detailed implementation (from textbook page 214; sketched in code after this list):
• Compute the horizontal and vertical derivatives of the image by convolving it with derivatives of Gaussians, i.e., the 3×3 Sobel kernels. (This is equivalent to first smoothing the image with a 3×3 Gaussian kernel and then computing the gradient.)
• Compute the Ix^2, Iy^2, and IxIy images corresponding to the outer products of these gradients. These three images are the entries of the Harris matrix H at every pixel.
• Convolve each of these images with a 7×7 Gaussian. This is equivalent to computing the Gaussian-weighted sums of Ix^2, Iy^2, and IxIy within a 7×7 window.
• Compute the corner strength function c(H) = det(H)/tr(H) for each pixel.
• Select a pixel as a feature of interest if its corner strength is a local maximum in its 3×3 neighborhood and above a certain threshold.
• Pass the features of interest on to the descriptor stage.
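Under the same assumptions as the previous sketch, the textbook variant differs only in how the derivatives and the window weighting are computed:

```python
import numpy as np
from scipy import ndimage

def harris_textbook(image, rel_threshold=0.01):
    # 1. Derivatives of Gaussians via the 3x3 Sobel kernels.
    ix = ndimage.sobel(image, axis=1)
    iy = ndimage.sobel(image, axis=0)

    # 2-3. Outer products of the gradients, each convolved with a
    # Gaussian whose support is 7x7 (sigma = 1.5 with truncate = 2.0
    # gives a 7-tap kernel; the report does not state its sigma).
    def blur(img):
        return ndimage.gaussian_filter(img, sigma=1.5, truncate=2.0)
    sxx, syy, sxy = blur(ix * ix), blur(iy * iy), blur(ix * iy)

    # 4-5. Corner strength and non-maximum suppression as before.
    strength = (sxx * syy - sxy * sxy) / (sxx + syy + 1e-10)
    local_max = ndimage.maximum_filter(strength, size=3)
    keep = (strength == local_max) & (strength > rel_threshold * strength.max())
    return np.argwhere(keep), strength
```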
Feature Descriptor
For the feature descriptor, I tried two different descriptors.
For the simple descriptor (sketched in code after this list):
• A 5×5 window of pixel intensities around the given feature was used as the feature descriptor.
• The window was then normalized to account for illumination variation.
• This descriptor should achieve illumination and translation invariance.
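A sketch of the simple descriptor under the same assumptions; features within two pixels of the image border are ignored here for brevity:

```python
import numpy as np

def simple_descriptor(image, row, col):
    """Normalized 25-element descriptor for the feature at (row, col)."""
    patch = image[row - 2:row + 3, col - 2:col + 3].astype(float)
    # Subtracting the mean and dividing by the standard deviation maps
    # affine brightness/contrast changes to the same descriptor.
    return ((patch - patch.mean()) / (patch.std() + 1e-10)).ravel()
```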
For the more advanced descriptor, which adds rotation invariance (sketched in code after this list):
• Given a detected feature, its dominant orientation was computed from the smoothed local gradient and quantized into one of 8 bins.
• A 41×41 patch around the detected feature was taken and rotated so that its dominant orientation aligned with a canonical direction; pixel values after rotation were computed by bilinear interpolation.
• The patch was normalized within the 41×41 window.
• The 41×41 patch was downsampled to a 9×9 patch.
• This 9×9 patch was used as an 81-element feature descriptor.
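A sketch of this descriptor under the same assumptions; the rotation sign convention and the use of plain subsampling (rather than block averaging) for the 41×41 to 9×9 step are assumptions on my part:

```python
import numpy as np
from scipy import ndimage

def mops_descriptor(image, row, col):
    """81-element rotation-invariant descriptor for the feature at (row, col)."""
    # Dominant orientation from the smoothed local gradient, quantized
    # into one of 8 bins (multiples of 45 degrees).
    smoothed = ndimage.gaussian_filter(image.astype(float), sigma=1.0)
    gy, gx = np.gradient(smoothed)
    theta = np.arctan2(gy[row, col], gx[row, col])
    theta = np.round(theta / (np.pi / 4)) * (np.pi / 4)

    # Sample a 41x41 patch whose axes are rotated by theta, so every
    # feature is described in its own orientation frame; map_coordinates
    # with order=1 performs the bilinear interpolation.
    dy, dx = np.mgrid[-20:21, -20:21]
    sample_r = row + dx * np.sin(theta) + dy * np.cos(theta)
    sample_c = col + dx * np.cos(theta) - dy * np.sin(theta)
    patch = ndimage.map_coordinates(image.astype(float),
                                    [sample_r, sample_c], order=1)

    # Normalize within the patch, then downsample 41x41 -> 9x9 by
    # taking every 5th pixel.
    patch = (patch - patch.mean()) / (patch.std() + 1e-10)
    return patch[::5, ::5].ravel()
```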
Feature Matching
After feature detection and description, feature matching was realized through two methods: SSD (given in the skeleton code) and the ratio SSD test (both sketched below). For ratio SSD matching, the ratio between the nearest SSD and the second-nearest SSD was used as the matching score between a feature and its candidate match in the other image. This excludes ambiguous matches, especially when images contain many repetitive patterns or similar features.
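A minimal sketch of ratio-SSD matching between two descriptor sets (one row per feature); the 0.97 threshold from the Results section is reused here as an illustrative default:

```python
import numpy as np

def match_ratio_ssd(desc1, desc2, ratio_threshold=0.97):
    """Match each descriptor in desc1 to its best candidate in desc2,
    keeping only matches that pass the ratio test."""
    matches = []
    for i, d in enumerate(desc1):
        # SSD between descriptor d and every descriptor in the other image.
        ssd = np.sum((desc2 - d) ** 2, axis=1)
        best, second = np.argsort(ssd)[:2]
        # A ratio near 1 means the best match is barely better than the
        # runner-up, i.e. ambiguous, so it is rejected.
        if ssd[best] / (ssd[second] + 1e-10) < ratio_threshold:
            matches.append((i, best))
    return matches
```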
Results
ROC Curves (left: Yosemite; right: graf)
Images of Harris Operator
Yosemite images: (left: Yosemite1; right: Yosemite2)
graf images: (left: graf1; right: graf2)
Average Error and AUC for Benchmark Images
|        | Window + SSD      | Window + ratio    | MOPS + SSD        | MOPS + ratio      |
|        | AvgError | AvgAUC | AvgError | AvgAUC | AvgError | AvgAUC | AvgError | AvgAUC |
| Bike   | 377.99   | 0.665  | 375.51   | 0.421  | 438.04   | 0.863  | 433.19   | 0.702  |
| Graf   | 301.05   | 0.445  | 297.73   | 0.402  | 300.09   | 0.500  | 298.04   | 0.432  |
| Leuven | 316.62   | 0.788  | 292.13   | 0.565  | 311.18   | 0.762  | 291.33   | 0.600  |
| Wall   | 322.46   | 0.671  | 310.35   | 0.487  | 379.17   | 0.621  | 373.57   | 0.498  |
Tests on Other Images
I also tested my algorithm on the following set of images. There are translation, illumination differences, a little scaling, and some rotation between the two images.
Feature detection and matching results are shown below. The performance with respect to illumination, translation, and rotation invariance (there is not much rotation between the two images) is satisfactory, and a small amount of scale invariance is achieved as well.
Discussion
My approach includes three components that handle rotation invariance, illumination invariance, and translation invariance. Translation invariance is achieved by comparing each feature against all features in the to-be-matched image. Illumination invariance is realized by, for each patch, subtracting the mean of the patch and dividing by its standard deviation. Rotation invariance is achieved by computing each patch's dominant orientation and rotating the patch by this angle before comparing it to other features.
Ratio matching mainly improves performance when many repetitive patterns or similar features exist in the image, and most of the benchmark images do not have these. Therefore the average AUC, which is, generally speaking, an indicator of true positives versus false positives, may not reflect a great improvement. However, the improvement can be clearly seen in the average matching error against the ground truth. The results above were generated with a ratio threshold of 0.97, which is very high; lowering this threshold would reveal a more obvious improvement in average error.