Contents
  1. Feature Detection
  2. Feature Descriptor
  3. Feature Matching
  4. Performance
  5. Discussion
  6. Extra Credit
  7. References

CSE 576 Project 1:
Feature Detection and Matching

Seong Jae Lee
seongjae at cs dot washington dot edu
April 14, 2008

Feature Detection

First, create a Gaussian image pyramid. Starting from original greyscale image
I(x,y) = P0(x,y)
. We build a filtered image
Pl'(x,y)
with a Gaussian kernel of std
σ = 1.0
, and subsample the filtered image with a subsamplign rate 2. We continue this till the subsampled image size is less than 20.
Pl'(x,y) = Pl(x,y) * gσ(x,y)
Pl+1 = P'l(2x, 2y)
Second, construct Harris matrix at each level
l
at position
(x, y)
with σ = 1.0 and σ' = 1.5.
Hl(x,y) = ∇σ' Pl(x,y) ∇σ' Pl(x,y)T * gσ(x,y)
Finally, calculate Harris operator, or the corner strength function,
fl
for each level
l
. Pick candidates with value greater than 0.0015 and local maximum in its 3x3 window. If there are more than 500 candidates, we just take 500 of them with highest value for each level.
fl(x,y) = det(Hl(x,y)) / tr(Hl(x,y))

Feature Descriptor

Find an orientation
θ
by computing the local gradient of each interesting point after applying a Gaussian filter of std
σ = 4.5
. Then, we collect 8x8 data from the subscaled image of two level above (which has 1/4 size of the current level) after rotating it by
. Of course, if the window size a selected feature is out of the image, we throw that feature out. Finally, normalize the feature data vector with a length 64 so that its mean is zero and its std is one.
[cosθ, sinθ] = u / |u|
ul(x,y) = ∇σ Pl(x,y)

Feature Matching

I'm taking the most simplest approach here: the "ratio test" with the Euclidean distance.

Performance

ROC Curves

ROC Graph of yosemite
ROC Graph of graf
Above, we have plots ROC curves for the naive implementation (A), more complicate implmentation (B) and SIFT. You can see larger pictures if you click the image. In "yosemite", it's really hard to see the difference between ROC curve of B and that of SIFT. In "graf", SIFT shows better performance than B's performance. In both cases, we can see ratio matching score is better than naive matching score. Click the images for larger view.

Average AUC on 4 Benchmark Sets

average erroraverage AUC
wall231.9180220.715908
bikes104.2249110.783666
graf199.6435600.630269
leuven101.0711900.839841
Average AUC on 4 Benchmark Sets (SSD)
average erroraverage AUC
wall231.9180220.691639
bikes104.2249110.803937
graf199.6435600.597237
leuven101.0711900.872462
Average AUC on 4 Benchmark Sets (ratio)
The table above is the benchmark table of given four test sets. We can see "bikes" and "leuven" works much better than "wall" and "graf".

Harris Operator

Harris Image of yosemite
Harris Image of graf
Harris image of "yosemite" and "graf" images. I multiplied 255 to the value to make the image visible.Click the images for larger view.

Results on Other Images


Translation (original / translated)
Feature matching on translated images shows a good performance. While we are capturing most of the features at the finest image pyramid level, (the reason some are missing is that we confined the number of features at each level to 500) we are missing many features at higher levels. I guess subsampling process is not working perfectly.


Rotation (original / 15 deg / 90 deg)
Feature matching on rotated images is almost perfect - even for 15 degree rotation.


Brightness and Contrast (original / brightened / high contrasted)
Feature matchings on brightened image and contrasted image are almost perfect. Since we are normalizing our features, it's obvious.


Scale (original / 1.5x / 2.0x)
Feature matching on scaled images shows horrible performance. Selected features are different, and even though there are matching features, there are cases that my algorithm cannot match them.

Discussion

My implementation was not as good as SIFT, but it's showing a decent performance. Especially in "yosemite" experiment, it's as good as SIFT descriptors. The last experiment shows that my implementation shows a good performance on rotation, translation, contrast change and brightness change, but it's not working well on scaling (even for 2x case). If I had more time, I want to refine my scaling filter.

Extra Credit

All the refined implementations are from MOSP paper.
  1. Bulit a Gaussian image pyramid and searched features from each level (multiscale).
  2. Normalized the feature array so that brightness and contrast will not matter.
  3. Used 8x8 feature descriptor after scaling/rotating interesting points.

References