CSE 576 Project 1:
Feature Detection and Matching
Seong Jae Lee
seongjae at cs dot washington dot edu
April 14, 2008
Feature Detection
First, create a Gaussian image pyramid. Starting from the original greyscale image I(x,y) = P0(x,y), we build a filtered image Pl'(x,y) with a Gaussian kernel of std σ = 1.0, and subsample the filtered image with a subsampling rate of 2. We continue this until the subsampled image size is less than 20.
Pl'(x,y) = Pl(x,y) * gσ(x,y)
Pl+1(x,y) = Pl'(2x, 2y)
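The pyramid construction above can be sketched as follows. This is a minimal sketch, not the report's actual code; the function name and the choice of `scipy.ndimage.gaussian_filter` are my assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_pyramid(image, sigma=1.0, min_size=20):
    """Gaussian pyramid: blur with std sigma, subsample by 2, and stop
    once the next subsampled image would be smaller than min_size."""
    levels = [np.asarray(image, dtype=np.float64)]
    while min(levels[-1].shape) // 2 >= min_size:
        blurred = gaussian_filter(levels[-1], sigma)  # Pl' = Pl * g_sigma
        levels.append(blurred[::2, ::2])              # Pl+1(x,y) = Pl'(2x,2y)
    return levels
```

For a 100x100 input this yields levels of size 100, 50, and 25, stopping there because the next level (12) would fall below 20.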
Second, construct the Harris matrix at each level l and position (x, y), with σ = 1.0 and σ' = 1.5.
Hl(x,y) = [∇σ' Pl(x,y) ∇σ' Pl(x,y)T] * gσ(x,y)
Finally, calculate the Harris operator, or corner strength function, fl for each level l. Pick candidates whose value is greater than 0.0015 and which are local maxima in their 3x3 window. If there are more than 500 candidates at a level, we keep only the 500 with the highest values.
fl(x,y) = det(Hl(x,y)) / tr(Hl(x,y))
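The Harris matrix and corner strength computation might look like this in NumPy/SciPy. This is a sketch under my own naming; the entries A, B, C and the small eps guard against division by zero in flat regions are my additions, not part of the report.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_strength(P, sigma=1.0, sigma_prime=1.5, eps=1e-12):
    """Corner strength f = det(H) / tr(H), where H is the outer product of
    the gradient at scale sigma_prime, smoothed with a Gaussian of std sigma."""
    # Gradients of the image at scale sigma_prime (Gaussian derivatives)
    Ix = gaussian_filter(P, sigma_prime, order=(0, 1))
    Iy = gaussian_filter(P, sigma_prime, order=(1, 0))
    # Entries of the 2x2 Harris matrix, each smoothed with g_sigma
    A = gaussian_filter(Ix * Ix, sigma)
    B = gaussian_filter(Ix * Iy, sigma)
    C = gaussian_filter(Iy * Iy, sigma)
    det = A * C - B * B
    tr = A + C
    return det / (tr + eps)  # eps avoids 0/0 where the image is flat
```

Candidates would then be taken where this map exceeds 0.0015 and is a 3x3 local maximum, keeping at most the 500 strongest per level.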
Feature Descriptor
Find an orientation θ by computing the local gradient at each interest point after applying a Gaussian filter of std σ = 4.5. Then, we collect an 8x8 patch from the subsampled image two levels above (which has 1/4 the size of the current level) after rotating it by -θ. Of course, if the window of a selected feature falls outside the image, we throw that feature out. Finally, normalize the feature vector of length 64 so that its mean is zero and its std is one.
[cosθ, sinθ] = u / |u|
ul(x,y) = ∇σ Pl(x,y)
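The orientation and normalization steps can be sketched as below. The function names and the use of Gaussian-derivative filters are my assumptions; the report does not show its implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def orientation(P, x, y, sigma=4.5):
    """Feature orientation theta from the smoothed gradient u = grad_sigma P,
    i.e. [cos theta, sin theta] = u / |u|."""
    Ix = gaussian_filter(P, sigma, order=(0, 1))  # d/dx at scale sigma
    Iy = gaussian_filter(P, sigma, order=(1, 0))  # d/dy at scale sigma
    return np.arctan2(Iy[y, x], Ix[y, x])

def normalize_descriptor(patch):
    """Flatten an 8x8 patch into a 64-vector with zero mean and unit std,
    making the descriptor invariant to brightness and contrast changes."""
    v = np.asarray(patch, dtype=np.float64).ravel()
    v = v - v.mean()
    s = v.std()
    return v / s if s > 0 else v  # leave a perfectly flat patch unscaled
```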
Feature Matching
I take the simplest approach here: the "ratio test" with the Euclidean distance.
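The ratio test can be sketched as follows. The 0.8 threshold is my assumption for illustration; the report does not state the value it used.

```python
import numpy as np

def ratio_match(desc1, desc2, threshold=0.8):
    """Match each descriptor in desc1 to its nearest neighbour in desc2 by
    Euclidean distance, accepting a match only when the nearest distance is
    well below the second-nearest (the ratio test)."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(np.asarray(desc2) - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < threshold * dists[second]:
            matches.append((i, int(best)))
    return matches
```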
Performance
ROC Curves
Above are ROC curves for the naive implementation (A), the more complicated implementation (B), and SIFT. In "yosemite", it is hard to see the difference between the ROC curve of B and that of SIFT. In "graf", SIFT performs better than B. In both cases, the ratio matching score is better than the naive matching score.
Average AUC on 4 Benchmark Sets

Ratio matching:
| set | average error | average AUC |
| wall | 231.918022 | 0.715908 |
| bikes | 104.224911 | 0.783666 |
| graf | 199.643560 | 0.630269 |
| leuven | 101.071190 | 0.839841 |

SSD matching:
| set | average error | average AUC |
| wall | 231.918022 | 0.691639 |
| bikes | 104.224911 | 0.803937 |
| graf | 199.643560 | 0.597237 |
| leuven | 101.071190 | 0.872462 |

The tables above are the benchmark results on the given four test sets. We can see that "bikes" and "leuven" work much better than "wall" and "graf".
Harris Operator
Harris images of the "yosemite" and "graf" images. I multiplied the values by 255 to make the images visible.
Results on Other Images
Feature matching on translated images shows a good performance.
While we capture most of the features at the finest pyramid level (some are missing because we limited the number of features at each level to 500), we miss many features at higher levels. I suspect the subsampling process is not working perfectly.
Feature matching on rotated images is almost perfect, even for a 15 degree rotation.
Feature matching on brightened and contrast-adjusted images is almost perfect. Since we normalize our features, this is expected.
Feature matching on scaled images shows poor performance. The selected features differ between scales, and even when matching features exist, my algorithm often fails to match them.
Discussion
My implementation is not as good as SIFT, but it shows decent performance. In the "yosemite" experiment in particular, it is as good as the SIFT descriptor. The last experiment shows that my implementation performs well under rotation, translation, contrast change, and brightness change, but not under scaling (even for the 2x case). If I had more time, I would refine my scaling filter.
Extra Credit
All the refined implementations are from the MOPS paper.
- Built a Gaussian image pyramid and searched for features at each level (multiscale).
- Normalized the feature vector so that brightness and contrast do not matter.
- Used an 8x8 feature descriptor after scaling/rotating interest points.
References
- Computer Vision Class, University of Washington
- Wikipedia, Gaussian Blur
- Brown, M., Szeliski, R., and Winder, S. Multi-Scale Oriented Patches.
- Brown, M., Szeliski, R., and Winder, S. Multi-Image Matching Using Multi-Scale Oriented Patches.