Given a image at level l, the first step is to find interest points in the image. For every point in the image. A Harris matrix can be generated as follow [1]:
The "corner" strength function is
Followings are Harris image for Yosemite and graf:
After calculating the Harris value, what we want are the points with the largest value, which means it is more likely to be a corner. One method is to set a threshold and select the points with value larger than it. However, it is not easy to choose a proper threshold adaptively and the selected points may have many overlap. Instead of setting a threshold for the whole image, the Adaptive Non-Maximal Suppression algorithm can select the interest points with better spatial distribution.
The main idea of this method is to select the point which is maximum over a large area. A minimum suppression radius $r_i$ calculated to measure the area, as:
where $x_i$ is a 2D interest point image location and $\mathcal{I}$ is the set of all interest point locations. The value $c = 0.9$ is to ensure that neighbor must have significantly higher strength. In this project, we select the 500 interest points with largest values of $r_i$
The following figures compare the results of using regular detection (local maximum in the $3\times 3$ window and above a threshold) to the AMNS. It is obvious that features are distributed better across the whole image.
[back to top]Similar as discussed in the lecture sides, an orientation anlge θ=atan(dy⁄dx) could be derived. After getting the angle, I rotation the 40×40 window to the horizontal direction. After rotation, an 8×8 image patch is sampled with a stepsize of 5 in the 40×40 window. We use this 64 values as the descriptor vector
After sampling, the descriptor vector should be normalized to make the features invariant to intensity. Finally, the feature vector will have zero mean and one standard deviation.
[back to top]Two matching methods are evaluated in this project: sum of square difference (SSD) distance and ratio score. SSD is the summation of square differences between elements of the two descriptors. I implemented the other method called ratio test. The ratio score is the ratio of the best match SSD and the second best match SSD. A smaller ratio score means a better feature point.
[back to top]average error | average AUC | |
bikes | 245.904784 | 0.635211 |
graf | 289.679639 | 0.568391 |
leuven | 262.119331 | 0.649965 |
wall | 254.218998 | 0.533478 |
average error | average AUC | |
bikes | 245.904784 | 0.899729 |
graf | 289.679639 | 0.680809 |
leuven | 262.119331 | 0.861150 |
wall | 254.218998 | 0.746586 |
I also evaluate different scenarioes those are w/ and w/o the pyramid structure and ANMS to see whether they improve the performance.
average error | average AUC | |
bikes | 304.108791 | 0.849347 |
graf | 303.153780 | 0.698413 |
leuven | 283.916442 | 0.836846 |
wall | 315.313407 | 0.818150 |
We can see the the error increase for all 4 dataset and the for "bikes" and "leuven" the average AUC also drops.
average error | average AUC | |
bikes | 183.982601 | 0.889333 |
graf | 239.170204 | 0.650636 |
leuven | 264.051454 | 0.874434 |
wall | 231.801160 | 0.739099 |