Two separate feature descriptors were implemented to describe each feature point:
Simple 5x5 Window: A 5x5 square window around the point of interest is selected and the pixel values are serialized into a feature vector of size 25.
MOPS (simplified): A simplified version of the MOPS descriptor [2] was implemented. A 39x39 window is selected around the point of interest. The window is pre-rotated by the dominant gradient at that point (calculated above, in step 6 of the Feature Detection process). The rotation is achieved by inversely mapping image pixels into rotated window pixels, with bilinear interpolation for sample positions that fall between pixels. If the rotated window falls outside the original image, the feature is discarded. Once the window patch has been extracted, it is prefiltered with a Gaussian and sampled into an 8x8 patch by extracting every 5th pixel. The 8x8 patch is then serialized into a vector of length 64, and the brightness values are normalized by dividing by the square root of the sum of the squares. Both descriptors are summarized in the sketch below.

Harris corner detection was efficient to implement, as it does not involve the computation of a square root. Precomputing the dominant patch gradient at this stage also proved efficient. All convolutions assume a repeating border, which reduces the corner response near image borders; this is generally acceptable for most matching applications.
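As a reference, here is a minimal sketch of both descriptors in Python/NumPy, assuming a grayscale floating-point image and the dominant-gradient angle in radians. The Gaussian sigma and the rotation sign convention are assumptions, not values taken from the actual implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def simple_5x5_descriptor(image, y, x):
    # Serialize the raw 5x5 intensity window into a length-25 vector
    # (assumes the point lies at least 2 pixels from the image border).
    return image[y - 2:y + 3, x - 2:x + 3].flatten()

def mops_descriptor(image, y, x, angle):
    # Simplified MOPS: inverse-map a 39x39 window rotated by the dominant
    # gradient angle, blur, subsample to 8x8, and normalize.
    half = 19
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    patch = np.empty((39, 39))
    for wy in range(-half, half + 1):
        for wx in range(-half, half + 1):
            # Inverse-map each window pixel into image coordinates.
            sx = x + cos_a * wx - sin_a * wy
            sy = y + sin_a * wx + cos_a * wy
            # Discard features whose rotated window leaves the image.
            if not (0 <= sx < image.shape[1] - 1 and
                    0 <= sy < image.shape[0] - 1):
                return None
            # Bilinear interpolation for sub-pixel sample positions.
            x0, y0 = int(sx), int(sy)
            fx, fy = sx - x0, sy - y0
            patch[wy + half, wx + half] = (
                image[y0, x0] * (1 - fx) * (1 - fy) +
                image[y0, x0 + 1] * fx * (1 - fy) +
                image[y0 + 1, x0] * (1 - fx) * fy +
                image[y0 + 1, x0 + 1] * fx * fy)
    # Prefilter with a Gaussian (sigma is an assumption) and sample
    # every 5th pixel to obtain an 8x8 patch, i.e. a length-64 vector.
    sampled = gaussian_filter(patch, sigma=2.0)[::5, ::5].flatten()
    # Normalize brightness by the square root of the sum of the squares.
    return sampled / np.sqrt(np.sum(sampled ** 2))
```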
The Simple 5x5 descriptor is only invariant to position. In contrast, the MOPS descriptor is invariant to position, orientation (due to the rotated window) and illumination (due to the intensity normalization). Bilinear interpolation was used to calculate pixel intensities of the rotated patch, providing sub-pixel accuracy. The multi-scale aspect of MOPS was implemented for extra credit by running the above detection and description algorithms on 4 levels of the Gaussian image pyramid; the resulting feature coordinates were scaled back to map to the original image. I did not implement the Haar wavelet analysis mentioned in the MOPS paper and leave that for future work.
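A minimal sketch of the multi-scale wrapper, where detect_and_describe() is a hypothetical stand-in for the single-scale detection and description pipeline described above:

```python
from scipy.ndimage import gaussian_filter

def detect_multiscale(image, levels=4):
    # Run the single-scale pipeline on each level of a Gaussian pyramid
    # and map feature coordinates back to the original image.
    features = []
    current = image
    for level in range(levels):
        scale = 2 ** level
        # detect_and_describe() is a hypothetical helper standing in for
        # the detection and description steps described above.
        for y, x, desc in detect_and_describe(current):
            features.append((y * scale, x * scale, desc))
        # Blur before halving the resolution to avoid aliasing.
        current = gaussian_filter(current, sigma=1.0)[::2, ::2]
    return features
```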
A lot of time was spent debugging the provided image library: off-by-one errors and image shifts were causing issues in the project. I therefore decided to implement my own patch rotation and interpolation code, and made modifications to the library's convolution code as well.
yosemite | graf |
---|---|
Dataset | SSD Average Error | SSD Average AUC | Ratio Average Error | Ratio Average AUC |
---|---|---|---|---|
graf | 282.38 | 0.542 | 282.38 | 0.577 |
wall | 346.56 | 0.490 | 346.56 | 0.618 |
leuven | 401.55 | 0.309 | 401.55 | 0.502 |
bikes | 331.32 | 0.698 | 331.32 | 0.764 |
Dataset | SSD Average Error | SSD Average AUC | Ratio Average Error | Ratio Average AUC |
---|---|---|---|---|
graf | 242.55 | 0.587 | 242.55 | 0.684 |
wall | 267.27 | 0.681 | 267.27 | 0.847 |
leuven | 315.18 | 0.736 | 315.18 | 0.761 |
bikes | 293.78 | 0.699 | 293.78 | 0.908 |
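For context on the SSD and Ratio columns: the two matching strategies score each feature either by its raw sum-of-squared-differences distance to the best match, or by the ratio of that distance to the second-best distance. A minimal sketch, assuming descriptors are stored as rows of NumPy arrays (not the benchmark's exact code):

```python
import numpy as np

def match_features(desc1, desc2):
    # For each descriptor in desc1, find its best match in desc2 and
    # score it two ways: the raw SSD distance, and the ratio of the
    # best SSD distance to the second-best (lower is better for both).
    matches = []
    for i, d in enumerate(desc1):
        ssd = np.sum((desc2 - d) ** 2, axis=1)
        order = np.argsort(ssd)
        best, second = ssd[order[0]], ssd[order[1]]
        matches.append((i, order[0], best, best / second))
    return matches
```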
The custom descriptor is designed to work well with changes in translation, rotation, scale and illumination. The benchmark datasets provide variations in these attributes. In particular, the leuven dataset offers large changes in illumination, resulting in slightly lower performance. The graf dataset combines translation, rotation and scale changes and is the most difficult to match, as indicated by its AUC score.
Detecting features at multiple scales (second table above) helps with the bikes and wall datasets, where a significant improvement is noted. The descriptor also performs well on 3D rotations. Further improvement may be achieved by implementing a gradient-histogram-based approach, such as SIFT.
Overall, the custom descriptor does a good job on these datasets, but lags behind SIFT, which is the state of the art.
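For reference, the Average AUC values above measure the area under the ROC curve of the match scores. A minimal sketch of such a computation, assuming 0/1 ground-truth labels for the matches (the benchmark's exact procedure may differ):

```python
import numpy as np

def roc_auc(scores, labels):
    # Area under the ROC curve for match scores, where lower scores
    # indicate better matches and labels are 1 for ground-truth-correct
    # matches, 0 otherwise.
    order = np.argsort(scores)                        # best matches first
    labels = np.asarray(labels, dtype=float)[order]
    tpr = np.cumsum(labels) / labels.sum()            # true positive rate
    fpr = np.cumsum(1 - labels) / (1 - labels).sum()  # false positive rate
    return np.trapz(tpr, fpr)                         # trapezoidal area
```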
Test results. |
---|
Without ANMS | With ANMS (500 most dominant features) |
---|---|
Gaussian Image Pyramid for the Yosemite image. |
---|