Homework #1 – Feature Detectors, Descriptors, and Matching
For my feature descriptor, I chose to implement an oriented, normalized descriptor. It aims to provide rotational invariance by rotating each descriptor to a canonical orientation, and robustness to different lighting conditions by normalizing the mean and variance of each descriptor.
Each descriptor was defined as a 5x5 window of pixels centered on the point of interest found by the corner detection algorithm discussed in class. This window of pixels was rotated to align the data with the direction of the image gradient at that point. I chose this orientation because the gradient is already computed in the Harris corner detection algorithm, so the angle comes at very little extra cost, and in my experiments it proved reasonably accurate.
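As a rough sketch of how that canonical angle can be obtained, the snippet below reads the orientation from smoothed image derivatives at the interest point. The Gaussian smoothing scale and the scipy calls are my own assumptions for illustration; in the assignment the same gradients fall out of the Harris detector.

    import numpy as np
    from scipy import ndimage

    def patch_orientation(image, y, x, sigma=1.0):
        """Canonical orientation at (y, x) from smoothed image derivatives.
        sigma and the scipy-based derivatives are assumptions for this sketch."""
        Iy = ndimage.gaussian_filter(image, sigma, order=(1, 0))  # d/dy
        Ix = ndimage.gaussian_filter(image, sigma, order=(0, 1))  # d/dx
        return np.arctan2(Iy[y, x], Ix[y, x])  # angle in radians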
Implementation note: to perform a rotation that does not leave areas of missing pixels, I rotated larger (9x9) patches and then cut out the center 5x5 area, so that the data in my feature set was always real image data and not synthetic blackness. I used a linear interpolator for my rotations, as the bicubic interpolator for transformations in the code given to us does not appear to be functional and simply returns synthetic blackness.
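The following sketch illustrates the rotate-then-crop idea with bilinear interpolation. The window sizes match the write-up, but the scipy-based implementation and the sign convention for the angle are assumptions, not the code I actually submitted.

    import numpy as np
    from scipy import ndimage

    def oriented_patch(image, y, x, angle_rad, out=5, pad=9):
        """Cut a pad x pad window around (y, x), rotate it so the dominant
        gradient direction is fixed, then keep the central out x out pixels
        so no synthetic black corners survive the rotation."""
        r = pad // 2
        window = image[y - r:y + r + 1, x - r:x + r + 1]
        rotated = ndimage.rotate(window, np.degrees(-angle_rad),
                                 reshape=False, order=1)  # order=1: bilinear
        c, h = pad // 2, out // 2
        return rotated[c - h:c + h + 1, c - h:c + h + 1]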
Normalization was performed by subtracting the mean of each feature descriptor and scaling it to unit variance. This suppresses the bias and gain changes introduced by different lighting conditions, hopefully increasing the ability of this algorithm to work across those conditions.
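A minimal version of that normalization step; the small epsilon guarding against flat patches is my own addition for the sketch.

    import numpy as np

    def normalize_descriptor(patch, eps=1e-8):
        """Zero-mean, unit-variance normalization of a descriptor vector."""
        v = patch.astype(np.float64).ravel()
        v -= v.mean()
        return v / (v.std() + eps)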
Performance
ROC curves for the graf dataset (left) and the Yosemite dataset (right)
Dataset Name | Average Area Under Curve
bikes        | 0.346
graf         | 0.614
leuven       | 0.667
wall         | 0.602
Harris images for graf/img1.ppm (left) and Yosemite1.jpg (right)
Performance of my feature descriptor was, predictably, much worse than SIFT's. In particular, the bikes and Yosemite image sets gave my algorithm trouble. I suspect this is due to the smoothness of those images: the blurring applied to the bikes set and the smooth rock faces and sky in Yosemite leave my relatively small (5x5) descriptors too similar to one another to discriminate reliably. To fix this, I would implement a scheme similar to the one described in the MOPS paper, sampling a larger, downsampled area around each point as the feature descriptor so that it captures more of the surrounding image.
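A hedged sketch of that proposed fix: sample a larger window around the interest point and downsample it to a small descriptor so it summarizes a wider region. The 40-pixel window and 8x8 output are borrowed loosely from MOPS and are assumptions, not something implemented for this assignment.

    import numpy as np
    from scipy import ndimage

    def mops_like_descriptor(image, y, x, window=40, out=8):
        """Blur, downsample a window x window region around (y, x) to
        out x out, then apply mean/variance normalization.
        Window and output sizes are illustrative, not tuned values."""
        r = window // 2
        patch = image[y - r:y + r, x - r:x + r].astype(np.float64)
        # Pre-blur so downsampling does not alias high frequencies.
        patch = ndimage.gaussian_filter(patch, sigma=window / (2.0 * out))
        desc = ndimage.zoom(patch, out / float(window), order=1).ravel()
        desc -= desc.mean()
        return desc / (desc.std() + 1e-8)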
Comparison of matched personal images, showing good rotational matching