Homework #1 – Feature Detectors, Descriptors, and Matching
For my feature descriptor, I chose to implement an oriented, normalized descriptor. It aims to provide rotational invariance by rotating each descriptor to a canonical orientation, and robustness to different lighting conditions by normalizing the mean and variance of each descriptor.
Each descriptor was defined as a 5x5 window of pixels centered on the point of interest found by the corner detection algorithm discussed in class. This window of pixels was rotated to align the data with the direction of the image gradient at that point. I chose this orientation because the gradient is already computed in the Harris corner detection algorithm, so the angle comes at very little extra cost, and in my experiments it proved reasonably accurate.
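As a rough sketch of how that canonical angle can be obtained, the snippet below reads the orientation from smoothed image derivatives at the interest point. The Gaussian smoothing scale and the scipy calls are my own assumptions for illustration; in the assignment the same gradients fall out of the Harris detector.

    import numpy as np
    from scipy import ndimage

    def patch_orientation(image, y, x, sigma=1.0):
        """Canonical orientation at (y, x) from smoothed image derivatives.
        sigma and the scipy-based derivatives are assumptions for this sketch."""
        Iy = ndimage.gaussian_filter(image, sigma, order=(1, 0))  # d/dy
        Ix = ndimage.gaussian_filter(image, sigma, order=(0, 1))  # d/dx
        return np.arctan2(Iy[y, x], Ix[y, x])  # angle in radians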
Implementation note: to perform a rotation that does not leave areas of missing pixels, I rotated larger (9x9) patches and then cut out the center 5x5 area, so that the data in my feature set was always real image data and not synthetic blackness. I used a linear interpolator for my rotations, as the bicubic interpolator for transformations in the code given to us does not appear to be functional and simply returns synthetic blackness.
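The following sketch illustrates the rotate-then-crop idea with bilinear interpolation. The window sizes match the write-up, but the scipy-based implementation and the sign convention for the angle are assumptions, not the code I actually submitted.

    import numpy as np
    from scipy import ndimage

    def oriented_patch(image, y, x, angle_rad, out=5, pad=9):
        """Cut a pad x pad window around (y, x), rotate it so the dominant
        gradient direction is fixed, then keep the central out x out pixels
        so no synthetic black corners survive the rotation."""
        r = pad // 2
        window = image[y - r:y + r + 1, x - r:x + r + 1]
        rotated = ndimage.rotate(window, np.degrees(-angle_rad),
                                 reshape=False, order=1)  # order=1: bilinear
        c, h = pad // 2, out // 2
        return rotated[c - h:c + h + 1, c - h:c + h + 1]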
Normalization was performed by subtracting the mean of each feature descriptor and scaling it to unit variance. This suppresses the bias and gain changes introduced by different lighting conditions, hopefully increasing the ability of this algorithm to work across those conditions.
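A minimal version of that normalization step; the small epsilon guarding against flat patches is my own addition for the sketch.

    import numpy as np

    def normalize_descriptor(patch, eps=1e-8):
        """Zero-mean, unit-variance normalization of a descriptor vector."""
        v = patch.astype(np.float64).ravel()
        v -= v.mean()
        return v / (v.std() + eps)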
Performance
ROC curves for the graf dataset (left) and the Yosemite dataset (right)
Dataset Name | Average Area Under Curve
bikes        | 0.346
graf         | 0.614
leuven       | 0.667
wall         | 0.602
Harris images for graf/img1.ppm (left) and Yosemite1.jpg (right)
Performance of my feature descriptor was, predictably, much worse than SIFT's. In particular, the bikes and Yosemite image sets gave my algorithm trouble. I suspect this is due to the smoothness of those images: the blurring applied to the bikes set and the smooth rock faces and sky in Yosemite leave my relatively small (5x5) descriptors too similar to one another to discriminate reliably. To fix this, I would implement a scheme similar to the one described in the MOPS paper, sampling a larger, downsampled area around each point as the feature descriptor so that it captures more of the surrounding image.
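A hedged sketch of that proposed fix: sample a larger window around the interest point and downsample it to a small descriptor so it summarizes a wider region. The 40-pixel window and 8x8 output are borrowed loosely from MOPS and are assumptions, not something implemented for this assignment.

    import numpy as np
    from scipy import ndimage

    def mops_like_descriptor(image, y, x, window=40, out=8):
        """Blur, downsample a window x window region around (y, x) to
        out x out, then apply mean/variance normalization.
        Window and output sizes are illustrative, not tuned values."""
        r = window // 2
        patch = image[y - r:y + r, x - r:x + r].astype(np.float64)
        # Pre-blur so downsampling does not alias high frequencies.
        patch = ndimage.gaussian_filter(patch, sigma=window / (2.0 * out))
        desc = ndimage.zoom(patch, out / float(window), order=1).ravel()
        desc -= desc.mean()
        return desc / (desc.std() + 1e-8)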
Comparison of matched personal images, showing good rotational matching