CSE 576 Computer Vision, Spring 2005

Project 1: Feature Detection and Matching

Enrique Larios Delgado



Feature Description:

My code uses the Harris corner/edge detector as its feature detector. It is based on the method described in [Harris 88]. I set the threshold so that mostly corners were selected, since corners are considered points where image information accumulates. My detection algorithm visits every pixel in the image, calculating the gradient magnitude and angle as well as the differences in x and y. The differences are used to compute the corner response proposed by Harris. Pixels whose corner response exceeds the threshold and that are local maxima in their 3x3 neighborhood are selected as feature points.
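The project's actual source is not reproduced here, but the detection step described above might be sketched in Python roughly as follows. The box-filter window, the constant k = 0.04, and all function names are illustrative assumptions, not taken from my implementation:

```python
import numpy as np

def harris_response(gray, k=0.04):
    """Harris corner response for a grayscale float image (k = 0.04 assumed)."""
    # Differences in y and x (np.gradient returns the axis-0 derivative first)
    Iy, Ix = np.gradient(gray)

    # Sum products of derivatives over a 3x3 window (a simple box filter
    # stands in here for the Gaussian window of the original method)
    def box(img, r=1):
        out = np.zeros_like(img)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(img, dy, axis=0), dx, axis=1)
        return out

    Sxx, Syy, Sxy = box(Ix * Ix), box(Iy * Iy), box(Ix * Iy)

    # Harris corner response: det(M) - k * trace(M)^2
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace * trace

def detect_corners(gray, threshold):
    """Keep pixels above threshold that are local maxima in a 3x3 neighborhood."""
    R = harris_response(gray)
    corners = []
    H, W = R.shape
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            if R[y, x] > threshold and R[y, x] == R[y-1:y+2, x-1:x+2].max():
                corners.append((x, y))
    return corners
```

On a synthetic white square, the edges produce a negative response (one gradient direction is zero), so only the four corners survive the threshold and non-maximum suppression.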

The detection process is applied to the first three levels of the image pyramid in order to make the feature detector more scale invariant. My descriptor design is based on the scale-invariant keypoints described in [Lowe 04]. The descriptor implementation starts by computing the dominant orientation of the region surrounding the keypoint. The gradient magnitudes in the 7x7 area around the keypoint are accumulated into a 36-bin histogram according to the angle of the gradient. The magnitudes are weighted by a Gaussian function centered on the keypoint in order to give more importance to the gradients closer to it. The bin with the highest magnitude is used as the base to compute the orientation of the region. The magnitudes of the two neighboring bins are used in a linear interpolation to obtain the final value of the main orientation angle. As recommended by Lowe, if the magnitude of the second greatest bin is 80% or more of the main orientation's, then my algorithm generates another keypoint with this second orientation.
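As an illustration of the orientation step, a minimal Python sketch of the 36-bin weighted histogram and peak selection might look like the following. The Gaussian sigma, the function name, and the parabolic fit through a peak and its two neighboring bins (a common way to refine the peak; the report interpolates with the neighboring bins similarly) are all assumptions for this sketch:

```python
import numpy as np

def dominant_orientations(mag, ang, peak_ratio=0.8, nbins=36):
    """Dominant orientation(s) of a 7x7 patch from a 36-bin gradient histogram.

    mag, ang: 7x7 arrays of gradient magnitudes and angles (radians) around
    the keypoint. Returns one angle per peak within 80% of the strongest,
    so a strong second peak yields a second orientation (and keypoint).
    """
    # Gaussian weight centered on the keypoint (sigma = 2.0 is an assumption)
    yy, xx = np.mgrid[-3:4, -3:4]
    weight = np.exp(-(xx**2 + yy**2) / (2 * 2.0**2))

    # Accumulate weighted magnitudes into 36 orientation bins
    hist = np.zeros(nbins)
    bins = np.floor((ang % (2*np.pi)) / (2*np.pi) * nbins).astype(int) % nbins
    for b, w in zip(bins.ravel(), (mag * weight).ravel()):
        hist[b] += w

    peaks = []
    top = hist.max()
    for b in range(nbins):
        l, r = hist[(b - 1) % nbins], hist[(b + 1) % nbins]
        if hist[b] >= peak_ratio * top and hist[b] > l and hist[b] > r:
            # Refine the peak using the two neighboring bins
            denom = l - 2*hist[b] + r
            offset = 0.5 * (l - r) / denom if denom != 0 else 0.0
            peaks.append((b + 0.5 + offset) * 2*np.pi / nbins)
    return peaks
```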

Using the main orientation of the neighborhood, the algorithm normalizes all the gradient angles of the 7x7 patch in order to achieve rotation invariance. With the new angles, the bin assignment is recomputed. My descriptor is a 36-dimensional vector that is normalized to increase its robustness to contrast changes, with the bin assignment normalized to the main orientation for rotation invariance.
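A sketch of this final descriptor step, under the same assumptions as above (the function name and signature are hypothetical):

```python
import numpy as np

def rotation_invariant_descriptor(mag, ang, main_angle, nbins=36):
    """36-D descriptor: gradient angles normalized to the patch's dominant
    orientation, re-binned, then L2-normalized for contrast robustness."""
    # Rotate all angles into a canonical frame relative to the main orientation
    rel = (ang - main_angle) % (2 * np.pi)
    bins = np.floor(rel / (2*np.pi) * nbins).astype(int) % nbins

    desc = np.zeros(nbins)
    for b, m in zip(bins.ravel(), mag.ravel()):
        desc[b] += m

    # Unit length makes the descriptor robust to (multiplicative) contrast change
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc
```

A rotated patch (all angles shifted together with the main orientation) and a contrast-scaled patch both map to the same descriptor, which is the invariance the design aims for.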


Design Choices:


My design choices are as follows. With the image pyramid I hoped to achieve invariance to changes of scale. The bin assignment has the main purpose of providing rotation invariance, although it also gives some resistance to affine transformations, because a shift of the pixels in the neighborhood of the keypoint still contributes to the same bin.


Report:

Bikes

Pair      Simple Window               My Design                   Provided SIFT
1 to 2    307.210302                  400.089969                  7.281360
1 to 3    379.670285                  425.163391                  13.653063
1 to 4    No keypoints detected in 4  No keypoints detected in 4  19.995964
1 to 5    No keypoints detected in 5  No keypoints detected in 5  34.570192
1 to 6    No keypoints detected in 6  No keypoints detected in 6  49.073032


Graf

Pair      Simple Window   My Design    Provided SIFT
1 to 2    280.019288      351.501793   14.593259
1 to 3    229.225710      307.363170   32.300461
1 to 4    284.752243      341.761410   140.227971
1 to 5    283.424408      318.527942   293.978437
1 to 6    330.458009      378.423513   326.708168



Leuven

Pair      Simple Window   My Design    Provided SIFT
1 to 2    76.069583       327.631952   9.831576
1 to 3    285.942611      345.994421   8.202653
1 to 4    289.529478      367.727299   13.443062
1 to 5    373.714004      368.069801   12.893896
1 to 6    357.200646      378.820748   18.085878




Wall

Pair      Simple Window   My Design    Provided SIFT
1 to 2    132.360895      345.078615   4.627598
1 to 3    192.859104      335.677877   2.744426
1 to 4    243.411028      360.679550   13.156131
1 to 5    333.306296      350.012036   53.908964
1 to 6    409.619672      375.960471   332.456770


Strengths and Weaknesses:

It is easy to see from my results that the simple window descriptor performs better than my feature descriptor. I think this happens because my descriptor's vector lacks more dimensions. In particular, I attribute the matching failures to the missing spatial location information that is contained in Lowe's descriptor. I must point out that, in light of the poor results of my descriptor, I tried to implement the full-fledged SIFT features as described in [Lowe 04], but the lack of a detailed description of SIFT's implementation also worked against its successful implementation. However, I left that implementation commented out in my source code.
Although the low scores in the tests could also be the result of a bug, I thoroughly reviewed my code, which is why I attribute them to the design of my descriptor.


Performance Demonstration:

Here I show the performance of my operator. Although it did not reach very good performance, it still found the corner of the sweater even with the change of scale and translation.
Note: This is not a picture of myself. Right now I don't have a digital camera.





Here I show some results of my detection:





Extra Credit:

My feature detector is run over the pyramid of images in order to achieve scale invariance. I must point out that I have both versions in my code: a scale-invariant version of my descriptor and the window descriptor, and a simple one that only uses the original image, without the image pyramid. I did this because the image pyramid code still has some glitches; it tended to fail with big images.
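The pyramid construction itself can be sketched as follows. This is an illustrative Python version, not the project's code; the 2x2-average downsampling stands in for whatever blur-and-subsample the actual implementation uses:

```python
import numpy as np

def build_pyramid(gray, levels=3):
    """Image pyramid: each level halves the previous one.

    A 2x2 average is used as a simple stand-in for Gaussian blur + subsample.
    The detector is run on the first `levels` levels (3 in the report).
    """
    pyr = [np.asarray(gray, dtype=float)]
    for _ in range(1, levels):
        prev = pyr[-1]
        H, W = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2  # trim odd edge
        down = 0.25 * (prev[0:H:2, 0:W:2] + prev[1:H:2, 0:W:2]
                       + prev[0:H:2, 1:W:2] + prev[1:H:2, 1:W:2])
        pyr.append(down)
    return pyr
```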