The goal of this project is to develop matching procedures that detect features, specified as patterns of intensity values, and that are invariant to translation, orientation, scale, and intensity change.

Part One: Compute Features

In order to compute features, we need a feature point detector and a feature descriptor. In this project, I have tried two methods to compute features:

Type 1: Harris corner detector + Square window descriptor

Here are two examples of the Harris value image, for graf and Yosemite:

[Figure: Harris value image for graf] [Figure: Harris value image for Yosemite]
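As a rough sketch of how such a Harris value image, and the thresholded local maxima described below, can be computed (Python/NumPy; the function names, window size, and k value are illustrative choices, not the project's actual code):

```python
import numpy as np

def harris_response(img, k=0.04, win=7, sigma=1.0):
    """Harris corner response R = det(M) - k * trace(M)^2, where M is
    the Gaussian-weighted second-moment matrix of the image gradients."""
    img = img.astype(np.float64)
    Iy, Ix = np.gradient(img)            # pixel-difference gradients
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    r = win // 2
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    g /= g.sum()                         # normalised Gaussian window

    def smooth(a):                       # same-size convolution with g
        ap = np.pad(a, r, mode='edge')
        out = np.zeros_like(a)
        for dy in range(win):
            for dx in range(win):
                out += g[dy, dx] * ap[dy:dy + a.shape[0], dx:dx + a.shape[1]]
        return out

    Sxx, Syy, Sxy = smooth(Ixx), smooth(Iyy), smooth(Ixy)
    det = Sxx * Syy - Sxy * Sxy
    tr = Sxx + Syy
    return det - k * tr * tr

def local_maxima(R, thresh):
    """Points where R exceeds thresh and is the max of its 3x3 neighbourhood."""
    Rp = np.pad(R, 1, mode='constant', constant_values=-np.inf)
    nbr = np.max([Rp[dy:dy + R.shape[0], dx:dx + R.shape[1]]
                  for dy in range(3) for dx in range(3)], axis=0)
    return np.argwhere((R >= nbr) & (R > thresh))
```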

After thresholding the Harris value image, we detect local maxima to find the keypoints. Once the feature points are detected, we need a method to describe each feature. First I tried a very simple square window descriptor. Here are the implementation details:

Step 1: Compute the dominant orientation using a histogram.
(1) For each image sample I(x, y), the gradient magnitude and orientation are precomputed using pixel differences. This work can be done simultaneously while calculating the Harris value for each sample.
(2) An orientation histogram is formed from the gradient orientations of sample points within a 7x7 region around the keypoint. The histogram has 36 bins covering the 360-degree range of orientations. Each sample added to the histogram is weighted by its gradient magnitude and by a Gaussian-weighted circular window.
(3) The peak in the orientation histogram is detected and used as the dominant orientation.
[Figure: orientation histogram]
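A sketch of this histogram voting, assuming the gradient magnitude and orientation images from step (1) are already available (parameter values and names are illustrative):

```python
import numpy as np

def dominant_orientation(mag, ori, cy, cx, radius=3, nbins=36, sigma=2.0):
    """Dominant gradient orientation at keypoint (cy, cx): samples in a
    (2*radius+1)^2 window (7x7 here) vote into a 36-bin histogram,
    weighted by gradient magnitude and a Gaussian on distance."""
    hist = np.zeros(nbins)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = cy + dy, cx + dx
            if not (0 <= y < mag.shape[0] and 0 <= x < mag.shape[1]):
                continue
            w = np.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
            b = int(nbins * (ori[y, x] % (2 * np.pi)) / (2 * np.pi)) % nbins
            hist[b] += w * mag[y, x]
    peak = int(np.argmax(hist))
    return 2 * np.pi * (peak + 0.5) / nbins   # bin centre, in radians
```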
Step 2: Represent the keypoint descriptor relative to the dominant orientation.

(1) Compute the transformed coordinates of the 7x7 square window around the keypoint using a rotation matrix.
(2) Once we have the relative coordinates, grab the corresponding pixel intensities to fill the feature descriptor.

Step 3: Normalize the descriptor intensities.

(1) Compute the average intensity and the standard deviation of the descriptor.
(2) Subtract the average from each intensity and divide by the standard deviation.
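Steps 2 and 3 can be sketched together like this (nearest-neighbour sampling into a rotated 7x7 window, then zero-mean / unit-variance normalisation; the function name is mine, not the project's):

```python
import numpy as np

def window_descriptor(img, cy, cx, theta, size=7):
    """Sample a size x size window rotated by the dominant orientation
    theta around (cy, cx), then normalise to zero mean and unit std."""
    r = size // 2
    c, s = np.cos(theta), np.sin(theta)
    desc = np.zeros((size, size))
    for i, dy in enumerate(range(-r, r + 1)):
        for j, dx in enumerate(range(-r, r + 1)):
            # rotate the offset (dx, dy) by theta, nearest-neighbour sample
            rx = cx + int(np.rint(dx * c - dy * s))
            ry = cy + int(np.rint(dx * s + dy * c))
            if 0 <= ry < img.shape[0] and 0 <= rx < img.shape[1]:
                desc[i, j] = img[ry, rx]
    d = desc.ravel()
    std = d.std()
    return (d - d.mean()) / std if std > 0 else d - d.mean()
```

The normalisation in step 3 is what buys intensity invariance: scaling and shifting all intensities leaves the descriptor unchanged.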

Strengths & Weaknesses
(1) The Harris corner detector is invariant to image translation, rotation, perspective changes, and intensity changes, but not to scale changes.
(2) The window descriptor should be invariant to image translation, rotation, and intensity changes, but it is not invariant to affine or scale changes.

Type 2: Scale Invariant Detectors (SIFT operator) + SIFT descriptor

I have read the paper "Distinctive Image Features from Scale-Invariant Keypoints", and my implementation here follows the instructions from that paper.
[Figure: DoG scale-space cascade] [Figure: local extrema search]
Basically, the SIFT feature detector and descriptor work as follows:

Step 1: Detection of scale-space extrema.
(1) Detect keypoints using a cascade of difference-of-Gaussian (DoG) filters to identify candidate locations that will be examined further. The cascade is displayed above in the left-side picture. In this project I have explored 4 scales in each of the first two octaves. The sigma values for the Gaussian filters are sequentially set to 1, 2^(1/2), 2, and 2^(3/2).
(2) After computing the DoG images by subtracting each pair of Gaussian-smoothed pictures, we search for local maxima and minima within a 3x3x3 neighbourhood, i.e. each point is compared against its 26 neighbours in the current and adjacent DoG scales (as displayed above in the right-side picture).
(3) The local extrema detected at each scale and level are examined further next.
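Under the assumption of a single octave with those four sigma values and no downsampling between octaves, the scale-space construction and the 3x3x3 extremum search might look like this (a plain-NumPy sketch, not the project's actual code):

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding."""
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()

    def conv1d(a, axis):
        pad = [(r, r) if ax == axis else (0, 0) for ax in range(2)]
        ap = np.pad(a, pad, mode='edge')
        out = np.zeros_like(a, dtype=np.float64)
        for i, w in enumerate(g):
            sl = [slice(None), slice(None)]
            sl[axis] = slice(i, i + a.shape[axis])
            out += w * ap[tuple(sl)]
        return out

    return conv1d(conv1d(img.astype(np.float64), 0), 1)

def dog_extrema(img, sigmas=(1.0, 2**0.5, 2.0, 2**1.5), thresh=0.01):
    """Candidate keypoints: 3x3x3 local maxima/minima of the DoG stack
    (each point is compared against its 26 neighbours)."""
    blurred = np.stack([gaussian_blur(img, s) for s in sigmas])
    dog = blurred[1:] - blurred[:-1]        # 3 DoG levels from 4 scales
    H, W = img.shape
    pts = []
    for k in range(1, dog.shape[0] - 1):    # levels with both scale neighbours
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                v = dog[k, y, x]
                if abs(v) < thresh:
                    continue
                cube = dog[k-1:k+2, y-1:y+2, x-1:x+2]
                if v >= cube.max() or v <= cube.min():
                    pts.append((k, y, x))
    return pts
```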

Step 2: Accurate keypoint localization.
(1) For each detected local extremum, we check its Hessian from the DoG image at the corresponding scale. The purpose is to eliminate edge responses: an extremum on an edge has a large principal curvature across the edge but a small one along it. The method is very similar to the Harris value calculation; for a detailed explanation, please refer to the paper.
(2) After thresholding the detected extrema, the remaining local extrema are used as keypoints.
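The edge-response check can be sketched with the 2x2 spatial Hessian of a DoG level, following the principal-curvature-ratio test from the paper (r_max = 10 is the paper's suggested value; the function name is mine):

```python
import numpy as np

def passes_edge_test(D, y, x, r_max=10.0):
    """Accept a DoG extremum only if the ratio of principal curvatures
    (from the 2x2 spatial Hessian of DoG level D at (y, x)) is < r_max."""
    Dxx = D[y, x + 1] - 2 * D[y, x] + D[y, x - 1]
    Dyy = D[y + 1, x] - 2 * D[y, x] + D[y - 1, x]
    Dxy = (D[y + 1, x + 1] - D[y + 1, x - 1]
           - D[y - 1, x + 1] + D[y - 1, x - 1]) / 4.0
    tr = Dxx + Dyy
    det = Dxx * Dyy - Dxy * Dxy
    if det <= 0:            # curvatures of opposite sign: reject
        return False
    return tr * tr / det < (r_max + 1) ** 2 / r_max
```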

Step 3: Dominant orientation assignment (using a method similar to the one explained above).
(1) Create a histogram of local gradient directions, computed at the selected scale of the Gaussian pyramid, in a 7x7 neighbourhood of the keypoint.
(2) Each sample added to the histogram is weighted by its gradient magnitude and by a Gaussian-weighted circular window.
(3) The two highest peaks in the orientation histogram are detected. If the second peak is at least 80% as high as the first, both are used as dominant orientations, producing two features; otherwise only the highest peak is used and a single feature is produced.
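The two-peak rule might be sketched as follows (a simplification: it takes the two highest bins rather than true local peaks of the histogram):

```python
import numpy as np

def orientations_from_histogram(hist, ratio=0.8):
    """One or two dominant orientations (bin centres, radians) from an
    orientation histogram: the second-highest peak is kept only if it
    is at least `ratio` of the highest. (Simplification: the two
    highest bins stand in for true local peaks.)"""
    nbins = len(hist)
    order = np.argsort(hist)[::-1]
    peaks = [int(order[0])]
    if hist[order[1]] >= ratio * hist[order[0]]:
        peaks.append(int(order[1]))
    return [2 * np.pi * (p + 0.5) / nbins for p in peaks]
```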
[Figure: SIFT descriptor layout]
Step 4: Local image descriptor.
(1) Compute gradient orientation histograms on a 4x4 array of neighborhoods, relative to the keypoint orientation, using thresholded image gradients from the Gaussian pyramid level at the keypoint's scale.
(2) Quantize orientations to 8 values. As before, each sample added to a histogram is weighted by its gradient magnitude and by a Gaussian window.
(3) In this project I have created a 4x4 array of histograms, so the SIFT feature vector has length 4x4x8 = 128.
(4) Normalize the descriptor to make it invariant to intensity change.
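A simplified sketch of such a descriptor (4x4 cells of 4x4 pixels with 8 orientation bins; gradient orientations are shifted by the keypoint orientation, but the spatial grid itself is not rotated here and no trilinear interpolation is done, both simplifications relative to the paper):

```python
import numpy as np

def sift_descriptor(mag, ori, cy, cx, theta, grid=4, cell=4, nbins=8):
    """4x4 array of 8-bin orientation histograms over 4x4-pixel cells
    around (cy, cx); orientations are measured relative to theta.
    Normalised, clamped at 0.2, and renormalised as in the paper."""
    half = grid * cell // 2                       # 8-pixel half-width
    hist = np.zeros((grid, grid, nbins))
    sigma = half                                  # Gaussian weight scale
    for dy in range(-half, half):
        for dx in range(-half, half):
            y, x = cy + dy, cx + dx
            if not (0 <= y < mag.shape[0] and 0 <= x < mag.shape[1]):
                continue
            w = np.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
            rel = (ori[y, x] - theta) % (2 * np.pi)
            b = int(nbins * rel / (2 * np.pi)) % nbins
            hist[(dy + half) // cell, (dx + half) // cell, b] += w * mag[y, x]
    d = hist.ravel()                              # length 4*4*8 = 128
    n = np.linalg.norm(d)
    if n > 0:
        d = d / n
        d = np.minimum(d, 0.2)                    # clamp large entries
        d = d / np.linalg.norm(d)                 # renormalise
    return d
```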

Strengths & Weaknesses
The SIFT feature is invariant to image translation, rotation, intensity changes, and scale changes, and is robust to moderate perspective changes. In my experience, the performance may depend on some important threshold values used in the detection; I need more practice with this method.

Part Two: Test the Performance of Each Feature Descriptor and Evaluate Different Feature Matching Methods

The average AUC values for each type of feature with the different matching methods are as follows:
    Harris detector + Window descriptor + SSD matching:   0.45 (graf)  0.68 (leuven)  0.67 (wall)  0.57 (bikes)

    Harris detector + Window descriptor + Ratio matching: 0.55 (graf)  0.78 (leuven)  0.79 (wall)  0.60 (bikes)

    SIFT operator + SIFT descriptor + SSD matching:       0.57 (graf)  0.76 (leuven)  0.79 (wall)  0.78 (bikes)

    SIFT operator + SIFT descriptor + Ratio matching:     0.64 (graf)  0.84 (leuven)  0.83 (wall)  0.73 (bikes)

As we can see, the SIFT descriptor is better than the simple window descriptor, and the ratio matching method works better than SSD matching.
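The two matching methods compared here can be sketched as follows (each score is oriented so that lower is better, which is what an ROC/AUC evaluation consumes; names are illustrative):

```python
import numpy as np

def match_features(desc1, desc2, method='ratio'):
    """Match each row of desc1 to its nearest neighbour in desc2.
    Returns (index_in_desc2, score) per row; lower score = better match.
    'ssd'   : score is the SSD to the nearest neighbour.
    'ratio' : score is SSD(1st NN) / SSD(2nd NN)."""
    matches = []
    for d in desc1:
        ssd = np.sum((desc2 - d) ** 2, axis=1)
        order = np.argsort(ssd)
        if method == 'ssd':
            score = ssd[order[0]]
        else:
            score = ssd[order[0]] / max(ssd[order[1]], 1e-12)
        matches.append((int(order[0]), float(score)))
    return matches
```

The ratio score rejects ambiguous matches: a feature whose best and second-best candidates are nearly equally close gets a score near 1 even if both SSDs are small, which is consistent with ratio matching beating plain SSD in the AUC numbers above.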

From the above AUC values, both descriptors seem to work well when the images are mainly just translated and rotated, such as the wall and Yosemite test images, whereas neither descriptor works well when there are affine and scale changes between the images, as in the graf test images.

The two sets of 6 ROC curves are displayed below:
[Figure: ROC curves for graf]
The above set of 6 ROC curves is computed from the graf test images (img1.ppm and img2.ppm). The next set is computed from the Yosemite test images (Yosemite1.jpg and Yosemite2.jpg).
[Figure: ROC curves for Yosemite]