INDRIYATI ATMOSUKARTO
CSE576 PROJECT 1: FEATURE DETECTION AND MATCHING
THURSDAY, 14 APRIL 2005

Project objective
Detect discriminating features in an image (in this case edges and corners) and find the best matching features in other images.
The features should be invariant to translation, rotation, illumination and scale.

Feature Detection
Identify interest points in the image using the Harris corner detection method
Algorithm:
    Convert image to greyscale
    Apply Gaussian convolution to blur the image and remove noise
    Calculate gradient of image in x and y direction for every pixel
    For each point in the image, consider a 3x3 square window of pixels around that point.
            Compute the Harris matrix H for that point, defined as

                H = [ sum(Ix^2)    sum(Ix*Iy) ]
                    [ sum(Ix*Iy)   sum(Iy^2)  ]

            where each summation is over all pixels in the window and Ix, Iy are the image gradients.
            Compute the corner strength function c(H) = det(H) / trace(H)
    Choose points whose c(H) is above the threshold and is a local maximum in a 10x10 neighborhood.
    These points will be called the feature points
    (Current threshold is set at 0.001; a code sketch of this detection loop follows)
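
A minimal sketch of this detection loop in C++. The gradient images Ix and Iy are assumed to have been computed already; the container types and function names here are hypothetical stand-ins, not the actual project code:

    #include <vector>

    struct Point { int x, y; };

    // Corner strength c(H) = det(H) / trace(H), from the window sums
    // a = sum(Ix^2), b = sum(Iy^2), d = sum(Ix*Iy).
    double cornerStrength(double a, double b, double d) {
        double det = a * b - d * d;
        double trace = a + b;
        return (trace > 0.0) ? det / trace : 0.0;
    }

    std::vector<Point> detectFeatures(const std::vector<std::vector<double>>& Ix,
                                      const std::vector<std::vector<double>>& Iy,
                                      double threshold = 0.001) {
        int h = (int)Ix.size(), w = (int)Ix[0].size();
        std::vector<std::vector<double>> c(h, std::vector<double>(w, 0.0));
        // Harris matrix entries summed over a 3x3 window around each pixel.
        for (int y = 1; y < h - 1; ++y)
            for (int x = 1; x < w - 1; ++x) {
                double a = 0, b = 0, d = 0;
                for (int dy = -1; dy <= 1; ++dy)
                    for (int dx = -1; dx <= 1; ++dx) {
                        double ix = Ix[y + dy][x + dx], iy = Iy[y + dy][x + dx];
                        a += ix * ix;  b += iy * iy;  d += ix * iy;
                    }
                c[y][x] = cornerStrength(a, b, d);
            }
        // Keep points above the threshold that are also local maxima
        // in a 10x10 (here +/-5) neighborhood.
        std::vector<Point> features;
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                if (c[y][x] < threshold) continue;
                bool isMax = true;
                for (int dy = -5; dy <= 5 && isMax; ++dy)
                    for (int dx = -5; dx <= 5 && isMax; ++dx) {
                        int ny = y + dy, nx = x + dx;
                        if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                        if (c[ny][nx] > c[y][x]) isMax = false;
                    }
                if (isMax) features.push_back({x, y});
            }
        return features;
    }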

Feature description
1. A simple window descriptor
    For each feature point:
       Take a 45x45 window centered at the feature point
       Normalise the colour of each pixel in the window
       Build a 9x9 feature descriptor by pyramid sampling the window at every 5 pixels,
           summing the R, G and B values of every group of 5 pixels separately with linear weights applied to the 5 points.
       The descriptor stores the RGB values of the sampled points, so in total it has 9x9x3 dimensions (see the sketch after this list).
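
A sketch of the per-channel sampling and normalisation, assuming the 45x45 patch has already been extracted. A plain 5x5 box average stands in here for the linearly weighted pyramid sampling described above, and zero-mean/unit-variance normalisation is one reasonable reading of "normalise the colour":

    #include <cmath>
    #include <vector>

    // Downsample one 45x45 colour channel to 9x9 by averaging each 5x5
    // block, then normalise to zero mean and unit variance so the values
    // are less sensitive to illumination changes.
    std::vector<double> describeChannel(const double patch[45][45]) {
        std::vector<double> desc;
        for (int by = 0; by < 9; ++by)
            for (int bx = 0; bx < 9; ++bx) {
                double sum = 0.0;
                for (int dy = 0; dy < 5; ++dy)
                    for (int dx = 0; dx < 5; ++dx)
                        sum += patch[by * 5 + dy][bx * 5 + dx];
                desc.push_back(sum / 25.0);
            }
        double mean = 0.0;
        for (double v : desc) mean += v;
        mean /= desc.size();
        double var = 0.0;
        for (double v : desc) var += (v - mean) * (v - mean);
        double sd = std::sqrt(var / desc.size());
        if (sd > 0.0)
            for (double& v : desc) v = (v - mean) / sd;
        return desc;
    }

Calling this once per R, G and B channel and concatenating the results gives the 9x9x3 = 243-dimensional descriptor.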
  
2. A simplified version of MOPS  
    For each feature point:
       Calculate the principal gradient direction of the point by using its Harris matrix to find the eigenvectors and eigenvalues of the matrix
       Take a 45x45 window centered at the feature point
          Rotate the window using the feature point as the pivot of rotation and the principal gradient direction as the angle of rotation
          Build a 9x9 feature descriptor by pyramid sampling the rotated window at every 5 pixels,
                summing the R, G and B values of every group of 5 pixels with linear weights applied to the 5 points
          The feature descriptor has 9x9x3 dimensions (a sketch of the orientation and rotation steps follows this list)
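
A sketch of the orientation and rotation steps. principalAngle uses the closed form for the eigenvector angle of a symmetric 2x2 matrix; the Image type with its pixelOrZero accessor is a hypothetical stand-in for whatever image class is actually used:

    #include <cmath>

    // Angle of the dominant eigenvector of the symmetric 2x2 Harris matrix
    // [a d; d b], used as the principal gradient direction.
    double principalAngle(double a, double b, double d) {
        return 0.5 * std::atan2(2.0 * d, a - b);
    }

    // Fill a 45x45 window around (cx, cy), rotating each window coordinate
    // by -angle about the feature point before reading the image
    // (nearest-neighbour lookup; out-of-bounds pixels read as zero).
    template <typename Image>
    void rotatedWindow(const Image& image, int cx, int cy, double angle,
                       double out[45][45]) {
        double c = std::cos(-angle), s = std::sin(-angle);
        for (int wy = -22; wy <= 22; ++wy)
            for (int wx = -22; wx <= 22; ++wx) {
                int sx = cx + (int)std::lround(c * wx - s * wy);
                int sy = cy + (int)std::lround(s * wx + c * wy);
                out[wy + 22][wx + 22] = image.pixelOrZero(sx, sy);
            }
    }

The sampled window can then go through the same pyramid sampling as the simple window descriptor.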

* I also tried a simpler version of the SIFT feature (the code is still there, the function just isn't called).
   I didn't include it because the benchmark test showed it performing worse than simplified MOPS,
   so I chose simplified MOPS as my second descriptor.
    For each feature point,
          Calculate the principal gradient direction of the point by using its Harris matrix to find the eigenvectors and eigenvalues of the matrix
          (reusing the window that was used to build the Harris matrix)
          For each pixel in that window, find its individual gradient by taking its derivatives
          Calculate the difference between each window pixel's gradient direction and the principal gradient direction
          Bin the difference into one of 8 bins, where bin 0 represents a 0 to 45 degree difference from the principal gradient, bin 1 is 45 to 90 degrees, and so on
          In total this feature vector has 8 dimensions (see the sketch after this list)
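
A sketch of the binning step, assuming the per-pixel gradients of the window have been flattened into two arrays (the container choice is an assumption):

    #include <array>
    #include <cmath>
    #include <vector>

    // Bin each window pixel's gradient direction, relative to the principal
    // direction, into 8 bins of 45 degrees each (bin 0: 0-45, bin 1: 45-90, ...).
    std::array<int, 8> orientationHistogram(const std::vector<double>& Ix,
                                            const std::vector<double>& Iy,
                                            double principalAngle) {
        const double PI = 3.14159265358979323846;
        std::array<int, 8> bins{};
        for (std::size_t i = 0; i < Ix.size(); ++i) {
            double diff = std::atan2(Iy[i], Ix[i]) - principalAngle;
            // Wrap the difference into [0, 2*pi), then cut into 45-degree slots.
            diff = std::fmod(std::fmod(diff, 2 * PI) + 2 * PI, 2 * PI);
            int bin = (int)(diff / (PI / 4));
            if (bin > 7) bin = 7;  // guard against rounding at exactly 360 degrees
            bins[bin]++;
        }
        return bins;
    }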

Feature matching
1. Use Euclidean distance to calculate the distance between two feature descriptors and apply a threshold.
    This matching measure is not very good because the threshold is subjective (finding a perfect threshold is an art)
    and it actually needs to be retuned for each dataset, so I wouldn't recommend it at all.

2. Use Euclidean distance to calculate the distance between two feature descriptors and then apply a threshold on the
    ratio of the best match distance to the second-best match distance.
    This matching measure is much better than applying a direct threshold. We still use a threshold here, but it is a ratio threshold,
    and it gives a pretty good result for all the test cases without tweaking it here and there. It is currently set at 0.5, which
    means a match is accepted only when the second-best distance is at least twice the best distance.
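
A sketch of the ratio-test matching; the 0.5 ratio comes from above, while the container types and names are hypothetical:

    #include <cmath>
    #include <limits>
    #include <vector>

    // Euclidean distance between two descriptors of equal length.
    double dist(const std::vector<double>& a, const std::vector<double>& b) {
        double s = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i)
            s += (a[i] - b[i]) * (a[i] - b[i]);
        return std::sqrt(s);
    }

    // For each query descriptor, find its best and second-best matches in
    // the target set and accept the best only when best/secondBest < ratio.
    // Returns the index of the accepted match, or -1 for no match.
    std::vector<int> matchFeatures(const std::vector<std::vector<double>>& query,
                                   const std::vector<std::vector<double>>& target,
                                   double ratio = 0.5) {
        std::vector<int> matches(query.size(), -1);
        for (std::size_t i = 0; i < query.size(); ++i) {
            double best = std::numeric_limits<double>::max(), second = best;
            int bestIdx = -1;
            for (std::size_t j = 0; j < target.size(); ++j) {
                double d = dist(query[i], target[j]);
                if (d < best)        { second = best; best = d; bestIdx = (int)j; }
                else if (d < second) { second = d; }
            }
            if (second > 0.0 && best / second < ratio)
                matches[i] = bestIdx;
        }
        return matches;
    }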

* I also tried the sum of absolute differences, but its performance wasn't much different from Euclidean distance.
    I also tried normalized cross-correlation, but that didn't do well at all; it could have been a mistake in my implementation, though.

Why make such a descriptor?
Simple window
A window descriptor is the simplest possible descriptor. The window descriptor is normalised so that the feature descriptor is more or less
invariant to changes in illumination. Sampling from a bigger window allows for some pixel inaccuracy.

Rotated window
To create a rotation-invariant feature descriptor we have to take into account the relative orientation of the features and align
the features so that we can match them across images.

Performance on benchmark images
Average pixel error:

    Testcase    Feature Descriptor 1    Feature Descriptor 2     SIFT
    Leuven             51.23                  303.65             12.48
    Graf              284.83                  201.24            161.5
    Bike              204.05                  217.34             24.91
    Wall              195.67                  262.08             81.37

Note that this performance measure depends on a few factors: the repeatability of the feature detector, the descriptor, and the matching.
Also, averaging may not be very fair: if you look at the first few image pairs (image1-image2, image1-image3,
image1-image4) the simple descriptor does not do badly, but for the last two image pairs, where the transformation is
extreme, there is a huge jump in error, and the average glosses over this. Also, for panoramic
stitching applications I don't think there would be any extreme transformations between images.
To get the SIFT averages I ran testSIFTMatch for each of the 5 image pairs (1-2, 1-3, 1-4, 1-5, 1-6) and averaged them.

The simple window descriptor, which was assumed to only do well on translated images, can actually handle
some degree of rotation, sometimes even better than the so-called rotation-invariant descriptor.
For the Leuven test case, the simple window descriptor performs well because the Leuven images have illumination changes
and very little transformation, and since the window descriptor is normalised, the change of illumination is taken care of.

Strengths and weaknesses
Strengths:
The simple window descriptor is invariant to translation, and since each pixel in the feature window is normalised,
the descriptor is also invariant to illumination. In addition, sampling every 5 pixels instead of taking
the whole bigger window allows for some pixel inaccuracy.
The rotated window descriptor is partially invariant to rotation.

Weaknesses:
The descriptors do not perform as well as SIFT features.
The descriptors are not invariant to scale changes or major rotations.

Pictures
Yongjoon, Seth and I went up to level 6 to take these pictures with a tripod.