Date released: Monday, January 28, 2008
Date due: Sunday, February 17, 2008, 11:59pm
Late policy: 5% off per day late, until Wednesday 02/20/2008
Download Indri's slides (results from her project).
In this project, you will write code to detect discriminating features in an image and find the best matching features in other images. Your features should be reasonably invariant to translation, rotation, and illumination, and (to a lesser degree) scale, and you'll evaluate their performance on a suite of benchmark images. Scale is less important because it's a lot harder - however, any descriptor that is invariant to the other factors will be slightly scale invariant as well. You're not expected to implement SIFT!
To help you visualize the results and debug your program, we provide a working user interface that displays detected features and best matches in other images. We also provide sample feature files that were generated using SIFT, the current best of breed technique in the vision community, for comparison.
The project has three parts: feature detection, description,
and matching.
The Harris operator and descriptors work with grayscale images, so the
input images will need to be converted. The skeleton code has the
function ConvertToGrey in features.cpp to convert the images to
grayscale.
In this step, you will identify points of interest in the image using the Harris corner detection method.
For each point in the image, consider a 5x5 window of pixels around that point. Compute the Harris matrix M for that point, defined as

M = \sum_{(x,y) \in W} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}

where W is the 5x5 window, I_x and I_y are the image gradients in the x and y directions, and w(x,y) is a weighting function (e.g., a Gaussian).
To find interest points, first compute the corner response R:

R = \det(M) - k\,(\operatorname{trace} M)^2

(Try k = 0.05.)
Once you've computed R for every point in the image, choose points
where R is above a threshold (try threshold = 0.001). You also want
each chosen point to be a local maximum of R in at least a 3x3
neighborhood.
To calculate the trace and determinant of M, you can find the formula
in slide 23 of the Interest Operator lecture; for a 2x2 matrix
\begin{bmatrix} a & b \\ b & c \end{bmatrix}, the trace is a + c and
the determinant is ac - b^2.
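For concreteness, here is a minimal sketch of this computation (not the skeleton's actual interface): it assumes the grayscale image is stored as a row-major std::vector<float> of size w*h, uses central differences for the gradients, and leaves border pixels at 0.

// Minimal Harris response sketch; layout and names are assumptions.
#include <vector>

std::vector<float> computeHarrisResponse(const std::vector<float>& img,
                                         int w, int h, float k = 0.05f)
{
    std::vector<float> R(w * h, 0.0f);
    auto at = [&](int x, int y) { return img[y * w + x]; };

    for (int y = 3; y < h - 3; y++) {
        for (int x = 3; x < w - 3; x++) {
            // Accumulate the Harris matrix M = [a b; b c] over a 5x5 window.
            float a = 0, b = 0, c = 0;
            for (int dy = -2; dy <= 2; dy++) {
                for (int dx = -2; dx <= 2; dx++) {
                    int px = x + dx, py = y + dy;
                    // Central-difference gradients I_x and I_y.
                    float ix = 0.5f * (at(px + 1, py) - at(px - 1, py));
                    float iy = 0.5f * (at(px, py + 1) - at(px, py - 1));
                    a += ix * ix;
                    b += ix * iy;
                    c += iy * iy;
                }
            }
            // Corner response R = det(M) - k * trace(M)^2.
            R[y * w + x] = (a * c - b * b) - k * (a + c) * (a + c);
        }
    }
    return R;
}

A uniform window weighting is used here for simplicity; the skeleton's Gaussian convolution function can supply the weighting w(x,y) instead.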
Now that you've identified points of interest, the next step is to
come up with a descriptor for the feature centered at each
interest point. This descriptor will be the representation you'll
use to compare features in different images to see if they match.
1. Simple descriptor
Use a small square window (say 5x5) as the
feature descriptor. This should be very easy to implement and
should work well when the images you're comparing are related by a
translation. You can test this simple descriptor on the images in the
easy dataset (http://cat.middlebury.edu/stereo/data.html).
You might also want to try to normalize the brightness values by
dividing by the length of the descriptor vector (square root of the sum
of the squares).
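A sketch of what this simple descriptor might look like, including the optional brightness normalization (the same row-major float image layout as above is assumed, and the function name is illustrative; the caller must keep the 5x5 window inside the image):

#include <cmath>
#include <vector>

std::vector<float> extractSimpleDescriptor(const std::vector<float>& img,
                                           int w, int x, int y)
{
    std::vector<float> desc;
    desc.reserve(25);
    // The descriptor is just the 25 raw pixel values around (x, y).
    for (int dy = -2; dy <= 2; dy++)
        for (int dx = -2; dx <= 2; dx++)
            desc.push_back(img[(y + dy) * w + (x + dx)]);

    // Normalize by the descriptor's length (square root of the sum of
    // squares) so comparisons are less sensitive to brightness scale.
    float len = 0;
    for (float v : desc) len += v * v;
    len = std::sqrt(len);
    if (len > 0)
        for (float& v : desc) v /= len;
    return desc;
}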
2. Advanced descriptor
You want the descriptor to be robust to changes in position,
orientation (i.e. rotation) and illumination. One way to do this is to
detect the dominant orientation of the pixels in a square window. We
will implement a very simple version of SIFT.
Given a square window (say 9x9), compute the gradient magnitude and
orientation of each pixel in the window. The magnitude m and direction
\theta of a pixel gradient can be calculated as follows:

m(x,y) = \sqrt{I_x(x,y)^2 + I_y(x,y)^2}, \qquad \theta(x,y) = \tan^{-1}\!\left(\frac{I_y(x,y)}{I_x(x,y)}\right)
Create a histogram of the gradient directions of all the pixels in the
window (say a 36-bin histogram, so that each bin covers a 10-degree
increment). Accumulate the histogram bins by the magnitude of the pixel
gradient. The dominant orientation of the window corresponds to the bin
with the highest count in the histogram.
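A sketch of this dominant-orientation computation under the same assumed image layout (the window size, bin mapping, and function name are illustrative; the caller must keep the window, plus a 1-pixel gradient border, inside the image):

#include <cmath>
#include <vector>

float dominantOrientation(const std::vector<float>& img, int w, int x, int y)
{
    const float PI = 3.14159265358979f;
    float hist[36] = {0};
    for (int dy = -4; dy <= 4; dy++) {
        for (int dx = -4; dx <= 4; dx++) {
            int px = x + dx, py = y + dy;
            // Central-difference gradients, then magnitude and direction.
            float ix = 0.5f * (img[py * w + px + 1] - img[py * w + px - 1]);
            float iy = 0.5f * (img[(py + 1) * w + px] - img[(py - 1) * w + px]);
            float mag   = std::sqrt(ix * ix + iy * iy);
            float theta = std::atan2(iy, ix);            // in [-pi, pi]
            // Map the direction into one of 36 bins of 10 degrees each.
            int bin = (int)((theta + PI) * 36 / (2 * PI));
            if (bin == 36) bin = 35;                     // guard theta == pi
            hist[bin] += mag;                            // magnitude-weighted vote
        }
    }
    // The dominant orientation is the center of the fullest bin.
    int best = 0;
    for (int i = 1; i < 36; i++)
        if (hist[i] > hist[best]) best = i;
    return (best + 0.5f) * (2 * PI / 36) - PI;
}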
Rotate the descriptor window according to the dominant direction
with the rotation matrix

\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}

The goal is to find the pixel location (u,v) that corresponds to the
pixel location (x,y) rotated by the dominant direction, where the
angle \theta is measured counterclockwise.
Note that the interest point, which is the center of the descriptor
window, has to be at the origin (0,0) when doing the rotation. So you
will first have to translate the descriptor window so that the interest
point (center of the descriptor window) is at the origin (0,0), rotate
the window by the dominant direction, and then translate the window
back to its original position.
However, the rotated grid may not be perfectly aligned with the pixel
grid. To find values between the pixel samples, use bilinear
interpolation: the value of pixel p is a weighted average of the values
of its four closest pixels:

f(x,y) = (1-a)(1-b)\,f(i,j) + a(1-b)\,f(i+1,j) + (1-a)b\,f(i,j+1) + ab\,f(i+1,j+1)

where i = \lfloor x \rfloor, j = \lfloor y \rfloor, a = x - i, and b = y - j.
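Here is a sketch of how the rotation and interpolation might fit together: the bilinear helper implements the formula above, and the sampling routine folds the translate-rotate-translate steps into one coordinate computation. Names and the row-major float layout are assumptions, and the caller must keep all sampled coordinates inside the image.

#include <cmath>
#include <vector>

// Bilinearly interpolated image value at the non-integer location (x, y).
float bilinear(const std::vector<float>& img, int w, float x, float y)
{
    int i = (int)std::floor(x), j = (int)std::floor(y);
    float a = x - i, b = y - j;
    return (1 - a) * (1 - b) * img[j * w + i]
         + a * (1 - b)       * img[j * w + i + 1]
         + (1 - a) * b       * img[(j + 1) * w + i]
         + a * b             * img[(j + 1) * w + i + 1];
}

// Sample a (2r+1)x(2r+1) descriptor window centered at (cx, cy),
// rotated by theta (counterclockwise), one interpolated value per cell.
std::vector<float> sampleRotatedWindow(const std::vector<float>& img,
                                       int w, float cx, float cy,
                                       float theta, int r)
{
    std::vector<float> desc;
    float c = std::cos(theta), s = std::sin(theta);
    for (int v = -r; v <= r; v++) {
        for (int u = -r; u <= r; u++) {
            // Rotate the offset (u, v) about the window center, then
            // translate back: the translate-rotate-translate recipe.
            float x = cx + c * u - s * v;
            float y = cy + s * u + c * v;
            desc.push_back(bilinear(img, w, x, y));
        }
    }
    return desc;
}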
Now that you've detected and described your features, the next step
is to write code to match them, i.e., given a feature in one image,
find the best matching feature in one or more other images.
The skeleton code provided finds the SSD between all feature
descriptors in a pair of images. The code declares a match between each
feature and its best match (nearest neighbor) in the second image.
For each feature in the first image, use SSD to find the best match
(or no match) in the second image. The idea here is to find a good
threshold for deciding whether a match exists. To do this, compute the
ratio (score of the best feature match)/(score of the second-best
feature match), and threshold on that ratio, as in the sketch below.
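A sketch of this ratio test (the Feature struct and the 0.6 threshold are illustrative assumptions, not the skeleton's actual types or values): for each feature in the first image, find its two nearest neighbors in the second image by SSD and keep the match only when best/secondBest falls below the threshold.

#include <limits>
#include <vector>

struct Feature { std::vector<float> desc; };

// Sum of squared differences; both descriptors must have equal length.
static float ssd(const std::vector<float>& a, const std::vector<float>& b)
{
    float d = 0;
    for (size_t i = 0; i < a.size(); i++) {
        float diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// Returns, for each feature in f1, the index of its match in f2,
// or -1 when the ratio test rejects the match.
std::vector<int> ratioMatch(const std::vector<Feature>& f1,
                            const std::vector<Feature>& f2,
                            float ratioThreshold = 0.6f)
{
    std::vector<int> matches(f1.size(), -1);
    for (size_t i = 0; i < f1.size(); i++) {
        float best = std::numeric_limits<float>::max();
        float second = best;
        int bestIdx = -1;
        for (size_t j = 0; j < f2.size(); j++) {
            float d = ssd(f1[i].desc, f2[j].desc);
            if (d < best)        { second = best; best = d; bestIdx = (int)j; }
            else if (d < second) { second = d; }
        }
        if (bestIdx >= 0 && second > 0 && best / second < ratioThreshold)
            matches[i] = bestIdx;
    }
    return matches;
}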
Now you're ready to go! Using the UI and skeleton code that we provide, or your own Matlab code, you can load in a set of images, view the detected features, and visualize the feature matches that your algorithm computes. Matlab users may want to scope out the C++ code for tips on comparing the features.
We are providing a set of benchmark images to be used to test the performance of your algorithm as a function of different types of controlled variation (i.e., rotation, scale, illumination, perspective, blurring). For each of these images, we know the correct transformation and can therefore measure the accuracy of each of your feature matches. This is done using a routine that we supply in the skeleton code.
Download some image sets: leuven, bikes, wall
Included with these images are the homography files that specify the correct transformations between them.
Get access to FLTK.
(Here are local copies for Windows
and Linux, but see below about
using it on Linux.) **Disclaimer:** the skeleton
code was designed to be used with FLTK-1; it has not been tested with
FLTK-2.
On Windows you'll need to install it yourself. (If you unzip FLTK to
somewhere other than the directory above the project, you'll have
to change the project settings
to look for the include and library files in the correct location.)
If you're using Linux on one of the CSE department machines, you don't
need to download FLTK, since you can just use the libraries in
/cse/courses/cse557/fltk-1.1.2. The Makefile included in the zip
expects to find FLTK in /uns. If you already have it or are installing
it in /usr/local, here are FLTK installation instructions and an
example of the Makefile for the skeleton code (you will need to change
the paths to suit your own setup).
Download the C++ skeleton code:
The skeleton classes that need to be edited are in the file features.cpp.
The starting points for editing are the functions computeFeatures and
matchFeatures. computeFeatures performs feature detection and feature
description; matchFeatures performs the feature matching. Currently
computeFeatures calls a dummy function called dummyComputeFeatures, and
matchFeatures calls a dummy function called dummyMatchFeatures.
1. For feature detection, you will need to modify the computeHarris method. The skeleton
code provides functions to convert the image to grayscale and to
convolve the image with a Gaussian mask.
2. For feature description, you will need to write two different extractDescriptor methods that
implement the two different ways of representing your features, and
modify computeFeatures to
call the two different extractDescriptor
methods.
3. For feature matching, you will need to write one function for the matchFeatures function
that implements the ratio-threshold matching using SSD.
After compiling and linking the skeleton code, you will have an executable cse576. This can be run in several ways:
cse576 computeFeatures imagefile featurefile [featuretype]
uses your feature detection routine to compute the features for imagefile,
and writes them to featurefile. featuretype specifies
which of your (at least two) types of features to compute.
cse576 testMatch featurefile1 featurefile2 homographyfile
[matchtype]
uses your feature matching routine to match the features in featurefile1
with the features in featurefile2. homographyfile
contains the correct transformation between the points in the two
images, specified by a 3-by-3 matrix. matchtype
specifies which of your matching algorithms to
use. Returns the average pixel error between each matched feature and
the true transformed point.
cse576 testSIFTMatch featurefile1 featurefile2 homographyfile
[matchtype]
is the same as above, but uses the SIFT file format.
cse576 benchmark imagedir [featuretype matchtype]
tests your feature finding and matching on all of the images in one of
the benchmark sets above. imagedir is the directory containing the
image (and homography) files. This command will return the average
pixel error when matching the first image in the set with each of the
other five images.
In addition to your source code and executable, turn in a report describing your approach and results. In particular:
This report can be a Word or PDF document.
Email Indri your writeup and source code by Sunday, February 17, 2008,
11:59pm. Zip up your report, source code, and images into a file with
your name as the filename, e.g., JohnDoe.zip.