Date released: Mon, April 16, 2012
Date due: Wed, May 2, 2012 11:59pm
Late policy: 5% off per day; late submissions accepted until Fri, May 4, 2012
You can take a look at Indri's slides (results from her project) or previous years' Artifacts
In this project, you will write code to detect discriminating features in an image and find the best matching features in other images. Your features should be reasonably invariant to translation, rotation, and illumination, and (to a lesser degree) scale, and you'll evaluate their performance on a suite of benchmark images. Scale invariance matters less because it is much harder to achieve; however, any descriptor that is invariant to the other factors will be slightly scale invariant as well. You're not expected to implement SIFT!
To help you visualize the results and debug your program, we provide a working user interface that displays detected features and best matches in other images. We also provide sample feature files that were generated using SIFT, the current best of breed technique in the vision community, for comparison.
The project has three parts: feature detection, description, and
matching.
The Harris operator and descriptors work with grayscale images, so the input images will need to be converted. The skeleton code provides the function ConvertToGrey in features.cpp to convert the images to grayscale.
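If you are writing your own standalone code instead, grayscale conversion is a one-liner per pixel. Here is a minimal C++ sketch, assuming a simple interleaved-RGB float image rather than the skeleton's image class (the function name rgbToGray is illustrative):

#include <vector>

// Convert an interleaved RGB image (values in [0, 1]) to grayscale using
// the standard luminance weights. The image representation here is a
// simplification; the skeleton's ConvertToGrey works on its own image class.
std::vector<float> rgbToGray(const std::vector<float>& rgb, int width, int height)
{
    std::vector<float> gray(width * height);
    for (int i = 0; i < width * height; i++) {
        gray[i] = 0.299f * rgb[3 * i]       // red
                + 0.587f * rgb[3 * i + 1]   // green
                + 0.114f * rgb[3 * i + 2];  // blue
    }
    return gray;
}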
In this step, you will identify points of interest in the image using the Harris corner detection method.
For each point in the image, consider a 5x5 window of pixels around that point. Compute the Harris matrix M for that point, defined as

M = \sum_{(x_k, y_k) \in W} w(x_k, y_k) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}

where the summation is over all pixels (x_k, y_k) in the window W centered at location (x, y), and the gradients are evaluated at those pixels. For the weights w(x_k, y_k) you can use a 5x5 Gaussian mask. I_x and I_y indicate the x- and y-gradients of a pixel, which you can approximate using masks, such as the Sobel operator.
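To make the window computation concrete, here is a C++ sketch that accumulates the three distinct entries of M at one pixel. It assumes precomputed gradient images ix and iy stored as flat row-major float arrays and a 5x5 Gaussian mask gauss; none of these names come from the skeleton code:

// Accumulate the Harris matrix entries a = sum w*Ix^2, b = sum w*Ix*Iy,
// c = sum w*Iy^2 over the 5x5 window centered at (x, y).
// ix, iy: gradient images (row-major, width w, height h); gauss: 5x5 mask.
void harrisMatrix(const float* ix, const float* iy, int w, int h,
                  const float gauss[5][5], int x, int y,
                  float& a, float& b, float& c)
{
    a = b = c = 0.0f;
    for (int dy = -2; dy <= 2; dy++) {
        for (int dx = -2; dx <= 2; dx++) {
            int xi = x + dx, yi = y + dy;
            if (xi < 0 || xi >= w || yi < 0 || yi >= h) continue; // skip pixels outside the image
            float wgt = gauss[dy + 2][dx + 2];
            float gx = ix[yi * w + xi];
            float gy = iy[yi * w + xi];
            a += wgt * gx * gx;
            b += wgt * gx * gy;
            c += wgt * gy * gy;
        }
    }
}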
To find interest points, first compute the corner response

R = \det(M) - k \, (\operatorname{trace}(M))^2

(Try k = 0.05.)
Once you've computed R for every point in the image, choose points where R
is above a threshold (Try threshold = 0.001). You also want it to be a
local maximum in at least a 3x3 neighborhood.
To calculate the trace and determinant of M, you can find the formula in slide
20 of the Harris lecture.
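Putting the detection steps together, here is a C++ sketch of the response and selection computation, again assuming flat row-major float arrays; the function names and layout are illustrative, not the skeleton's:

#include <utility>
#include <vector>

// Given the Harris matrix entries (a, b; b, c), the corner response is
// R = det(M) - k * trace(M)^2 = (a*c - b*b) - k * (a + c)^2.
float cornerResponse(float a, float b, float c, float k = 0.05f)
{
    return (a * c - b * b) - k * (a + c) * (a + c);
}

// Keep points whose response exceeds the threshold and is a local
// maximum in its 3x3 neighborhood.
std::vector<std::pair<int, int>> selectInterestPoints(
        const std::vector<float>& r, int w, int h, float threshold = 0.001f)
{
    std::vector<std::pair<int, int>> points;
    for (int y = 1; y < h - 1; y++) {
        for (int x = 1; x < w - 1; x++) {
            float v = r[y * w + x];
            if (v <= threshold) continue;
            bool isMax = true;
            for (int dy = -1; dy <= 1 && isMax; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    if (r[(y + dy) * w + (x + dx)] > v) { isMax = false; break; }
            if (isMax) points.push_back({x, y});
        }
    }
    return points;
}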
Now that you've identified points of interest, the next step is to come up with a descriptor for the feature centered at each interest point. This descriptor will be the representation you'll use to compare features in different images to see if they match.
1. Simple descriptor
Use a small square window (say 5x5) as the feature descriptor. This
should be very easy to implement and should work well when the images you're
comparing are related by a translation. You can test this simple descriptor on
the images in the easy dataset (http://cat.middlebury.edu/stereo/data.html)
You might also want to try to normalize the brightness values by dividing by
the length of the descriptor vector (square root of the sum of the squares).
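A minimal sketch of this simple descriptor, assuming a grayscale image stored as a flat row-major float array; clamping at the image border and the unit-length normalization are just one reasonable choice:

#include <algorithm>
#include <cmath>
#include <vector>

// Extract a 5x5 window of intensities around (x, y) as a 25-element
// descriptor, then normalize it to unit length so overall brightness
// scaling cancels out. Pixels outside the image are clamped to the border.
std::vector<float> simpleDescriptor(const std::vector<float>& gray,
                                    int w, int h, int x, int y)
{
    std::vector<float> desc;
    desc.reserve(25);
    for (int dy = -2; dy <= 2; dy++) {
        for (int dx = -2; dx <= 2; dx++) {
            int xi = std::min(std::max(x + dx, 0), w - 1);
            int yi = std::min(std::max(y + dy, 0), h - 1);
            desc.push_back(gray[yi * w + xi]);
        }
    }
    float len = 0.0f;
    for (float v : desc) len += v * v;
    len = std::sqrt(len);
    if (len > 0.0f)
        for (float& v : desc) v /= len;
    return desc;
}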
2. Advanced descriptor
You can define it however you want, but you should design it to be robust to
changes in position, orientation (i.e., rotation), and illumination. You
are welcome to use techniques described in lecture (e.g., detecting dominant
orientations, using image pyramids, using a disc instead of a square window),
or come up with your own ideas. This is the main challenge of the assignment.
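As one illustration of the dominant-orientation idea from lecture, here is a sketch that estimates an orientation at an interest point from a magnitude-weighted histogram of gradient directions; your descriptor window could then be sampled relative to this angle. All names here are illustrative, and this is one possible ingredient, not a required design:

#include <cmath>

// Estimate a dominant orientation at (x, y): vote gradient directions into
// a 36-bin histogram, weighted by gradient magnitude, over a small window,
// and return the center angle (in radians) of the strongest bin.
float dominantOrientation(const float* ix, const float* iy,
                          int w, int h, int x, int y, int radius = 4)
{
    const int kBins = 36;
    const float kPi = 3.14159265f;
    float hist[kBins] = {0};
    for (int dy = -radius; dy <= radius; dy++) {
        for (int dx = -radius; dx <= radius; dx++) {
            int xi = x + dx, yi = y + dy;
            if (xi < 0 || xi >= w || yi < 0 || yi >= h) continue;
            float gx = ix[yi * w + xi], gy = iy[yi * w + xi];
            float mag = std::sqrt(gx * gx + gy * gy);
            float ang = std::atan2(gy, gx);                  // in [-pi, pi]
            int bin = (int)((ang + kPi) / (2 * kPi) * kBins) % kBins;
            hist[bin] += mag;
        }
    }
    int best = 0;
    for (int i = 1; i < kBins; i++)
        if (hist[i] > hist[best]) best = i;
    return (best + 0.5f) * (2 * kPi / kBins) - kPi;
}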
Now that you've detected and described your features, the next step is to write code to match them, i.e., given a feature in one image, find the best matching feature in one or more other images.
The skeleton code provided finds the SSD between all feature descriptors in a pair of images. The code declares a match between each feature and its best match (nearest neighbor) in the second image.
For each feature in the first image, use SSD to find the best match (or no match) in the second image. The idea here is to find a good threshold for deciding if a match exists. To do this, compute (score of the best feature match)/(score of the second best feature match), and threshold on that.
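Here is a sketch of this matching strategy, assuming descriptors are stored as std::vector<float>; the exact ratio threshold is something you will need to tune experimentally:

#include <limits>
#include <vector>

// Sum of squared differences between two descriptors of equal length.
float ssd(const std::vector<float>& a, const std::vector<float>& b)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// For one query descriptor, find its best match among the candidates and
// accept it only if (best SSD) / (second-best SSD) is below the ratio
// threshold. Returns the index of the match, or -1 if it is rejected.
int ratioTestMatch(const std::vector<float>& query,
                   const std::vector<std::vector<float>>& candidates,
                   float ratioThreshold = 0.6f)
{
    float best = std::numeric_limits<float>::max(), second = best;
    int bestIdx = -1;
    for (std::size_t i = 0; i < candidates.size(); i++) {
        float d = ssd(query, candidates[i]);
        if (d < best) { second = best; best = d; bestIdx = (int)i; }
        else if (d < second) { second = d; }
    }
    if (bestIdx < 0 || second <= 0.0f) return -1; // degenerate case: no usable ratio
    return (best / second < ratioThreshold) ? bestIdx : -1;
}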
Now you're ready to go! Using the UI and skeleton code that we provide, or your own MATLAB code, you can load in a set of images, view the detected features, and visualize the feature matches that your algorithm computes. MATLAB users may want to scope out the C++ code for tips on comparing the features.
We are providing a set of benchmark images to be used to test the performance of your algorithm as a function of different types of controlled variation (i.e., rotation, scale, illumination, perspective, blurring). For each of these images, we know the correct transformation and can therefore measure the accuracy of each of your feature matches. This is done using a routine that we supply in the skeleton code.
1. Download some image sets: leuven, bikes, wall, graf. Included with these images are the homography files that specify the correct transformations between them.
2. Download the C++ skeleton code:
The skeleton classes that need to be edited are in
file features.cpp.
The starting points for editing the classes are in functions: computeFeatures
and matchFeatures. computeFeatures is the function that performs
feature detection and feature description. matchFeatures is the function
that performs the feature matching. Currently computeFeatures is calling
a dummy function called dummyComputeFeatures, and matchFeatures
is calling a dummy function called dummyMatchFeatures.
1. For feature detection, you will need to modify the computeHarris
method. The skeleton code provides functions to convert the image to
grayscale and to convolve the image with a Gaussian mask.
2. For feature description, you will need to write two different extractDescriptor
methods that implement the two different ways of representing your features,
and modify computeFeatures to call the two different extractDescriptor
methods.
3. For feature matching, you will need to write one function,
called from matchFeatures, that implements the ratio
threshold matching when using SSD.
If you are not using one of the CSE department machines, you need to get access to FLTK (here are local copies for Linux). If you're using Linux on one of the CSE department machines, you don't need to download FLTK, since you can just use the libraries in /cse/courses/cse557/fltk-1.1.2. The Makefile included in the zip expects to find FLTK in /uns. If you already have it or are installing it in /usr/local, here are FLTK installation instructions and an example Makefile for the skeleton code (you will need to change the paths to suit your own setup).
**Disclaimer:** the skeleton code was designed to be used with FLTK-1. It has not been tested with FLTK-2.
After compiling and linking the skeleton code, you will have an executable cse576. This can be run in several ways:
To use the GUI:
You can use the mouse buttons to select a subset of the features to use in the query, and then perform the query. Until you write your feature matching routine, the features are matched by minimizing the Euclidean distance between feature vectors.
The GUI is used to visualize the location of your feature points and the matching. You will first need to run your feature detector (using the computeFeatures argument on the command line) to compute your Harris features and save them to a file. Only then can you load your feature file and visualize it in the GUI.
· cse576 computeFeatures imagefile featurefile [featuretype]
uses your feature detection routine to compute the features for imagefile,
and writes them to featurefile. featuretype specifies which of
your (at least two) types of features to compute.
· cse576 testMatch featurefile1 featurefile2 homographyfile [matchtype]
uses your feature matching routine to match the features in featurefile1
with the features in featurefile2. homographyfile contains the
correct transformation between the points in the two images, specified by a
3-by-3 matrix. matchtype specifies your type of matching algorithm to
use. Returns the average pixel error between the matched feature and the real
transformed point.
· cse576 testSIFTMatch featurefile1 featurefile2 homographyfile [matchtype]
is the same as above, but uses the SIFT file format.
· cse576 benchmark imagedir [featuretype matchtype]
tests your feature finding and matching for all of the images in one of the
four above sets. imagedir is the directory containing the image (and
homography) files. This command will return the average pixel error when
matching the first image in the set with each of the other five images.
You are a little more on your own here; you will probably find it useful to look at the skeleton code. The key element of testing is to take two images, generate lists of feature descriptors for them, and match them. For each feature match, see how far the match is from the correctly transformed point. The MATLAB code below applies the transformation matrix (it is similar to applyHomography in the C++ code).
function [xNew, yNew] = applyHomography(x, y, h)
% Apply the 3x3 homography h to the point (x, y). Note that the matrix
% acts on coordinates in (y, x) order, matching the homography files.
d = h(3,1)*y + h(3,2)*x + h(3,3);        % perspective divisor
yNew = (h(1,1)*y + h(1,2)*x + h(1,3))./d;
xNew = (h(2,1)*y + h(2,2)*x + h(2,3))./d;
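For reference, here is an equivalent C++ sketch of the evaluation loop, mirroring the (y, x) ordering of the MATLAB snippet above; the Match struct and function names are illustrative, not the skeleton's actual types:

#include <cmath>
#include <vector>

struct Match { float x1, y1, x2, y2; }; // matched point pair (image 1 -> image 2)

// Apply a 3x3 homography stored row-major in h[9], using the same
// (y, x) ordering as the MATLAB snippet above.
void applyH(const float h[9], float x, float y, float& xNew, float& yNew)
{
    float d = h[6] * y + h[7] * x + h[8];       // perspective divisor
    yNew = (h[0] * y + h[1] * x + h[2]) / d;
    xNew = (h[3] * y + h[4] * x + h[5]) / d;
}

// Average distance between each transformed image-1 point and its matched
// image-2 point: the benchmark's "average pixel error".
float averagePixelError(const std::vector<Match>& matches, const float h[9])
{
    if (matches.empty()) return 0.0f;
    float total = 0.0f;
    for (const Match& m : matches) {
        float xt, yt;
        applyH(h, m.x1, m.y1, xt, yt);
        total += std::sqrt((xt - m.x2) * (xt - m.x2) + (yt - m.y2) * (yt - m.y2));
    }
    return total / matches.size();
}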
Download the .pdf template for the report here. You are free to use other text processing tools like LaTeX, etc.; however, make sure that you have the same sections in your report.
The grading guidelines for project 2 are here.
In addition to your source code and executable, turn in a report describing your approach and results. In particular:
The report can be a Word document or a PDF document.
Email Jia (JiaWu@cs.washington.edu) your writeup and source code by Wed, May 2, 2012 11:59pm. Zip up your report, source code, and images into a file with your name as the name of the file, e.g., JohnDoe.
For those who would like to challenge themselves, here is a list of suggestions for extending the program for a small amount of extra credit. You are encouraged to come up with your own extensions as well!
Note:
This project is slightly harder than project 1, so start early! Also, we do not expect your descriptor to beat the SIFT descriptor.