Assigned: Tue, Feb 17, 2009 Note: This is not a group project. You are expected to do your own work, but it is okay to discuss issues that arise with other students and post questions to the class mailing list.
Synopsis
Description
Skeleton Code and Dataset
To Do
What to Turn in
Bells and Whistles
Congratulations! You’ve just been hired by the internet startup Gurgle, Inc. Your boss, on seeing the “A” in your transcript for CSE 455, makes you the leader of their fledgling content-based image search effort. To show off your computer vision prowess, you decide to implement a proof-of-concept content-based image retrieval system that, given a query image, retrieves related color images from an image database. You decide to represent each image (database image/query image) by its color histogram. Using an appropriate similarity metric between color histograms of a query image and a database image, you hope to find the set of images in the database most similar to any query image.
[Example figure: a query image alongside images retrieved from the database.]
(Note: The above is just an example and not necessarily what your system should retrieve!)
The project is loosely based on the paper Color Indexing by Swain and Ballard. The project consists of two parts:
Part I: Color Indexing
A color histogram can be regarded as a signature for an image that captures the distribution of colors in the image without regard to the actual spatial locations of the colors. Suppose the color channels in the image are R, G, and B. For 8-bit encoding of intensities, the total number of possible colors is 256^3. A typical histogram quantizes these colors to a much smaller number of bins, say, 10 x 10 x 10 = 1000 bins, where each bin stands for some range of original colors. In the 10 x 10 x 10 case, the ranges would be 0 through 25 as level 1, 26 through 51 as level 2, …, up to 234 through 255 as level 10, for each channel. Each bin of the histogram represents a different combination of R, G, B levels. The RGB histogram could then be computed as:
hRGB(x, y, z) = number of pixels whose R value falls in level x, G value in level y, and B value in level z
where x, y, and z each range from 1 through 10.
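As a concrete illustration, the quantization above can be sketched in C++ (one of the allowed languages). The Pixel type, the flat pixel buffer, and the function name are illustrative, not part of the assignment:

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

struct Pixel { uint8_t r, g, b; };  // illustrative pixel type

// Quantize each 8-bit channel into 10 levels (value / 26, so 0-25 is level 1,
// ..., 234-255 is level 10; levels are stored 0-based here) and accumulate
// a 10 x 10 x 10 = 1000-bin histogram.
std::array<int, 1000> rgbHistogram(const std::vector<Pixel>& pixels) {
    std::array<int, 1000> h{};  // all bins start at zero
    for (const Pixel& p : pixels) {
        int x = std::min(p.r / 26, 9);
        int y = std::min(p.g / 26, 9);
        int z = std::min(p.b / 26, 9);
        ++h[x * 100 + y * 10 + z];  // flatten (x, y, z) into one bin index
    }
    return h;
}
```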
Color spaces
In this project, you will compare the performance of histograms in two color spaces:
· RGB space, and
· the color opponent space used by Swain and Ballard: wb = R + G + B, rg = R - G, by = 2B - R - G
For the RGB histogram, use 10 levels for each channel as above to compute a histogram with 1000 bins.
For the color opponent histogram, use 5 levels for wb and 14 levels each for rg and by, yielding 5 x 14 x 14 = 980 bins.
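A sketch of the opponent-space histogram in C++. The uniform binning of each axis over its full theoretical range (wb in [0, 765], rg in [-255, 255], by in [-510, 510]) is an assumption; the assignment does not pin down the bin boundaries, so you may choose a different quantization:

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

struct Pixel { uint8_t r, g, b; };  // illustrative pixel type

// 5 levels for wb, 14 each for rg and by: 5 x 14 x 14 = 980 bins.
std::array<int, 980> oppHistogram(const std::vector<Pixel>& pixels) {
    std::array<int, 980> h{};
    for (const Pixel& p : pixels) {
        int wb = p.r + p.g + p.b;       // range [0, 765]
        int rg = p.r - p.g;             // range [-255, 255]
        int by = 2 * p.b - p.r - p.g;   // range [-510, 510]
        // shift each axis to start at 0, then bin uniformly (assumed scheme)
        int x = std::min(wb * 5 / 766, 4);
        int y = std::min((rg + 255) * 14 / 511, 13);
        int z = std::min((by + 510) * 14 / 1021, 13);
        ++h[(x * 14 + y) * 14 + z];
    }
    return h;
}
```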
Similarity Metrics
You will compare the performance of two similarity metrics:
· Sum of squared differences (SSD) distance
· Histogram intersection
Definitions for these two metrics will be given below.
Part II: Incorporating Spatial Information
The color histogram for an image discards all information about the spatial distribution of colors. In Part II of the project, you will investigate whether providing a modest amount of spatial information improves retrieval performance.
Since the images we consider in this project are outdoor scenes, we will adopt a fairly simplistic approach to incorporating spatial information:
· Divide the image into 3 sections: the top third, the middle third, and the bottom third of the image. This roughly corresponds to dividing the image into content that is furthest away (e.g., the sky), content in the middle (e.g., waves, a boat, etc.), and content in the foreground (e.g., beach).
· Compute the color histogram for each of the 3 sections of an image. Thus, each image is represented by 3 histograms.
· When comparing images, compute the similarity between corresponding histograms for each section of the image. If s1, s2, and s3 are the similarity scores obtained for the top, middle, and bottom sections respectively, then compute the overall similarity between the two images as the average score: s = (s1 + s2 + s3) / 3.
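The sectioning step can be sketched as follows in C++. The row-major image representation is illustrative, and letting integer division decide where the extra rows go when the height is not divisible by 3 is an assumption:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pixel { uint8_t r, g, b; };  // illustrative pixel type

// Build one 1000-bin RGB histogram per section (top, middle, bottom thirds).
std::array<std::array<int, 1000>, 3> sectionHistograms(
        const std::vector<std::vector<Pixel>>& rows) {
    std::array<std::array<int, 1000>, 3> h{};
    std::size_t height = rows.size();
    for (std::size_t r = 0; r < height; ++r) {
        // 3*r/height maps row r to section 0, 1, or 2 for r < height
        std::size_t section = 3 * r / height;
        for (const Pixel& p : rows[r]) {
            int x = std::min(p.r / 26, 9);
            int y = std::min(p.g / 26, 9);
            int z = std::min(p.b / 26, 9);
            ++h[section][x * 100 + y * 10 + z];
        }
    }
    return h;
}
```

Comparing two images then reduces to scoring each of the three histogram pairs with the chosen metric and averaging the three scores.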
There is no skeleton code for this project. You will need to write your own code in C/C++/Java. You may re-use code from the previous projects if you wish (for image I/O etc.).
Downloading code from the internet is not allowed unless it is for basic image I/O. Here is a link to libraries that some students have found useful.
You will use the following dataset of images:
Download the set of images.
Download the set of image thumbnails. (The image thumbnails, in jpeg format, may be useful for your write-up.)
1) For each image in the database you will perform the following procedure:
a) Compute the color histogram in RGB space and in color-opponent (OPP) space.
b) Store the two histograms for each database image in two separate data structures that can both be used in memory and also saved to a file so you don't have to keep rerunning the computation for each query.
2) Write code to compute the similarity between any two images I1 and I2 using:
a) SSD: Given histograms h1 and h2 for the two images, the SSD distance between them is defined as:
SSD(h1, h2) = sum over x, y, z of (h1(x, y, z) - h2(x, y, z))^2
where the summation is over all the levels (see the example of levels above) for each color channel in the histogram.
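With histograms stored as flat arrays of bin counts, the triple sum over channel levels collapses to a single loop; a C++ sketch:

```cpp
#include <cstddef>
#include <vector>

// SSD distance between two histograms of equal size, stored as flat bin
// arrays. Summing over the flat index is equivalent to summing over all
// (x, y, z) level combinations.
double ssdDistance(const std::vector<int>& h1, const std::vector<int>& h2) {
    double d = 0.0;
    for (std::size_t i = 0; i < h1.size(); ++i) {
        double diff = static_cast<double>(h1[i]) - h2[i];
        d += diff * diff;
    }
    return d;
}
```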
b) Histogram intersection: The intersection between two histograms h1 and h2 is defined as:
INT(h1, h2) = [sum over x, y, z of min(h1(x, y, z), h2(x, y, z))] / #h2
where #h denotes the size of the histogram h (the total number of pixels it counts); normalizing by the database image's histogram size #h2 follows Swain and Ballard.
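A C++ sketch of the intersection score; normalizing by the second (database-image) histogram is the choice made here, following Swain and Ballard:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Normalized histogram intersection of two equal-size flat histograms.
// Returns a value in [0, 1]: 1 when h1 covers h2 completely.
double intersection(const std::vector<int>& h1, const std::vector<int>& h2) {
    long long matched = 0, total = 0;
    for (std::size_t i = 0; i < h1.size(); ++i) {
        matched += std::min(h1[i], h2[i]);
        total += h2[i];  // #h2 = total pixel count of the database image
    }
    return total == 0 ? 0.0 : static_cast<double>(matched) / total;
}
```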
3) Repeat 1) and 2) above for the spatial information case where the image is partitioned into top, middle, and bottom sections (see Part II of the project).
4) Create a query system which, given a query image Q, a color space C = RGB or OPP, a similarity metric S = SSD or INT, and spatial information SP = YES or NO, computes the similarity between Q and each image I in the database, and returns an ordered list of the images in the database and their similarity to Q.
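The query loop can be sketched in C++ as below; passing the metric in as a function lets the same code serve both metrics. Note that SSD is a distance (smaller means more similar) while intersection is a similarity (larger means more similar), so the sort direction flips:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Histogram = std::vector<int>;

// Score every database image against the query and return (name, score)
// pairs ordered from most to least similar.
std::vector<std::pair<std::string, double>> query(
        const Histogram& q,
        const std::vector<std::pair<std::string, Histogram>>& db,
        const std::function<double(const Histogram&, const Histogram&)>& metric,
        bool higherIsBetter) {  // true for intersection, false for SSD
    std::vector<std::pair<std::string, double>> ranked;
    for (const auto& entry : db)
        ranked.emplace_back(entry.first, metric(q, entry.second));
    std::sort(ranked.begin(), ranked.end(),
              [higherIsBetter](const std::pair<std::string, double>& a,
                               const std::pair<std::string, double>& b) {
                  return higherIsBetter ? a.second > b.second
                                        : a.second < b.second;
              });
    return ranked;
}
```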
precision = (number of retrieved images that are relevant) / (number of retrieved images)
recall = (number of retrieved images that are relevant) / (total number of relevant images in database)
So, for example, if you decide to output 15 possibly relevant images for any query and for query image “beach_1”, only 3 of the 15 retrieved images have the label “beach” in their filenames, then precision = 3/15 and recall = 3/5 (since there are 5 “beach” images in the database). (NOTE: We will assume sunset1 and sunset2 images in the dataset are two different categories i.e., images with sunset2 in their filenames are not considered relevant for queries involving sunset1, and vice versa).
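The precision/recall bookkeeping can be sketched as below (C++). The substring test on filenames mirrors the labeling convention above and keeps sunset1 and sunset2 separate, since neither filename contains the other's label:

```cpp
#include <string>
#include <vector>

struct PR { double precision, recall; };

// A retrieved image counts as relevant if the query's category label
// (e.g. "beach") appears in its filename.
PR precisionRecall(const std::vector<std::string>& retrieved,
                   const std::string& label, int totalRelevant) {
    int hits = 0;
    for (const std::string& name : retrieved)
        if (name.find(label) != std::string::npos) ++hits;
    double p = retrieved.empty() ? 0.0
             : static_cast<double>(hits) / static_cast<double>(retrieved.size());
    double r = totalRelevant == 0 ? 0.0
             : static_cast<double>(hits) / totalRelevant;
    return {p, r};
}
```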
Compute and store the precision and recall values for each of the 5 query images above when the number of retrieved images is the top 5, 15, 25, 40 of your ordered list of images from step 4 above. Compute these precision and recall values for each of the 8 ordered lists (produced by the different settings of C, S, and SP) for each query image.
Web page
Example Query Results for boat_5 (using OPP, INT, and no spatial information)
(NOTE: INT numbers above are not actual INT values and are only given for illustrative purposes)
Based on your results from I.4 above, compute the average precision and average recall across the 5 query images, for each of the 8 combinations of settings. Plot the resulting average precision versus average recall values (put average recall on the x-axis). You may want to plot multiple precision vs. recall curves in one graph to compare the different combinations.
Based on your plots, which combination of settings appears to be the best (for our 5 query image set)?
Source code and binaries
Here is a list of suggestions for extra credit. You are encouraged to come up with your own extensions.
Create a simple GUI for performing queries.
Perform a more comprehensive analysis to determine which combination of settings (color space, similarity, spatial information) works best for the entire dataset. This would involve calculating the average precision and average recall over all 40 possible query images.
Implement color-constant color indexing as suggested by Funt and Finlayson. Ascertain whether it improves the performance of the retrieval system by analyzing average precision and recall values for the five query images. Artificially alter the brightness of some of the images in the database and compare the performance of your original retrieval system with the color-constant one.
Implement a general reordering program that runs on top of Google Image Search and reorders the images retrieved by Google according to their similarity to the “dominant content” in the retrieved images. One straightforward method might be to compute the average color histogram based on the set of top N retrieved images, and reorder these images according to similarity to the average histogram. A smarter method would be to analyze the distribution of color histograms for the N retrieved images using some form of cluster analysis and discard any outlier histograms (due to irrelevant images) before computing the average histogram.