Computer Vision (CSE 455), Winter 2009

Project 3:  Content-Based Image Retrieval using Color Histograms

Assigned:  Tue, Feb 17, 2009
Due:  Fri, Feb 27, 2009 (by 11:59 pm)
Please turn in all of your source code and binaries as instructed in "Project Turnin Procedure".

Note: This is not a group project. You are expected to do your own work but it is okay to discuss issues that arise with other students and post questions to the class mailing list.


Synopsis
Description
Skeleton Code and Dataset
To Do
What to Turn in
Bells and Whistles

Synopsis

Congratulations! You’ve just been hired by the internet startup Gurgle, Inc. Your boss, on seeing the “A” in your transcript for CSE 455, makes you the leader of their fledgling content-based image search effort. To show off your computer vision prowess, you decide to implement a proof-of-concept content-based image retrieval system that, given a query image, retrieves related color images from an image database. You decide to represent each image (database image/query image) by its color histogram. Using an appropriate similarity metric between color histograms of a query image and a database image, you hope to find the set of images in the database most similar to any query image.

 

[Figure: an example query image (left) and the images retrieved for it from the database (right)]
(Note: The above is just an example and not necessarily what your system should retrieve!)

Description

The project is loosely based on the paper Color Indexing by Swain and Ballard. The project consists of two parts:

Part I: Color Indexing

A color histogram can be regarded as a signature for an image that captures the distribution of colors in the image without regard to the actual spatial locations of the colors. Suppose the color channels in the image are R, G, and B. For 8-bit encoding of intensities, the total number of possible colors is 256³. A typical histogram quantizes these colors to a much smaller number of bins, say, 10 x 10 x 10 = 1000 bins, where each bin stands for some range of original colors. In the 10 x 10 x 10 case, the ranges would be 0 through 25 as level 1, 26 through 51 as level 2, …, up to 234 through 255 as level 10, for each channel. Each bin of the histogram represents a different combination of R, G, and B levels. The RGB histogram could then be computed as:

 

hRGB(x,y,z) = number of pixels whose R value is at level x, G value at level y, and B value at level z

 

where x, y, and z each range from 1 through 10.
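To make the binning concrete, here is a minimal C++ sketch (illustrative only; the Image class and its r/g/b accessors are placeholders for whatever image representation you use, and the code numbers the levels 0 through 9 rather than 1 through 10):

    #include <vector>

    const int LEVELS = 10;                        // 10 levels per channel
    const int BINS   = LEVELS * LEVELS * LEVELS;  // 10 x 10 x 10 = 1000 bins

    // Map an intensity 0..255 to a level 0..9 (0-25 -> 0, 26-51 -> 1, ..., 234-255 -> 9).
    int quantize(int v) { return v / 26; }

    // "Image" is a placeholder for your own image class, assumed to provide
    // width(), height(), and 8-bit r(x,y), g(x,y), b(x,y) accessors.
    std::vector<double> rgbHistogram(const Image& img) {
        std::vector<double> h(BINS, 0.0);         // all 1000 bins start at zero
        for (int y = 0; y < img.height(); ++y)
            for (int x = 0; x < img.width(); ++x) {
                int r = quantize(img.r(x, y));
                int g = quantize(img.g(x, y));
                int b = quantize(img.b(x, y));
                h[(r * LEVELS + g) * LEVELS + b] += 1.0;  // one more pixel in this (r,g,b) bin
            }
        return h;
    }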

Color spaces

In this project, you will compare the performance of histograms in two color spaces:

·        RGB space, and

·        the color opponent space used by Swain and Ballard:  wb = R + G + B,   rg = R – G,  by = 2B - R - G

For the RGB histogram, use 10 levels for each channel as above to compute a histogram with 1000 bins.

For the color opponent histogram, use 5 levels for wb and 14 levels each for rg and by, yielding 5 x 14 x 14 = 980 bins.
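A corresponding sketch for the opponent-space histogram is given below. The assignment does not fix the bin boundaries, so this simply splits each axis' full range (wb in 0..765, rg in -255..255, by in -510..510) into equal-width levels; Image is again a placeholder:

    #include <algorithm>
    #include <vector>

    const int WB_LEVELS = 5, RG_LEVELS = 14, BY_LEVELS = 14;    // 5 x 14 x 14 = 980 bins

    // Map a value in [minVal, maxVal] to a level 0 .. numLevels-1 (equal-width ranges).
    int level(int value, int minVal, int maxVal, int numLevels) {
        return std::min((value - minVal) * numLevels / (maxVal - minVal + 1), numLevels - 1);
    }

    std::vector<double> oppHistogram(const Image& img) {            // Image as in the RGB sketch
        std::vector<double> h(WB_LEVELS * RG_LEVELS * BY_LEVELS, 0.0);
        for (int y = 0; y < img.height(); ++y)
            for (int x = 0; x < img.width(); ++x) {
                int r = img.r(x, y), g = img.g(x, y), b = img.b(x, y);
                int i = level(r + g + b,        0,  765, WB_LEVELS); // wb = R + G + B
                int j = level(r - g,         -255,  255, RG_LEVELS); // rg = R - G
                int k = level(2 * b - r - g, -510,  510, BY_LEVELS); // by = 2B - R - G
                h[(i * RG_LEVELS + j) * BY_LEVELS + k] += 1.0;
            }
        return h;
    }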

Similarity Metrics

You will compare the performance of two similarity metrics:

·        Sum of squared differences (SSD) distance

·        Histogram intersection

Definitions for these two metrics will be given below.

Part II: Incorporating Spatial Information

The color histogram for an image discards all information about the spatial distribution of colors. In Part II of the project, you will investigate whether providing a modest amount of spatial information improves retrieval performance.

Since the images we consider in this project are outdoor scenes, we will adopt a fairly simplistic approach to incorporating spatial information:

·        Divide the image into 3 sections: the top one-third, the middle third, and the bottom one-third of the image. This roughly corresponds to dividing the image into content that is furthest away (e.g., the sky), content in the middle (e.g., waves, a boat, etc.), and content in the foreground (e.g., beach).

·        Compute the color histogram for each of the 3 sections of an image. Thus, each image is represented by 3 histograms.

·        When comparing images, compute the similarity between corresponding histograms for each section of the image. If s1, s2, and s3 are the similarity scores obtained for the top, middle, and bottom sections respectively, then compute the overall similarity between the two images as the average score: 

            s = (s1 + s2 + s3)/3
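One illustrative way to organize this comparison is sketched below. Each image's top, middle, and bottom thirds can be histogrammed by running the Part I histogram code only over rows k*H/3 through (k+1)*H/3 - 1, for k = 0, 1, 2 and image height H; the metric argument is whichever of SSD or intersection you are testing:

    #include <vector>

    typedef std::vector<double> Histogram;      // bin counts, as in the sketches above

    // secA and secB hold the top, middle, and bottom histograms of the two
    // images; metric is a pointer to your SSD or intersection function.
    double spatialSimilarity(const Histogram secA[3], const Histogram secB[3],
                             double (*metric)(const Histogram&, const Histogram&)) {
        double s1 = metric(secA[0], secB[0]);   // top thirds
        double s2 = metric(secA[1], secB[1]);   // middle thirds
        double s3 = metric(secA[2], secB[2]);   // bottom thirds
        return (s1 + s2 + s3) / 3.0;            // s = (s1 + s2 + s3) / 3
    }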

 

Skeleton Code and Dataset

There is no skeleton code for this project. You will need to write your own code in C/C++/Java. You may re-use code from the previous projects if you wish (for image I/O etc.).

Downloading code from the internet is not allowed unless it is for basic image I/O. Here is a link to libraries that some students have found useful.

You will use the following dataset of images:

Download the set of images.
Download the set of image thumbnails. (The image thumbnails, in jpeg format, may be useful for your write up.)

 

To Do

Coding

1)      For each image in the database you will perform the following procedure:

a)      Compute the color histogram in RGB space and in color-opponent (OPP) space.

b)      Store the two histograms for each database image in two separate data structures that can both be used in memory and also saved to a file so you don't have to keep rerunning the computation for each query.

2)      Write code to compute the similarity between any two images I1 and I2 using:

a)      SSD: Given histograms h1 and h2 for the two images, the SSD distance between them is defined as:

            SSD(h1, h2) = Σx,y,z [ h1(x,y,z) - h2(x,y,z) ]²

where the summation is over all the levels (see example of levels above) for each color channel in the histogram.

b)      Histogram intersection: The intersection between two histograms h1 and h2 is defined as:

            INT(h1, h2) = [ Σx,y,z min( h1(x,y,z), h2(x,y,z) ) ] / #h2

where #h denotes the size of the histogram h (total number of pixels), so INT always lies between 0 and 1. (A short sketch implementing both metrics is given after this list.)

3)      Repeat 1) and 2) above for the spatial information case where the image is partitioned into top, middle, and bottom sections (see Part II of the project).

4)      Create a query system which, given a query image Q, a color space C = RGB or OPP, a similarity metric S = SSD or INT, and spatial information SP = YES or NO, computes the similarity between Q and each image I in the database, and returns an ordered list of the images in the database and their similarity to Q.
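To make steps 2 and 4 concrete, here is an illustrative sketch of both metrics and the ranking loop (the names and data structures are suggestions, not requirements). The intersection below is normalized by the size of the second (database) histogram, following Swain and Ballard; note that SSD is a distance (smaller means more similar) while intersection is a similarity (larger means more similar), so the two are sorted in opposite orders:

    #include <algorithm>
    #include <string>
    #include <utility>
    #include <vector>

    typedef std::vector<double> Histogram;      // bin counts, as in the sketches above

    // Sum of squared differences over all bins; a distance, so smaller is more similar.
    double ssd(const Histogram& h1, const Histogram& h2) {
        double d = 0.0;
        for (size_t i = 0; i < h1.size(); ++i) {
            double diff = h1[i] - h2[i];
            d += diff * diff;
        }
        return d;
    }

    // Histogram intersection, normalized by the size of h2 (the database image's
    // histogram); a score of 1 means the histograms match exactly.
    double intersection(const Histogram& h1, const Histogram& h2) {
        double overlap = 0.0, size2 = 0.0;
        for (size_t i = 0; i < h1.size(); ++i) {
            overlap += std::min(h1[i], h2[i]);
            size2   += h2[i];                   // #h2: total number of pixels in h2
        }
        return overlap / size2;
    }

    bool bySimilarityDesc(const std::pair<std::string, double>& a,
                          const std::pair<std::string, double>& b) { return a.second > b.second; }
    bool byDistanceAsc(const std::pair<std::string, double>& a,
                       const std::pair<std::string, double>& b) { return a.second < b.second; }

    // Rank every database image against the query histogram q, most similar first.
    // "database" holds (image name, precomputed histogram) pairs.
    std::vector<std::pair<std::string, double> >
    rankImages(const Histogram& q,
               const std::vector<std::pair<std::string, Histogram> >& database,
               bool useIntersection) {
        std::vector<std::pair<std::string, double> > results;
        for (size_t i = 0; i < database.size(); ++i) {
            double s = useIntersection ? intersection(q, database[i].second)
                                       : ssd(q, database[i].second);
            results.push_back(std::make_pair(database[i].first, s));
        }
        std::sort(results.begin(), results.end(),
                  useIntersection ? bySimilarityDesc : byDistanceAsc);
        return results;
    }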

Testing and Analysis

  1. Test your system as follows.
    1. Use the 40 images provided as your database.
    2. Use the following images as queries:
      • beach_1
      • boat_5
      • cherry_3
      • stHelens_2
      • sunset1_2
    3. For each color space C, similarity metric S, and SP = YES or NO, compute and store the ordered list of images and similarities. This should give you 2 x 2 x 2 = 8 ordered lists for each query for the different settings of C, S, and SP.
    4. The performance of a retrieval system can be quantified using “precision” and “recall” which are defined as follows:

precision = (number of retrieved images that are relevant) / (number of retrieved images)

      recall = (number of retrieved images that are relevant) / (total number of relevant images in database)

So, for example, suppose you decide to output 15 possibly relevant images for any query, and for the query image “beach_1” only 3 of the 15 retrieved images have the label “beach” in their filenames; then precision = 3/15 and recall = 3/5 (since there are 5 “beach” images in the database). (NOTE: We will assume the sunset1 and sunset2 images in the dataset are two different categories, i.e., images with sunset2 in their filenames are not considered relevant for queries involving sunset1, and vice versa.)

Compute and store the precision and recall values for each of the 5 query images above when the number of retrieved images is the top 5, 15, 25, and 40 of your ordered list of images from step 3 above. Compute these precision and recall values for each of the 8 ordered lists (produced by the different settings of C, S, and SP) for each query image. (A small sketch of this bookkeeping is given at the end of this section.)

  2. Cleaning up after Google Image Search: Use your method to reorder images output by Google Image Search. Since Google Image Search only uses text associated with images to retrieve images given a query word, it may retrieve images whose content is not directly related to the query. A content-based retrieval method should be able to perform better. To illustrate this:
    • Type in “beach” in Google Image Search (http://images.google.com/)
    • Store the first 20 images retrieved by Google Image Search.
    • Using beach_1 as your query image and C = OPP, S = INT, and SP = YES, generate a re-ordered list of the 20 Google-retrieved images, along with their similarity values.
    • Repeat the above three steps for the query “boat” in Google Image Search and boat_2 as the query image.
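As a reference point, here is a small sketch of the precision/recall bookkeeping mentioned in step 4 above. It assumes, as in the beach_1 example, that relevance is decided by whether the retrieved filename contains the query's category label:

    #include <string>
    #include <vector>

    // A retrieved image counts as relevant if its filename contains the query's
    // category label (e.g., "beach", or "sunset1" as opposed to "sunset2").
    bool isRelevant(const std::string& filename, const std::string& category) {
        return filename.find(category) != std::string::npos;
    }

    struct PrecisionRecall { double precision, recall; };

    // ranked: ordered list from the query system, most similar first.
    // numRetrieved: 5, 15, 25, or 40.  numRelevantInDatabase: e.g. 5 "beach" images.
    PrecisionRecall evaluate(const std::vector<std::string>& ranked,
                             const std::string& category,
                             int numRetrieved,
                             int numRelevantInDatabase) {
        int relevantRetrieved = 0;
        for (int i = 0; i < numRetrieved && i < (int)ranked.size(); ++i)
            if (isRelevant(ranked[i], category))
                ++relevantRetrieved;
        PrecisionRecall pr;
        pr.precision = (double)relevantRetrieved / numRetrieved;
        pr.recall    = (double)relevantRetrieved / numRelevantInDatabase;
        return pr;
    }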

What to Turn In

Web page

  1. Make a web page that shows the ordered list of images for each of the 5 required queries, for C = OPP, S = INT, and both cases of SP = YES and SP = NO. You will thus show 2 ordered lists per query image. For each query, place the retrieved images in order of decreasing similarity. Include the INT value for each database image. 

Example Query Results for boat_5 (using OPP, INT, and no spatial information)


boat_5 (INT = 1)
boat_4 (INT = 0.93)
boat_3 (INT = 0.91)
boat_2 (INT = 0.89)
beach_3 (INT = 0.71)
beach_2 (INT = 0.68)
beach_4 (INT = 0.63)
crater_2 (INT = 0.55)
boat_1 (INT = 0.52)
...
sunset1_5 (INT = 0.06)

(NOTE: INT numbers above are not actual INT values and are only given for illustrative purposes)

  2. Retrieval performance is typically analyzed by plotting precision versus recall. Given two methods, the method that gives higher precision values for the same recall values is considered superior. In our case, we would like to find out which combination of settings (C = RGB or OPP, S = SSD or INT, SP = YES or NO) is the best.

Based on your results from I.4 above, compute the average precision and average recall across the 5 query images, for each of the 8 combinations of settings. Plot the resulting average precision versus average recall values (put average recall on the x-axis). You may want to plot multiple precision vs. recall curves in one graph to compare the different combinations.

Based on your plots, which combination of settings appears to be the best (for our 5 query image set)? 

  3. [Google Image results] For each query image (i.e., beach_1 and boat_2), you will show the comparisons between the original ordered list (returned by Google) and the re-ordered list (returned by your method with C = OPP, S = INT, and SP = YES). Place the re-ordered images in order of decreasing similarity. Include the INT value for each image in the re-ordered list.
  4. Some extra credit items are listed below. If you attempt any, describe your idea and implementation on your web page. State clearly how to test your implementation.

Source code and binaries

Turn in all of your source code and binaries as instructed in “Project Turnin Procedure” (see the note at the top of this page).

Bells and Whistles

Here is a list of suggestions for extra credit. You are encouraged to come up with your own extensions.

  Create a simple GUI for performing queries.

  Perform a more comprehensive analysis to determine which combination of settings (color space, similarity, spatial information) works best for the entire dataset. This would involve calculating the average precision and average recall over all 40 possible query images.

  Implement color-constant color indexing as suggested by Funt and Finlayson. Ascertain whether it improves the performance of the retrieval system by analyzing average precision and recall values for the five query images. Artificially alter the brightness of some of the images in the database and compare the performance of your original retrieval system with the color-constant one.

   Implement a general reordering program that runs on top of Google Image Search and reorders the images retrieved by Google according to their similarity to the “dominant content” in the retrieved images. One straightforward method might be to compute the average color histogram based on the set of top N retrieved images, and reorder these images according to similarity to the average histogram. A smarter method would be to analyze the distribution of color histograms for the N retrieved images using some form of cluster analysis and discard any outlier histograms (due to irrelevant images) before computing the average histogram.
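For the last item, one illustrative way to form the average histogram (reusing the vector-of-bin-counts representation from the Coding sketch) is:

    #include <vector>

    typedef std::vector<double> Histogram;

    // Mean bin count over the N retrieved images' histograms.
    Histogram averageHistogram(const std::vector<Histogram>& hists) {
        Histogram avg(hists.front().size(), 0.0);
        for (size_t j = 0; j < hists.size(); ++j)
            for (size_t i = 0; i < avg.size(); ++i)
                avg[i] += hists[j][i];
        for (size_t i = 0; i < avg.size(); ++i)
            avg[i] /= hists.size();
        return avg;
    }

The N retrieved images can then be re-ranked by their SSD or intersection score against this average histogram, exactly as in the ranking loop sketched in the Coding section.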