CSE 455 HW6: Content-Based Image Retrieval

Date Released: Tuesday, November 18, 2014

Date Due: Monday, December 8, 2014 11:59pm

(Late Policy 5% off per day till Wednesday, Dec 10)

For this assignment, you may work in teams of up to two people. However, there will be more requirements for a two person team.

Materials

Download the set of images.
Download the set of image thumbnails. (The image thumbnails, in jpeg format, should help keep your write-up a manageable size.)

Note that there is no provided code for this assignment. You can start with your k-means code from the previous assignment if you like.

Contents

In this assignment, you will develop a content-based image retrieval system that retrieves database images based on the similarity between their regions and those of a query image.

mountain image segmented by color
another mountain image segmented by color

What To Do

The main idea is to represent each image (the database images and the query image) by a set of regions obtained from color clustering, their attributes, and (for two person teams) their simple spatial relationships. Then, come up with a distance function between two images in this representation, and use this distance function to find the images most similar to a query image.

  1. For each image in the database you will perform the following procedure:
    1. Run your color clustering on it to obtain a labeled cluster image.
    2. Run connected components on it to obtain a labeled segmentation image. Possibly perform some noise cleaning/merging operations to improve the regions. Don't vary any parameters between images.
    3. For each major region (use a size threshold), compute at least the following attributes:
      • size
      • mean color, in RGB or whatever space you like
      • at least the following co-occurrence texture features (use spatial relationship d = (1,1)):
        1. energy
        2. entropy
        3. contrast
      • centroid (row, column)
      • bounding box or other representation of where the region is
      • Two person teams: RAG (region adjacency graph). This should probably be represented as a set of adjacency lists, one for each node (region) in the graph.
    4. Two person teams: For each pair of adjacent regions in the RAG, find and record the following possible relationships:
      • inside
      • above_adjacency
      • below_adjacency
      • left_adjacency
      • right_adjacency
      • other_adjacency (if none of the others are satisfied by whatever requirements you impose to define them).
    5. Store the attributes and relationships in a data structure that you define and that can be both used in memory and also saved in a file so you don't have to keep rerunning the analysis. We'll refer to the data structure for image I as DS(I)

  2. Develop a distance measure that will compute the distance RELDIST(I1,I2) between DS(I1) and DS(I2) for any two images I1 and I2. To do this, you need to find a correspondence between the regions of I1 and the regions of I2.
    One person teams: Compute this correspondence greedily. That is, for each region in I1, find its closest match in I2, and so on.
    Two person teams: Compute the optimal correspondence. You can do this with an exponential search procedure, since the number of regions will be small.

    Finding correspondences is in Chapter 11 and was covered in the 2D Object Recognition lecture. Once you have the correspondence, the distance between I1 and I2 should be some function of:

    We would like you to experiment with at least 2 different distance measures (4 different distance measures for two-person teams) and tell us what they were and which one worked best and is used for your final results. Distance measures can vary in what attributes you used and also in the kind of metric you use to compare 2 vectors. Euclidean distance is only one such metric.

  3. Create a query system in which you can select a query image Q and compare it to each image I in your database by computing RELDIST(Q,I). Then you order the images in the database according to their distance to the query and return the ordered list (the images and associated distances). Extra credit: Create a simple GUI for performing these queries.

  4. Test your system as indicated below.
Sample One Person Timeline

Sample Two Person Timeline

Turnin

You should turn in a report that describes all aspects of your system, and shows the required results. You should also submit all of your code. Here is a list of what must be included in the report.

  1. Describe your color clustering and region-finding algorithm.
  2. List the region attributes you used.
  3. List the region relationships you used and explain how you determine the relationship between two regions (two person teams).
  4. Describe your distance measure, including the region correspondence algorithm.
  5. Use the provided thumbnail images to show the results for each of the required queries. Also include the distances between the query image and each result image. Print the images in order of ascending distance. Hopefully, the query image will have distance zero to itself, and similar images will have smaller distances than dissimilar ones.

    Example Query Results for boat_2

    boat_2
    d = 0

    boat_4
    d = 0.05

    boat_3
    d = 0.07

    boat_5
    d = 0.07

    beach_3
    d = 0.12

    beach_2
    d = 0.13

    beach_4
    d = 0.15

    crater_2
    d = 0.16

    boat_1
    d = 0.20
    ...
    ...
    ...
    ...
    ...
    ...
    ... ... ... ...
    sunset1_5
    d = 0.98

  6. Write a report on your results.
  7. Submit all the source code in a compilable state as well as the executables. Your source code must be well structured and well commented.

Report Template

Evaluation

Dropbox

Upload your report and code to the homework dropbox HERE.

Homework is due on December 8 (Monday) by 11:59 PM. Please plan your work early. You can submit until December 10 but you will lose 5% of your grade for every late day.

This is a one-person assignment. You may discuss it, but please turn in your own individual work.