CSE 455 HW6: Content-Based Image Retrieval

Date Released: Tuesday, November 18, 2014

Date Due: Monday, December 8, 2014 11:59pm

(Late policy: 5% off per day until Wednesday, Dec 10)

For this assignment, you may work in teams of up to two people. However, two-person teams have additional requirements.

Materials

Download the set of images.
Download the set of image thumbnails. (The image thumbnails, in jpeg format, should help keep your write-up a manageable size.)

Note that there is no provided code for this assignment. You can start with your k-means code from the previous assignment if you like.

In this assignment, you will develop a content-based image retrieval system that retrieves database images based on the similarity between their regions and those of a query image.


[Figure: a mountain image and the same image segmented by color]

[Figure: another mountain image and the same image segmented by color]

What To Do

The main idea is to represent each image (the database images and the query image) by a set of regions obtained from color clustering, their attributes, and (for two-person teams) their simple spatial relationships. Then define a distance function between two images in this representation, and use it to find the images most similar to a query image.

For each image in the database you will perform the following procedure:
1. Run your color clustering on it to obtain a labeled cluster image.
2. Run connected components on the cluster image to obtain a labeled segmentation image. You may also perform noise cleaning or merging operations to improve the regions. Use the same parameters for every image.
3. For each major region (use a size threshold), compute at least the following attributes:
  - size
  - mean color, in RGB or whatever space you like
  - at least the following co-occurrence texture features (use spatial relationship d = (1,1)):
  - centroid (row, column)
  - bounding box or other representation of where the region is
  - Two person teams: RAG (region adjacency graph). This should probably be represented as a set of adjacency lists, one for each node (region) in the graph.
4. Two person teams: For each pair of adjacent regions in the RAG, find and record the following possible relationships:
  - inside
  - above_adjacency
  - below_adjacency
  - left_adjacency
  - right_adjacency
  - other_adjacency (if none of the others are satisfied by whatever requirements you impose to define them).
5. Store the attributes and relationships in a data structure that you define, one that can be used in memory and also saved to a file so you don't have to keep rerunning the analysis. We'll refer to the data structure for image I as DS(I).
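
As a rough illustration of steps 2, 3, and 5, here is a minimal Python sketch. Everything in it is an assumption rather than a required design: the function names, the use of 4-connectivity, plain nested lists for images, and JSON as the file format are all illustrative choices. It omits the texture features, the noise cleaning, and the RAG.

```python
import json
from collections import deque

def connected_components(cluster):
    """Label 4-connected regions of equal cluster id, starting at label 1."""
    rows, cols = len(cluster), len(cluster[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue  # pixel already belongs to a region
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and cluster[ny][nx] == cluster[y][x]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels

def region_attributes(labels, image, min_size=1):
    """Build DS(I): a dict mapping region label (as a string) to attributes."""
    acc = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            a = acc.setdefault(lab, {"n": 0, "rgb": [0, 0, 0], "r": 0, "c": 0,
                                     "bbox": [r, c, r, c]})
            a["n"] += 1
            a["r"] += r
            a["c"] += c
            for i in range(3):
                a["rgb"][i] += image[r][c][i]
            box = a["bbox"]  # [min_row, min_col, max_row, max_col]
            a["bbox"] = [min(box[0], r), min(box[1], c),
                         max(box[2], r), max(box[3], c)]
    ds = {}
    for lab, a in acc.items():
        n = a["n"]
        if n < min_size:
            continue  # drop regions below the size threshold
        ds[str(lab)] = {
            "size": n,
            "mean_color": [s / n for s in a["rgb"]],
            "centroid": [a["r"] / n, a["c"] / n],
            "bbox": a["bbox"],
        }
    return ds

def save_ds(ds, path):
    """Write DS(I) to a JSON file so the analysis need not be rerun."""
    with open(path, "w") as f:
        json.dump(ds, f)

def load_ds(path):
    """Read a DS(I) previously written by save_ds."""
    with open(path) as f:
        return json.load(f)
```

Region labels are stored as strings so that the dictionary survives a JSON round trip unchanged. A real solution would likely read images with an image library and work on arrays instead of nested lists.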

Develop a distance measure that will compute the distance RELDIST(I₁,I₂) between DS(I₁) and DS(I₂) for any two images I₁ and I₂. To do this, you need to find a correspondence between the regions of I₁ and the regions of I₂.
One person teams: Compute this correspondence greedily. That is, for each region in I₁, find its closest match in I₂, and so on.
Two person teams: Compute the optimal correspondence. You can do this with an exponential search procedure, since the number of regions will be small.
Finding correspondences is in Chapter 11 and was covered in the 2D Object Recognition lecture. Once you have the correspondence, the distance between I₁ and I₂ should be some function of:
- difference in attributes of corresponding regions
- difference in number of regions
- (Two person teams) difference in region relationships
We would like you to experiment with at least 2 different distance measures (4 for two-person teams). Tell us what they were and which one worked best and was used for your final results. Distance measures can vary in which attributes they use and in the metric used to compare two vectors. Euclidean distance is only one such metric.
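
To make the two correspondence strategies concrete, here is a small Python sketch. It is only one possible reading of the requirements: the names `region_distance`, `greedy_correspondence`, `optimal_correspondence`, and `reldist`, the choice of mean color as the only compared attribute, and the `count_weight` parameter are all assumptions made for illustration. Each region is assumed to be a dict with a `"mean_color"` entry, and both images are assumed to have at least one region.

```python
import math
from itertools import permutations

def region_distance(a, b):
    """Euclidean distance between mean colors (one possible attribute metric)."""
    return math.dist(a["mean_color"], b["mean_color"])

def greedy_correspondence(regions1, regions2):
    """For each region of image 1, return the index of its closest region in image 2.

    Several regions of image 1 may map to the same region of image 2.
    """
    return [min(range(len(regions2)),
                key=lambda j: region_distance(r, regions2[j]))
            for r in regions1]

def optimal_correspondence(regions1, regions2):
    """Exhaustively search one-to-one matchings of the smaller set into the larger.

    Returns (total_cost, matching), where matching[i] is the index in the
    larger set assigned to region i of the smaller set. Exponential time,
    so only usable when the number of regions is small.
    """
    if len(regions1) <= len(regions2):
        small, large = regions1, regions2
    else:
        small, large = regions2, regions1
    best_cost, best = math.inf, ()
    for perm in permutations(range(len(large)), len(small)):
        cost = sum(region_distance(s, large[j]) for s, j in zip(small, perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return best_cost, best

def reldist(regions1, regions2, count_weight=1.0):
    """Example distance: mean attribute difference under the greedy
    correspondence plus a penalty for differing region counts."""
    matches = greedy_correspondence(regions1, regions2)
    attr = sum(region_distance(r, regions2[j])
               for r, j in zip(regions1, matches)) / len(regions1)
    return attr + count_weight * abs(len(regions1) - len(regions2))
```

Note that this `reldist` is not symmetric in general, because the greedy matching depends on which image is treated as the first argument. Whether that matters, and how to weight the terms, is part of the experimentation asked for above.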

Create a query system in which you can select a query image Q and compare it to each image I in your database by computing RELDIST(Q,I). Then you order the images in the database according to their distance to the query and return the ordered list (the images and associated distances). Extra credit: Create a simple GUI for performing these queries.
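
The ranking step can be sketched in a few lines of Python. The function name `rank_database` and the shape of `database` (a dict from image name to its stored data structure) are assumptions for illustration. The `reldist` argument stands for whatever distance function you develop.

```python
def rank_database(query_name, database, reldist):
    """Return (image_name, distance) pairs sorted by ascending distance to the query.

    database maps image names to their DS; reldist(ds_query, ds_image) -> float.
    """
    query = database[query_name]
    scored = [(reldist(query, ds), name) for name, ds in database.items()]
    scored.sort()  # ascending distance; ties are broken by image name
    return [(name, d) for d, name in scored]
```

For example, with a toy database whose "data structures" are single numbers and a distance of `abs(a - b)`, calling `rank_database("boat_2", {"boat_2": 0.0, "boat_4": 0.5, "beach_3": 2.0}, lambda a, b: abs(a - b))` lists `boat_2` first with distance 0. These numbers are made up and are not real results.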

Test your system as indicated below.
- The database should consist of the 40 provided images.
- The following images should be used as queries:
  1. beach_2
  2. boat_5
  3. cherry_3
  4. crater_3
  5. pond_2
  6. stHelens_2
  7. sunset1_2
  8. sunset2_2
- Use your distance measure to compare each query image to all 40 database images, recording the distances you get for all 320 tests.
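
One way to record all 8 × 40 = 320 distances is to loop over the queries and write one row per comparison to a CSV file. The sketch below is a hypothetical layout, not a required format: the function name `run_all_queries`, the column names, and the toy database in the usage example are all assumptions.

```python
import csv
import io

# The eight required query images listed above.
QUERY_NAMES = ["beach_2", "boat_5", "cherry_3", "crater_3",
               "pond_2", "stHelens_2", "sunset1_2", "sunset2_2"]

def run_all_queries(query_names, database, reldist, out):
    """Compare every query to every database image and write the results as CSV.

    database maps image names to their DS; out is any writable text file object.
    Returns the rows written (excluding the header) as (query, image, distance).
    """
    writer = csv.writer(out)
    writer.writerow(["query", "image", "distance"])
    rows = []
    for q in query_names:
        for name, ds in database.items():
            row = (q, name, reldist(database[q], ds))
            writer.writerow(row)
            rows.append(row)
    return rows

if __name__ == "__main__":
    # Toy stand-ins: single numbers instead of real data structures.
    toy_db = {"beach_2": 1.0, "boat_5": 3.0, "cherry_3": 7.0}
    buf = io.StringIO()
    run_all_queries(["beach_2", "boat_5"], toy_db, lambda a, b: abs(a - b), buf)
    print(buf.getvalue())
```

With the full 40-image database and `QUERY_NAMES`, the returned list would contain 320 rows.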

Sample One Person Timeline

First week: Color clustering, connected components, and region attributes.
Second week: Develop distance measure and test.
Third half week: Write it up.

Sample Two Person Timeline

First week: Color clustering, connected components, region attributes and RAG data structure.
Second week: Develop distance measure and test.
Third half week: Write it up.

Turnin

You should turn in a report that describes all aspects of your system, and shows the required results. You should also submit all of your code. Here is a list of what must be included in the report.

Describe your color clustering and region-finding algorithm.
List the region attributes you used.
List the region relationships you used and explain how you determine the relationship between two regions (two person teams).
Describe your distance measure, including the region correspondence algorithm.

Use the provided thumbnail images to show the results for each of the required queries. Also include the distances between the query image and each result image. Print the images in order of ascending distance. Hopefully, the query image will have distance zero to itself, and similar images will have smaller distances than dissimilar ones.

Example Query Results for boat_2

boat_2 d = 0	boat_4 d = 0.05	boat_3 d = 0.07	boat_5 d = 0.07	beach_3 d = 0.12
beach_2 d = 0.13	beach_4 d = 0.15	crater_2 d = 0.16	boat_1 d = 0.20	...
...
...	...	...	...	sunset1_5 d = 0.98

Write a report on your results.
Submit all the source code in a compilable state as well as the executables. Your source code must be well structured and well commented.

Report Template

Evaluation

Working implementation of all the required parts: 12 points
- Region attributes: 5 points
- Distance measure: 5 points (2 points for trying different distance measures + 3 points for discussing which worked best and why.)
- Query system: 2 points
Quality of code including code structure, comments and documentation: 3 points
Completion and quality of the report: 5 points
Quality of results: 5 points
Extra credit:
- Create a simple GUI for performing queries: 5 points
- Experiment with other features: 1-5 points, depending on your work.

Dropbox

Upload your report and code to the homework dropbox HERE.

Homework is due on December 8 (Monday) by 11:59 PM. Please plan your work early. You can submit until December 10 but you will lose 5% of your grade for every late day.

You may discuss the assignment with others, but please turn in only your own work (or your team's work, if you are working as a two-person team).
