EE/CSE 576: Content-Based Retrieval with Regions and Relationships

Images
Download the images you need: zip, tarred gzip
Download the image thumbnails: zip, tarred gzip, PowerPoint

(The image thumbnails, in jpeg format, should help keep your write-up a manageable size.)

In this assignment, you will develop a content-based image retrieval system that retrieves database images based on the similarity between their regions and those of a query image, using both region attributes and spatial relationships.


mountain image	segmented by color


another mountain image	segmented by color

What You Should Do

The main idea is to represent each image (the database images and the query image) by a set of regions obtained from color clustering, their attributes, and their simple spatial relationships.
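As a rough illustration of how regions can come out of color clustering, here is a minimal pure-Python sketch, not a required implementation. It assumes the image is a 2-D list of (R, G, B) tuples, uses a plain k-means with a deterministic seed (the first k distinct colors), and labels 4-connected components. The names `kmeans_colors`, `connected_components`, and `segment` are made up for this example.

```python
from collections import deque

def kmeans_colors(pixels, k, iters=10):
    """Cluster (R, G, B) tuples; return one cluster index per pixel."""
    # Assumption: seed the centers with the first k distinct colors.
    centers = list(dict.fromkeys(pixels))[:k]
    assign = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(pixels):
            assign[i] = min(
                range(len(centers)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, a in zip(pixels, assign) if a == j]
            if members:
                centers[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return assign

def connected_components(cluster_img):
    """Label 4-connected regions of equal cluster index, starting at 1."""
    rows, cols = len(cluster_img), len(cluster_img[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue  # already part of an earlier region
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and cluster_img[ny][nx] == cluster_img[y][x]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels

def segment(image, k):
    """Color-cluster an image, then split the clusters into connected regions."""
    cols = len(image[0])
    flat = [p for row in image for p in row]
    assign = kmeans_colors(flat, k)
    cluster_img = [assign[i:i + cols] for i in range(0, len(assign), cols)]
    return connected_components(cluster_img)
```

Note that one color cluster can produce several regions: two blue areas separated by a red stripe share a cluster but receive different region labels.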

For each image in the database you will perform the following procedure:
1. Run your color clustering on it to obtain a labeled cluster image.
2. Run connected components on it to obtain a labeled segmentation image. Possibly perform some noise cleaning/merging operations to improve the regions. (Use a procedure that you can apply identically to all images.)
3. For each major region (use a size threshold), compute at least the following attributes:
  - size
  - mean color [(R,G,B) or whatever space you like]
  - a few texture attributes (for example, LBP and co-occurrence are easiest)
  - centroid (row, column)
  - bounding box or other representation of where the region is
  - RAG (region adjacency graph). This should probably be represented as a set of adjacency lists, one for each node (region) in the graph.
4. For each pair of (major) adjacent regions in the RAG, find and record the following possible relationships: inside, above_adjacency, below_adjacency, left_adjacency, right_adjacency, other_adjacency (if none of the others are satisfied by whatever requirements you impose to define them).
5. Store the attributes and relationships in a data structure that you define and that can be both used in memory and also saved in a file so you don't have to keep rerunning the analysis. We will refer to this structure for an image I as DS(I).
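Steps 3 through 5 could be sketched as follows. This is one possible layout for DS(I), not the required one. It assumes a labeled segmentation image given as a 2-D list of integers, and the rules used here for the relationships (bounding-box containment for inside, centroid offsets for the directional ones) are assumptions you should replace with your own definitions. Mean color and texture are omitted for brevity. All function names and dictionary keys are hypothetical.

```python
import json
from collections import defaultdict

def region_stats(labels):
    """Size, centroid (row, col), and bounding box for every region label."""
    acc = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            s = acc.setdefault(lab, {"size": 0, "sum_r": 0, "sum_c": 0,
                                     "bbox": [r, c, r, c]})
            s["size"] += 1
            s["sum_r"] += r
            s["sum_c"] += c
            b = s["bbox"]  # [min_row, min_col, max_row, max_col]
            b[0], b[1] = min(b[0], r), min(b[1], c)
            b[2], b[3] = max(b[2], r), max(b[3], c)
    return {lab: {"size": s["size"],
                  "centroid": [s["sum_r"] / s["size"], s["sum_c"] / s["size"]],
                  "bbox": s["bbox"]}
            for lab, s in acc.items()}

def build_rag(labels):
    """Adjacency lists: regions sharing a 4-connected pixel border."""
    rag = defaultdict(set)
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            for rr, cc in ((r + 1, c), (r, c + 1)):
                if rr < rows and cc < cols and labels[rr][cc] != a:
                    rag[a].add(labels[rr][cc])
                    rag[labels[rr][cc]].add(a)
    return {a: sorted(ns) for a, ns in rag.items()}

def bbox_inside(a, b):
    """True if bounding box a lies strictly within bounding box b."""
    return b[0] < a[0] and b[1] < a[1] and a[2] < b[2] and a[3] < b[3]

def relate(stats, a, b):
    """Return [x, y, rel], read as 'x is rel y' (assumed rules)."""
    sa, sb = stats[a], stats[b]
    if bbox_inside(sa["bbox"], sb["bbox"]):
        return [a, b, "inside"]
    if bbox_inside(sb["bbox"], sa["bbox"]):
        return [b, a, "inside"]
    dr = sa["centroid"][0] - sb["centroid"][0]  # rows grow downward
    dc = sa["centroid"][1] - sb["centroid"][1]
    if dr == 0 and dc == 0:
        return [a, b, "other_adjacency"]
    if abs(dr) >= abs(dc):
        return [a, b, "above_adjacency" if dr < 0 else "below_adjacency"]
    return [a, b, "left_adjacency" if dc < 0 else "right_adjacency"]

def build_ds(labels, min_size=1):
    """Assemble DS(I) for the major regions (size >= min_size)."""
    stats = {k: v for k, v in region_stats(labels).items() if v["size"] >= min_size}
    rag = {a: [b for b in ns if b in stats]
           for a, ns in build_rag(labels).items() if a in stats}
    rels = [relate(stats, a, b) for a in sorted(rag) for b in rag[a] if a < b]
    # String keys so the structure survives a JSON round trip unchanged.
    return {"regions": {str(k): v for k, v in stats.items()},
            "rag": {str(k): v for k, v in rag.items()},
            "relationships": rels}

def save_ds(ds, path):
    with open(path, "w") as f:
        json.dump(ds, f)

def load_ds(path):
    with open(path) as f:
        return json.load(f)
```

JSON is used here only because it is readable and needs no extra libraries; any format that you can reload without rerunning the analysis serves the same purpose.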
Develop a distance measure that will compute the distance RELDIST(I1,I2) between DS(I1) and DS(I2) for any two images I1 and I2. Ideas for this distance measure can come from Chapter 8, Chapter 11, and your own ideas. Basically, you will need to find the best correspondence between regions of I1 and regions of I2 by whatever algorithm you choose. (Keep it simple; a greedy approach is OK for the basic assignment.) Once you have the correspondence, you can compute the error in terms of the attribute differences between corresponding regions, missing or extra regions, and relationship errors.
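A greedy version of such a measure might look like the sketch below. It is only one of many reasonable designs. It assumes each DS is a dictionary with a "regions" map (region id to attributes) and a "relationships" list of (region, region, name) triples, that mean colors are RGB values in 0..255, and that sizes are stored as fractions of the image area. The equal weights and the penalty values are arbitrary placeholders.

```python
import math

MAX_COLOR_DIST = 255 * math.sqrt(3)  # largest possible RGB distance

def region_dist(p, q):
    """Attribute distance in [0, 1] between two regions (weights assumed)."""
    color = math.dist(p["mean_color"], q["mean_color"]) / MAX_COLOR_DIST
    size = abs(p["size"] - q["size"])  # sizes assumed to be area fractions
    return 0.5 * color + 0.5 * size

def greedy_match(regions1, regions2):
    """Pair the closest still-unmatched regions first; not guaranteed optimal."""
    candidates = sorted((region_dist(p, q), a, b)
                        for a, p in regions1.items()
                        for b, q in regions2.items())
    match, taken = {}, set()
    for _, a, b in candidates:
        if a not in match and b not in taken:
            match[a] = b
            taken.add(b)
    return match

def reldist(ds1, ds2, missing_penalty=1.0, rel_penalty=1.0):
    """Attribute error + missing/extra regions + relationship errors."""
    r1, r2 = ds1["regions"], ds2["regions"]
    match = greedy_match(r1, r2)
    attr_err = sum(region_dist(r1[a], r2[b]) for a, b in match.items())
    unmatched = (len(r1) - len(match)) + (len(r2) - len(match))
    rels1 = {tuple(r) for r in ds1["relationships"]}
    rels2 = {tuple(r) for r in ds2["relationships"]}
    # Translate relationships of image 1 into region ids of image 2.
    mapped = {(match[a], match[b], rel) for a, b, rel in rels1
              if a in match and b in match}
    common = len(mapped & rels2)
    rel_err = (len(rels1) - common) + (len(rels2) - common)
    return attr_err + missing_penalty * unmatched + rel_penalty * rel_err
```

With this design, two images whose regions have the same attributes and relationships get distance 0 even if their region ids differ, provided the greedy matching pairs them correctly. Greedy matching can pair regions badly when several regions have similar attributes, which is what the extra-credit correspondence algorithm would address.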

Create a query system in which you can select a query image Q and compare it to each image I in your database by computing RELDIST(Q,I). Then you order the images in the database according to their distance to the query and return the ordered list (the images and associated distances). A fancy user interface is not required for the basic assignment.
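The query step itself is small. The sketch below assumes the database is a dictionary from image name to its stored structure and that the distance measure is passed in as a function, so it works with whatever RELDIST you define. The function name `rank_database` and the toy data in the usage example are invented for illustration; the toy distance compares a single mean color per image and is not a real RELDIST.

```python
import math

def rank_database(query_name, database, dist):
    """Return (image name, distance) pairs sorted by distance to the query."""
    query = database[query_name]
    scored = sorted((dist(query, ds), name) for name, ds in database.items())
    return [(name, d) for d, name in scored]

# Usage with placeholder data: each "DS" is just one mean color here.
toy_db = {
    "boat_5": (10, 10, 200),
    "boat_4": (12, 11, 198),
    "sunset1_2": (250, 120, 30),
}
for name, d in rank_database("boat_5", toy_db, math.dist):
    print(f"{name} d = {d:.2f}")
```

Because the query is itself in the database, it should appear first in the list with distance 0, which is a quick sanity check on the distance measure.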

Test your system as indicated below.
- The database contains 40 PPM images for the tests.
- The following 8 database images should also be used as query images:
  1. beach_2
  2. boat_5
  3. cherry_3
  4. crater_3
  5. pond_2
  6. stHelens_2
  7. sunset1_2
  8. sunset2_2
- Use your distance measure to compare each query image to all 40 database images, recording the distances you get for all 320 tests.

Sample Timeline

First week: Color clustering, connected components, and a few attributes.
Second week: The rest of the attributes, the RAG, and the relationships.
Third week: Develop data structure and distance measure. Start testing.
Fourth week: Finish testing and write it up.

Extra Credit

Here are some ideas for improvements to the basic assignment, any of which will earn some extra credit:

Develop a better algorithm for finding the best correspondence between regions of the query image and those of the database image, using both attributes and relationships.
Develop a nice GUI for your system.

What Your Report Should Contain

Describe your system and give an outline of the training and the testing procedures.

Describe your color clustering algorithm and how you improve the regions. Give several examples to show your color clustering results and improved regions.

List the attributes you selected to describe the regions. Explain why you want to use them.

List the relationships you used to describe the adjacent region pairs and explain why you selected them.

Describe the definition of your distance measure and explain the motivation.

Include a section that gives the results of the tests. For each of the 8 query images, use the THUMBNAILS to show the query and the results, with their distances printed in ascending order, i.e., smallest distance first. Ideally, the image with the smallest distance will be the query image itself, which should get a distance of zero when compared to itself, and the image with the largest distance will be quite different.

(Example) Query Results for boat_2

boat_2 d = 0	boat_4 d = 0.05	boat_3 d = 0.07	boat_5 d = 0.07	beach_3 d = 0.12
beach_2 d = 0.13	beach_4 d = 0.15	crater_2 d = 0.16	boat_1 d = 0.20	...
...
...
...
...
...
...	...	...	...	sunset1_5 d = 0.98

A concluding section of the report that discusses your results.

An appendix containing your commented code and a readme file that describes how to run your code.

Remember that you should put headers on all your routines with the following information:

NAME (of you)
DATE
TITLE (of routine)
PURPOSE (of routine)
PARAMETERS (of routine)
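
For example, a header on a small Python routine could look like the following. The routine itself (`region_size`) is a made-up illustration; only the five header fields come from the list above.

```python
# NAME:       (your name)
# DATE:       (date written)
# TITLE:      region_size
# PURPOSE:    Count the pixels that carry a given label in a labeled image.
# PARAMETERS: labels - 2-D list of integer region labels
#             target - the region label to count
def region_size(labels, target):
    return sum(row.count(target) for row in labels)
```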