# CSE 576 Lab Exercise 2: Clustering for Image Segmentation

### Due Date: April 30

In this assignment, you will segment images into regions of similarly-colored pixels, using the K-means algorithm discussed in class.

[Figure: original image (left); false-colored labeling (right)]

Your K-means implementation should take a color PPM image, such as the one shown above on the left, and a number of clusters K. It should output a grayscale image with each pixel labeled by its class. (You can pass this image to autocolor to produce a false-colored image such as the one above.)
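The course tools handle the PPM/PGM file formats; setting file I/O aside, the clustering step itself can be sketched in Python as below. This is only an illustrative sketch, not the required interface: the function name and the choice to initialize means from k distinct sampled colors are assumptions, not part of the assignment spec.

```python
import random

def kmeans(pixels, k, iters=100, seed=0):
    """Cluster a list of (r, g, b) pixels into k classes.

    Returns (labels, means): labels[i] in [0, k) is pixel i's class, so
    the label list can be written out as a grayscale image; means are the
    final cluster colors. Initial means are k distinct sampled colors
    (one of several reasonable initializations; see the extra credit).
    """
    rng = random.Random(seed)
    means = rng.sample(sorted(set(pixels)), k)
    labels = [-1] * len(pixels)

    def dist2(p, m):
        return sum((a - b) ** 2 for a, b in zip(p, m))

    for _ in range(iters):
        # Assignment step: each pixel joins its nearest mean in RGB space.
        new = [min(range(k), key=lambda j: dist2(p, means[j]))
               for p in pixels]
        if new == labels:          # converged: no pixel changed class
            break
        labels = new
        # Update step: recompute each cluster's mean color.
        sums = [[0.0, 0.0, 0.0] for _ in range(k)]
        counts = [0] * k
        for p, c in zip(pixels, labels):
            counts[c] += 1
            for i in range(3):
                sums[c][i] += p[i]
        means = [tuple(s / counts[j] for s in sums[j]) if counts[j]
                 else means[j] for j in range(k)]
    return labels, means
```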

You can also pass this output image to conrgn to separate out the connected components within each class. Example output looks like:

[Figure: false-color labeled connected-components image]
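The conrgn step can be understood as a flood fill over the label image: two pixels belong to the same component if they are 4-connected and carry the same class label. A minimal sketch (the row-major layout and function name are illustrative assumptions, not conrgn's actual interface):

```python
def connected_components(label_img, w, h):
    """Relabel a class-label image so each 4-connected region of a
    single class gets its own component id.

    label_img is row-major with length w*h. Returns (comp, n_comps)
    where comp[i] is pixel i's component id.
    """
    comp = [-1] * (w * h)
    next_id = 0
    for start in range(w * h):
        if comp[start] != -1:
            continue
        cls = label_img[start]
        # Iterative flood fill from this unvisited seed pixel.
        comp[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            x, y = i % w, i // w
            for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
                if 0 <= nx < w and 0 <= ny < h:
                    j = ny * w + nx
                    if comp[j] == -1 and label_img[j] == cls:
                        comp[j] = next_id
                        stack.append(j)
        next_id += 1
    return comp, next_id
```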

You should use the input images fb06931, fb06932, fb06933, fb11117, fb11122, fb23460, and fb23469, available from the image data page. For each input image, turn in printouts (preferably color) of

• the original image
• the false-colored labeling
• the connected-components output
Also, note how many classes you divided the pixels into (the value of K) and the number of connected components produced.

This technique will likely produce too many connected components. How could you apply techniques covered in class to clean up the connected-components result? Answer this question in your submission. You don't need to implement your answer: just describe a procedure and argue that it would work.
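As one illustration of the kind of cleanup possible (you must still write up and justify your own answer), a simple 3x3 majority filter on the label image replaces each pixel's class with the most common class in its neighborhood, absorbing speckle components before connected components are extracted. The function below is a sketch under that assumption, not a prescribed solution:

```python
from collections import Counter

def majority_filter(labels, w, h):
    """3x3 mode filter on a row-major label image: each pixel takes the
    most common class label in its neighborhood, which removes isolated
    single-pixel components."""
    out = labels[:]
    for y in range(h):
        for x in range(w):
            votes = Counter()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < w and 0 <= ny < h:
                        votes[labels[ny * w + nx]] += 1
            out[y * w + x] = votes.most_common(1)[0][0]
    return out
```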

You should also turn in a listing of your K-means implementation.

Extra Credit:
• (Easy) Color clustering is a kind of vector quantization and could be useful for image compression. For example, the following image uses only 7 colors to approximate fb06931. To earn this credit, approximate fb11117 and fb23469 using only K colors.
[Figure: 7-color version of fb06931]
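Given the labels and cluster means from K-means, producing the K-color approximation amounts to painting each pixel with its cluster's mean color. A small sketch (function name is illustrative):

```python
def quantize(labels, means):
    """Map each pixel's class label to its cluster's mean color, rounded
    and clamped to 0-255 ints, giving a K-color approximation of the
    original image."""
    palette = [tuple(min(255, max(0, int(round(v)))) for v in m)
               for m in means]
    return [palette[c] for c in labels]
```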

The K-means algorithm is sensitive to the initial clustering setup. The simplest initialization is to assign pixels to clusters at random (which is probably what you implemented). There are some other variations for the initial setup:
1. Uniform linear: The initial cluster means are selected uniformly along the line defined by [0 0 0] and [1 1 1] in the RGB cube.
2. Uniform cube: The initial cluster means are selected uniformly throughout the RGB cube defined by [0 0 0] and [1 1 1]. This only works for segmentations where the cube root of the number of clusters is a positive integer greater than one. For example, if the number of clusters is eight then the initial cluster means will be [0 0 0], [0 1 0], [1 1 0], [1 0 0], [0 0 1], [0 1 1], [1 1 1], and [1 0 1]. If the number of clusters is nine, [0.5 0.5 0.5] could be used as the ninth mean.
3. Statistical: The initial cluster means are selected uniformly along the segment from the sample mean minus one standard deviation to the sample mean plus one standard deviation, i.e., [mean-sigma, mean+sigma].
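The three non-random setups above can be sketched as follows, with colors normalized to the unit RGB cube. The function names are illustrative; the uniform-cube version handles only perfect-cube K (the [0.5 0.5 0.5] special case for K=9 is left out), and uniform-linear/statistical assume K >= 2:

```python
def init_uniform_linear(k):
    """K means evenly spaced on the gray diagonal [0,0,0]..[1,1,1]."""
    return [(i / (k - 1),) * 3 for i in range(k)]

def init_uniform_cube(k):
    """K means on a regular grid filling the RGB cube; k must be a
    perfect cube (e.g. 8 gives the eight corners)."""
    n = round(k ** (1 / 3))
    assert n ** 3 == k, "cluster count must be a perfect cube"
    step = [i / (n - 1) for i in range(n)] if n > 1 else [0.5]
    return [(r, g, b) for r in step for g in step for b in step]

def init_statistical(samples, k):
    """K means evenly spaced, per channel, on the segment
    [mean - sigma, mean + sigma] computed from the pixel samples."""
    n = len(samples)
    mean = [sum(p[c] for p in samples) / n for c in range(3)]
    sigma = [(sum((p[c] - mean[c]) ** 2 for p in samples) / n) ** 0.5
             for c in range(3)]
    return [tuple(mean[c] + (2 * i / (k - 1) - 1) * sigma[c]
                  for c in range(3))
            for i in range(k)]
```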

To earn this credit, compare the random approach to some or all of the other three according to the following measures: weighted average cluster variance, number of iterations, and misclassification rate when applied to the noisy version, noisy, of the synthetic image gtruth (K=9). (The region boundaries lie at coordinates 90 and 170 along both the x-axis and y-axis.) List results and statistics for these three measures for the chosen approaches.
[Figure: gtruth (left); noisy (right)]
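Two of the three measures can be computed directly from the clustering output. The weighted average cluster variance (each cluster's mean squared distance to its mean, weighted by cluster size) simplifies to the total within-cluster squared error divided by the number of pixels; the misclassification rate assumes the predicted labels have already been put into correspondence with the ground-truth labels. Both function names below are illustrative:

```python
def weighted_cluster_variance(pixels, labels, means, k):
    """Sum over clusters of (n_c / N) * (SSE_c / n_c), i.e. the total
    within-cluster squared error divided by the pixel count N."""
    total_sse = 0.0
    for p, c in zip(pixels, labels):
        total_sse += sum((a - b) ** 2 for a, b in zip(p, means[c]))
    return total_sse / len(pixels)

def misclassification_rate(pred, truth):
    """Fraction of pixels whose predicted label disagrees with ground
    truth (labels assumed already matched between the two images)."""
    return sum(p != t for p, t in zip(pred, truth)) / len(pred)
```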

Interesting Sites (in Java Applet):
• Color cube visualization: It helps you understand the color cube and color histograms used in content-based image retrieval. Note that it doesn't seem to work on your own images as they claim.
• ZOMAX: You can play with many algorithms taught in the class online.

 Last updated: 04/18/99 at 10PM.