
EE/CSE 576: Image Understanding
Spring 2001

EE/CSE 576: Lab Exercise 2
Clustering for Image Segmentation
Due Date: April 27
Download the software you may need.
(Code for reading and writing PPM images can be found in imgformat.h
and imgformat.c in the canedg directory.)
Download the test images.
In this assignment, you will segment images into regions of
similarly-colored pixels, using the K-means algorithm discussed in
class and given in Chapter 10.


(Figure: the original image (left) and its false-colored labeling after K-means (right).)

Your K-means implementation should take a color PPM image, such as the
one shown above on the left, and a number of clusters K. It should
output a grayscale image with each pixel labeled by its class. (You
can pass this image to autocolor to produce a false-colored
image such as the one above.)
You can also pass this output image to conrgn to separate out
the connected components within each class. Example output looks like:

(Figure: false-color labeling after connected components.)

You should use the input images
P1010021, P7180319, Ssgp1908,
fb06931, fb11117, and fb23469, which are
available from the image data
page. For each input image, turn in printouts (preferably color) of
 the original image
 the false-colored labeling
 the connected-components output
Also, note how many clusters you divided the pixels into (the value of
K) and the number of connected components produced.
This technique will likely produce too many connected components. If
you want to get rid of the small ones, you will need to construct a
table of their areas (counts of pixels) and make a final pass over the
image to change tiny components to label 0 (which means background).
Finally, you should turn in a listing of your K-means implementation,
including headers as in Assignment 1 and internal comments to
clarify the code.
Extra Credit:
 K-means suffers from the fixed-K problem. Implement Meng-Hee's (or
any other) variant procedure and compare the results to the default
K-means results. For the comparison, set K higher than the number of
clusters the variant algorithm produced.
 K-means is also sensitive to the initial choice of the clusters.
The simplest initial setup is to choose the cluster centroids randomly
(which is what you've probably implemented). There are some possible
variations for the initial setup:
 Uniform: The initial cluster centroids are selected
uniformly along the line defined by [0 0 0] and [1 1 1] in the RGB
space or uniformly in the RGB cube.
 Randomly from data: The initial cluster centroids are selected
to be the RGB values of K randomly selected pixels in the image.
 Splitting: The initial cluster centroids are selected
as follows:
 Compute the centroid of the whole data set.
 Perturb the centroid (add small random numbers to its values)
to obtain a new centroid. Now we have two centroids.
 Run K-means with K = 2, and then perturb the centroids again to
obtain a total of four centroids.
 Run K-means with these centroids, and continue splitting some of
the resulting centroids by perturbing them until you have enough
centroids. These centroids become the initial setup for the final
K-means.
To earn this credit, compare the effects of the different initialization
methods using the noisy image shown below. Run K-means with each
initial setting and report the squared error (of K-means) and the
misclassification rate for each case. The ground truth contains nine
regions with boundaries at 90 and 170 along both the x and y axes, so
the misclassification rate is the number of pixels that were not correctly
assigned to the corresponding nine regions in the ground-truth image.


(Figure: the ground-truth image (left) and the noisy image (right).)

Interesting Sites (Java applets):
 Color cube visualization:
This helps you understand the color cube and color histogram used in content-based
image retrieval. Note that, despite the claim, it doesn't seem to work on your own images.
 ZOMAX: You can play with
many of the algorithms taught in class, online.
Last updated on
04/15/01 at 11AM.