CSE/EE 576: Homework Set 2
Color Clustering and Skin Finding

Download the software you need
Download the images you need: faces (zip, tarred gzip) and scenes (zip, tarred gzip)

In this assignment, you will segment images by color, using the K-means algorithm and some variants. You will also try to classify the skin areas on the face images using the generated clusters.

[Figure panels: face image; segmented by color; skin pixels highlighted]

What You Should Do

Part I: Color Clustering

  1. First implement a basic K-means clustering algorithm in RGB space (your own code; do not use the MATLAB K-means function).
    1. First with randomly selected cluster seeds
    2. Next, try pre-clustering; that is, cluster a small sample of the image's pixels to find the seeds.
    3. Next with a method that you develop for selecting the seeds intelligently from the image using its color histogram (again your code). The seed selection should be automatic, given the histogram and the number of seeds to be selected. One way to go is to find the peaks in the color histogram as candidates for seeds.

  2. Now develop and implement a smarter K-Means variant (your own code again) that determines the best value for K by evaluating statistics of the clusters. Some possible methods to try:
    1. You can start from the color histogram, as K is closely related to the number of peaks in the histogram. Not all the peaks are necessary; you want only the dominant ones, so pick the peaks that account for at least a certain fraction of the image's pixels.
    2. You can also try clustering using different Ks, and pick the best one. The metric could be related to the distance between clusters and the variance within each cluster.
    3. You are free to come up with your own ways.

  3. Test each variant of the above on both the face images and the scene images and report your results.
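
As a point of reference for Part I, here is a minimal sketch of basic K-means with randomly selected seeds. It is written in Python purely for illustration (the assignment itself is to be done in Matlab or C++), and it is not a model solution. The names `sq_dist`, `assign`, `kmeans`, and `within_cluster_ss`, the `max_iters` cap, and the choice to keep an empty cluster's center unchanged are all assumptions of this sketch.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two color tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(pixels, centers):
    """Label each pixel with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in pixels]


def kmeans(pixels, k, max_iters=50, seed=0):
    """Cluster color tuples into k groups; returns (centers, labels)."""
    rng = random.Random(seed)
    # Random seeding: pick k distinct colors from the image as initial centers.
    # This requires the image to contain at least k distinct colors.
    centers = rng.sample(sorted(set(pixels)), k)
    for _ in range(max_iters):
        labels = assign(pixels, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:
                n = len(members)
                # Move the center to the per-channel mean of its members.
                new_centers.append(tuple(sum(ch) / n for ch in zip(*members)))
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, assign(pixels, centers)


def within_cluster_ss(pixels, centers, labels):
    """Sum of squared distances from each pixel to its assigned center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(pixels, labels))
```

One possible way to compare different values of K is to run `kmeans` for each candidate and look at how `within_cluster_ss` changes. That sum can only shrink as K grows, so it would need to be balanced against the distance between cluster centers or a penalty on K.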

Part II: Skin Classification

The goal of this part is to develop a very simple skin detector from the results of Part I. For this part of the homework, we recommend that you work in normalized RGB space.

The common RGB representation of color images is not suitable for characterizing skin color. In RGB space, the triple (R, G, B) represents not only color but also luminance. Luminance may vary across a person's face due to the ambient lighting and is not a reliable measure for separating skin from non-skin regions. Luminance can be removed from the color representation by working in the normalized RGB space, also called the chromatic color space. Chromatic colors, also known as "pure" colors in the absence of luminance, are defined by the simple normalization process shown below:

r = R/(R+G+B)

b = B/(R+G+B)

Note: The green component g = G/(R+G+B) is redundant after the normalization because r+g+b = 1.
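
The normalization can be sketched in a few lines. This is an illustrative Python helper only; the name `to_chromatic` and the handling of pure black pixels (where R+G+B = 0 and the ratio is undefined) are assumptions, not part of the assignment.

```python
def to_chromatic(R, G, B):
    """Map an (R, G, B) pixel to normalized chromatic coordinates (r, b)."""
    total = R + G + B
    if total == 0:
        # Pure black has no defined chromaticity. Returning equal shares
        # for each channel is an assumption of this sketch.
        return (1.0 / 3.0, 1.0 / 3.0)
    return (R / total, B / total)
```

Two pixels that differ only in brightness, such as (100, 50, 50) and (200, 100, 100), map to the same (r, b) pair, which is the property that makes this space useful for skin detection.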

  1. Start with the face training image set. You should use half of the face images for training, and hold out the other half for testing together with the scene images. Note that you should include a range of skin tones from different ethnicities in your training set for better performance.
  2. Run your K-means algorithm on the face training set to get K clusters with small K, i.e., K < 9.
  3. Examine the clusters in a color-space histogram (can be by hand or automatic) and come up with a characterization for skin pixels. That is, create a classifier (which can be as simple as an if-then-else statement) that classifies pixels as "skin" or "not skin" based on the color values. One way to go is to model the skin color distribution as a Gaussian. With this Gaussian-fitted skin color model, you can now obtain the likelihood of skin for any pixel of an image.
  4. Run your skin finder on the face test image set.
  5. Report on its performance.
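
The Gaussian option mentioned in step 3 might look roughly like the following Python sketch, which fits a 2-D Gaussian to (r, b) values of pixels you have already identified as skin. It is an illustration under stated assumptions, not the required method: the function names, the use of an unnormalized likelihood (scaled so that it equals 1 at the mean), and the default `threshold` of 0.5 are all hypothetical choices that you would need to tune or replace.

```python
import math


def fit_gaussian(samples):
    """Estimate the mean and 2x2 covariance of (r, b) skin samples."""
    n = len(samples)
    mr = sum(r for r, _ in samples) / n
    mb = sum(b for _, b in samples) / n
    srr = sum((r - mr) ** 2 for r, _ in samples) / n
    sbb = sum((b - mb) ** 2 for _, b in samples) / n
    srb = sum((r - mr) * (b - mb) for r, b in samples) / n
    # The covariance is symmetric, so three numbers describe it.
    return (mr, mb), (srr, srb, sbb)


def skin_likelihood(rb, mean, cov):
    """Unnormalized Gaussian likelihood in (0, 1]; equals 1 at the mean."""
    srr, srb, sbb = cov
    det = srr * sbb - srb * srb
    if det <= 0:
        raise ValueError("degenerate covariance; need more varied samples")
    dr, db = rb[0] - mean[0], rb[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form 2x2 inverse.
    m2 = (sbb * dr * dr - 2.0 * srb * dr * db + srr * db * db) / det
    return math.exp(-0.5 * m2)


def is_skin(rb, mean, cov, threshold=0.5):
    """Classify a pixel as skin if its likelihood reaches the threshold.

    The threshold value is an assumption and should be chosen empirically.
    """
    return skin_likelihood(rb, mean, cov) >= threshold
```

In use, you would call `fit_gaussian` once on the training skin pixels, then apply `is_skin` to each normalized pixel of a test image to build a skin mask.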

Things to keep in mind:

What You Should Turn In

Remember that you should put headers on all your routines with the following information:

You should also report the performance of your smart K-means algorithm and of your skin classification algorithm by providing examples. Also include some discussion of failure cases or limitations of your approach; this will shed light on future improvements.

If you choose to do your homework in Matlab, or C++ on Linux, please turn in your homework to Colin (kzheng@cs).
If you choose to do your homework in C++ on Windows, please turn in your homework to Yi (yi@cs).
Your homework turn-in can be a Word document, a PDF document, or a web page.

Homework is due on April 30th, 5pm. Please plan your work early.