CSE455: Homework 2
Color Clustering and Skin Finding

Download the software you need: hw2.zip
Download the images you need: faces.zip and scenes.zip

In this assignment, you will segment images by color, using the K-means algorithm and some variants. You will also try to classify the skin areas on the face images using the generated clusters.

[Figures: face image | segmented by color | skin pixels highlighted]

What You Should Do

Part I: Color Clustering

  1. First implement a basic K-means clustering algorithm. Use normalized (r,g) space, where r = R / (R + G + B) and g = G / (R + G + B). Try three ways of choosing the cluster seeds:
    1. First with randomly selected cluster seeds.
    2. Next with seeds found by sampling pixels from the image. Choose a pixel and make its value a seed if it is sufficiently different from the already-selected seeds. Repeat until you have K different seeds.
    3. Finally with a method that you develop (again your own code) for selecting the seeds intelligently from the image using its color histogram. The seed selection should be automatic, given the histogram and the number of seeds to be selected. One option is to use the peaks in the color histogram as candidate seeds.

  2. Now develop and implement a smarter K-means variant (your own code again) that determines the best value for K by evaluating statistics of the clusters. Some possible methods to try:
    1. Start from the color histogram, since K is closely related to the number of peaks in the histogram. Not every peak is needed; you want only the dominant ones.
    2. Cluster with several different values of K and pick the best one. The metric could be based on the distance between clusters and the variance within each cluster.
    3. You are free to come up with your own method.

  3. Test each variant of the above on both the face images and the scene images and report your results.
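
As a starting point for steps 1 and 2, here is a minimal sketch in Python, assuming pixels are given as (R, G, B) tuples. It shows the normalized (r,g) conversion, seed selection by sampling (seed method 2), the basic K-means loop, and one possible score for comparing different values of K. All function names, the handling of black pixels, and the score itself are illustrative assumptions, not a required design; the histogram-based methods are left out.

```python
import random

def to_rg(pixel):
    """Map an (R, G, B) pixel to normalized (r, g) chromaticity."""
    R, G, B = pixel
    total = R + G + B
    if total == 0:
        # Assumption: r and g are undefined for pure black, so use neutral gray.
        return (1 / 3, 1 / 3)
    return (R / total, G / total)

def sq_dist(a, b):
    """Squared Euclidean distance between two (r, g) points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def sample_seeds(points, k, min_dist, rng, max_tries=10000):
    """Draw random points, keeping one as a seed only if it lies at least
    min_dist away from every seed chosen so far."""
    seeds = []
    for _ in range(max_tries):
        candidate = rng.choice(points)
        if all(sq_dist(candidate, s) >= min_dist ** 2 for s in seeds):
            seeds.append(candidate)
            if len(seeds) == k:
                return seeds
    raise ValueError("could not find k sufficiently different seeds")

def kmeans(points, seeds, max_iter=100):
    """Basic K-means: alternate assignment and mean update until the
    labels stop changing or max_iter is reached."""
    centers = list(seeds)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centers, labels

def cluster_score(points, centers, labels):
    """Hypothetical measure for comparing Ks (lower is better): mean squared
    distance to the assigned center, divided by the smallest squared distance
    between two centers. Requires at least two centers."""
    within = sum(
        sq_dist(p, centers[lab]) for p, lab in zip(points, labels)
    ) / len(points)
    between = min(
        sq_dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]
    )
    return within / between if between > 0 else float("inf")
```

To use it, convert every pixel with to_rg, call sample_seeds with a random.Random instance and a min_dist of your choosing, pass the seeds to kmeans, and compare cluster_score across the values of K you try. The score is undefined for K = 1, so a different criterion is needed if a single cluster is a possible answer.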

Part II: Skin Classification

The goal of this part is to develop a very simple skin detector from the results of Part I, sticking with normalized (r,g) space.

  1. Start with the face image set. Use half of the face images for training and hold out the other half for testing. Also run your tests on the scene images to see whether your program reports skin in them. Include a wide range of skin tones in your training set for better performance.
  2. Try several values of K and choose the best one (by hand), judging by how useful the clusters are for step 3.
  3. Examine the clusters in a color-space histogram (by hand or automatically) and come up with a characterization of skin pixels. That is, write a classifier (again your own code, and it can be as simple as an if-then-else statement) that labels each pixel as "skin" or "not skin" based on its color values.
  4. Run your skin finder on the whole face image set: both the training and testing images.
  5. Report on its performance on both sets.
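
Steps 3 to 5 could look roughly like the following sketch, assuming the skin region is approximated by an axis-aligned box in (r,g) space and that hand-made skin/not-skin labels are available for scoring. The function names, the label list, and the numeric bounds are placeholders for illustration only; real bounds should come from your own clusters.

```python
def to_rg(pixel):
    """Map an (R, G, B) pixel to normalized (r, g) chromaticity."""
    R, G, B = pixel
    total = R + G + B
    if total == 0:
        # Assumption: r and g are undefined for pure black, so use neutral gray.
        return (1 / 3, 1 / 3)
    return (R / total, G / total)

def is_skin(pixel, r_range, g_range):
    """If-then-else style classifier: skin if (r, g) falls inside the box."""
    r, g = to_rg(pixel)
    return r_range[0] <= r <= r_range[1] and g_range[0] <= g <= g_range[1]

def detection_rates(pixels, truth, r_range, g_range):
    """Return (true positive rate, false positive rate) measured against
    hand-made labels, where truth[i] is True for a skin pixel."""
    tp = fp = pos = neg = 0
    for pixel, label in zip(pixels, truth):
        predicted = is_skin(pixel, r_range, g_range)
        if label:
            pos += 1
            tp += predicted
        else:
            neg += 1
            fp += predicted
    return (tp / pos if pos else 0.0, fp / neg if neg else 0.0)

# Placeholder bounds, NOT tuned values; derive yours from the skin clusters.
R_RANGE = (0.40, 0.60)
G_RANGE = (0.25, 0.35)

pixels = [(200, 120, 100), (50, 200, 50), (90, 90, 90)]
truth = [True, False, False]  # hypothetical hand labels
tpr, fpr = detection_rates(pixels, truth, R_RANGE, G_RANGE)
```

Computing the two rates separately for the training images, the testing images, and the scene images gives a compact way to report performance. For scene images that contain no skin, only the false positive rate is meaningful.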

Things to keep in mind:

Turn in:

  1. All of the code you wrote for Parts I and II. Your code must be well commented and in plain ASCII text so that the grader can compile it into working binaries.
  2. A brief report on the performance of your smart K-means algorithm and of your skin classification algorithm, with examples (a few of the best, worst, and average results will be fine). Your report must clearly describe and explain the algorithms you developed. Also include a discussion of failure cases and the limitations of your approach. The report can be in either MS Word or PDF format.
    NOTE: Submission in HTML format (web pages) is no longer allowed.
  3. Your report must include output images from your algorithms to make the exposition clear. The more the better.
Please email your homework to Masa (mkbsh@cs). Homework is due on Feb 1 (Thu) by 11:59 PM. Please start early, much earlier than you did for Homework 1, as this assignment takes MUCH longer to do.