CSEP 576 Winter 2008: Project 1
Color Clustering and Skin Finding

Date released: Wednesday, January 09, 2008

Date due: Sunday, January 27, 2008 by 11:59pm

(Late policy: 5% off per day late until Tuesday, 01/29/2008)

Download the C++ skeleton code for project 1.
Matlab users: write your own code; do not use the Matlab K-Means function.

Download the images you need:
    1. Faces training set
    2. Faces testing set (updated with all P6 header files)
    3. Scenes

Download the .doc template for the report (proj1-report.doc). You are free to use other text-processing tools such as LaTeX; however, make sure that your report has the same sections.

Download the project grading guidelines here.

In this assignment, you will segment images by color, using the K-means algorithm and some variants. You will also try to classify the skin areas on the face images using the generated clusters.

[Figure: face image | segmented by color | skin pixels highlighted]

What You Should Do

Part I: Color Clustering

  1. First implement a basic K-Means Clustering Algorithm (your own code; do not use the Matlab K-Means). Try two color spaces: RGB and one other (e.g. normalized RGB, HSV, etc.).
    1. First with randomly selected cluster seeds
    2. Next try sampling pixels from the image to find the seeds. Choose a pixel and make its value a seed if it is sufficiently different from already-selected seeds. Repeat till you get K different seeds.
  2. Now develop and implement a smarter K-Means variant (your own code again) that determines the best value for K by evaluating statistics of the clusters. Some possible methods to try:
    1. You can start from the color histogram, as K is closely related to the number of peaks in the histogram. Not all the peaks are necessary, as you want only the dominant ones, so you should pick the ones that occupy a certain portion of the image in terms of pixels.
    2. You can also try clustering using different Ks, and pick the best one. The metric could be related to the distance between clusters and the variance within each cluster.
    3. You are free to come up with your own ways.

  3. Test each variant of the above on both the face images and the scene images and report your results.
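
As a starting point, the steps in items 1 and 2 could be sketched roughly as follows. This is a minimal sketch, not the skeleton code's API: the names `Color`, `dist2`, `pickDistinctSeeds`, `kmeans` and `clusterScore` are hypothetical, pixels are assumed to be already loaded into a flat vector, and the score shown is only one possible way to compare different values of K.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Color = std::array<double, 3>;  // one pixel in any 3-channel color space

// Squared Euclidean distance between two colors.
double dist2(const Color& a, const Color& b) {
    double d = 0.0;
    for (int c = 0; c < 3; ++c) d += (a[c] - b[c]) * (a[c] - b[c]);
    return d;
}

// Seed selection (item 1.2): scan the pixels in order and keep one as a seed
// only if it is at least minDist from every seed so far. May return < k seeds.
std::vector<Color> pickDistinctSeeds(const std::vector<Color>& px,
                                     std::size_t k, double minDist) {
    std::vector<Color> seeds;
    for (const Color& p : px) {
        if (seeds.size() == k) break;
        bool farEnough = true;
        for (const Color& s : seeds)
            if (dist2(p, s) < minDist * minDist) { farEnough = false; break; }
        if (farEnough) seeds.push_back(p);
    }
    return seeds;
}

// Basic K-means: assign each pixel to its nearest center, recompute the
// centers as cluster means, and stop when no label changes (or at maxIter).
// centers holds the seeds on entry and the final means on return.
std::vector<std::size_t> kmeans(const std::vector<Color>& px,
                                std::vector<Color>& centers, int maxIter = 100) {
    assert(!centers.empty());
    const std::size_t k = centers.size();
    std::vector<std::size_t> label(px.size(), k);  // k means "unassigned"
    for (int it = 0; it < maxIter; ++it) {
        bool changed = false;
        for (std::size_t i = 0; i < px.size(); ++i) {
            std::size_t best = 0;
            for (std::size_t j = 1; j < k; ++j)
                if (dist2(px[i], centers[j]) < dist2(px[i], centers[best])) best = j;
            if (best != label[i]) { label[i] = best; changed = true; }
        }
        if (!changed) break;
        std::vector<Color> sum(k, Color{});
        std::vector<std::size_t> count(k, 0);
        for (std::size_t i = 0; i < px.size(); ++i) {
            for (int c = 0; c < 3; ++c) sum[label[i]][c] += px[i][c];
            ++count[label[i]];
        }
        for (std::size_t j = 0; j < k; ++j)  // an empty cluster keeps its old center
            if (count[j] > 0)
                for (int c = 0; c < 3; ++c) centers[j][c] = sum[j][c] / count[j];
    }
    return label;
}

// One possible score for comparing different K (item 2.2): mean squared
// distance of pixels to their own center, divided by the smallest squared
// distance between two centers. Lower is better. Requires at least 2 centers.
double clusterScore(const std::vector<Color>& px,
                    const std::vector<std::size_t>& label,
                    const std::vector<Color>& centers) {
    assert(centers.size() >= 2 && !px.empty());
    double within = 0.0;
    for (std::size_t i = 0; i < px.size(); ++i)
        within += dist2(px[i], centers[label[i]]);
    within /= px.size();
    double minBetween = std::numeric_limits<double>::max();
    for (std::size_t a = 0; a < centers.size(); ++a)
        for (std::size_t b = a + 1; b < centers.size(); ++b)
            minBetween = std::min(minBetween, dist2(centers[a], centers[b]));
    return within / minBetween;
}
```

To try several values of K, one could run `pickDistinctSeeds` and `kmeans` for each candidate K and keep the result with the lowest `clusterScore`. Whether this particular ratio behaves well on the face and scene images is something to check experimentally.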
Your k-means code should output an image that can be used to show your clusters. This can be a grayscale image where each pixel's value is the number of the cluster to which it has been assigned, which the provided autocolor function will transform into something more easily interpretable. It could also be a ppm where each pixel has the mean color of the cluster it was assigned to (this generally makes a prettier picture, but it can be harder to tell the number of clusters), or better yet output both.
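
The two output options described above might be produced along these lines. This is a sketch under assumptions: it builds the file contents as binary PGM (P5) and PPM (P6) byte strings, the helper names `labelPgm`, `meanColorPpm` and `Rgb` are hypothetical, and whether the provided autocolor function expects exactly this PGM layout should be checked against the skeleton code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Rgb { unsigned char r, g, b; };  // hypothetical 8-bit color type

// Binary PGM (P5): one byte per pixel holding that pixel's cluster number.
std::string labelPgm(const std::vector<std::size_t>& label, int width, int height) {
    assert(width > 0 && height > 0);
    assert(label.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    std::string bytes = "P5\n" + std::to_string(width) + " " +
                        std::to_string(height) + "\n255\n";
    for (std::size_t l : label) {
        assert(l <= 255);  // a cluster number must fit in one byte
        bytes.push_back(static_cast<char>(l));
    }
    return bytes;
}

// Binary PPM (P6): each pixel is painted with the mean color of its cluster.
std::string meanColorPpm(const std::vector<std::size_t>& label,
                         const std::vector<Rgb>& mean, int width, int height) {
    assert(width > 0 && height > 0);
    assert(label.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    std::string bytes = "P6\n" + std::to_string(width) + " " +
                        std::to_string(height) + "\n255\n";
    for (std::size_t l : label) {
        assert(l < mean.size());
        bytes.push_back(static_cast<char>(mean[l].r));
        bytes.push_back(static_cast<char>(mean[l].g));
        bytes.push_back(static_cast<char>(mean[l].b));
    }
    return bytes;
}
```

The returned strings would be written to files opened in binary mode, so that the pixel bytes are not altered by newline translation.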

Part II: Skin Classification

The goal of this part is to develop a very simple skin detector from the results of Part I. For this part of the homework, we recommend that you work in normalized RGB space.

The common RGB representation of color images is not suitable for characterizing skin color. In the RGB space, the three components (R, G, B) represent not only color but also luminance. Luminance may vary across a person's face due to the ambient lighting and is not a reliable measure for separating skin from non-skin regions. Luminance can be removed from the color representation in the normalized RGB space, or chromatic color space. Chromatic colors, also known as "pure" colors in the absence of luminance, are defined by the simple normalization process shown below:

r = R/(R+G+B)

g = G/(R+G+B)

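Note: with b = B/(R+G+B) defined the same way, r+g+b = 1, so only two of the three normalized components are independent.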
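
A minimal sketch of this normalization for one 8-bit pixel is shown below. The function name `toChromaticity` is hypothetical, and the handling of a pure black pixel (where R+G+B = 0 and the formulas are undefined) is an assumption; mapping it to the neutral point (1/3, 1/3) is one possible choice.

```cpp
#include <array>

// Returns the normalized components {r, g}; b can be recovered as 1 - r - g.
std::array<double, 2> toChromaticity(unsigned char R, unsigned char G, unsigned char B) {
    const double sum = static_cast<double>(R) + G + B;
    if (sum == 0.0) {
        // Assumption: treat black as neutral, since R/(R+G+B) is undefined here.
        return {{1.0 / 3.0, 1.0 / 3.0}};
    }
    return {{R / sum, G / sum}};
}
```

For example, the pixel (50, 100, 50) maps to r = 0.25 and g = 0.5, and doubling its brightness to (100, 200, 100) gives the same (r, g).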

  1. Start with the face training image set (face-training.zip).
  2. Run your K-means algorithm on the face training set to get K clusters with small K, i.e., K < 9.
  3. Examine the clusters in a color-space histogram (can be by hand or automatic) and come up with a characterization for skin pixels. That is, create a classifier (which can be as simple as an if-then-else statement) that classifies pixels as "skin" or "not skin" based on the color values. One way to go is to model the skin color distribution as a Gaussian. With this Gaussian-fitted skin color model, you can now obtain the likelihood of skin for any pixel of an image.
  4. Run your skin finder on the face test image set.
  5. Report on its performance.
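
The Gaussian option mentioned in step 3 could look roughly like the sketch below, which fits a two-dimensional Gaussian to the (r, g) values of pixels judged to be skin and then evaluates its density at a new pixel. All names (`Rg`, `SkinModel`, `fitSkinModel`, `skinLikelihood`, `isSkin`) are hypothetical, and both the choice of training samples (for example, the pixels of the clusters you identify as skin) and the decision threshold are assumptions to be tuned on the training set.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Rg = std::array<double, 2>;  // normalized (r, g) of one pixel

struct SkinModel {
    Rg mean;               // mean (r, g) of the skin samples
    double srr, srg, sgg;  // entries of the 2x2 covariance matrix
};

// Fit the mean and sample covariance of the skin samples.
SkinModel fitSkinModel(const std::vector<Rg>& samples) {
    assert(samples.size() >= 2);
    SkinModel m{};
    for (const Rg& s : samples) { m.mean[0] += s[0]; m.mean[1] += s[1]; }
    m.mean[0] /= samples.size();
    m.mean[1] /= samples.size();
    for (const Rg& s : samples) {
        const double dr = s[0] - m.mean[0], dg = s[1] - m.mean[1];
        m.srr += dr * dr; m.srg += dr * dg; m.sgg += dg * dg;
    }
    const double n1 = static_cast<double>(samples.size() - 1);
    m.srr /= n1; m.srg /= n1; m.sgg /= n1;
    return m;
}

// Density of the fitted 2-D Gaussian at x.
double skinLikelihood(const SkinModel& m, const Rg& x) {
    const double det = m.srr * m.sgg - m.srg * m.srg;
    assert(det > 0.0);  // the samples must not all lie on one line
    const double dr = x[0] - m.mean[0], dg = x[1] - m.mean[1];
    // Squared Mahalanobis distance, using the closed-form 2x2 inverse.
    const double maha2 = (m.sgg * dr * dr - 2.0 * m.srg * dr * dg + m.srr * dg * dg) / det;
    const double pi = std::acos(-1.0);
    return std::exp(-0.5 * maha2) / (2.0 * pi * std::sqrt(det));
}

// Classify as skin when the likelihood reaches a threshold chosen by the user.
bool isSkin(const SkinModel& m, const Rg& x, double threshold) {
    return skinLikelihood(m, x) >= threshold;
}
```

A simple if-then-else rule on r and g, as the step also allows, would replace `skinLikelihood` with fixed bounds; the Gaussian version has the advantage that a single threshold trades off missed skin against false detections.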

What You Should Turn In

1. All of the code you wrote for Parts I and II above. Your code must be well commented and in ASCII format so that the grader can compile it to working binaries.
    You should put headers on all your routines with the following information:
2. A brief report on the performance of your smart K-means algorithm as well as the skin classification algorithm, with examples (a few best, worst and average results will be fine).
    Your report must clearly describe and explain the algorithms you developed. Also include some discussion of failure examples or limitations of your approach; this will shed light on future improvements. The report can be a Word document or a PDF document. HTML or web pages are not accepted.
    Your report must include output images of your algorithm.

Please email your code and report to Indri (indria@cs) in a zip file with your name as the zip file name e.g. JohnDoe.zip, by Sunday, January 27, 2008 11:59pm.