The purpose of this assignment is to apply the Information Gain technique for constructing decision trees to the problem of identifying skin pixels in color images. There are two parts to the assignment: the computer vision part and the decision tree part.
In the computer vision part, you will read color images (RGB) into three two-dimensional arrays and extract feature vectors from some of the pixels. The training image set consists of 15 color images of people. Each of these has a corresponding mask image in which the pixels of the skin are 1's and the rest 0's. You will choose from each training image a subset of the skin pixels and a subset of the nonskin pixels for use in training.
Suppose you have a color image stored in three 2D arrays R, G, and B. Suppose that you have identified pixel [i,j] as either a skin pixel or a nonskin pixel and want to add it to the training vector set. You will construct a feature vector with 11 components. Ten are features, which we will later refer to as F1 to F10, and the last one is the class. The vector will be:
[R, G, B, r, g, b, H, S, V, T, C]
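As a rough sketch of how such a vector might be assembled for one pixel: this section does not define r, g, b, H, S, V or T, so the code below makes its own assumptions. It takes r, g, b to be R, G, B divided by their sum, obtains H, S, V from `java.awt.Color.RGBtoHSB` (each in the range 0 to 1), and expects the caller to supply T. The class and method names are hypothetical and are not part of the provided skeleton.

```java
import java.awt.Color;

class SkinFeatures {
    // Builds [R, G, B, r, g, b, H, S, V, T, C] for one pixel.
    // ASSUMPTIONS (not taken from the assignment text):
    //   r, g, b = R, G, B divided by (R + G + B), or 0 for a black pixel;
    //   H, S, V = output of Color.RGBtoHSB, each in [0, 1];
    //   T is computed elsewhere and passed in unchanged;
    //   C = 1 for a skin pixel, 0 for a nonskin pixel.
    static float[] featureVector(int R, int G, int B, float T, boolean isSkin) {
        float sum = R + G + B;
        float r = sum > 0 ? R / sum : 0f;
        float g = sum > 0 ? G / sum : 0f;
        float b = sum > 0 ? B / sum : 0f;
        float[] hsv = Color.RGBtoHSB(R, G, B, null);
        float c = isSkin ? 1f : 0f;
        return new float[] {R, G, B, r, g, b, hsv[0], hsv[1], hsv[2], T, c};
    }
}
```

If the course materials define the derived features differently, only the body of `featureVector` needs to change; the 11-component layout stays the same.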
The final set of training vectors should come from all 15 face images, which are of people from several different races. Using all of the skin pixels and all of the nonskin pixels would produce too much data. Instead, select 50000 skin pixels and 50000 nonskin pixels from the whole set of training images as your training set (fewer if you are short on space). As you create the training vectors, write them to disk, so that you can work on the decision tree without rerunning the computer vision part each time. You should also examine a small subset of the training vectors yourself to make sure they make sense.
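One possible way to draw a fixed number of vectors without holding every pixel in memory is reservoir sampling. The sketch below is only an illustration: the class `ReservoirSampler`, its methods, and the comma-separated output format are assumptions, not part of the provided skeleton. You would use one sampler for skin pixels and another for nonskin pixels.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class ReservoirSampler {
    private final int capacity;                    // e.g. 50000
    private final List<float[]> kept = new ArrayList<>();
    private final Random rng;
    private long seen = 0;                         // vectors offered so far

    ReservoirSampler(int capacity, long seed) {
        this.capacity = capacity;
        this.rng = new Random(seed);
    }

    // Offer one feature vector; each offered vector ends up kept
    // with equal probability capacity / seen.
    void offer(float[] vector) {
        seen++;
        if (kept.size() < capacity) {
            kept.add(vector);
            return;
        }
        long j = (long) (rng.nextDouble() * seen); // uniform in [0, seen)
        if (j < capacity) {
            kept.set((int) j, vector);
        }
    }

    int size() {
        return kept.size();
    }

    // Write the kept vectors to disk, one comma-separated vector per line.
    void writeCsv(String path) throws IOException {
        try (PrintWriter out = new PrintWriter(path)) {
            for (float[] v : kept) {
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < v.length; i++) {
                    if (i > 0) line.append(',');
                    line.append(v[i]);
                }
                out.println(line);
            }
        }
    }
}
```

Printing the first few lines of the resulting file is a quick way to check a small subset of vectors by eye.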
The second part of the assignment is to use the training vectors to construct a decision tree classifier and use it to label the pixels of the test set of images as skin or not skin. You will then display the results with code we provide. You should construct the classifier with the Information Gain method, which is described in the lecture notes and the book. Here are some details.
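For orientation, the information gain of a candidate split "feature f <= threshold" can be computed as the entropy of the parent set minus the size-weighted entropy of the two subsets. The sketch below assumes two classes and a binary threshold split; the class `InfoGain` and its method names are hypothetical, and the lecture notes remain the authoritative description of the method.

```java
class InfoGain {
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // Entropy in bits of a set with `pos` skin and `neg` nonskin samples.
    static double entropy(int pos, int neg) {
        if (pos == 0 || neg == 0) return 0.0;      // pure (or empty) set
        double p = (double) pos / (pos + neg);
        double q = 1.0 - p;
        return -(p * log2(p) + q * log2(q));
    }

    // Gain of splitting all samples on feature[i][f] <= threshold.
    static double gain(float[][] feature, boolean[] label, int f, float threshold) {
        int lp = 0, ln = 0, rp = 0, rn = 0;        // left/right positive/negative counts
        for (int i = 0; i < label.length; i++) {
            if (feature[i][f] <= threshold) {
                if (label[i]) lp++; else ln++;
            } else {
                if (label[i]) rp++; else rn++;
            }
        }
        int n = lp + ln + rp + rn;
        if (n == 0) return 0.0;
        double parent = entropy(lp + rp, ln + rn);
        double weightLeft = (double) (lp + ln) / n;
        double weightRight = (double) (rp + rn) / n;
        return parent - weightLeft * entropy(lp, ln) - weightRight * entropy(rp, rn);
    }
}
```

A split that separates the classes perfectly has gain equal to the parent entropy, while a split that sends every sample to one side has gain 0.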
Here is an example using just the features R and G:
The plot above shows the R and G values of sample pixels. The horizontal axis is R and the vertical axis is G. The red spots are skin and the blue spots are not skin. The following is a handcrafted decision tree that classifies pixels as skin or not skin.
Decision tree using two features: R and G:
R <= 56
| G <= 38
| | R <= 42
| | | G <= 27
| | | G > 27
| | R > 42
| | | G <= 20: 0 (171.0/43.0)
| | | G > 20
| G > 38: 0 (8356.0/174.0)
R > 56
| G <= 168
| | R <= 96
| | | G <= 78
| | | G > 78
| | R > 96
| | | G <= 109
| | | G > 109
| G > 168
| | R <= 198
| | | R <= 191: 0 (3133.0/99.0)
| | | R > 191
| | R > 198
| | | G <= 198
| | | G > 198
For the project, we provide skeleton code consisting of two files, "face.java" and "decisionTree.java" (a new version of face.java with overlay capability is also available). The face class contains steps for reading and writing image files, as well as for extracting image features. In "decisionTree.java", you will need to fill in code for the "decisionTree" class, including class members that store the data structure of the decision tree, as well as the "Train" and "Test" functions of the class. You will need to use some of the computer vision code here to produce the results. You can also add other functions to the class. Running the code requires the Java Advanced Imaging (JAI) library.
In the main file, "face.java", training is performed by calling the method "decisionTree.Train(int dataN, float[][] feature, boolean[] label)", where "dataN" is the number of training samples, "feature" is a 2-D array of size "dataN*10" that stores the feature vectors of all training samples, and "label" is an array that stores their labels (true represents a face pixel and false represents a non-face pixel).
To test whether a pixel is a face pixel, the main file "face.java" calls "decisionTree.Test(float[] feature)", where "feature" is an array of length 10 holding the feature vector of the current pixel. The return value is a boolean: true (face pixel) or false (non-face pixel).
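To make the shape of the two functions concrete, here is a minimal sketch of a recursive trainer and a tree traversal. It is not the skeleton code: the class `TinyTreeTrainer`, its `Node` layout, the depth limit, and the exhaustive threshold search are all assumptions made for illustration. The search tries every sample value as a threshold, which is quadratic in the number of samples and far too slow for 100000 vectors; a real solution would sort each feature or restrict the candidate thresholds.

```java
import java.util.ArrayList;
import java.util.List;

class TinyTreeTrainer {
    static class Node {
        int feature = -1;   // index of the split feature; -1 marks a leaf
        float threshold;    // go left when x[feature] <= threshold
        Node left, right;
        boolean label;      // majority class of the samples reaching this node
    }

    static double entropy(int pos, int neg) {
        if (pos == 0 || neg == 0) return 0.0;
        double p = (double) pos / (pos + neg), q = 1.0 - p;
        return -(p * Math.log(p) + q * Math.log(q)) / Math.log(2);
    }

    static Node train(float[][] feature, boolean[] label, int maxDepth) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < label.length; i++) rows.add(i);
        return grow(feature, label, rows, 0, maxDepth);
    }

    static Node grow(float[][] feature, boolean[] label, List<Integer> rows,
                     int depth, int maxDepth) {
        int pos = 0;
        for (int i : rows) if (label[i]) pos++;
        int neg = rows.size() - pos;
        Node node = new Node();
        node.label = pos >= neg;
        if (pos == 0 || neg == 0 || depth >= maxDepth) return node;  // leaf

        double parent = entropy(pos, neg), bestGain = 1e-9;
        int bestF = -1;
        float bestT = 0f;
        for (int f = 0; f < feature[0].length; f++) {
            for (int i : rows) {                  // each sample value is a candidate threshold
                float t = feature[i][f];
                int lp = 0, ln = 0;
                for (int j : rows) {
                    if (feature[j][f] <= t) { if (label[j]) lp++; else ln++; }
                }
                int nl = lp + ln, nr = rows.size() - nl;
                if (nl == 0 || nr == 0) continue; // split separates nothing
                double g = parent
                        - (nl * entropy(lp, ln) + nr * entropy(pos - lp, neg - ln)) / rows.size();
                if (g > bestGain) { bestGain = g; bestF = f; bestT = t; }
            }
        }
        if (bestF < 0) return node;               // no split improves on the parent

        List<Integer> l = new ArrayList<>(), r = new ArrayList<>();
        for (int i : rows) (feature[i][bestF] <= bestT ? l : r).add(i);
        node.feature = bestF;
        node.threshold = bestT;
        node.left = grow(feature, label, l, depth + 1, maxDepth);
        node.right = grow(feature, label, r, depth + 1, maxDepth);
        return node;
    }

    // Walk from the root to a leaf and return that leaf's label.
    static boolean test(Node node, float[] x) {
        while (node.feature >= 0) {
            node = x[node.feature] <= node.threshold ? node.left : node.right;
        }
        return node.label;
    }
}
```

In the provided skeleton the root node would be stored as a member of the "decisionTree" class, with "Train" building it and "Test" performing the traversal.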
You will test the classifier on both the training set and the testing set of color images. For the training set, you know which pixels are skin and which are not from the binary masks. Thus for each training image, you can compute the following statistics:
Use the average over all training images of these statistics to produce a confusion matrix for the results of classifying the training set. For the testing set, we don't have the masks, so you will only be able to show the results as images using code we provide.
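The per-image bookkeeping can be sketched as follows. This is an assumption-laden illustration: it supposes that the statistics in question are the usual true positive, false positive, true negative, and false negative counts obtained by comparing the classifier output with the mask, and the class `ConfusionCounts` and its methods are hypothetical names.

```java
class ConfusionCounts {
    int tp, fp, tn, fn;

    // Compare predicted labels with the ground-truth mask, pixel by pixel.
    static ConfusionCounts of(boolean[] predicted, boolean[] mask) {
        if (predicted.length != mask.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        ConfusionCounts c = new ConfusionCounts();
        for (int i = 0; i < mask.length; i++) {
            if (predicted[i] && mask[i]) c.tp++;   // skin labeled skin
            else if (predicted[i]) c.fp++;         // nonskin labeled skin
            else if (mask[i]) c.fn++;              // skin labeled nonskin
            else c.tn++;                           // nonskin labeled nonskin
        }
        return c;
    }

    // Fraction of true skin pixels that were labeled skin.
    double truePositiveRate() {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    // Fraction of true nonskin pixels that were labeled skin.
    double falsePositiveRate() {
        return fp + tn == 0 ? 0.0 : (double) fp / (fp + tn);
    }
}
```

Averaging the rates from each training image then gives the entries of a 2x2 confusion matrix for the whole training set.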
Write a report for your project that describes any important details of your program and shows the results. For the training set, show for each training image its 4 test statistics and the original image with the classified skin pixels overlaid. Also show the confusion matrix for the whole training set. For the testing set, show for each testing image the original image with the classified skin pixels overlaid. In summary, for each set of experiments you will show
Run the following two experiments for both the training set and the testing set, and write the results in the report.
Discuss your results: how well the classifier worked and how you think it could be improved.