In this project we build a facial recognition system that reduces each facial image to a vector and then uses Principal Component Analysis (PCA) to find a low-dimensional space of faces.
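The core of the method, as a minimal Python sketch (the names are illustrative, not the project's actual code, and the images are assumed to be already cropped, aligned, and flattened into rows of an array):

```python
import numpy as np

def build_face_space(images, num_eigenfaces):
    # images: (n, d) array with one flattened grayscale face per row.
    # Center the data on the average face, then take the top principal
    # components; the rows of vt are the eigenfaces, ordered by how
    # much variance each one captures.
    mean_face = images.mean(axis=0)
    centered = images - mean_face
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:num_eigenfaces]

def project(image, mean_face, eigenfaces):
    # Coefficients of a flattened image in the face space.
    return eigenfaces @ (image - mean_face)
```

Recognition is then nearest-neighbor matching: project a new image into the face space and return the stored face whose coefficient vector is closest. Below are some results.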
Of the 22 images in the class, only 3 were incorrectly recognized; the other 19 were correct. Below are the three mismatched faces and their top five matches.
Of the three mismatched faces, the average position of the correct answer was about 4.333.
The results do not support a gender preference (matching males with males or females with females). The second-best match was almost always the same gender, but half of the top five were of the opposite sex.
No. I would instead try to sample the users so that the proportions of males, females, and races approximately match the proportions expected in the problem I want to solve. These proportions would change depending on the purpose of the application and its location.
Since the conditions in the problem's images are unlikely to match those in the user set, this keeps the program from being trained too strongly on the conditions of the user set.
The average face and seven eigenfaces constructed.
The average face and twelve eigenfaces constructed.
The three mismatched faces and the five best matches.
As the number of eigenfaces increases, the face-like quality of the eigenfaces decreases (the eigenfaces with smaller eigenvalues simply refine the space). With eigenfaces that have few face-like qualities, images that are not faces are more likely to fall within the face space (close to these extra eigenfaces).
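This can be made concrete with the illustrative setup above: an image's distance from the face space is the MSE between the image and its reconstruction from the eigenfaces, so a basis padded out with less face-like eigenfaces reconstructs non-face images more closely and scores them as faces.

```python
def reconstruction_mse(image, mean_face, eigenfaces):
    # MSE between an image and its projection onto the face space;
    # a small value means the image lies close to the space of faces.
    coeffs = eigenfaces @ (image - mean_face)
    reconstruction = mean_face + eigenfaces.T @ coeffs
    return np.mean((image - reconstruction) ** 2)
```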
In the best case, four students were correctly recognized. In the other cases, only four of the students showed up in the top five best matches (in fact, the same four who were correctly matched).
No, the incorrect identifications are not reasonable, because not everyone looks like the same four people.
The difference in conditions when the pictures were taken factors largely into this. Although the images are normalized before calculations are made, different lighting conditions cause different reflections on skin and eyeglasses. Secondly, although not as much of a problem, the images in the undergraduate set are not aligned as well as those in the class set, and our algorithm is very sensitive to alignment. Finally, and even less important, our class set is not necessarily representative of the race and gender proportions of the undergraduate set, which influences the eigenfaces computed.
Too few eigenfaces did not span a large enough hyperplane, and too many images were projected close to faces they did not resemble. Too many eigenfaces may have exaggerated minor details by adding weight to the less important eigenfaces with fewer face-like qualities. I found that 12 or 15 eigenfaces worked about as well as it was going to get (and on other images, 12 eigenfaces gave significantly better results than a smaller number).
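One way to probe this trade-off, reusing the illustrative build_face_space and project sketches above (train_images, test_images, and the label arrays are placeholders for the class data):

```python
def recognize(image, train_images, train_labels, mean_face, eigenfaces):
    # Nearest-neighbor match in coefficient space.
    coeffs = project(image, mean_face, eigenfaces)
    train_coeffs = (train_images - mean_face) @ eigenfaces.T
    distances = np.linalg.norm(train_coeffs - coeffs, axis=1)
    return train_labels[np.argmin(distances)]

# Sweep the number of eigenfaces to see where recognition levels off.
for k in (4, 8, 12, 15, 20):
    mean_face, eigenfaces = build_face_space(train_images, k)
    hits = sum(recognize(img, train_images, train_labels,
                         mean_face, eigenfaces) == label
               for img, label in zip(test_images, test_labels))
    print(k, "eigenfaces:", hits, "correct")
```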
The five students who looked like everyone else (or the five students that everyone else looked like).
A min_scale of 0.2, max_scale of 0.3, and step of 0.01 gave good results without taking too much time. Too large a range took too long, and too small a range or too large a step caused the program to miss the faces.
All of the cropped faces looked the same as the pre-cropped images when min_scale was 0.1, max_scale was 0.3, and the step was 0.01, but this took too long to compute. All but one image were correct when the step was changed to 0.5, and four were incorrect when the step was 0.7. The results looked reasonably good with a min_scale of 0.2, a max_scale of 0.3, and a step of 0.01, and this was quick to compute.
None of the images I tested were incorrectly cropped because the min_scale was too small, but the runs took so long that I did not try smaller scales (lowering only the min_scale while keeping the max_scale and the step the same made the program scan the image many more times, even though small scales are quicker to scan). It is possible that at too small a scale all of the detail would be lost from the image, and it might end up with a small MSE and be considered a face.
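The scale sweep itself looks roughly like the sketch below (again illustrative: it reuses reconstruction_mse from above, scores raw pixel windows, and leaves out the normalization the real program performs before comparing):

```python
from scipy.ndimage import zoom

def scan_scales(image, face_shape, min_scale, max_scale, step,
                mean_face, eigenfaces):
    # Slide a face-sized window over the image at every scale in
    # [min_scale, max_scale]; a lower reconstruction MSE means a more
    # face-like window. Smaller scales shrink the image, so they are
    # quicker to scan, but every extra scale adds another full pass.
    face_h, face_w = face_shape
    candidates = []
    scale = min_scale
    while scale <= max_scale + 1e-9:
        resized = zoom(image, scale)
        for y in range(resized.shape[0] - face_h + 1):
            for x in range(resized.shape[1] - face_w + 1):
                window = resized[y:y + face_h, x:x + face_w].ravel()
                mse = reconstruction_mse(window, mean_face, eigenfaces)
                candidates.append((mse, scale, y, x))
        scale += step
    return sorted(candidates)  # most face-like windows first
```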
Ten cropped images.
The program works as well as I could have hoped on the image of my family. My older brother's head was turned and therefore did not appear face-like to the algorithm. The image with Paul Erdos and other mathematicians did not turn out as well: four of the ten faces I had hoped for were found, and the remaining six "faces" were detected throughout the image, mostly containing horizontal edges which might correspond to face-like features.
I used a min_scale of 0.5, max_scale of 0.8, and step of 0.01 for both images. These values were estimated beforehand.
My older brother's face was missed because it was turned. The program could be trained on faces pointing in different directions, but this would probably add more error in other cases.
Marked family photo and marked image including Paul Erdos, the famous mathematician.
I implemented image warping by simply traveling a given distance in the direction between the two images.
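In face-space terms (my reading of "a given distance in the direction between the two images", using the illustrative helpers above), this is linear interpolation of the two coefficient vectors followed by reconstruction:

```python
def warp(image_a, image_b, t, mean_face, eigenfaces):
    # Travel a fraction t of the way from image A toward image B in
    # coefficient space (t = 0 gives A, t = 1 gives B), then rebuild
    # the pixels from the interpolated coefficients.
    a = project(image_a, mean_face, eigenfaces)
    b = project(image_b, mean_face, eigenfaces)
    return mean_face + eigenfaces.T @ (a + t * (b - a))
```

Here are some results.
Warped TAs and some other warped classmates.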
I also implemented verify face, which determines whether an image belongs to a given user, given a magic number, max_reconstructed_mse.
I wrote a function in main called verify faces which takes a userbase file, an eigenface file, a list of images, and a double max_reconstructed_mse. Then verify face is run for every image in the list against every user in the userbase with the given max_mse. Whenever an image matches a user, the user is printed under the image's name.
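Sketched with the illustrative helpers above, and assuming verification compares the image's face-space coefficients against the user's stored coefficients (the userbase here is a placeholder dict mapping user names to coefficient vectors; the real program reads them from the userbase file):

```python
def verify_face(image, user_coeffs, mean_face, eigenfaces, max_mse):
    # True if the image's face-space coefficients are within max_mse
    # of the user's stored coefficients.
    coeffs = project(image, mean_face, eigenfaces)
    return np.mean((coeffs - user_coeffs) ** 2) <= max_mse

def verify_faces(userbase, mean_face, eigenfaces, images, max_mse):
    # Check every image against every user and print each user a
    # given image matches under the image's name.
    for name, image in images.items():
        print(name)
        for user, user_coeffs in userbase.items():
            if verify_face(image, user_coeffs, mean_face, eigenfaces,
                           max_mse):
                print("  matched:", user)
```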
When verifying the cropped, smiling students in class_smiling_cropped, any MSE threshold I chose worked, even down to 1.0. When I tried it on the class_undergrad_cropped list, none were verified until I tried a threshold of 6000, and then only a few were matched. At 60000 only a few more were matched, but many more false positives occurred.
Since I had a function to run all of these verifications, it was easy to simply type in different values for the MSE threshold and see what worked best.
For class_smiling vs. class_nonsmiling, the false positive and false negative rates were both zero.
Actually, right now it seems like it might be better to allow a higher false positive rate. It depends on the situation, but if we were scanning surveillance video for a userbase of people we wanted to watch for, it would seem better to alert security incorrectly and have a person check the image and make the verification than to miss someone because the match wasn't perfect.
Then again, perhaps enough people look similar that security would be constantly checking images, and no advantage would be gained by having a program find and verify faces.