Project 4: Face Recognition and Detection


Face Recognition

Face Recognition was implemented using a Principal Component Analysis approach. Each image in the given library was converted to grayscale and treated as a vector.

First, the average face of the face images provided was calculated:

Then, the "eigenfaces," which represent the face hyperplane, were obtained by computing the top K eigenvectors of the covariance matrix of the image set. Here are the top 10 eigenfaces ranked by eigenvalue, largest on left:
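For reference, here is a minimal sketch of how the average face above and these eigenfaces could be computed with NumPy. This is not the project's actual code; the variable names (images, K, mean_face, eigenfaces) and the use of an SVD instead of an explicit covariance matrix are my own choices.

    import numpy as np

    def compute_eigenfaces(images, K=10):
        # Stack each grayscale image as a row vector.
        X = np.array([img.astype(np.float64).ravel() for img in images])
        # Average face of the library.
        mean_face = X.mean(axis=0)
        # The rows of Vt are the eigenvectors of the covariance matrix of the
        # centered data, sorted by decreasing eigenvalue (singular value squared).
        U, S, Vt = np.linalg.svd(X - mean_face, full_matrices=False)
        eigenfaces = Vt[:K]
        return mean_face, eigenfaces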

Face recognition was then tested by taking an image of a face from one library and seeing if it could be correctly identified in another. Two libraries were used: a non-smiling library, from which the hyperplane was built, and a smiling library, whose faces were used as inputs. A face is recognized by projecting the given image onto the hyperplane and computing the mean-squared error between its projection and the projection of each library face; the library face with the lowest error is taken as the match. The following graph shows the relationship between the number of eigenfaces used to generate the hyperplane and the accuracy of the recognition:

The trends suggest that there is a limit to how many eigenfaces are useful for recognition. Accuracy seems to peak around 11, after which the number of eigenfaces used no longer matters.
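For concreteness, here is a rough sketch of the recognition step as described above: project the query face and every library face onto the eigenface hyperplane, then rank the library faces by mean-squared error between projection coefficients. The function and argument names are illustrative, not the project's actual code.

    import numpy as np

    def recognize(query, library, mean_face, eigenfaces):
        # Projection coefficients of an image onto the eigenface hyperplane.
        def project(img):
            return eigenfaces @ (img.astype(np.float64).ravel() - mean_face)
        q = project(query)
        # MSE between the query's coefficients and each library face's coefficients.
        errors = [np.mean((project(face) - q) ** 2) for face in library]
        # Library indices ranked from best to worst match.
        return np.argsort(errors)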

The recognition made a few recurring mistakes. It would consistently confuse the following faces:

 with

as well as

 with  

and others. The face below on the left was recognized correctly for 7, 9 and 11 eigenfaces, but then an increase in the number of eigenfaces actually hindered this recognition and from then on the face was recognized as the face on the right:

 with

In the first case, the correct match does not even make it into the top 10! This is most likely because this is the only face with closed eyes, while its smiling counterpart has open eyes. Other mismatches have the correct match very close to the top of the list, usually in the top 3. These mistakes seem unreasonable from a human perspective, but perhaps are logical otherwise. The algorithm seems to place tremendous emphasis on the orientation of the face, in particular the location of the main features: eyes, nose, mouth. If those features are displaced, the chances of correct recognition diminish.


Face Detection

Face detection was implemented by looking at every face-sized window in the input image, projecting it onto the face hyperplane, and checking whether it was close enough to the hyperplane (within a threshold) to be considered a face. If it was, it was added to a list of potential faces, assuming it did not overlap a previous, better contender. The top N contenders were then taken, where N is the input parameter (the number of faces to be found).
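A simplified sketch of that loop, assuming the image has already been rescaled so that faces are roughly window-sized. The window size, step, and overlap test here are my assumptions, not the project's exact parameters.

    import numpy as np

    def detect_faces(image, mean_face, eigenfaces, win_h, win_w, n_faces, step=2):
        candidates = []
        for y in range(0, image.shape[0] - win_h, step):
            for x in range(0, image.shape[1] - win_w, step):
                window = image[y:y + win_h, x:x + win_w].astype(np.float64).ravel()
                centered = window - mean_face
                coeffs = eigenfaces @ centered
                recon = eigenfaces.T @ coeffs
                # Reconstruction MSE = distance from the face hyperplane.
                mse = np.mean((centered - recon) ** 2)
                candidates.append((mse, y, x))
        candidates.sort()  # lowest MSE (closest to the hyperplane) first
        picked = []
        for mse, y, x in candidates:
            # Skip windows overlapping an already chosen, better contender.
            if all(abs(y - py) >= win_h or abs(x - px) >= win_w for _, py, px in picked):
                picked.append((mse, y, x))
            if len(picked) == n_faces:
                break
        return picked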

First, let's see how this ranks the pixels in the image as potential faces. Using the following image:

I can obtain the ranking of every pixel as a center of a potential face window. The ranking is just the inverse of the MSE normalized:

The brightest areas represent the locations with the highest scores, as the MSE there is lowest. As can be seen, the algorithm does a fairly good job in this case, highlighting the three face centers as the best areas.
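One plausible reading of that scoring, given an array of per-pixel window MSEs (here called mse_map, a name of my own): invert and normalize so that low error shows up bright.

    import numpy as np

    def score_image(mse_map):
        inv = 1.0 / (mse_map + 1e-12)  # invert so that low error -> high score
        return inv / inv.max()         # normalize to [0, 1] for display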

Using the following image as input and selecting one face to crop:

 

I was able to obtain the following output at a scale of .43:

Here I must admit there is a bug in my program: I have to supply all the scale inputs myself, as the automatic scale-stepping part of the program currently only outputs the faces found at the last step. But it can all be fixed with a little script, sketched below. The only real problem is when an image has faces at different scales, as we will see later. Besides that, being able to find a face that is not in the library (and smiling) is still quite an achievement; perhaps this really works...
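The "little script" might look roughly like this: run the single-scale detector at several scales and keep the overall best hits. This is only a sketch; detect_faces is the detection loop sketched above, and the scale list and resizing call (SciPy's zoom) are assumptions on my part.

    import numpy as np
    from scipy.ndimage import zoom

    def detect_multiscale(image, mean_face, eigenfaces, win_h, win_w, n_faces,
                          scales=(0.4, 0.5, 0.7, 1.0, 1.2)):
        hits = []
        for s in scales:
            scaled = zoom(image, s)  # resize so different face sizes fit the window
            for mse, y, x in detect_faces(scaled, mean_face, eigenfaces,
                                          win_h, win_w, n_faces):
                # Record the hit in original-image coordinates along with its score.
                hits.append((mse, int(y / s), int(x / s), s))
        hits.sort()
        return hits[:n_faces]  # best matches regardless of scale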

Marking the faces is much more interesting, however, as there are many more results to be concerned with. Here's an image of our class with 27 faces found, at a scale of 1.1:

As can be seen, six faces were not detected (or rather they were thought to be elsewhere), which is actually very consistent with the results obtained in the recognition part of the experiment: six or seven faces were always mismatched to something else. False positives are definitely present. Of particular interest is the algorithm's effort to interpret a crotch as a face, which can be better seen in the following examples; the scales are .48 and 1 respectively:

Note that the crotch is detected only in dark khaki. Well, knowing the algorithm's weakness, we can cover the distraction and proceed to obtain very reasonable results:

Actually, this problem is related to the bug mentioned earlier. It turns out that the person on the right is best identified at a scale different from the other two people, so this problem would not occur if the scaling were allowed to step through and keep the best matches regardless of scale.

Another problem with the current algorithm is that it is easily fooled by low texture areas. Using the following image as input at a scale of .57:

The MSE scores are as follows:

The face centers can easily be seen as glowing dots; however, there are numerous other large areas of nearly identical intensity. These areas can sometimes fool the algorithm, as can be seen here:

A potential fix to this problem would be to look for "dots," i.e. small clusters of intensity, rather than large areas; a rough sketch of this idea follows the next example. This would definitely improve the accuracy of the current algorithm. The current method definitely requires some refining. Here's an image full of people:

And here is the result at a scale of 1.2:

Note how careful the algorithm is to avoid any potential face. Looking at the input image, it would seem hard NOT to find a face, yet, here we are. This is most likely due to the fact that each face has a different background behind it and is not as distinct as those found in earlier images. Here the background often changes color and texture behind a face, making it harder to distinguish.
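A hedged sketch of the "dots vs. large areas" fix mentioned above: keep only score pixels that stand out from their local neighborhood, which suppresses broad low-texture regions while preserving the small bright blobs at face centers. The neighborhood size and margin here are guesses.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def keep_dots(score_map, neighborhood=31, margin=0.15):
        local_mean = uniform_filter(score_map, size=neighborhood)
        # A "dot" is a pixel noticeably brighter than its surroundings; broad
        # uniform areas score close to their local mean and are zeroed out.
        return np.where(score_map - local_mean > margin, score_map, 0.0)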

Vehicle Finding

Relating this project to my research on vehicle tracking, I tried this approach on some surveillance camera footage. The image library was very small, only 3 vehicles:

The average car obtained was:

While the "eigencars" were:

All feature a distinct bumper line and two headlight regions. The input used was a typical grayscale surveillance camera shot at 320 x 240 resolution:

Note that none of the cars in the image are present in the library. The results seem very promising, at a scale of .5 and 3 cars to mark the algorithm does its job:

The vehicles whose "faces" are visible are detected very well. The advantage of using this method in this context is that the viewpoint is unlikely to change from the one in the library (assuming you learned on the spot), which provides much better recognition. Also, the features of vehicles seem less subtle than those of a human face.

The problem with this approach, of course, is that it takes about 20 seconds to identify the vehicles, which is inadmissible in any real-time application. One way to reduce the computation time would be to watch and learn the "active regions": cars always follow a similar trajectory to the stop line, so if only pixels near that trajectory are monitored, there are far fewer windows to consider.
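A sketch of how such an active-region mask could be learned, assuming a sequence of grayscale frames is available; the threshold and the frame-differencing approach are my assumptions. Detection windows whose centers fall outside the mask could then simply be skipped.

    import numpy as np

    def learn_active_region(frames, threshold=15):
        # Accumulate pixels that change between consecutive frames; passing cars
        # light up their trajectory toward the stop line.
        motion = np.zeros(frames[0].shape, dtype=np.float64)
        for prev, cur in zip(frames[:-1], frames[1:]):
            motion += np.abs(cur.astype(np.float64) - prev.astype(np.float64)) > threshold
        return motion > 0  # boolean mask of pixels worth scanning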

Extra Credit

An extra credit method called verifyFace was implemented in the code but never tested. Cars were examined and found to give better results than people.

Conclusion

Face detection and recognition is a difficult problem indeed. This algorithm seems to fare well sometimes, and would do much better with a few extra helpers implemented, such as the cluster detector and a color-cue filter. However, changing texture areas as well as changing lighting conditions will still have a significant impact on this algorithm. Vehicle tracking with this algorithm would probably work better than tracking people due to the extra constraints, but the lengthy computation time is likely to weigh heavily against it compared to something like feature tracking or background subtraction. All that said, this was still fun!