Will Johnson
CSE 455, Winter 09
Project 3
March 12, 2009

Eigenfaces for Recognition

Testing recognition with cropped class images

I found ten 25 x 25 eigenfaces based on the 24 pictures of non-smiling faces by running the following:
main --eigenfaces 10 25 25 nonsmiling_cropped\list.txt eig.face
This produced the following faces:

[Images: the average face, followed by eigenfaces 0 through 9.]
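For reference, here's a minimal numpy sketch of the computation I believe --eigenfaces performs (the function name and array layout are mine; the small-matrix trick is the one from the extra credit discussed below):

import numpy as np

def eigenfaces(images, k):
    # images: (n, h*w) array, one flattened face per row
    mean = images.mean(axis=0)
    A = (images - mean).T                # columns are mean-subtracted faces
    S = A.T @ A                          # n x n instead of (h*w) x (h*w)
    vals, vecs = np.linalg.eigh(S)       # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]   # indices of the k largest
    U = A @ vecs[:, order]               # map back to pixel space
    U /= np.linalg.norm(U, axis=0)       # normalize each eigenface
    return mean, U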
Then I computed a userbase by running:
main --constructuserbase eig.face nonsmiling_cropped\list.txt nosmile.user
Next, I had the program recognize the cropped, smiling students. I used this command:
for %A in (smiling_cropped/*.tga) do (main --recognizeface smiling_cropped/%A nosmile.user eig.face 1) >> yam.txt

By examining the output, I observed that 14 people were correctly matched to their non-smiling selves. In spite of what the assignment says, 14 matches is 14/24 = 58.3% accuracy.

As instructed by the writeup, I varied the number of eigenfaces used, trying every odd number from 1 to 23; I assume this is what "granularity 2" means. Incidentally, I didn't try more than 23 eigenfaces, because there are only 23 meaningful ones. It's relatively easy to show that the covariance matrix for 24 images has rank at most 23, so at most 23 of its eigenvectors have non-zero eigenvalues. Any vector perpendicular to those is also an eigenvector (with eigenvalue zero), and there are plenty of vectors perpendicular to the 23-dimensional space spanned by the first 23 eigenvectors, so it's meaningless to discuss the 24th or 25th eigenvector, which could be just about anything.

To see that the covariance matrix has rank at most 23, note that the matrix A described in the first extra credit item (which I implemented; there's little to say about it beyond that it made things much faster -- I could compute 200 x 200 eigenfaces in a matter of seconds) has 24 columns, one per image, and that those columns sum to zero by the definition of x_mean. The column space of A therefore has dimension at most 23, so A has rank at most 23, and hence AA^T has rank at most 23 as well.
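Written out (with x_i the i-th image and \bar{x} = x_mean), the argument is just:

\sum_{i=1}^{24} (x_i - \bar{x}) \;=\; \Big(\sum_{i=1}^{24} x_i\Big) - 24\,\bar{x} \;=\; 0,

so the 24 columns of A are linearly dependent, giving \operatorname{rank}(A) \le 23 and \operatorname{rank}(AA^{\mathsf T}) = \operatorname{rank}(A) \le 23.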

At any rate, I used this script to test each possible number of eigenfaces. In other words, I ran

script 1
script 3
script 5
...
script 23

carefully analyzing the output after each step. This produced output such as this sample file, from the 23-eigenface run. The line
Face 'smiling_cropped/02.tga' recognized as being closest too:
0: nonsmiling_cropped\20; MSE: 19913.9
indicates that the 2nd smiling image was seen as closest to the 20th nonsmiling image, and the MSE between the two was 19913.9.
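To tally the correct matches across a whole results file, a small parser suffices. This is a sketch of my own; the regular expressions assume exactly the line format shown above:

import re

def count_correct(path):
    # Count queries whose rank-0 result carries the same image number.
    correct = total = 0
    query = None
    for line in open(path):
        m = re.search(r"'smiling_cropped/(\d+)\.tga'", line)
        if m:
            query, total = m.group(1), total + 1
            continue
        m = re.match(r"0: nonsmiling_cropped\\(\d+);", line)
        if m and query is not None:
            correct += (m.group(1) == query)
            query = None
    return correct, total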

I put the results in this Excel spreadsheet and plotted them:

It's clear from this plot that increasing the number of eigenfaces improves recognition, which is generally what we'd expect. Accuracy keeps creeping upward all the way to 22 eigenfaces, but the curve essentially levels off at around 11 to 13.

Using more eigenfaces increases the time needed to recognize a face, for several reasons. First, just to compute the coefficients of the query image, the image must be dotted with each of the eigenfaces, which takes time proportional to the number of eigenfaces. Second, comparing the query to each image in the userbase means computing a distance between vectors whose length is the number of eigenfaces. So the time to process a query should be proportional to the number of eigenfaces used. Using more eigenfaces also increases the time needed to compute the initial database of users and eigenfaces. Since the graph levels off at around 11 to 13 eigenfaces, perhaps the best number to use is 12. In real life, the best choice would depend on the desired accuracy.
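Both linear costs are visible in a sketch of the query path (the names are mine; this assumes the userbase stores each user's coefficient vector):

import numpy as np

def recognize(query, mean, U, user_coeffs):
    # U: (pixels, k) eigenfaces; user_coeffs: (users, k) userbase
    v = U.T @ (query - mean)                   # k dot products with the image
    d2 = ((user_coeffs - v) ** 2).sum(axis=1)  # distances in k dimensions
    return int(np.argmin(d2))                  # index of the best match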

As expected, there were several errors in the recognition results, several of which persisted even with 23 eigenfaces. For example, the 2nd smiling image was consistently identified with the 20th nonsmiling image:

[Images: smiling_cropped\02.tga; its match, nonsmiling_cropped\20.tga; and the desired match, nonsmiling_cropped\02.tga.]

When I checked what the top 10 images were for smiling_cropped\02.tga, this was the result:

Face 'smiling_cropped/02.tga' recognized as being closest too:
0: nonsmiling_cropped\20; MSE: 19913.9
1: nonsmiling_cropped\13; MSE: 39451.3
2: nonsmiling_cropped\03; MSE: 40071.3
3: nonsmiling_cropped\08; MSE: 40482.2
4: nonsmiling_cropped\09; MSE: 47792.3
5: nonsmiling_cropped\14; MSE: 50933.9
6: nonsmiling_cropped\12; MSE: 53313.5
7: nonsmiling_cropped\17; MSE: 54884.5
8: nonsmiling_cropped\02; MSE: 57241.4
9: nonsmiling_cropped\07; MSE: 59047.3
So the search did eventually reach the correct match (nonsmiling_cropped\02), but it was rather far down the list. If one blurs one's eyes, the two matched images do look fairly similar, and the 2nd smiling image looks different because it was cropped differently and the person's teeth are visible. So this match is marginally reasonable.

Another image which was consistently misidentified, even with 23 eigenfaces, was the 14th smiling image, which was generally identified with the 16th nonsmiling image:

[Images: smiling_cropped\14.tga; its match, nonsmiling_cropped\16.tga; and the desired match, nonsmiling_cropped\14.tga.]

Again, I checked to see what the best matches were. This was the result:

Face 'smiling_cropped/14.tga' recognized as being closest too:
0: nonsmiling_cropped\16; MSE: 18091.4
1: nonsmiling_cropped\18; MSE: 19608.8
2: nonsmiling_cropped\19; MSE: 27616.1
3: nonsmiling_cropped\22; MSE: 27700.8
4: nonsmiling_cropped\12; MSE: 31488.5
5: nonsmiling_cropped\09; MSE: 31793.4
6: nonsmiling_cropped\04; MSE: 33314.9
7: nonsmiling_cropped\11; MSE: 33811.4
8: nonsmiling_cropped\02; MSE: 34562.6
9: nonsmiling_cropped\01; MSE: 34874.8
In this case, the desired image was not even among the top 10. This is likely because the two images of person 14 are rather different: the box is cropped closer to the eyebrows in the smiling picture, and the lips are spread wider. Since our algorithm has no conception of spatial proximity between the pixels in the image, it is not robust to minor shifts, and therefore probably can't detect that these two faces have anything in common. (This could be tested by permuting the pixels of all the images by some fixed pattern, showing them to people, and seeing whether anyone could connect the smiling and non-smiling pictures of person 14.)
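The "no conception of spatial proximity" point can be made concrete: scrambling every image by the same fixed pattern leaves every SSD, and hence every ranking, unchanged. A quick numpy check:

import numpy as np

rng = np.random.default_rng(0)
a, b = rng.random(625), rng.random(625)  # two 25 x 25 images, flattened
perm = rng.permutation(625)              # one fixed scrambling pattern
# SSD is a plain sum over pixels, so a fixed permutation changes nothing:
assert np.isclose(((a - b) ** 2).sum(),
                  ((a[perm] - b[perm]) ** 2).sum())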

Another interesting point in these data is that, unlike the previous example, the two closest matches to smiling_cropped\14.tga are approximately the same distance away. This is exactly the kind of ambiguous match that would be weeded out by using a "ratio of SSD" rather than a pure SSD match, as we discussed in the unit on feature-matching.
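A sketch of such a ratio test (the 0.7 cutoff is an arbitrary choice of mine):

def ambiguous(mses, max_ratio=0.7):
    # Flag a match when the runner-up is nearly as close as the winner.
    best, second = sorted(mses)[:2]
    return best / second > max_ratio

With the numbers above, smiling image 14's ratio is 18091.4 / 19608.8, about 0.92, so it would be flagged as ambiguous, while smiling image 02's ratio is 19913.9 / 39451.3, about 0.50 -- a confident (though, as it happens, still wrong) match.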

The likely cause of these "errors" is that the algorithm is doing exactly what it's supposed to do. It's clear from the mathematics of the recognition algorithm that it finds the image in the database closest to the query image in the MSE sense (i.e., the SSD distance, or the L2 norm); this follows from the fact that the projection onto the "v coefficients" is orthogonal. When all 23 eigenfaces are used, the process simply projects the query image onto the (affine) subspace generated by the original faces and then finds the closest face within that subspace. So the algorithm is doing nothing but finding the user face closest to the query, measuring distances in the SSD/MSE/L2 norm. Therefore, no amount of increasing the number of eigenfaces will make smiling_cropped\14.tga match nonsmiling_cropped\14.tga -- the 16th nonsmiling image is simply closer, and nothing can be done about that. In this mathematical sense, both of the mistakes shown above are "reasonable."
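To spell out the claim (with U the matrix whose orthonormal columns are the 23 eigenfaces, P = UU^T the projection, and v_x = U^T(x - \bar{x}) the coefficients of image x):

\|v_q - v_i\|^2 \;=\; \|U^{\mathsf T}(q - x_i)\|^2 \;=\; \|P(q - x_i)\|^2,

and since each database face x_i satisfies P(x_i - \bar{x}) = x_i - \bar{x}, minimizing over i is exactly finding the face nearest to the projected query.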

Cropping and finding faces

Using 10 eigenfaces from the previous section (those derived from the nonsmiling images), I used the program to crop elf.tga:

main --findface group\elf.tga  eig.face 0.45 0.55 0.01 crop 1 cropped_elf.tga
This was quite successful:


[Images: the original elf.tga and the cropped result.]

Next, I used the same set of eigenfaces to crop a picture of myself:

main --findface myself.tga eig.face 0.4 0.55 0.01 crop 1 cropped_myself.tga
Again, this was pretty successful (though I should note that when I tried scales between 0.45 and 0.55 instead, it found my neck, possibly because my head was too big for such scales to work):


[Images: the original myself.tga and the cropped result.]

Next, I tried to find the 3 faces in the file group1.tga:

main --findface group1.tga eig.face 0.7 0.8 0.01 mark 3 marked_group_1.tga

The best scale range I could find was 0.7 to 0.8, and even that found only one face:

The cause of this error is almost certainly flaws in the algorithm, which really has no rhyme or reason to it, especially with the hacks suggested in the write-up, such as dividing by the variance of the image and multiplying by the distance from the average face. It turns out that the variance of the selected region above the head of the man on the left is higher than the variance on anyone's face. Also, that region is actually closer to the subspace generated by the faces: it returns a smaller error value from isFace than the correct location on the man's real face does. This is likely a problem with the subspace generated by the nonsmiling faces. Those images were taken under uniform lighting conditions, so we shouldn't expect the subspace they generate to be invariant under many illumination transformations. There are countless other reasons why this algorithm isn't likely to succeed as it stands; the point is that there's really nothing surprising about errors of this sort.
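For concreteness, here is my reading of the hacked-up detection score (a sketch; the exact combination in the skeleton code may differ):

import numpy as np

def face_error(window, mean, U):
    # Score a candidate window: reconstruction error, times distance
    # from the average face, divided by the window's variance, per the
    # hacks described in the write-up.  Lower = more face-like.
    d = window - mean
    recon = U @ (U.T @ d)                 # projection onto face space
    mse = ((d - recon) ** 2).mean()       # distance from face space
    dist_mean = (d ** 2).mean()           # distance from the average face
    return mse * dist_mean / max(window.var(), 1e-9)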

Finally, I tried to find faces in this photograph with many people:

main --findface UCF.tga eig.face 0.7 0.9 0.04 mark 13 marked_UCF.tga
The result:

In this case, none of the faces was found except one: my own. (This is likely due to the fact that I was in the database of nonsmiling_cropped images.) While the boxes might appear to be spaced approximately near people's faces, notice that almost every box sits on the boundary between two regions of differing contrast, such as someone's face and the background, or my coat and pants. This clearly shows the defects of the variance hack suggested in the assignment write-up. The variance technique was more successful with the images taken indoors in front of white walls, since in those cases people's faces tended not to contrast sharply with their surroundings. Here, however, where everyone is standing in front of dark shrubbery, the algorithm was pulled away from the proper locations toward the neighboring areas of high contrast. This example shows the uselessness of the algorithm we were given for general-purpose face detection.

It's possible that using a larger collection of initial faces would have given better results. One problem with such a small collection of images is that the result is not very resilient to small changes in the spacing of facial features. For example, consider the average face:

While this image is substantially blurred, some of the features, such as the lips and nose, are still concentrated in very specific locations. Consequently, if a test image has the lips or nose positioned slightly differently, this creates a large disturbance in the difference image. It seems like it would have been better for the algorithm to apply some Gaussian blurring before processing the image, since this would make it slightly more resilient to distortions and slight displacements.
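As a sketch of this suggestion (the sigma value is a guess):

from scipy.ndimage import gaussian_filter

def preblur(image, sigma=1.5):
    # Smear sharply localized features (lips, nose) so that small
    # displacements in the test image cost less in the difference image.
    return gaussian_filter(image, sigma=sigma)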

Both group1.tga and the second group image give good demonstrations of false positives and false negatives, and I believe I've explained in detail why I think these errors tended to occur.

So apparently, the assignment has just been changed: now I'm supposed to show one of the smiling group pictures, one of the nonsmiling group pictures, IMG_0002, and a group photo. Well, I did the group photo already, and IMG_0002 is at the end. Here are IMG_0003.tga and IMG_0006.tga, which, by a fortunate coincidence, I had already done:

Both of these were done with scales ranging from 0.45 to 0.55, with a step of 0.01. For IMG_0002.tga, I used scales from 1.0 to 1.5, stepping by 0.01.

Verify Face

I calculated six eigenfaces from the nonsmiling_cropped images. Of course, they're just the first 6 of the 10 eigenfaces shown above. I also created a userbase for the nonsmiling images, using these 6 eigenfaces, in the manner explained above.

Now I knew from the first section, and the results of the script described therein, that typical values of the MSE between an image and its correct match were around 15000, though some were much lower. Therefore, I first tested --verifyface with a threshold of 15000. For example, to test whether smiling_cropped\01.tga could be verified as nonsmiling_cropped\01.tga, I used this command:

main --verifyface smiling_cropped\01.tga nosmile.user nonsmiling_cropped\01 eig.face 15000
This produced the output:
MSE: 9395.38
Image 'smiling_cropped\01.tga' is a picture of nonsmiling_cropped\01's face.
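Judging by that output, --verifyface makes a simple thresholded decision. A sketch of my reading of it, with coefficient vectors as numpy arrays:

def verify(query_coeffs, user_coeffs, threshold):
    # Accept iff the MSE between the query's and the claimed user's
    # coefficient vectors falls under the threshold.
    mse = ((query_coeffs - user_coeffs) ** 2).mean()
    return mse < threshold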
To test this on all the images, I used this command:
for %A in (01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24) do main --verifyface smiling_cropped\%A.tga
nosmile.user nonsmiling_cropped\%A eig.face 15000 >> verify_results.txt
After examining the results in verify_results.txt, I noted that 18 of the images were not verified, and only 6 were correctly verified. This is pretty unacceptable for most applications, so I raised the threshold to 20000:
for %A in (01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24) do main --verifyface smiling_cropped\%A.tga
nosmile.user nonsmiling_cropped\%A eig.face 20000 >> verify_results.txt
However, with this threshold there were still only six matches, as shown in verify_results20000.txt.

At this point, I remembered that I was now using a different number of eigenvectors, which invalidates the idea of reusing numbers similar to those from the first part of the project. I examined the numbers in verify_results.txt and verify_results20000.txt and observed that the MSE values were actually on the order of 10000 to 60000. So I chose a tolerance of 40000, but this verified only 14 of the images, which is still rather unacceptable, so I ended up choosing 50000 for the cutoff:

for %A in (01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24) do main --verifyface smiling_cropped\%A.tga
nosmile.user nonsmiling_cropped\%A eig.face 50000 >> verify_results_final.txt
In the final results, seven images were not verified, but the other 17 were correctly verified.

To test the cutoff value of 50000, I tried to verify images against incorrect matches. Specifically, I tried to match the Nth smiling image with the (N+1)st nonsmiling image; this is essentially random, as there is no reason the Nth person should bear any resemblance to the (N+1)st person. To do this, I wrote a script (a re-creation is sketched below). The results show that only three images were incorrectly verified; that is, there were only three false positives. This is pretty good, in my view.
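A hypothetical re-creation of that script (it generates the command lines rather than running them; whether my original wrapped person 24 around to person 01 is a detail I'm reconstructing here):

for n in range(1, 25):
    m = n % 24 + 1  # pair person n with person n+1, wrapping 24 -> 1
    print(f"main --verifyface smiling_cropped\\{n:02d}.tga "
          f"nosmile.user nonsmiling_cropped\\{m:02d} eig.face 50000")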

The small number of false positives, compared to the seven false negatives, suggested raising the threshold. I tried the same experiments with a tolerance of 60000; for these and further experiments, I changed the script to replace the 50000 with %1, the first command-line parameter, to make the script easier to use. It now looked like this.

This time, the results showed 6 false negatives and 4 false positives. While this may be more balanced, it's not clear that the total number of errors has improved. With 70000, there were 5 false negatives and 5 false positives. This is probably the place to stop: the total number of errors is clearly no longer decreasing, and the two types of errors are now evenly balanced.
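The bookkeeping behind these counts is simple enough to state precisely (pos_mses and neg_mses would be the MSE values from the positive and negative experiments, respectively):

def error_counts(pos_mses, neg_mses, threshold):
    # False negatives: genuine pairs rejected.
    # False positives: mismatched pairs accepted.
    fn = sum(m >= threshold for m in pos_mses)
    fp = sum(m < threshold for m in neg_mses)
    return fn, fp

# e.g., for t in (50000, 60000, 70000): print(t, error_counts(pos, neg, t))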

In summary, the best cutoff seems to be 70000, though 50000 and 60000 also work. For 70000, the positive experiment results are here and the negative results are here. This value balances the two types of errors, with an error rate of about 5/24 = 21% (20.83%, to be precise) for each type. This is somewhat decent, but a far cry from perfection. I arrived at 70000 by first finding a threshold with a reasonably small number of false positives, then raising it until the numbers of false positives and false negatives were equal. Determining the "best" threshold is inherently fuzzy, since we're trying to minimize two different quantities at once, so any such choice is somewhat subjective. IMHO, it's not worth the effort to pin down a precise "best" MSE cutoff, since the choice is so subjective that a precise value would be mostly meaningless. Besides, 70000 seems to work well enough, so we'll leave it at that.

For kicks, here's the marked-up version of group\IMG_0002.tga, which took several hours to run with my program:

I used scales ranging from 1.0 to 1.5, stepping by 0.01 (which is why it took so long).

It looks like I found four people, which is apparently the same as the sample solution. Unlike the sample solution, though, I didn't find any faces on the ceiling or the desks; mine tended to be around people's faces. So I think mine is superior to whatever generated the sample image at the very end of the project description.