Automatic Classification of Crystallography Images

by
Shen Pan

Protein crystallography labs are performing an increasing number of crystal growth experiments. Better automation has enabled researchers to prepare and run more experiments in less time. However, identifying which experiments are successful remains difficult. Most of this work is still done manually by people who have to examine thousands of images for each experiment, so automating the task is an important goal. As part of the ACAPELLA project, we have been developing a new image classification subsystem to greatly reduce the number of images that require human viewing. This system must have an extremely low rate of false negatives (missed crystals), possibly at the cost of an acceptable number of false positives. Our system consists of three stages. In the first stage, the system preprocesses the images. In the second stage, for each image, the system finds the solution drop area, divides that area into 20×20 pixel blocks, and computes a set of numeric features for each block. In the third stage, the system uses a support vector machine (SVM) to classify the blocks. The input to the SVM classifier, computed for each block, is a set of numeric features such as the mean and variance of intensity values, textures, and perceptual groupings that capture high-level structures like parallel and perpendicular lines. If an image contains at least one block that is classified as containing a crystal, the system classifies the image as positive. We have achieved significant results, reducing false positives by roughly 60% while keeping the false negative rate under 10%. We expect that these results can be improved substantially by applying multiple classifiers and combining their output. In addition, we are investigating ways to reduce the processing time.
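The block-level pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the ACAPELLA implementation. It assumes the image has already been preprocessed and cropped to the solution drop area, computes only two of the features mentioned (mean and variance of intensity), and replaces the trained SVM with a hypothetical threshold function, `toy_classifier`. All function names are illustrative assumptions.

```python
from statistics import mean, pvariance

BLOCK = 20  # block side in pixels, as stated in the abstract


def split_into_blocks(image, block=BLOCK):
    """Yield block x block sub-grids of a 2D list of intensities.

    Assumption: partial blocks at the right and bottom edges are dropped.
    """
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            yield [row[c:c + block] for row in image[r:r + block]]


def block_features(block):
    """Return (mean, variance) of the intensities in one block."""
    pixels = [p for row in block for p in row]
    return (mean(pixels), pvariance(pixels))


def classify_image(image, block_classifier):
    """An image is positive if at least one block is classified positive."""
    return any(block_classifier(block_features(b))
               for b in split_into_blocks(image))


def toy_classifier(features, variance_threshold=100.0):
    """Hypothetical stand-in for the trained SVM: flags high-variance blocks."""
    _, variance = features
    return variance > variance_threshold


# Usage: a flat 40x40 image, and a copy with one textured block.
flat = [[50] * 40 for _ in range(40)]
textured = [row[:] for row in flat]
for r in range(20, 40):
    for c in range(0, 20):
        textured[r][c] = 255 if (r + c) % 2 else 0

print(classify_image(flat, toy_classifier))
print(classify_image(textured, toy_classifier))
```

The any-block rule favors low false negative rates, since a single positive block is enough to send the image to a human viewer, at the cost of more false positives.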

Advised by Richard Ladner, Eve Riskin, Linda Shapiro

CSE 403
Wednesday
April 13, 2005
3:30 - 4:20 pm