Automatic Classification of Crystallography Images
by
Shen Pan
Protein crystallography labs have been performing an increasing number of
crystal-growth experiments. Better automation has enabled researchers
to prepare and run more experiments in a shorter time. However, the
problem of identifying which experiments are successful remains difficult.
In fact, most of this work is still being done manually by humans who have
to examine thousands of images for each experiment. Automating this task
is therefore an important goal. As part of the ACAPELLA project, we have
been developing a new image classification subsystem to greatly reduce the
number of images that require human viewing. This system must have
extremely low rates of false negatives (missed crystals), possibly at the
cost of an acceptable number of false positives. Our system consists of
three stages. During the first stage, the system preprocesses the images.
In the second stage, for each image, the system finds the solution drop
area in the image, divides the area into 20×20-pixel blocks, and computes
a set of numeric features for each block. Finally, during the third stage,
the system uses a support vector machine (SVM) to classify
the blocks. The input to our SVM classifier, which is computed for each
block, is a set of numeric features such as the mean and variance of
intensity values, textures, and perceptual groupings that capture
high-level structures like parallel and perpendicular lines. If an image
contains one block that is classified as containing a crystal, then the
system classifies the image as being positive. In our results so far, the
system reduces false positives by roughly 60% while keeping the false
negative rate under 10%. We expect that these results can be improved
further by applying multiple classifiers and using the combined output.
In addition, we are investigating ways to reduce the processing time.
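
The block-level pipeline described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ACAPELLA implementation: it assumes the drop area has already been cropped to a rectangular grayscale image, it computes only the mean and variance features (not textures or perceptual groupings), and the names block_features, classify_image, and the variance threshold standing in for the SVM are all hypothetical.

```python
BLOCK = 20  # block side in pixels, as stated in the abstract


def block_features(image, block=BLOCK):
    """Return (mean, variance) of intensity for each full block x block tile.

    `image` is a list of equal-length rows of grayscale values and is
    assumed to be the already-cropped drop area (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    features = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            pixels = [image[r][c]
                      for r in range(top, top + block)
                      for c in range(left, left + block)]
            mean = sum(pixels) / len(pixels)
            variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
            features.append((mean, variance))
    return features


def classify_image(image, classify_block):
    """An image is positive if any single block is classified as a crystal."""
    return any(classify_block(mean, variance)
               for mean, variance in block_features(image))


# Hypothetical stand-in for the trained SVM: flag high-variance blocks.
def toy_classifier(mean, variance):
    return variance > 100.0


# A 40x40 image: flat background with a checkerboard in the top-left block.
flat = [[0] * 40 for _ in range(40)]
textured = [[255 if (r < 20 and c < 20 and (r + c) % 2) else 0
             for c in range(40)] for r in range(40)]

print(classify_image(flat, toy_classifier))      # no block is flagged
print(classify_image(textured, toy_classifier))  # one flagged block suffices
```

Because a single positive block marks the whole image as positive, the rule favors a low false negative rate at the cost of more false positives, which matches the stated design goal.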
Advised by Richard Ladner, Eve Riskin, Linda Shapiro
CSE 403
Wednesday
April 13, 2005
3:30 - 4:20 pm