Projects can be done individually, or in small groups. Large groups
are also okay, but often require additional effort to organize and
divide the work. Groups combining people from different
fields are particularly encouraged. Feel free to use the class
email list cse527.washington.edu to brainstorm project ideas, try to
round up partners, etc.
Choices for the project include, but are not limited to:
- Literature review: Read 4-5 papers on a coherent topic, and
report on them.
- Implementation: Read 2-3 background papers, implement the
algorithms (or find existing software which implements the algorithms),
find some test data, and report on the results.
Each group should send me a paragraph describing their topic, the
initial papers, and the implementations and test data (if
applicable). Please try to do this by 10/29.
Early in finals week, hand in a paper (approximately 5-10 pages) describing
the project, and give a 20-30 minute presentation.
Some ideas:
- Look at last year's project list.
- One large and important area of work in
microarray analysis that I did not touch on in lecture is
microarray normalization and/or evaluation of normalizations.
Artifacts often appear in microarray data. How can we best get
rid of them? [If this area interests you, I have a specific
scheme in mind that I'd be interested to see compared to
published ones, but doubtless you can think of others.]
- Another is sensitive, reliable detection of genes that are
differentially expressed between two or more conditions, say
tumor vs normal. Lots of things have been tried: fold-change,
plain old t-test, ANOVA, non-parametric tests like Wilcoxon,
etc. Read about some of them, maybe compare them on some
published data. I think this would be especially interesting if
you or one of your partners knows enough biology to make some
informed guesses as to whether there's a good "story" behind
some of the genes turned up in a given data set.
- There is also interesting work on learning not just
individual genes, but pathways or regulatory networks from
microarray or other high-throughput data. I could suggest a
few papers in this area if you need some starting points.
- All clustering algorithms get increasingly impractical as the
data set size grows. How might you cope with this?
- Compare Gibbs and MEME: literature review or test on some data.
- Gibbs greedy vs. sampling. Is greedy better or worse?
- Try your favorite algorithm on your favorite organism.
- Sequence data from a few regions is now available from a
dozen+ vertebrates. What can you learn, especially about conserved
non-coding regions?
- Many others possible, too ...
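For the differential-expression idea above, here is a minimal sketch of three of the statistics mentioned (fold-change, a t-test statistic, and the Wilcoxon rank-sum statistic in its equivalent Mann-Whitney U form), computed for a single gene. The function names and the expression values are made up for illustration, and the t statistic shown is the Welch (unequal-variance) variant, which is only one of several reasonable choices. P-values are omitted; a real comparison would likely use a statistics library and correct for testing many genes at once.

```python
import math
from statistics import mean, variance


def log2_fold_change(a, b):
    """log2 of the ratio of mean expression in sample a to sample b."""
    return math.log2(mean(a) / mean(b))


def welch_t(a, b):
    """Welch's t statistic (does not assume equal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_sum_u(a, b):
    """Mann-Whitney U statistic for sample a; ties count as one half."""
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)


# Hypothetical expression levels for one gene (illustration only).
# Note the single large value in the tumor sample: it inflates the
# variance used by the t statistic but affects U only through its rank.
tumor = [8.1, 7.9, 8.4, 8.0, 12.5]
normal = [5.0, 5.2, 4.9, 5.1, 5.3]

lfc = log2_fold_change(tumor, normal)
t = welch_t(tumor, normal)
u = rank_sum_u(tumor, normal)
print(f"log2 fold change = {lfc:.2f}, Welch t = {t:.2f}, U = {u}")
```

Running this per gene across a published data set and comparing the resulting rankings is one simple way to see where the methods disagree, for example on genes with a single outlying sample.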
A Nobel prize is great; so is a result of the form "I tried x to
solve y, but it didn't work" (but give some thought to why it
failed and maybe suggest a next step to try).