CSE 527, Au '04: Course Project, Due: Early in Finals Week (12/13-12/17)

University of Washington Computer Science & Engineering

Projects can be done individually or in small groups. Larger groups are also okay, but they often require additional effort to organize and divide the work. Groups combining people from different fields are particularly encouraged. Feel free to use the class email list cse527.washington.edu to brainstorm project ideas, try to round up partners, etc. Choices for the project include, but are not limited to:

Literature review: Read 4-5 papers on a coherent topic, and report on them.
Implementation: Read 2-3 background papers, implement the algorithms (or find existing software which implements the algorithms), find some test data, and report on the results.
Each group should send me a paragraph describing their topic, the initial papers, and the implementations and test data (if applicable). Please try to do this by 10/29.
Early in finals week, hand in a paper (approximately 5-10 pages) describing the project, and give a 20-30 minute presentation.
Some ideas:

Look at last year's project list.
One large and important area of work in microarray analysis that I did not touch on in lecture is microarray normalization and/or evaluation of normalizations. Artifacts often appear in microarray data. How can we best get rid of them? [I have a specific scheme in mind that I'd be interested to see compared to published ones, if this area interests you, but doubtless you can think of others.]
Another is sensitive, reliable detection of genes that are differentially expressed between two or more conditions, say tumor vs normal. Lots of things have been tried: fold-change, plain old t-test, ANOVA, non-parametric tests like Wilcoxon, etc. Read about some of them, maybe compare them on some published data. I think this would be especially interesting if you or one of your partners knows enough biology to make some informed guesses as to whether there's a good "story" behind some of the genes turned up in a given data set.
There is also interesting work on learning not just individual genes, but pathways or regulatory networks from microarray or other high-throughput data. I could suggest a few papers in this area if you need some starting points.
All clustering algorithms get increasingly impractical as the data set size grows. How might you cope with this?
Compare Gibbs and MEME: literature review or test on some data
Gibbs greedy vs. sampling. Is greedy better or worse?
Try your favorite algorithm on your favorite organism
Sequence data from a few regions is now available from a dozen+ vertebrates. What can you learn, especially about conserved non-coding regions?
Many others possible, too ...
A Nobel prize is great; so is a result of the form "I tried x to solve y, but it didn't work" (but give some thought to why it failed and maybe suggest a next step to try).
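As a starting point for the differential-expression idea above, here is a minimal sketch of three of the statistics mentioned (fold-change, a t-test, and a Wilcoxon-style rank statistic) computed for a single gene. Everything specific in it is an assumption for illustration: the function names, the choice of Welch's unequal-variance form of the t statistic, and the toy expression values are all made up, and no p-values or multiple-testing corrections are computed.

```python
import math
from statistics import mean, variance


def log2_fold_change(a, b):
    """log2 of (mean of a / mean of b); assumes positive expression values."""
    return math.log2(mean(a) / mean(b))


def welch_t(a, b):
    """Welch's t statistic (two samples, variances not assumed equal)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_sum_u(a, b):
    """Mann-Whitney U for sample a: number of pairs (x in a, y in b)
    with x > y, counting ties as one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in a for y in b)


# Hypothetical toy data: expression of two genes in 4 tumor and 4 normal samples.
gene1_tumor, gene1_normal = [8.0, 9.0, 10.0, 9.0], [4.0, 5.0, 4.0, 5.0]
gene2_tumor, gene2_normal = [1.0, 1.0, 1.0, 21.0], [3.0, 3.0, 3.0, 3.0]

for name, t, n in [("gene1", gene1_tumor, gene1_normal),
                   ("gene2", gene2_tumor, gene2_normal)]:
    print(name, log2_fold_change(t, n), welch_t(t, n), rank_sum_u(t, n))
```

Both toy genes have the same fold-change, but the second owes it to a single outlying sample, so the t and rank statistics rate it very differently. Disagreements of this kind are one thing a comparison project could look at on real published data.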

Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA 98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to cse527-webmaster@cs.washington.edu]