CSE 527, Au '05: Course Project, Due: Early in Finals Week (12/12-12/16)
Projects can be done individually, or in small groups. Large groups
are also okay, but often require additional effort to organize and
divide the work. Groups combining people from different
fields are particularly encouraged. Feel free to use the class
email list cse527.washington.edu
to brainstorm project ideas, try to round
up partners, etc.
Choices for the project include, but are not limited to:
- Literature review: Read 4-5 papers on a coherent topic, and
report on them.
- Implementation: Read 2-3 background papers, implement the
algorithms (or find existing software that implements the algorithms),
find some test data, and report on the results.
Each group should send me a paragraph saying who's in the group and
describing the topic, the initial papers, and the implementations
and test data (if applicable). Please try to do this by 11/7.
Early in finals week, hand in a paper (approximately 5-10 pages) describing
the project, and give a 20-30 minute presentation.
Students consistently impress me with creative, cogent project ideas,
so by all means feel free to come up with your own. Here are a
few of mine to get you started:
- Look at last year's project list.
- One large and important area of work in
microarray analysis that I did not touch on in lecture is
microarray normalization and/or evaluation of normalizations.
Artifacts often appear in microarray data. How can we best get
rid of them? [I have a specific scheme in mind that I'd be
interested to see compared to published ones, if this area
interests you, but doubtless you can think of others. I can
also point you to a paper or two.]
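If normalization interests you, quantile normalization is one widely used baseline worth knowing before comparing schemes. A minimal numpy sketch on toy data (this is just the generic technique, not the specific scheme I have in mind):

```python
import numpy as np

def quantile_normalize(X):
    """Force every column (array) of X to share the same empirical
    distribution: rank each value within its column, then replace it
    with the mean across columns of the values at that rank."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)  # rank of each entry within its column
    means = np.mean(np.sort(X, axis=0), axis=1)        # mean of the k-th smallest values across arrays
    return means[ranks]

# Toy example: 4 genes x 3 arrays with different intensity scales
X = np.array([[5., 4., 3.],
              [2., 1., 4.],
              [3., 4., 6.],
              [4., 2., 8.]])
Xn = quantile_normalize(X)
# After normalization, every column has identical sorted values.
```

A project could compare this against other published normalizations on real array data.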
- Recently, it has become feasible to do gene expression
measurements of one or a few hundred genes via quantitative
PCR---not as large-scale as microarrays, but generally considered
more accurate. It's increasingly common to see microarray papers
include a "validation" component using this technology to confirm
the array results. However, normalization is again an important
(and I think under-appreciated) issue. I have another paper in
mind, and an idea for an extension that would be both timely and
interesting to try. (Some knowledge of statistics would be
helpful for this one.)
- Another microarray issue is sensitive, reliable detection of
genes that are differentially expressed between two or more
conditions, say tumor vs normal. Lots of things have been tried:
fold-change, plain old t-test, ANOVA, non-parametric tests like
Wilcoxon, etc. Read about some of them, maybe compare them on
some published data. I think this would be especially interesting
if you or one of your partners knows enough biology to make some
informed guesses as to whether there's a good "story" behind some
of the genes turned up in a given data set.
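To make the comparison concrete, here is a small Python/scipy sketch running fold-change, a plain t-test, and a Wilcoxon rank-sum test (scipy exposes the two-independent-sample form as `mannwhitneyu`) on made-up log-expression values for a single gene; the numbers are purely illustrative:

```python
import numpy as np
from scipy import stats

# Hypothetical log-expression values for one gene, tumor vs. normal
tumor  = np.array([2.1, 1.8, 2.6, 1.9, 2.4, 2.2, 1.7, 2.8])
normal = np.array([1.0, 0.7, 1.3, 0.9, 1.1, 0.6, 1.2, 0.8])

fold_change = tumor.mean() - normal.mean()       # difference on the log scale
t_stat, t_p = stats.ttest_ind(tumor, normal)     # plain two-sample t-test
w_stat, w_p = stats.mannwhitneyu(tumor, normal)  # non-parametric rank test

print(f"log fold-change: {fold_change:.2f}")
print(f"t-test p = {t_p:.3g}, rank-sum p = {w_p:.3g}")
```

On real data the interesting cases are the genes where these methods disagree.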
- There is also interesting work on learning not just
individual genes, but pathways or regulatory networks from
microarray or other high-throughput data. I could suggest a
few papers in this area if you need some starting points.
- All clustering algorithms get increasingly impractical as the
data set size grows. How might you cope with this?
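One common coping tactic is to cluster only a random subsample and then assign the remaining points to the nearest resulting centroid, which a project could evaluate against clustering the full data. A rough numpy sketch of the idea (a starting point, not a recommendation):

```python
import numpy as np

def subsample_kmeans(X, k, sample_size=1000, iters=20, seed=0):
    """Run plain k-means on a random subsample of X, then label the
    full data set with a single nearest-centroid pass."""
    rng = np.random.default_rng(seed)
    sub = X[rng.choice(len(X), size=min(sample_size, len(X)), replace=False)]
    centroids = sub[rng.choice(len(sub), size=k, replace=False)]
    for _ in range(iters):
        # assign subsample points to nearest centroid, then recompute means
        d = ((sub[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = sub[labels == j].mean(axis=0)
    # labeling the full data set costs only one distance computation per point
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), centroids

# Toy usage: 2,000 points in two well-separated 2-D blobs (synthetic data)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (1000, 2)), rng.normal(10, 1, (1000, 2))])
labels, centroids = subsample_kmeans(X, k=2)
```

How much cluster quality this sacrifices, and for which algorithms, is exactly the kind of question a project could answer.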
- Compare Gibbs and MEME: a literature review and/or a test on some data.
- Gibbs motif finding, greedy vs. sampling updates: is greedy better or worse?
- Evaluating significance of scores from, say, matches to an HMM
can be very slow, involving running it on thousands of random
sequences. I think we can do better...
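For context, the slow baseline looks something like the sketch below, with a toy stand-in for a real HMM log-odds score (here, just counting CG dinucleotides); a project could aim to replace the brute-force loop with something analytical or adaptive:

```python
import random

def empirical_pvalue(score_fn, observed_score, alphabet, length, n=1000, seed=0):
    """Naive Monte-Carlo significance: score n random sequences and
    report the fraction meeting or beating the observed score. This is
    the slow baseline a cleverer method would need to improve on."""
    rng = random.Random(seed)
    hits = sum(
        score_fn("".join(rng.choice(alphabet) for _ in range(length))) >= observed_score
        for _ in range(n)
    )
    return (hits + 1) / (n + 1)  # add-one correction avoids reporting p = 0

# Hypothetical stand-in for an HMM score: number of CG dinucleotides.
score = lambda s: sum(s[i:i+2] == "CG" for i in range(len(s) - 1))
p = empirical_pvalue(score, observed_score=15, alphabet="ACGT", length=100)
```

Note that resolving very small p-values this way requires very many random sequences, which is precisely the cost we would like to avoid.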
- Try your favorite algorithm on your favorite organism.
- Sequence data from a few regions is now available from a
dozen+ vertebrates. What can you learn, especially about conserved
non-coding regions?
- Many others possible, too ...
A Nobel prize is great; so is a result of the form "I tried x to
solve y, but it didn't work" (but give some thought to why it
failed, and maybe suggest a next step to try).
Again, please email me a paragraph by 11/7 saying who's in your
group & outlining what you want to do.