CSE 527, Au '05: Course Project, Due: Early in Finals Week (12/12-12/16)
Projects can be done individually, or in small groups. Large groups
are also okay, but often require additional effort to organize and
divide the work. Groups combining people from different
fields are particularly encouraged. Feel free to use the class
email list cse527.washington.edu
to brainstorm project ideas, try to round
up partners, etc.
Choices for the project include, but are not limited to:
- Literature review: Read 4-5 papers on a coherent topic, and
report on them.
- Implementation: Read 2-3 background papers, implement the
algorithms (or find existing software that implements the algorithms),
find some test data, and report on the results.
Each group should send me a paragraph saying who's in the group and
describing the topic, the initial papers, and the implementations
and test data (if applicable). Please try to do this by 11/7.
Early in finals week, hand in a paper (approximately 5-10 pages) describing
the project, and give a 20-30 minute presentation.
Students consistently impress me with creative, cogent project ideas,
so by all means feel free to come up with your own. Here are a
few of mine to get you started:
- Look at last year's project list.
- One large and important area of work in
microarray analysis that I did not touch on in lecture is
microarray normalization and/or evaluation of normalizations.
Artifacts often appear in microarray data. How can we best get
rid of them? [I have a specific scheme in mind that I'd be
interested to see compared to published ones, if this area
interests you, but doubtless you can think of others. I can
also point you to a paper or two.]
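If normalization interests you, quantile normalization is one widely used baseline worth knowing before comparing schemes. A minimal numpy sketch on toy data (this is just the generic technique, not the specific scheme I have in mind):

```python
import numpy as np

def quantile_normalize(X):
    """Force every column (array) of X to share the same empirical
    distribution: rank each value within its column, then replace it
    with the mean across columns of the values at that rank."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)  # rank of each entry within its column
    means = np.mean(np.sort(X, axis=0), axis=1)        # mean of the k-th smallest values across arrays
    return means[ranks]

# Toy example: 4 genes x 3 arrays with different intensity scales
X = np.array([[5., 4., 3.],
              [2., 1., 4.],
              [3., 4., 6.],
              [4., 2., 8.]])
Xn = quantile_normalize(X)
# After normalization, every column has identical sorted values.
```

A project could compare this against other published normalizations on real array data.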
- Recently, it has become feasible to do gene expression
measurements of one or a few hundred genes via quantitative
PCR---not as large-scale as microarrays, but generally considered
more accurate. It's increasingly common to see microarray papers
include a "validation" component using this technology to confirm
the array results. However, normalization is again an important
(and I think under-appreciated) issue. I have another paper in
mind, and an idea for an extension that would be both timely and
interesting to try. (Some knowledge of statistics would be
helpful for this one.)
- Another microarray issue is sensitive, reliable detection of
genes that are differentially expressed between two or more
conditions, say tumor vs normal. Lots of things have been tried:
fold-change, plain old t-test, ANOVA, non-parametric tests like
Wilcoxon, etc. Read about some of them, maybe compare them on
some published data. I think this would be especially interesting
if you or one of your partners knows enough biology to make some
informed guesses as to whether there's a good "story" behind some
of the genes turned up in a given data set.
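To make the comparison concrete, here is a small Python/scipy sketch running fold-change, a plain t-test, and a Wilcoxon rank-sum test (scipy exposes the two-independent-sample form as `mannwhitneyu`) on made-up log-expression values for a single gene; the numbers are purely illustrative:

```python
import numpy as np
from scipy import stats

# Hypothetical log-expression values for one gene, tumor vs. normal
tumor  = np.array([2.1, 1.8, 2.6, 1.9, 2.4, 2.2, 1.7, 2.8])
normal = np.array([1.0, 0.7, 1.3, 0.9, 1.1, 0.6, 1.2, 0.8])

fold_change = tumor.mean() - normal.mean()       # difference on the log scale
t_stat, t_p = stats.ttest_ind(tumor, normal)     # plain two-sample t-test
w_stat, w_p = stats.mannwhitneyu(tumor, normal)  # non-parametric rank test

print(f"log fold-change: {fold_change:.2f}")
print(f"t-test p = {t_p:.3g}, rank-sum p = {w_p:.3g}")
```

On real data the interesting cases are the genes where these methods disagree.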
- There is also interesting work on learning not just
individual genes, but pathways or regulatory networks from
microarray or other high-throughput data. I could suggest a
few papers in this area if you need some starting points.
- All clustering algorithms get increasingly impractical as the
data set size grows. How might you cope with this?
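One common coping tactic is to cluster only a random subsample and then assign the remaining points to the nearest resulting centroid, which a project could evaluate against clustering the full data. A rough numpy sketch of the idea (a starting point, not a recommendation):

```python
import numpy as np

def subsample_kmeans(X, k, sample_size=1000, iters=20, seed=0):
    """Run plain k-means on a random subsample of X, then label the
    full data set with a single nearest-centroid pass."""
    rng = np.random.default_rng(seed)
    sub = X[rng.choice(len(X), size=min(sample_size, len(X)), replace=False)]
    centroids = sub[rng.choice(len(sub), size=k, replace=False)]
    for _ in range(iters):
        # assign subsample points to nearest centroid, then recompute means
        d = ((sub[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = sub[labels == j].mean(axis=0)
    # labeling the full data set costs only one distance computation per point
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), centroids

# Toy usage: 2,000 points in two well-separated 2-D blobs (synthetic data)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (1000, 2)), rng.normal(10, 1, (1000, 2))])
labels, centroids = subsample_kmeans(X, k=2)
```

How much cluster quality this sacrifices, and for which algorithms, is exactly the kind of question a project could answer.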
- Compare Gibbs and MEME: a literature review and/or a test on some data.
- Gibbs motif finding, greedy vs. sampling updates: is greedy better or worse?
- Evaluating significance of scores from, say, matches to an HMM
can be very slow, involving running it on thousands of random
sequences. I think we can do better...
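For context, the slow baseline looks something like the sketch below, with a toy stand-in for a real HMM log-odds score (here, just counting CG dinucleotides); a project could aim to replace the brute-force loop with something analytical or adaptive:

```python
import random

def empirical_pvalue(score_fn, observed_score, alphabet, length, n=1000, seed=0):
    """Naive Monte-Carlo significance: score n random sequences and
    report the fraction meeting or beating the observed score. This is
    the slow baseline a cleverer method would need to improve on."""
    rng = random.Random(seed)
    hits = sum(
        score_fn("".join(rng.choice(alphabet) for _ in range(length))) >= observed_score
        for _ in range(n)
    )
    return (hits + 1) / (n + 1)  # add-one correction avoids reporting p = 0

# Hypothetical stand-in for an HMM score: number of CG dinucleotides.
score = lambda s: sum(s[i:i+2] == "CG" for i in range(len(s) - 1))
p = empirical_pvalue(score, observed_score=15, alphabet="ACGT", length=100)
```

Note that resolving very small p-values this way requires very many random sequences, which is precisely the cost we would like to avoid.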
- Try your favorite algorithm on your favorite organism.
- Sequence data from a few regions is now available from a
dozen+ vertebrates. What can you learn, especially about conserved
non-coding regions?
- Many others possible, too ...
A Nobel prize is great; so is a result of the form "I tried x to
solve y, but it didn't work" (but give some thought to why it
failed, and maybe suggest a next step to try).
Again, please email me a paragraph by 11/7 saying who's in your
group & outlining what you want to do.