University of Washington Computer Science & Engineering
  CSE 527Au '09:  Course Project, Due: Finals Week (12/14-12/18)

Projects can be done individually, or in small groups. Large groups are also okay, provided you have a thoughtful plan to organize and divide the work. Groups combining people from different fields are particularly encouraged. Feel free to use the class email list cse527a_au09@u.washington.edu to brainstorm project ideas, try to round up partners, etc. Choices for the project include, but are not limited to:

I'd like each individual/group to send me a paragraph or drop by to tell me who's in the group, describe your topic, the initial papers, and the implementations and test data (if applicable). Maybe I can give you some pointers. Please try to do this by early next week...

Deliverables: During finals week, hand in a paper (approximately 5-10 pages) describing your project, and give a 20-30 minute presentation. (The presentations will be open to everyone in the class.) See the schedule page for details. I would like to get electronic versions of any code you write, together with associated specialized input files, make files, sample outputs, etc. (Please do NOT include vast swaths of genomic sequence, but do tell me what you used and where you got it.) Electronic copies of your report and presentation are also welcome, but I would also like your report on paper, if possible. Use the Catalyst dropbox for electronic turn-in.

Students consistently impress me with creative, cogent project ideas, so by all means feel free to come up with your own ideas. Here are a few of mine to get you started:

If none of these excite you, here's a more concrete option that I think will be instructive: implement HMM training/inference to predict CpG islands in the human genome. My slides and the Durbin, Eddy text, chapter 3, provide background on this problem.

I'd suggest you use data from human chromosome 21, downloaded from the UCSC genome browser, as your training and test data. Use the "Feb 2009 Human Assembly (hg19)." The CpG island track should be visible by default; if not, select it in the "Regulation" section at the bottom of a typical browser page. You can look at stuff in the browser, but in general data is downloaded through the "table browser" interface. Get the chromosome 21 sequence and the chromosome 21 'CpG' track. (I recommend the ".bed" file format for the latter, but suit yourself.)

Ideally, you can discover the difference between CpG islands and non-islands from the sequence alone, but it is also OK to use "labeled" training data, i.e., exploit the CpG track for training. In either case, use just the first half of Chr 21 for training your HMM, and the other half for testing. Any combination of Viterbi/Baum-Welch training with Viterbi/posterior decoding is fine. Compare different combinations if you're feeling ambitious. As I said in class, I haven't tried this, so it may fail spectacularly. If so, give thought to why, and to how it might be resurrected.

If you want some simple data to test your HMM implementations, dice.txt contains the loaded dice example from Durbin Chapter 3.
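As a starting point for testing on the loaded dice example, a minimal Viterbi decoder might look like the sketch below. The transition and emission values are those of the "occasionally dishonest casino" in Durbin chapter 3; the uniform start distribution, the function name `viterbi`, and the sample roll string are assumptions made for illustration, and nothing here assumes a particular layout for dice.txt.

```python
import math

# Two-state "occasionally dishonest casino" HMM. Transition and emission
# values follow the loaded dice example in Durbin et al., chapter 3.
STATES = ("F", "L")  # F = fair die, L = loaded die
START = {"F": 0.5, "L": 0.5}  # assumption: uniform start distribution
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {r: 1 / 6 for r in "123456"},
    "L": {**{r: 0.1 for r in "12345"}, "6": 0.5},
}


def viterbi(obs, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable state path for obs, computed in log space."""
    if not obs:
        return ""
    # score[s] = log probability of the best path so far that ends in state s
    score = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for symbol in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + math.log(trans[p][s]))
            pointers[s] = prev
            new_score[s] = (
                score[prev] + math.log(trans[prev][s]) + math.log(emit[s][symbol])
            )
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    rolls = "3152461356" + "6666266646" + "1432516243"  # made-up test rolls
    print(rolls)
    print(viterbi(rolls))
```

Working in log space avoids numerical underflow on long sequences, which matters once the same code is pointed at a chromosome rather than a few hundred dice rolls. For the CpG problem, the alphabet would become A/C/G/T and the parameters would come from your own training, not from the constants above.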

Whatever your project, a Nobel prize is great; so is a result of the form "I tried x to solve y, but it didn't work," or "I didn't have time to finish, but here's how far I got." (In the latter cases, give some thought to why it failed and/or outline the next steps to try.)

