image University of Washington Computer Science & Engineering
  CSE 427Au '17:  Approximate Schedule
  CSE Home   About Us    Search    Contact Info 

Schedule details will evolve as we go; check back periodically to see the latest updates.

    Due Lecture Topic Reading
Week 1
9/25-9/29
Th   Introduction & Overview Papers Below
Week 2
10/2-10/6
Tu   Sequence Alignment, Search & Scoring Papers Below
Th
Week 3
10/9-10/13
Tu HW #1 (Due F)
Th
Week 4
10/16-10/20
Tu   Gene Finding Papers Below
Th
Week 5
10/23-10/27
Tu   Markov and Hidden Markov Models Papers Below
Th
Week 6
10/30-11/3
Tu HW #2 (Due F)
Th Sequence Motif Modeling & Discovery Papers Below
Week 7
11/6-11/10
Tu  
Th
Week 8
11/13-11/17
Tu HW #3 (Due F)
Th
Week 9
11/20-11/24
Tu  
Th Holiday
Week 10
11/27-12/1
Tu   RNA and RNAseq  
Th
Week 11
12/4-12/8
Tu HW #4 (Due F)
Th

Textbook: None.

Supplementary Text: The following book is an excellent reference for many of the topics we will cover.

  1. Richard Durbin, Sean R. Eddy, Anders Krogh and Graeme Mitchison, Biological Sequence Analysis: Probabilistic models of proteins and nucleic acids, Cambridge, 1998. (U. Book Store, Amazon) Errata.

References:  As we go forward, I will be updating this list and explicitly designating some of these papers as assigned reading. The rest are optional---good supplementary references, recommended if you want more depth in any of the areas.

Most links below take you to PubMed, the NIH bibliographic database. Usually, but counterintuitively, from a PubMed abstract you click the icon of the publisher (or sometimes the icon saying "UW article online") to get to the actual article.

padlock   Journal access: Some of the journals and articles cited below are completely open access, or are freely available via PubMed Central (look for the "Free in PMC" icon).  Electronic access to other cited articles is generally free from on-campus IP addresses.  For off-campus access, follow the "[offcampus]" links or look at the UW library "proxy server" instructions.  Let me know if none of these work for you. padlock

References -- Introduction & Overview: Required: Read #2 and #3. Stephens, et al. (#2) is a nice recent article exploring the "big data" character of modern genomics. Hunter (#3), while a bit dated, is a good overview of molecular biology, at a level that CSE students should find approachable. Optional: Chapter 1 of #1. If you want more biology, #4 is in some ways a book-length expansion of #3; I haven't read it, but the portions I've sampled are good, and not as (potentially) overwhelming as #7. Former students have recommended Gonick, (also a bit dated, but inexpensive). Schultz (#6) also comes recommended, looks amusing, and is also inexpensive; seems to focus more on genetics. Alberts (#7) is a popular undergrad textbook, very comprehensive and very well written. If you locate other, better background references, please email a link to me and to the class list.

  1. ZD Stephens, SY Lee, F Faghri, RH Campbell, C Zhai, MJ Efron, R Iyer, MC Schatz, S Sinha, GE Robinson, "Big Data: Astronomical or Genomical?" PLoS Biol., 13, #7 (2015) e1002195.
  2. Lawrence Hunter, "Molecular Biology for Computer Scientists," Chapter 1 of Artificial Intelligence and Molecular Biology Lawrence Hunter, ed. AAAI press, 1993. (Also here.)
  3. Lawrence Hunter, The Processes of Life: An Introduction to Molecular Biology, The MIT Press, 2009, ISBN 978-0262013055, 320 pages. (Amazon)
  4. Larry Gonick, Mark Wheelis, "The Cartoon Guide to Genetics" (Updated Edition, 1991) ISBN 0062730991, Collins. (Amazon)
  5. Mark Schultz, "The Stuff of Life, a graphic guide to genetics and DNA," Farrar, Straus and Giroux, 2009 (Amazon)
  6. Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, Peter Walter, "Molecular Biology of the Cell", Fifth Edition, 2007, ISBN 978-0815341055, Garland, 1392 pages. (Amazon) The 4th edition of this book is available online (although the user interface leaves something to be desired; in particular, you must use the "search" feature on that page instead of browsing...).

References -- Sequence Alignment, Search & Scoring: Required: Read #11-13. Optional: Chapter 2 of #1. The original BLAST paper is #8; #9 describes important extensions to it. The Myers review is a bit dated, but still a good overview of algorithms and algorithmic issues.

  1. SF Altschul, W Gish, W Miller, EW Myers, DJ Lipman, "Basic local alignment search tool." J. Mol. Biol., 215, #3 (1990) 403-10. [offcampus]
  2. SF Altschul, TL Madden, AA Schäffer, J Zhang, Z Zhang, W Miller, DJ Lipman, "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res., 25, #17 (1997) 3389-402. [offcampus]
  3. Myers, E. (1991) "An overview of sequence comparison algorithms in molecular biology", Tech. Rep. TR-91-29, Dept. of Computer Science, Univ. of Arizona.
  4. SR Eddy, "What is dynamic programming?" Nat. Biotechnol., 22, #7 (2004) 909-10. [offcampus]
  5. SR Eddy, "Where did the BLOSUM62 alignment score matrix come from?" Nat. Biotechnol., 22, #8 (2004) 1035-6. [offcampus]
  6. A Pertsemlidis, JW Fondon, "Having a BLAST with bioinformatics (and avoiding BLASTphemy)." Genome Biol., 2, #10 (2001) REVIEWS2002. [offcampus]

References -- Gene Finding: Required: Read #14. Optional: #16 is a good survey of computational gene finding. #15 is a careful comparison of leading programs of its day. #17 and #18 are the landmark initial human genome sequence papers. Sharp's Nobel lecture is a nice account of the history leading to the discovery of splicing. #20 is a more recent survey of the mechanism. #21 surveys self-splicing.

  1. C Burge, S Karlin, "Prediction of complete gene structures in human genomic DNA." J. Mol. Biol., 268, #1 (1997) 78-94. [offcampus]
  2. M Burset, R Guigó, "Evaluation of gene structure prediction programs." Genomics, 34, #3 (1996) 353-67. [offcampus]
  3. JM Claverie, "Computational methods for the identification of genes in vertebrate genomic sequences." Hum. Mol. Genet., 6, #10 (1997) 1735-44. [offcampus]
  4. ES Lander, LM Linton, B Birren, C Nusbaum, MC Zody, J Baldwin, K Devon, K Dewar, M Doyle, W FitzHugh, et 247 al., "Initial sequencing and analysis of the human genome." Nature, 409, #6822 (2001) 860-921. [offcampus]
  5. JC Venter, MD Adams, EW Myers, PW Li, RJ Mural, GG Sutton, HO Smith, M Yandell, CA Evans, RA Holt, et 264 al., "The sequence of the human genome." Science, 291, #5507 (2001) 1304-51. [offcampus]
  6. Phillip A. Sharp, Split Genes and RNA Splicing, Nobel Lecture, December 8, 1993, http://nobelprize.org/nobel_prizes/medicine/laureates/1993/sharp-lecture.pdf [offcampus]
  7. JP Staley, C Guthrie, "Mechanical devices of the spliceosome: motors, clocks, springs, and things." Cell, 92, #3 (1998) 315-26. [offcampus]
  8. DR Edgell, VR Chalamcharla, M Belfort, "Learning to live together: mutualism between self-splicing introns and their hosts." BMC Biol., 9, (2011) 22.

References -- Markov and Hidden Markov Models: Required: Read #22 and Chapter 3 of #1. (The later is available HERE). #23 is also a very good tutorial/intro to HMMs from a different perspective (speech analysis). #24 is an interesting application of HMMs relevant to RNA gene finding, a later topic.

  1. SR Eddy, "What is a hidden Markov model?" Nat. Biotechnol., 22, #10 (2004) 1315-6. [offcampus]
  2. LR Rabiner, "A Tutorial on Hidden Markov Models and Selected Application in Speech Recognition," Proceedings of the IEEE, v 77 #2,Feb 1989, 257-286. here.
  3. RJ Klein, Z Misulovin, SR Eddy, "Noncoding RNA genes identified in AT-rich hyperthermophiles." Proc. Natl. Acad. Sci. U.S.A., 99, #11 (2002) 7542-7. [offcampus]

References -- Sequence Motif Modeling & Discovery: Required: Read #28. Dempster et al. is the "classic" paper on EM. Stormo is a nice survey from a slightly mmore biological perspective. Tompa et al. is a comprehensive comparison of several motif finding methods. Blanchette et al. is an important example of use of comparative genomics for this problem.

  1. AP Dempster, NM Laird, DB Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," Journal of the Royal Statistical Society. Series B (Methodological), Vol. 39, No. 1. (1977), pp. 1-38. Available here. [offcampus]
  2. CB Do, S Batzoglou, "What is the expectation maximization algorithm?" Nat. Biotechnol., 26, #8 (2008) 897-9. [offcampus]
  3. GD Stormo, "DNA binding sites: representation and discovery." Bioinformatics, 16, #1 (2000) 16-23. [offcampus]
  4. TL Bailey, C Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers." Proc Int Conf Intell Syst Mol Biol, 2, (1994) 28-36. [offcampus] Available here: (.ps) (.pdf) See also http://meme.sdsc.edu/meme/ for related papers and resources.
  5. TL Bailey, C Elkan, "The value of prior knowledge in discovering motifs with MEME." Proc Int Conf Intell Syst Mol Biol, 3, (1995) 21-9. [offcampus]
  6. M Tompa, N Li, TL Bailey, GM Church, B De Moor, E Eskin, AV Favorov, MC Frith, Y Fu, WJ Kent, et 15 al., "Assessing computational tools for the discovery of transcription factor binding sites." Nat. Biotechnol., 23, #1 (2005) 137-44. [offcampus]
  7. Emily Rocke and Martin Tompa An Algorithm for Finding Novel Gapped Motifs in DNA Sequences RECOMB98: Proceedings of the Second Annual International Conference on Computational Molecular Biology, New York, NY, March 1998, 228-233.
  8. M Blanchette, B Schwikowski, M Tompa, "Algorithms for phylogenetic footprinting." J. Comput. Biol., 9, #2 (2002) 211-23. [offcampus]
  9. M Blanchette, M Tompa, "FootPrinter: A program designed for phylogenetic footprinting." Nucleic Acids Res., 31, #13 (2003) 3840-2. [offcampus]

References -- RNA and RNAseq: Required: Read #39. Optional: these papers survey recent surprising discoveries about the roles of non-coding RNA, and some recent computational techniques for RNA analysis. The papers on 6S might give you some picture of one nice biological example and how computational approaches are useful in this arena. The last few papers focus on discovery of ncRNAs in vertebrates.

  1. G Storz, "An expanding universe of noncoding RNAs." Science, 296, #5571 (2002) 1260-3. [offcampus]
  2. SR Eddy, "Computational genomics of noncoding RNA genes." Cell, 109, #2 (2002) 137-40. [offcampus]
  3. A Hüttenhofer, P Schattner, N Polacek, "Non-coding RNAs: hope or hype?" Trends Genet., 21, #5 (2005) 289-97. [offcampus]
  4. G Ruvkun, "The perfect storm of tiny RNAs." Nat. Med., 14, #10 (2008) 1041-5. [offcampus]
  5. J Gorodkin, IL Hofacker, E Torarinsson, Z Yao, JH Havgaard, WL Ruzzo, "De novo prediction of structured RNAs from genomic sequences." Trends Biotechnol., 28, #1 (2010) 9-19. [offcampus]
  6. SR Eddy, "How do RNA folding algorithms work?" Nat. Biotechnol., 22, #11 (2004) 1457-8. [offcampus]
  7. JS McCaskill, "The equilibrium partition function and base pair binding probabilities for RNA secondary structure." Biopolymers, 29, #6-7 (1990 May-Jun) 1105-19. [offcampus]
  8. RB Lyngsø, M Zuker, CN Pedersen, "Fast evaluation of internal loops in RNA secondary structure prediction." Bioinformatics, 15, #6 (1999) 440-5. [offcampus]
  9. PP Gardner, R Giegerich, "A comprehensive comparison of comparative RNA structure prediction approaches." BMC Bioinformatics, 5, (2004) 140.
  10. SR Eddy, R Durbin, "RNA sequence analysis using covariance models." Nucleic Acids Res., 22, #11 (1994) 2079-88. [offcampus]
  11. SR Eddy, "A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure." BMC Bioinformatics, 3, (2002) 18.
  12. S Griffiths-Jones, A Bateman, M Marshall, A Khanna, SR Eddy, "Rfam: an RNA family database." Nucleic Acids Res., 31, #1 (2003) 439-41. [offcampus]
  13. S Griffiths-Jones, S Moxon, M Marshall, A Khanna, SR Eddy, A Bateman, "Rfam: annotating non-coding RNAs in complete genomes." Nucleic Acids Res., 33, #Database issue (2005) D121-4. [offcampus]
  14. B Knudsen, J Hein, "RNA secondary structure prediction using stochastic context-free grammars and evolutionary history." Bioinformatics, 15, #6 (1999) 446-54. [offcampus]
  15. B Knudsen, J Hein, "Pfold: RNA secondary structure prediction using stochastic context-free grammars." Nucleic Acids Res., 31, #13 (2003) 3423-8. [offcampus]
  16. Z Weinberg, WL Ruzzo, "Faster Genome Annotation of Non-coding RNA Families Without Loss of Accuracy." Eighth Annual International Conference on Research in Computational Molecular Biology (RECOMB 2004) , pp 243-251, March 2004, San Diego, CA. Preprint.
  17. Z Weinberg, WL Ruzzo, "Exploiting conserved structure for faster annotation of non-coding RNAs without loss of accuracy." Bioinformatics, 20 Suppl 1, (2004) i334-41. [offcampus]
  18. Z Weinberg, WL Ruzzo, "Sequence-based heuristics for faster annotation of non-coding RNA families." Bioinformatics, 22, #1 (2006) 35-9. [offcampus]
  19. M Mandal, M Lee, JE Barrick, Z Weinberg, GM Emilsson, WL Ruzzo, RR Breaker, "A glycine-dependent riboswitch that uses cooperative binding to control gene expression." Science, 306, #5694 (2004) 275-9. [offcampus]
  20. JE Barrick, N Sudarsan, Z Weinberg, WL Ruzzo, RR Breaker, "6S RNA is a widespread regulator of eubacterial RNA polymerase that resembles an open promoter." RNA, 11, #5 (2005) 774-84. [offcampus]
  21. AE Trotochaud, KM Wassarman, "A highly conserved 6S RNA structure is required for regulation of transcription." Nat. Struct. Mol. Biol., 12, #4 (2005) 313-9. [offcampus]
  22. DK Willkomm, J Minnerup, A Hüttenhofer, RK Hartmann, "Experimental RNomics in Aquifex aeolicus: identification of small non-coding RNAs and the putative 6S RNA homolog." Nucleic Acids Res., 33, #6 (2005) 1949-60. [offcampus]
  23. T Babak, BJ Blencowe, TR Hughes, "Considerations in the identification of functional RNA structural elements in genomic alignments." BMC Bioinformatics, 8, (2007) 33.
  24. Z Yao, Z Weinberg, WL Ruzzo, "CMfinder--a covariance model based RNA motif finding algorithm." Bioinformatics, 22, #4 (2006) 445-52. [offcampus]
  25. S Washietl, IL Hofacker, PF Stadler, "Fast and reliable prediction of noncoding RNAs." Proc. Natl. Acad. Sci. U.S.A., 102, #7 (2005) 2454-9. [offcampus]
  26. S Washietl, IL Hofacker, M Lukasser, A Hüttenhofer, PF Stadler, "Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome." Nat. Biotechnol., 23, #11 (2005) 1383-90. [offcampus]
  27. JS Pedersen, G Bejerano, A Siepel, K Rosenbloom, K Lindblad-Toh, ES Lander, J Kent, W Miller, D Haussler, "Identification and classification of conserved RNA secondary structures in the human genome." PLoS Comput. Biol., 2, #4 (2006) e33.
  28. S Washietl, JS Pedersen, JO Korbel, C Stocsits, AR Gruber, J Hackermüller, J Hertel, M Lindemeyer, K Reiche, A Tanzer, et 14 al., "Structured RNAs in the ENCODE selected regions of the human genome." Genome Res., 17, #6 (2007) 852-64. [offcampus]
  29. E Torarinsson, Z Yao, ED Wiklund, JB Bramsen, C Hansen, J Kjems, N Tommerup, WL Ruzzo, J Gorodkin, "Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions." Genome Res., 18, #2 (2008) 242-51. [offcampus]
  30. SE Seemann, AH Mirza, C Hansen, CH Bang-Berthelsen, C Garde, M Christensen-Dalsgaard, E Torarinsson, Z Yao, CT Workman, F Pociot, H Nielsen, N Tommerup, WL Ruzzo, J Gorodkin, "The identification and functional annotation of RNA structures conserved in vertebrates." Genome Res., 27, #8 (2017) 1371-1383. [offcampus]

CSE logo Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX