image University of Washington Computer Science & Engineering
  CSE P527Wi '18:  Approximate Schedule
  CSE Home   About Us    Search    Contact Info 

Schedule details will evolve as we go; check back periodically to see the latest updates.

    Due Lecture Topic Reading
Week 1
1/1-1/5
Th   Introduction & Overview See HW#1.
Week 2
1/8-1/12
Th HW #1 Sequence Alignment, Search & Scoring Durbin Ch 1-2; Papers Below
Week 3
1/15-1/19
Th  
Week 4
1/22-1/26
Th HW #2 Sequence Motif Modeling & Discovery: MLE & the EM Algorithm Durbin Ch 11 (excl Mix. of Dirchlets, Est. Priors in 11.5; skim 11.6); Papers Below
Week 5
1/29-2/2
Th   Sequence Motif Modeling & Discovery: MEME & Gibbs Sampling
Week 6
2/5-2/9
Th HW #3 HMMs & Gene Finding Durbin Ch 3-5; Papers Below
Week 7
2/12-2/16
Th  
Week 8
2/19-2/23
Th HW #4 RNAseq; RNA Structure, Alignment, & Search Durbin Ch 9-10; Papers Below
Week 9
2/26-3/2
Th  
Week 10
3/5-3/9
Th HW #5 Special Topic & Wrapup Papers Below

Textbook:

  1. Richard Durbin, Sean R. Eddy, Anders Krogh and Graeme Mitchison, Biological Sequence Analysis: Probabilistic models of proteins and nucleic acids, Cambridge, 1998.  (Available from U Book Store, Amazon, etc.)  Errata.

References:  As we go forward, I will be updating this list and explicitly designating some of these papers as assigned reading. The rest are optional---good supplementary references, recommended if you want more depth in any of the areas.

Most links below take you to PubMed, the NIH bibliographic database. Usually, but counterintuitively, from a PubMed abstract you click the icon of the publisher (or sometimes the icon saying "UW article online") to get to the actual article.

padlock   Journal access: Some of the journals and articles cited below are completely open access, or are freely available via PubMed Central (look for the "Free in PMC" icon).  Electronic access to other cited articles is generally free from on-campus IP addresses.  For off-campus access, follow the "[offcampus]" links or look at the UW library "proxy server" instructions.  Let me know if none of these work for you. padlock

References -- Introduction & Overview: Read #2; a bit dated, but a good overview. Optional: If you want more biology, #3 is in some ways a book-length expansion of #2; I haven't read it, but the portions I've sampled are good, and not as (potentially) overwhelming as #7. Former students have recommended Gonick, (also a bit dated, but inexpensive). Schultz (#5) also comes recommended, looks amusing, and is also inexpensive; seems to focus more on genetics. Benfey (#6) is new and targeted towards math/cs/engineering types. (I haven't read this one either; I'd be interested to hear your impressions if you look at it.) Alberts (#7) is a popular undergrad textbook, very comprehensive and very well written.

  1. Lawrence Hunter, "Molecular Biology for Computer Scientists," Chapter 1 of Artificial Intelligence and Molecular Biology Lawrence Hunter, ed. AAAI press, 1993. (Also here.)
  2. Lawrence Hunter, The Processes of Life: An Introduction to Molecular Biology, The MIT Press, 2009, ISBN 978-0262013055, 320 pages. (Amazon)
  3. Larry Gonick, Mark Wheelis, "The Cartoon Guide to Genetics" (Updated Edition, 1991) ISBN 0062730991, Collins. (Amazon)
  4. Mark Schultz, "The Stuff of Life, a graphic guide to genetics and DNA," Farrar, Straus and Giroux, 2009 (Amazon)
  5. Philip N. Benfey, Quickstart Molecular Biology: An Introductory Course for Mathematicians, Physicists, and Engineers, Cold Spring Harbor Press, 2014, (CSH Press)
  6. Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, Peter Walter, "Molecular Biology of the Cell", Fifth Edition, 2007, ISBN 978-0815341055, Garland, 1392 pages. (Amazon) The 4th edition of this book is available online (although the user interface leaves something to be desired; in particular, you must use the "search" feature on that page instead of browsing...).

References -- Sequence Alignment, Search & Scoring: Read #11-13. Optional: The original BLAST paper is #8; #10 describes important extensions to it. The Myers review is a bit dated, but still a good overview of algorithms and algorithmic issues. #15 gives a much deeper look at the theory underlying permutation p-values.

  1. SF Altschul, W Gish, W Miller, EW Myers, DJ Lipman, "Basic local alignment search tool." J. Mol. Biol., 215, #3 (1990) 403-10. [offcampus] (Available here or here.)
  2. Myers, E. (1991) "An overview of sequence comparison algorithms in molecular biology", Tech. Rep. TR-91-29, Dept. of Computer Science, Univ. of Arizona.
  3. SF Altschul, TL Madden, AA Schäffer, J Zhang, Z Zhang, W Miller, DJ Lipman, "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res., 25, #17 (1997) 3389-402. [offcampus]
  4. SR Eddy, "What is dynamic programming?" Nat. Biotechnol., 22, #7 (2004) 909-10. [offcampus]
  5. SR Eddy, "Where did the BLOSUM62 alignment score matrix come from?" Nat. Biotechnol., 22, #8 (2004) 1035-6. [offcampus]
  6. A Pertsemlidis, JW Fondon, "Having a BLAST with bioinformatics (and avoiding BLASTphemy)." Genome Biol., 2, #10 (2001) REVIEWS2002. [offcampus]
  7. Lobo, I. (2008) Basic Local Alignment Search Tool (BLAST). Nature Education 1(1):215
  8. B Phipson, GK Smyth, "Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn." Stat Appl Genet Mol Biol, 9, (2010) Article39. [offcampus]

References -- Sequencing and Assembly: Optional: Shendure and Ji is a nice review of "Next Gen" sequencing technologies, as of 2008. The others are some of many articles on computational challenges in using this data for genome assembly, etc., including my student Daniel's nice paper on compression.

  1. J Shendure, H Ji, "Next-generation DNA sequencing." Nat. Biotechnol., 26, #10 (2008) 1135-45. [offcampus]
  2. N Nagarajan, M Pop, "Sequence assembly demystified." Nat. Rev. Genet., 14, #3 (2013) 157-67. [offcampus]
  3. E Góngora-Castillo, CR Buell, "Bioinformatics challenges in de novo transcriptome assembly using short read sequences in the absence of a reference genome sequence." Nat Prod Rep, 30, #4 (2013) 490-500. [offcampus]
  4. DC Jones, WL Ruzzo, X Peng, MG Katze, "Compression of next-generation sequencing reads aided by highly efficient de novo assembly." Nucleic Acids Res., 40, #22 (2012) e171. [offcampus]
  5. DC Jones, WL Ruzzo, X Peng, MG Katze, "A new approach to bias correction in RNA-Seq." Bioinformatics, 28, #7 (2012) 921-8. [offcampus]

References -- Sequence Motif Modeling & Discovery: Read #22-24, 26, 27. (Eddy is another nice, succinct intro to an important class of methods that we will use, but I didn't directly lecture on it.) Optional: Dempster et al. is the "classic" paper on EM. Tompa et al. is a comprehensive comparison of several motif finding methods. Blanchette et al. is an important example of use of comparative genomics for this problem; see also the text, esp. section 7.4, for relevant background on phylogenetics. #33 is a very readable introduction to the Gaussian model-based clustering method outlined in lecture (and to BIC scores as used in the homework), and #34 gives a biological application of it.

  1. AP Dempster, NM Laird, DB Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," Journal of the Royal Statistical Society. Series B (Methodological), Vol. 39, No. 1. (1977), pp. 1-38. Available here. [offcampus]
  2. CB Do, S Batzoglou, "What is the expectation maximization algorithm?" Nat. Biotechnol., 26, #8 (2008) 897-9. [offcampus]
  3. GD Stormo, "DNA binding sites: representation and discovery." Bioinformatics, 16, #1 (2000) 16-23. [offcampus]
  4. TL Bailey, C Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers." Proc Int Conf Intell Syst Mol Biol, 2, (1994) 28-36. [offcampus] Available here. See also http://meme.sdsc.edu/meme/ for related papers and resources.
  5. TL Bailey, C Elkan, "The value of prior knowledge in discovering motifs with MEME." Proc Int Conf Intell Syst Mol Biol, 3, (1995) 21-9. [offcampus]
  6. CE Lawrence, SF Altschul, MS Boguski, JS Liu, AF Neuwald, JC Wootton, "Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment." Science, 262, #5131 (1993) 208-14. [offcampus] Available here. [offcampus]
  7. SR Eddy, "What is Bayesian statistics?" Nat. Biotechnol., 22, #9 (2004) 1177-8. [offcampus]
  8. FP Roth, JD Hughes, PW Estep, GM Church, "Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation." Nat. Biotechnol., 16, #10 (1998) 939-45. [offcampus]
  9. M Tompa, N Li, TL Bailey, GM Church, B De Moor, E Eskin, AV Favorov, MC Frith, Y Fu, WJ Kent, et 15 al., "Assessing computational tools for the discovery of transcription factor binding sites." Nat. Biotechnol., 23, #1 (2005) 137-44. [offcampus]
  10. Emily Rocke and Martin Tompa An Algorithm for Finding Novel Gapped Motifs in DNA Sequences RECOMB98: Proceedings of the Second Annual International Conference on Computational Molecular Biology, New York, NY, March 1998, 228-233.
  11. M Blanchette, B Schwikowski, M Tompa, "Algorithms for phylogenetic footprinting." J. Comput. Biol., 9, #2 (2002) 211-23. [offcampus]
  12. M Blanchette, M Tompa, "FootPrinter: A program designed for phylogenetic footprinting." Nucleic Acids Res., 31, #13 (2003) 3840-2. [offcampus]
  13. Fraley, C, and A E Raftery. 1998. "How Many Clusters? Which Clustering Method? Answers via Model-Based Cluster Analysis." The Computer Journal 41 (8): 578-588. [offcampus]
  14. KY Yeung, C Fraley, A Murua, AE Raftery, WL Ruzzo, "Model-based clustering and data transformations for gene expression data." Bioinformatics, 17, #10 (2001) 977-87. [offcampus]

References -- HMMs & Gene Finding: Read #35, 37. Burget and Guigo is a careful comparison of leading programs of its day. Optional: The Rabiner tutorial is a very good intro to HMMs if you want a different perspective from the text. Claverie is a good survey of computational gene finding. Lander et al. and Venter are the landmark initial human genome sequence papers. Klein et al. is an interesting application of HMMs relevant to RNA gene finding, our next topic. #44, 45 are review articles about spliceosomal, and self-splicing introns, respectively.

  1. SR Eddy, "What is a hidden Markov model?" Nat. Biotechnol., 22, #10 (2004) 1315-6. [offcampus]
  2. LR Rabiner, "A Tutorial on Hidden Markov Models and Selected Application in Speech Recognition," Proceedings of the IEEE, v 77 #2,Feb 1989, 257-286.
  3. C Burge, S Karlin, "Prediction of complete gene structures in human genomic DNA." J. Mol. Biol., 268, #1 (1997) 78-94. [offcampus]
  4. M Burset, R Guigó, "Evaluation of gene structure prediction programs." Genomics, 34, #3 (1996) 353-67. [offcampus]
  5. JM Claverie, "Computational methods for the identification of genes in vertebrate genomic sequences." Hum. Mol. Genet., 6, #10 (1997) 1735-44. [offcampus]
  6. ES Lander, LM Linton, B Birren, C Nusbaum, MC Zody, J Baldwin, K Devon, K Dewar, M Doyle, W FitzHugh, et 247 al., "Initial sequencing and analysis of the human genome." Nature, 409, #6822 (2001) 860-921. [offcampus]
  7. JC Venter, MD Adams, EW Myers, PW Li, RJ Mural, GG Sutton, HO Smith, M Yandell, CA Evans, RA Holt, et 264 al., "The sequence of the human genome." Science, 291, #5507 (2001) 1304-51. [offcampus]
  8. RJ Klein, Z Misulovin, SR Eddy, "Noncoding RNA genes identified in AT-rich hyperthermophiles." Proc. Natl. Acad. Sci. U.S.A., 99, #11 (2002) 7542-7. [offcampus]
  9. Sharp's Nobel address on discovery of introns.
  10. JP Staley, C Guthrie, "Mechanical devices of the spliceosome: motors, clocks, springs, and things." Cell, 92, #3 (1998) 315-26. [offcampus]
  11. DR Edgell, VR Chalamcharla, M Belfort, "Learning to live together: mutualism between self-splicing introns and their hosts." BMC Biol., 9, (2011) 22.

References -- RNAseq; RNA Structure, Alignment, & Search: Read #51, 56, 58. Optional: Refs 46-49 are good surveys of recent surprising discoveries about the roles of non-coding RNA. Refs 64-66 might give you some picture of one nice biological example and how computational approaches are useful in this arena. Ref #67 and following describe our recent approaches to RNA motif discovery in bacteria and vertebrates.

  1. G Storz, "An expanding universe of noncoding RNAs." Science, 296, #5571 (2002) 1260-3. [offcampus]
  2. SR Eddy, "Computational genomics of noncoding RNA genes." Cell, 109, #2 (2002) 137-40. [offcampus]
  3. A Hüttenhofer, P Schattner, N Polacek, "Non-coding RNAs: hope or hype?" Trends Genet., 21, #5 (2005) 289-97. [offcampus]
  4. G Ruvkun, "The perfect storm of tiny RNAs." Nat. Med., 14, #10 (2008) 1041-5. [offcampus]
  5. J Gorodkin, IL Hofacker, E Torarinsson, Z Yao, JH Havgaard, WL Ruzzo, "De novo prediction of structured RNAs from genomic sequences." Trends Biotechnol., 28, #1 (2010) 9-19. [offcampus]
  6. SR Eddy, "How do RNA folding algorithms work?" Nat. Biotechnol., 22, #11 (2004) 1457-8. [offcampus]
  7. JS McCaskill, "The equilibrium partition function and base pair binding probabilities for RNA secondary structure." Biopolymers, 29, #6-7 (1990 May-Jun) 1105-19. [offcampus]
  8. RB Lyngsø, M Zuker, CN Pedersen, "Fast evaluation of internal loops in RNA secondary structure prediction." Bioinformatics, 15, #6 (1999) 440-5. [offcampus]
  9. PP Gardner, R Giegerich, "A comprehensive comparison of comparative RNA structure prediction approaches." BMC Bioinformatics, 5, (2004) 140.
  10. Y Sakakibara, M Brown, R Hughey, IS Mian, K Sjölander, RC Underwood, D Haussler, "Stochastic context-free grammars for tRNA modeling." Nucleic Acids Res., 22, #23 (1994) 5112-20. [offcampus]
  11. SR Eddy, R Durbin, "RNA sequence analysis using covariance models." Nucleic Acids Res., 22, #11 (1994) 2079-88. [offcampus]
  12. SR Eddy, "A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure." BMC Bioinformatics, 3, (2002) 18.
  13. S Griffiths-Jones, A Bateman, M Marshall, A Khanna, SR Eddy, "Rfam: an RNA family database." Nucleic Acids Res., 31, #1 (2003) 439-41. [offcampus]
  14. S Griffiths-Jones, S Moxon, M Marshall, A Khanna, SR Eddy, A Bateman, "Rfam: annotating non-coding RNAs in complete genomes." Nucleic Acids Res., 33, #Database issue (2005) D121-4. [offcampus]
  15. Z Weinberg, WL Ruzzo, "Faster Genome Annotation of Non-coding RNA Families Without Loss of Accuracy." Eighth Annual International Conference on Research in Computational Molecular Biology (RECOMB 2004) , pp 243-251, March 2004, San Diego, CA. Preprint.
  16. Z Weinberg, WL Ruzzo, "Exploiting conserved structure for faster annotation of non-coding RNAs without loss of accuracy." Bioinformatics, 20 Suppl 1, (2004) i334-41. [offcampus]
  17. Z Weinberg, WL Ruzzo, "Sequence-based heuristics for faster annotation of non-coding RNA families." Bioinformatics, 22, #1 (2006) 35-9. [offcampus]
  18. M Mandal, M Lee, JE Barrick, Z Weinberg, GM Emilsson, WL Ruzzo, RR Breaker, "A glycine-dependent riboswitch that uses cooperative binding to control gene expression." Science, 306, #5694 (2004) 275-9. [offcampus]
  19. JE Barrick, N Sudarsan, Z Weinberg, WL Ruzzo, RR Breaker, "6S RNA is a widespread regulator of eubacterial RNA polymerase that resembles an open promoter." RNA, 11, #5 (2005) 774-84. [offcampus]
  20. AE Trotochaud, KM Wassarman, "A highly conserved 6S RNA structure is required for regulation of transcription." Nat. Struct. Mol. Biol., 12, #4 (2005) 313-9. [offcampus]
  21. DK Willkomm, J Minnerup, A Hüttenhofer, RK Hartmann, "Experimental RNomics in Aquifex aeolicus: identification of small non-coding RNAs and the putative 6S RNA homolog." Nucleic Acids Res., 33, #6 (2005) 1949-60. [offcampus]
  22. Z Yao, Z Weinberg, WL Ruzzo, "CMfinder--a covariance model based RNA motif finding algorithm." Bioinformatics, 22, #4 (2006) 445-52. [offcampus]
  23. Z Yao, J Barrick, Z Weinberg, S Neph, R Breaker, M Tompa, WL Ruzzo, "A computational pipeline for high- throughput discovery of cis-regulatory noncoding RNA in prokaryotes." PLoS Comput. Biol., 3, #7 (2007) e126.
  24. Z Weinberg, JE Barrick, Z Yao, A Roth, JN Kim, J Gore, JX Wang, ER Lee, KF Block, N Sudarsan, et 4 al., "Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline." Nucleic Acids Res., 35, #14 (2007) 4809-19. [offcampus]
  25. EE Regulski, RH Moy, Z Weinberg, JE Barrick, Z Yao, WL Ruzzo, RR Breaker, "A widespread riboswitch candidate that controls bacterial genes involved in molybdenum cofactor and tungsten cofactor metabolism." Mol. Microbiol., 68, #4 (2008) 918-32. [offcampus]
  26. Z Weinberg, EE Regulski, MC Hammond, JE Barrick, Z Yao, WL Ruzzo, RR Breaker, "The aptamer core of SAM-IV riboswitches mimics the ligand-binding site of SAM-I riboswitches." RNA, 14, #5 (2008) 822-8. [offcampus]
  27. E Torarinsson, Z Yao, ED Wiklund, JB Bramsen, C Hansen, J Kjems, N Tommerup, WL Ruzzo, J Gorodkin, "Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions." Genome Res., 18, #2 (2008) 242-51. [offcampus]
  28. SE Seemann, AH Mirza, C Hansen, CH Bang-Berthelsen, C Garde, M Christensen-Dalsgaard, E Torarinsson, Z Yao, CT Workman, F Pociot, et 4 al., "The identification and functional annotation of RNA structures conserved in vertebrates." Genome Res., 27, #8 (2017) 1371-1383. [offcampus]
  29. J Gorodkin, IL Hofacker, E Torarinsson, Z Yao, JH Havgaard, WL Ruzzo, "De novo prediction of structured RNAs from genomic sequences." Trends Biotechnol., 28, #1 (2010) 9-19. [offcampus]

CSE logo Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX