Schedule details will evolve as we go; check back periodically to see the latest updates.
|
|
Due |
Lecture Topic |
Reading |
Week 1 9/25-9/29 |
Th |
|
Introduction & Overview |
Papers Below |
Week 2 10/2-10/6 |
Tu |
|
Sequence Alignment, Search & Scoring |
Papers Below |
Th |
Week 3 10/9-10/13 |
Tu |
HW #1 (Due F) |
Th |
Week 4 10/16-10/20 |
Tu |
|
Gene Finding |
Papers Below |
Th |
Week 5 10/23-10/27 |
Tu |
|
Markov and Hidden Markov Models |
Papers Below |
Th |
Week 6 10/30-11/3 |
Tu |
HW #2 (Due F) |
Th |
Sequence Motif Modeling & Discovery |
Papers Below |
Week 7 11/6-11/10 |
Tu |
|
Th |
Week 8 11/13-11/17 |
Tu |
HW #3 (Due F) |
Th |
Week 9 11/20-11/24 |
Tu |
|
Th |
Holiday |
Week 10 11/27-12/1 |
Tu |
|
RNA and RNAseq |
|
Th |
Week 11 12/4-12/8 |
Tu |
HW #4 (Due F) |
Th |
Textbook: None.
Supplementary Text: The following book is an excellent reference for many of the topics we will cover.
Richard Durbin, Sean R. Eddy, Anders Krogh and Graeme Mitchison,
Biological Sequence Analysis: Probabilistic models of proteins and nucleic acids,
Cambridge, 1998.
(U. Book Store,
Amazon)
Errata.
References:
As we go forward, I will be updating this list and explicitly designating some of these papers as assigned
reading. The rest are optional---good supplementary references, recommended if you want more depth in any of the
areas.
Most links below take you to PubMed, the NIH bibliographic database. Usually, but counterintuitively, from a
PubMed abstract you click the icon of the publisher (or sometimes the icon saying "UW article online") to get to
the actual article.
|
Journal access: Some of the journals and articles cited below are completely open access, or
are freely available via PubMed Central (look for the "Free in PMC" icon). Electronic access to
other cited articles is generally free from on-campus IP addresses. For off-campus access,
follow the "[offcampus]" links or look at the UW library "proxy server"
instructions. Let me know if none of these work for you. |
|
References -- Introduction & Overview:
Required: Read #2 and #3. Stephens, et al. (#2) is a nice recent article exploring
the "big data" character of modern genomics. Hunter (#3), while a bit dated, is a good overview of molecular
biology, at a level that CSE students should find approachable. Optional:
Chapter 1 of #1.
If you want more biology, #4 is
in some ways a book-length expansion of #3; I haven't read it, but the portions I've sampled are good, and not
as (potentially) overwhelming as #7. Former students have recommended Gonick, (also a bit dated, but
inexpensive). Schultz (#6) also comes recommended, looks amusing, and is also inexpensive; seems to focus
more on genetics. Alberts (#7) is a popular undergrad textbook, very comprehensive and very well written. If
you locate other, better background references, please email a link to me and to the class list.
- ZD Stephens, SY Lee, F Faghri, RH Campbell, C Zhai, MJ Efron, R Iyer, MC Schatz, S Sinha, GE Robinson, "Big Data: Astronomical or Genomical?" PLoS Biol., 13, #7 (2015) e1002195.
- Lawrence Hunter, "Molecular Biology for Computer
Scientists," Chapter 1 of Artificial
Intelligence and Molecular Biology Lawrence Hunter,
ed. AAAI press, 1993. (Also here.)
- Lawrence Hunter, The Processes of Life: An
Introduction to Molecular Biology, The MIT Press, 2009, ISBN
978-0262013055, 320 pages.
(Amazon)
- Larry Gonick, Mark Wheelis, "The Cartoon Guide to Genetics"
(Updated Edition, 1991) ISBN 0062730991, Collins. (Amazon)
- Mark Schultz, "The Stuff of Life, a graphic guide to genetics and DNA," Farrar, Straus and Giroux, 2009
(Amazon)
- Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, Peter Walter, "Molecular
Biology of the Cell", Fifth Edition, 2007, ISBN 978-0815341055, Garland, 1392 pages.
(Amazon) The 4th edition
of this book is available
online
(although the user interface leaves something to be desired; in particular, you must use the "search" feature on
that page instead of browsing...).
References -- Sequence Alignment, Search & Scoring:
Required: Read #11-13. Optional:
Chapter 2 of #1.
The original BLAST paper is #8; #9 describes
important extensions to it. The Myers review is a bit dated, but still a good overview of algorithms and algorithmic
issues.
- SF Altschul, W Gish, W Miller, EW Myers, DJ Lipman, "Basic local alignment search tool." J. Mol. Biol., 215, #3 (1990) 403-10.
[offcampus]
- SF Altschul, TL Madden, AA Schäffer, J Zhang, Z Zhang, W Miller, DJ Lipman, "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res., 25, #17 (1997) 3389-402.
[offcampus]
- Myers, E. (1991) "An overview of sequence comparison algorithms in molecular biology",
Tech. Rep. TR-91-29, Dept. of Computer
Science, Univ. of Arizona.
- SR Eddy, "What is dynamic programming?" Nat. Biotechnol., 22, #7 (2004) 909-10.
[offcampus]
- SR Eddy, "Where did the BLOSUM62 alignment score matrix come from?" Nat. Biotechnol., 22, #8 (2004) 1035-6.
[offcampus]
- A Pertsemlidis, JW Fondon, "Having a BLAST with bioinformatics (and avoiding BLASTphemy)." Genome Biol., 2, #10 (2001) REVIEWS2002.
[offcampus]
References -- Gene Finding:
Required: Read #14. Optional: #16 is a good survey of computational gene finding.
#15 is a careful comparison of leading programs of its day. #17 and #18 are the landmark initial
human genome sequence papers. Sharp's Nobel lecture is a nice account of the history leading to the discovery of
splicing. #20 is a more recent survey of the mechanism. #21 surveys self-splicing.
- C Burge, S Karlin, "Prediction of complete gene structures in human genomic DNA." J. Mol. Biol., 268, #1 (1997) 78-94.
[offcampus]
- M Burset, R Guigó, "Evaluation of gene structure prediction programs." Genomics, 34, #3 (1996) 353-67.
[offcampus]
- JM Claverie, "Computational methods for the identification of genes in vertebrate genomic sequences." Hum. Mol. Genet., 6, #10 (1997) 1735-44.
[offcampus]
- ES Lander, LM Linton, B Birren, C Nusbaum, MC Zody, J Baldwin, K Devon, K Dewar, M Doyle, W FitzHugh, et 247 al., "Initial sequencing and analysis of the human genome." Nature, 409, #6822 (2001) 860-921.
[offcampus]
- JC Venter, MD Adams, EW Myers, PW Li, RJ Mural, GG Sutton, HO Smith, M Yandell, CA Evans, RA Holt, et 264 al., "The sequence of the human genome." Science, 291, #5507 (2001) 1304-51.
[offcampus]
- Phillip A. Sharp, Split Genes and RNA Splicing, Nobel Lecture, December 8, 1993,
http://nobelprize.org/nobel_prizes/medicine/laureates/1993/sharp-lecture.pdf [offcampus]
- JP Staley, C Guthrie, "Mechanical devices of the spliceosome: motors, clocks, springs, and things." Cell, 92, #3 (1998) 315-26.
[offcampus]
- DR Edgell, VR Chalamcharla, M Belfort, "Learning to live together: mutualism between self-splicing introns and their hosts." BMC Biol., 9, (2011) 22.
References -- Markov and Hidden Markov Models:
Required: Read #22 and Chapter 3 of #1. (The later is available HERE). #23 is also a very good tutorial/intro to HMMs from a different
perspective (speech analysis). #24 is an interesting application of HMMs relevant to RNA gene finding, a later
topic.
- SR Eddy, "What is a hidden Markov model?" Nat. Biotechnol., 22, #10 (2004) 1315-6.
[offcampus]
- LR Rabiner, "A Tutorial on Hidden Markov Models and Selected
Application in Speech Recognition," Proceedings of the IEEE, v 77 #2,Feb 1989, 257-286.
here.
- RJ Klein, Z Misulovin, SR Eddy, "Noncoding RNA genes identified in AT-rich hyperthermophiles." Proc. Natl. Acad. Sci. U.S.A., 99, #11 (2002) 7542-7.
[offcampus]
References -- Sequence Motif Modeling & Discovery:
Required: Read #28. Dempster et al. is the "classic" paper on EM. Stormo is a
nice survey from a slightly mmore biological perspective. Tompa et al. is a comprehensive comparison of
several motif finding methods. Blanchette et al. is an important example of use of comparative genomics for
this problem.
- AP Dempster, NM Laird, DB Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," Journal
of the Royal Statistical Society. Series B (Methodological), Vol. 39, No. 1. (1977), pp. 1-38. Available here. [offcampus]
- CB Do, S Batzoglou, "What is the expectation maximization algorithm?" Nat. Biotechnol., 26, #8 (2008) 897-9.
[offcampus]
- GD Stormo, "DNA binding sites: representation and discovery." Bioinformatics, 16, #1 (2000) 16-23.
[offcampus]
- TL Bailey, C Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers." Proc Int Conf Intell Syst Mol Biol, 2, (1994) 28-36.
[offcampus] Available here:
(.ps)
(.pdf)
See also http://meme.sdsc.edu/meme/ for related papers and resources.
- TL Bailey, C Elkan, "The value of prior knowledge in discovering motifs with MEME." Proc Int Conf Intell Syst Mol Biol, 3, (1995) 21-9.
[offcampus]
- M Tompa, N Li, TL Bailey, GM Church, B De Moor, E Eskin, AV Favorov, MC Frith, Y Fu, WJ Kent, et 15 al., "Assessing computational tools for the discovery of transcription factor binding sites." Nat. Biotechnol., 23, #1 (2005) 137-44.
[offcampus]
- Emily Rocke and Martin Tompa An Algorithm for Finding Novel
Gapped Motifs in DNA Sequences RECOMB98: Proceedings of the Second Annual International Conference on
Computational Molecular Biology, New York, NY, March 1998, 228-233.
- M Blanchette, B Schwikowski, M Tompa, "Algorithms for phylogenetic footprinting." J. Comput. Biol., 9, #2 (2002) 211-23.
[offcampus]
- M Blanchette, M Tompa, "FootPrinter: A program designed for phylogenetic footprinting." Nucleic Acids Res., 31, #13 (2003) 3840-2.
[offcampus]
References -- RNA and RNAseq:
Required: Read #39. Optional: these papers survey recent surprising discoveries about the
roles of non-coding RNA, and some recent computational techniques for RNA analysis. The papers on 6S might give you
some picture of one nice biological example and how computational approaches are useful in this arena. The last few
papers focus on discovery of ncRNAs in vertebrates.
- G Storz, "An expanding universe of noncoding RNAs." Science, 296, #5571 (2002) 1260-3.
[offcampus]
- SR Eddy, "Computational genomics of noncoding RNA genes." Cell, 109, #2 (2002) 137-40.
[offcampus]
- A Hüttenhofer, P Schattner, N Polacek, "Non-coding RNAs: hope or hype?" Trends Genet., 21, #5 (2005) 289-97.
[offcampus]
- G Ruvkun, "The perfect storm of tiny RNAs." Nat. Med., 14, #10 (2008) 1041-5.
[offcampus]
- J Gorodkin, IL Hofacker, E Torarinsson, Z Yao, JH Havgaard, WL Ruzzo, "De novo prediction of structured RNAs from genomic sequences." Trends Biotechnol., 28, #1 (2010) 9-19.
[offcampus]
- SR Eddy, "How do RNA folding algorithms work?" Nat. Biotechnol., 22, #11 (2004) 1457-8.
[offcampus]
- JS McCaskill, "The equilibrium partition function and base pair binding probabilities for RNA secondary structure." Biopolymers, 29, #6-7 (1990 May-Jun) 1105-19.
[offcampus]
- RB Lyngsø, M Zuker, CN Pedersen, "Fast evaluation of internal loops in RNA secondary structure prediction." Bioinformatics, 15, #6 (1999) 440-5.
[offcampus]
- PP Gardner, R Giegerich, "A comprehensive comparison of comparative RNA structure prediction approaches." BMC Bioinformatics, 5, (2004) 140.
- SR Eddy, R Durbin, "RNA sequence analysis using covariance models." Nucleic Acids Res., 22, #11 (1994) 2079-88.
[offcampus]
- SR Eddy, "A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure." BMC Bioinformatics, 3, (2002) 18.
- S Griffiths-Jones, A Bateman, M Marshall, A Khanna, SR Eddy, "Rfam: an RNA family database." Nucleic Acids Res., 31, #1 (2003) 439-41.
[offcampus]
- S Griffiths-Jones, S Moxon, M Marshall, A Khanna, SR Eddy, A Bateman, "Rfam: annotating non-coding RNAs in complete genomes." Nucleic Acids Res., 33, #Database issue (2005) D121-4.
[offcampus]
- B Knudsen, J Hein, "RNA secondary structure prediction using stochastic context-free grammars and evolutionary history." Bioinformatics, 15, #6 (1999) 446-54.
[offcampus]
- B Knudsen, J Hein, "Pfold: RNA secondary structure prediction using stochastic context-free grammars." Nucleic Acids Res., 31, #13 (2003) 3423-8.
[offcampus]
- Z Weinberg, WL Ruzzo, "Faster Genome Annotation of
Non-coding RNA Families Without Loss of Accuracy." Eighth Annual International
Conference on Research in Computational Molecular Biology (RECOMB
2004) , pp 243-251,
March 2004, San Diego, CA. Preprint.
- Z Weinberg, WL Ruzzo, "Exploiting conserved structure for faster annotation of non-coding RNAs without loss of accuracy." Bioinformatics, 20 Suppl 1, (2004) i334-41.
[offcampus]
- Z Weinberg, WL Ruzzo, "Sequence-based heuristics for faster annotation of non-coding RNA families." Bioinformatics, 22, #1 (2006) 35-9.
[offcampus]
- M Mandal, M Lee, JE Barrick, Z Weinberg, GM Emilsson, WL Ruzzo, RR Breaker, "A glycine-dependent riboswitch that uses cooperative binding to control gene expression." Science, 306, #5694 (2004) 275-9.
[offcampus]
- JE Barrick, N Sudarsan, Z Weinberg, WL Ruzzo, RR Breaker, "6S RNA is a widespread regulator of eubacterial RNA polymerase that resembles an open promoter." RNA, 11, #5 (2005) 774-84.
[offcampus]
- AE Trotochaud, KM Wassarman, "A highly conserved 6S RNA structure is required for regulation of transcription." Nat. Struct. Mol. Biol., 12, #4 (2005) 313-9.
[offcampus]
- DK Willkomm, J Minnerup, A Hüttenhofer, RK Hartmann, "Experimental RNomics in Aquifex aeolicus: identification of small non-coding RNAs and the putative 6S RNA homolog." Nucleic Acids Res., 33, #6 (2005) 1949-60.
[offcampus]
- T Babak, BJ Blencowe, TR Hughes, "Considerations in the identification of functional RNA structural elements in genomic alignments." BMC Bioinformatics, 8, (2007) 33.
- Z Yao, Z Weinberg, WL Ruzzo, "CMfinder--a covariance model based RNA motif finding algorithm." Bioinformatics, 22, #4 (2006) 445-52.
[offcampus]
- S Washietl, IL Hofacker, PF Stadler, "Fast and reliable prediction of noncoding RNAs." Proc. Natl. Acad. Sci. U.S.A., 102, #7 (2005) 2454-9.
[offcampus]
- S Washietl, IL Hofacker, M Lukasser, A Hüttenhofer, PF Stadler, "Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome." Nat. Biotechnol., 23, #11 (2005) 1383-90.
[offcampus]
- JS Pedersen, G Bejerano, A Siepel, K Rosenbloom, K Lindblad-Toh, ES Lander, J Kent, W Miller, D Haussler, "Identification and classification of conserved RNA secondary structures in the human genome." PLoS Comput. Biol., 2, #4 (2006) e33.
- S Washietl, JS Pedersen, JO Korbel, C Stocsits, AR Gruber, J Hackermüller, J Hertel, M Lindemeyer, K Reiche, A Tanzer, et 14 al., "Structured RNAs in the ENCODE selected regions of the human genome." Genome Res., 17, #6 (2007) 852-64.
[offcampus]
- E Torarinsson, Z Yao, ED Wiklund, JB Bramsen, C Hansen, J Kjems, N Tommerup, WL Ruzzo, J Gorodkin, "Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions." Genome Res., 18, #2 (2008) 242-51.
[offcampus]
- SE Seemann, AH Mirza, C Hansen, CH Bang-Berthelsen, C Garde, M Christensen-Dalsgaard, E Torarinsson, Z Yao, CT Workman, F Pociot, H Nielsen, N Tommerup, WL Ruzzo, J Gorodkin, "The identification and functional annotation of RNA structures conserved in vertebrates." Genome Res., 27, #8 (2017) 1371-1383.
[offcampus]
|
|
Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA 98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
|