University of Washington Computer Science & Engineering
 CSE 527, Au '05: Computational Biology

 Info Sheet
Lecture Slides
 Overview (1-up, 4-up)
 Microarray Overview (1-up, 4-up)
 Microarray Case Study (1-up, 4-up)
 Clustering: Basics (1-up, 4-up)
 Clustering: Model-based (1-up, 4-up)
 MLE and EM (1-up, 4-up)
 Entropy, EM & WMM (1-up, 4-up)
 Motifs (1-up, 4-up)
 Gibbs Sampler (1-up, 4-up)
 Gibbs w/ Gaps (1-up, 4-up)
 Parsimony (1-up, 4-up)
 Phylogenetic Footprinting
 Markov Models (1-up, 4-up)
 Gene Finding (1-up, 4-up)
 RNA Secondary Structure (1-up, 4-up)
 Covariance Models, Rfam (1-up, 4-up)
 CMfinder (1-up, 4-up)
Lecture Notes
 0. (All of last year's notes)
 1. Overview
 3. Microarray Case Study
 4. Clustering: Basics
 7. EM; Weight Matrices
 9. Gibbs Sampler
 11. Phylogenetic Footprinting
 12. Hidden Markov Models, I
 13. Hidden Markov Models, II
 14. Hidden Markov Models, III
 15. Gene Finding, I
 16. Gene Finding, II
 18. Covariance Models
 19. Covariance Models & Rfam
 20. CM Speedup via Rigorous Filtering
 21. CMfinder
 HW #1
Notes on Readings
 527 Wiki
Project Information
 Project Description
 Project Presentations/Reports

Time: MW 12:00-1:20
Place: EE1 031 (schematic)
Instructor: Larry Ruzzo, ruzzo@cs, M 1:30-2:20 &
by arrangement
CSE 554, 543-6298
TA: Jonathan Carlson, jcarlson@cs,

An introduction to the use of computational methods for the understanding of biological systems at the molecular level. Intended for graduate students in biological sciences interested in learning about algorithms and computational methods, and for graduate students in computer science, mathematics or statistics interested in applications of those fields to molecular biology.

Mail archive of all mail sent to cse527@cs. Read it regularly or subscribe.


  1. Chu S, DeRisi J, Eisen M, Mulholland J, Botstein D, Brown PO, Herskowitz I. "The transcriptional program of sporulation in budding yeast." Science. 1998 Oct 23;282(5389):699-705.
  2. Raychaudhuri S, Stuart JM, Altman RB. "Principal components analysis to summarize microarray experiments: application to sporulation time series." Pac Symp Biocomput. 2000:455-66.
  3. Bibliography on Microarray Data Analysis
  4. Yeung, Haynor, Ruzzo, Validating Clustering for Gene Expression Data. Bioinformatics, 2001 v 17 #4: 309-318.
  5. Yeung and Ruzzo, Principal component analysis for clustering gene expression data. Bioinformatics,17 (9) 763-774 (2001).
  6. A. Ben-Dor, R. Shamir, Z. Yakhini, "Clustering Gene Expression Patterns", Journal of Computational Biology, v 6 # 3/4 (1999) pp 281-297.
  7. Yeung, Fraley, Murua, Raftery, and Ruzzo: Model-Based Clustering and Data Transformations for Gene Expression Data. Bioinformatics, 17 (10) 977-987 (2001) and The Third Georgia Tech-Emory International Conference on Bioinformatics, Atlanta, GA, Nov. 2001.   Preprint
  8. Ziv Bar-Joseph, Erik D. Demaine, David K. Gifford, Angele M. Hamel, Tommy S. Jaakkola and Nathan Srebro. "K-ary Clustering with Optimal Leaf Ordering for Gene Expression Data." Bioinformatics, Vol. 19, No. 9, 2003.
  9. Z. Bar-Joseph, D. Gifford, and T. Jaakkola. "Fast optimal leaf ordering for hierarchical clustering." Bioinformatics (Proceedings of ISMB 2001), 17(S1), 2001, pp 22-29B
  10. Gary D. Stormo, "DNA binding sites: representation and discovery", Bioinformatics Vol. 16 no. 1 2000 Pages 16-23
  11. Timothy L. Bailey and Charles Elkan, Fitting a mixture model by expectation maximization to discover motifs in biopolymers, Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994.    [See also for many related papers.]
  12. Charles E. Lawrence; Stephen F. Altschul; Mark S. Boguski; Jun S. Liu; Andrew F. Neuwald; John C. Wootton, "Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment", Science, New Series, Vol. 262, No. 5131. (Oct. 8, 1993), pp. 208-214.
  13. Roth, F. P., Hughes, J. D., Estep, P. W. & Church, G. M. Finding DNA regulatory motifs within unaligned non-coding sequences clustered by whole-genome mRNA quantitation. Nature Biotechnol. 16, (1998) 939-945.
  14. Emily Rocke and Martin Tompa An Algorithm for Finding Novel Gapped Motifs in DNA Sequences RECOMB98: Proceedings of the Second Annual International Conference on Computational Molecular Biology, New York, NY, March 1998, 228-233.
  15. Mathieu Blanchette, Benno Schwikowski and Martin Tompa Algorithms for Phylogenetic Footprinting Journal of Computational Biology, vol. 9, no. 2, 2002, 211-223.
  16. Mathieu Blanchette and Martin Tompa FootPrinter: a Program Designed for Phylogenetic Footprinting Nucleic Acids Research, vol. 31, no. 13, July 2003, 3840-3842.
  17. M. Tompa, N. Li, T. L. Bailey, G. M. Church, B. De Moor, E. Eskin, A. V. Favorov, M. C. Frith, Y. Fu, W. J. Kent, V. J. Makeev, A. A. Mironov, W. S. Noble, G. Pavesi, G. Pesole, M. Regnier, N. Simonis, S. Sinha, G. Thijs, J. van Helden, M. Vandenbogaert, Z. Weng, C. Workman, C. Ye, and Z. Zhu, Assessing Computational Tools for the Discovery of Transcription Factor Binding Sites. Nature Biotechnology, vol. 23, no. 1, January 2005, 137-144.
  18. Durbin, Richard and Eddy, Sean R. and Krogh, Anders and Mitchison, Graeme, "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids", Cambridge, 1998.
  19. JM Claverie (1997) "Computational methods for the identification of genes in vertebrate genomic sequences", Human Molecular Genetics, 6(10)(review issue): 1735-1744.
  20. M Burset, R Guigo (1996), "Evaluation of gene structure prediction programs", Genomics, 34(3): 353-367.
  21. C Burge, S Karlin (1997), "Prediction of complete gene structures in human genomic DNA", Journal of Molecular Biology, 268: 78-94.
  22. Lyngso RB, Zuker M, Pedersen CN. Fast evaluation of internal loops in RNA secondary structure prediction. Bioinformatics. 1999 Jun;15(6):440-5.
  23. J. McCaskill. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers, 29:1105-1119, 1990.
  24. Paul P Gardner and Robert Giegerich, A comprehensive comparison of comparative RNA structure prediction approaches, BMC Bioinformatics 2004, 5:140, doi:10.1186/1471-2105-5-140
  25. Patterson, Yasuhara, and Ruzzo: Pre-mRNA Secondary Structure Prediction Aids Splice Site Prediction. Pacific Symposium on Biocomputing, Kauai, Hawaii, Jan., 2002, pp. 223-234. Preprint
  26. Batzer MA, Deininger PL: Alu repeats and human genomic diversity. Nat Rev Genet. 2002 May;3(5):370-9.
  27. Eddy SR, Durbin R. RNA sequence analysis using covariance models. Nucleic Acids Res. 1994 Jun 11;22(11):2079-88.
  28. Eddy S.R. (2002) A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics, 3, 18.
  29. Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR, Rfam: an RNA family database. Nucleic Acids Res. 2003 Jan 1;31(1):439-41.
  30. Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D121-4.
  31. Weinberg, Z. and Ruzzo, W.L. Faster Genome Annotation of Non-coding RNA Families Without Loss of Accuracy. Eighth Annual International Conference on Research in Computational Molecular Biology (RECOMB 2004), pp 243-251, March 2004, San Diego, CA. Preprint.
  32. Weinberg, Z. and Ruzzo, W.L. Exploiting Conserved Structure for Faster Annotation of Non-coding RNAs Without Loss of Accuracy. Bioinformatics, 20 (suppl_1) i334-i341, 2004 and 12th International Conference on Intelligent Systems for Molecular Biology (ISMB 2004), July 2004, Glasgow, Scotland. Preprint.
  33. Weinberg and Ruzzo: Sequence-based heuristics for faster annotation of non-coding RNA families. To appear, Bioinformatics. Advance access version (2 Nov 2005): Abstract, PDF.

  34. Yao, Weinberg and Ruzzo. CMfinder: A Covariance Model Based RNA Motif Finding Algorithm. Bioinformatics, ePub 12/15/2005.

  Note on Electronic Access to Journals: Links to the papers above often lead to journals that require a paid subscription. The UW Library is generally a paid subscriber, so you can access these articles freely from an on-campus computer. For off-campus access, follow the "[offcampus]" links or look at the library "proxy server" instructions. You will be prompted for your UW net ID and password once per session.

Portions of the CSE 527 Web may be reprinted or adapted for academic nonprofit purposes, providing the source is accurately quoted and duly credited. The CSE 527 Web: © 1993-2005, Department of Computer Science and Engineering, University of Washington.

Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to cse527-webmaster at]