University of Washington Computer Science & Engineering
 CSE 527, Au '04: Computational Biology

 Info Sheet
Lecture Slides
 Overview (4-up)
 Microarray Overview (4-up)
 Microarray Case Study (4-up)
 Clustering: Basics (4-up)
 Clustering: Model-based (4-up)
 MLE and EM (4-up)
   MLE Example (Mathematica code)
   MLE Example (pdf of above plots)
 Entropy, EM & WMM (4-up)
 Motifs (4-up)
 Gibbs Sampler (4-up)
 Gibbs w/ Gaps (4-up)
 Parsimony (4-up)
 Phylogenetic Footprinting
 Markov Models (4-up)
 Gene Finding (4-up)
 Secondary Structure & Splicing (4-up)
 RNA Secondary Structure (4-up)
 Covariance Models, Rfam & Filtering (4-up)
Lecture Notes
 1. Overview
 2. Microarray Overview
 3. Microarray Case Study
 4. Clustering: Basics
 5. Clustering: Model-Based
 6. MLE and EM
 7. EM; Weight Matrices
 8. ('03 #12) MEME
 9. Gibbs Sampler
 10. ('03 #15) Gibbs w/ Gaps
 11. ('03 #16) Phylogenetic Footprinting
 12. ('03 #17) Hidden Markov Models, I
 13. ('03 #18) Hidden Markov Models, II
 14. ('03 #19) Pfam; Gene Finding
 15. Gene Finding; Splicing
 16. Splicing
 17. RNA Secondary Structure
 18. Covariance Models
 19. Covariance Models & Rfam
 HW #1
 HW #2
 HW #3
 HW #4
Notes on Readings
 HW #1: Primers
 HW #2: Microarrays
 HW #3: Microarray Analysis
Project Information
 Project Description
 Project Presentations/Reports

Time: MW 12:00-1:20
Place: EE1 026
Instructor: Larry Ruzzo, ruzzo, CSE 554, 543-6298
TA: Kasia Wilamowska, kasiaw,

An introduction to the use of computational methods for the understanding of biological systems at the molecular level. Intended for graduate students in biological sciences interested in learning about algorithms and computational methods, and for graduate students in computer science, mathematics or statistics interested in applications of those fields to molecular biology.

Mail archive of all mail sent to cse527. Read it regularly or subscribe.


  1. Chu S, DeRisi J, Eisen M, Mulholland J, Botstein D, Brown PO, Herskowitz I. "The transcriptional program of sporulation in budding yeast." Science. 1998 Oct 23;282(5389):699-705.
  2. Raychaudhuri S, Stuart JM, Altman RB. "Principal components analysis to summarize microarray experiments: application to sporulation time series." Pac Symp Biocomput. 2000:455-66.
  3. Bibliography on Microarray Data Analysis
  4. Yeung, Haynor, Ruzzo, Validating Clustering for Gene Expression Data. Bioinformatics, 2001 v 17 #4: 309-318.
  5. Yeung and Ruzzo, Principal component analysis for clustering gene expression data. Bioinformatics, 17 (9) 763-774 (2001).
  6. A. Ben-Dor, R. Shamir, Z. Yakhini, "Clustering Gene Expression Patterns", Journal of Computational Biology, v 6 # 3/4 (1999) pp 281-297.
  7. Yeung, Fraley, Murua, Raftery, and Ruzzo: Model-Based Clustering and Data Transformations for Gene Expression Data. Bioinformatics, 17 (10) 977-987 (2001) and The Third Georgia Tech-Emory International Conference on Bioinformatics, Atlanta, GA, Nov. 2001.   Preprint
  8. Ziv Bar-Joseph, Erik D. Demaine, David K. Gifford, Angèle M. Hamel, Tommy S. Jaakkola and Nathan Srebro. "K-ary Clustering with Optimal Leaf Ordering for Gene Expression Data." Bioinformatics, Vol. 19, No. 9, 2003.
  9. Z. Bar-Joseph, D. Gifford, and T. Jaakkola. "Fast optimal leaf ordering for hierarchical clustering." Bioinformatics (Proceedings of ISMB 2001), 17(S1), 2001, pp. S22-S29.
  10. Timothy L. Bailey and Charles Elkan, Fitting a mixture model by expectation maximization to discover motifs in biopolymers, Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994.    [See also for many related papers.]
  11. Charles E. Lawrence; Stephen F. Altschul; Mark S. Boguski; Jun S. Liu; Andrew F. Neuwald; John C. Wootton, "Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment", Science, New Series, Vol. 262, No. 5131. (Oct. 8, 1993), pp. 208-214.
  12. Roth, F. P., Hughes, J. D., Estep, P. W. & Church, G. M. Finding DNA regulatory motifs within unaligned non-coding sequences clustered by whole-genome mRNA quantitation. Nature Biotechnol. 16, (1998) 939-945.
  13. Emily Rocke and Martin Tompa. An Algorithm for Finding Novel Gapped Motifs in DNA Sequences. RECOMB98: Proceedings of the Second Annual International Conference on Computational Molecular Biology, New York, NY, March 1998, 228-233.
  14. Mathieu Blanchette, Benno Schwikowski and Martin Tompa. Algorithms for Phylogenetic Footprinting. Journal of Computational Biology, vol. 9, no. 2, 2002, 211-223.
  15. Mathieu Blanchette and Martin Tompa. FootPrinter: a Program Designed for Phylogenetic Footprinting. Nucleic Acids Research, vol. 31, no. 13, July 2003, 3840-3842.
  16. Durbin, Richard and Eddy, Sean R. and Krogh, Anders and Mitchison, Graeme, "Biological Sequence Analysis: Probabilistic models of proteins and nucleic acids", Cambridge, 1998.
  17. JM Claverie (1997) "Computational methods for the identification of genes in vertebrate genomic sequences", Human Molecular Genetics, 6(10)(review issue): 1735-1744.
  18. M Burset, R Guigo (1996), "Evaluation of gene structure prediction programs", Genomics, 34(3): 353-367.
  19. C Burge, S Karlin (1997), "Prediction of complete gene structures in human genomic DNA", Journal of Molecular Biology, 268: 78-94.
  20. Lyngso RB, Zuker M, Pedersen CN. Fast evaluation of internal loops in RNA secondary structure prediction. Bioinformatics. 1999 Jun;15(6):440-5.
  21. J. McCaskill. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers, 29:1105-1119, 1990.
  22. Paul P Gardner and Robert Giegerich, A comprehensive comparison of comparative RNA structure prediction approaches, BMC Bioinformatics 2004, 5:140, doi:10.1186/1471-2105-5-140
  23. Patterson, Yasuhara, and Ruzzo: Pre-mRNA Secondary Structure Prediction Aids Splice Site Prediction. Pacific Symposium on Biocomputing, Kauai, Hawaii, Jan., 2002, pp. 223-234. Preprint
  24. Batzer MA, Deininger PL: Alu repeats and human genomic diversity. Nat Rev Genet. 2002 May;3(5):370-9.
  25. Eddy SR, Durbin R. RNA sequence analysis using covariance models. Nucleic Acids Res. 1994 Jun 11;22(11):2079-88.
  26. Weinberg, Z. and Ruzzo, W.L. Faster Genome Annotation of Non-coding RNA Families Without Loss of Accuracy. Eighth Annual International Conference on Research in Computational Molecular Biology (RECOMB 2004), pp 243-251, March 2004, San Diego, CA. Preprint.
  27. Weinberg, Z. and Ruzzo, W.L. Exploiting Conserved Structure for Faster Annotation of Non-coding RNAs Without Loss of Accuracy. Bioinformatics, 20 (suppl_1) i334-i341, 2004 and 12th International Conference on Intelligent Systems for Molecular Biology (ISMB 2004), July 2004, Glasgow, Scotland. Preprint.

Portions of the CSE 527 Web may be reprinted or adapted for academic nonprofit purposes, providing the source is accurately quoted and duly credited. The CSE 527 Web: © 1993-2004, Department of Computer Science and Engineering, University of Washington.

Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to cse527-webmaster]