CSE 590 C, Sp '06: Reading & Research in Comp. Bio.

University of Washington Computer Science & Engineering

Course Info

CSE 590C is a weekly seminar on Readings and Research in Computational Biology, open to all graduate students in the computational, biological, and mathematical sciences.

When/Where: Mondays, 3:30 - 4:50, EEB 026

Organizers: Joe Felsenstein, Bill Noble, Larry Ruzzo, Martin Tompa

Credit: 1-3 (variable)

Grading: Credit/No Credit. Talk to the organizers if you are unsure of our expectations.

Email

cse590cb@cs.washington.edu: course-related announcements and discussions
compbio-seminars@cs.washington.edu: biology seminar announcements from all around campus
compbio-group@cs.washington.edu: discussions about computational biology

Schedule

Date | Presenter(s) | Topic | Materials

03/27 | | Organizational Meeting |
04/03 | Amol Prakash, CSE | Estimating significance of whole genome multiple alignments | Abstract
04/10 | Adrienne Wang, CSE | Using ncRNA as a Test of Whole-Genome Multiple Alignments |
04/17 | Amir Ben-Dor, Agilent | Array CGH data analysis - theory and practice | Abstract
04/24 | Jian Qiu, GS | TBA |
05/01 | Bob Thurman, GS | Discovery of higher-order functional features in the human genome | Abstract
05/08 | Jason McDermott, Microbiology | Computational exploration of biological organization with the Bioverse | Slides
05/15 | Charles Mader, Microbiology | Protein Structure Prediction: an alternative model | Abstract
05/22 | Zizhen Yao, CSE | CMfinder: A Covariance Model Based RNA Motif Finding Algorithm | Abstract
05/29 | | Holiday |

Papers, etc.
Note on Electronic Access to Journals
Links to full papers below often point to journals that require a paid subscription. The UW Library is generally a paid subscriber, and you can freely access these articles from an on-campus computer. For off-campus access, follow the "[offcampus]" links below or see the library's "proxy server" instructions. You will be prompted for your UW NetID and password once per session.

03/27: ---- Organizational Meeting ----
04/03: Estimating significance of whole genome multiple alignments -- Amol Prakash, CSE
    Abstract:   Whole-genome alignments are widely used by biologists for many purposes in comparative genomics. Before doing any such analysis on a particular portion of an alignment, it is critical to have confidence that the portion is correctly aligned. Unfortunately, no method for estimating the significance of these whole-genome alignments has been proposed to date. In this work we provide such a methodology and use it to assess the significance of the 8-vertebrate MultiZ alignment available on the UCSC Genome Browser. We report that approximately 0.75% (1.57 Mbp) of the human chromosome 1 alignment has a high p-value. This figure rises to 14% if we consider only the alignments containing zebrafish. The results for chromosome 1 are available as a UCSC browser track at http://bio.cs.washington.edu/SigMA-w/. This is also the first tool that can compute the significance of every portion of an alignment, not just of the alignment as a whole.

04/17: Array CGH data analysis - theory and practice -- Amir Ben-Dor, Agilent
    Abstract:   Cancer typically arises as a result of acquired genomic instability and the subsequent clonal evolution of neoplastic cells. Consequently, cancer cells contain multiple regions of copy number gain and loss throughout their genomes. The patterns of copy number aberrations in a cancer genome consist of both selected and non-selected lesions and vary within and across tissues of origin. For example, loss of CDKN2A (9p21) is frequent in melanomas and lung carcinomas, SMAD4 (18q21) deletions are present in colon cancers, and HER2/NEU (17q12) amplification is often seen in breast carcinomas. Recent technological developments have introduced an oligonucleotide array platform for array-based comparative genomic hybridization (aCGH). This platform provides increased resolution in determining the boundaries of measured genome alterations.
In the talk I will review the biological background of cancer genome instability and the measurement technologies, in particular array CGH, and discuss in detail the data analysis goals, tasks, and solutions. I will briefly describe the data analysis workflow of a multi-sample array CGH cancer study, and give details on the more complicated steps of the analysis: aberration calling, data centering, detecting common aberrations and, if time permits, joint analysis of CGH and gene expression data. To demonstrate the workflow, I will share examples from an ongoing collaboration with John Weinstein's group (NCI), analyzing array CGH data for the NCI-60 drug discovery panel of cell lines. The NCI-60 panel includes multiple highly annotated, well-characterized samples from nine different tissue types, and thus represents a valuable resource for studying the patterns of genomic lesions that may be present in human cancers.

05/01: Discovery of higher-order functional features in the human genome -- Bob Thurman, GS
    Abstract:   It has long been hypothesized that the human genome and other large genomes are organized into higher-order (i.e., larger than gene-sized) functional domains. Recent technological advances have enabled the rapid emergence of large-scale biological data sets comprising specific functional variables (e.g., transcription, histone modifications) sampled in a nearly continuous fashion across the genome. A major outstanding question is to what degree such data reveal coherent higher-order features that may in turn illuminate the underlying functional architecture of the genome. To address this, we developed novel approaches based on wavelet analysis for discovering "domain-level" behavior in fine-scale functional genomic data, and for correlating apparently disparate functional data types collected at different resolutions and scales. Wavelets provide a powerful mathematical framework for decomposing a given genomic data type into increasingly coarse scales, allowing broader and broader trends in the data to reveal themselves. We apply this approach to a variety of continuously sampled data types from the NHGRI ENCODE project to visualize distinct higher-order features of the human genomic landscape. We then apply Hidden Markov Models (HMMs) to the wavelet decomposition to segment the ENCODE regions into discrete functional states or domains. We also correlate multiple continuous data types at multiple scales to uncover important similarities and differences. A major finding is that such relationships (e.g., the correspondence between transcription, histone modification patterns, and fine-scale evolutionary conservation) are often highly localized, disappearing and reappearing from region to region and locus to locus. The results highlight an analytical framework that may be applied broadly to other complex genomes.
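To make "decomposing a data type into increasingly coarse scales" concrete, here is a minimal sketch using the Haar wavelet, the simplest wavelet. It is an illustration under our own assumptions (a signal whose length is a power of two, unnormalized pairwise averages and half-differences), not the speaker's method; the function name haar_decompose and the toy signal are hypothetical.

```python
def haar_decompose(signal):
    """Multiscale Haar decomposition via pairwise averages and half-differences.

    Hypothetical illustration, not the method presented in the talk.
    Returns (levels, details): levels[k] is the signal smoothed to scale 2**k;
    details[k] holds the information lost going from levels[k] to levels[k+1].
    Assumes len(signal) is a power of two (a simplifying assumption).
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    current = [float(x) for x in signal]
    levels = [current]
    details = []
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        # Average of each adjacent pair: the coarser approximation.
        coarse = [(current[i] + current[i + 1]) / 2 for i in pairs]
        # Half-difference of each adjacent pair: the detail at this scale.
        detail = [(current[i] - current[i + 1]) / 2 for i in pairs]
        levels.append(coarse)
        details.append(detail)
        current = coarse
    return levels, details


if __name__ == "__main__":
    # Toy "genomic signal": a noisy high plateau followed by a noisy low one.
    signal = [5, 7, 6, 6, 1, 2, 0, 1]
    levels, details = haar_decompose(signal)
    for k, level in enumerate(levels):
        print(f"scale 2^{k}: {level}")
```

At scale 2^2 the two plateaus appear as [6.0, 1.0] with the within-plateau noise smoothed away, which is the sense in which coarser scales let broader trends emerge. Because each coarse value a and detail d recover the finer pair as a + d and a - d, the decomposition loses no information.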

05/08: Computational exploration of biological organization with the Bioverse -- Jason McDermott, Microbiology
    Slides:
   http://compbio.washington.edu/local/people/mcdermottj/presentations/May082006/Presentation.ppt
   http://compbio.washington.edu/local/people/mcdermottj/presentations/May082006/Presentation.htm

05/15: Protein Structure Prediction: an alternative model -- Charles Mader, Microbiology
    Abstract:   I will present an introduction to the protein structure prediction problem in computational biology. The Poisson-Boltzmann equation is commonly used to predict electrostatic interactions in protein structure prediction. I present a critique of the Poisson-Boltzmann approach to protein electrostatics, and based on this critique I derive the RB equation. The RB equation provides a way to parameterize an energy function such that the native conformation has the minimum energy. I show how to use the p-space ellipsoid to determine the resolution of this model, and describe how the volume of the p-space ellipsoid can be used to evaluate second-order corrections to the model.
Charles provides additional information:

A copy of the presentation slides is available at http://compbio.washington.edu/local/people/cmader/presentations/May052006/Billharz_Method_beamer.pdf
An _early_ draft of a paper about the method is at http://compbio.washington.edu/local/people/cmader/presentations/May052006/Billharz_Method_article.pdf
Two good URLs for people wishing to know more about proteins and protein structure are the Protein Data Bank's Molecule of the Month, http://www.rcsb.org/pdb/static.do?p=education_discussion/molecule_of_the_month/alphabetical_list.html [offcampus], and Loren Williams' Structure Teaching Tool, http://web.chemistry.gatech.edu/~williams/bCourse_Information/6521/syllabus_98.html#schedule
For those wishing to learn more about the experimental methods of protein structure determination, Loren Williams provides a good introduction at http://web.chemistry.gatech.edu/~williams/xtal/index.html and a more detailed, graduate-level discussion is at http://www.doe-mbi.ucla.edu/~sawaya/m230d/ I also really enjoyed Gale Rhodes' Crystallography Made Crystal Clear. Unfortunately, this book is not in the UW libraries; I got a copy via interlibrary loan.

05/22: CMfinder: A Covariance Model Based RNA Motif Finding Algorithm -- Zizhen Yao, CSE
    Abstract:   The recent discovery of large numbers of non-coding RNAs creates a need for tools for the automatic, high-quality identification and characterization of conserved RNA motifs that can be readily used for database search. Previous tools fall short of this goal. CMfinder is a new tool for RNA motif prediction. It is an expectation maximization algorithm that uses covariance models for motif description, carefully crafted heuristics for effective motif search, and a novel Bayesian framework for structure prediction that combines folding energy and sequence covariation. When tested on known ncRNA families, including some difficult cases with poor sequence conservation and large indels, our method demonstrates excellent average per-base-pair accuracy: 79%, compared with at most 60% for alternative methods. In this talk, I will discuss the algorithmic issues in CMfinder and a systematic framework for discovering ncRNAs at genomic scale. In a continuing collaboration with biologists, we have identified several dozen promising candidates in different bacterial clades, including one experimentally validated novel riboswitch, with a few others under close investigation.

CMfinder: A Covariance Model Based RNA Motif Finding Algorithm, Z. Yao, Z. Weinberg and W.L. Ruzzo, Bioinformatics, 2006, 22(4): 445-452. [offcampus]

Other Seminars

Past quarters of CSE 590C
COMBI & Genome Sciences Seminars
Applied Math Department Mathematical Biology Journal Club
Biostatistics Seminars
Microbiology Department Seminars
Zoology 525, Mathematical Biology Seminar Series

Resources

Molecular Biology for Computer Scientists, a primer by Lawrence Hunter (46 pages)
A Quick Introduction to Elements of Biology, a primer by Alvis Brazma et al.
S-Star Bioinformatics Online Course Schedule, a collection of video primers
A very comprehensive FAQ at bioinformatics.org, including annotated references to online tutorials and lectures.
CSE 527: Computational Biology
CSE 590TV: Computational Biology (Professional Masters Program)
Genome 540/541: Introduction to Computational Molecular Biology: Genome and Protein Sequence Analysis
CSE's Computational Molecular Biology research group
Interdisciplinary Ph.D. program in Computational Molecular Biology

Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA 98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to cse590c-webmaster@cs.washington.edu]