University of Washington Computer Science & Engineering
  CSE 590C, Sp '13:  Reading & Research in Comp. Bio.

 Course Info

CSE 590C is a weekly seminar on Readings and Research in Computational Biology, open to all graduate students in computational, biological, and mathematical sciences.
When/Where:  Mondays, 3:30 - 4:50, MGH 254 (schematic)
Organizers:  Joe Felsenstein, Bill Noble, Larry Ruzzo, Martin Tompa
Credit: 1-3 (variable)
Grading: Credit/No Credit. Talk to the organizers if you are unsure of our expectations.
 Email

Course-related announcements and discussions
Biology seminar announcements from all around campus
Computational biology discussions, conference/job postings, etc.

 Date | Presenters/Participants | Topic
04/01 | ---- Organizational Meeting ----
04/08 | Madan Musuvathi & Todd Mytkowicz, Microsoft Research | Accurate DNA Sequence Alignment on Data Parallel Hardware
04/15 | Hamid Bolouri, FHCRC | Identifying dysregulated pathways in cancer
04/22 | Ben Logsdon, UW CSE | Multi cancer analysis of a leukemia stem cell signature
04/29 | Max Libbrecht, UW CSE | Entropic Graph-based Posterior Regularization for Learning Probabilistic Models
05/06 | Daniel Jones, UW CSE | Compression and Assembly of Next Gen Sequence Data
05/13 | Erick Matsen, FHCRC | From the Ramayana to Reverend Bayes: host defenses and zoonotic transmission of simian foamy virus
05/20 | Larsson Omberg, Sage Bionetworks | Transparent and Collaborative Research within The Cancer Genome Atlas
06/03 | Tony Chiang, UW Oceanography | Exploring seven oceanic strains of the cosmopolitan diatom T. pseudonana

 Papers, etc.

  Note on Electronic Access to Journals

Links to full papers below often point to journals that require a paid subscription. The UW Library is generally a paid subscriber, so you can access these articles freely from an on-campus computer. For off-campus access, follow the "[offcampus]" links below or look at the library "proxy server" instructions. You will be prompted for your UW net ID and password once per session.

04/01: ---- Organizational Meeting ----

04/08: Accurate DNA Sequence Alignment on Data Parallel Hardware -- Madan Musuvathi & Todd Mytkowicz, Microsoft Research

    Abstract:   Smith-Waterman (SW) is a dynamic programming algorithm that produces the best local alignment of two DNA sequences. Current implementations of SW have data dependencies that make them difficult to parallelize on data-parallel hardware such as multicore machines, GPUs, and clusters. As a consequence, algorithms like BLAST use heuristics to perform fast but approximate alignments and can therefore miss the best alignment. In this talk, we will describe a data-parallel SW that can take advantage of almost any form of data-parallel hardware available today (i.e., multicore, GPUs, FPGAs, or clusters of each). Our algorithm splits a target DNA sequence into multiple subsequences and aligns a query against each of these subsequences in parallel, while producing the same result as the sequential SW. Using this approach, we have obtained near-linear speedup on a 12-core machine. By appropriately utilizing data-parallel hardware, we think our approach can be as fast as approximate algorithms like BLAST without sacrificing alignment accuracy.
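
    For readers unfamiliar with the baseline, here is a minimal sketch of the sequential SW score computation that the abstract refers to. It is not the speakers' parallel algorithm. The function name, the linear gap penalty, and the scoring values (match=2, mismatch=-1, gap=-2) are illustrative assumptions. The inner loop shows the data dependency mentioned above: each cell needs its left, upper, and upper-left neighbors.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of two sequences (sequential baseline).

    Scoring values and the linear gap penalty are assumptions chosen
    for illustration only.
    """
    rows, cols = len(query) + 1, len(target) + 1
    # H[i][j] = best score of a local alignment ending at query[i-1], target[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            # Each cell depends on three previously computed neighbors,
            # which is what makes naive parallelization difficult.
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + sub,  # align the two characters
                H[i - 1][j] + gap,      # gap in target
                H[i][j - 1] + gap,      # gap in query
            )
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))
```

    The table has (len(query)+1) x (len(target)+1) cells, so the sequential cost grows with the product of the two sequence lengths.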

04/15: Identifying dysregulated pathways in cancer -- Hamid Bolouri, FHCRC

    Abstract:   In the first part of this talk, I will briefly present results from a collaboration with the laboratories of Soheil Meshinchi (FHCRC) and Bob Arceci (JHU) in which we are combining whole genome sequencing with clinical records, genome-wide promoter-methylation, and mRNA expression to identify the pathways and mechanisms underlying pediatric Acute Myeloid Leukemia.

The second part of the talk will be a discussion of present obstacles to network/pathway analysis of cancers: (1) lack of sufficient pathway knowledge; (2) high degree of overlap among pathways; (3) disparities between pathway DBs; (4) ambiguities in public datasets such as ENCODE.

04/22: Multi cancer analysis of a leukemia stem cell signature -- Ben Logsdon, UW CSE

04/29: Entropic Graph-based Posterior Regularization for Learning Probabilistic Models -- Max Libbrecht, UW CSE

    Abstract:   Large graphical models often use factorization assumptions to enable tractable exact or approximate inference. We define a new class of entropic graph-based regularizers that combine probabilistic inference with iterative graph-based methods. These regularizers can represent arbitrary patterns of interaction between variables in a probabilistic model while maintaining tractable inference. We present a method for performing inference on this joint model and for learning its parameters using an algorithm akin to a generalized version of the EM algorithm. We are motivated by applications in computational biology in which generative time-series models such as hidden Markov models are used to learn a human-interpretable representation of genomic data. We use our approach to enable interaction over great distances in the genome. In doing so, we integrate evidence across cell types for semi-automated genome annotation, an important problem which has previously been addressed only crudely.

05/06: Compression and Assembly of Next Gen Sequence Data -- Daniel Jones, UW CSE

    Abstract:   We present Quip, a lossless compression algorithm for next-generation sequencing data in the FASTQ and SAM/BAM formats. In addition to implementing reference-based compression, we have developed, to our knowledge, the first assembly-based compressor, using a novel de novo assembly algorithm. A probabilistic data structure is used to dramatically reduce the memory required by traditional de Bruijn graph assemblers, allowing millions of reads to be assembled very efficiently. Read sequences are then stored as positions within the assembled contigs. This is combined with statistical compression of read identifiers, quality scores, alignment information and sequences, effectively collapsing very large data sets to <15% of their original size with no loss of information. Availability: Quip is freely available under the 3-clause BSD license from
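
    The abstract does not say which probabilistic data structure Quip uses. As a rough illustration of the general idea only, the sketch below uses a plain Bloom filter (an assumption, not Quip's actual structure) to record which k-mers occur in a set of reads using a fixed amount of memory. The class name, the helper functions, and the sizing parameters are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Fixed-size set membership with possible false positives, no false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        # Sizes are illustrative assumptions, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(read, k):
    """All overlapping substrings of length k (the nodes of a de Bruijn graph)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def index_reads(reads, k):
    """Record every k-mer of every read in a Bloom filter."""
    seen = BloomFilter()
    for read in reads:
        for kmer in kmers(read, k):
            seen.add(kmer)
    return seen


if __name__ == "__main__":
    seen = index_reads(["ACGTACGT", "CGTTGCA"], k=4)
    print("ACGT" in seen)
```

    Memory use is set by num_bits rather than by the number of distinct k-mers, at the cost of occasional false positives. That is the kind of trade-off a probabilistic structure makes.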

05/13: From the Ramayana to Reverend Bayes: host defenses and zoonotic transmission of simian foamy virus -- Erick Matsen, FHCRC

    Abstract:   Simian Foamy Virus (SFV) is a DNA retrovirus that is enzootic among nonhuman primates (NHP). It can be transmitted to humans through bites; however, it does not appear to replicate in humans in the same way it does in NHP. In this talk I will report the results of trying to understand that difference between hosts using sequence data generated from a five-year project sampling both macaques and humans in Bangladesh. Along the way I will describe a new Bayesian method we developed to detect the activity of the APOBEC hypermutation host defense; this plays a key part in our interpretation of the data. This work is a collaboration with the labs of the virologist Maxine Linial (FHCRC) and the primatologist Lisa Jones-Engel (UW).

05/20: Transparent and Collaborative Research within The Cancer Genome Atlas -- Larsson Omberg, Sage Bionetworks

05/27: ---- Holiday ----

06/03: Exploring seven oceanic strains of the cosmopolitan diatom T. pseudonana -- Tony Chiang, UW Oceanography

 Other Seminars

Past quarters of CSE 590C
COMBI & Genome Sciences Seminars
Applied Math Department Mathematical Biology Journal Club
Biostatistics Seminars
Microbiology Department Seminars

 Resources

Molecular Biology for Computer Scientists, a primer by Lawrence Hunter (46 pages)
A Quick Introduction to Elements of Biology, a primer by Alvis Brazma et al.
A very comprehensive FAQ at, including annotated references to online tutorials and lectures.
CSE 527: Computational Biology
CSE 590TV/CSEP 590A: Computational Biology (Professional Masters Program)
Genome 540/541: Introduction to Computational Molecular Biology: Genome and Protein Sequence Analysis

CSE's Computational Molecular Biology research group
Interdisciplinary Ph.D. program in Computational Molecular Biology

Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX