CSE 590C, Sp '13: Reading & Research in Comp. Bio.

University of Washington Computer Science & Engineering

Course Info

CSE 590C is a weekly seminar on Readings and Research in Computational Biology, open to all graduate students in computational, biological, and mathematical sciences.

When/Where: Mondays, 3:30 - 4:50, MGH 254 (schematic)

Organizers: Joe Felsenstein, Bill Noble, Larry Ruzzo, Martin Tompa

Credit: 1-3 (variable)

Grading: Credit/No Credit. Talk to the organizers if you are unsure of our expectations.

Email

cse590cb@cs.washington.edu: Course-related announcements and discussions
Manage Your Subscription | List Archives

compbio-seminars@cs.washington.edu: Biology seminar announcements from all around campus
Manage Your Subscription | List Archives

compbio-group@cs.washington.edu: Computational biology discussions, conference/job postings, etc.
Manage Your Subscription | List Archives

Schedule

Date | Presenters/Participants | Topic | Details
04/01 | ---- Organizational Meeting ----
04/08 | Madan Musuvathi & Todd Mytkowicz, Microsoft Research | Accurate DNA Sequence Alignment on Data Parallel Hardware | Details
04/15 | Hamid Bolouri, FHCRC | Identifying dysregulated pathways in cancer | Details
04/22 | Ben Logsdon, UW CSE | Multi cancer analysis of a leukemia stem cell signature |
04/29 | Max Libbrecht, UW CSE | Entropic Graph-based Posterior Regularization for Learning Probabilistic Models | Details
05/06 | Daniel Jones, UW CSE | Compression and Assembly of Next Gen Sequence Data | Details
05/13 | Erick Matsen, FHCRC | From the Ramayana to Reverend Bayes: host defenses and zoonotic transmission of simian foamy virus | Details
05/20 | Larsson Omberg, Sage Bionetworks | Transparent and Collaborative Research within The Cancer Genome Atlas |
05/27 | Holiday
06/03 | Tony Chiang, UW Oceanography | Exploring seven oceanic strains of the cosmopolitan diatom T. pseudonana |

Papers, etc.
Note on Electronic Access to Journals
Links to full papers below are often to journals that require a paid subscription. The UW Library is generally a paid subscriber, and you can freely access these articles if you do so from an on-campus computer. For off-campus access, follow the "[offcampus]" links below or look at the library "proxy server" instructions. You will be prompted for your UW net ID and password once per session.

04/01: ---- Organizational Meeting ----

04/08: Accurate DNA Sequence Alignment on Data Parallel Hardware -- Madan Musuvathi & Todd Mytkowicz, Microsoft Research

    Abstract:   Smith-Waterman (SW) is a dynamic programming algorithm that produces the best alignment of two DNA sequences. Current implementations of SW have data dependencies that make them difficult to parallelize on data parallel hardware such as multicore processors, GPUs, and clusters. As a consequence, algorithms like BLAST use heuristics to perform fast but approximate alignments and can therefore miss the best alignment. In this talk, we will describe a data parallel SW that can take advantage of almost any form of data parallel hardware available today (e.g., multicore processors, GPUs, FPGAs, or clusters of each). Our algorithm splits a target DNA sequence into multiple subsequences and aligns a query against each of these subsequences in parallel, while producing the same result as the sequential SW. Using this approach, we have obtained near-linear speedup on a 12-core machine. By appropriately utilizing data parallel hardware, we think our approach can be as fast as approximate algorithms like BLAST without sacrificing alignment accuracy.
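
    For readers new to the algorithm, the following is a minimal sketch of the standard sequential Smith-Waterman recurrence with a linear gap penalty. It is not the speakers' data parallel method. The function name and the scoring parameters (match=2, mismatch=-1, gap=-2) are assumptions chosen for illustration. Each cell depends on its upper, left, and upper-left neighbors, which is the data dependency the abstract refers to.

```python
def smith_waterman(query, target, match=2, mismatch=-1, gap=-2):
    """Sequential Smith-Waterman local alignment (illustrative scoring values).

    Returns (best_score, aligned_query, aligned_target).
    """
    rows, cols = len(query) + 1, len(target) + 1
    # H[i][j] = best score of a local alignment ending at query[i-1], target[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            # Each cell needs its diagonal, upper, and left neighbors.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    aligned_q, aligned_t = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if query[i - 1] == target[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            aligned_q.append(query[i - 1])
            aligned_t.append(target[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(target[j - 1])
            j -= 1

    return best, "".join(reversed(aligned_q)), "".join(reversed(aligned_t))
```

    For example, smith_waterman("ACGT", "TTACGTTT") finds the exact four-base match embedded in the target. The nested loops fill the table in row-major order; a parallel version has to work around the fact that no cell can be computed before its three neighbors.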

04/15: Identifying dysregulated pathways in cancer -- Hamid Bolouri, FHCRC

    Abstract:   In the first part of this talk, I will briefly present results from a collaboration with the laboratories of Soheil Meshinchi (FHCRC) and Bob Arceci (JHU) in which we are combining whole genome sequencing with clinical records, genome-wide promoter-methylation, and mRNA expression to identify the pathways and mechanisms underlying pediatric Acute Myeloid Leukemia.
The second part of the talk will be a discussion of present obstacles to network/pathway analysis of cancers: (1) lack of sufficient pathway knowledge; (2) high degree of overlap among pathways; (3) disparities between pathway DBs; (4) ambiguities in public datasets such as ENCODE.

04/22: Multi cancer analysis of a leukemia stem cell signature -- Ben Logsdon, UW CSE

04/29: Entropic Graph-based Posterior Regularization for Learning Probabilistic Models -- Max Libbrecht, UW CSE

    Abstract:   Large graphical models often use factorization assumptions to enable tractable exact or approximate inference. We define a new class of entropic graph-based regularizers that combine probabilistic inference with iterative graph-based methods. These regularizers can represent arbitrary patterns of interaction between variables in a probabilistic model while maintaining tractable inference. We present a method for performing inference on this joint model and for learning its parameters using an algorithm akin to a generalized version of the EM algorithm. We are motivated by applications in computational biology in which generative time-series models such as hidden Markov models are used to learn a human-interpretable representation of genomic data. We use our approach to enable interaction over great distances in the genome. In doing so, we integrate evidence across cell types for semi-automated genome annotation, an important problem which has previously been addressed only crudely.

05/06: Compression and Assembly of Next Gen Sequence Data -- Daniel Jones, UW CSE

DC Jones, WL Ruzzo, X Peng, MG Katze, "Compression of next-generation sequencing reads aided by highly efficient de novo assembly." Nucleic Acids Res., 40, #22 (2012) e171. [offcampus]

    Abstract:   We present Quip, a lossless compression algorithm for next-generation sequencing data in the FASTQ and SAM/BAM formats. In addition to implementing reference-based compression, we have developed, to our knowledge, the first assembly-based compressor, using a novel de novo assembly algorithm. A probabilistic data structure is used to dramatically reduce the memory required by traditional de Bruijn graph assemblers, allowing millions of reads to be assembled very efficiently. Read sequences are then stored as positions within the assembled contigs. This is combined with statistical compression of read identifiers, quality scores, alignment information and sequences, effectively collapsing very large data sets to <15% of their original size with no loss of information. Availability: Quip is freely available under the 3-clause BSD license from http://cs.washington.edu/homes/dcjones/quip.
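
    The abstract does not name the probabilistic data structure it uses. As a rough illustration of the general idea only, the sketch below stores k-mers in a plain Bloom filter (a stand-in, not necessarily what Quip implements) and walks the implicit de Bruijn graph by querying the four possible successors of a k-mer. The class and function names, the filter size, and the hash construction are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


def kmers(read, k):
    """Yield every length-k substring of a read."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def successors(kmer, bloom):
    """K-mers that may follow `kmer` in the implicit de Bruijn graph.

    Edges are never stored; each of the four one-base extensions is
    simply tested for membership in the filter.
    """
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in bloom]
```

    After adding the 3-mers of the read "ACGTAC" to a filter, successors("ACG", bloom) includes "CGT". Because the filter can report false positives, a walk may occasionally follow an edge that no read supports, which is the price paid for the memory savings.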

05/13: From the Ramayana to Reverend Bayes: host defenses and zoonotic transmission of simian foamy virus -- Erick Matsen, FHCRC

    Abstract:   Simian Foamy Virus (SFV) is a DNA retrovirus that is enzootic among nonhuman primates (NHP). It can be transmitted to humans through bites; however, it does not appear to replicate in humans in the same way it does in NHP. In this talk I will report the results of trying to understand that difference between hosts using sequence data generated from a five-year project sampling both macaques and humans in Bangladesh. Along the way I will describe a new Bayesian method we developed to detect the activity of the APOBEC hypermutation host defense; this plays a key part in our interpretation of the data. This work is a collaboration with the labs of the virologist Maxine Linial (FHCRC) and the primatologist Lisa Jones-Engel (UW).

05/20: Transparent and Collaborative Research within The Cancer Genome Atlas -- Larsson Omberg, Sage Bionetworks

05/27: ---- Holiday ----

06/03: Exploring seven oceanic strains of the cosmopolitan diatom T. pseudonana -- Tony Chiang, UW Oceanography

Other Seminars
Past quarters of CSE 590C
COMBI & Genome Sciences Seminars
Applied Math Department Mathematical Biology Journal Club
Biostatistics Seminars
Microbiology Department Seminars

Resources
Molecular Biology for Computer Scientists, a primer by Lawrence Hunter (46 pages)
A Quick Introduction to Elements of Biology, a primer by Alvis Brazma et al.
A very comprehensive FAQ at bioinformatics.org, including annotated references to online tutorials and lectures.
CSE 527: Computational Biology
CSE 590TV/CSEP 590A: Computational Biology (Professional Masters Program)
Genome 540/541: Introduction to Computational Molecular Biology: Genome and Protein Sequence Analysis
CSE's Computational Molecular Biology research group
Interdisciplinary Ph.D. program in Computational Molecular Biology

Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA 98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX