Course Info
CSE 590C is a weekly seminar on Readings and Research in
Computational Biology, open to all graduate students in computational,
biological, and mathematical sciences.
Schedule
Papers, etc.
Links to full papers below often point to journals that require a
paid subscription. The UW Library is generally a paid
subscriber, so you can access these articles freely from an
on-campus computer. For off-campus access,
follow the "[offcampus]" links below or
look at the
library "proxy server" instructions.
You will be prompted for your UW net ID and password once per
session.
Abstract: Smith-Waterman (SW) is a dynamic programming algorithm that produces the best alignment of two DNA sequences. Current implementations of SW have data dependencies that make them difficult to parallelize on data-parallel hardware such as multicore machines, GPUs, and clusters. As a consequence, algorithms like BLAST use heuristics to perform fast but approximate alignments and can therefore miss the best alignment. In this talk, we will describe a data-parallel SW that can take advantage of almost any form of data-parallel hardware available today (e.g., multicore machines, GPUs, FPGAs, or clusters of each). Our algorithm splits a target DNA sequence into multiple subsequences and aligns a query against each of these subsequences in parallel, while producing the same result as the sequential SW. Using this approach, we have obtained near-linear speedup on a 12-core machine. By appropriately utilizing data-parallel hardware, we think our approach can be as fast as approximate algorithms like BLAST without sacrificing alignment accuracy.
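For readers unfamiliar with the data dependencies mentioned in the abstract, here is a minimal sketch of the standard sequential Smith-Waterman recurrence with a linear gap penalty. It is not the speaker's parallel algorithm. The function name and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for illustration. Each cell depends on its upper, left, and upper-left neighbors, which is what makes a naive row-by-row parallelization difficult.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return (best local alignment score, (row, col) where it ends).

    Scoring values are illustrative assumptions, not taken from the talk.
    """
    rows, cols = len(query) + 1, len(target) + 1
    # H[i][j] = best score of a local alignment ending at query[i-1], target[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            # Each cell needs three previously computed neighbors.
            diag = H[i - 1][j - 1] + (match if query[i - 1] == target[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in the target
            left = H[i][j - 1] + gap    # gap in the query
            H[i][j] = max(0, diag, up, left)  # 0 lets a local alignment restart
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos


if __name__ == "__main__":
    print(smith_waterman_score("GATTACA", "TTGATTACATT"))
```

Cells on the same anti-diagonal do not depend on one another, which is the usual starting point for parallel formulations. The abstract describes a different strategy, based on splitting the target sequence.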
04/15: Identifying dysregulated pathways in cancer -- Hamid Bolouri, FHCRC
Abstract:
In the first part of this talk, I will briefly present results from a collaboration with the laboratories of Soheil
Meshinchi (FHCRC) and Bob Arceci (JHU) in which we are combining whole genome sequencing with clinical records,
genome-wide promoter-methylation, and mRNA expression to identify the pathways and mechanisms underlying pediatric
Acute Myeloid Leukemia.
The second part of the talk will be a discussion of present obstacles to network/pathway analysis of cancers: (1) lack of sufficient pathway knowledge; (2) high degree of overlap among pathways; (3) disparities between pathway DBs; (4) ambiguities in public datasets such as ENCODE.
04/22: Multi cancer analysis of a leukemia stem cell signature -- Ben Logsdon, UW CSE
04/29: Entropic Graph-based Posterior Regularization for Learning Probabilistic Models -- Max Libbrecht, UW CSE
Abstract: Large graphical models often use factorization assumptions to enable tractable exact or approximate inference. We define a new class of entropic graph-based regularizers that combine probabilistic inference with iterative graph-based methods. These regularizers can represent arbitrary patterns of interaction between variables in a probabilistic model while maintaining tractable inference. We present a method for performing inference on this joint model and for learning its parameters using an algorithm akin to a generalized version of the EM algorithm. We are motivated by applications in computational biology in which generative time-series models such as hidden Markov models are used to learn a human-interpretable representation of genomic data. We use our approach to enable interaction over great distances in the genome. In doing so, we integrate evidence across cell types for semi-automated genome annotation, an important problem which has previously been addressed only crudely.
05/06: Compression and Assembly of Next Gen Sequence Data -- Daniel Jones, UW CSE
Abstract: We present Quip, a lossless compression algorithm for next-generation sequencing data in the FASTQ and SAM/BAM formats. In addition to implementing reference-based compression, we have developed, to our knowledge, the first assembly-based compressor, using a novel de novo assembly algorithm. A probabilistic data structure is used to dramatically reduce the memory required by traditional de Bruijn graph assemblers, allowing millions of reads to be assembled very efficiently. Read sequences are then stored as positions within the assembled contigs. This is combined with statistical compression of read identifiers, quality scores, alignment information and sequences, effectively collapsing very large data sets to <15% of their original size with no loss of information. Availability: Quip is freely available under the 3-clause BSD license from http://cs.washington.edu/homes/dcjones/quip.
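The abstract does not say which probabilistic data structure is used. As a rough illustration of the general idea only, the sketch below stores the k-mers of a read in a plain Bloom filter, one common way to trade a small false-positive rate for much lower memory than an exact k-mer table. The class name, hash scheme, and sizes are all assumptions made for this example and are not taken from Quip.

```python
import hashlib


class KmerBloomFilter:
    """Illustrative Bloom filter over k-mers (hypothetical, not Quip's structure)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, kmer):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(kmer))


def kmers(read, k):
    """Yield every length-k substring of a read, left to right."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


if __name__ == "__main__":
    seen = KmerBloomFilter()
    for km in kmers("ACGTACGTGA", 4):
        seen.add(km)
    print("ACGT" in seen)
```

A filter like this never reports an inserted k-mer as absent, so an assembler that uses it may occasionally follow a k-mer that was not in the data but never loses one that was.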
05/13: From the Ramayana to Reverend Bayes: host defenses and zoonotic transmission of simian foamy virus -- Erick Matsen, FHCRC
Abstract: Simian Foamy Virus (SFV) is a DNA retrovirus that is enzootic among nonhuman primates (NHP). It can be transmitted to humans through bites; however, it does not appear to replicate in humans in the same way it does in NHP. In this talk I will report the results of trying to understand that difference between hosts using sequence data generated from a five-year project sampling both macaques and humans in Bangladesh. Along the way I will describe a new Bayesian method we developed to detect the activity of the APOBEC hypermutation host defense; this plays a key part in our interpretation of the data. This work is a collaboration with the labs of the virologist Maxine Linial (FHCRC) and the primatologist Lisa Jones-Engel (UW).
05/20: Transparent and Collaborative Research within The Cancer Genome Atlas -- Larsson Omberg, Sage Bionetworks
05/27: -- Holiday
06/03: Exploring seven oceanic strains of the cosmopolitan diatom T. pseudonana -- Tony Chiang, UW Oceanography
CSE's Computational Molecular Biology research group
Interdisciplinary Ph.D. program in Computational Molecular Biology
Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA 98195-2350, (206) 543-1695 voice, (206) 543-2969 FAX