Course Info | CSE 590C is a weekly seminar on Readings and Research in Computational Biology, open to all
graduate students in computational, biological, and mathematical sciences.
|
Theme | Traditionally, we reserve Spring quarter for "homegrown" research: highlights of work by researchers in the Seattle area. Our Spring schedule is: |
Schedule |
|
Papers, etc. |
The UW Library is generally a paid subscriber to non-open-access journals we cite. You can freely access these
articles from on-campus computers. For off-campus access, follow the "[offcampus]" links below or look at the
library "proxy server" instructions. You will be
prompted for your UW net ID and password.
|
Abstract: Semi-automated genome annotation algorithms such as Segway and ChromHMM are widely used to model diverse genomics data sets. These algorithms take as input a collection of genomics data sets and simultaneously partition the genome and label each segment with an integer such that positions with the same label have similar patterns of activity. These algorithms are "semi-automated" because a human performs a functional interpretation of the labels after the annotation process. Previous attempts to annotate multiple cell types using these methods primarily trained a single model to apply to all cell types, but this approach requires that all cell types have exactly the same data sets available and is sensitive to artifactual differences between genomics experiments. Training an independent model for each cell type avoids these limitations, but was previously impractical because doing so would require performing manual interpretation separately for each cell type. We propose a method for automating the annotation interpretation step by using a machine learning classifier trained on previous human interpretations. The use of this classifier allows the annotation process to proceed from raw data to final output in a fully automated way. We applied Segway with automated interpretation to all available data sets for all 166 human cell types with sufficient data, the most comprehensive genome annotation to date. We compiled these annotations together to produce a unified encyclopedia of all function-associated elements in the human genome, using evolutionary conservation to identify function-associated types of activity. The resulting encyclopedia annotates each functional element that is active in at least one cell type with its type and its pattern of activity across these cell types. We found that the activity marked by this encyclopedia explains most noncoding evolutionary conservation and identifies functional variants marked by GWAS tag SNPs.
This unified encyclopedia therefore enables easy and intuitive interpretation of the effect of sequence variants on phenotype, such as for the investigation of disease, evolutionary conservation, or positive selection. |
04/11: Extracting a low-dimensional description of multiple gene expression datasets reveals a potential driver for tumor-associated stroma in ovarian cancer -- Safiye Celik, CSE
Abstract:
We present the INSPIRE (INferring Shared modules from multiPle gene expREssion datasets) method to infer
highly coherent and robust modules of co-expressed genes and the dependencies among the modules from multiple
expression datasets. INSPIRE increases the power to detect robust and relevant patterns (modules and dependencies
among modules) by enabling the use of multiple datasets that contain different sets of genes due to, e.g.,
differences in microarray platforms.
Our evaluations on synthetically generated datasets and gene expression datasets from multiple ovarian cancer studies show that the model learned by INSPIRE can explain unseen data better and can reveal prior knowledge on gene functions more accurately than alternative methods. Applying INSPIRE to nine ovarian cancer datasets leads to the identification of a new marker and potential molecular driver of tumor-associated stroma: HOPX. The HOPX module strongly overlaps with the genes defining the mesenchymal patient subtype identified in The Cancer Genome Atlas (TCGA) ovarian cancer data. We provide evidence for a previously unknown molecular basis of tumor resectability efficacy involving tumor-associated mesenchymal stem cells represented by HOPX. |
04/18: Imputing Missing Data in the Roadmap Epigenomics Project Using Tensor Decomposition Approaches -- Tim Durham, GS
04/25: Predicting desaturation events in the operating room using models that retain both trust and accuracy -- Scott Lundberg, CSE
Abstract: During a typical operating room procedure, many different sensors and data points are recorded about an individual. Using these data to predict adverse events is a promising application of machine learning in the operating room. Here we predict oxygen desaturation events during anesthesia, and show how extremely complex models can be succinctly explained to a doctor visually. Model accuracy can match or exceed that of human doctors, and the ability to explain "why" a prediction was made aids in its practical use. |
05/02: A computational framework to learn principal developmental graph and to detect novel drivers for cell fate transition from single-cell measurements -- Xiaojie Qiu, GS
Abstract: Development has long been regarded as a hierarchical branching process. Conventional studies rely on population measurements of bulk samples, which hampers investigation of the intricate developmental dynamics. The recent emergence of single-cell RNA-seq makes it possible to track the hierarchical branching process by taking advantage of the collective behavior of individual cells during cell fate transition. However, accurately reconstructing the developmental trajectory from high-dimensional, snapshot, noisy scRNA-seq data poses a huge computational challenge. In this talk, I will introduce the manifold learning algorithm DDRTree, originally developed for inferring cancer progression, and a novel feature selection method, fstree, for reconstructing accurate developmental trajectories. Compared to other existing algorithms, this approach is dramatically more accurate and robust. We also build a statistical framework, BEAM (branch expression analysis modeling), for detecting genes that change dynamically along different developmental lineages. The unprecedented high resolution of the reconstructed developmental trajectories enables us not only to determine the driver genes that play an important role at the critical time points of cell fate transition but also to directly infer causal gene regulatory networks. |
05/09: Identifying patterns of chromatin remodeling during skeletal muscle myogenesis -- Hannah Pliner, GS
05/16: Inferring information flow in genetic pathways from single cell data -- Sreeram Kannan, EE
05/30: -- Holiday
CSE's Computational Molecular Biology research group
Interdisciplinary Ph.D. program in Computational Molecular Biology
Computer Science & Engineering University of Washington Box 352350 Seattle, WA 98195-2350 (206) 543-1695 voice, (206) 543-2969 FAX |