Course Info
CSE 590C is a weekly seminar on Readings and Research in Computational Biology, open to all
graduate students in computational, biological, and mathematical sciences.
|
Papers, etc.
Note on Electronic Access to Journals
The UW Library generally subscribes to the non-open-access journals we cite, so you can access these articles freely
from on-campus computers. For off-campus access, follow the "[offcampus]" links below or see the library's "proxy
server" instructions. You will be prompted for your UW NetID and password.
03/26: Organizational Meeting
04/02: Deep Learning of millions of random Alternative Polyadenylation variants -- Johannes
Abstract:
Alternative polyadenylation (APA) is a major driver of transcriptome diversity in human cells. Here, we
use deep learning to predict APA from DNA sequence alone. We trained our model (APARENT, APA REgression NeT) on
isoform expression data from over three million APA reporters, built by inserting random sequence into twelve
distinct 3' UTR contexts. Predictions are highly accurate across both synthetic and genomic contexts; when tasked
with inferring APA in human 3' UTRs, APARENT outperforms a model trained exclusively on endogenous data. Visualizing
features learned across all network layers reveals that APARENT recognizes sequence motifs known to recruit APA
regulators, discovers previously unknown sequence determinants of cleavage site selection, and integrates these
features into a comprehensive, interpretable cis-regulatory code.
For background reading, Johannes recommends:
04/09: Inferring developmental trajectories and causal regulations with single-cell genomics -- Xiaojie Qiu
04/16: Multi-scale Deep Tensor Factorization Learns a Latent Representation of the Human Epigenome -- Jacob
04/23: No Meeting
04/30: Hypoxemia + DeepProfile -- Alex + Ayse
- Alex: "Predicting Hypoxemia During Surgery"
- Ayse: "DeepProfile: Deep learning of patient molecular profiles for precision medicine in acute myeloid leukemia"
Abstract:
Motivation: Learning robust prediction models based on molecular profiles (e.g., expression data) and phenotype
data (e.g., drug response) is a crucial step toward the development of precision medicine. Extracting a meaningful
low-dimensional feature representation from a patient's molecular profile is key to overcoming the
high-dimensionality problem. Deep learning-based unsupervised feature learning has enormously improved image
classification by making it possible to use large amounts of "unlabeled" images that are informative of the
prediction task.
Approach: We present DeepProfile, a framework that attempts to extract latent variables from publicly available
expression data using variational autoencoders (VAEs) and uses these latent variables as features for phenotype
prediction. To our knowledge, DeepProfile is the first attempt to use deep learning to learn a feature
representation from a large number of unlabeled (i.e., without phenotype data) expression samples that are not
incorporated into the prediction problem. We apply DeepProfile to predicting response to hundreds of cancer drugs
based on gene expression data. Most patients with advanced cancer continue to receive ineffective drugs. This is
exemplified by acute myeloid leukemia (AML), a disease for which treatments and cure rates (roughly 25%) have
remained stagnant. Effectively deploying an ever-expanding array of cancer drugs holds great promise for improving
prognoses but requires methods to predict how drugs will affect specific patients.
Result: We train a VAE model, which represents a specific mapping from input variables (here, gene expression
levels) to a much smaller number of latent variables, on gene expression data from AML patients available through
the Gene Expression Omnibus (GEO). Our results show that the lower-dimensional representation (i.e., latent
variables) generated by the VAE significantly outperforms the original input feature representation (i.e., gene
expression levels) in the drug response prediction problem.
Conclusion: We demonstrate the effectiveness of VAEs in extracting a low-dimensional feature representation
from publicly available unlabeled gene expression data. We show that the learned features are relevant to drug
response prediction, which indicates that the latent variables capture important processes relevant to the
prediction problem.
Paper: https://www.biorxiv.org/content/early/2018/03/08/278739/
05/07: Two Short Talks on Single-Cell RNA-seq -- Erin + Yue
- Erin: "A First-Year's Tour Through Single-Cell RNA-seq Data"
- Yue: "UNCURL-App: A framework for interactive analysis of single-cell RNA-Seq data"
Abstract: Analysis of single-cell RNA-Seq (scRNA-Seq) datasets is currently a complex and time-consuming process,
often requiring heuristics and guesswork from the user to obtain biologically meaningful results. Here we introduce
UNCURL-App, a comprehensive online tool for analyzing scRNA-Seq data that allows prior knowledge to be integrated
into all stages of the analysis pipeline, including clustering, visualization, and differential expression. The tool
provides an interactive interface to our UNCURL software for data preprocessing and clustering, so users can run
UNCURL without programming. This step identifies cell types and creates a low-dimensional representation for
visualization. In addition, the tool allows users to assess the importance of the identified clusters by finding the
differentially expressed genes in each cell type and integrating external knowledge bases into the analysis to
determine the biological relevance of those genes. Finally, UNCURL-App allows users to interact with the analysis
pipeline by iteratively splitting or merging cell types.
Paper: https://www.biorxiv.org/content/early/2018/03/01/142398/
05/14: Building probabilistic models of RNA-seq experiments using approximate likelihood -- Daniel
05/21: Biocellion: high-performance software for modeling, simulation and visualization of many-cell systems -- Dr. Simon Kahan, Biocellion / Dr. Ilya Shmulevich, ISB
Abstract:
For decades, 3D models have been reducing costs, accelerating progress, and improving results in the automotive,
aerospace, architecture, and petroleum industries. Despite the continued failure of in vitro and animal testing to
reliably demonstrate efficacy and establish safety of drug and consumer care products, the life science industries
are only just beginning to embrace whole-system 3D modeling and simulation as an alternative. Why? Because modeling
complex living systems is hard; simulating these models at sufficient scale and duration demands purpose-built
high-performance software; and interactive visualization of the highly dynamic simulation results poses new
challenges for graphics engines.
We present Biocellion and Biovision software solutions. Biocellion is a platform that supports development of living
system models at cell-resolution, integrating biological, chemical and mechanical rules of interaction. Biocellion
simulates these models as they grow to tens of billions of cells. Biovision provides interactive exploration of the
simulation results over time.
We illustrate results from the application of Biocellion at P&G to skin growth and response to toxic materials. We
also show images from Pacific Northwest National Laboratory comparing simulations of intestinal response to a low-
versus high-fiber diet.
Though only recently developed, our models are already able to recapitulate many aspects of tissue growth,
homeostasis and response to some interventions. Using Biocellion, they can be incrementally extended and improved to
become increasingly predictive under an ever-broadening spectrum of interventions.
05/28: Holiday