University of Washington Computer Science & Engineering
  CSE 590C, Sp '18: Reading & Research in Comp. Bio.

 Course Info    CSE 590C is a weekly seminar on Readings and Research in Computational Biology, open to all graduate students in computational, biological, and mathematical sciences.
When/Where: Mondays, 3:30 - 4:50, EE1 026 (room info)
Organizers: Sreeram Kannan, Su-In Lee, Bill Noble, Larry Ruzzo, Yuliang Wang
Credit: 1-3 Variable
Grading: Credit/No Credit. Talk to the organizers if you are unsure of our expectations.
 Email
  • Course-related announcements and discussions
  • Biology seminar announcements from all around campus
  • Computational biology discussions, conference/job postings, etc.
 Theme Traditionally, we reserve Spring quarter for "homegrown" research --- highlights of work by researchers in the Seattle area. Our tentative Spring schedule is:
 Date | Presenters/Participants | Topic | Details
 03/26 | | Organizational Meeting |
 04/02 | Johannes | Deep Learning of millions of random Alternative Polyadenylation variants | Details
 04/09 | Xiaojie Qiu | Inferring developmental trajectories and causal regulations with single-cell genomics |
 04/16 | Jacob | Multi-scale Deep Tensor Factorization Learns a Latent Representation of the Human Epigenome |
 04/23 | | No Meeting |
 04/30 | Alex + Ayse | Hypoxemia + DeepProfile | Details
 05/07 | Erin + Yue | Two Short Talks on Single-Cell RNA-seq | Details
 05/14 | Daniel | Building probabilistic models of RNA-seq experiments using approximate likelihood |
 05/21 | Dr. Simon Kahan, Biocellion/Dr. Ilya Shmulevich, ISB | Biocellion: high-performance software for modeling, simulation and visualization of many-cell systems | Details
 Papers, etc.

  Note on Electronic Access to Journals

The UW Library is generally a paid subscriber to non-open-access journals we cite. You can freely access these articles from on-campus computers. For off-campus access, follow the "[offcampus]" links below or look at the library "proxy server" instructions. You will be prompted for your UW net ID and password.  

03/26: Organizational Meeting

04/02: Deep Learning of millions of random Alternative Polyadenylation variants -- Johannes

Abstract:   Alternative polyadenylation (APA) is a major driver of transcriptome diversity in human cells. Here, we use deep learning to predict APA from DNA sequence alone. We trained our model (APARENT, APA REgression NeT) on isoform expression data from over three million APA reporters, built by inserting random sequence into twelve distinct 3' UTR contexts. Predictions are highly accurate across both synthetic and genomic contexts; when tasked with inferring APA in human 3' UTRs, APARENT outperforms a model trained exclusively on endogenous data. Visualizing features learned across all network layers reveals that APARENT recognizes sequence motifs known to recruit APA regulators, discovers previously unknown sequence determinants of cleavage site selection, and integrates these features into a comprehensive, interpretable cis-regulatory code.

For background reading, Johannes recommends:

04/09: Inferring developmental trajectories and causal regulations with single-cell genomics -- Xiaojie Qiu

04/16: Multi-scale Deep Tensor Factorization Learns a Latent Representation of the Human Epigenome -- Jacob

04/23: No Meeting

04/30: Hypoxemia + DeepProfile -- Alex + Ayse

  • Alex: "Predicting Hypoxemia During Surgery"
  • Ayse: "DeepProfile: Deep learning of patient molecular profiles for precision medicine in acute myeloid leukemia"

    Abstract: Motivation: Learning robust prediction models based on molecular profiles (e.g., expression data) and phenotype data (e.g., drug response) is a crucial step toward the development of precision medicine. Extracting a meaningful low-dimensional feature representation from a patient's molecular profile is key to overcoming the high-dimensionality problem. Deep learning-based unsupervised feature learning has enormously improved image classification by enabling us to use large amounts of "unlabeled" images informative of the prediction task.

    Approach: We present the DeepProfile framework, which attempts to extract latent variables from publicly available expression data using variational autoencoders (VAEs) and to use these latent variables as features for phenotype prediction. To our knowledge, DeepProfile is the first attempt to use deep learning to learn a feature representation from a large number of unlabeled (i.e., without phenotype) expression samples that are not incorporated into the prediction problem. We apply DeepProfile to predicting response to hundreds of cancer drugs based on gene expression data. Most patients with advanced cancer continue to receive drugs that are ineffective. This is exemplified by acute myeloid leukemia (AML), a disease for which treatments and cure rates (in the range of 25%) have remained stagnant. Effectively deploying an ever-expanding array of cancer drugs holds great promise to improve prognoses but requires methods to predict how drugs will affect specific patients.

    Result: We train a VAE model, which represents a mapping from input variables (here, gene expression levels) into a much smaller number of latent variables, on gene expression data from AML patients available through the Gene Expression Omnibus (GEO). Our results show that the lower-dimensional representation (i.e., latent variables) generated by VAEs significantly outperforms the original input feature representation (i.e., gene expression levels) in the drug response prediction problem.

    Conclusion: We demonstrate the effectiveness of VAEs in extracting a low-dimensional feature representation from publicly available unlabeled gene expression data. We show that the learned features are relevant to drug response prediction, which indicates that the latent variables capture important processes relevant to the prediction problem.
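As a rough illustration of the idea in the abstract above (an encoder that maps expression levels to a small number of latent variables, which are then reused as features), here is a minimal sketch in plain Python. It is not the DeepProfile implementation: the linear encoder, the fixed untrained weights, the toy 4-gene input, and all function names are assumptions made for illustration. A real VAE would use a nonlinear encoder and decoder trained on the expression data.

```python
import math
import random


def encode(x, w_mu, w_logvar):
    """Toy linear encoder: map an expression vector x to the mean and
    log-variance of a Gaussian over the latent variables."""
    mu = [sum(w * xi for w, xi in zip(row, x)) for row in w_mu]
    logvar = [sum(w * xi for w, xi in zip(row, x)) for row in w_logvar]
    return mu, logvar


def sample_latent(mu, logvar, rng):
    """Reparameterization step: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL divergence from N(mu, sigma^2) to N(0, I), summed over latent
    dimensions. This is the regularization term in the VAE objective."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


# Hypothetical toy input: 4 "genes" compressed to 2 latent variables.
x = [0.5, 1.2, 0.0, 2.0]
w_mu = [[0.1, -0.2, 0.3, 0.0],
        [0.0, 0.5, -0.1, 0.2]]
w_logvar = [[0.0] * 4, [0.0] * 4]  # log-variance 0, i.e. unit variance

mu, logvar = encode(x, w_mu, w_logvar)
z = sample_latent(mu, logvar, random.Random(0))  # stochastic sample (training)
kl = kl_to_standard_normal(mu, logvar)

# For downstream prediction, one common choice is to use the latent
# means as the low-dimensional feature vector for each sample.
features = mu
```

In this sketch the latent means stand in for the learned features. A downstream drug response predictor would then take `features` as input in place of the full expression vector.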


05/07: Two Short Talks on Single-Cell RNA-seq -- Erin + Yue

  • Erin: "A First-Year's Tour Through Single-Cell RNA-seq Data"
  • Yue: "UNCURL-App: A framework for interactive analysis of single-cell RNA-Seq data"

    Abstract: Analysis of single-cell RNA-Seq (scRNA-Seq) datasets is currently a complex and time-consuming process, often requiring heuristics and guesswork from the user in order to obtain biologically meaningful results. Here we introduce UNCURL-App, a comprehensive online tool for analyzing scRNA-Seq data, which allows for the integration of prior knowledge into all stages of the analysis pipeline, including clustering, visualization, and differential expression. This tool provides an interactive interface to our UNCURL software for data preprocessing and clustering, thereby allowing users to use UNCURL without programming. This step identifies cell types and creates a low-dimensional representation for visualization. In addition, our tool allows users to assess the importance of the identified clusters. This is done by finding the differentially expressed genes in each cell type, and integrating external knowledge bases into the data analysis process to determine the biological relevance of the identified genes. Finally, UNCURL-App allows users to interact with the analysis pipeline by iteratively splitting or merging cell types.


05/14: Building probabilistic models of RNA-seq experiments using approximate likelihood -- Daniel

05/21: Biocellion: high-performance software for modeling, simulation and visualization of many-cell systems -- Dr. Simon Kahan, Biocellion/Dr. Ilya Shmulevich, ISB

Abstract:   For decades, 3D models have been reducing cost, accelerating progress and improving results in the automotive, aerospace, architecture, and petroleum industries. Despite the continued failure of in vitro and animal testing to reliably demonstrate efficacy and establish safety of drug and consumer care products, the life science industries are only just beginning to embrace whole-system 3D modeling and simulation as an alternative.

Why? Because modeling complex living systems is hard; simulating these models at sufficient scale and duration demands purpose-built high-performance software; and interactive visualization of the highly dynamic simulation results poses new challenges for graphics engines.

We present Biocellion and Biovision software solutions. Biocellion is a platform that supports development of living system models at cell-resolution, integrating biological, chemical and mechanical rules of interaction. Biocellion simulates these models as they grow to tens of billions of cells. Biovision provides interactive exploration of the simulation results over time.

We illustrate results from the application of Biocellion at P&G to skin growth and response to toxic materials. We also show images from Pacific Northwest National Laboratory comparing simulations of intestinal response to a low- versus high-fiber diet.

Though only recently developed, our models are already able to recapitulate many aspects of tissue growth, homeostasis and response to some interventions. Using Biocellion, they can be incrementally extended and improved to become increasingly predictive under an ever-broadening spectrum of interventions.

05/28: Holiday

 Other Seminars Past quarters of CSE 590C
COMBI & Genome Sciences Seminars
Biostatistics Seminars
Microbiology Department Seminars
 Resources Molecular Biology for Computer Scientists, a primer by Lawrence Hunter (46 pages)
A Quick Introduction to Elements of Biology, a primer by Alvis Brazma et al.
A comprehensive FAQ at, including annotated links to online tutorials and lectures.
CSE 527: Computational Biology
CSEP 590A: Computational Biology (Professional Masters Program)
Genome 540/541: Introduction to Computational Molecular Biology: Genome and Protein Sequence Analysis

CSE's Computational Molecular Biology research group
Interdisciplinary Ph.D. program in Computational Molecular Biology

Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX