University of Washington Computer Science & Engineering
  CSE 590C, Sp '19:  Reading & Research in Comp. Bio.

 Course Info
CSE 590C is a weekly seminar on Readings and Research in Computational Biology, open to all graduate students in computational, biological, and mathematical sciences.
When/Where: Mondays, 3:30 - 4:50, ECE 031
Organizers: Sreeram Kannan, Bill Noble, Larry Ruzzo, Yuliang Wang
Credit: 1-3 Variable
Grading: Credit/No Credit. Talk to the organizers if you are unsure of our expectations.
 Email
  • Course-related announcements and discussions
  • Biology seminar announcements from all around campus
  • Computational biology discussions, conference/job postings, etc.
 Theme
Traditionally, we reserve Spring quarter for "homegrown" research --- highlights of work by researchers in the Seattle area. Our tentative Spring schedule is:
Date   Presenters/Participants   Topic
04/01  ----                      Organizational Meeting
04/08  Lee Organick              DNA for digital storage
04/15  Prof. Ka Yee Yeung, UWT   How do we create, share and execute reproducible bioinformatics workflows?
04/22  Gabe Erion // Ian Covert  Cost-Aware AI // TBA
04/29  Shunfu Mao                Comparative analysis of single-cell RNA-seq
05/06  Dr. Ritambhara Singh, UW  Unsupervised manifold alignment for single-cell genomics
05/13  Dr. Yuliang Wang, UW      Metabolic network models of prostate cancer using spatial transcriptomics data
05/20  Erin Wilson // Yue Zhang  Engineering micro-organisms // TBA
06/03  Jacob Schreiber           A pitfall for machine learning methods aiming to predict across cell types
 Papers, etc.

  Note on Electronic Access to Journals

The UW Library is generally a paid subscriber to non-open-access journals we cite. You can freely access these articles from on-campus computers. For off-campus access, follow the "[offcampus]" links below or look at the library "proxy server" instructions. You will be prompted for your UW net ID and password.  

04/01: Organizational Meeting

04/08: DNA for digital storage -- Lee Organick

04/15: How do we create, share and execute reproducible bioinformatics workflows? -- Prof. Ka Yee Yeung, UWT

04/22: Cost-Aware AI // TBA -- Gabe Erion // Ian Covert

Two short talks this week:
  • G. Erion,  "CoAI: Cost-Aware Artificial Intelligence for Efficient Prehospital Diagnosis of Trauma Patients"

    Abstract:  While artificial intelligence (AI) and machine learning (ML) are becoming widely used throughout medicine, the analysis of the cost of an ML model's predictions, particularly in terms of time or effort, has been very limited. For example, a model may accurately predict that a trauma patient will have acute traumatic coagulopathy (ATC), a bleeding disorder, when all available patient features are given; however, it may heavily rely on hard-to-measure patient features, like blood pressure or Glasgow Coma Score, to do so. Standard ML techniques do not prioritize features that are fast and easy to acquire, even though rapid acquisition is a key factor in minimizing death and injury.

    This idea, which we refer to as cost-aware prediction, is a topic of recent interest in the field of machine learning. However, there are substantial limitations to existing methods, and the impact of applying these methods to clinical settings has not been demonstrated. Our work aims to adopt recent advances in ML and explainable AI to develop more powerful cost-aware prediction techniques and demonstrate the value of these methods using clinical data. We show that even simple cost-aware AI methods can provide predictions that are at least as accurate as existing clinical risk scores and substantially reduce time and effort required for variable collection.

  • I. Covert,  "TBA"

04/29: Comparative analysis of single-cell RNA-seq -- Shunfu Mao

05/06: Unsupervised manifold alignment for single-cell genomics -- Dr. Ritambhara Singh, UW

Abstract:   Availability of genomics datasets across multiple experiments has increased data integration efforts to combine information and achieve more in-depth insights into cellular mechanisms. A primary focus of this effort has been on single-cell genomics data. Advances in single-cell sequencing technologies have allowed scientists to explore cell-to-cell variation within a cell population. While integrating single-cell data is critical for our analysis of cell development and diseases, the heterogeneity among cells presents unique challenges to this task. To address this, we propose an unsupervised manifold alignment method to align the relevant features of the cells across different experiments. Our central assumption is that the single cells, measured separately in different experiments, are sampled from a shared space. Therefore, by performing manifold alignment, we attempt to find a shared latent space where these cell measurements are aligned, thus acting as an in-silico co-assay. We minimize an objective function that uses the Maximum Mean Discrepancy (MMD) function to match separate distributions while maintaining the underlying structure of the data. We apply this method to various single-cell experiments, e.g., single-cell RNA-seq (gene expression) and ATAC-seq (accessible chromatin regions) measurements for different time-points during cell differentiation. Preliminary results show that our proposed algorithm can effectively align cells from certain time-points, across the two experiments, in a low-dimensional latent space.
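For readers new to the Maximum Mean Discrepancy mentioned in the abstract, here is a minimal sketch of the standard biased empirical estimate of squared MMD with a Gaussian kernel. It is only an illustration of the quantity itself, not the speaker's alignment method: the function names, the bandwidth of 1.0, and the simulated 2-D "cells" are all assumptions made up for this example.

```python
import math
import random


def gaussian_kernel(u, v, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as lists of floats."""
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased empirical estimate of squared MMD between two samples."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


def sample_gaussian(rng, n, mean, dim=2):
    """Hypothetical stand-in for cell measurements: n points around `mean`."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
cells_a = sample_gaussian(rng, 100, 0.0)        # one simulated experiment
cells_b = sample_gaussian(rng, 100, 0.0)        # same distribution as cells_a
cells_shifted = sample_gaussian(rng, 100, 3.0)  # a clearly different distribution

print("same distribution:   ", mmd_squared(cells_a, cells_b))
print("shifted distribution:", mmd_squared(cells_a, cells_shifted))
```

The estimate is close to zero when both samples come from the same distribution and grows as the distributions move apart, which is why it can serve as a term that pulls two sets of measurements together in a shared space.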

And the following would be useful for background reading:

05/13: Metabolic network models of prostate cancer using spatial transcriptomics data -- Dr. Yuliang Wang, UW

Abstract:   Metabolic reprogramming is a hallmark of cancer, and there is great need to exploit cancer metabolic aberrations to develop novel selective therapies. Genome-scale metabolic network models have been successfully used to model metabolic reprogramming in multiple types of cancers. However, current models are based on bulk gene expression data and do not consider the spatial heterogeneity in the tumor microenvironment. Recent studies clearly demonstrated that spatial heterogeneity is a fundamental feature of the tumor microenvironment.

We performed metabolic network analysis using spatial transcriptomics data of the prostate cancer microenvironment and revealed extensive spatial heterogeneity. We predicted novel malignant cell-specific metabolic vulnerabilities in multiple metabolic pathways that would have been missed by bulk models without spatial information. Some of our novel predictions can be targeted by existing drugs.

Metabolism is implicated in a wide range of diseases. As more spatially resolved transcriptomics data are generated for multiple types of cancer and other diseases, the analytical workflow in this study can be applied to reveal novel metabolic strategies for disease treatment.

05/20: Engineering micro-organisms // TBA -- Erin Wilson // Yue Zhang

Two short talks this week:
  • E. Wilson,  "Engineering micro-organisms for macro problems."

  • Y. Zhang,  "TBA"

05/27: Holiday

06/03: A pitfall for machine learning methods aiming to predict across cell types -- Jacob Schreiber

Authors:   J Schreiber, R Singh, J Bilmes, and WS Noble

Abstract:   Machine learning models to predict phenomena such as gene expression, enhancer activity, transcription factor binding, or chromatin conformation are most useful when they can generalize to make accurate predictions across cell types. In this situation, a natural strategy is to train the model on experimental data from some cell types and evaluate performance on one or more held-out cell types. In this work, we show that, when the training set contains examples derived from the same genomic loci across multiple cell types, then the resulting model can be susceptible to a particular form of bias related to memorizing the average activity associated with each genomic locus. Consequently, the trained model may appear to perform well when evaluated on the genomic loci that it was trained on but tends to perform poorly on loci that it was not trained on. We demonstrate this phenomenon by using epigenomic measurements and nucleotide sequence to predict gene expression and chromatin domain boundaries, and we suggest methods to diagnose and avoid the pitfall. We anticipate that, as more data and computing resources become available, future projects will increasingly risk suffering from this issue.
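The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis: the data are synthetic, and the sizes, noise level, and variable names are assumptions chosen for illustration. Each locus gets a shared effect plus small cell-type-specific noise, and a "model" that has only memorized the per-locus average over the training cell types is scored on a held-out cell type at the same loci.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of floats."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
N_LOCI = 200              # assumed number of genomic loci
N_TRAIN_CELL_TYPES = 5    # assumed number of training cell types

# Each locus has an activity level shared by every cell type.
locus_effect = [rng.gauss(0.0, 1.0) for _ in range(N_LOCI)]


def simulate_cell_type():
    """Activity per locus: shared locus effect plus cell-type-specific noise."""
    return [effect + rng.gauss(0.0, 0.3) for effect in locus_effect]


train = [simulate_cell_type() for _ in range(N_TRAIN_CELL_TYPES)]
held_out = simulate_cell_type()

# "Memorized" predictor: the average training activity at each locus.
average_activity = [
    sum(cell_type[i] for cell_type in train) / N_TRAIN_CELL_TYPES
    for i in range(N_LOCI)
]

# Scored on the held-out cell type at the same loci, the average looks strong.
baseline_r = pearson(average_activity, held_out)

# Scored against what is specific to the held-out cell type, it explains little.
cell_type_specific = [h - a for h, a in zip(held_out, average_activity)]
specific_r = pearson(average_activity, cell_type_specific)

print("average-activity baseline vs. held-out cell type:", baseline_r)
print("average-activity baseline vs. cell-type-specific signal:", specific_r)
```

In this toy setting the memorized average correlates strongly with the held-out cell type even though it carries no cell-type-specific information, and it offers no prediction at all for loci absent from training. Comparing a trained model against such an average-activity baseline is one simple sanity check.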


 Other Seminars
Past quarters of CSE 590C
COMBI & Genome Sciences Seminars
Biostatistics Seminars
Microbiology Department Seminars
 Resources
Molecular Biology for Computer Scientists, a primer by Lawrence Hunter (46 pages)
A comprehensive FAQ at, including annotated links to online tutorials and lectures.
CSE 527: Computational Biology
CSEP 527: Computational Biology (Professional Masters Program)
Genome 540/541: Introduction to Computational Molecular Biology: Genome and Protein Sequence Analysis

CSE's Computational Molecular Biology research group
Interdisciplinary Ph.D. program in Computational Molecular Biology

Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX