CSE 590C, Sp '19: Reading & Research in Comp. Bio.

Course Info

CSE 590C is a weekly seminar on Readings and Research in Computational Biology, open to all graduate students in computational, biological, and mathematical sciences.

When/Where:	Mondays, 3:30 - 4:50, ECE 031 (room info)
Organizers:	Sreeram Kannan, Bill Noble, Larry Ruzzo, Yuliang Wang
Credit:	1-3 Variable
Grading:	Credit/No Credit. Talk to the organizers if you are unsure of our expectations.

cse590cb@cs.washington.edu			Course-related announcements and discussions
	Manage Your Subscription	List Archives
compbio-seminars@cs.washington.edu			Biology seminar announcements from all around campus
	Manage Your Subscription	List Archives
compbio-group@cs.washington.edu			Computational biology discussions, conference/job postings, etc.
	Manage Your Subscription	List Archives

Theme

Traditionally, we reserve Spring quarter for "homegrown" research --- highlights of work by researchers in the Seattle area. Our tentative Spring schedule is:

Schedule

Date	Presenters/Participants	Topic	Details

04/01	---- Organizational Meeting ----
04/08	Lee Organick	DNA for digital storage
04/15	Prof. Ka Yee Yeung, UWT	How do we create, share and execute reproducible bioinformatics workflows?	Details
04/22	Gabe Erion // Ian Covert	Cost-Aware AI // TBA	Details
04/29	Shunfu Mao	Comparative analysis of single-cell RNA-seq	Details
05/06	Dr. Ritambhara Singh, UW	Unsupervised manifold alignment for single-cell genomics	Details
05/13	Dr. Yuliang Wang, UW	Metabolic network models of prostate cancer using spatial transcriptomics data	Details
05/20	Erin Wilson // Yue Zhang	Engineering micro-organisms // TBA	Details
05/27	Holiday
06/03	Jacob Schreiber	A pitfall for machine learning methods aiming to predict across cell types	Details

Papers, etc.

Note on Electronic Access to Journals

The UW Library is generally a paid subscriber to non-open-access journals we cite. You can freely access these articles from on-campus computers. For off-campus access, follow the "[offcampus]" links below or look at the library "proxy server" instructions. You will be prompted for your UW net ID and password.

04/01: ---- Organizational Meeting ----

04/08: DNA for digital storage -- Lee Organick

04/15: How do we create, share and execute reproducible bioinformatics workflows? -- Prof. Ka Yee Yeung, UWT

L-H Hung, J Hu, T Meiss, A Ingersoll, W Lloyd, D Kristiyanto, Y Xiong, E Sobie, and KY Yeung, "Building containerized workflows using the BioDepot-workflow-builder (Bwb)," https://www.biorxiv.org/content/10.1101/099010v2/
L-H Hung, D Kumanov, X Niu, W Lloyd, KY Yeung, "Rapid RNA sequencing data analysis using serverless computing," https://www.biorxiv.org/content/10.1101/576199v1/
R Almugbel, LH Hung, J Hu, A Almutairy, N Ortogero, Y Tamta, KY Yeung, "Reproducible Bioconductor workflows using browser-based interactive notebooks and containers." J Am Med Inform Assoc, 25, #1 (2018) 4-12. [offcampus]

04/22: Cost-Aware AI // TBA -- Gabe Erion // Ian Covert

Two short talks this week:

G. Erion, "CoAI: Cost-Aware Artificial Intelligence for Efficient Prehospital Diagnosis of Trauma Patients"

Abstract: While artificial intelligence (AI) and machine learning (ML) are becoming widely used throughout medicine, the analysis of the cost of an ML model's predictions, particularly in terms of time or effort, has been very limited. For example, a model may accurately predict that a trauma patient will have acute traumatic coagulopathy (ATC), a bleeding disorder, when all available patient features are given; however, it may rely heavily on hard-to-measure patient features, like blood pressure or Glasgow Coma Score, to do so. Standard ML techniques do not prioritize features that are fast and easy to acquire, even though fast acquisition is a key factor in minimizing death and injury.
This idea, which we refer to as cost-aware prediction, is a topic of recent interest in the field of machine learning. However, there are substantial limitations to existing methods, and the impact of applying these methods to clinical settings has not been demonstrated. Our work aims to adopt recent advances in ML and explainable AI to develop more powerful cost-aware prediction techniques and demonstrate the value of these methods using clinical data. We show that even simple cost-aware AI methods can provide predictions that are at least as accurate as existing clinical risk scores and substantially reduce time and effort required for variable collection.
I. Covert, "TBA"
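
To make the cost-aware prediction idea in G. Erion's abstract concrete, here is a minimal sketch that picks features under an acquisition budget. It assumes a simple greedy importance-per-cost heuristic, which is not necessarily the method used in CoAI. The function name, feature names, importance scores, and costs are all hypothetical values chosen for illustration, not clinical data.

```python
def select_features_under_budget(features, budget):
    """Greedily pick features by importance-per-cost until the budget is spent.

    `features` is a list of dicts with hypothetical keys "name",
    "importance" (higher is more useful), and "cost" (e.g. seconds to acquire).
    """
    ranked = sorted(
        features,
        key=lambda f: f["importance"] / f["cost"],
        reverse=True,
    )
    chosen, spent = [], 0.0
    for feature in ranked:
        # Skip any feature that would push total cost over the budget.
        if spent + feature["cost"] <= budget:
            chosen.append(feature["name"])
            spent += feature["cost"]
    return chosen, spent


# Made-up importance scores and acquisition costs, for illustration only.
features = [
    {"name": "age", "importance": 0.10, "cost": 1.0},
    {"name": "heart_rate", "importance": 0.25, "cost": 2.0},
    {"name": "blood_pressure", "importance": 0.30, "cost": 6.0},
    {"name": "glasgow_coma_score", "importance": 0.40, "cost": 10.0},
]

chosen, spent = select_features_under_budget(features, budget=4.0)
print(chosen, spent)
```

With this toy budget, the two cheap features are selected and the two expensive ones are left out, even though the expensive ones have higher raw importance.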

04/29: Comparative analysis of single-cell RNA-seq -- Shunfu Mao

A Alavi, M Ruffalo, A Parvangada, Z Huang, Z Bar-Joseph, "A web server for comparative analysis of single-cell RNA-seq data." Nat Commun, 9, #1 (2018) 4768. [offcampus]

05/06: Unsupervised manifold alignment for single-cell genomics -- Dr. Ritambhara Singh, UW

Abstract: Availability of genomics datasets across multiple experiments has increased data integration efforts to combine information and achieve more in-depth insights into cellular mechanisms. A primary focus of this effort has been on single-cell genomics data. Advances in single-cell sequencing technologies have allowed scientists to explore cell-to-cell variation within a cell population. While integrating single-cell data is critical for our analysis of cell development and diseases, the heterogeneity among cells presents unique challenges to this task. To address this, we propose an unsupervised manifold alignment method to align the relevant features of the cells across different experiments. Our central assumption is that single cells, measured separately in different experiments, are sampled from a shared space. Therefore, by performing manifold alignment, we attempt to find a shared latent space where these cell measurements are aligned, thus acting as an in-silico co-assay. We minimize an objective function that uses the Maximum Mean Discrepancy (MMD) function to match separate distributions while maintaining the underlying structure of the data. We apply this method to various single-cell experiments, e.g., single-cell RNA-seq (gene expression) and ATAC-seq (accessible chromatin regions) measurements for different time-points during cell differentiation. Preliminary results show that our proposed algorithm can effectively align cells from certain time-points, across the two experiments, in a low-dimensional latent space.

And the following would be useful for background reading: http://www-anw.cs.umass.edu/pubs/2011/wang_k_m_11.pdf
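
For readers unfamiliar with the Maximum Mean Discrepancy mentioned in the abstract above, the sketch below computes a biased estimate of squared MMD between two samples. It assumes a Gaussian kernel with a fixed bandwidth. The talk's full objective also includes terms that preserve the structure of the data, which are not shown here. The function names and the synthetic "cells" are illustrative assumptions, not the speaker's code or data.

```python
import math
import random


def gaussian_kernel(u, v, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as lists of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between samples xs and ys."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


def sample_cells(rng, n, dim, center):
    """Synthetic stand-in for cell measurements: Gaussian points around `center`."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
cells_a = sample_cells(rng, 100, 3, 0.0)
cells_b_same = sample_cells(rng, 100, 3, 0.0)     # same distribution as cells_a
cells_b_shifted = sample_cells(rng, 100, 3, 2.0)  # shifted distribution

mmd_same = mmd_squared(cells_a, cells_b_same)
mmd_shifted = mmd_squared(cells_a, cells_b_shifted)
print(mmd_same, mmd_shifted)
```

The estimate is close to zero when both samples come from the same distribution and grows when they differ, which is why minimizing it over a learned mapping pulls two sets of measurements toward a shared distribution.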

05/13: Metabolic network models of prostate cancer using spatial transcriptomics data -- Dr. Yuliang Wang, UW

Abstract: Metabolic reprogramming is a hallmark of cancer, and there is great need to exploit cancer metabolic aberrations to develop novel selective therapies. Genome-scale metabolic network models have been successfully used to model metabolic reprogramming in multiple types of cancers. However, current models are based on bulk gene expression data and do not consider the spatial heterogeneity in the tumor microenvironment. Recent studies have clearly demonstrated that spatial heterogeneity is a fundamental feature of the tumor microenvironment.

We performed metabolic network analysis using spatial transcriptomics data of the prostate cancer microenvironment and revealed extensive spatial heterogeneity. We predicted novel malignant cell-specific metabolic vulnerabilities in multiple metabolic pathways that would have been missed by bulk models without spatial information. Some of our novel predictions can be targeted by existing drugs.

Metabolism is implicated in a wide range of diseases. As more spatially resolved transcriptomics data are generated for multiple types of cancer and other diseases, the analytical workflow in this study can be applied to reveal novel metabolic strategies for disease treatment.

05/20: Engineering micro-organisms // TBA -- Erin Wilson // Yue Zhang

Two short talks this week:

E. Wilson, "Engineering micro-organisms for macro problems."
Y. Zhang, "TBA"

05/27: Holiday

06/03: A pitfall for machine learning methods aiming to predict across cell types -- Jacob Schreiber

Authors: J Schreiber, R Singh, J Bilmes, and WS Noble

Abstract: Machine learning models to predict phenomena such as gene expression, enhancer activity, transcription factor binding, or chromatin conformation are most useful when they can generalize to make accurate predictions across cell types. In this situation, a natural strategy is to train the model on experimental data from some cell types and evaluate performance on one or more held-out cell types. In this work, we show that, when the training set contains examples derived from the same genomic loci across multiple cell types, then the resulting model can be susceptible to a particular form of bias related to memorizing the average activity associated with each genomic locus. Consequently, the trained model may appear to perform well when evaluated on the genomic loci that it was trained on but tends to perform poorly on loci that it was not trained on. We demonstrate this phenomenon by using epigenomic measurements and nucleotide sequence to predict gene expression and chromatin domain boundaries, and we suggest methods to diagnose and avoid the pitfall. We anticipate that, as more data and computing resources become available, future projects will increasingly risk suffering from this issue.

Link: https://www.biorxiv.org/content/10.1101/512434v1/
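
The pitfall described in the abstract above can be illustrated with a toy "model" that uses no input features and simply memorizes the average training activity at each locus. This is a minimal sketch on synthetic data, assuming each locus has a shared baseline activity plus small cell-type-specific noise. The sizes, noise levels, and variable names are assumptions made for illustration and do not come from the paper.

```python
import random
import statistics

rng = random.Random(0)
N_LOCI, N_CELL_TYPES = 400, 5
HELD_OUT_CELL_TYPE = N_CELL_TYPES - 1  # last cell type is never used in training

# Synthetic activity: a per-locus mean shared across cell types, plus noise.
locus_means = [rng.gauss(0.0, 2.0) for _ in range(N_LOCI)]
activity = [
    [mean + rng.gauss(0.0, 0.5) for _ in range(N_CELL_TYPES)]
    for mean in locus_means
]

train_loci = range(0, 300)        # loci seen during training
unseen_loci = range(300, N_LOCI)  # loci never seen during training

# The "model": memorize the average training-cell-type activity at each locus.
memorized = {
    locus: statistics.fmean(activity[locus][:HELD_OUT_CELL_TYPE])
    for locus in train_loci
}
global_mean = statistics.fmean(memorized.values())


def predict(locus):
    """Return the memorized locus average, or the global mean for unseen loci."""
    return memorized.get(locus, global_mean)


def mse(loci):
    """Mean squared error against the held-out cell type on the given loci."""
    return statistics.fmean(
        (predict(locus) - activity[locus][HELD_OUT_CELL_TYPE]) ** 2
        for locus in loci
    )


mse_seen_loci = mse(train_loci)
mse_unseen_loci = mse(unseen_loci)
print(mse_seen_loci, mse_unseen_loci)
```

In this toy setting the memorizer looks accurate on the held-out cell type at loci it has seen, yet its error is far larger at unseen loci. Comparing a real model against such a locus-average baseline is one way to check whether apparent cross-cell-type performance reflects more than memorization.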

Other Seminars

Past quarters of CSE 590C
COMBI & Genome Sciences Seminars
Biostatistics Seminars
Microbiology Department Seminars

Resources

Molecular Biology for Computer Scientists, a primer by Lawrence Hunter (46 pages)
A comprehensive FAQ at bioinformatics.org, including annotated links to online tutorials and lectures.
CSE 527: Computational Biology
CSEP 527: Computational Biology (Professional Masters Program)
Genome 540/541: Introduction to Computational Molecular Biology: Genome and Protein Sequence Analysis

CSE's Computational Molecular Biology research group
Interdisciplinary Ph.D. program in Computational Molecular Biology


	Computer Science & Engineering University of Washington Box 352350 Seattle, WA 98195-2350 (206) 543-1695 voice, (206) 543-2969 FAX