Note on Electronic Access to Journals
The UW Library is generally a paid subscriber to the non-open-access journals we cite. You can freely access these
articles from on-campus computers. For off-campus access, follow the "[offcampus]" links below or see the
library's "proxy server" instructions. You will be prompted for your UW NetID and password.
04/01: Organizational Meeting
04/08: DNA for digital storage -- Lee Organick
04/15: How do we create, share and execute reproducible bioinformatics workflows? -- Prof. Ka Yee Yeung, UWT
- L-H Hung, J Hu, T Meiss, A Ingersoll, W Lloyd, D Kristiyanto, Y Xiong,
E Sobie, and KY Yeung, "Building containerized workflows using the
BioDepot-workflow-builder (Bwb),"
https://www.biorxiv.org/content/10.1101/099010v2/
- L-H Hung, D Kumanov, X Niu, W Lloyd, KY Yeung, "Rapid RNA sequencing
data analysis using serverless computing,"
https://www.biorxiv.org/content/10.1101/576199v1/
- R Almugbel, LH Hung, J Hu, A Almutairy, N Ortogero, Y Tamta, KY Yeung, "Reproducible Bioconductor workflows using browser-based interactive notebooks and containers." J Am Med Inform Assoc, 25, #1 (2018) 4-12.
[offcampus]
04/22: Cost-Aware AI // TBA -- Gabe Erion // Ian Covert
Two short talks this week:
G. Erion, "CoAI: Cost-Aware Artificial Intelligence for Efficient Prehospital Diagnosis of Trauma Patients"
Abstract: While artificial intelligence (AI) and machine learning (ML) are becoming widely used throughout medicine, the
analysis of the cost of an ML model's predictions, particularly in terms of time or effort, has been very
limited. For example, a model may accurately predict that a trauma patient will have acute traumatic coagulopathy
(ATC), a bleeding disorder, when all available patient features are given; however, it may heavily rely on
hard-to-measure patient features, like blood pressure or the Glasgow Coma Scale score, to do so. Standard ML techniques do
not prioritize features that are fast and easy to acquire, even though rapid acquisition is key to minimizing death and
injury.
This idea, which we refer to as cost-aware prediction, is a topic of recent interest in the field of machine
learning. However, there are substantial limitations to existing methods, and the impact of applying these
methods to clinical settings has not been demonstrated. Our work aims to adopt recent advances in ML and
explainable AI to develop more powerful cost-aware prediction techniques and demonstrate the value of these
methods using clinical data. We show that even simple cost-aware AI methods can provide predictions that are at
least as accurate as existing clinical risk scores and substantially reduce the time and effort required for variable
collection.
I. Covert, "TBA"
04/29: Comparative analysis of single-cell RNA-seq -- Shunfu Mao
05/06: Unsupervised manifold alignment for single-cell genomics -- Dr. Ritambhara Singh, UW
Abstract:
The growing availability of genomics datasets across multiple experiments has spurred data integration efforts that
combine information to achieve more in-depth insights into cellular mechanisms. A primary focus of this effort has
been single-cell genomics data. Advances in single-cell sequencing technologies have allowed scientists to
explore cell-to-cell variation within a cell population. While integrating single-cell data is critical for our
analysis of cell development and diseases, the heterogeneity among cells presents unique challenges to this
task. To address this, we propose an unsupervised manifold alignment method to align the relevant features of the
cells across different experiments. Our central assumption is that the single cells, measured separately in
different experiments, are sampled from a shared space. Therefore, by performing manifold alignment, we attempt to
find a shared latent space where these cell measurements are aligned, thus acting as an in silico co-assay. We
minimize an objective function that uses the Maximum Mean Discrepancy (MMD) to match separate
distributions while maintaining the underlying structure of the data. We apply this method to various single-cell
experiments, e.g., single-cell RNA-seq (gene expression) and ATAC-seq (accessible chromatin regions)
measurements for different time points during cell differentiation. Preliminary results show that our proposed
algorithm can effectively align cells from certain time points across the two experiments in a low-dimensional
latent space.
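The objective in this abstract uses MMD to compare two distributions of cells. As a rough illustration only, and not the speaker's implementation, the biased estimate of squared MMD between two point sets can be computed in a few lines. The Gaussian (RBF) kernel, its bandwidth `sigma`, and the toy point clouds standing in for two experiments' embeddings are all assumptions chosen for the example; the full method additionally learns the latent space in which this quantity is minimized.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]

def rbf_kernel(a: Point, b: Point, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel: k(a, b) = exp(-||a - b||^2 / (2 * sigma^2)). Kernel choice is an assumption."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mean_kernel(us: List[Point], vs: List[Point], sigma: float) -> float:
    """Average kernel value over all pairs (u, v) with u in us and v in vs."""
    total = sum(rbf_kernel(u, v, sigma) for u in us for v in vs)
    return total / (len(us) * len(vs))

def mmd2(xs: List[Point], ys: List[Point], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD: E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]."""
    return (mean_kernel(xs, xs, sigma)
            + mean_kernel(ys, ys, sigma)
            - 2.0 * mean_kernel(xs, ys, sigma))

def gaussian_cloud(n: int, center: float, rng: random.Random, dim: int = 2) -> List[List[float]]:
    """Hypothetical toy data: n points drawn around `center` with unit variance per dimension."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    cells_a = gaussian_cloud(100, 0.0, rng)  # stand-in for one experiment's embedded cells
    cells_b = gaussian_cloud(100, 0.0, rng)  # drawn from the same distribution
    cells_c = gaussian_cloud(100, 3.0, rng)  # drawn from a shifted distribution
    print("same distribution:   ", mmd2(cells_a, cells_b))
    print("shifted distribution:", mmd2(cells_a, cells_c))
```

Two samples from the same distribution should give a value near zero, while the shifted sample should give a clearly larger one; an alignment method would adjust the embeddings so that the cross-experiment value shrinks.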
Useful background reading: http://www-anw.cs.umass.edu/pubs/2011/wang_k_m_11.pdf
05/13: Metabolic network models of prostate cancer using spatial transcriptomics data -- Dr. Yuliang Wang, UW
Abstract:
Metabolic reprogramming is a hallmark of cancer, and there is great need to exploit cancer metabolic
aberrations to develop novel selective therapies. Genome-scale metabolic network models have been used successfully to
model metabolic reprogramming in multiple types of cancer. However, current models are based on bulk gene
expression data and do not consider the spatial heterogeneity in the tumor microenvironment. Recent studies
have clearly demonstrated that spatial heterogeneity is a fundamental feature of the tumor microenvironment.
We performed metabolic network analysis using spatial transcriptomics data of prostate cancer microenvironment and
revealed extensive spatial heterogeneity. We made novel predictions of malignant cell-specific metabolic vulnerabilities in
multiple metabolic pathways that would have been missed by bulk models without spatial information. Some of our
novel predictions can be targeted by existing drugs.
Metabolism is implicated in a wide range of diseases. As more spatially resolved transcriptomics data are generated for
multiple types of cancer and other diseases, the analytical workflow in this study can be applied to reveal novel
metabolic strategies for disease treatment.
05/20: Engineering micro-organisms // TBA -- Erin Wilson // Yue Zhang
Two short talks this week:
05/27: Holiday
06/03: A pitfall for machine learning methods aiming to predict across cell types -- Jacob Schreiber
Authors:
J Schreiber, R Singh, J Bilmes, and WS Noble
Abstract:
Machine learning models that predict phenomena such as gene expression, enhancer activity, transcription
factor binding, or chromatin conformation are most useful when they can generalize to make accurate predictions
across cell types. In this situation, a natural strategy is to train the model on experimental data from some cell
types and evaluate performance on one or more held-out cell types. In this work, we show that, when the training
set contains examples derived from the same genomic loci across multiple cell types, the resulting model can
be susceptible to a particular form of bias related to memorizing the average activity associated with each genomic
locus. Consequently, the trained model may appear to perform well when evaluated on the genomic loci that it was
trained on but tends to perform poorly on loci that it was not trained on. We demonstrate this phenomenon by using
epigenomic measurements and nucleotide sequence to predict gene expression and chromatin domain boundaries, and we
suggest methods to diagnose and avoid the pitfall. We anticipate that, as more data and computing resources become
available, future projects will increasingly risk suffering from this issue.
Link: https://www.biorxiv.org/content/10.1101/512434v1/
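A toy simulation can make this pitfall concrete. The sketch below is hypothetical and is not the authors' code: it assumes each locus has a shared activity level plus cell-type-specific noise, fits a baseline that only memorizes each training locus's average activity across the training cell types, and then evaluates on a held-out cell type twice, once on training loci and once on held-out loci. The simulation, noise levels, split sizes, and function names are all illustrative assumptions.

```python
import random
import statistics
from typing import Dict, List, Tuple

def simulate_activity(n_loci: int, n_cells: int, seed: int = 0) -> List[List[float]]:
    """Hypothetical toy data: activity[locus][cell] = shared locus level + cell-specific noise."""
    rng = random.Random(seed)
    locus_level = [rng.gauss(0.0, 2.0) for _ in range(n_loci)]
    return [[locus_level[l] + rng.gauss(0.0, 1.0) for _ in range(n_cells)]
            for l in range(n_loci)]

def fit_average_activity(activity: List[List[float]], train_loci: List[int],
                         train_cells: List[int]) -> Dict[int, float]:
    """Baseline that memorizes each training locus's mean activity over the training cell types."""
    return {l: statistics.fmean(activity[l][c] for c in train_cells) for l in train_loci}

def mse_on(activity: List[List[float]], model: Dict[int, float], loci: List[int],
           cell: int, fallback: float) -> float:
    """Mean squared error on one cell type; loci never seen in training get a constant fallback."""
    return statistics.fmean((model.get(l, fallback) - activity[l][cell]) ** 2 for l in loci)

def run_demo(seed: int = 0) -> Tuple[float, float]:
    n_loci, n_cells = 2000, 6
    activity = simulate_activity(n_loci, n_cells, seed)
    train_loci, test_loci = list(range(1500)), list(range(1500, n_loci))
    train_cells, test_cell = [0, 1, 2, 3, 4], 5  # cell type 5 is held out
    model = fit_average_activity(activity, train_loci, train_cells)
    fallback = statistics.fmean(model.values())  # global mean for unseen loci
    seen_loci_error = mse_on(activity, model, train_loci, test_cell, fallback)
    unseen_loci_error = mse_on(activity, model, test_loci, test_cell, fallback)
    return seen_loci_error, unseen_loci_error

if __name__ == "__main__":
    seen, unseen = run_demo()
    print(f"held-out cell type, training loci: MSE = {seen:.2f}")
    print(f"held-out cell type, held-out loci: MSE = {unseen:.2f}")
```

Under these assumptions the expected error is roughly 1.2 on training loci (cell-type noise plus the averaging error) but roughly 5 on held-out loci (locus variance plus noise), even though both evaluations use a held-out cell type. Holding out loci as well as cell types, and comparing against an average-activity baseline like this one, is one simple way to check whether a model's apparent cross-cell-type performance comes from memorizing loci.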