CSE 590C, Sp '22: Reading & Research in Comp. Bio.

University of Washington Computer Science & Engineering

Course Info

CSE 590C is a weekly seminar on Readings and Research in Computational Biology, open to all graduate students in computational, biological, and mathematical sciences.

When/Where: Mondays, 3:30 - 4:50, DEN 259 (room info)

Organizers: Su-In Lee, Wouter Meuleman, Sara Mostafavi, Bill Noble, Larry Ruzzo, Sheng Wang

Credit: 1-3 Variable

Grading: Credit/No Credit. Talk to the organizers if you are unsure of our expectations.

Email

cse590cb@cs.washington.edu Course-related announcements and discussions
Manage Your Subscription; List Archives

compbio-seminars@cs.washington.edu Biology seminar announcements from all around campus
Manage Your Subscription; List Archives

compbio-group@cs.washington.edu Computational biology discussions, conference/job postings, etc.
Manage Your Subscription; List Archives

Theme

Traditionally, we reserve Spring quarter for "homegrown" research --- highlights of work by researchers in the Seattle area. Our tentative Spring schedule is:

Videos

Video recordings of all Zoom presentations are here.

Schedule

Date | Presenters/Participants | Topic | Details
03/28 | ---- Organizational Meeting ----
04/04 | No Meeting
04/11 | Nicasia Beebe-Wang | AI framework uncovers relationships between gene expression and Alzheimer's disease | Details
04/18 | Danny E. Miller | Long-read sequencing to identify missing disease-causing variation | Details
04/25 | Melih Yilmaz | ML for de novo mass spec peptide sequencing | Details
05/02 | Pascal Sturmfels | Learning inverse folding | Details; Slides
05/09 | No Meeting
05/16 | Alyssa La Fleur | Interpreting neural networks for biological sequences by learning stochastic masks | Details
05/23 | Wouter Meuleman, Altius Institute | Large-scale genomic data integration and visualization: towards Augmented Genomics | Details; Chat
05/30 | Holiday

Papers, etc.
Note on Electronic Access to Journals
The UW Library is generally a paid subscriber to non-open-access journals we cite. You can freely access these articles from on-campus computers. For off-campus access, follow the "[offcampus]" links below or look at the library "proxy server" instructions. You will be prompted for your UW net ID and password.

03/28: ---- Organizational Meeting ----

04/04: No Meeting

04/11: AI framework uncovers relationships between gene expression and Alzheimer's disease -- Nicasia Beebe-Wang

N Beebe-Wang, S Celik, E Weinberger, P Sturmfels, PL De Jager, S Mostafavi, SI Lee, "Unified AI framework to uncover deep interrelationships between gene expression and Alzheimer's disease neuropathologies." Nat Commun, 12, #1 (2021) 5369. [offcampus]

04/18: Long-read sequencing to identify missing disease-causing variation -- Danny E. Miller
Two relevant papers:
GA Logsdon, MR Vollger, EE Eichler, "Long-read human genome sequencing and its applications." Nat Rev Genet, 21, #10 (2020) 597-614. [offcampus]

DE Miller, A Sulovari, T Wang, H Loucks, K Hoekzema, KM Munson, AP Lewis, EPA Fuerte, CR Paschal, T Walsh, J Thies, JT Bennett, I Glass, KM Dipple, K Patterson, et al. (36 additional authors), "Targeted long-read sequencing identifies missing disease-causing variation." Am J Hum Genet, 108, #8 (2021) 1436-1449. [offcampus]

04/25: ML for de novo mass spec peptide sequencing -- Melih Yilmaz

Melih Yilmaz, William E. Fondrie, Wout Bittremieux, Sewoong Oh, William Stafford Noble, "De novo mass spectrometry peptide sequencing with a transformer model."
Abstract: Tandem mass spectrometry is the only high-throughput method for analyzing the protein content of complex biological samples and is thus the primary technology driving the growth of the field of proteomics. A key outstanding challenge in this field involves identifying the sequence of amino acids--the peptide--responsible for generating each observed spectrum, without making use of prior knowledge in the form of a peptide sequence database. Although various machine learning methods have been developed to address this de novo sequencing problem, challenges that arise when modeling tandem mass spectra have led to complex models that combine multiple neural networks and post-processing steps. We propose a simple yet powerful method for de novo peptide sequencing, Casanovo, that uses a transformer framework to map directly from a sequence of observed peaks (a mass spectrum) to a sequence of amino acids (a peptide). Our experiments show that Casanovo achieves state-of-the-art performance on a benchmark dataset using a standard cross-species evaluation framework which involves testing with out-of-distribution samples, i.e., spectra with never-before-seen peptide labels. Casanovo not only achieves superior performance but does so at a fraction of the model complexity and inference time required by other methods.
https://www.biorxiv.org/content/10.1101/2022.02.07.479481v1

05/02: Learning inverse folding -- Pascal Sturmfels

Chloe Hsu, Robert Verkuil, Jason Liu, Zeming Lin, Brian Hie, Tom Sercu, Adam Lerer, and Alexander Rives, "Learning inverse folding from millions of predicted structures." https://www.biorxiv.org/content/10.1101/2022.04.10.487779v1

05/09: No Meeting

05/16: Interpreting neural networks for biological sequences by learning stochastic masks -- Alyssa La Fleur

Johannes Linder, Alyssa La Fleur, Zibo Chen, Ajasja Ljubetic, David Baker, Sreeram Kannan, and Georg Seelig, "Interpreting neural networks for biological sequences by learning stochastic masks." Nat Mach Intell 4, 41-54 (2022).
Abstract: Sequence-based neural networks can learn to make accurate predictions from large biological datasets, but model interpretation remains challenging. Many existing feature attribution methods are optimized for continuous rather than discrete input patterns and assess individual feature importance in isolation, making them ill-suited for interpreting nonlinear interactions in molecular sequences. Here, building on work in computer vision and natural language processing, we developed an approach based on deep learning--scrambler networks--wherein the most important sequence positions are identified with learned input masks. Scramblers learn to predict position-specific scoring matrices where unimportant nucleotides or residues are scrambled by raising their entropy. We apply scramblers to interpret the effects of genetic variants, uncover nonlinear interactions between cis-regulatory elements, explain binding specificity for protein-protein interactions, and identify structural determinants of de novo-designed proteins. We show that scramblers enable efficient attribution across large datasets and result in high-quality explanations, often outperforming state-of-the-art methods.
https://doi.org/10.1038/s42256-021-00428-6 [offcampus]. Also: https://www.biorxiv.org/content/10.1101/2021.04.29.441979v1

05/23: Large-scale genomic data integration and visualization: towards Augmented Genomics -- Wouter Meuleman, Altius Institute

Abstract: Although technological developments have made it possible to construct rich genome-wide datasets measuring a variety of biological phenomena across hundreds of human cellular conditions, the scale and complexity preclude routine utility of such data. We develop computational and machine learning approaches to reduce their complexity, while maximally retaining relevant information. Our long-term research goal is to make Augmented Genomics a reality: a new field in which the work of genome scientists is supplemented -- not replaced! -- by large-scale visualization and data-driven machine intelligence. I'll present our current vision for this field, along with a number of directions we are working in.

05/30: Holiday

Other Seminars

Past quarters of CSE 590C
COMBI & Genome Sciences Seminars

Resources

Molecular Biology for Computer Scientists, a primer by Lawrence Hunter (46 pages)
A comprehensive FAQ at bioinformatics.org, including annotated links to online tutorials and lectures.
CSE 527: Computational Biology
CSEP 527: Computational Biology (Professional Masters Program)
Genome 540/541: Introduction to Computational Molecular Biology: Genome and Protein Sequence Analysis

CSE's Computational Molecular Biology research group
Interdisciplinary Ph.D. program in Computational Molecular Biology

Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA 98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX