University of Washington Computer Science & Engineering
  CSE 590C, Sp '22:  Reading & Research in Comp. Bio.

 Course Info    CSE 590C is a weekly seminar on Readings and Research in Computational Biology, open to all graduate students in computational, biological, and mathematical sciences.
When/Where: Mondays, 3:30 - 4:50, DEN 259 (room info)
Organizers: Su-In Lee, Wouter Meuleman, Sara Mostafavi, Bill Noble, Larry Ruzzo, Sheng Wang
Credit: 1-3 Variable
Grading: Credit/No Credit. Talk to the organizers if you are unsure of our expectations.
 Email Course-related announcements and discussions
  Biology seminar announcements from all around campus
  Computational biology discussions, conference/job postings, etc.
 Theme Traditionally, we reserve Spring quarter for "homegrown" research --- highlights of work by researchers in the Seattle area. Our tentative Spring schedule is:
 Videos Video recordings of all Zoom presentations are here.
 Date | Presenters/Participants | Topic | Details
 03/28 | ---- Organizational Meeting ----
 04/04 | No Meeting
 04/11 | Nicasia Beebe-Wang | AI framework uncovers relationships between gene expression and Alzheimer's disease | Details
 04/18 | Danny E. Miller | Long-read sequencing to identify missing disease-causing variation | Details
 04/25 | Melih Yilmaz | ML for de novo mass spec peptide sequencing | Details
 05/02 | Pascal Sturmfels | Learning inverse folding | Details; Slides
 05/09 | No Meeting
 05/16 | Alyssa La Fleur | Interpreting neural networks for biological sequences by learning stochastic masks | Details
 05/23 | Wouter Meuleman, Altius Institute | Large-scale genomic data integration and visualization: towards Augmented Genomics | Details; Chat
 Papers, etc.

  Note on Electronic Access to Journals

The UW Library is generally a paid subscriber to non-open-access journals we cite. You can freely access these articles from on-campus computers. For off-campus access, follow the "[offcampus]" links below or look at the library "proxy server" instructions. You will be prompted for your UW net ID and password.  

03/28: ---- Organizational Meeting ----

04/04: No Meeting

04/11: AI framework uncovers relationships between gene expression and Alzheimer's disease -- Nicasia Beebe-Wang

  • N Beebe-Wang, S Celik, E Weinberger, P Sturmfels, PL De Jager, S Mostafavi, SI Lee, "Unified AI framework to uncover deep interrelationships between gene expression and Alzheimer's disease neuropathologies." Nat Commun, 12, #1 (2021) 5369. [offcampus]

04/18: Long-read sequencing to identify missing disease-causing variation -- Danny E. Miller
Two relevant papers:

  • DE Miller, A Sulovari, T Wang, H Loucks, K Hoekzema, KM Munson, AP Lewis, EPA Fuerte, CR Paschal, T Walsh, J Thies, JT Bennett, I Glass, KM Dipple, K Patterson, and 36 others, "Targeted long-read sequencing identifies missing disease-causing variation." Am J Hum Genet, 108, #8 (2021) 1436-1449. [offcampus]

04/25: ML for de novo mass spec peptide sequencing -- Melih Yilmaz

  • Melih Yilmaz, William E. Fondrie, Wout Bittremieux, Sewoong Oh, William Stafford Noble, "De novo mass spectrometry peptide sequencing with a transformer model."

    Abstract Tandem mass spectrometry is the only high-throughput method for analyzing the protein content of complex biological samples and is thus the primary technology driving the growth of the field of proteomics. A key outstanding challenge in this field involves identifying the sequence of amino acids--the peptide--responsible for generating each observed spectrum, without making use of prior knowledge in the form of a peptide sequence database. Although various machine learning methods have been developed to address this de novo sequencing problem, challenges that arise when modeling tandem mass spectra have led to complex models that combine multiple neural networks and post-processing steps. We propose a simple yet powerful method for de novo peptide sequencing, Casanovo, that uses a transformer framework to map directly from a sequence of observed peaks (a mass spectrum) to a sequence of amino acids (a peptide). Our experiments show that Casanovo achieves state-of-the-art performance on a benchmark dataset using a standard cross-species evaluation framework which involves testing with out-of-distribution samples, i.e., spectra with never-before-seen peptide labels. Casanovo not only achieves superior performance but does so at a fraction of the model complexity and inference time required by other methods.

05/02: Learning inverse folding -- Pascal Sturmfels

05/09: No Meeting

05/16: Interpreting neural networks for biological sequences by learning stochastic masks -- Alyssa La Fleur

  • Johannes Linder, Alyssa La Fleur, Zibo Chen, Ajasja Ljubetic, David Baker, Sreeram Kannan, and Georg Seelig, "Interpreting neural networks for biological sequences by learning stochastic masks." Nat Mach Intell 4, 41-54 (2022).

    Abstract Sequence-based neural networks can learn to make accurate predictions from large biological datasets, but model interpretation remains challenging. Many existing feature attribution methods are optimized for continuous rather than discrete input patterns and assess individual feature importance in isolation, making them ill-suited for interpreting nonlinear interactions in molecular sequences. Here, building on work in computer vision and natural language processing, we developed an approach based on deep learning--scrambler networks--wherein the most important sequence positions are identified with learned input masks. Scramblers learn to predict position-specific scoring matrices where unimportant nucleotides or residues are scrambled by raising their entropy. We apply scramblers to interpret the effects of genetic variants, uncover nonlinear interactions between cis-regulatory elements, explain binding specificity for protein-protein interactions, and identify structural determinants of de novo-designed proteins. We show that scramblers enable efficient attribution across large datasets and result in high-quality explanations, often outperforming state-of-the-art methods. [offcampus]. Also:

05/23: Large-scale genomic data integration and visualization: towards Augmented Genomics -- Wouter Meuleman, Altius Institute

Abstract: Although technological developments have made it possible to construct rich genome-wide datasets measuring a variety of biological phenomena across hundreds of human cellular conditions, the scale and complexity of such data preclude their routine use. We develop computational and machine learning approaches to reduce their complexity, while maximally retaining relevant information. Our long-term research goal is to make Augmented Genomics a reality: a new field in which the work of genome scientists is supplemented -- not replaced! -- by large-scale visualization and data-driven machine intelligence. I'll present our current vision for this field, along with a number of directions we are working in.

05/30: Holiday

 Other Seminars Past quarters of CSE 590C
COMBI & Genome Sciences Seminars
 Resources Molecular Biology for Computer Scientists, a primer by Lawrence Hunter (46 pages)
A comprehensive FAQ, including annotated links to online tutorials and lectures.
CSE 527: Computational Biology
CSEP 527: Computational Biology (Professional Masters Program)
Genome 540/541: Introduction to Computational Molecular Biology: Genome and Protein Sequence Analysis

CSE's Computational Molecular Biology research group
Interdisciplinary Ph.D. program in Computational Molecular Biology

Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX