University of Washington Computer Science & Engineering
  CSE 590C, Sp '14:  Reading & Research in Comp. Bio.

 Course Info    CSE 590C is a weekly seminar on Readings and Research in Computational Biology, open to all graduate students in computational, biological, and mathematical sciences.
When/Where:  Mondays, 3:30 - 4:50, EE1 026 (schematic)
Organizers:  Joe Felsenstein, Su-In Lee, Bill Noble, Larry Ruzzo
Credit: 1-3 Variable
Grading: Credit/No Credit. Talk to the organizers if you are unsure of our expectations.
 Email Course-related announcements and discussions: Manage Your Subscription, List Archives
Biology seminar announcements from all around campus: Manage Your Subscription, List Archives
Computational biology discussions, conference/job postings, etc.: Manage Your Subscription, List Archives
 Date | Presenters/Participants | Topic | Details
03/31 | | ---- Organizational Meeting ---- |
04/07 | Erick Matsen, FHCRC | Substitution and per-residue selection in B cell affinity maturation | Details
04/14 | Jeff Howbert | Computing exact p-values to improve calibration of a cross-correlation shotgun proteomics scoring function. | Details
04/21 | Scott Lundberg | Learning Statistical Dependency Structure Among CHIP-seq Tracks |
04/28 | Max Libbrecht | Genome annotation of multiple cell types and chromatin architecture using graph-based regularization | Details
05/05 | Daniel Jones | Analysis of splicing and transcription in RNA-seq experiments |
05/12 | Alex Hu | Models to identify peptides from data-independent acquisition mass spectra |
05/19 | Sharon Greenblum | Copy Number Variation in Human Gut Microbial Species |
06/02 | John Earls | AUREA Nebula: Cloud based network analysis |

 Papers, etc.

  Note on Electronic Access to Journals

Links to full papers below are often to journals that require a paid subscription. The UW Library is generally a paid subscriber, and you can freely access these articles if you do so from an on-campus computer. For off-campus access, follow the "[offcampus]" links below or look at the library "proxy server" instructions. You will be prompted for your UW net ID and password once per session.  

03/31: ---- Organizational Meeting ----

04/07: Substitution and per-residue selection in B cell affinity maturation -- Erick Matsen, FHCRC

04/14: Computing exact p-values to improve calibration of a cross-correlation shotgun proteomics scoring function. -- Jeff Howbert

    Abstract:   The core of every shotgun proteomics analysis pipeline is a function that scores the quality of a match between an observed fragmentation spectrum and a candidate peptide. The utility of these scores is critically dependent on their statistical calibration. For a well-calibrated score function, a score of X assigned to one spectrum is directly comparable to a score of X assigned to a different spectrum. Improving calibration of a score function across spectra can lead to large improvements in the number of identified spectra at a given statistical confidence threshold. Score calibration has been carried out previously using empirical curve fitting procedures to estimate p-values, or with post-processors such as PeptideProphet and Percolator.

This work describes a new method for computing exact p-values for the oldest and one of the most widely used score functions, SEQUEST XCorr. Dynamic programming is used to efficiently compute the full distribution of scores for all possible peptides whose masses are close to that of the spectrum precursor mass. We find that the resulting p-values are valid relative to a widely accepted null model, and that ranking identified spectra by p-value rather than XCorr reduces variance due to spectrum-specific effects on the score. Across a variety of data sets, our XCorr p-value yields significantly more spectrum and peptide identifications at a fixed false discovery rate than other, state-of-the-art methods, including SEQUEST, Mascot, X!Tandem, and Comet, and is competitive with other dynamic programming-based calibration methods like MS-GF+. Strikingly, the improved calibration afforded by our scoring scheme is complementary to that provided by Percolator, so that combination of the two methods yields even better results. Our method is able to take advantage of both high-resolution MS1 and MS2 data.
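As a rough illustration of the dynamic-programming idea in the abstract above, the sketch below counts every residue sequence of a given total mass by its score and then reads a p-value off that exact distribution. It is a toy under stated assumptions, not the speaker's implementation: masses and scores are small integers, a sequence's score is taken to be the sum of a per-mass `evidence` value at each internal cleavage position, and the names `score_distribution`, `p_value`, `residue_masses`, and `evidence` are all hypothetical.

```python
from collections import defaultdict


def score_distribution(residue_masses, evidence, target_mass):
    """Count residue sequences with total mass `target_mass`, grouped by score.

    Assumptions (illustrative only): masses are positive integers, and a
    sequence's score is the sum of evidence[m] over its prefix masses m,
    excluding the full mass itself.
    """
    # counts[m][s] = number of sequences with total mass m and score s
    counts = [defaultdict(int) for _ in range(target_mass + 1)]
    counts[0][0] = 1  # the empty sequence
    for m in range(1, target_mass + 1):
        # The final mass is the whole peptide, not a cleavage, so it adds nothing.
        gain = evidence[m] if m < target_mass else 0
        for r in residue_masses:
            if r <= m:
                for s, n in counts[m - r].items():
                    counts[m][s + gain] += n
    return dict(counts[target_mass])


def p_value(dist, observed_score):
    """Fraction of all candidate sequences scoring at least `observed_score`."""
    total = sum(dist.values())
    tail = sum(n for s, n in dist.items() if s >= observed_score)
    return tail / total


# Toy alphabet with two "residues" of mass 1 and 2, and evidence indexed by mass.
dist = score_distribution(residue_masses=[1, 2], evidence=[0, 3, 1, 2, 0], target_mass=4)
print(sorted(dist.items()))
print(p_value(dist, 5))  # 2 of the 5 sequences score at least 5, so 0.4
```

Because the table is indexed by mass and score rather than by sequence, the cost grows with the number of mass bins and distinct scores, not with the exponentially many candidate sequences.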

04/21: Learning Statistical Dependency Structure Among CHIP-seq Tracks -- Scott Lundberg

04/28: Genome annotation of multiple cell types and chromatin architecture using graph-based regularization -- Max Libbrecht

    Authors:   Maxwell W. Libbrecht (1), Michael M. Hoffman (2), Ferhat Ay (3), David M. Gilbert (4), Jeffrey A. Bilmes (5), William S. Noble (1,3). (1) Computer Science & Eng., U Washington; (2) Princess Margaret Cancer Center; (3) Genome Sciences, U Washington; (4) Biological Science, Florida State U; (5) Electrical Eng., U Washington

Abstract:   Semi-automated genome annotation algorithms facilitate human interpretation of large, heterogeneous collections of functional genomics data by simultaneously partitioning the human genome and assigning labels to the resulting genomic segments. However, existing methods fail to address two problems related to genome annotation: (1) performing genome annotation in multiple cell types and (2) integrating 3D structure information into the annotation. We propose a single solution to these seemingly different problems using the idea of a pairwise prior, which encourages certain pairs of genomic positions to receive the same label. We developed a novel computational method, called graph-based regularization (GBR), that performs inference in the presence of a pairwise prior. We first use GBR to annotate multiple cell types, transferring via the pairwise prior the information that pairs of genomic loci that received the same label in a reference cell type should be more likely to receive the same label in the cell type in question. We then use GBR to integrate 3D structure information from chromatin conformation assays such as Hi-C. In this case, the pairwise prior encourages positions that are close in 3D to occupy the same type of domain. This approach allows us to annotate the human cell line IMR90 and thereby characterize the ontology of domains, revealing the relationships between Polycomb and constitutively repressed domains, topological domains, and replication domains. Finally, we use annotations over six human cell lines to find sequence elements that mark developmentally-conserved boundaries between domains.
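To make the notion of a pairwise prior concrete, here is a minimal sketch in which each position has its own per-label score and selected pairs of positions earn a bonus for sharing a label. It is not GBR: the optimizer is plain iterated conditional modes (a greedy coordinate-wise update), swapped in only because it is short, and the names `icm_labels`, `unary`, `pairs`, and `weight` are hypothetical.

```python
def icm_labels(unary, pairs, weight, n_iters=10):
    """Assign one label per position with iterated conditional modes.

    unary[i][k]: score for giving position i label k (higher is better).
    pairs:       (i, j) index pairs that the prior encourages to share a label.
    weight:      bonus added for each linked neighbor that has the same label.
    """
    n = len(unary)
    n_labels = len(unary[0])
    neighbors = [[] for _ in range(n)]
    for i, j in pairs:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Start from the best label for each position considered on its own.
    labels = [max(range(n_labels), key=lambda k: unary[i][k]) for i in range(n)]

    for _ in range(n_iters):
        changed = False
        for i in range(n):
            def local_score(k):
                agree = sum(labels[j] == k for j in neighbors[i])
                return unary[i][k] + weight * agree

            best = max(range(n_labels), key=local_score)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # reached a local optimum
            break
    return labels


# Position 1 weakly prefers label 1, but it is linked to position 0,
# which strongly prefers label 0.
unary = [[2.0, 0.0], [0.4, 0.6], [0.0, 2.0]]
print(icm_labels(unary, pairs=[(0, 1)], weight=0.0))  # [0, 1, 1]
print(icm_labels(unary, pairs=[(0, 1)], weight=1.0))  # [0, 0, 1]
```

In the setting the abstract describes, the linked pairs would come from a reference cell type's annotation or from Hi-C contacts; here they are simply supplied by hand.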

05/05: Analysis of splicing and transcription in RNA-seq experiments -- Daniel Jones

05/12: Models to identify peptides from data-independent acquisition mass spectra -- Alex Hu

05/19: Copy Number Variation in Human Gut Microbial Species -- Sharon Greenblum

05/26:   -- Holiday

06/02: AUREA Nebula: Cloud based network analysis -- John Earls

 Other  Seminars Past quarters of CSE 590C
COMBI & Genome Sciences Seminars
Biostatistics Seminars
Microbiology Department Seminars

 Resources Molecular Biology for Computer Scientists, a primer by Lawrence Hunter (46 pages)
A Quick Introduction to Elements of Biology, a primer by Alvis Brazma et al.
A very comprehensive FAQ at, including annotated references to online tutorials and lectures.
CSE 527: Computational Biology
CSEP 590A: Computational Biology (Professional Masters Program)
Genome 540/541: Introduction to Computational Molecular Biology: Genome and Protein Sequence Analysis

CSE's Computational Molecular Biology research group
Interdisciplinary Ph.D. program in Computational Molecular Biology

Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX