Course Info
CSE 590C is a weekly seminar on Readings and Research in
Computational Biology, open to all graduate students in computational,
biological, and mathematical sciences.
Schedule
Papers, etc.
Links to full papers below are often to journals that require a
paid subscription. The UW Library is generally a paid
subscriber, and you can freely access these articles if you do
so from an on-campus computer. For off-campus access,
follow the "[offcampus]" links below or
look at the
library "proxy server" instructions.
You will be prompted for your UW net ID and password once per
session.
Abstract:
The core of every shotgun proteomics analysis pipeline is a function that scores the quality of a match
between an observed fragmentation spectrum and a candidate peptide. The utility of these scores is critically
dependent on their statistical calibration. For a well-calibrated score function, a score of X assigned to one
spectrum is directly comparable to a score of X assigned to a different spectrum. Improving calibration of a score
function across spectra can lead to large improvements in the number of identified spectra at a given statistical
confidence threshold. Score calibration has been carried out previously using empirical curve fitting procedures to
estimate p-values, or with post-processors such as PeptideProphet and Percolator.
This work describes a new method for computing exact p-values for the oldest and one of the most widely used score functions, SEQUEST XCorr. Dynamic programming is used to efficiently compute the full distribution of scores for all possible peptides whose masses are close to that of the spectrum precursor mass. We find that the resulting p-values are valid relative to a widely accepted null model, and that ranking identified spectra by p-value rather than XCorr reduces variance due to spectrum-specific effects on the score. Across a variety of data sets, our XCorr p-value yields significantly more spectrum and peptide identifications at a fixed false discovery rate than other, state-of-the-art methods, including SEQUEST, Mascot, X!Tandem, and Comet, and is competitive with other dynamic programming-based calibration methods like MS-GF+. Strikingly, the improved calibration afforded by our scoring scheme is complementary to that provided by Percolator, so that combination of the two methods yields even better results. Our method is able to take advantage of both high-resolution MS1 and MS2 data.
04/21: Learning Statistical Dependency Structure Among ChIP-seq Tracks -- Scott Lundberg
04/28: Genome annotation of multiple cell types and chromatin architecture using graph-based regularization -- Max Libbrecht
Authors:
Maxwell W. Libbrecht (1), Michael M. Hoffman (2), Ferhat Ay (3), David M. Gilbert (4), Jeffrey A. Bilmes (5), William S. Noble (1,3).
(1) Computer Science & Eng., U Washington; (2) Princess Margaret Cancer Center; (3) Genome Sciences, U Washington; (4) Biological Science, Florida State U; (5) Electrical Eng., U Washington
Abstract: Semi-automated genome annotation algorithms facilitate human interpretation of large, heterogeneous collections of functional genomics data by simultaneously partitioning the human genome and assigning labels to the resulting genomic segments. However, existing methods fail to address two problems related to genome annotation: (1) performing genome annotation in multiple cell types and (2) integrating 3D structure information into the annotation. We propose a single solution to these seemingly different problems using the idea of a pairwise prior, which encourages certain pairs of genomic positions to receive the same label. We developed a novel computational method, called graph-based regularization (GBR), that performs inference in the presence of a pairwise prior. We first use GBR to annotate multiple cell types, transferring via the pairwise prior the information that pairs of genomic loci that received the same label in a reference cell type should be more likely to receive the same label in the cell type in question. We then use GBR to integrate 3D structure information from chromatin conformation assays such as Hi-C. In this case, the pairwise prior encourages positions that are close in 3D to occupy the same type of domain. This approach allows us to annotate the human cell line IMR90 and thereby characterize the ontology of domains, revealing the relationships between Polycomb and constitutively repressed domains, topological domains, and replication domains. Finally, we use annotations over six human cell lines to find sequence elements that mark developmentally-conserved boundaries between domains.
05/05: Analysis of splicing and transcription in RNA-seq experiments -- Daniel Jones
05/12: Models to identify peptides from data-independent acquisition mass spectra -- Alex Hu
05/19: Copy Number Variation in Human Gut Microbial Species -- Sharon Greenblum
05/26: -- Holiday
06/02: AUREA Nebula: Cloud based network analysis -- John Earls
CSE's Computational Molecular Biology research group
Interdisciplinary Ph.D. program in Computational Molecular Biology
Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA 98195-2350. (206) 543-1695 voice, (206) 543-2969 FAX