| Course Info
CSE 590C is a weekly seminar on Readings and Research in
Computational Biology, open to all graduate students in computational,
biological, and mathematical sciences.
| Papers, etc.
Note on Electronic Access to Journals
Links to full papers below are often to journals that require a
paid subscription. The UW Library is generally a paid
subscriber, and you can freely access these articles if you do
so from an on-campus computer. For off-campus access,
follow the "[offcampus]" links below or look at the
library "proxy server" instructions.
You will be prompted for your UW net ID and password once per
03/31: ---- Organizational Meeting ----
04/07: Substitution and per-residue selection in B cell affinity maturation -- Erick Matsen, FHCRC
04/14: Computing exact p-values to improve calibration of a cross-correlation shotgun proteomics scoring function -- Jeff Howbert
The core of every shotgun proteomics analysis pipeline is a function that scores the quality of a match
between an observed fragmentation spectrum and a candidate peptide. The utility of these scores is critically
dependent on their statistical calibration. For a well-calibrated score function, a score of X assigned to one
spectrum is directly comparable to a score of X assigned to a different spectrum. Improving calibration of a score
function across spectra can lead to large improvements in the number of identified spectra at a given statistical
confidence threshold. Score calibration has been carried out previously using empirical curve fitting procedures to
estimate p-values, or with post-processors such as PeptideProphet and Percolator.
This work describes a new method for computing exact p-values for the oldest and one of the most widely used score
functions, SEQUEST XCorr. Dynamic programming is used to efficiently compute the full distribution of scores for all
possible peptides whose masses are close to that of the spectrum precursor mass. We find that the resulting p-values
are valid relative to a widely accepted null model, and that ranking identified spectra by p-value rather than XCorr
reduces variance due to spectrum-specific effects on the score. Across a variety of data sets, our XCorr p-value
yields significantly more spectrum and peptide identifications at a fixed false discovery rate than other
state-of-the-art methods, including SEQUEST, Mascot, X!Tandem, and Comet, and is competitive with other dynamic
programming-based calibration methods like MS-GF+. Strikingly, the improved calibration afforded by our scoring
scheme is complementary to that provided by Percolator, so that combination of the two methods yields even better
results. Our method is able to take advantage of both high-resolution MS1 and MS2 data.
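The dynamic programming idea in this abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes integer residue masses, a hypothetical integerized `evidence` vector giving the score contribution of a prefix fragment at each integer mass, and an exact precursor mass match rather than a mass tolerance window. It counts every residue sequence by (mass, score) and then reads a p-value off the resulting score distribution.

```python
from collections import defaultdict


def score_distribution(evidence, residue_masses, target_mass):
    """Count residue sequences of total mass target_mass by integer score.

    evidence[m] is a hypothetical integer score contribution for a prefix
    fragment of integer mass m. residue_masses must be positive integers.
    Returns a dict mapping score -> number of sequences with that score.
    """
    # counts[m][s] = number of sequences with total mass m and score s
    counts = [defaultdict(int) for _ in range(target_mass + 1)]
    counts[0][0] = 1  # empty prefix: mass 0, score 0
    for m in range(1, target_mass + 1):
        # The complete peptide is not itself a prefix fragment,
        # so it adds no evidence in this toy model.
        gain = evidence[m] if m < target_mass else 0
        for r in residue_masses:
            prev = m - r
            if prev < 0:
                continue
            for s, n in counts[prev].items():
                counts[m][s + gain] += n
    return dict(counts[target_mass])


def p_value(dist, observed):
    """Fraction of candidate sequences scoring at least `observed`."""
    total = sum(dist.values())
    if total == 0:
        raise ValueError("no sequence has the requested mass")
    tail = sum(n for s, n in dist.items() if s >= observed)
    return tail / total


# Toy alphabet with residue masses 1 and 2, precursor mass 3.
dist = score_distribution([0, 1, 2, 0], [1, 2], 3)
print(dist, p_value(dist, 2))
```

Because the table is indexed by mass and score rather than by sequence, the cost grows with the number of mass bins and distinct scores, not with the number of candidate peptides.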
04/21: Learning Statistical Dependency Structure Among ChIP-seq Tracks -- Scott Lundberg
04/28: Genome annotation of multiple cell types and chromatin architecture using graph-based regularization -- Max Libbrecht
Maxwell W. Libbrecht (1), Michael M. Hoffman (2), Ferhat Ay (3), David M. Gilbert (4), Jeffrey A. Bilmes (5), William S. Noble (1,3).
(1) Computer Science & Eng., U Washington; (2) Princess Margaret Cancer Center; (3) Genome Sciences, U Washington; (4) Biological Science, Florida State U; (5) Electrical Eng., U Washington
Semi-automated genome annotation algorithms facilitate human interpretation of large, heterogeneous collections of functional genomics data by simultaneously partitioning the human genome and assigning labels to the resulting genomic segments. However, existing methods fail to address two problems related to genome annotation: (1) performing genome annotation in multiple cell types and (2) integrating 3D structure information into the annotation. We propose a single solution to these seemingly different problems using the idea of a pairwise prior, which encourages certain pairs of genomic positions to receive the same label. We developed a novel computational method, called graph-based regularization (GBR), that performs inference in the presence of a pairwise prior. We first use GBR to annotate multiple cell types, transferring via the pairwise prior the information that pairs of genomic loci that received the same label in a reference cell type should be more likely to receive the same label in the cell type in question. We then use GBR to integrate 3D structure information from chromatin conformation assays such as Hi-C. In this case, the pairwise prior encourages positions that are close in 3D to occupy the same type of domain. This approach allows us to annotate the human cell line IMR90 and thereby characterize the ontology of domains, revealing the relationships between Polycomb and constitutively repressed domains, topological domains, and replication domains. Finally, we use annotations over six human cell lines to find sequence elements that mark developmentally conserved boundaries between domains.
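To make the notion of a pairwise prior concrete, here is a small sketch. It is not GBR itself: it swaps in a much simpler technique, iterated conditional modes, on a toy objective that adds a bonus `w` whenever two linked positions share a label. The names `unary`, `pairs`, and `icm_labels` are hypothetical and chosen only for illustration.

```python
def icm_labels(unary, pairs, n_iter=10):
    """Greedy labeling under a pairwise agreement bonus (toy stand-in for GBR).

    unary[i][k] is the score of label k at position i.
    pairs is a list of (i, j, w): positions i and j earn bonus w
    when they receive the same label.
    """
    n = len(unary)
    n_labels = len(unary[0])
    # Start from the best label for each position taken on its own.
    labels = [max(range(n_labels), key=lambda k: u[k]) for u in unary]
    neighbors = [[] for _ in range(n)]
    for i, j, w in pairs:
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))
    for _ in range(n_iter):
        changed = False
        for i in range(n):
            def total(k):
                # Unary score plus bonuses from neighbors currently labeled k.
                bonus = sum(w for j, w in neighbors[i] if labels[j] == k)
                return unary[i][k] + bonus
            best = max(range(n_labels), key=total)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break  # local optimum reached
    return labels


unary = [[2.0, 0.0], [0.4, 0.5], [0.0, 2.0]]
print(icm_labels(unary, []))
print(icm_labels(unary, [(0, 1, 1.0)]))
```

In the second call, the weakly decided middle position is pulled toward the label of its confidently labeled partner, which is the qualitative effect the abstract attributes to the pairwise prior.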
05/05: Analysis of splicing and transcription in RNA-seq experiments -- Daniel Jones
05/12: Models to identify peptides from data-independent acquisition mass spectra -- Alex Hu
05/19: Copy Number Variation in Human Gut Microbial Species -- Sharon Greenblum
05/26: ---- Holiday ----
06/02: AUREA Nebula: Cloud based network analysis -- John Earls