University of Washington Computer Science & Engineering
 Computational Biology Capstone: Project

Correlating Genomic Annotations

Project goals

  1. Software design goal: a pipeline that takes as input a genome annotation table A and, for each feasible Genome Browser table B, determines whether A and B are more correlated than expected by chance.
  2. Analysis goal: apply the pipeline to the extremely conserved elements to help understand the function of such extreme conservation.
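The first goal can be pictured as a small driver loop. This is a minimal sketch, not the course's software. It assumes each table has already been parsed into an in-memory object and that `tables_b` holds only tables already judged feasible. It also assumes the correlation statistic and the null-model randomizer are passed in as functions. The names `run_pipeline`, `empirical_p_value`, `statistic`, and `randomize` are hypothetical. Keeping those two pieces pluggable is one way to allow later extensions, such as replacing an overlap statistic with an average-value statistic, without changing the driver.

```python
import random
from typing import Any, Callable, Dict, List, Tuple


def empirical_p_value(observed: float, null_stats: List[float]) -> float:
    """One-sided Monte Carlo p-value: fraction of null draws at least as large.

    The +1 in numerator and denominator (a common convention, assumed here)
    counts the observed data as one draw, so the estimate is never exactly 0.
    """
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


def run_pipeline(
    table_a: Any,
    tables_b: Dict[str, Any],
    statistic: Callable[[Any, Any], float],
    randomize: Callable[[Any, random.Random], Any],
    n_trials: int = 1000,
    seed: int = 0,
) -> Dict[str, Tuple[float, float]]:
    """Return {table name: (observed statistic, p-value)} for each table B."""
    rng = random.Random(seed)  # one seeded generator keeps runs reproducible
    results = {}
    for name, table_b in tables_b.items():
        observed = statistic(table_a, table_b)
        # Table B stays fixed; only table A is randomized under the null.
        null_stats = [
            statistic(randomize(table_a, rng), table_b) for _ in range(n_trials)
        ]
        results[name] = (observed, empirical_p_value(observed, null_stats))
    return results
```

The test is one-sided because the goal asks whether A and B are *more* correlated than expected by chance.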

Phase Alpha restrictions

We start with a simplified version of the project, but design the software so that it is easy to extend later.
  1. The input genome annotation table A consists of unmarked segments given by the extremely conserved elements.
  2. Genome Browser table B is feasible if a set of unmarked segments can be extracted from it.
  3. We will only look at tables from the hg19 human genome assembly.
  4. Correlation of A and B is measured by the number of base pairs in which they overlap.
  5. The p-value of this overlap is estimated by Monte Carlo simulation. Table B is kept unchanged. For table A, the segment and intersegment lengths are preserved, but the positions of the segments and intersegments are randomized, as described in "Statistical Tests", section "US-US Overlap", subsection "Null Hypothesis 3".
  6. There will be one such analysis per human chromosome.
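Items 4 and 5 can be illustrated for a single chromosome with the following sketch. It is a minimal illustration under stated assumptions, not the course's implementation. Segments are assumed to be `(start, end)` pairs in 0-based, half-open coordinates, and the segments of table A are assumed not to overlap one another. Overlapping segments of table B are merged first so that no base is counted twice. The randomization shuffles the segment lengths and the intersegment (gap) lengths independently, then lays them back out along the chromosome. This is one plausible reading of "Null Hypothesis 3", not a transcription of it. All function names are hypothetical. Calling `overlap_p_value` once per chromosome gives the per-chromosome analyses of item 6.

```python
import random
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, end), 0-based, half-open


def merge(segments: List[Segment]) -> List[Segment]:
    """Sort segments and merge any that overlap or touch."""
    merged: List[List[int]] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def overlap_bp(a: List[Segment], b: List[Segment]) -> int:
    """Number of base pairs covered by both tables (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever segment ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def shuffle_segments(a: List[Segment], chrom_len: int, rng: random.Random) -> List[Segment]:
    """Randomize A's segment positions, keeping segment and gap lengths."""
    segs = sorted(a)
    seg_lens = [e - s for s, e in segs]
    gaps, prev_end = [], 0
    for s, e in segs:
        gaps.append(s - prev_end)  # gap before each segment
        prev_end = e
    gaps.append(chrom_len - prev_end)  # trailing gap up to the chromosome end
    rng.shuffle(seg_lens)
    rng.shuffle(gaps)
    out, pos = [], 0
    # zip stops after the last segment; the one leftover gap becomes the trailing gap.
    for seg_len, gap in zip(seg_lens, gaps):
        pos += gap
        out.append((pos, pos + seg_len))
        pos += seg_len
    return out


def overlap_p_value(a, b, chrom_len, n_trials=1000, seed=0):
    """Observed overlap and its one-sided Monte Carlo p-value (B held fixed)."""
    rng = random.Random(seed)
    observed = overlap_bp(a, b)
    exceed = sum(
        overlap_bp(shuffle_segments(a, chrom_len, rng), b) >= observed
        for _ in range(n_trials)
    )
    return observed, (exceed + 1) / (n_trials + 1)
```

Because the shuffled lengths always sum to the chromosome length, every randomized copy of A fits exactly on the chromosome.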

Phase Beta restrictions

  1. The input genome annotation table A consists of unmarked segments given by the extremely conserved elements.
  2. Genome Browser table B is feasible if it can be treated as a partial function from genomic positions to real numbers. This includes tables of type wig, bigWig, and bedGraph.
  3. We will only look at tables from the hg19 human genome assembly.
  4. Correlation of A and B is measured by the average value of the table B function over positions in table A segments.
  5. The p-value of this average is estimated by Monte Carlo simulation. Table B is kept unchanged. For table A, the segment and intersegment lengths are preserved, but the positions of the segments and intersegments are randomized.
  6. There will be one such analysis per human chromosome.
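The Phase Beta statistic in item 4 can be sketched in the same hedged spirit. The sketch assumes table B has already been converted to sorted, non-overlapping `(start, end, value)` runs in 0-based, half-open coordinates, as in a bedGraph file. It also assumes that the segments of table A do not overlap. Because B is only a partial function, the sketch leaves positions where B is undefined out of the average rather than counting them as zero. The section does not say how such positions should be handled, so this choice is an assumption. `mean_over_segments` is a hypothetical name. The nested loop is quadratic, so a two-pointer sweep would be preferable for whole-chromosome tables. Only the statistic changes from Phase Alpha. The randomization of table A in item 5 is the same.

```python
import math
from typing import List, Tuple

Segment = Tuple[int, int]        # (start, end), 0-based, half-open
Signal = Tuple[int, int, float]  # (start, end, value): B's value on a run of positions


def mean_over_segments(a: List[Segment], b: List[Signal]) -> float:
    """Average of B's value over positions in A's segments where B is defined.

    Returns NaN if B is undefined on every position covered by A.
    """
    total = 0.0
    covered = 0
    for a_start, a_end in a:
        for b_start, b_end, value in b:
            lo, hi = max(a_start, b_start), min(a_end, b_end)
            if lo < hi:  # this run of B overlaps the A segment
                total += value * (hi - lo)
                covered += hi - lo
    return total / covered if covered else math.nan
```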


Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to cse428-owner]