Correlating Genomic Annotations
Project goals
- Software design goal: a pipeline that takes as input a genome
annotation table A and, for each feasible Genome Browser table B,
determines whether A and B are more correlated than expected by
chance.
- Analysis goal: apply the pipeline to the extremely conserved
elements in an effort to understand the function of such extreme
conservation.
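The pipeline in the software design goal can be sketched as a loop over tables B with a pluggable statistic and a pluggable null model, which also keeps later phases easy to add. This is a minimal sketch, not the project's implementation: the names `run_pipeline` and `monte_carlo_p_value` are hypothetical, `tables_b` is assumed to already contain only feasible tables, and the one-sided (upper-tail) test with an add-one correction is our choice for "more correlated than expected by chance". One call would be made per chromosome.

```python
import random
from typing import Callable, Dict, Tuple, TypeVar

A = TypeVar("A")  # in-memory form of table A (hypothetical, e.g. a list of segments)
B = TypeVar("B")  # in-memory form of one feasible table B (hypothetical)


def monte_carlo_p_value(observed: float,
                        simulate: Callable[[random.Random], float],
                        n_trials: int = 1000,
                        seed: int = 0) -> float:
    """Upper-tail empirical p-value with an add-one correction (never exactly 0)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_trials) if simulate(rng) >= observed)
    return (hits + 1) / (n_trials + 1)


def run_pipeline(table_a: A,
                 tables_b: Dict[str, B],
                 statistic: Callable[[A, B], float],
                 null_model: Callable[[A, random.Random], A],
                 n_trials: int = 1000) -> Dict[str, Tuple[float, float]]:
    """For each table B: observed statistic and its Monte Carlo p-value."""
    results = {}
    for name, table_b in tables_b.items():
        observed = statistic(table_a, table_b)

        def simulate(rng: random.Random, table_b: B = table_b) -> float:
            # Only A is randomized; table B is kept fixed, as in both phases.
            return statistic(null_model(table_a, rng), table_b)

        results[name] = (observed, monte_carlo_p_value(observed, simulate, n_trials))
    return results
```

Swapping in a different statistic or null model (for example, moving from Phase Alpha to Phase Beta) then changes only the two functions passed in, not the loop.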
Phase Alpha restrictions
We start with a simplified version of the project, but design the
software so that it can easily be extended later.
- The input genome annotation table A consists of unmarked segments
given by the extremely conserved elements.
- Genome Browser table B is feasible if a set of unmarked
segments can be extracted from it.
- We will only look at tables from the hg19 human genome assembly.
- Correlation of A and B is measured by the number of base pairs by
which they overlap.
- The p-value of this overlap will be determined by Monte Carlo
simulation that keeps table B unchanged and preserves the segment and
intersegment lengths of table A, but randomizes the positions of the
segments and intersegments of table A, as described in statistical
tests, "US-US Overlap" section, "Null Hypothesis 3" subsection.
- There will be one such analysis per human chromosome.
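The Phase Alpha statistic and null model can be sketched in Python for a single chromosome. This is a minimal sketch under assumptions not stated above: segments are 0-based half-open [start, end) intervals on one chromosome of known length; table B's segments are merged into their union so each base pair is counted once; and "randomizing the positions of the segments and intersegments" is read as independently shuffling the segment lengths and the gap lengths (including both chromosome-end gaps) and laying them out again. That is one reading of Null Hypothesis 3, not a transcription of it. All function names are hypothetical.

```python
import random
from typing import List, Tuple

Segment = Tuple[int, int]  # half-open [start, end), 0-based, one chromosome (assumption)


def merge(segments: List[Segment]) -> List[Segment]:
    """Union of possibly overlapping segments, returned sorted and non-overlapping."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_bp(a: List[Segment], b: List[Segment]) -> int:
    """Base pairs covered by both a and b (both sorted and non-overlapping)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        if a[i][1] < b[j][1]:  # advance whichever segment ends first
            i += 1
        else:
            j += 1
    return total


def shuffle_segments(a: List[Segment], chrom_len: int, rng: random.Random) -> List[Segment]:
    """Keep segment and gap lengths, shuffle their order (assumed reading of NH3)."""
    seg_lens = [end - start for start, end in a]
    gaps, prev = [], 0
    for start, end in a:
        gaps.append(start - prev)
        prev = end
    gaps.append(chrom_len - prev)  # k segments have k + 1 gaps
    rng.shuffle(seg_lens)
    rng.shuffle(gaps)
    out, pos = [], 0
    for seg_len, gap in zip(seg_lens, gaps):  # the leftover gap fills the chromosome end
        pos += gap
        out.append((pos, pos + seg_len))
        pos += seg_len
    return out


def alpha_p_value(a: List[Segment], b: List[Segment], chrom_len: int,
                  n_trials: int = 1000, seed: int = 0) -> Tuple[int, float]:
    """Observed overlap in bp and upper-tail Monte Carlo p-value; B stays fixed."""
    a, b = merge(a), merge(b)
    observed = overlap_bp(a, b)
    rng = random.Random(seed)
    hits = sum(
        overlap_bp(shuffle_segments(a, chrom_len, rng), b) >= observed
        for _ in range(n_trials)
    )
    return observed, (hits + 1) / (n_trials + 1)
```

Calling `alpha_p_value` once per chromosome, with that chromosome's hg19 length, matches the one-analysis-per-chromosome restriction.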
Phase Beta restrictions
- The input genome annotation table A consists of unmarked segments
given by the extremely conserved elements.
- Genome Browser table B is feasible if it can be treated as a partial
function from genomic positions to real numbers. This includes
tables of type wig, bigWig, and bedGraph.
- We will only look at tables from the hg19 human genome assembly.
- Correlation of A and B is measured by the average value of the
table B function over positions in table A segments.
- The p-value of this average will be determined by Monte Carlo
simulation that keeps table B unchanged and preserves the segment and
intersegment lengths of table A, but randomizes the positions of the
segments and intersegments of table A.
- There will be one such analysis per human chromosome.
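The Phase Beta statistic can be sketched the same way, with table B held in memory as bedGraph-style (start, end, value) intervals; reading wig or bigWig files into that form is not shown. Assumptions not stated above: intervals are 0-based half-open and non-overlapping, the average is taken only over A positions where B is defined, and null draws whose shuffled A touches no defined position are skipped. The shuffle repeats the same assumed reading of the null hypothesis as Phase Alpha. All names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Segment = Tuple[int, int]          # half-open [start, end), 0-based (assumption)
Interval = Tuple[int, int, float]  # bedGraph-style (start, end, value) (assumption)


def mean_over_segments(a: List[Segment], b: List[Interval]) -> Optional[float]:
    """Average of B over A positions where B is defined; None if there are none.

    Both inputs must be sorted and non-overlapping.
    """
    i = j = 0
    total, covered = 0.0, 0
    while i < len(a) and j < len(b):
        a_start, a_end = a[i]
        b_start, b_end, value = b[j]
        lo, hi = max(a_start, b_start), min(a_end, b_end)
        if hi > lo:
            total += value * (hi - lo)
            covered += hi - lo
        if a_end < b_end:  # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return total / covered if covered else None


def shuffle_segments(a: List[Segment], chrom_len: int, rng: random.Random) -> List[Segment]:
    """Keep segment and gap lengths (including both end gaps), shuffle their order."""
    seg_lens = [end - start for start, end in a]
    gaps, prev = [], 0
    for start, end in a:
        gaps.append(start - prev)
        prev = end
    gaps.append(chrom_len - prev)
    rng.shuffle(seg_lens)
    rng.shuffle(gaps)
    out, pos = [], 0
    for seg_len, gap in zip(seg_lens, gaps):  # the leftover gap fills the chromosome end
        pos += gap
        out.append((pos, pos + seg_len))
        pos += seg_len
    return out


def beta_p_value(a: List[Segment], b: List[Interval], chrom_len: int,
                 n_trials: int = 1000, seed: int = 0):
    """Observed mean and upper-tail Monte Carlo p-value; B stays fixed."""
    observed = mean_over_segments(a, b)
    if observed is None:
        return None, None
    rng = random.Random(seed)
    hits = valid = 0
    for _ in range(n_trials):
        simulated = mean_over_segments(shuffle_segments(a, chrom_len, rng), b)
        if simulated is None:
            continue  # draw touched no defined position of B (assumption: skip it)
        valid += 1
        if simulated >= observed:
            hits += 1
    return observed, (hits + 1) / (valid + 1)
```

Whether undefined positions should instead count as zero is a design decision the restrictions above leave open; changing it only affects `mean_over_segments`.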