Here is an outline of the process you will
automate for phase 1 of the project:
1. Pick a gene g and a bacterium b containing g.
2. Get b's accession number.
3. Look in b's protein table to find the ID (pid) of g's protein product.
4. Run the Perl script to download the BLAST result and gather the first x hits.
   This will produce x pairs of pid p_i and species name s_i, one for
   each BLAST hit.
Repeat steps 5 - 10 for each i:
5. Use the accession_species table to look up the accession number a_i
   corresponding to s_i.
6. Look in [a_i]_protein_table for pid = p_i.
7. Find the genomic indices of the upstream region.
8. Look up the genome.
9. Extract the upstream region and set it aside.
10. Look up the amino acid sequence (in [a_i]_amino_acid_table) and set it aside.
11. Pass the set of amino acid sequences to ClustalW to produce phylogeny T.
12. Pass T, the set of upstream regions, and appropriate parameters to
    FootPrinter.
13. Present the results.
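
The steps that find the genomic indices of the upstream region and extract it can be sketched as below. This is only an illustration under stated assumptions, not the project's prescribed method: the helper names, the 1-based inclusive coordinates, the "+"/"-" strand labels, and the choice to clip at the ends of the sequence (rather than wrap around a circular genome) are all assumptions that should be checked against the actual protein table and genome files.

```python
# Hypothetical helpers. Assumed conventions (not taken from the outline):
# gene coordinates are 1-based and inclusive with start <= end, and the
# strand is given as "+" or "-".
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome, start, end, strand, length):
    """Return up to `length` bases immediately upstream of a gene.

    The region is clipped at the ends of `genome`; a circular genome
    would need wrap-around handling, which this sketch omits.
    """
    if strand == "+":
        # Upstream lies just before the gene start on the forward strand.
        lo = max(0, start - 1 - length)
        return genome[lo:start - 1]
    if strand == "-":
        # Upstream lies just after the gene end on the forward strand;
        # reverse-complement it so it reads 5' to 3' toward the gene.
        hi = min(len(genome), end + length)
        return reverse_complement(genome[end:hi])
    raise ValueError("strand must be '+' or '-'")
```

For example, `upstream_region(genome, start, end, "+", 300)` would return the 300 bases preceding a forward-strand gene, or fewer if the gene sits near the start of the sequence. The value 300 is purely illustrative.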
Eventually you will want to
- iterate over different choices of parameters to FootPrinter, choosing
  the "most interesting" results, and
- iterate over different starting genes, perhaps reporting only those
  that terminated in the "most interesting" FootPrinter results.
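
One possible shape for such an iteration is a small grid search. In the sketch below, `run` stands in for whatever code invokes FootPrinter with one parameter setting, and `score` stands in for whatever measure of "most interesting" you settle on. Both are hypothetical callables supplied by the caller, and the parameter names in a grid are placeholders rather than real FootPrinter options.

```python
from itertools import product


def sweep(param_grid, run, score):
    """Try every parameter combination and rank the results, best first.

    param_grid: dict mapping a parameter name to the list of values to try
    run:        callable taking a dict of parameters and returning a result
    score:      callable taking a result and returning a number
                (higher is assumed to mean "more interesting")
    """
    names = list(param_grid)
    scored = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        result = run(params)
        scored.append((score(result), params, result))
    # Sort on the score only, so parameter dicts are never compared.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored
```

The same function could be reused for the outer loop over starting genes by treating the gene as one more entry in the grid, at the cost of running the whole pipeline once per combination.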