CSE 490MT: Outline of Phase 1

University of Washington Computer Science & Engineering

Here is an outline of the process you will automate for phase 1 of the project:

1. Pick a gene g and a bacterium b containing g.
2. Get b's accession number.
3. Look in b's protein table to find the pid of g's protein product.
4. Run the Perl script to download the BLAST result and gather the first x hits. This will produce x pairs (p_i, s_i), one for each BLAST hit, where p_i is a pid and s_i is a species name.
For each i, repeat steps 5-10:
   5. Use the accession_species table to look up the accession number a_i corresponding to s_i.
   6. Look in [a_i]_protein_table for the entry with pid = p_i.
   7. Find the genomic indices of p_i's upstream region.
   8. Look up the genome of a_i.
   9. Extract the upstream region and set it aside.
   10. Look up the amino acid sequence of p_i (in [a_i]_amino_acid_table) and set it aside.
11. Pass the set of amino acid sequences to ClustalW to produce a phylogeny T.
12. Pass T, the set of upstream regions, and appropriate parameters to FootPrinter.
13. Present the results.
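The per-hit loop (steps 5-10) is mostly table lookups plus one piece of coordinate arithmetic. The Python sketch below shows one way to organize it. It is not the course's Perl script or database schema: it assumes, hypothetically, that the tables have already been loaded into dictionaries (accession_species maps a species name to an accession number; per-accession protein tables map a pid to gene coordinates and strand; per-accession amino acid tables map a pid to a protein sequence; genomes maps an accession number to its sequence), that coordinates are 1-based and inclusive, and that the upstream window length is a free parameter (300 bp is an arbitrary placeholder). The names ProteinEntry, upstream_region, collect_hits, and to_fasta are illustrative. The one subtlety it shows is strand: for a gene on the minus strand, the upstream region lies past the gene's higher coordinate and must be reverse-complemented so it reads in the gene's direction. It ignores the wrap-around of circular chromosomes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class ProteinEntry:
    """Hypothetical row of an [a_i]_protein_table: gene coordinates and strand."""
    start: int   # 1-based, inclusive, lower genomic coordinate
    end: int     # 1-based, inclusive, higher genomic coordinate
    strand: str  # "+" or "-"


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome: str, entry: ProteinEntry, length: int) -> str:
    """Up to `length` bases upstream of the gene, read 5'->3' on the gene's strand."""
    if entry.strand == "+":
        # Plus strand: upstream lies to the left of the start coordinate.
        lo = max(0, entry.start - 1 - length)
        return genome[lo:entry.start - 1]
    # Minus strand: upstream lies to the right of the higher coordinate,
    # reverse-complemented so it reads in the gene's direction.
    hi = min(len(genome), entry.end + length)
    return reverse_complement(genome[entry.end:hi])


def collect_hits(
    hits: Iterable[Tuple[str, str]],                     # (p_i, s_i) pairs from BLAST
    accession_species: Dict[str, str],                   # s_i -> a_i
    protein_tables: Dict[str, Dict[str, ProteinEntry]],  # a_i -> pid -> entry
    amino_acid_tables: Dict[str, Dict[str, str]],        # a_i -> pid -> protein seq
    genomes: Dict[str, str],                             # a_i -> genome sequence
    length: int = 300,                                   # placeholder window size
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return ({pid: upstream region}, {pid: amino acid sequence})."""
    upstreams: Dict[str, str] = {}
    proteins: Dict[str, str] = {}
    for pid, species in hits:
        acc = accession_species[species]       # look up a_i for s_i
        entry = protein_tables[acc][pid]       # find p_i in [a_i]_protein_table
        upstreams[pid] = upstream_region(genomes[acc], entry, length)
        proteins[pid] = amino_acid_tables[acc][pid]  # from [a_i]_amino_acid_table
    return upstreams, proteins


def to_fasta(records: Dict[str, str]) -> str:
    """Format {name: sequence} as FASTA text, one sequence line per record."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())
```

Keying both results by pid keeps each upstream region paired with its protein, so the same labels can appear on the leaves of the ClustalW tree and on the sequences handed to FootPrinter. ClustalW reads FASTA input; the format FootPrinter expects should be taken from its own documentation.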

Eventually you will want to:

- iterate over different choices of parameters to FootPrinter, choosing the "most interesting" results, and
- iterate over different starting genes, perhaps reporting only those that led to the "most interesting" FootPrinter results.
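Both kinds of iteration reduce to scoring many runs and keeping the best. A minimal Python sketch, assuming a caller-supplied run function that invokes FootPrinter (or the whole pipeline) with one parameter setting, and a caller-supplied score function that encodes whatever "most interesting" means for your project. Both functions, and the parameter names motif_size and max_mutations in the example, are hypothetical placeholders rather than FootPrinter's actual options.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple


def sweep(
    grid: Dict[str, Iterable],
    run: Callable[..., object],
    score: Callable[[object], float],
    top_k: int = 3,
) -> List[Tuple[float, dict, object]]:
    """Call `run` on every parameter combination in `grid`; return the top_k by score."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        output = run(**params)
        results.append((score(output), params, output))
    # Sort on the score only, so ties never try to compare dicts.
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top_k]


def interesting_genes(
    genes: Iterable[str],
    best_score: Callable[[str], float],
    threshold: float,
) -> List[str]:
    """Keep only the starting genes whose best score reaches `threshold`."""
    return [g for g in genes if best_score(g) >= threshold]


if __name__ == "__main__":
    # Toy stand-in for a FootPrinter run (hypothetical parameter names).
    def toy_run(motif_size: int, max_mutations: int) -> int:
        return motif_size * 10 - max_mutations

    grid = {"motif_size": [6, 8], "max_mutations": [0, 1, 2]}
    for s, params, _ in sweep(grid, toy_run, score=float, top_k=2):
        print(s, params)
```

For the second kind of iteration, best_score could wrap the full pipeline for one starting gene and return the first score from sweep; interesting_genes then keeps the genes that clear a threshold you choose. A full grid grows multiplicatively with each parameter, so it is worth keeping each range small.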

Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA 98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to tompa]