University of Washington Computer Science & Engineering
 CSE 490MT: Outline of Phase 1

Here is an outline of the process you will automate for phase 1 of the project:

  1. Pick a gene g and a bacterium b containing g.
  2. Get b's accession number.
  3. Look in b's protein table to find the pid of g's protein product.
  4. Run the Perl script to download the BLAST result and gather the first x hits. This will produce x pairs (p_i, s_i) of pid p_i and species name s_i, one for each BLAST hit.

    Repeat steps 5-10 for each i:

  5. Use the accession_species table to look up the accession number a_i corresponding to s_i.
  6. Look in [a_i]_protein_table for the entry with pid = p_i.
  7. Find the genomic indices of the upstream region of that entry's gene.
  8. Look up the genome for a_i.
  9. Extract the upstream region and set it aside.
  10. Look up the amino acid sequence (in [a_i]_amino_acid_table) and set it aside.

  11. Pass set of amino acid sequences to ClustalW to produce phylogeny T.
  12. Pass T, set of upstream regions, and appropriate parameters to FootPrinter.
  13. Present results.
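
Steps 7 and 9 are the easiest place to make an off-by-one or strand error, so here is a minimal sketch of them in Python. It rests on several assumptions that the outline does not state: the protein table gives 1-based inclusive gene coordinates (start <= end) and a strand of "+" or "-", the genome is available as a single string, the upstream length is a parameter you choose (500 is only a placeholder), and the region is simply clipped at the ends of the sequence rather than wrapped around a circular genome. The function names are hypothetical.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_indices(start, end, strand, length, genome_length):
    """Step 7: return 0-based half-open indices (lo, hi) of the upstream region.

    Assumes start and end are 1-based inclusive with start <= end.
    The region is clipped at the sequence boundaries (no circular wraparound).
    """
    if strand == "+":
        hi = start - 1              # base just before the gene, as a slice bound
        lo = max(0, hi - length)
    else:
        lo = end                    # first base after the gene on the forward strand
        hi = min(genome_length, lo + length)
    return lo, hi


def extract_upstream(genome, start, end, strand, length=500):
    """Step 9: return the upstream region, read 5' to 3' on the gene's strand."""
    lo, hi = upstream_indices(start, end, strand, length, len(genome))
    region = genome[lo:hi]
    return region if strand == "+" else reverse_complement(region)
```

For a gene on the "-" strand the upstream region lies after the gene's end coordinate on the forward strand, which is why the sketch reverse-complements it before setting it aside. Check the coordinate convention of your own protein tables before relying on this.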

Eventually you will want to

  1. iterate over different choices of parameters to FootPrinter, choosing the "most interesting" results, and
  2. iterate over different starting genes, perhaps reporting only those that led to the "most interesting" FootPrinter results.
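
One possible shape for the parameter iteration is sketched below in Python. Everything in it is an assumption made for illustration: `run` stands for whatever function you write to invoke FootPrinter with a given parameter setting, `score` stands for your own definition of "most interesting", and the parameter names in the grid are placeholders rather than actual FootPrinter options.

```python
from itertools import product


def best_parameter_settings(run, score, grid, keep=3):
    """Try every combination of parameter values and keep the top scorers.

    run   -- hypothetical callable: dict of parameters -> result
    score -- hypothetical callable: result -> number (higher is more interesting)
    grid  -- dict mapping each parameter name to a list of values to try
    Returns a list of (score, params, result) tuples, best first.
    """
    names = sorted(grid)
    scored = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        result = run(params)
        scored.append((score(result), params, result))
    # Sort on the score only, so params and results never need to be comparable.
    scored.sort(key=lambda entry: entry[0], reverse=True)
    return scored[:keep]
```

The same loop can be wrapped in an outer loop over starting genes, reporting a gene only when its best score clears a threshold you choose.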


[comments to tompa]