University of Washington Computer Science & Engineering
 Computational Biology Capstone: Project

Tools for Prokaryotic Comparative Genomics

Ultimate project goals

The goal is to build a software tool that takes as input the genomes of closely related prokaryotic species and automates comparative genomic analyses of these species. The tool should be easy for microbiologists to use. Here are some of the analyses the tool should perform:
  1. Provide a good genome browser that supports comparative exploration of the genome sequences. Mauve does much of this very nicely, with many useful features. However, Mauve does not show multiple sequence alignments when orthologous elements have opposite orientations, and it may not be aggressive enough about finding all orthologs for some of the analyses listed below.
  2. For each species, find all unique genes, that is, genes that do not occur in any of the other species.
  3. For each species, find all genes that the other species have but that are missing from this species. Classify each such gene as entirely absent or as a pseudogene, which indicates recent loss.
  4. Mauve provides a phylogenetic guide tree of the genomes it aligns. Generalizing the two previous points, consider how the guide tree partitions the species into subsets of most closely related species, and determine the genes that are peculiar to each subset.
  5. Alternatively, the user supplies a partition of the species into two subsets, and the tool deduces a "barcode" of genes that can be used to classify further species into one of these two subsets, as in O'Sullivan et al. Use this barcode to classify species by symptoms, host, or niche.
  6. Label Mauve's guide tree with gene gain and loss along each branch, as in Lefebure and Stanhope, Figure 4.
  7. Investigate the unique (unalignable) regions of each species for evidence of lateral transfer: hits from BLAST searches against other sequenced genomes, unusual G+C content, unusual codon usage, and the presence of sequence uptake signals (as in O'Sullivan et al., page 15).
  8. Classify the functions of core and dispensable genes, as in Tettelin et al.
  9. Plot trends in core and dispensable gene counts, as in Tettelin et al.
  10. Investigate phylogenetic footprints in noncoding regions in an attempt to identify functional regulatory elements.
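Items 2 and 3 reduce to set operations once genes have been grouped into ortholog groups. The sketch below assumes that grouping has already been done (for example, by clustering BLAST hits) and that each genome is summarized as a set of hypothetical ortholog-group IDs. The pseudogene calls it takes as input are likewise assumed to come from a separate step, such as a search for frameshifted or truncated copies.

```python
from typing import Dict, Set

# Hypothetical input: species name -> set of ortholog-group IDs with an
# intact gene in that genome. Building these groups is a separate step.
Presence = Dict[str, Set[str]]


def _in_other_species(presence: Presence, species: str) -> Set[str]:
    """Union of the groups found in every species except `species`."""
    return set().union(*(g for s, g in presence.items() if s != species))


def unique_genes(presence: Presence) -> Dict[str, Set[str]]:
    """Item 2: groups that occur in one species and in no other."""
    return {sp: groups - _in_other_species(presence, sp)
            for sp, groups in presence.items()}


def missing_genes(presence: Presence,
                  pseudogenes: Dict[str, Set[str]]) -> Dict[str, Dict[str, Set[str]]]:
    """Item 3: groups found elsewhere but not intact here, split into
    'pseudogene' (a decayed copy was detected; hypothetical input) and 'absent'."""
    result = {}
    for sp, groups in presence.items():
        missing = _in_other_species(presence, sp) - groups
        decayed = missing & pseudogenes.get(sp, set())
        result[sp] = {"pseudogene": decayed, "absent": missing - decayed}
    return result
```

For example, with three genomes `{"A": {"g1", "g2", "g3"}, "B": {"g1", "g2"}, "C": {"g1", "g4"}}` and a decayed copy of `g3` detected in B, `g3` is unique to A, and B is missing `g3` (as a pseudogene) and `g4` (entirely absent).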
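For items 4 and 5, one simple criterion is strict presence/absence: a gene is peculiar to a subset of species if it is present in every member and absent from every non-member. The sketch below applies that criterion to each clade of a guide tree written as nested tuples, and to a user-supplied two-way partition. The tree encoding, the function names, and the strict criterion are all assumptions for illustration; the barcode of O'Sullivan et al. may be defined differently, and real data would likely need some tolerance for annotation errors.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple, Union

# Hypothetical encodings: species -> set of ortholog-group IDs, and a guide
# tree as nested tuples whose leaves are species names.
Presence = Dict[str, Set[str]]
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf set of every subtree, children before parents (root last)."""
    out: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            members = frozenset([node])
        else:
            members = frozenset().union(*(walk(child) for child in node))
        out.append(members)
        return members

    walk(tree)
    return out


def peculiar_genes(presence: Presence, subset: Iterable[str]) -> Set[str]:
    """Groups present in every species of `subset` and in no species outside it."""
    subset = frozenset(subset)
    shared = set.intersection(*(presence[s] for s in subset))
    elsewhere = set().union(*(g for s, g in presence.items() if s not in subset))
    return shared - elsewhere


def barcode(presence: Presence, side_a: Iterable[str],
            side_b: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Markers for a two-way partition: (peculiar to side A, peculiar to side B)."""
    a, b = frozenset(side_a), frozenset(side_b)
    if a & b or a | b != set(presence):
        raise ValueError("side_a and side_b must partition the species")
    return peculiar_genes(presence, a), peculiar_genes(presence, b)
```

Calling `peculiar_genes(presence, clade)` for each `clade` in `clades(tree)` covers item 4; `barcode` covers item 5 when the user chooses the two sides.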
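One way to label tree branches with gene gain and loss (item 6) is to reconstruct the ancestral presence or absence of each ortholog group by Fitch parsimony and report the branches where the state changes. This is a stand-in chosen for illustration, not necessarily the method Lefebure and Stanhope used. The sketch assumes a rooted binary tree given as nested tuples with unique leaf names (an unrooted guide tree would need to be rooted first, for example with an outgroup), and it breaks ties at the root in favor of presence, an arbitrary choice among equally parsimonious reconstructions. Dollo parsimony, which allows each gene to be gained only once, is a common alternative.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is a species name, an internal node a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]
Branch = FrozenSet[str]  # a branch is named by the leaf set below it


def leaves(node: Tree) -> Branch:
    if isinstance(node, str):
        return frozenset([node])
    return leaves(node[0]) | leaves(node[1])


def _bottom_up(node: Tree, state: Dict[str, int],
               sets: Dict[Branch, Set[int]]) -> Set[int]:
    """Fitch pass 1: candidate states for every node, from leaves to root."""
    if isinstance(node, str):
        s = {state[node]}
    else:
        a = _bottom_up(node[0], state, sets)
        b = _bottom_up(node[1], state, sets)
        s = (a & b) or (a | b)  # intersection if nonempty, else union
    sets[leaves(node)] = s
    return s


def _top_down(node, node_state, sets, group, events):
    """Fitch pass 2: fix each child's state and record changes on its branch."""
    if isinstance(node, str):
        return
    for child in node:
        child_set = sets[leaves(child)]
        # Keep the parent's state when allowed; otherwise the set is a singleton.
        child_state = node_state if node_state in child_set else min(child_set)
        if child_state != node_state:
            gained, lost = events.setdefault(leaves(child), (set(), set()))
            (gained if child_state == 1 else lost).add(group)
        _top_down(child, child_state, sets, group, events)


def gains_and_losses(tree: Tree, presence: Dict[str, Set[str]]
                     ) -> Dict[Branch, Tuple[Set[str], Set[str]]]:
    """Map each branch that has events to (groups gained, groups lost)."""
    events: Dict[Branch, Tuple[Set[str], Set[str]]] = {}
    for group in sorted(set().union(*presence.values())):
        state = {sp: int(group in groups) for sp, groups in presence.items()}
        sets: Dict[Branch, Set[int]] = {}
        root_set = _bottom_up(tree, state, sets)
        root_state = 1 if 1 in root_set else 0  # arbitrary tie-break: prefer presence
        _top_down(tree, root_state, sets, group, events)
    return events
```

Because of the tie-break, a group seen only in the outgroup is reported as lost on the sister branch rather than gained on the outgroup branch; both explanations cost one event.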
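Two of the signals in item 7, G+C content and codon usage, can be measured directly from sequence. The sketch below scores a region's G+C content as a z-score against sliding windows over the whole genome, and compares codon-usage profiles by total variation distance (half the L1 distance). The window size, step, and any cutoff for calling a region unusual are hypothetical parameters the user would tune; BLAST searches and uptake-signal scans are not covered here.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List


def gc_fraction(seq: str) -> float:
    """Fraction of G or C among A/C/G/T bases (N and gaps are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def window_gc(genome: str, size: int, step: int) -> List[float]:
    """G+C fraction of each full-length sliding window over the genome."""
    return [gc_fraction(genome[i:i + size])
            for i in range(0, len(genome) - size + 1, step)]


def gc_zscore(region: str, genome: str, size: int = 1000, step: int = 500) -> float:
    """Standard deviations between the region's G+C and the genome's window mean.
    size and step are hypothetical defaults; the genome must be at least `size` long."""
    values = window_gc(genome, size, step)
    mean = sum(values) / len(values)
    sd = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return (gc_fraction(region) - mean) / sd if sd else 0.0


def codon_usage(cds: str) -> Dict[str, float]:
    """Relative frequency of each in-frame codon in a coding sequence."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def codon_distance(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Total variation distance between two codon profiles: 0 = identical, 1 = disjoint."""
    codons = set(u) | set(v)
    return 0.5 * sum(abs(u.get(c, 0.0) - v.get(c, 0.0)) for c in codons)
```

A region might then be flagged when the absolute z-score, or the codon distance between its genes and the genome's pooled coding sequences, exceeds a chosen threshold.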
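For items 8 and 9, the core genome is the set of ortholog groups shared by every genome, and the dispensable genome is every other group found in at least one genome. The sketch below computes both, tallies functional categories from a hypothetical annotation map (for example, COG category letters), and estimates core and pan-genome size curves as genomes are added one at a time, averaged over random orderings. Sampling orderings rather than enumerating all permutations is an assumption made to bound the cost; the resulting plots are in the spirit of Tettelin et al. but are not claimed to reproduce their exact procedure.

```python
import random
from collections import Counter
from typing import Dict, List, Set, Tuple

Presence = Dict[str, Set[str]]  # hypothetical: species -> ortholog-group IDs


def core_and_dispensable(presence: Presence) -> Tuple[Set[str], Set[str]]:
    """Core = groups in every genome; dispensable = groups in some but not all."""
    core = set.intersection(*presence.values())
    pan = set().union(*presence.values())
    return core, pan - core


def category_counts(groups: Set[str], annotation: Dict[str, str]) -> Counter:
    """Tally functional categories; `annotation` (e.g. COG letters) is hypothetical."""
    return Counter(annotation.get(g, "unknown") for g in groups)


def accumulation_curves(presence: Presence, n_orders: int = 100,
                        seed: int = 0) -> Tuple[List[float], List[float]]:
    """Mean core and pan-genome sizes after adding k = 1..n genomes,
    averaged over `n_orders` random orderings of the genomes."""
    rng = random.Random(seed)
    names = list(presence)
    core_sums = [0] * len(names)
    pan_sums = [0] * len(names)
    for _ in range(n_orders):
        rng.shuffle(names)
        core = set(presence[names[0]])
        pan: Set[str] = set()
        for k, name in enumerate(names):
            core &= presence[name]
            pan |= presence[name]
            core_sums[k] += len(core)
            pan_sums[k] += len(pan)
    return ([c / n_orders for c in core_sums],
            [p / n_orders for p in pan_sums])
```

The core curve can only stay flat or fall as genomes are added, and the pan-genome curve can only stay flat or rise; how quickly each levels off is the trend item 9 asks to plot.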
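A naive starting point for item 10 is to scan a multiple alignment of orthologous noncoding regions for windows in which the columns are identical across all species. The sketch below assumes the alignment already exists as equal-length gapped strings; the window width and identity cutoff are hypothetical defaults. It ignores the phylogenetic distances between species, so it is only a crude filter, not a substitute for dedicated footprinting methods.

```python
from typing import List, Tuple


def conserved_columns(alignment: List[str]) -> List[bool]:
    """True where every aligned sequence has the same non-gap base."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("aligned sequences must have equal length")
    flags = []
    for i in range(length):
        column = {row[i].upper() for row in alignment}
        flags.append(len(column) == 1 and "-" not in column)
    return flags


def _merge(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping half-open intervals that are sorted by start."""
    merged: List[Tuple[int, int]] = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def footprint_windows(alignment: List[str], width: int = 8,
                      min_identity: float = 1.0) -> List[Tuple[int, int]]:
    """Half-open alignment intervals covered by windows whose fraction of
    conserved columns is at least `min_identity`."""
    flags = conserved_columns(alignment)
    hits = [(s, s + width) for s in range(len(flags) - width + 1)
            if sum(flags[s:s + width]) / width >= min_identity]
    return _merge(hits)
```

Lowering `min_identity` below 1.0 tolerates a few mismatched columns per window, at the cost of more false positives.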


Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to tompa]