Suppose you want to find a list of proteins whose amino acid
sequences are similar to that of the protein encoded by the metK gene
in the bacterium Bacillus subtilis. (We will return later to the
question of why you might choose to start with this gene.) Here's
how you might go about it by hand:
We're going to pick some number of the top-scoring genes from this list to form our data set. For each such gene, we will need two things: (1) the amino acid sequence and (2) the DNA sequence upstream of the gene.

(Incidentally, you don't want any 2 upstream DNA sequences that you choose to be too similar to each other. Therefore, you should never choose 2 strains of the same species. For example, #2 and #4 on the list are:

    1771 8 AAS43815 42739890 S-adenosylmethionine synthetase [Bacillus cereus ATCC 10987]
    1768 8 AAP11666 29898393 S-adenosylmethionine synthetase [Bacillus cereus ATCC 14579]

These are genes from 2 different strains of Bacillus cereus. Choose one or the other, but not both.)

As an example of the required data collection, consider the S-adenosylmethionine synthetase gene in Thermoanaerobacter tengcongensis, which is about #11 in the list above, with a similarity score of 1570 to metK in B. subtilis:

    1570 4 AAM23768 20515476 S-adenosylmethionine synthetase [Thermoanaerobacter tengcongensis]

(To get an idea how similar this means the two proteins are, click on the 1570 score link on the page of proteins similar to metK of B. subtilis.)

Here is how you might find the two required items for the case of Thermoanaerobacter tengcongensis. Find the protein table for Thermoanaerobacter tengcongensis, as above. Find the line for S-adenosylmethionine synthetase in this table:

    499626..500147 + 174 20806992 TTE0487 Rubrerythrin
    500405..501592 + 396 20806993 MetK TTE0488 S-adenosylmethionine synthetase

From the line for MetK, note the following:
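The coordinates and strand on a protein-table line determine where the upstream region lies, and the one-strain-per-species rule is easy to apply mechanically. The following is only a rough sketch of both ideas, not part of the original procedure. It assumes that each hit line starts with the similarity score and ends with the organism in square brackets, and that the species is the first two words of the organism name. The function names and the upstream length of 200 bases are hypothetical choices made for illustration. Keeping the higher-scoring strain is also just one option, since either strain would do.

```python
import re

# Hit lines copied from the similarity list. The column meanings (score first,
# organism in square brackets last) are assumed from these examples.
HITS = [
    "1771 8 AAS43815 42739890 S-adenosylmethionine synthetase [Bacillus cereus ATCC 10987]",
    "1768 8 AAP11666 29898393 S-adenosylmethionine synthetase [Bacillus cereus ATCC 14579]",
    "1570 4 AAM23768 20515476 S-adenosylmethionine synthetase [Thermoanaerobacter tengcongensis]",
]


def parse_hit(line):
    """Return (score, species) for one hit line."""
    score = int(line.split()[0])
    organism = re.search(r"\[(.+)\]\s*$", line).group(1)
    # Assumption: genus + epithet identify the species; any remaining words
    # are treated as a strain designation.
    species = " ".join(organism.split()[:2])
    return score, species


def one_per_species(lines):
    """Keep only the highest-scoring hit for each species, best first."""
    kept, seen = [], set()
    for line in sorted(lines, key=lambda l: parse_hit(l)[0], reverse=True):
        species = parse_hit(line)[1]
        if species not in seen:
            seen.add(species)
            kept.append(line)
    return kept


def upstream_window(location, strand, length):
    """1-based inclusive (start, end) of the `length` bases upstream of a gene."""
    start, end = (int(x) for x in location.split(".."))
    if strand == "+":
        # Forward strand: upstream lies just before the gene's start.
        return max(1, start - length), start - 1
    # Reverse strand: upstream lies just after the gene's end coordinate. The
    # extracted DNA would still need reverse-complementing, and the window is
    # not clamped to the genome length here.
    return end + 1, end + length


if __name__ == "__main__":
    for hit in one_per_species(HITS):
        print(hit)
    # MetK line from the protein table; 200 is a hypothetical upstream length.
    print(upstream_window("500405..501592", "+", 200))
```

With the MetK coordinates above and the assumed length of 200, the window works out to positions 500205 through 500404 on the forward strand.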
Here are the sorts of compilations of amino acid sequences and upstream DNA sequences you might obtain from this process. These files are in "FASTA format", which you will also need to use.
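For readers who have not met FASTA format before: each record is a header line beginning with `>` followed by one or more lines of sequence. The sketch below shows one way to write and read such files. It is a minimal illustration rather than the course's own tooling. The function names, the line width of 60, the header text, and the sequences are all placeholders made up for the example and are not real data.

```python
def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        # Break the sequence into fixed-width lines.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text back into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder records: the sequences are invented filler, NOT real upstream DNA.
RECORDS = [
    ("example_gene_1 upstream (placeholder)", "ACGT" * 20),
    ("example_gene_2 upstream (placeholder)", "TTGACA" * 5),
]

if __name__ == "__main__":
    print(to_fasta(RECORDS), end="")
```

Writing and then re-parsing the records gives back the original pairs, which is a quick sanity check on any file you assemble by hand.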
Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA 98195-2350, (206) 543-1695 voice, (206) 543-2969 FAX [comments to tompa]