Retro prof in the lab University of Washington Computer Science & Engineering
 CSE 490MT: Postgresql Database
  CSE Home   About Us    Search    Contact Info 

 Project
 CSE 490MT Home
    There is a postgresql database server running on cubist.cs.washington.edu on which I have created a database called 'mercaldi'. (Sorry, I don't have the permissions necessary to give it a more appropriate name!) In this database is most of the data you will need from the NCBI Completed Microbial Genomes website.

Database logistics

You will receive an email when your database account is ready. When you do, you'll want to change your password from the default to something you can remember. You can do this by logging in to the database and running the following command:

alter user [username] with encrypted password '[newpasswd]';

When accessing the database manually after logging in to cubist (as opposed to writing code that will access it) you'll want to add 'pgsql' to your PATH (as well as the manpages if you like). This can be done by running the following commands:

setenv PATH ${PATH}:/usr/local/pgsql/bin
setenv MANPATH /usr/local/pgsql/man

You will need to do this every time you log in to cubist, so if you find yourself doing it frequently, I suggest creating an alias for these commands or putting them in a setup file that you can source each time.

Database content

Each bacterium has a unique identifier referred to as its accession number in the database. Similarly each protein in each bacterium has a pid (protein I.D.) that uniquely identifies it. There are four types of tables:

accession_species:
(This table provides a mapping from accession numbers to species names, and vice versa.)
'accession' varchar(30)
'species' varchar(1024)

accession_genome:
(In this table you can look up any bacterial genome using it's accession number.)
'accession' varchar(30)
'genome' text

[accession]_protein_table:
(There is one of these for each bacterium, containing the same data as the protein tables on the website.)
'loc_start' int
'loc_end' int
'strand' char
'length' int
'pid' varchar(30)
'gene' varchar(30)
'synonym' varchar(30)
'code' char
'cog' varchar(30)
'product' varchar(1024)

[accession]_amino_acid_table:
(There is one amino acid table per bacterium. In it you can look up the amino acid sequence for any protein using its 'pid'.)
'pid' varchar(30)
'amino_acid_string' text

Database documentation

Here are some links you may find useful:


CSE logo Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to tompa]