CSE 527

September 29, 2003

Notes by Jeffrey Bigham

 

Why is computational biology an interesting topic?

 

               The amount of data in GenBank (a nucleotide sequence repository) is growing exponentially, much like Moore's Law, which predicts a doubling of the number of transistors on an integrated circuit roughly every 18 months. This is a lot of data that could be put to useful purposes, and computational techniques are required to get the most benefit from it. Even though the human genome has been "finished," mining all of this information is the real problem, and that work is just beginning. Computational methods applied to this domain have the potential to revolutionize genomics and medicine.

 

Biology Review

 

-         Genetics, molecular biology, and cell biology have increasingly become information sciences.

-         Genome: hereditary information in cells residing mainly in DNA.

-         Nucleotides: denoted A, C, T, G

-         The human genome has roughly 3x10^9 nucleotides.

-         Genetics is the study of heredity.

-         A gene is generally taken to mean a part of the genetic code sufficient to define one protein.

-         Mendel studied the transmission of heritable traits in pea plants. He was somewhat lucky that the behavior he observed lined up with how genetics actually works, but much of what we know today started with him and his pea plants.

 

Cells

 

-         First detected by using a regular optical microscope.

-         Protected by a fatty layer known as the plasma membrane.

-         Eukaryotic cells have their genetic material stored inside a separate nucleus in the cell.

-         Prokaryotic cells have no nucleus and are more homogeneous.

-         Bacteria are examples of prokaryotes.

-         Protozoa, animals, and plants are all eukaryotes.

-         The genetic material itself is contained in chromosomes.

-         A diploid cell has a copy of both the paternal and the maternal genetic material, and each copy is duplicated upon cell division.

-         A haploid cell has just one copy of the genetic material (egg and sperm cells are haploid).

-         Crossover: material is exchanged between the paternal and maternal copies. In meiosis, two cell divisions create four haploid cells; then, through fertilization, sperm and egg cells combine to form a diploid zygote.

 

DNA

 

-         Discovered in 1869.

-         It is the carrier of genetic information.

-         It is formed into a double helix: two strands, each formed by a chain of nucleotides.

-         The nucleotide pairs A-T and C-G are considered complementary and always bind to one another.

-         Linear ordering of nucleotides contains genetic information.

-         Genes code for proteins.

-         Each DNA strand has a 3' end and a 5' end.
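
The pairing rule in the list above (A with T, C with G) can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are made up for these notes, and the reversal step assumes the standard convention that the two strands of the helix run in opposite directions, so the partner strand is read 5' to 3' by reversing the complement.

```python
# Watson-Crick pairing as stated in the notes: A-T and C-G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5' to 3' direction."""
    return complement(strand)[::-1]


example = "ATGCCG"  # hypothetical sequence, written 5' to 3'
print(complement(example))          # TACGGC
print(reverse_complement(example))  # CGGCAT
```

A dictionary lookup raises KeyError on any character other than A, C, G, or T, which is a reasonable default for a sketch since it surfaces bad input instead of silently passing it through.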

 

DNA Undergoes

 

-         Replication

-         Repair: this reduces the error rate from 1 in 1,000 nucleotide pairs to 1 in 1,000,000.

-         Rearrangement

-         Recombination

-         These processes are catalyzed by enzymes: replication requires DNA polymerase to create complementary sequences and DNA ligase to bind together short sequences.
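
To get a feel for what the repair figures above mean, here is a rough back-of-the-envelope calculation in Python. It assumes the genome size of 3x10^9 quoted earlier in these notes and treats the error rates as applying uniformly to every copied position; both are simplifications, and the function name is made up for illustration.

```python
GENOME_SIZE = 3 * 10**9  # nucleotides, the figure quoted earlier in the notes


def expected_errors(genome_size: int, one_in: int) -> int:
    """Expected number of errors when 1 in `one_in` positions is miscopied."""
    return genome_size // one_in


without_repair = expected_errors(GENOME_SIZE, 1_000)      # rate before repair
with_repair = expected_errors(GENOME_SIZE, 1_000_000)     # rate after repair

print(without_repair)                 # 3000000
print(with_repair)                    # 3000
print(without_repair // with_repair)  # 1000
```

Under these assumptions, repair cuts the expected number of errors per copy of the genome from about three million to about three thousand, a thousandfold improvement.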

 

Protein

 

-         Chain of amino acids of 20 types.

-         Function determined by 3D structure into which protein folds.

-         Proteins make up cellular structure.

-         Enzymes catalyze reactions.

-         Transcription Factors regulate production of other proteins.

-         Receptors bind hormones and other signaling molecules.

 

Gene Expression: The "Central Dogma"

 

DNA -> RNA -> Protein

 

The DNA is transcribed into messenger RNA (mRNA), which then migrates to the ribosomes. The ribosomes read the mRNA and make proteins.

 

The process of going from DNA to mRNA is called transcription, and the process of going from mRNA to protein is called translation. Going from RNA back to DNA is called reverse transcription; it is what retroviruses do in order to get their genetic material incorporated into that of the cell.

 

Many functionally different types of RNA exist including mRNA, tRNA, and rRNA.

 

Transcription begins at a site on the DNA called the promoter (a particular sequence of bases), which is near the 5' end of the gene. Reading off the template strand, A's become U's, T's become A's, C's become G's, and G's become C's in the mRNA. U (uracil) is another nucleotide that basically holds the same position in RNA that a T would hold in DNA.
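
The base mapping in the paragraph above translates directly into a lookup table. The sketch below is illustrative only: the function name and the input sequence are invented, and it maps position by position without modeling strand direction or the promoter.

```python
# Template DNA base -> mRNA base, exactly the mapping given in the notes.
TEMPLATE_TO_MRNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template: str) -> str:
    """Map each base of a DNA template strand to its mRNA partner."""
    return "".join(TEMPLATE_TO_MRNA[base] for base in template)


print(transcribe("TACGGC"))  # AUGCCG
```

Note that the output contains U wherever plain DNA complementation would have produced T.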

 

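Each codon (3 consecutive nucleotides) codes for an amino acid or a special start/stop signal. Three positions with four possible nucleotides each allow for 4^3 = 64 different codons. Since there are only 20 amino acids, most amino acids are encoded by more than one codon.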
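
The codon count can be checked by enumerating every triple, and reading an mRNA three bases at a time gives a toy version of translation. The sketch below makes some assumptions: the table is deliberately partial (only a handful of entries from the standard genetic code), the function name and example mRNA are hypothetical, and reading simply starts at position 0 instead of searching for a start codon.

```python
from itertools import product

BASES = "ACGU"

# Every ordered triple of bases: 4 * 4 * 4 = 64 codons.
codons = ["".join(triple) for triple in product(BASES, repeat=3)]
print(len(codons))  # 64

# A deliberately partial codon table, for illustration only.
PARTIAL_TABLE = {
    "AUG": "Met",  # also serves as the start signal
    "CCG": "Pro",
    "GGC": "Gly",
    "UUU": "Phe",
    "UAA": "Stop",
    "UAG": "Stop",
    "UGA": "Stop",
}


def translate(mrna: str) -> list[str]:
    """Read codons from position 0 until a stop codon or the end of the mRNA."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = PARTIAL_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


print(translate("AUGCCGGGCUAA"))  # ['Met', 'Pro', 'Gly']
```

A codon missing from the partial table raises KeyError; a full implementation would carry all 64 entries.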

 

In eukaryotes there are three different RNA polymerases. As the transcript is processed, sequences known as introns are spliced out while the exons are joined together. The rate of transcription is under complex control by proteins that bind to DNA, sometimes far from the site where transcription actually starts.