CSE 527 Lecture 2, Wednesday 10/01/03

Notes by Katarzyna Wilamowska <kasiaw@u>

Talks today
1:30 in HSB K-069
Dr. Phil Green
"Finishing the Gene-ome: Computationally Directed Gene Structure Verification in C. elegans"

3:30 in Hitchcock Hall 132
Dr. Mark Chee
"Accessing Genetic Information: Technology for Large Scale SNP Genotyping"

SNP - single nucleotide polymorphism


Good websites for information on computational biology:

Genome sizes

NAME                       Genome length (bp)   Number of genes   Notes
Mycoplasma genitalium      580,073              483               smallest bacterium in genome size
E. coli                    4,639,221            4,290
Saccharomyces cerevisiae   12,495,682           5,726             baker's yeast
Caenorhabditis elegans     95.5 x 10^6          19,820            little see-through worm
Arabidopsis thaliana       115,409,949          25,498
Drosophila melanogaster    122,653,977          13,472            fruit fly
humans                     3.3 x 10^9           ~25,000

Gene Expression

  • proteins do most of the work
  • they are dynamically created/destroyed
  • so are their mRNA blueprints
  • different mRNAs expressed at different times/places
  • knowing mRNA "expression levels" tells a lot about the state of the cell

    Expression Microarrays

  • thousands to hundreds of thousands of spots per square inch
  • each holds millions of copies of a DNA sequence from one gene
  • take mRNA from cells, put it on the array
  • see where it sticks - mRNA from gene x should stick to spot x
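
The "sticks to spot x" idea can be sketched as a toy model. Everything here is an assumption made for illustration: the sequences are made up, transcripts are written in DNA letters (as if already copied to cDNA), and an exact reverse-complement substring match stands in for real hybridization, which tolerates mismatches.

```python
# Toy model of array hybridization (illustrative only, sequences are made up).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the strand that would base-pair with seq."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def hybridize(probes, transcripts):
    """Count, for each spot, the transcripts containing the probe's complement."""
    counts = {name: 0 for name in probes}
    for transcript in transcripts:
        for name, probe in probes.items():
            # Assumption: "sticking" means an exact complementary match.
            if reverse_complement(probe) in transcript:
                counts[name] += 1
    return counts

# Hypothetical target regions, one per gene.
targets = {
    "geneX": "GATTACAGATTACA",
    "geneY": "CCGGTTAACCGGAT",
    "geneZ": "TTGACCATGCAAGT",
}
# Each spot holds the probe complementary to its gene's target.
probes = {name: reverse_complement(t) for name, t in targets.items()}

# Hypothetical sample: two copies of geneX message, one of geneY, none of geneZ.
transcripts = [
    "AAAA" + targets["geneX"] + "TTT",
    "GG" + targets["geneX"],
    "C" + targets["geneY"] + "AA",
]

counts = hybridize(probes, transcripts)
for name, n in counts.items():
    print(name, n)
```

The per-spot counts play the role of spot brightness: more message from a gene means more material stuck to that gene's spot.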

    An Expression Array Experiment

    o o o o
     o o o
      cells

      |
      | extract mRNA
      |
     \/
    ~ ~ ~
     ~ ~ ~
    mRNA

      |
      |
      |
     \/
    O O O O O O
    O O O O O O
    O O O O O O
    O O O O O O
    O O O O O O
    O O O O O O
      |
      |
      |             UV light
     \/
    O O O O O O
    O O O O O O
    O O O O O O
    O O O O O O
    O O O O O O
    O O O O O O

    UV light shows, by color, where mRNA sticks

    An example application

  • 72 leukemia patients
          47 ALL
          25 AML
  • 1 chip per patient
  • 7132 human genes per chip

    Key issue: What's Different?

  • What genes are behaving differently between ALL & AML (or other disease/normal states)?
  • Potential uses:
          diagnosis
          prognosis
          insight into underlying biology/biologies
          treatment
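
One way to make "behaving differently" concrete is to score each gene by how well it separates the two groups. The sketch below uses a two-sample (Welch) t statistic; the notes do not say which statistic to use, so that choice is an assumption, and the gene names and expression values are made up.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t statistic that allows unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

# Made-up expression levels: gene -> (values in ALL patients, values in AML patients)
expression = {
    "gene_1": ([10.0, 11.0, 9.0, 10.5], [10.2, 9.8, 10.9, 10.1]),
    "gene_2": ([2.0, 2.5, 1.8, 2.2], [8.0, 7.5, 8.4, 7.9]),
    "gene_3": ([5.0, 6.0, 5.5, 5.2], [4.0, 4.4, 3.9, 4.1]),
}

scores = {
    gene: welch_t(all_vals, aml_vals)
    for gene, (all_vals, aml_vals) in expression.items()
}

# Genes with the largest |t| differ most between the two groups.
ranked = sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
for gene in ranked:
    print(f"{gene}: t = {scores[gene]:+.2f}")
```

With thousands of genes and only tens of patients, some genes will score highly by chance alone, so a ranking like this is a starting point rather than a conclusion.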

    A classification problem

  • Given an array from a new patient, is it ALL or AML?
  • Many possible approaches (LDA, logistic regression, NN)
  • Problems - noise, dimensions
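
As a minimal sketch of the classification setup, here is a 1-nearest-neighbor rule. This reads "NN" in the list above as nearest neighbor, which is an assumption (it could also mean neural nets). The three-gene profiles are made up; a real array would have thousands of genes per patient.

```python
from math import dist  # Euclidean distance, Python 3.8+

def nearest_neighbor(train, new_profile):
    """Return the label of the training profile closest to new_profile."""
    _, label = min(train, key=lambda pair: dist(pair[0], new_profile))
    return label

# Made-up training arrays: (expression profile over 3 genes, diagnosis)
train = [
    ([2.0, 9.0, 5.0], "ALL"),
    ([2.4, 8.5, 5.5], "ALL"),
    ([8.0, 3.0, 5.1], "AML"),
    ([7.6, 2.5, 4.8], "AML"),
]

prediction = nearest_neighbor(train, [7.0, 3.5, 5.3])
print(prediction)
```

The two problems named above show up directly here: with thousands of noisy genes and few patients, the distance can be dominated by genes that carry no signal, so some form of gene selection usually comes first.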

    PolyA tail

  • found on the 3' end of mRNA
  • likely recognized by the machinery that transports mRNA from the nucleus to the rest of the cell
  • useful to us: mRNA can be separated out using complementary polyT sequences

    Practical Application of Microarrays

  • gene target discovery
  • pharmacology and toxicology
  • diagnostics
  • study gene function and regulation
  • refined categorization of diseases
          e.g. "prostate cancer" is almost certainly not one disease. Are subtypes distinguishable at expression level?

    Microarray platforms

  • oligonucleotide-based arrays
          25mers synthesized on a glass wafer, Affymetrix GeneChip arrays
          custom spotted 50-80mers generated from known sequences
  • cDNA (complementary DNA) - initially easier and cheaper to do
          inserts from cDNA libraries
          PCR products generated from gene specific or universal primers
  • DNA is more stable and easier to work with in lab. RNA degrades quickly.

    How unique is a 20mer?

    VERY crude model: DNA is random - every position is equally likely to be A, C, G, or T, independent of each other.
    Then the probability that a random 20mer matches a given 20mer is
    (1/4)^20 = (1/2)^40 = ((1/2)^10)^4 = (1/1024)^4, which is about (10^-3)^4 = 10^-12
    So a given 20mer is expected to occur in random human-sized DNA (3.3 x 10^9 positions) about 3.3 x 10^9 x 10^-12 times, i.e. with probability roughly 0.003
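
The arithmetic above can be checked directly. This is only the crude uniform, independent model from these notes; the variable names are illustrative, and only one strand is counted.

```python
# Crude model: each base is A, C, G, or T with probability 1/4, independently.
k = 20
genome_length = 3.3e9  # human genome size as listed in these notes

# Chance that one fixed position spells out a given 20mer.
p_match = 0.25 ** k

# Expected number of occurrences over all start positions (one strand only).
expected_hits = (genome_length - k + 1) * p_match

print(f"p = {p_match:.3e}")
print(f"expected occurrences = {expected_hits:.4f}")
```

The exact value of (1/4)^20 is about 9.1 x 10^-13, a little under the 10^-12 estimate, and the expected count comes out near 0.003. Because that count is far below 1, it is also a good approximation to the probability of seeing the 20mer at all.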

    How random is a Genome?

  • G/C content can vary from ~40-60% across and within organisms
  • Adjacent pairs are not independent
  • Adjacent triples are not independent
  • ...
  • Many large-scale repeats e.g.
          similar genes, domains within genes
          transpositions and other junk (within primates, ~5% of all DNA is composed of (noisy) copies of a 300bp Alu sequence)
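
Two of these departures from randomness are easy to measure on a sequence: G/C content, and whether adjacent pairs occur as often as independence would predict. The sketch below is illustrative only; the function names are hypothetical and the sequence is a made-up toy, not real genomic data.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of positions that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def dinucleotide_ratios(seq):
    """Observed / expected frequency for each adjacent pair that occurs.

    The expected frequency assumes the two positions are independent, so it is
    the product of the single-base frequencies. A ratio far from 1 means the
    pair is over- or under-represented.
    """
    base_freq = {base: n / len(seq) for base, n in Counter(seq).items()}
    n_pairs = len(seq) - 1
    pair_counts = Counter(seq[i:i + 2] for i in range(n_pairs))
    return {
        pair: (n / n_pairs) / (base_freq[pair[0]] * base_freq[pair[1]])
        for pair, n in pair_counts.items()
    }

seq = "ATATATATGCGCGCGC"  # made-up toy sequence

ratios = dinucleotide_ratios(seq)
print(f"G/C content: {gc_content(seq):.2f}")
for pair in sorted(ratios):
    print(f"{pair}: observed/expected = {ratios[pair]:.2f}")
```

In this toy sequence every base has frequency 1/4, yet pairs such as AT and GC appear far more often than 1/16 of the time and pairs such as AA never appear, which is the kind of dependence the crude model ignores.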