Lecture 2


October 4, 2001
Lecturer: Larry Ruzzo
Notes: Peter Mork

Microarrays are devices intended to measure gene activity.  Their name comes from their structure: each is a rectangular chip on which a grid of DNA spots is imposed, and these spots form a two-dimensional array.  Each spot in the array contains millions of copies of a particular DNA strand bonded to the chip.

To understand how microarrays capture gene activity, recall the process by which genes are converted to proteins.  Double-stranded DNA resides in the nucleus.  The DNA is partially unwound and its two strands are separated.  One of the exposed strands is then transcribed into messenger RNA (mRNA).  (Producing mature mRNA involves a number of steps, including splicing and capping; see the previous lecture's notes for more detail.)

The mRNA is transported to a ribosome, where it is translated into a sequence of amino acids (i.e., a protein).  Translation converts codons (3-base sequences) into amino acids.  Of the 64 (4^3) possible codons, 60 code only for amino acids, 1 (AUG) codes for methionine and also marks the start of the sequence, and the remaining 3 signal a stop.  Translation begins at the start codon (AUG) and continues until a stop codon is reached.
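To make the codon arithmetic concrete, here is a minimal sketch that builds the standard genetic code and translates an mRNA string.  It assumes translation starts at the first AUG in the string and reads consecutive 3-base codons until a stop codon; real ribosomes also depend on sequence context around the start codon, which is ignored here.  The names CODON_TABLE and translate are illustrative, not from any particular library.

```python
# Standard genetic code, indexed in U, C, A, G order for the first, second,
# and third codon positions; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon (or the end of the sequence)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    # Step through complete codons only (pos + 3 must not run past the end).
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break  # stop codon: translation ends
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGUUUGGCUAAGG"))  # MFG  (AUG-UUU-GGC, then UAA stops)
print(len(CODON_TABLE), sum(aa == "*" for aa in CODON_TABLE.values()))  # 64 3
```

Counting the table entries reproduces the split described above: 64 codons, of which 3 are stops and exactly one (AUG) maps to methionine.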

Translation relies on transfer RNA (tRNA).  Each tRNA molecule carries an anticodon, a 3-base sequence complementary to a particular codon.  A tRNA molecule has a cloverleaf shape: one loop holds the anticodon that pairs with the codon, while the corresponding amino acid is attached at another end.  Using tRNA to perform the conversion, translation ratchets along from the start codon to a stop codon.

Since mRNA is a necessary intermediary in the expression of a gene, it suffices to measure the amount of mRNA present.  Part of the transcription process is the addition of a poly-A tail to the mRNA, which allows a poly-T sequence bonded to glass to capture the mRNA.  After everything else is washed away, the mRNA is released from the glass (often by heating, since the hydrogen bonds between base pairs weaken as temperature increases) and converted into complementary DNA (cDNA) by reverse transcription.  The cDNA usually incorporates fluorescent labels so that its presence can be detected.  The cDNA is washed over the microarray as it cools, so that each cDNA bonds to the matching DNA on the microarray.
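The sequence logic of this capture-and-convert step can be sketched in a few lines.  This is a minimal model, assuming sequences are plain strings written 5' to 3': a poly-A tail is detected by a run of trailing A's (the min_length cutoff is a hypothetical choice), and the cDNA is the reverse complement of the mRNA with U paired to A.  It models only base pairing, not the chemistry.

```python
# Base pairing when copying RNA into DNA: A-T, U-A, G-C, C-G.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def has_poly_a_tail(mrna: str, min_length: int = 20) -> bool:
    """True if the mRNA ends in at least min_length A's (hypothetical cutoff)."""
    return mrna.endswith("A" * min_length)


def reverse_transcribe(mrna: str) -> str:
    """Return the cDNA strand complementary to the mRNA, written 5' to 3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))


mrna = "AUGGCC" + "A" * 20
print(has_poly_a_tail(mrna))        # True
print(reverse_transcribe(mrna))     # 20 T's followed by GGCCAT
```

Note that the cDNA begins with the poly-T stretch complementary to the poly-A tail, which is exactly the region a poly-T sequence on the glass pairs with.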

Note that this process requires thousands of copies of mRNA, so thousands of cells may be needed to obtain a signal.  Alternatively, the mRNA from a single cell can be amplified using the polymerase chain reaction (PCR), after first being reverse-transcribed into cDNA, since PCR copies DNA.
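Because each PCR cycle can at most double the number of copies, going from a single molecule to thousands takes only about ten cycles.  The sketch below assumes idealized, perfectly efficient doubling every cycle; real reactions fall short of this, so it gives a lower bound on the cycles needed.  The function name is hypothetical.

```python
def pcr_cycles_needed(start_copies: int, target_copies: int) -> int:
    """Cycles of idealized PCR (every copy doubles each cycle) to reach target_copies."""
    if start_copies <= 0:
        raise ValueError("need at least one starting copy")
    cycles = 0
    copies = start_copies
    while copies < target_copies:
        copies *= 2  # idealized: perfect doubling per cycle
        cycles += 1
    return cycles


print(pcr_cycles_needed(1, 1000))  # 10, since 2**10 = 1024
```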


There are several reasons for performing microarray analyses, including determining the role a gene plays in a pathway or disease, diagnostics, and pharmacology:

Researchers may be trying to discover genes involved in some disease.  Two samples are generated, one from normal cells and one from diseased cells.  Differences in gene expression between the samples indicate which genes are over- or under-expressed in the diseased cells, and such over- or under-expression suggests possible involvement in the disease.
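One simple way to turn the two samples into over/under calls is to compare each gene's diseased-to-normal intensity ratio on a log2 scale.  This is a minimal sketch, assuming intensities are already background-corrected and comparable between samples; the 2-fold cutoff and the gene names are hypothetical, and a real analysis would also account for replicate variability.

```python
import math


def differential_expression(normal, diseased, min_fold=2.0):
    """Label each gene by its diseased/normal expression ratio.

    normal, diseased: dicts mapping gene name -> measured intensity (> 0).
    min_fold: hypothetical fold-change cutoff for calling a gene changed.
    """
    cutoff = math.log2(min_fold)
    calls = {}
    for gene, normal_level in normal.items():
        log_ratio = math.log2(diseased[gene] / normal_level)
        if log_ratio >= cutoff:
            calls[gene] = "over"
        elif log_ratio <= -cutoff:
            calls[gene] = "under"
        else:
            calls[gene] = "unchanged"
    return calls


normal = {"geneA": 100.0, "geneB": 100.0, "geneC": 100.0}
diseased = {"geneA": 400.0, "geneB": 110.0, "geneC": 25.0}
print(differential_expression(normal, diseased))
# {'geneA': 'over', 'geneB': 'unchanged', 'geneC': 'under'}
```

The log scale makes a 4-fold increase and a 4-fold decrease symmetric (+2 and -2), which is why it is convenient for this kind of comparison.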

The function or regulation of a gene can also be studied.  Two genes whose expression levels are correlated are likely related: they may be co-regulated or share a common pathway.  Knockout experiments (in which a gene is disabled, e.g., by removing part of it) can then be performed to elucidate how genes are regulated.
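"Correlated" can be made precise in several ways; a common choice, assumed here for illustration, is the Pearson correlation of two genes' expression levels measured across the same set of conditions.  Values near +1 mean the genes rise and fall together, values near -1 mean they move in opposite directions.  The expression profiles below are made-up toy numbers.

```python
import math


def pearson(x, y):
    """Pearson correlation between two genes' expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


gene1 = [1.0, 2.0, 3.0, 4.0]   # toy expression levels over four conditions
gene2 = [2.0, 4.0, 6.0, 8.0]
gene3 = [4.0, 3.0, 2.0, 1.0]
print(pearson(gene1, gene2))   # 1.0
print(pearson(gene1, gene3))   # -1.0
```

A high correlation by itself does not say whether two genes are co-regulated or merely share a pathway; that is what follow-up experiments such as knockouts are for.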

Diagnostic experiments become possible once a gene's role has been identified.  A small sample can be extracted from a patient, and the presence (or absence) of the disease gene can be confirmed.  In principle, this method allows disease (phenotype) to be classified based on actual genotype.

Finally, pharmacological experiments can be run.  The efficacy of a drug in triggering gene expression can be measured (i.e., does the drug sufficiently increase gene expression).  Alternatively, the toxicity of a drug can similarly be measured (i.e., does the drug impair important gene expression).


There are two main platforms for performing microarray analyses.  The chip can use cDNA or oligonucleotide sequences.

The use of cDNA requires knowledge of the complete genetic sequence.  This can be obtained from a library of cDNA or, in the case of some species, from the complete genetic sequence for that organism.  These analyses are complicated by the fact that a perfect sequence match may not be necessary for bonding to the chip.

Oligonucleotide sequences are a more common alternative.  Affymetrix chips rely on 25-mers (sequences of 25 bases), while custom chips can be constructed using 50- to 80-mers.  Selecting good 25-mers is a complicated task.  Several 25-mers are chosen from each gene of interest, and given the inexact nature of bonding, several factors must be considered.  Since G/C pairs (three hydrogen bonds) and A/T pairs (two hydrogen bonds) bond with different strengths, all of the 25-mers should have a consistent mix of these bases.  It is important to select stable regions of the gene; sites with many mutations or variants must be avoided (unless these differences are relevant to the experiment).  Each 25-mer must also identify a unique gene (i.e., its sequence must not appear elsewhere in the genome).
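Two of these criteria, consistent G/C content and uniqueness, can be sketched as a filter over all 25-mers of a gene.  This is a toy version under stated assumptions: the 40-60% G/C window is a hypothetical choice, uniqueness is checked only as an exact substring match against the rest of the genome on one strand, and near-matches, the reverse complement, and variant sites are ignored.  Function names are illustrative.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def candidate_probes(gene, genome_rest, k=25, gc_min=0.4, gc_max=0.6):
    """Return k-mers of gene whose G/C fraction lies in [gc_min, gc_max]
    and that do not occur verbatim in genome_rest (the rest of the genome)."""
    probes = []
    for i in range(len(gene) - k + 1):
        kmer = gene[i:i + k]
        if gc_min <= gc_fraction(kmer) <= gc_max and kmer not in genome_rest:
            probes.append(kmer)
    return probes


# Tiny example with k=4 so the result is easy to check by hand.
print(candidate_probes("ATGCGCATAT", "TTATGCTT", k=4))  # ['GCAT']
```

Here ATGC has acceptable G/C content but also appears in the "rest of the genome", so only GCAT survives.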

Several factors complicate the selection of 25-mers.  The proportion of G/C to A/T varies both across and within organisms; G/C content can range from about 40% to 60%.  Bases in adjacent pairs are not independently distributed, nor are those in adjacent triples, and so on.  There are large-scale repeats, since genes with related functions may share sub-modules with a specific sub-function.  There are also (seemingly functionless) repeats of some sequences (e.g., the Alu sequence).  All of these complications must be taken into account when identifying unique 25-mers.
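The claim that adjacent pairs are not independent can be checked with an observed/expected ratio: if bases were independent, the count of a pair XY would be about freq(X) * freq(Y) * (number of adjacent pairs).  Ratios far from 1 indicate dependence; a well-known case is the under-representation of CG pairs in vertebrate genomes.  The sketch below is a minimal version that only reports pairs that actually occur; the toy sequence is made up.

```python
from collections import Counter


def dinucleotide_obs_exp(seq):
    """Observed / expected count of each adjacent pair in seq,
    where "expected" assumes bases occur independently."""
    n = len(seq)
    base_freq = {b: c / n for b, c in Counter(seq).items()}
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    total_pairs = n - 1
    ratios = {}
    for pair, observed in pairs.items():
        expected = base_freq[pair[0]] * base_freq[pair[1]] * total_pairs
        ratios[pair] = observed / expected
    return ratios


ratios = dinucleotide_obs_exp("ACACACAC")
print(round(ratios["AC"], 2))  # 2.29: AC appears far more often than independence predicts
print("AA" in ratios)          # False: AA never occurs
```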

Probe sets are used to screen out background noise in oligonucleotide analyses.  One band of spots is bonded to the perfect-match 25-mers.  An adjacent band of spots (often just below the perfect-match band) is bonded to the same 25-mers, each with a single mismatched base.  If a gene is present, there should be a strong signal in the perfect-match band and a weak (or no) signal in the mismatch band.
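A minimal sketch of the perfect-match/mismatch idea follows.  It assumes, as on Affymetrix arrays, that the mismatch probe differs from its perfect-match partner only at the middle (13th) base, which is complemented.  The scoring rule, a plain average of (PM - MM) over the probe pairs, is a simple illustrative choice and not a description of any vendor's current algorithm; the intensities are made-up numbers.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(pm_probe):
    """Copy of the perfect-match probe with its middle base complemented."""
    mid = len(pm_probe) // 2  # index 12, i.e. the 13th base, for a 25-mer
    return pm_probe[:mid] + COMPLEMENT[pm_probe[mid]] + pm_probe[mid + 1:]


def probe_set_score(perfect_match, mismatch):
    """Mean of (PM - MM) intensity over the probe pairs of one probe set."""
    if len(perfect_match) != len(mismatch):
        raise ValueError("each PM probe needs a paired MM probe")
    diffs = [pm - mm for pm, mm in zip(perfect_match, mismatch)]
    return sum(diffs) / len(diffs)


print(probe_set_score([900, 800, 1000], [100, 100, 100]))  # 800.0: gene likely present
print(probe_set_score([110, 100, 90], [100, 100, 100]))    # 0.0: signal looks like background
```

Subtracting the mismatch signal removes intensity that would appear whether or not the exact target is present, which is how the mismatch band helps screen out background noise.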