The Gibbs Sampler

Notes by Nan Li, (annli@cs)

1. Background
1.1 Markov Chain Monte Carlo
MCMC methods address the following problem: generate samples from a given probability distribution P(x) that is hard to sample from directly. It is assumed that P(x) can be evaluated (at least up to a normalizing constant). The methods run a Markov process that generates a sequence of states, each sample drawn from a probability distribution that depends on the previous value. Given enough time, the Markov chain converges to a "stationary distribution," at which point samples are unbiased samples from P(x), effectively independent of the starting state. Although successive samples remain dependent on previous ones, independence is not needed for many purposes, e.g. calculating the expectation of some random variable. In real applications there is seldom any useful analytical theory for predicting the time required to reach stationarity ("burn-in"), nor for determining the sample size needed to adequately represent the target distribution ("mixing"), but the method is nevertheless of great practical importance.

[Computer scientists may recognize similarities to "simulated annealing," which has similar roots.]

1.2 Metropolis-Hastings Algorithm
The original Metropolis algorithm was proposed to simulate a physical system in thermal equilibrium. From a given state i of energy Ei, generate a new state j of energy Ej by a small perturbation. If the proposed state j has lower energy than the initial state i, make j the new current state; otherwise, accept j only with a probability that decreases as the energy increase Ej - Ei grows (the Boltzmann factor exp(-(Ej - Ei)/kT)).
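
As a concrete illustration, the acceptance rule can be written in a few lines. The sketch below assumes a Boltzmann distribution at temperature T; the user-supplied `energy` and `propose` functions are illustrative names, not from the original paper.

    import math
    import random

    def metropolis_step(state, energy, propose, T=1.0):
        # One Metropolis step for a system whose stationary distribution is
        # proportional to exp(-E/T).  `energy` and `propose` are user-supplied;
        # the 1953 paper perturbs a single particle position.
        candidate = propose(state)
        dE = energy(candidate) - energy(state)
        if dE <= 0:
            return candidate                      # lower energy: always accept
        if random.random() < math.exp(-dE / T):   # higher energy: accept with Boltzmann probability
            return candidate
        return state                              # otherwise keep the current state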

Metropolis et al (1953) showed how to construct a Markov chain whose stationary distribution is the target distribution P(x). The method was generalized by Hastings (1970).

The Metropolis-Hastings algorithm uses an auxiliary "proposal" density q(y|x) which depends on the current state x. This proposal density need not look anything like P(x); usually q(y|x) is chosen so that it is easy to sample from.

A tentative new state y is generated from the proposal q(y|x). To decide whether to accept the new state, compute the "Hastings ratio",
R = [q(x|y) P(y)] / [q(y|x) P(x)]

If R ≥ 1, then the new state is accepted; otherwise, the new state is accepted with probability R.
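
Here is a minimal Python sketch of the algorithm on the real line, using a symmetric Gaussian random-walk proposal (so q(x|y) = q(y|x) and the ratio simplifies to P(y)/P(x)); the target density and step size below are illustrative choices, not part of the notes.

    import math
    import random

    def metropolis_hastings(log_p, x0, n_samples, step=1.0):
        # Random-walk Metropolis-Hastings.  `log_p` evaluates log P(x) up to an
        # additive constant; with a symmetric proposal the Hastings ratio
        # reduces to P(y)/P(x).
        x = x0
        samples = []
        for _ in range(n_samples):
            y = x + random.gauss(0.0, step)      # tentative new state y ~ q(y|x)
            log_R = log_p(y) - log_p(x)          # log of the Hastings ratio
            if log_R >= 0 or random.random() < math.exp(log_R):
                x = y                            # accept; otherwise keep the old state
            samples.append(x)
        return samples

    # Example: sample from an unnormalized standard normal density.
    chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=10000)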

1.3 Gibbs Sampling
Gibbs sampling, proposed by Geman and Geman (1984) and named after J.W. Gibbs, a 19th-century American physicist and pioneer of thermodynamics, is a special case of the Metropolis-Hastings method in which the proposal distribution q(y|x) is defined in terms of the conditional distributions of the joint distribution P(x). It is assumed that, although P(x) is too complex to draw samples from directly, its conditional distributions are tractable. With this choice of proposal the Hastings ratio is always 1, so the new state is always accepted.

In the general case of a system with K variables, a single iteration involves sampling one parameter at a time:
x1(t+1) ~ P(x1 | x2(t), x3(t), ..., xK(t))

x2(t+1) ~ P(x2 | x1(t+1), x3(t), ..., xK(t))

x3(t+1) ~ P(x3 | x1(t+1), x2(t+1), x4(t), ..., xK(t)), etc.
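
For a concrete case where every conditional is easy to sample, consider a standard bivariate normal with correlation rho: each full conditional is a univariate normal. The target distribution here is an illustrative choice, not from the notes.

    import math
    import random

    def gibbs_bivariate_normal(rho, n_iter, x=0.0, y=0.0):
        # Gibbs sampler for a standard bivariate normal with correlation rho.
        # The full conditionals are exact univariate normals:
        #   x | y ~ N(rho*y, 1 - rho^2),   y | x ~ N(rho*x, 1 - rho^2)
        sd = math.sqrt(1.0 - rho * rho)
        samples = []
        for _ in range(n_iter):
            x = random.gauss(rho * y, sd)   # sample x from P(x | y)
            y = random.gauss(rho * x, sd)   # sample y from P(y | x)
            samples.append((x, y))
        return samples

    chain = gibbs_bivariate_normal(rho=0.8, n_iter=5000)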

2. Basic Algorithm
The input is a set of K sequences S1, ..., SK. The output is a motif of specified width W, with one instance in each sequence. The algorithm iterates two steps:
(1) Predictive update step: one of the K sequences, z, is chosen to be excluded from the calculation. The weight matrix model of the motif is computed from the motif instances in the remaining sequences, and the background frequencies are computed from the non-motif positions.
(2) Sampling step: every possible segment of width W within z is considered as a possible instance and is assigned a weight. A segment is then chosen at random to be the new instance in z, with probability proportional to its weight.

"The central idea is that the more accurate the pattern description constructed in step 1, the more accurate the determination of its location in step 2, and vice versa."

It is interesting to compare this to MEME. The probabilities calculated in step (1) are the expected values of the hidden variables calculated in the E-step of EM. The methods differ in step (2), where MEME updates its model by averaging over all possible motif instances, whereas the Gibbs sampler updates by simply sampling a single motif instance. The latter is simpler and faster, but seems less thorough in exploiting the information available at each step.

A comparison of the performance of MEME and the Gibbs sampler is discussed briefly in the following paper:
Pevzner, P.A. and Sze, S.-H. (2000) Combinatorial approaches to finding subtle signals in DNA sequences. Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, 269-278.

Lawrence et al. do briefly mention comparison to "greedy" type algorithms such as the one mentioned in the last lecture and claim Gibbs sampling generally gives better answers with little if any sacrifice in speed. Again, see Pevzner & Sze for some more discussion on this point, and Tompa's web page for other approaches. (In particular, see Buhler/Tompa for improvements to Pevzner/Sze.)

3. Extensions
Phase shifts. To escape a local maximum that is a shifted form of the optimal motif, another step is inserted into the algorithm. After every Mth iteration, the current motif may be shifted left or right by up to a certain number of letters; each candidate shift is assigned a score, and one shift is chosen with probability determined by the corresponding scores.
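
A sketch of the idea, under the assumption that a scoring function for a whole alignment is available (the `score` function below is hypothetical):

    import random

    def phase_shift(seqs, positions, W, max_shift, score):
        # Consider shifting every current instance left or right by the same
        # offset d, score each feasible shift, and pick one with probability
        # proportional to its score.  score(seqs, positions, W) is hypothetical.
        candidates, weights = [], []
        for d in range(-max_shift, max_shift + 1):
            shifted = [p + d for p in positions]
            if all(0 <= p <= len(s) - W for p, s in zip(shifted, seqs)):
                candidates.append(shifted)
                weights.append(score(seqs, shifted, W))
        return random.choices(candidates, weights=weights)[0]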

Pattern width. To relax the requirement of specifying the expected width of the motif, the algorithm is extended to accept a range of possible widths and select the best result. The criterion applied is the so-called "incomplete-data log-probability ratio," divided by the number of free parameters of the motif model (19 for protein, 3 for DNA). The experiments show this criterion produces a statistic useful for choosing the width.
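
In outline, the criterion as described above might look like the following sketch; the interpretation of the free-parameter count as per-column, and the exact normalization used in the paper, are assumptions here.

    def width_criterion(log_prob_ratio, W, alphabet_size):
        # Incomplete-data log-probability ratio per free parameter of the
        # width-W motif model; 19 free parameters per column for protein,
        # 3 per column for DNA (assumed interpretation).
        free_parameters = W * (alphabet_size - 1)
        return log_prob_ratio / free_parameters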

Multiple patterns. To relax the assumption that the motif contains no gaps, the algorithm is extended to seek several motif elements simultaneously. Two kinds of joint positional information are considered: ordering probabilities (favoring consistent ordering) and spacing effects (favoring close packing).

References:
Charles E. Lawrence, Stephen F. Altschul, Mark S. Boguski, Jun S. Liu, Andrew F. Neuwald, and John C. Wootton, "Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment," Science, 1993.

Geman & Geman, IEEE PAMI 1984

Hastings, "Monte Carlo Sampling Methods Using Markov Chains and Their Applications," Biometrika, 1970.

Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller, "Equation of State Calculations by Fast Computing Machines," J. Chem. Phys., 1953.