Metropolis-Hastings

1. overview

A Markov Chain Monte Carlo method for sampling from a distribution \(p(x)=\frac{1}{Z}f(x)\), when we only have access to the unnormalized density \(f(x)\). We might want to do this when our distribution is defined by some clique potentials in a graphical model. Then, it's very easy to get \(f(x)\), but computationally hard to get the partition function \(Z\), which can be used to normalize \(f(x)\).

1.1. when do we want to use this?

For example, when we want to know the posterior \(p(\theta \mid X=x)\), which is proportional to \(p(\theta, X=x)\). It just matters that is proportional. We might not know, or don't want to compute, the normalizing constant. But if we have a formula for \(p(\theta, X=x)\) that we can sample from, we can use MCMC to sample from \(p(\theta\mid X=x)\).

2. Basic idea

Construct a Markov process with transition probabilities \(P\), where the stationary distribution \(\pi(x)\) is the distribution we want to sample from.

2.1. stationary distribution

A Markov process has a unique stationary disribution if it is ergodic. That is, it is aperiodic and irreducible.

If we can show that \(\pi(x)\) satisfies detailed balance, this is sufficient to show that it is a stationary distribution.

Detailed balance means that for all states \(x,y\): \[ \pi(x)P(x\mid y) = P(y\mid x)\pi(y) \] In other words, the probability flowing into \(y\) from \(x\) is the same as the probability flowing from \(y\) to \(x\). Note that this means that after one application of \(P\), for \(P\pi(x)\), all the probabilities will be unchanged.

3. Algorithm and derivation

3.1. Given

There is some distribution \(p(x) = Cf(x)\) that we want to sample from, but we don't know the normalizing constant \(C\), so we only have access to \(f(x)\)
We have some symmetric proposal distribution \(q(x'\mid x)\) s.t. \(q(x' \mid x) = q(x \mid x')\)

3.2. Want to show

We want to construct a Markov process with transition probabilities \(P(x\mid x')\) such that the unique stationary distribution of this process is \(p(x)\)

3.3. Derivation

We write the probability of transition from \(x\) to \(x'\) as the product of our proposal distribution and the probability that we accept our proposal distribution: \[ P(x'\mid x) = q(x' \mid x) A(x', x) \]

We use the following for our acceptance probability: \[ A(x', x) = \min\left(1, \frac{p(x') q(x \mid x') }{ p(x) q(x' \mid x)}\right) \]

Now, let's see if \(P(x' \mid x)\) fulfills detailed balance for \(p(x)\):

\[ p(x)q(x'\mid x)A(x',x) = p(x')q(x \mid x')A(x,x') \]

Without loss of generality, assume that \(A(x',x) = \frac{p(x')}{p(x)}\) (remember that \(q(x\mid x') = q(x'\mid x)\) and \(A(x,x') = 1\).

Then, we see that detailed balance is satisfied for the above and we are done!

4. useful links

gibbs sampling