Rejection Sampling

Task: draw samples from the random variable \(X\) with probability density function \(f(x)\).

1. Intuition

Imagine that we have a graph of \(f(x)\) and a rectangle enclosing this graph. Throw darts at the rectangle and keep the \(x\) -coordinates of the darts that land under \(f(x)\). The \(x\) -coordinates will be distributed according to \(X\). Intuitively, if the darts are distributed across the rectangle, then more darts will fall under the region of the graph where there is more probability density.

2. Method

We will replace the "rectangle" in the above intuition with a more general proposal distribution. We have a proposal distribution \(Y\) with probability density function \(g(x)\) such that \(Mg(x) \geq f(x)\) for every \(x\) where \(f(x)\geq 0\) for some constant \(M\). In other words, \(g(x)\) scaled by a constant is at least as "tall" as \(f(x)\) on the support of \(X\). Then, the method is:

Draw a sample \(x\) from the proposal distribution
Sample \(u\) uniformly random from \([0, Mg(x)]\) and accept if \(u\leq f(x)\). In other words, accept with a probability of \(\frac{f(x)}{Mg(x)}\)

Why does this work? First, consider the samples that (1) have their \(x\) coordinate drawn from \(g(x)\) and (2) then have their \(y\) coordinate drawn uniformly at random from \([0,Mg(x)]\). Notice that these samples are uniformly distributed on the region of the graph under \(Mg(x)\). Recall that \(Mg(x)\) completely covers the height of \(f(x)\) on the support of \(f(x)\). In step (2), we only keep the points on the region of the graph under \(f(x)\). Then, looking marginally at the points as they fall on the \(x\) -axis, we will see that our points distributed according to \(X\).

In practice, the proposal distribution is something that we can easily sample from, perhaps by Inverse transform sampling