Importance Sampling

1. Problem statement:

We want to draw samples from \(p(x)\), but this is hard.
We have the ability to query \(p(x)\), for any \(x\)
We have the ability to sample from \(q(x)\)

2. Solution outline:

Sample from \(q(x)\) and weight the samples by how important they are \(p(x)\)

3. Solution:

Sample \(q(x)\) \(n\) times
Assign each sample the weight \(p(x)/q(x)\)
Normalize weights across samples to sum to 1. Take from these samples (with replacement) according to their weights

4. Comments:

Consider a small slice \(\mathbf{x} = [a, a + da]\). About \(q(\mathbf{x})\) samples in step 2 will have value in \(\mathbf{x}\). If, in step 3, we took uniformly at random from the step \(2\) samples, we would have \(q(\mathbf{x})\) samples with value \(\mathbf{x}\). Instead, we should weight the value \(\mathbf{x}\) by \(p(\mathbf{x})/q(\mathbf{x})\). Then, the weighted proportion of samples with value \(\mathbf{x}\) is: \[ \frac{\text{# samples} = \mathbf{x}}{n} \times \frac{p(\mathbf{x})}{q(\mathbf{x})} = q(\mathbf{x}) \times \frac{p(\mathbf{x})}{q(\mathbf{x})} = p(\mathbf{x}) \]

Note that this is also useful when \(p(x)\) is very sparse in an area. Then, sampling directly from \(p(x)\) might result in 0 samples for a sparse region. So instead, we should sample from \(q(x)\) and re-weight.

5. computing expectation

For a random variable \(X\) with pdf \(p(x)\), we can compute an expectation over \(p(x)\) by sampling over \(q(x)\) and weighting by \(p(x)\). This is written: \[ \mathbb{E}[X] = \mathbb{E}_{X\sim q} \left[X \frac{p(X)}{q(X)}\right] \]

5.1. example

Say that we are trying to estimate the expectation of \(f(X)\), where \(X\) is a discrete r.v. with pdf \(p(x)\). Then, we draw samples from the proposal distribution \(q(x)\). \[ \hat{\mathbb{E}}[X] = \frac{1}{n}\sum_{i=1}^{n} \frac{p(x_i)}{q(x_i)} f(x_i) \approx \sum_{x\in X} \frac{p(x)}{q(x)} f(x) q(x) \] where \(\frac{p(x)}{q(x)}\) is called the likelihood ratio and the approximation holds when the number of samples \(n\) is large so that \(\frac{|\{x_i = x\}|}{n} \approx q(x)\)

5.2. where do we want to use this?

In fact, the case is usually that we want to choose our \(q(x)\) so that more of the probability weight of the samples covers the places where \(p\) is high. See this example from ray tracing