
evidence lower bound

1. terms

  • \(p^*\) is the true distribution
  • \(p_{\theta}\) is the parameterized distribution we use to approximate \(p^*\)
  • \(q_{\phi}\) is the parameterized (variational) distribution we use to approximate the posterior \(p_{\theta}(z \mid x)\)

2. ELBO is a lower bound on evidence

The ELBO is \(\mathbb{E}_{z \sim q_{\phi} (\cdot \mid x)} \ln \left( \frac{p_{\theta}(x, z)}{q_{\phi}(z \mid x)} \right)\).

This is a lower bound on the evidence \(\ln(p_{\theta}(x))\).

You can see this is a lower bound because: \[\begin{align*} \ln(p_{\theta}(x)) &= \ln \left( \sum_{z} p_{\theta}(x,z) \right) \\ &= \ln \left( \sum_{z} \frac{p_{\theta}(x,z)}{q_{\phi}(z \mid x)} q_{\phi}(z \mid x) \right) \\ &= \ln \left( \mathbb{E}_{z \sim q_{\phi} (\cdot \mid x)} \frac{p_{\theta}(x,z)}{q_{\phi}(z\mid x)} \right)\\ &\geq \mathbb{E}_{z \sim q_{\phi}(\cdot \mid x)} \ln \left( \frac{p_{\theta}(x,z)}{q_{\phi}(z \mid x)} \right) \end{align*}\]

The first line marginalizes over \(z\). The second line multiplies and divides by \(q_{\phi}(z \mid x)\), the same trick used in importance sampling. The last line follows from Jensen's inequality and the fact that \(\ln\) is concave.
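
To make the inequality concrete, here is a minimal numerical sketch, assuming a made-up three-state latent variable \(z\) and arbitrary numbers for \(p_{\theta}(x, z)\) at one fixed \(x\); none of these values come from the note.

```python
# Minimal numerical sketch of the bound, with made-up numbers (assumption,
# not from the note): z takes three values, x is fixed.
import numpy as np

p_joint = np.array([0.10, 0.25, 0.05])   # p_theta(x, z), one entry per value of z
q = np.array([0.2, 0.5, 0.3])            # q_phi(z | x), any distribution on the same support

evidence = np.log(p_joint.sum())           # ln p_theta(x) = ln sum_z p_theta(x, z)
elbo = np.sum(q * np.log(p_joint / q))     # E_{z ~ q}[ ln p_theta(x, z) / q_phi(z | x) ]

print(f"ln p(x) = {evidence:.4f}, ELBO = {elbo:.4f}")
assert elbo <= evidence + 1e-12            # Jensen's inequality: ELBO <= evidence
```

Setting \(q\) equal to the true posterior \(p_{\theta}(z \mid x)\) (here, `p_joint / p_joint.sum()`) makes the two numbers coincide, which previews the KL identity below.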

In fact, the gap between the ELBO and the evidence is exactly the KL divergence between \(q_{\phi}(z \mid x)\) and \(p_{\theta}(z \mid x)\):

\[ D_{KL}(q_{\phi}(\cdot \mid x) \Vert p_{\theta}(\cdot \mid x)) = \ln p_{\theta}(x) - \mathbb{E}_{z \sim q_{\phi}(\cdot \mid x)} \ln\left( \frac{p_{\theta}(z,x)}{q_{\phi}(z \mid x)} \right) \]

Here, we can see that the KL divergence between \(q\) and \(p\) is exactly the difference between the log evidence and the ELBO (writing \(q(z)\) for \(q_{\phi}(z \mid x)\) and dropping the subscripts on \(p\)): \[\begin{align*} \ln p(x) - \mathbb{E}_{z \sim q}\ln \left[\frac{p(z,x)}{q(z)}\right] &= \mathbb{E}_{z \sim q} \ln p(x) - \mathbb{E}_{z \sim q}\ln \left[\frac{p(z,x)}{q(z)}\right] \\ &= \mathbb{E}_{z \sim q} \ln \frac{p(x)}{p(z,x)} + \mathbb{E}_{z \sim q}\ln q(z) \\ &= \mathbb{E}_{z \sim q} \ln \frac{1}{p(z \mid x)} + \mathbb{E}_{z \sim q}\ln q(z)\\ &= \mathbb{E}_{z \sim q} \ln \frac{q(z)}{p(z \mid x)} \\ &= D_{KL}( q \Vert p) \end{align*}\]
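
The same made-up numbers from the sketch above can be used to check this identity directly; again, the values are an assumption for illustration, not from the note.

```python
# Hypothetical numerical check of the identity: ln p(x) - ELBO == D_KL(q || p(. | x)).
import numpy as np

p_joint = np.array([0.10, 0.25, 0.05])   # p_theta(x, z) for the fixed x (assumed numbers)
q = np.array([0.2, 0.5, 0.3])            # q_phi(z | x)

evidence = np.log(p_joint.sum())           # ln p_theta(x)
elbo = np.sum(q * np.log(p_joint / q))     # ELBO
posterior = p_joint / p_joint.sum()        # p_theta(z | x) = p_theta(x, z) / p_theta(x)
kl = np.sum(q * np.log(q / posterior))     # D_KL(q_phi(. | x) || p_theta(. | x))

print(f"gap = {evidence - elbo:.6f}, KL = {kl:.6f}")
assert np.isclose(evidence - elbo, kl)     # the gap is exactly the KL divergence
```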

3. see

Created: 2025-11-02 Sun 18:48