# reparameterization trick

## 1. VAE example

Suppose our loss involves a stochastic term. That is, it involves sampling from a Gaussian distribution whose parameters are produced by our network. We can't compute a gradient through that sampling step.

In this situation, the computation graph might look like:

```
      f
      ^
      |
      |
      z   <--- stochastic node
      ^
     / \
    /   \
  sig    mu
```

Then, the computation of \(f\) proceeds as:

- compute \(\mu\) according to the model parameters
- compute \(\sigma\) according to the model parameters
- sample \(z\) from \(\mathcal{N}(\mu, \sigma^2)\)
- compute \(f(z)\)
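The naive procedure above can be sketched as follows (a minimal sketch; the shapes, values, and toy loss \(f(z) = \sum z^2\) are hypothetical, standing in for quantities produced by a real network):

```python
# Naive forward pass: sample z directly from N(mu, sigma^2).
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([0.5, -1.0])    # mean, computed from the model parameters
sigma = np.array([1.0, 2.0])  # std dev, computed from the model parameters

# The sampling call is a black box: z depends on mu and sigma, but there is
# no differentiable expression connecting them, so backprop stops here.
z = rng.normal(loc=mu, scale=sigma)

f = np.sum(z ** 2)            # toy downstream loss f(z)
```

The forward pass works fine; the problem only appears when we try to differentiate `f` with respect to `mu` and `sigma`, because `rng.normal` is not a differentiable function of its arguments.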

Now we are in trouble, because we can't backpropagate through the sampling step to \(\sigma\) and \(\mu\). With the reparameterization trick, we move the sampling step off the critical path of the backpropagation:

```
        f
        ^
        |
        |
      __z__
     /  ^  \
    /   |   \
  sig   mu   epsilon   <--- stochastic node
```

Here, \(\epsilon \sim \mathcal{N}(0,1)\).

Then, the procedure for computing the output at \(f\) is:

- compute \(\sigma\) according to the model parameters
- compute \(\mu\) according to the model parameters
- sample \(\epsilon\sim \mathcal{N}(0,1)\)
- compute \(z=\mu + \sigma \odot \epsilon\). Note that this is equivalent to sampling \(z\) from \(\mathcal{N}(\mu, \sigma^2)\)
- compute \(f(z)\)

Now the gradient can flow from \(f\) to \(\sigma\) and \(\mu\); the stochastic node \(\epsilon\) is a leaf that requires no gradient.
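The reparameterized procedure can be sketched as follows (a minimal sketch; the shapes, values, and toy loss \(f(z) = \sum z^2\) are hypothetical). Because \(\epsilon\) is a constant with respect to the parameters, the chain rule gives \(\partial z/\partial\mu = 1\) and \(\partial z/\partial\sigma = \epsilon\), so the gradients can be written down directly:

```python
# Reparameterized forward pass: z = mu + sigma * eps, with eps ~ N(0, 1).
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([0.5, -1.0])    # mean, computed from the model parameters
sigma = np.array([1.0, 2.0])  # std dev, computed from the model parameters

eps = rng.standard_normal(2)  # stochastic node, off the backprop path
z = mu + sigma * eps          # deterministic given eps
f = np.sum(z ** 2)            # toy downstream loss f(z)

# Backprop through z: eps is a constant w.r.t. mu and sigma.
df_dz = 2 * z                 # f'(z) for f(z) = sum(z^2)
df_dmu = df_dz * 1.0          # dz/dmu    = 1
df_dsigma = df_dz * eps       # dz/dsigma = eps
```

A quick way to convince yourself these gradients are right is a finite-difference check: perturb one entry of `mu` by a small `h`, recompute `f` with the *same* `eps`, and compare the slope against `df_dmu`.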