# Reparameterization Trick

Note: The IPython notebook for this post can be seen here.

Here we will try to understand the reparameterization trick used by Kingma and Welling (2014)^{1} to train their variational autoencoder.

Assume we have a normal distribution $q$ that is parameterized by $\theta$, specifically $q_{\theta}(x) = \mathcal{N}(\theta, 1)$. We want to solve the following problem:

$$\min_{\theta} \; \mathbb{E}_{q}\left[x^2\right]$$

This is of course a rather silly problem and the optimal $\theta$ is obvious (namely $\theta = 0$). But here we just want to understand how the reparameterization trick helps in calculating the gradient of the objective $\mathbb{E}_{q}[x^2]$.

The usual way to calculate $\nabla_{\theta} \mathbb{E}_{q}[x^2]$ is as follows:

$$\nabla_{\theta} \mathbb{E}_{q}\left[x^2\right] = \nabla_{\theta} \int q_{\theta}(x) \, x^2 \, dx = \int x^2 \, \nabla_{\theta} q_{\theta}(x) \, dx = \int x^2 \, q_{\theta}(x) \, \nabla_{\theta} \log q_{\theta}(x) \, dx = \mathbb{E}_{q}\left[x^2 \, \nabla_{\theta} \log q_{\theta}(x)\right]$$

which makes use of the identity $\nabla_{\theta} q_{\theta}(x) = q_{\theta}(x) \, \nabla_{\theta} \log q_{\theta}(x)$. This log-derivative trick is also the basis of the REINFORCE^{2} algorithm used in policy gradient methods.

For our example, where $q_{\theta}(x) = \mathcal{N}(\theta, 1)$, we have $\nabla_{\theta} \log q_{\theta}(x) = x - \theta$, so this method gives

$$\nabla_{\theta} \mathbb{E}_{q}\left[x^2\right] = \mathbb{E}_{q}\left[x^2 (x - \theta)\right]$$

The reparameterization trick is a way to rewrite the expectation so that the distribution with respect to which we take the gradient is independent of the parameter $\theta$. To achieve this, we need to **make the stochastic element in $q$ independent of $\theta$**. Hence, we write $x$ as

$$x = \theta + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, 1)$$

Then, we can write

$$\mathbb{E}_{q}\left[x^2\right] = \mathbb{E}_{p}\left[(\theta + \epsilon)^2\right]$$

where $p$ is the distribution of $\epsilon$, i.e., $\mathcal{N}(0, 1)$. Now we can write the derivative of $\mathbb{E}_{q}[x^2]$ as follows:

$$\nabla_{\theta} \mathbb{E}_{q}\left[x^2\right] = \nabla_{\theta} \mathbb{E}_{p}\left[(\theta + \epsilon)^2\right] = \mathbb{E}_{p}\left[2(\theta + \epsilon)\right]$$

Now let us compare the variances of the two methods; we are hoping to see that the first method has high variance while the reparameterization trick decreases the variance substantially.
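A minimal sketch of how the two estimators can be computed (the sample size, random seed, and the choice $\theta = 2$ are my assumptions; since $\mathbb{E}_{q}[x^2] = \theta^2 + 1$, the true gradient at $\theta = 2$ is $2\theta = 4$, consistent with the printed estimates below):

```python
import numpy as np

np.random.seed(0)  # assumed seed, for reproducibility
N = 1000           # assumed sample size
theta = 2.0        # assumed current parameter value; true gradient is 2*theta = 4

eps = np.random.randn(N)  # epsilon ~ N(0, 1)
x = theta + eps           # x ~ N(theta, 1) via the reparameterization x = theta + eps

# Score-function (log-derivative) estimator: E_q[x^2 (x - theta)]
grad_score = np.mean(x ** 2 * (x - theta))

# Reparameterization estimator: E_p[2 (theta + eps)]
grad_reparam = np.mean(2 * (theta + eps))

print(grad_score)
print(grad_reparam)
```

Both are unbiased estimates of the same gradient; they differ only in how much they fluctuate around the true value of 4.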

```
4.46239612174
4.1840532024
```

Let us plot the variance for different sample sizes.
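One way such numbers can be produced (a sketch; the sample sizes, number of repetitions, and seed are my assumptions): estimate each gradient many times per sample size, then report the mean and variance of the estimates. The four printed arrays are the means of the two estimators followed by their variances; the variances can then be plotted against the sample sizes, e.g. on a log-log scale with matplotlib.

```python
import numpy as np

np.random.seed(0)                    # assumed seed
Ns = [10, 100, 1000, 10000, 100000]  # assumed sample sizes
theta = 2.0                          # assumed parameter value (true gradient = 4)
reps = 100                           # assumed repetitions per sample size

means1, vars1 = [], []  # score-function estimator
means2, vars2 = [], []  # reparameterization estimator
for N in Ns:
    g1 = np.empty(reps)
    g2 = np.empty(reps)
    for r in range(reps):
        eps = np.random.randn(N)
        x = theta + eps
        g1[r] = np.mean(x ** 2 * (x - theta))   # score-function estimate
        g2[r] = np.mean(2 * (theta + eps))      # reparameterization estimate
    means1.append(g1.mean()); vars1.append(g1.var())
    means2.append(g2.mean()); vars2.append(g2.var())

print(np.array(means1))
print(np.array(means2))
print(np.array(vars1))
print(np.array(vars2))
```

Both estimators are unbiased (the means hover around 4 for every sample size), so the interesting comparison is between the two variance rows.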

```
[ 3.8409546 3.97298803 4.03007634 3.98531095 3.99579423]
[ 3.97775271 4.00232825 3.99894536 4.00353734 3.99995899]
[ 6.45307927e+00 6.80227241e-01 8.69226368e-02 1.00489791e-02
8.62396526e-04]
[ 4.59767676e-01 4.26567475e-02 3.33699503e-03 5.17148975e-04
4.65338152e-05]
```

The variance of the estimates obtained with the reparameterization trick is an order of magnitude smaller than the variance of the estimates from the first method!