Here's the deal: I used PyMC3, matplotlib, and Jake Vanderplas' JSAnimation to create javascript animations of three MCMC sampling algorithms -- Metropolis-Hastings, slice sampling and NUTS.

Note that this post uses the html5 video tag which doesn't seem to work for Firefox but should work on any other browser.

I like visualizations because they provide a good intuition for how the samplers work and what problems they can run into.

You can download the full notebook here or view it in your browser. Note that for this post I used video embedding due to the size of the animations if they are not compressed. The notebook contains code for both.

The model is a simple linear model as explained in my previous blog post on Bayesian GLMs. Essentially,
I generated some data and estimate `intercept`

and `slope`

. In the
lower left corner is the *joint* posterior while the plot above shows
the *trace* of the *marginal* posterior of the `intercept`

while the
right plot shows the trace of the *marginal* posterior of the `slope`

parameter. Each
point represents a sample drawn from the posterior. At 3 quarters of the way I added a thousand samples to show that they all sample from the posterior eventually.

## Metropolis-Hastings

First, lets see how our old-school Metropolis-Hastings (MH)
performs. The code uses matplotlib's handy `FuncAnimation`

(see
here
for a tutorial), my own animation code, and the recently merged iterative sampling function `iter_sample()`

.

```
with pm.Model() as model:
pm.glm.glm('y ~ x', data)
step = pm.Metropolis()
iter_sample = pm.iter_sample(samples+1000, step)
animation.FuncAnimation(fig, animate, init_func=init,
frames=samples, interval=5, blit=True)
```

As you can see, there is quite some correlation between `intercept`

and `slope`

-- if we believe in a higher intercept we must also
believe in a lower slope (which makes geometrical sense if you think
how lines could fit through the point clouds). This often makes it
difficult for the MCMC algorithm to converge (i.e. sample from the
true posterior) as we wittness here.

The reason MH does not do anything at first is that MH proposes huge jumps that are not accepted because they are way outside the posterior. PyMC then tunes the proposal distribution so that smaller jumps are proposed. These smaller jumps however lead to the random-walk behavior you can see which makes sampling inefficient (for a good intuition about this "drunken walk", see here).

## Gibbs sampling

Lets see how Gibbs sampling fares with a slice step method.

```
with pm.Model() as model:
pm.glm.glm('y ~ x', data)
step = pm.Slice()
iter_sample = pm.iter_sample(samples+1000, step)
animation.FuncAnimation(fig, animate, init_func=init,
frames=samples, interval=5, blit=True)
```