While My MCMC Gently Samples

Bayesian modeling, Computational Psychiatry, and Python

Animating MCMC with PyMC3 and Matplotlib

Here's the deal: I used PyMC3, matplotlib, and Jake Vanderplas' JSAnimation to create JavaScript animations of three MCMC sampling algorithms -- Metropolis-Hastings, slice sampling and NUTS.

Note that this post uses the HTML5 video tag, which doesn't seem to work in Firefox but should work in any other browser.

I like visualizations because they provide a good intuition for how the samplers work and what problems they can run into.

You can download the full notebook here or view it in your browser. Note that for this post I used video embedding because the animations are quite large when uncompressed; the notebook contains code for both options.

The model is a simple linear model, as explained in my previous blog post on Bayesian GLMs. Essentially, I generate some data and estimate the intercept and slope. In the lower left corner is the joint posterior; the plot above it shows the trace of the marginal posterior of the intercept, and the plot to the right shows the trace of the marginal posterior of the slope parameter. Each point represents a sample drawn from the posterior. Three quarters of the way through, I add a thousand samples at once to show that all samplers eventually sample from the posterior.
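
The data generation itself lives in the notebook; below is a minimal sketch of what it looks like. The true parameter values, noise level, and number of animated draws are placeholders, not necessarily the values used in the notebook.

import numpy as np

np.random.seed(123)

# Placeholder values -- not necessarily the ones used in the notebook.
true_intercept, true_slope = 1.0, 2.0
size = 100

x = np.linspace(0, 1, size)
y = true_intercept + true_slope * x + np.random.normal(scale=0.5, size=size)
data = dict(x=x, y=y)

samples = 1000  # animated frames; the sampling cells below draw samples + 1000 in total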

Metropolis-Hastings

First, let's see how our old-school Metropolis-Hastings (MH) sampler performs. The code uses matplotlib's handy FuncAnimation (see here for a tutorial), my own animation code, and the recently merged iterative sampling function iter_sample().
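
The figure and the init/animate callbacks come from my own animation code in the notebook and aren't reproduced in this post. The sketch below shows roughly what they do; the axis layout and styling are assumptions, and unlike the real animate() it does not dump in the extra thousand draws part-way through. The coefficient names 'Intercept' and 'x' are the ones the glm module assigns for this formula.

import matplotlib.pyplot as plt
from matplotlib import animation

# Rough sketch of the notebook's figure setup: joint posterior (lower left),
# intercept trace (above it), slope trace (to its right).
fig = plt.figure(figsize=(8, 8))
ax_joint = fig.add_subplot(2, 2, 3)
ax_int = fig.add_subplot(2, 2, 1, sharex=ax_joint)
ax_slope = fig.add_subplot(2, 2, 4, sharey=ax_joint)

lines = [ax.plot([], [], 'o', ms=2, alpha=.5)[0]
         for ax in (ax_joint, ax_int, ax_slope)]

def init():
    # Start with empty scatter plots.
    for line in lines:
        line.set_data([], [])
    return lines

def animate(i):
    # Advance the sampler by one draw; iter_sample (defined in the next cell)
    # yields the full trace accumulated so far.
    trace = next(iter_sample)
    intercept, slope = trace['Intercept'], trace['x']
    draws = range(len(intercept))
    lines[0].set_data(intercept, slope)   # joint posterior
    lines[1].set_data(intercept, draws)   # intercept marginal trace
    lines[2].set_data(draws, slope)       # slope marginal trace
    return lines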

In [23]:
import pymc3 as pm

with pm.Model() as model:
    # Linear model specified via a patsy formula ('Intercept' and 'x' coefficients).
    pm.glm.glm('y ~ x', data)
    step = pm.Metropolis()
    # Generator that yields the growing trace after each individual draw.
    iter_sample = pm.iter_sample(samples + 1000, step)

animation.FuncAnimation(fig, animate, init_func=init,
                        frames=samples, interval=5, blit=True)
Out[23]:

As you can see, there is considerable correlation between the intercept and the slope -- if we believe in a higher intercept we must also believe in a lower slope (which makes geometrical sense if you think about how lines could fit through the point cloud). This correlation often makes it difficult for an MCMC algorithm to converge (i.e. to sample from the true posterior), as we witness here.
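
To put a number on that correlation rather than eyeballing it, you can run a short non-animated chain and compute the correlation between the two marginals; a quick sketch (again assuming the coefficient names 'Intercept' and 'x'):

import numpy as np
import pymc3 as pm

# Quick, non-animated check of the intercept/slope correlation.
with model:
    check_trace = pm.sample(2000, step=pm.Metropolis())

# Expect a strongly negative value, matching the elongated diagonal shape
# of the joint posterior.
print(np.corrcoef(check_trace['Intercept'], check_trace['x'])[0, 1])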

The reason MH does not seem to do anything at first is that it proposes huge jumps that are rejected because they land far outside the posterior. PyMC then tunes the proposal distribution so that smaller jumps are proposed. These smaller jumps, however, lead to the random-walk behavior you can see, which makes sampling inefficient (for a good intuition about this "drunken walk", see here).
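
To make the tuning idea concrete, here is a toy random-walk Metropolis sampler with a crude acceptance-rate based scale adjustment. It only illustrates the principle; it is not PyMC3's actual Metropolis implementation or tuning schedule (PyMC3 only adapts during a dedicated tuning phase).

import numpy as np

def toy_metropolis(logp, start, n_samples, scale=5.0, tune_interval=100):
    # Toy random-walk Metropolis with naive scale tuning -- illustration only,
    # not PyMC3's implementation. logp takes a parameter vector and returns the
    # unnormalized log posterior density.
    x = np.atleast_1d(np.asarray(start, dtype=float))
    draws, accepted = [], 0
    for i in range(1, n_samples + 1):
        proposal = x + scale * np.random.randn(*x.shape)  # symmetric Gaussian jump
        # Accept with probability min(1, p(proposal) / p(current)).
        if np.log(np.random.rand()) < logp(proposal) - logp(x):
            x, accepted = proposal, accepted + 1
        draws.append(x.copy())
        if i % tune_interval == 0:
            rate = accepted / tune_interval
            # Mostly rejected -> propose smaller jumps; mostly accepted -> bigger jumps.
            if rate < 0.2:
                scale *= 0.7
            elif rate > 0.5:
                scale *= 1.3
            accepted = 0
    return np.array(draws)

With a huge initial scale almost every proposal is rejected and the chain sits still, just like the first part of the animation above; once the scale has shrunk, the chain starts its slow random walk through the posterior.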

Gibbs sampling

Let's see how Gibbs sampling fares, using a slice step method.

In [26]:
with pm.Model() as model:
    pm.glm.glm('y ~ x', data)
    # Same model as above, but sampled with a slice sampler instead of Metropolis.
    step = pm.Slice()
    iter_sample = pm.iter_sample(samples + 1000, step)

animation.FuncAnimation(fig, animate, init_func=init,
                        frames=samples, interval=5, blit=True)
Out[26]: