Note that this post uses the HTML5 video tag, which doesn't seem to work in Firefox but should work in any other browser.
I like visualizations because they provide a good intuition for how the samplers work and what problems they can run into.
You can download the full notebook here or view it in your browser. Note that for this post I used video embedding because the animations are quite large when uncompressed. The notebook contains code for both.
The model is a simple linear model, as explained in my previous blog post on Bayesian GLMs. Essentially, I generated some data and estimate the intercept and slope. In the lower left corner is the joint posterior, while the plot above it shows the trace of the marginal posterior of the intercept and the plot on the right shows the trace of the marginal posterior of the slope parameter. Each point represents a sample drawn from the posterior. Three quarters of the way through, I added a thousand samples at once to show that they all eventually sample from the posterior.
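For reference, data for such a model can be generated in a few lines of NumPy; the true parameter values and noise level below are illustrative guesses rather than the ones actually used in the notebook:

import numpy as np
import pandas as pd

# Illustrative values; the notebook may use different ones.
true_intercept, true_slope, noise_sd = 1.0, 2.0, 0.5
x = np.linspace(0, 1, 100)
y = true_intercept + true_slope * x + np.random.normal(scale=noise_sd, size=x.size)
data = pd.DataFrame({'x': x, 'y': y})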
First, let's see how our old-school Metropolis-Hastings (MH) sampler performs. The code uses matplotlib's handy FuncAnimation (see here for a tutorial), my own animation code, and the recently merged iterative sampling function pm.iter_sample().
with pm.Model() as model:
    pm.glm.glm('y ~ x', data)
    step = pm.Metropolis()
    iter_sample = pm.iter_sample(samples + 1000, step)

animation.FuncAnimation(fig, animate, init_func=init,
                        frames=samples, interval=5, blit=True)
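The animate and init callbacks are part of my animation code and are not reproduced here. Roughly, they might look like the following sketch, which for brevity only updates the joint-posterior panel; the axis limits and the parameter names 'Intercept' and 'x' (what pm.glm.glm('y ~ x', data) typically calls the two coefficients) are assumptions, not the notebook's exact code:

import matplotlib.pyplot as plt

fig, ax_joint = plt.subplots(figsize=(6, 6))
ax_joint.set_xlim(-2, 4)   # illustrative axis limits
ax_joint.set_ylim(-2, 6)
joint_points, = ax_joint.plot([], [], 'o', alpha=0.3)

def init():
    # Start with an empty scatter plot.
    joint_points.set_data([], [])
    return (joint_points,)

def animate(i):
    # Advance the iterative sampler by one draw and replot all samples so far.
    trace = next(iter_sample)
    joint_points.set_data(trace['Intercept'], trace['x'])
    return (joint_points,)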
As you can see, there is quite some correlation between the intercept and the slope -- if we believe in a higher intercept we must also believe in a lower slope (which makes geometrical sense if you think about how lines could fit through the point cloud). This often makes it difficult for the MCMC algorithm to converge (i.e. sample from the true posterior), as we witness here.
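One quick way to quantify this is to compute the posterior correlation from a finished trace. The variable names here are the ones pm.glm typically creates from the 'y ~ x' formula and may differ depending on the version:

import numpy as np

# `trace` is assumed to be a completed run, e.g. from pm.sample()
# or after the iter_sample generator has been exhausted.
corr = np.corrcoef(trace['Intercept'], trace['x'])[0, 1]
print('posterior correlation between intercept and slope: %.2f' % corr)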
The reason MH does not do anything at first is that it proposes huge jumps that are not accepted because they land way outside the posterior. PyMC then tunes the proposal distribution so that smaller jumps are proposed. These smaller jumps, however, lead to the random-walk behavior you can see, which makes sampling inefficient (for a good intuition about this "drunken walk", see here).
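To make that concrete, here is a minimal, self-contained sketch of a single random-walk Metropolis update with a tunable proposal width. This is not PyMC's implementation, just the basic mechanism it automates:

import numpy as np

def metropolis_step(theta, log_post, scale):
    """One random-walk Metropolis update with a Gaussian proposal of width `scale`."""
    proposal = theta + np.random.normal(scale=scale, size=np.shape(theta))
    # Accept with probability min(1, p(proposal) / p(theta)).
    if np.log(np.random.rand()) < log_post(proposal) - log_post(theta):
        return proposal, True
    return theta, False

# With a too-large `scale`, almost every proposal lands outside the posterior
# and is rejected, so the chain appears stuck; after tuning `scale` down,
# proposals are accepted but the chain moves in small, correlated increments --
# the random-walk behavior visible in the animation.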
Let's see how Gibbs sampling fares with a slice step method.
with pm.Model() as model:
    pm.glm.glm('y ~ x', data)
    step = pm.Slice()
    iter_sample = pm.iter_sample(samples + 1000, step)

animation.FuncAnimation(fig, animate, init_func=init,
                        frames=samples, interval=5, blit=True)