This post presents the main convergence diagnostics for the Markov chains used in Bayesian inference.
We introduced Bayesian inference with Markov Chain Monte Carlo (MCMC) techniques in previous posts (here and here), with exercises on the two main MCMC algorithms used for Bayesian inference: the Gibbs sampler and the Metropolis algorithm.
However, it is important to have confidence that the posterior chains produced by these techniques are reliable enough for accurate inference. We have to verify that the generated chains have indeed reached their (posterior) stationary distribution. This is often done informally by inspecting trace plots.
More formal diagnostic tools have been proposed in the literature. None of them can prove convergence; however, their combined use increases our confidence that convergence has indeed occurred.
We will again rely on Extreme Value Theory (EVT) by using the Gumbel distribution, which has only two parameters: location and scale. MCMC is necessary here since the posterior distribution cannot be found analytically.
We will use the coda package, together with the evd (to generate samples from the Gumbel distribution), reshape2 and tidyverse packages.
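For reference, the packages can be installed (once, if needed) and loaded as follows:

# install.packages(c("coda", "evd", "reshape2", "tidyverse"))
library(coda)      # convergence diagnostics
library(evd)       # rgumbel() and dgumbel()
library(reshape2)  # melt() for reshaping results
library(tidyverse) # ggplot2, dplyr, ...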
The solutions of this set of exercises can be found here.
Another post will be devoted to bayesplot, a newer and prettier package for visualization.
Exercise 1
Consider a Gumbel distribution with location and scale parameters set to 0 and 0.5 (respectively) and generate a random sample of size 1,000. Plot the kernel density of this sample, together with the theoretical density.
(Take the seed 123 to obtain the same results as in the solutions.)
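A possible approach (the official solutions may differ), using base R graphics and the evd functions rgumbel() and dgumbel():

set.seed(123)
loc   <- 0
scale <- 0.5
samp  <- rgumbel(1000, loc = loc, scale = scale)  # random sample of size 1,000

# Kernel density of the sample vs. theoretical Gumbel density
plot(density(samp), lwd = 2, xlab = "x",
     main = "Gumbel(0, 0.5): kernel vs. theoretical density")
curve(dgumbel(x, loc = loc, scale = scale), add = TRUE, col = "red", lty = 2, lwd = 2)
legend("topright", legend = c("Kernel density", "Theoretical density"),
       col = c("black", "red"), lty = c(1, 2), lwd = 2)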
Exercise 2
Generate random samples from the posterior distribution by Gibbs sampling, using the function created in the solution of the 7th exercise of the previous exercise set.
Chains must be long enough to eliminate the influence of initial conditions and to make sure that one is sampling from the target stationary distribution.
– Take 10,000 iterations.
– Take the mean and standard deviation of the sample as starting values for the location (mu) and scale (sigma) parameters.
– Tune the proposal standard deviations to reach an acceptance rate of around 40% for both parameters. (HINT: look at exercise 8 of the Gibbs sampling exercise set and its solutions.)
– Plot the generated chains. (A sketch of one possible sampler is given after this list.)
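The sampler created in the previous exercise set is not reproduced in this post, so the code below is only a minimal sketch of what such a Metropolis-within-Gibbs sampler might look like, assuming flat priors; the name gibbs_gumbel and the proposal standard deviations are placeholders (tune them until both acceptance rates are around 0.40), and samp is the sample from Exercise 1.

# Gumbel log-posterior under flat priors (log-likelihood up to a constant)
log_post <- function(mu, sigma, data) {
  if (sigma <= 0) return(-Inf)
  z <- (data - mu) / sigma
  -length(data) * log(sigma) - sum(z) - sum(exp(-z))
}

gibbs_gumbel <- function(data, n_iter, mu0, sigma0, prop_sd_mu, prop_sd_sigma) {
  out <- matrix(NA_real_, nrow = n_iter, ncol = 2,
                dimnames = list(NULL, c("mu", "sigma")))
  mu <- mu0; sigma <- sigma0
  acc <- c(mu = 0, sigma = 0)
  for (i in seq_len(n_iter)) {
    # Metropolis update of mu given sigma
    mu_prop <- rnorm(1, mu, prop_sd_mu)
    if (log(runif(1)) < log_post(mu_prop, sigma, data) - log_post(mu, sigma, data)) {
      mu <- mu_prop; acc["mu"] <- acc["mu"] + 1
    }
    # Metropolis update of sigma given mu
    sigma_prop <- rnorm(1, sigma, prop_sd_sigma)
    if (log(runif(1)) < log_post(mu, sigma_prop, data) - log_post(mu, sigma, data)) {
      sigma <- sigma_prop; acc["sigma"] <- acc["sigma"] + 1
    }
    out[i, ] <- c(mu, sigma)
  }
  list(chain = mcmc(out), acceptance = acc / n_iter)
}

fit <- gibbs_gumbel(samp, n_iter = 10000,
                    mu0 = mean(samp), sigma0 = sd(samp),
                    prop_sd_mu = 0.05, prop_sd_sigma = 0.04)  # placeholder proposal sds
fit$acceptance  # tune prop_sd_* until both rates are roughly 0.40
plot(fit$chain) # trace and density plots for mu and sigma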
Exercise 3
Starting values should not influence the convergence of the sampler. Run the same algorithm with different starting values and check whether the chains mix well.
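As a minimal sketch (reusing the hypothetical gibbs_gumbel() function from the Exercise 2 sketch), run several chains from dispersed starting values and overlay their traces:

starts <- list(c(mu = -1,   sigma = 0.1),
               c(mu =  1,   sigma = 1),
               c(mu =  0.5, sigma = 2))

fits <- lapply(starts, function(s)
  gibbs_gumbel(samp, n_iter = 10000, mu0 = s["mu"], sigma0 = s["sigma"],
               prop_sd_mu = 0.05, prop_sd_sigma = 0.04)$chain)

chains <- do.call(mcmc.list, fits)  # coda object holding all the chains
plot(chains)                        # overlapping traces suggest good mixing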
Exercise 4
The iterates within a chain can be serially correlated. This induces a low mixing rate and casts doubt on convergence. Check the autocorrelation of the chains of Exercise 2.
(HINT: you may use a function from the coda package.)
Then, check the cross-correlation between mu and sigma.
What do you conclude?
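With coda, the within-chain autocorrelation and the cross-correlation between parameters can be inspected as follows (using the fit object from the Exercise 2 sketch):

autocorr.plot(fit$chain)  # ACF plots for mu and sigma
autocorr.diag(fit$chain)  # autocorrelations at selected lags

crosscorr(fit$chain)      # cross-correlation matrix between mu and sigma
crosscorr.plot(fit$chain)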
Exercise 5
Thin the chains and check that you indeed reduced the autocorrelation.
Note that the drawback of thinning is that it also reduces the sample size, and hence could reduce efficiency.
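A minimal sketch using coda's window() method (the thinning interval of 20 is arbitrary):

thinned <- window(fit$chain, thin = 20)  # keep every 20th draw
autocorr.plot(thinned)                   # autocorrelation should be much lower
niter(thinned)                           # note the reduced sample size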
Exercise 6
The Raftery-Lewis diagnostic provides several indicators about the generated chains.
Compute it:
1. For the individual chains (with different starting values) of Exercise 3.
2. For the complete chain (stacking chains with different starting values together).
Take 5% as the quantile, 2% as the desired margin of error and 95% as the probability of obtaining an estimate in the specified interval.
(Hint: have a look at the documentation of the R function with ?raftery.diag.)
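A possible way to compute it, reusing the chains object (an mcmc.list) from the Exercise 3 sketch:

# 1. Raftery-Lewis diagnostic for each individual chain
lapply(chains, raftery.diag, q = 0.05, r = 0.02, s = 0.95)

# 2. Raftery-Lewis diagnostic for the stacked chain
stacked <- mcmc(do.call(rbind, lapply(chains, as.matrix)))
raftery.diag(stacked, q = 0.05, r = 0.02, s = 0.95)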
Exercise 7
The Geweke diagnostic tests the equality of the mean of the first 10% of the generated chain with the mean computed from the second half of the chain. Do you reject the hypothesis here, and hence what do you conclude about the convergence of the algorithm?
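A sketch using coda's geweke.diag(), again on the hypothetical chains object from Exercise 3; the output values are z-scores, so absolute values above 1.96 would lead to rejection at the 5% level:

geweke.diag(chains, frac1 = 0.1, frac2 = 0.5)  # first 10% vs. last 50% of each chain
geweke.plot(chains)                            # z-scores as burn-in is progressively discarded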
Exercise 8
The Gelman plot is useful since it computes the "R hat" statistic, which is the factor by which the overall variance estimate could be expected to shrink if it were computed after convergence. Compute this value.
(HINT: do this for every chain with different starting values and put them in an mcmc.list. Then, take this object as input to the function from coda.)
What do you see?
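A sketch, again assuming the chains mcmc.list built in the Exercise 3 sketch:

gelman.diag(chains)  # point estimate and upper C.I. of the shrink factor (R hat)
gelman.plot(chains)  # evolution of the shrink factor as iterations accumulate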
Exercise 9 (Bonus)
Relying on Exercise 8, re-do the Gelman-Rubin plot, but using the ggplot2 package.
(Hint: have a look at all the output values of ?gelman.plot, check how to use them, and use the reshape2 and tidyverse packages.)
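One possible approach (a sketch rather than the official solution): recompute the shrink factor on growing segments of the chains with gelman.diag(), reshape the result with reshape2, and plot it with ggplot2. The invisible return value of gelman.plot() could be used instead; its structure can be inspected with str().

iters <- seq(200, niter(chains[[1]]), by = 200)

# Point estimate of the shrink factor (R hat) for each growing segment
shrink <- sapply(iters, function(i)
  gelman.diag(window(chains, end = i), autoburnin = FALSE)$psrf[, "Point est."])

df <- reshape2::melt(shrink, varnames = c("parameter", "step"), value.name = "Rhat")
df$iteration <- iters[df$step]

ggplot(df, aes(x = iteration, y = Rhat)) +
  geom_line() +
  geom_hline(yintercept = 1.1, linetype = "dashed") +  # common rule-of-thumb threshold
  facet_wrap(~ parameter, scales = "free_y") +
  labs(x = "Last iteration in chain", y = "Shrink factor (R hat)")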