This post presents exercises using the
MCMCglmm package in order to compute parameter estimates in a Bayesian fashion, relying on Mark Chain Monte Carlo (MCMC) methods. The
MCMCglmm package typically deals with Generalized Linear (Mixed) Models (GLMM).
This package mainly uses Gibbs Sampling to update the parameters, but also the Metropolis-Hastings. Course notes are available as a vignette for this package. An overview is also available.
After the first introduction of Bayesian inference with (MCMC) techniques in the first sets of exercises(here and here), you can find exercises related to these two popular MCMC algorithms that are used in
MCMCglmm : Gibbs sampler and the Metropolis algorithm.
Load the data “Traffic” from the package “MASS.” (HINT : use the
Show a quick summary of the structure of this data-set; ex. the number of observations and columns, type of variables, etc.
Explain the variable “y” (which is the number of citizens injured in road accidents) with a generalized linear (frequentest) model, taking into account, the limit, the year and the day.
(HINT : since “y” is a count variable, take the Poisson family)
Show the summary of the fitted model. How many parameters are significant at the 5% level?
Now, create the same model in a Bayesian setting by using the
MCMCglmm framework. Note that the prior distribution is an Inverse-Wishart in
a. Set weak prior parameters (HINT: check the information given with the help of
?MCMCglmm and have a look at the parameter
prior. Set only the R matrix, since we don’t have random effects.)
b. Run the model with 10,000 iterations.
Plot the posterior chains by using the plotting method of
MCMCglmm; ex. plot the individual trace-plots of the variable, year and day (only), together with the density plot.
What can you say about the convergence of these chains and the Bayesian point estimates of these parameters?
Then, plot the variance-co-variance posterior chain of the residual error.
Compute the posterior mean residual for a particular data point: for the day “85.”
Now, run the same model, but with “year” taken as a random effect and plot the posterior chains. Take a burn-in of 1000.
What can you observe?
Check the convergence of the model with:
– The Raftery-Lewis diagnostic
– The Geweke diagnostic
– (BONUS)The Gelman-Rubin diagnostic. (HINT: You can use
mclapply from the parallel package to run it faster for the chains with different starting values.)
Make a prediction from this model for the year 1961 and year 1962 separately.
Plot these predictions together with density plots using the
tidyverse) package. Plot these two plots side-by-side. (HINT: use the
gridExtra package, for example.)
Compute the posterior estimates for years 1961 and 1962.
(HINT: take the mode of the posterior distribution.)