This post presents exercises using the MCMCglmm package to compute parameter estimates in a Bayesian fashion, relying on Markov Chain Monte Carlo (MCMC) methods. The MCMCglmm package deals with Generalized Linear Mixed Models (GLMM). It mainly uses Gibbs sampling to update the parameters, but also Metropolis-Hastings steps. Course notes are available as a vignette for this package; an overview is also available.
Background
After the first introductions to Bayesian inference with MCMC techniques in the earlier sets of exercises (here and here), this set covers exercises related to the two popular MCMC algorithms used in MCMCglmm: the Gibbs sampler and the Metropolis algorithm.
We will use the packages MCMCglmm and coda, together with the packages evd (to generate samples from the Gumbel distribution), reshape2, and tidyverse.
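A minimal setup sketch, assuming all five packages are already installed:

library(MCMCglmm)   # Bayesian GL(M)Ms via MCMC
library(coda)       # MCMC convergence diagnostics
library(evd)        # e.g. rgumbel() for Gumbel samples
library(reshape2)   # reshaping posterior draws for plotting
library(tidyverse)  # ggplot2, dplyr, etc.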
Solutions for this set of exercises can be found here.
Exercise 1
Load the data set “Traffic” from the package “MASS”. (HINT: use the data() function.)
Show a quick summary of the structure of this data set, e.g. the number of observations and columns, the types of variables, etc.
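One possible starting point (str() and summary() are only suggestions):

data(Traffic, package = "MASS")
str(Traffic)      # dimensions and variable types
summary(Traffic)  # quick numeric summary of each column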
Exercise 2
Explain the variable “y” (which is the number of people injured in road accidents) with a generalized linear (frequentist) model, taking into account the limit, the year, and the day.
(HINT: since “y” is a count variable, use the Poisson family.)
Show the summary of the fitted model. How many parameters are significant at the 5% level?
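A sketch of the frequentist fit; the formula simply uses the three predictors named above:

fit_glm <- glm(y ~ limit + year + day, family = poisson, data = Traffic)
summary(fit_glm)  # look for coefficients with p-values below 0.05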
Exercise 3
Now, create the same model in a Bayesian setting by using the MCMCglmm framework. Note that the prior distribution in MCMCglmm is an inverse-Wishart:
a. Set weak prior parameters. (HINT: check the help page ?MCMCglmm and have a look at the prior argument. Set only the R matrix, since we don’t have random effects.)
b. Run the model with 10,000 iterations (a sketch follows below).
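A possible sketch, assuming the common weakly informative choice V = 1, nu = 0.002 for the R structure:

prior1 <- list(R = list(V = 1, nu = 0.002))  # weak inverse-Wishart prior on the residual variance
set.seed(123)                                # arbitrary seed, for reproducibility
m1 <- MCMCglmm(y ~ limit + year + day, family = "poisson",
               data = Traffic, prior = prior1,
               nitt = 10000, verbose = FALSE)
summary(m1)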
Exercise 4
Plot the posterior chains by using the plotting method of MCMCglmm; e.g. plot the individual trace plots of the variables year and day (only), together with the density plots.
What can you say about the convergence of these chains and the Bayesian point estimates of these parameters?
Then, plot the variance-covariance posterior chain of the residual error.
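For example, assuming the model object m1 from Exercise 3:

plot(m1$Sol[, c("year", "day")])  # trace and density plots of the year and day effects
plot(m1$VCV)                      # variance-covariance chain of the residual error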
Exercise 5
Compute the posterior mean residual for a particular data point: the day “85”.
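One way to do this, again assuming m1 from Exercise 3; note that day 85 occurs once in each year, so there are two such rows:

res <- Traffic$y - predict(m1)  # observed minus posterior mean prediction
res[Traffic$day == 85]          # residuals for the day-85 observations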
Exercise 6
Now, run the same model, but with “year” taken as a random effect, and plot the posterior chains. Use a burn-in of 1,000.
What can you observe?
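A sketch; note that the prior now also needs a G block for the random effect, and that MCMCglmm expects the grouping variable to be a factor:

Traffic$year <- as.factor(Traffic$year)      # grouping variable as a factor
prior2 <- list(R = list(V = 1, nu = 0.002),
               G = list(G1 = list(V = 1, nu = 0.002)))
m2 <- MCMCglmm(y ~ limit + day, random = ~ year, family = "poisson",
               data = Traffic, prior = prior2,
               nitt = 10000, burnin = 1000, verbose = FALSE)
plot(m2$Sol)  # fixed-effect chains
plot(m2$VCV)  # year and residual variance chains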
Exercise 7
Check the convergence of the model with:
– the Raftery-Lewis diagnostic
– the Geweke diagnostic
– (BONUS) the Gelman-Rubin diagnostic (HINT: you can use mclapply from the parallel package to run the chains with different starting values faster; see the sketch below)
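A sketch using the coda functions, assuming m2 and prior2 from Exercise 6. For the Gelman-Rubin diagnostic we rely on each run initialising differently; the start argument of MCMCglmm lets you set starting values explicitly:

raftery.diag(m2$Sol)  # Raftery-Lewis: required run lengths
geweke.diag(m2$Sol)   # Geweke: z-scores comparing chain segments

library(parallel)
chains <- mclapply(1:4, function(i) {  # four independent chains
  MCMCglmm(y ~ limit + day, random = ~ year, family = "poisson",
           data = Traffic, prior = prior2,
           nitt = 10000, burnin = 1000, verbose = FALSE)$Sol
}, mc.cores = 4)                       # on Windows, set mc.cores = 1
gelman.diag(mcmc.list(chains))         # potential scale reduction factors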
Exercise 8
Make a prediction from this model for the year 1961 and the year 1962 separately.
Plot these predictions as density plots using the ggplot2 (or tidyverse) package, placing the two plots side by side. (HINT: use the gridExtra package, for example.)
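One possible approach, assuming m2 and the year factor from Exercise 6:

library(gridExtra)
Traffic$pred <- as.vector(predict(m2))  # posterior mean predictions
p61 <- ggplot(subset(Traffic, year == "1961"), aes(x = pred)) +
  geom_density() + ggtitle("Predictions for 1961")
p62 <- ggplot(subset(Traffic, year == "1962"), aes(x = pred)) +
  geom_density() + ggtitle("Predictions for 1962")
grid.arrange(p61, p62, ncol = 2)        # side-by-side layout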
Exercise 9
Compute the posterior estimates for years 1961 and 1962.
(HINT: take the mode of the posterior distribution.)
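A sketch: refitting with pr = TRUE stores the year-level random-effect draws in the $Sol chain, and posterior.mode() from MCMCglmm then returns the density mode of each column:

m3 <- MCMCglmm(y ~ limit + day, random = ~ year, family = "poisson",
               data = Traffic, prior = prior2, pr = TRUE,  # keep random-effect draws
               nitt = 10000, burnin = 1000, verbose = FALSE)
posterior.mode(m3$Sol)  # includes the 1961 and 1962 year effects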