In the last post, we saw that the Metropolis sampler can be used to generate a random sample from a posterior distribution that cannot be derived analytically.
Following the same idea, Gibbs sampling is a popular Markov chain Monte Carlo (MCMC) technique that is generally more efficient, since the parameters are updated one at a time rather than simultaneously as in the Metropolis algorithm.
We will use the same methodology as for the Metropolis exercises.
We will use concepts of Extreme Value Theory (EVT) with the Gumbel distribution, which has only two parameters: location and scale.
Recall that the goal here is not to study EVT, but simply to use it for Bayesian inference. We will use the evd package with the dgev() family of functions (rgev(), pgev()), as well as the ggplot2 and gridExtra packages for visualization. When the shape parameter of the GEV distribution is set to 0, you recover the Gumbel distribution.
The solutions for this set of exercises can be found here.
Exercise 1
Consider a Gumbel distribution with location and scale parameters set to 0 and 0.5 (resp.) and generate a random sample of size 1,000. Plot the kernel density of this sample, together with the theoretical density.
(Set the seed to 1234 to obtain the same results as in the solutions.)
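A minimal sketch of one possible solution (the variable names are my own, and the solutions linked above may differ, e.g. by using ggplot2 for the plot):

```r
library(evd)

set.seed(1234)
true_loc   <- 0
true_scale <- 0.5
x <- rgev(1000, loc = true_loc, scale = true_scale, shape = 0)  # shape = 0 gives the Gumbel

# Kernel density of the sample, with the theoretical density overlaid
plot(density(x), main = "Kernel vs. theoretical Gumbel density")
curve(dgev(t, loc = true_loc, scale = true_scale, shape = 0),
      xname = "t", add = TRUE, lty = 2)
legend("topright", legend = c("kernel estimate", "theoretical"), lty = c(1, 2))
```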
Exercise 2
Relying on the dgev() function from the evd package, write the log-likelihood function of the Gumbel distribution. Then, compute the log-likelihood value for the sample from Exercise 1.
(Note: working with log values will simplify the computations.)
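One possible implementation, reusing the sample x from the Exercise 1 sketch (the function name loglik and its argument names are my own choices):

```r
# Gumbel log-likelihood built on evd::dgev with the shape fixed at 0
loglik <- function(mu, sigma, data) {
  if (sigma <= 0) return(-Inf)          # the scale must be positive
  sum(log(dgev(data, loc = mu, scale = sigma, shape = 0)))
}

loglik(0, 0.5, x)   # log-likelihood of the sample at the true parameter values
```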
Exercise 3
Consider a non-informative prior: a large-variance normal prior (sd = 50) for each of the two parameters, each centered on the true parameter value.
Using the likelihood function from Exercise 2, write a function that computes the log-posterior.
(HINT: if you work with log values, you can simply sum the terms.)
Then, compute the log-posterior value of the generated sample.
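A possible log-posterior, building on the loglik sketch above; the prior means 0 and 0.5 are the true parameter values from Exercise 1:

```r
# Log-posterior = log-likelihood + log-priors (a sum, since we work on the log scale)
logpost <- function(mu, sigma, data) {
  if (sigma <= 0) return(-Inf)
  loglik(mu, sigma, data) +
    dnorm(mu,    mean = 0,   sd = 50, log = TRUE) +   # prior on the location
    dnorm(sigma, mean = 0.5, sd = 50, log = TRUE)     # prior on the scale
}

logpost(0, 0.5, x)
```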
Exercise 4
Now you have all the tools to build the Gibbs sampler (a sketch follows the steps below).
a. Take the mean and the SD of the sample generated in Exercise 1 as starting values for the location and scale parameters (resp.), and use 1,000 iterations.
b. Initialize the output data-frame (or matrix) which will contain the generated chains for the parameters.
c. Run the algorithm using normal proposal distributions (which are symmetric) with standard deviations of 0.05 and 0.01, respectively.
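A minimal sketch of the sampler, strictly speaking a Metropolis-within-Gibbs: each parameter is updated in turn with its own Metropolis step (the normal proposals are symmetric, so no correction term is needed). It assumes x and logpost from the earlier sketches; the object names are my own:

```r
n_iter  <- 1000
prop_sd <- c(0.05, 0.01)                     # proposal sds for (location, scale)

chain <- matrix(NA_real_, nrow = n_iter, ncol = 2,
                dimnames = list(NULL, c("loc", "scale")))
chain[1, ] <- c(mean(x), sd(x))              # starting values from the sample

for (i in 2:n_iter) {
  cur <- chain[i - 1, ]

  # update the location, with the scale held at its current value
  prop <- rnorm(1, cur["loc"], prop_sd[1])
  if (log(runif(1)) < logpost(prop, cur["scale"], x) -
                      logpost(cur["loc"], cur["scale"], x)) {
    cur["loc"] <- prop
  }

  # update the scale, using the location that was just updated
  prop <- rnorm(1, cur["scale"], prop_sd[2])
  if (log(runif(1)) < logpost(cur["loc"], prop, x) -
                      logpost(cur["loc"], cur["scale"], x)) {
    cur["scale"] <- prop
  }

  chain[i, ] <- cur
}
```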
Exercise 5
a. Check visually that the chains of the Gibbs sampler converge (a trace-plot sketch follows this exercise). Do they seem stationary?
b. Then, check that the chains are indeed updated one at a time, whereas they are updated simultaneously in the Metropolis algorithm.
HINT: Recall the function for the Metropolis algorithm created in the previous set of exercises.
c. What do you observe with the starting values used to generate these chains?
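For part a, a quick base-R check of the chain matrix from the previous sketch (the solutions may use ggplot2 instead; see Exercise 9):

```r
# Trace plots: roughly horizontal, well-mixed bands suggest stationarity
par(mfrow = c(2, 1))
plot(chain[, "loc"],   type = "l", xlab = "iteration", ylab = "location")
plot(chain[, "scale"], type = "l", xlab = "iteration", ylab = "scale")
par(mfrow = c(1, 1))
```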
Exercise 6
The acceptance rate of the Gibbs sampler (i.e., the proportion of times that a proposal is accepted in the algorithm) is crucial for the efficiency of the algorithm. Extend the previous code to retrieve the acceptance rate of this Gibbs sampler.
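One way to do the bookkeeping (the counter names are my own): increment a counter for a parameter whenever its proposal is accepted, then divide by the number of proposals made for that parameter.

```r
# Counters for accepted moves, one per parameter
accept <- c(loc = 0, scale = 0)

# Inside the sampler, an accepted location move becomes:
#   cur["loc"]    <- prop
#   accept["loc"] <- accept["loc"] + 1
# (and similarly for the scale). After the loop:
accept / (n_iter - 1)   # each parameter is proposed once per iteration after the first
```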
Exercise 7
Gather everything you did earlier into one function.
BONUS: add a burn-in parameter to the function.
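A possible all-in-one version (the function name gibbs_gumbel, its arguments, and the returned list structure are my own choices, not the official solution; it reuses logpost from the Exercise 3 sketch):

```r
gibbs_gumbel <- function(data, n_iter = 1000,
                         start   = c(mean(data), sd(data)),
                         prop_sd = c(0.05, 0.01),
                         burn_in = 0) {
  chain <- matrix(NA_real_, nrow = n_iter, ncol = 2,
                  dimnames = list(NULL, c("loc", "scale")))
  chain[1, ] <- start
  accept <- c(loc = 0, scale = 0)

  for (i in 2:n_iter) {
    cur <- chain[i - 1, ]

    # location update (scale held at its current value)
    prop <- rnorm(1, cur["loc"], prop_sd[1])
    if (log(runif(1)) < logpost(prop, cur["scale"], data) -
                        logpost(cur["loc"], cur["scale"], data)) {
      cur["loc"]    <- prop
      accept["loc"] <- accept["loc"] + 1
    }

    # scale update (uses the location that was just updated)
    prop <- rnorm(1, cur["scale"], prop_sd[2])
    if (log(runif(1)) < logpost(cur["loc"], prop, data) -
                        logpost(cur["loc"], cur["scale"], data)) {
      cur["scale"]    <- prop
      accept["scale"] <- accept["scale"] + 1
    }

    chain[i, ] <- cur
  }

  list(chain      = chain[(burn_in + 1):n_iter, , drop = FALSE],
       acceptance = accept / (n_iter - 1))
}

fit <- gibbs_gumbel(x, n_iter = 1000, burn_in = 100)
fit$acceptance
```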
Exercise 8
It is recommended to target acceptance rates of around 40% for the Gibbs sampler since components are updated one at a time.
Use the function created above to find better values for the proposal standard deviations in order to reach these target acceptance rates.
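A rough grid search over candidate proposal standard deviations, assuming the gibbs_gumbel sketch above; the candidate values are illustrative only:

```r
candidates <- expand.grid(sd_loc = c(0.02, 0.05, 0.1), sd_scale = c(0.02, 0.05, 0.1))

for (k in seq_len(nrow(candidates))) {
  fit_k <- gibbs_gumbel(x, prop_sd = unlist(candidates[k, ]))
  cat(sprintf("sd_loc = %.2f, sd_scale = %.2f -> acceptance: %.2f / %.2f\n",
              candidates$sd_loc[k], candidates$sd_scale[k],
              fit_k$acceptance["loc"], fit_k$acceptance["scale"]))
}
```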
Exercise 9
Provide a nice visualization (using ggplot2) of the chains that also shows the Bayesian posterior estimates of the parameters (take the median).
Check whether increasing the number of iterations or adding a burn-in period could improve the results.
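A possible ggplot2 version, assuming the fit object returned by the gibbs_gumbel sketch above; the dashed lines mark the posterior medians:

```r
library(ggplot2)
library(gridExtra)

post <- as.data.frame(fit$chain)
post$iteration <- seq_len(nrow(post))

p_loc <- ggplot(post, aes(iteration, loc)) +
  geom_line() +
  geom_hline(yintercept = median(post$loc), linetype = "dashed") +
  labs(y = "location")

p_scale <- ggplot(post, aes(iteration, scale)) +
  geom_line() +
  geom_hline(yintercept = median(post$scale), linetype = "dashed") +
  labs(y = "scale")

grid.arrange(p_loc, p_scale, nrow = 2)
```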