In this exercise, we will handle over-dispersion in a count model by using a quasi-Poisson model. Over-dispersion means that the variance of the counts is greater than the mean, whereas a Poisson model assumes the two are equal. It matters because ignoring it underestimates the standard errors, which inflates the test statistics and increases the risk of Type I errors. We will use a data-set on amphibian road kill (Zuur et al., 2009). It has 17 explanatory variables. We will focus on nine of them, with the total number of kills (TOT.N) as the response variable.
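To make the definition concrete, here is a minimal sketch that compares the variance-to-mean ratio of two sets of simulated counts. The data are simulated for illustration only and are not the road-kill data; the object names are made up.

```r
set.seed(1)

# Simulated counts (illustration only, not the road-kill data)
pois_counts <- rpois(1000, lambda = 5)          # Poisson: variance equals the mean
nb_counts   <- rnbinom(1000, mu = 5, size = 1)  # negative binomial: variance = mu + mu^2 / size

# A ratio near 1 is consistent with a Poisson model; well above 1 signals over-dispersion
ratio_pois <- var(pois_counts) / mean(pois_counts)
ratio_nb   <- var(nb_counts) / mean(nb_counts)
print(c(poisson = ratio_pois, negbin = ratio_nb))
```

The second ratio should come out several times larger than the first, which is the pattern a plain Poisson model cannot accommodate.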
Please download the data-set here and name it “Road.” Answers to these exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page. Load the data-set and required package before running the exercise.
Plot the number of kills against distance. The variability of kills should decrease with distance.
Run the GLM with distance as the only explanatory variable.
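As a rough sketch of what a single-covariate count GLM looks like, the snippet below fits a Poisson model to simulated data. The data frame `road_sim` and its columns `distance` and `kills` are hypothetical stand-ins, not columns of the real data-set, and the Poisson family is an assumption made for illustration.

```r
set.seed(42)

# Simulated stand-in for the road-kill data; all names are hypothetical
n <- 60
road_sim <- data.frame(distance = seq(250, 25000, length.out = n))
mu <- exp(4.5 - 0.0001 * road_sim$distance)      # expected kills fall with distance
road_sim$kills <- rnbinom(n, mu = mu, size = 2)  # over-dispersed counts

# Poisson GLM with a log link (the default for this family)
fit_pois <- glm(kills ~ distance, family = poisson, data = road_sim)
print(summary(fit_pois))
```

With the real data, the same call would use the data frame named "Road" and its own column names in place of the simulated ones.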
Add more covariates to the model and check the model summary to see how the fit changes.
Check for collinearity using variance inflation factors (VIFs). Set the base R option for handling missing values.
Check the summary again and set the base R options. The previous related exercise post explains why we do this.
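One way to see what a VIF measures is to compute it by hand: regress each covariate on the others and take 1 / (1 - R^2). The sketch below does this in base R on simulated covariates (`x1`, `x2`, `x3` are made-up names). It is a simplification for illustration; in practice the `vif()` function from the car package is commonly applied to the fitted model instead.

```r
set.seed(7)
n <- 100

# Hypothetical covariates; x2 is built to be strongly correlated with x1
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)
x3 <- rnorm(n)
covars <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing covariate j on the rest
vif_manual <- sapply(names(covars), function(v) {
  others <- setdiff(names(covars), v)
  aux_fit <- lm(reformulate(others, response = v), data = covars)
  1 / (1 - summary(aux_fit)$r.squared)
})
print(round(vif_manual, 2))
```

Here `x1` and `x2` should show large VIFs because each is almost a copy of the other, while `x3` should stay close to 1.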
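Check for over-dispersion. As a rule of thumb, the dispersion statistic (the Pearson chi-square or the residual deviance divided by the residual degrees of freedom) should be close to 1. If it is still well above or below 1, we need to check the diagnostic plots and re-run the GLM with a different model structure.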
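The dispersion check can be sketched as follows, again on simulated counts with hypothetical names rather than the real data. It computes the Pearson-based dispersion statistic from a Poisson fit and compares it with the value that a quasi-Poisson fit estimates.

```r
set.seed(42)

# Simulated over-dispersed counts (illustration only)
n <- 60
distance <- seq(250, 25000, length.out = n)
kills <- rnbinom(n, mu = exp(4.5 - 0.0001 * distance), size = 2)

fit_pois <- glm(kills ~ distance, family = poisson)

# Dispersion statistic: Pearson chi-square divided by the residual degrees of freedom
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
print(dispersion)

# The quasi-Poisson fit keeps the same coefficients but estimates this dispersion
fit_quasi <- glm(kills ~ distance, family = quasipoisson)
print(summary(fit_quasi)$dispersion)

# Standard errors are scaled up by sqrt(dispersion) relative to the Poisson fit
se_pois  <- summary(fit_pois)$coefficients[, "Std. Error"]
se_quasi <- summary(fit_quasi)$coefficients[, "Std. Error"]
print(se_quasi / se_pois)
```

The point estimates do not change between the two fits. Only the standard errors, and therefore the p-values, are adjusted.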
Restructure the model by dropping the least significant term and refitting, and repeat until only significant terms remain.
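A single step of this backward selection might look like the sketch below. It uses simulated data in which `noise1` and `noise2` are hypothetical covariates unrelated to the response, and it relies on `drop1()` with an F test, which is one reasonable choice for quasi-Poisson fits rather than necessarily the approach used in the solutions.

```r
set.seed(3)

# Simulated data; noise1 and noise2 have no real effect on the counts
n <- 80
sim <- data.frame(distance = runif(n, 250, 25000),
                  noise1   = rnorm(n),
                  noise2   = rnorm(n))
sim$kills <- rnbinom(n, mu = exp(4.5 - 0.0001 * sim$distance), size = 2)

fit_full <- glm(kills ~ distance + noise1 + noise2,
                family = quasipoisson, data = sim)

# F tests for dropping each term in turn (the "<none>" row has no p-value)
tab <- drop1(fit_full, test = "F")
print(tab)

# Identify the least significant term and refit without it
worst <- rownames(tab)[which.max(tab[["Pr(>F)"]])]
fit_reduced <- update(fit_full, as.formula(paste(". ~ . -", worst)))
print(formula(fit_reduced))
```

Repeating the `drop1()` and `update()` pair on `fit_reduced` continues the selection until every remaining term is significant.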
Check the diagnostic plots. If problems remain, we may need another type of regression, such as negative binomial regression. We will discuss that in the next exercise post.
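For reference, the standard base R diagnostic plots for a fitted GLM can be drawn as in this minimal sketch, which uses simulated counts with hypothetical names rather than the real data.

```r
set.seed(42)

# Simulated over-dispersed counts (illustration only)
n <- 60
distance <- seq(250, 25000, length.out = n)
kills <- rnbinom(n, mu = exp(4.5 - 0.0001 * distance), size = 2)

fit_quasi <- glm(kills ~ distance, family = quasipoisson)

# Four default panels: residuals vs fitted, Q-Q, scale-location, residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(fit_quasi)
par(op)  # restore the previous plotting layout
```

A funnel shape in the residuals or points with high leverage would suggest that the model structure still needs attention.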