Basic Generalized Additive Models In Ecology: Exercises

 

Generalized Additive Models (GAM) are non-parametric models that add smoother to the data. In this exercise, we will look at GAMs using cubic spline using the mgcv package. Data-sets used can be downloaded here. The data-set is the experiment result of grassland richness over time in the Yellowstone National Park (Skkink et al. 2007).
Answers to these exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page. Load the data-set and required package before running the exercise.

Exercise 1
Observe the data-set and try to classify the response and explanatory variables. We will focus on ROCK as an explanatory variable.

Exercise 2
Do some scatter-plots.

Exercise 3
Since it is not linear, try to do GAM with ROCK variables.

Exercise 4
Check the result. What can be inferred?

Exercise 5
Do some validation plots.

Exercise 6
Plot the base graph.

Exercise 7
Add “predict” across the data and add some lines.

Exercise 8
Plot the fitted values.

Why do we only use ROCK variables? It is proven to give the most fitted data without incorporation of all the explanatory variables. Try to play around with other explanatory variables to see the difference.




Basic Generalized Additive Models In Ecology: Solutions

Below are the solutions to these exercises on “GAMs – Exercises.”

if (!require(mgcv)){install.packages(mgcv, dep=T)}
library(mgcv)
if (!require(car)){install.packages(car, dep=T)}
library(car)
###############
#             #
# Exercise 1  #
#             #
###############
Veg <- read.csv(file.choose())
str(Veg)
## 'data.frame':	58 obs. of  8 variables:
##  $ SR      : int  8 6 8 8 10 7 6 5 8 6 ...
##  $ ROCK    : num  27 26 30 18 23 26 39 25 24 21 ...
##  $ LITTER  : num  30 20 24 35 22 26 19 26 24 16 ...
##  $ BARESOIL: num  26 28 30 16 9 23 19 33 29 41 ...
##  $ SprTmax : num  15.8 15.2 12.8 14 14.3 ...
##  $ Time    : int  1958 1962 1967 1974 1981 1994 2002 1958 1962 1967 ...
##  $ FallPrec: num  30.2 99.6 43.4 54.9 24.4 ...
##  $ Transect: int  1 1 1 1 1 1 1 2 2 2 ...
head(Veg)
##   SR ROCK LITTER BARESOIL SprTmax Time FallPrec Transect
## 1  8   27     30       26   15.77 1958    30.22        1
## 2  6   26     20       28   15.21 1962    99.56        1
## 3  8   30     24       30   12.76 1967    43.43        1
## 4  8   18     35       16   14.00 1974    54.86        1
## 5 10   23     22        9   14.33 1981    24.38        1
## 6  7   26     26       23   16.91 1994    10.16        1
# Response variable = SR or species richness of plants per transect
# Explanatory - Rock content (ROCK)
# Explanatory - baresoil (BARESOIL)
# Explanatory - Litter (LITTER)
# Explanatory - ppt in Autumn (FallPrec)
# Explanatory - Max Spring temperature (SprTmax)
# Explanatory - Year of transect (Time)
# Explanatory - Transect number (Transect)

###############
#             #
# Exercise 2  #
#             #
###############
scatterplot(SR ~ ROCK, data = Veg)

###############
#             #
# Exercise 3  #
#             #
###############
Veg.gam1 <- gam(SR ~ s(ROCK), data = Veg)
plot(Veg.gam1)

###############
#             #
# Exercise 4  #
#             #
###############
summary(Veg.gam1)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## SR ~ s(ROCK)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.9655     0.3757   26.52   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F p-value  
## s(ROCK) 3.778  4.664 2.285  0.0768 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.126   Deviance explained = 18.4%
## GCV = 8.9217  Scale est. = 8.1867    n = 58
###############
#             #
# Exercise 5  #
#             #
###############
gam.check(Veg.gam1)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 5 iterations.
## The RMS GCV score gradient at convergence was 0.0001067967 .
## The Hessian was positive definite.
## Model rank =  10 / 10 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##           k'  edf k-index p-value
## s(ROCK) 9.00 3.78    1.01    0.47
###############
#             #
# Exercise 6  #
#             #
###############
# plot base graph (with CIs)
plot(SR ~ ROCK, data = Veg,
     pch = 16,
     xlab = "% ROCK in substrate",
     ylab = "Species Richness")

###############
#             #
# Exercise 7  #
#             #
###############
# predict across the data
x <- seq(min(Veg$ROCK), max(Veg$ROCK), l=100) # 100 steps....
y <- predict(Veg.gam1, data.frame(ROCK = x), se = TRUE)  # using standard errors se = TRUE
# add lines

###############
#             #
# Exercise 7  #
#             #
###############
lines(x, y$fit)   # plots the fitted values
lines(x, y$fit + 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE above the fit
lines(x, y$fit - 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE below the fit