Truly linear relationships are rare in environmental systems. The response of Y to X may not be a straight line; it may instead be humped, asymptotic, sigmoidal, or polynomial, all of which are non-linear. In this exercise, we take a closer look at how polynomial regression works and practice with a case study. Three patterns commonly show up in data exploration: concave (power and exponential), S-shaped (sigmoidal and logistic), and peaks and valleys (polynomials). There are other patterns, but for now we will stick to these three. Polynomial models include a predictor variable several times, in successively higher orders (for example x, x^2, and x^3).
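To make the idea of "successively higher orders" concrete, here is a minimal sketch in Python that expands a single predictor value into its polynomial terms. The function name `polynomial_terms` and the sample values are assumptions made up for illustration; they are not part of the exercise data.

```python
def polynomial_terms(x, order):
    """Return [x, x**2, ..., x**order] for one predictor value."""
    return [x ** k for k in range(1, order + 1)]


# Made-up predictor values, for illustration only.
sample_values = [1.0, 2.0, 3.0]

for value in sample_values:
    # Each row holds the 1st-, 2nd-, and 3rd-order terms of the same predictor.
    print(value, polynomial_terms(value, 3))
```

Each of these terms then enters the regression as its own column, with its own coefficient.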
Here, we use ecological data (Peake and Quinn, 1993) to investigate the abundance of invertebrates living in mussel beds in intertidal areas. Possible variable configuration:
Response variable = number of invertebrates (INDIV)
Explanatory variable = the area of each clump (AREA)
Additional possible response variable = species richness of invertebrates (SPECIES)
Download the data-set here.
Answers to these exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Load the data-set and examine its structure, particularly the normality of the variables. Based on the scatter-plot, what is your best guess about the shape of the relationship?
Assess its linearity using the
Add polynomial terms for the explanatory variable (AREA) up to the 3rd order.
Validate the model for each polynomial order.
Create the predictive model and generate the regression equation. Which one is the best model?
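As a rough illustration of the last three exercises, the sketch below fits polynomials of order 1 to 3 by ordinary least squares and compares them by adjusted R-squared. It uses only the Python standard library. The helper names (`fit_polynomial`, `predict`, `adjusted_r2`) and the `area` and `indiv` values are assumptions made up for this example; they are not the Peake and Quinn data, and adjusted R-squared is only one of several possible comparison criteria.

```python
def fit_polynomial(x, y, order):
    """Least-squares coefficients [b0, b1, ..., b_order] via the normal equations."""
    n_coef = order + 1
    # Build X'X and X'y for the design matrix with columns 1, x, x^2, ...
    xtx = [[sum(xi ** (i + j) for xi in x) for j in range(n_coef)]
           for i in range(n_coef)]
    xty = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(n_coef)]
    # Solve the linear system by Gaussian elimination with partial pivoting.
    aug = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(n_coef):
        pivot = max(range(col, n_coef), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n_coef):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n_coef + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coefs = [0.0] * n_coef
    for i in reversed(range(n_coef)):
        known = sum(aug[i][j] * coefs[j] for j in range(i + 1, n_coef))
        coefs[i] = (aug[i][n_coef] - known) / aug[i][i]
    return coefs


def predict(coefs, xi):
    """Evaluate b0 + b1*x + b2*x^2 + ... at a single x value."""
    return sum(b * xi ** k for k, b in enumerate(coefs))


def adjusted_r2(x, y, coefs):
    """Adjusted R-squared, which penalises extra polynomial terms."""
    n = len(y)
    p = len(coefs) - 1  # number of predictors, excluding the intercept
    mean_y = sum(y) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - predict(coefs, xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Made-up values with a humped shape, for illustration only.
area = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
indiv = [4.9, 7.1, 8.6, 10.2, 10.8, 11.1, 10.9, 10.1]

for order in (1, 2, 3):
    coefs = fit_polynomial(area, indiv, order)
    print(order, [round(b, 3) for b in coefs],
          round(adjusted_r2(area, indiv, coefs), 4))
```

The printed coefficients give the regression equation for each order. Solving the normal equations directly is fine for a small example like this, but for real data a numerically stabler routine (or orthogonal polynomials) is usually preferable, and residual plots should accompany any single summary statistic.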