- Spatial Data Analysis: Introduction to Raster Processing (Part 1)
- Advanced Techniques With Raster Data: Part 1 – Unsupervised Classification
- Spatial Data Analysis: Introduction to Raster Processing: Part-3
- Become a Top R Programmer Fast with our Individual Coaching Program
- Explore all our (>4000) R exercises
- Find an R course using our R Course Finder directory

In this exercise, we will handle an over-dispersed model using the quasi-Poisson model. Over-dispersion means that the variance of the response is greater than its mean, whereas a Poisson model assumes the two are equal. It matters because it makes the standard errors too small. That inflates the apparent significance of terms and increases the risk of Type I errors. We will use a data-set on amphibian road kills (Zuur et al., 2009). It has 17 explanatory variables. We will focus on nine of them, with the total number of kills (TOT.N) as the response variable.

Please download the data-set here and name it “Road.” Answers to these exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page. Load the data-set and required package before running the exercise.

**Exercise 1**

Plot the data. The plots show that the variability of kills decreases with distance.

**Exercise 2**

Fit a GLM with distance as the only explanatory variable.
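Here is a minimal sketch of Exercise 2, assuming a Poisson GLM because TOT.N is a count. The road-kill file has to be downloaded, so the code simulates a stand-in data frame named `Road` instead. Only `TOT.N` comes from the text. The `distance` column and all values are hypothetical, and the counts are drawn from a negative binomial distribution so that they are over-dispersed on purpose.

```r
# Simulated stand-in for the road-kill data: hypothetical values, not the real data-set
set.seed(1)
n <- 52
Road <- data.frame(distance = runif(n, 0, 25000))  # hypothetical distance column
# Expected kills fall with distance; negative binomial draws add extra variance
Road$TOT.N <- rnbinom(n, mu = exp(4 - 1e-4 * Road$distance), size = 2)

# Poisson GLM (log link by default) with distance as the only explanatory variable
m1 <- glm(TOT.N ~ distance, family = poisson, data = Road)
summary(m1)
```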

**Exercise 3**

Add more covariates to the model and inspect the model summary to see what changes.

**Exercise 4**

Check for collinearity using variance inflation factors (VIFs). Set the base R options for handling missing values.
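The `car` package provides `vif()` for this step. As a dependency-free sketch, the VIF of each covariate can also be computed directly as VIF = 1 / (1 - R²). Here R² comes from regressing that covariate on all the others. The covariate names (`distance`, `open.land`, `olive`) and values are hypothetical, and `open.land` is deliberately built to be correlated with `distance`.

```r
# Hypothetical covariates; open.land is deliberately correlated with distance
set.seed(2)
n <- 52
covs <- data.frame(distance = runif(n, 0, 25000))
covs$open.land <- 50 + 0.002 * covs$distance + rnorm(n, sd = 5)
covs$olive <- runif(n, 0, 30)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing covariate j on the others
vif_manual <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(covs)
round(vifs, 2)
```

`distance` and `open.land` should show inflated values here, while `olive` stays close to 1. A common rule of thumb flags values above roughly 3 to 10, depending on the source.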

**Exercise 5**

Check the summary again and set the base R options. The previous related exercise post explains why we do this.

**Exercise 6**

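Check for over-dispersion. As a rule of thumb, the dispersion statistic should be around 1. This statistic is the residual deviance or the Pearson chi-square divided by the residual degrees of freedom. If it is clearly greater or less than 1, check the diagnostic plots and refit the GLM with a different model structure.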
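Here is a sketch of the over-dispersion check on the same simulated stand-in data (hypothetical `distance` column and values). The dispersion statistic is computed as the Pearson chi-square divided by the residual degrees of freedom. The quasi-Poisson refit then estimates its dispersion parameter from the Pearson residuals in essentially the same way.

```r
set.seed(1)
n <- 52
Road <- data.frame(distance = runif(n, 0, 25000))  # hypothetical covariate
Road$TOT.N <- rnbinom(n, mu = exp(4 - 1e-4 * Road$distance), size = 2)

m1 <- glm(TOT.N ~ distance, family = poisson, data = Road)

# Dispersion statistic: Pearson chi-square / residual df (about 1 if the Poisson fits)
phi <- sum(residuals(m1, type = "pearson")^2) / df.residual(m1)
phi

# Quasi-Poisson refit: same coefficients, standard errors scaled by the
# square root of the estimated dispersion
m1q <- glm(TOT.N ~ distance, family = quasipoisson, data = Road)
summary(m1q)$dispersion
cbind(poisson_se = sqrt(diag(vcov(m1))), quasi_se = sqrt(diag(vcov(m1q))))
```

The coefficients do not change. Only the standard errors change, and with them the p-values.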

**Exercise 7**

Simplify the model by removing the least significant term and refitting. Repeat until only significant terms remain.
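Here is a sketch of this backward selection loop on simulated data. `distance` is a hypothetical covariate that does affect the counts. `olive` and `shrub` are hypothetical covariates generated with no real effect. AIC is not defined for quasi-Poisson fits, so the loop ranks terms by `drop1()` F-tests instead. The 0.05 cut-off is an assumed threshold.

```r
set.seed(3)
n <- 52
d <- data.frame(distance = runif(n, 0, 25000),  # hypothetical, affects the counts
                olive    = runif(n, 0, 30),      # hypothetical, no real effect
                shrub    = runif(n, 0, 10))      # hypothetical, no real effect
d$TOT.N <- rnbinom(n, mu = exp(4 - 1e-4 * d$distance), size = 2)

m <- glm(TOT.N ~ distance + olive + shrub, family = quasipoisson, data = d)

# Drop the term with the largest F-test p-value and refit; stop when every
# remaining term has p < 0.05 or only one term is left
repeat {
  tab <- drop1(m, test = "F")
  p <- tab[["Pr(>F)"]][-1]          # first row is <none>
  names(p) <- rownames(tab)[-1]
  if (all(p < 0.05) || length(p) == 1) break
  worst <- names(which.max(p))
  m <- update(m, as.formula(paste(". ~ . -", worst)))
}
formula(m)
```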

**Exercise 8**

Check the diagnostic plots. If problems remain, we may need another type of regression, such as negative binomial regression, which we’ll discuss in the next exercise post.


In the previous exercise in the #REcology series, we learned how to measure the effect of one explanatory variable on a response variable. In practice, and particularly in experimental or observational designs, there is often more than one explanatory variable. That calls for a different approach to analyzing the data-set and drawing correct conclusions. In this exercise, we will use analysis of covariance (ANCOVA). Here, covariates are the continuous explanatory variables. ANCOVA combines regression and analysis of variance. It requires a continuous response variable, at least one continuous explanatory variable, and at least one explanatory factor variable. Answers to these exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page. The data-set for this exercise can be downloaded here.

**Exercise 1**

Load the data-set and the required package, `car`.

**Exercise 2**

Plot the data. What can be inferred? Formulate a basic verbal hypothesis.

**Exercise 3**

Create an interaction model based on the basic verbal hypothesis formulated in Exercise 2.

**Exercise 4**

Use ANOVA to test the interaction between the two explanatory variables in the model. Make sure that the interaction is not significant. A non-significant interaction means the regression slopes are parallel across groups, which ANCOVA assumes.
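Here is a sketch of Exercises 3 and 4 on simulated data. The factor `group`, the covariate `x`, the response `y`, and all values are hypothetical. They are generated with the same slope in both groups, so the interaction should come out non-significant.

```r
set.seed(4)
n_per <- 20
dat <- data.frame(
  group = factor(rep(c("A", "B"), each = n_per)),  # hypothetical factor
  x     = runif(2 * n_per, 0, 10)                   # hypothetical covariate
)
# Same slope (1.5) in both groups, different intercepts: parallel lines
dat$y <- ifelse(dat$group == "A", 2, 5) + 1.5 * dat$x + rnorm(2 * n_per, sd = 1)

m_int <- lm(y ~ x * group, data = dat)  # separate slope per group
anova(m_int)                             # the x:group row tests the interaction

# If x:group is not significant, drop it and keep the parallel-slopes ANCOVA model
m_add <- lm(y ~ x + group, data = dat)
anova(m_add, m_int)                      # the same interaction test, as a model comparison
```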

**Exercise 5**

Check the summary statistics of the model, paying attention to the intercept, slope, and R-squared.

**Exercise 6**

Create a linear regression plot and derive the regression equation from the model summary.
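Here is a sketch of how the equations can be read from an additive ANCOVA model, using the same hypothetical simulated `group`, `x`, and `y` as above. The sketch assumes R's default treatment contrasts. Under them, the first factor level is the baseline and the other level's coefficient shifts the intercept, so the fitted lines share one slope.

```r
set.seed(4)
n_per <- 20
dat <- data.frame(
  group = factor(rep(c("A", "B"), each = n_per)),  # hypothetical factor
  x     = runif(2 * n_per, 0, 10)                   # hypothetical covariate
)
dat$y <- ifelse(dat$group == "A", 2, 5) + 1.5 * dat$x + rnorm(2 * n_per, sd = 1)

m_add <- lm(y ~ x + group, data = dat)
summary(m_add)  # intercept, slope, and R-squared

cf <- coef(m_add)
# Treatment contrasts: group A is the baseline, groupB shifts the intercept
int_A <- cf[["(Intercept)"]]
int_B <- cf[["(Intercept)"]] + cf[["groupB"]]
slope <- cf[["x"]]
cat(sprintf("Group A: y = %.2f + %.2f * x\n", int_A, slope))
cat(sprintf("Group B: y = %.2f + %.2f * x\n", int_B, slope))

# Scatter plot with one fitted (parallel) line per group
plot(dat$x, dat$y, col = as.integer(dat$group), pch = 16, xlab = "x", ylab = "y")
abline(a = int_A, b = slope, col = 1)
abline(a = int_B, b = slope, col = 2)
```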

Answers to the exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

To recap the previous exercise, below is the flowchart of the group comparison process.

Download the required data-set here.

**Exercise 1**

Load the required packages (`car`, `ggplot2`, `dplyr`, `lattice`, `alr4`) and check whether the data-set is balanced using the `table` and/or `replications` function.
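Here is a sketch of the balance check. It assumes a hypothetical nested layout of two `Food` types, three pens per food type, and four animals per pen. The real data-set's column names and layout may differ.

```r
# Hypothetical nested layout: 2 food types, 3 pens per food, 4 animals per pen
dat <- expand.grid(animal = 1:4, pen = 1:3, Food = c("F1", "F2"))
dat$Pen <- factor(paste(dat$Food, dat$pen, sep = "_"))  # pen labels unique across foods

# Counts per level: equal counts within each factor suggest a balanced design
table(dat$Food)
table(dat$Pen)

# replications() returns a plain vector when every term is balanced,
# and a list of counts otherwise
replications(~ Food + Pen, data = dat)
!is.list(replications(~ Food + Pen, data = dat))  # TRUE means balanced
```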

**Exercise 2**

State the null hypothesis and create some data visualizations, including a histogram, boxplot, and coplot.

**Exercise 3**

Check for normality and homogeneity of variance.
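Here is a sketch of the assumption checks on the same hypothetical nested layout, with a simulated response `weight`. Normality is checked on the model residuals. For equal variances, the sketch swaps in two base R tests, Bartlett's and Fligner-Killeen, in place of the `car` functions.

```r
set.seed(5)
dat <- expand.grid(animal = 1:4, pen = 1:3, Food = c("F1", "F2"))
dat$Pen <- factor(paste(dat$Food, dat$pen, sep = "_"))
dat$weight <- 10 + 2 * (dat$Food == "F2") + rnorm(nrow(dat), sd = 1)  # hypothetical response

# Normality: Q-Q plot of the residuals plus a Shapiro-Wilk test
res <- residuals(aov(weight ~ Food, data = dat))
qqnorm(res); qqline(res)
sw <- shapiro.test(res)

# Equal variances across groups: Bartlett's test (sensitive to non-normality)
# and the more robust Fligner-Killeen test
bt <- bartlett.test(weight ~ Food, data = dat)
fk <- fligner.test(weight ~ Food, data = dat)

pvals <- c(shapiro = sw$p.value, bartlett = bt$p.value, fligner = fk$p.value)
pvals
```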

**Exercise 4**

Check for interaction between the explanatory variables using an interaction plot and/or `xyplot`.

**Exercise 5**

Select the appropriate ANOVA model based on the interaction (nested model).

**Exercise 6**

Select the nested ANOVA model for the Food effect.

**Exercise 7**

Select the nested ANOVA model for the Pen effect.

**Exercise 8**

Compare the two nested models and draw conclusions.
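Here is a sketch of the two nested models. It assumes that each food type was applied to whole pens and that pens are nested in food, with hypothetical simulated pen-to-pen variation. The `Food / Pen` model tests Pen against the residual error. The `Error(Pen)` model tests Food against the variation between pens, which are its true replicates.

```r
set.seed(6)
dat <- expand.grid(animal = 1:4, pen = 1:3, Food = c("F1", "F2"))
dat$Pen <- factor(paste(dat$Food, dat$pen, sep = "_"))
pen_effect <- rnorm(nlevels(dat$Pen), sd = 0.5)  # hypothetical pen-to-pen variation
dat$weight <- 10 + 2 * (dat$Food == "F2") +
  pen_effect[as.integer(dat$Pen)] + rnorm(nrow(dat), sd = 1)

# Pen nested in Food: Food / Pen expands to Food + Food:Pen
m_nest <- aov(weight ~ Food / Pen, data = dat)
summary(m_nest)  # the Food:Pen row tests the Pen effect against the residual error

# Food effect tested in the between-pen error stratum
m_food <- aov(weight ~ Food + Error(Pen), data = dat)
summary(m_food)

# In this balanced layout, the Food F-ratio from the Error(Pen) model equals
# MS(Food) / MS(Food:Pen) from the nested model
ms <- summary(m_nest)[[1]][["Mean Sq"]]
ms[1] / ms[2]
```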


In this exercise, we’ll learn how to analyze response and explanatory variables in data that consist of two or more groups, exploring the application of various types of ANOVA. We will focus on two-way ANOVA (part 1) and nested ANOVA models (part 2). Repeated measures ANOVA exercises can be found here.

The data-sets are ecological, but the methods apply more widely. Background knowledge is important for interpreting the results and making the right decision in a given situation.

Answers to the exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

To make the exercise easier to follow, below is the flowchart of the group comparison process.

**Exercise 1**

Load the required packages (`car`, `ggplot2`, `dplyr`, `lattice`, `alr4`) and the two data-sets linked below.

Dataset 1

**Exercise 2**

State the null hypothesis and check whether the data-set is balanced using the `table` and/or `replications` function.

**Exercise 3**

Produce descriptive statistics and data visualizations (histogram, boxplot, and `coplot`). What can be inferred from the visualizations?

**Exercise 4**

Check normality and homogeneity of variances using `qqnorm`, `qqline`, the Shapiro-Wilk test (`shapiro.test`), and Levene's test (`leveneTest` from `car`). As a rule of thumb, the assumptions hold reasonably well when the `qqnorm` points follow the `qqline` and both the Shapiro-Wilk and Levene's tests give p > 0.05. We’ll discuss this further on the answer page.
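Here is a sketch of these checks for a two-way layout. It uses a hypothetical design of three treatments × two sites with five replicates per cell and a simulated response `y`. Levene's test is computed by hand in its median-centred form, as a one-way ANOVA of absolute deviations from each cell's median. With its default `center = median`, `car::leveneTest()` should give the same F statistic.

```r
set.seed(7)
# Hypothetical two-way layout: 3 treatments x 2 sites, 5 replicates per cell
dat <- expand.grid(rep = 1:5, treatment = c("T1", "T2", "T3"), site = c("S1", "S2"))
dat$y <- 20 + 3 * as.integer(dat$treatment) + 2 * (dat$site == "S2") +
  rnorm(nrow(dat), sd = 2)
dat$cell <- interaction(dat$treatment, dat$site)  # one group per factor combination

# Normality: Q-Q plot and Shapiro-Wilk test on the two-way model residuals
res <- residuals(lm(y ~ treatment * site, data = dat))
qqnorm(res); qqline(res)
sw <- shapiro.test(res)

# Levene's test (median-centred): one-way ANOVA of absolute deviations from cell medians
dat$absdev <- abs(dat$y - ave(dat$y, dat$cell, FUN = median))
lev <- anova(lm(absdev ~ cell, data = dat))

pvals <- c(shapiro = sw$p.value, levene = lev[["Pr(>F)"]][1])
pvals
```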

**Exercise 5**

Check for interaction between the explanatory variables using an interaction plot (`interaction.plot`) and/or `xyplot`, and select the appropriate ANOVA model based on the interaction.
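Here is a sketch of the interaction check on the same hypothetical treatment × site layout. The simulated data have no built-in interaction. The plot gives the visual check, and an F-test comparing the models with and without the `treatment:site` term gives the formal one.

```r
set.seed(7)
dat <- expand.grid(rep = 1:5, treatment = c("T1", "T2", "T3"), site = c("S1", "S2"))
dat$y <- 20 + 3 * as.integer(dat$treatment) + 2 * (dat$site == "S2") +
  rnorm(nrow(dat), sd = 2)

# Roughly parallel lines suggest no treatment x site interaction
with(dat, interaction.plot(treatment, site, y))

m_full <- lm(y ~ treatment * site, data = dat)  # with interaction
m_add  <- lm(y ~ treatment + site, data = dat)  # main effects only
cmp <- anova(m_add, m_full)                     # F-test for the treatment:site term
cmp
```

If the interaction term is not significant, the additive model `m_add` is the simpler choice. If it is significant, the main effects should be interpreted within each level of the other factor.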

**Exercise 6**

Plot the residuals vs. fitted values to validate the model.
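Here is a sketch of the residuals vs. fitted plot for the hypothetical additive two-way model. An even band of points around zero, with no funnel shape or curve, is consistent with the model's assumptions.

```r
set.seed(7)
dat <- expand.grid(rep = 1:5, treatment = c("T1", "T2", "T3"), site = c("S1", "S2"))
dat$y <- 20 + 3 * as.integer(dat$treatment) + 2 * (dat$site == "S2") +
  rnorm(nrow(dat), sd = 2)
m_add <- lm(y ~ treatment + site, data = dat)

# Residuals vs fitted values, with a reference line at zero
plot(fitted(m_add), residuals(m_add), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# plot(m_add, which = 1) draws the same diagnostic with a smoothed trend line
```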

**Exercise 7**

Should the null hypothesis be rejected? What is the conclusion?