Data science has grown rapidly in recent years and has been applied successfully in research, both to support decision making and to establish baseline conditions. Data visualization, exploration, and modeling are three of the main elements of statistical analysis in data science.
In this exercise, we’ll learn how to analyze response and explanatory variables in data that consist of two or more groups by applying various types of ANOVA models. We will focus on two-way ANOVA models (part 1) and nested ANOVA models (part 2). Repeated measures ANOVA exercises can be found here.
The data-sets are drawn from ecology, but the methods apply to other fields as well. Background knowledge of the subject is important for interpreting the results and making the right decision in a given situation.
Answers to the exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
To make the exercise easy to follow, the flowchart below outlines the group comparison process.
Load the required packages car, ggplot2, dplyr, lattice, and alr4, and the two data-sets from the link below.
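A minimal sketch of the loading step, assuming the five packages may or may not be installed on your machine. The data-sets themselves are not loaded here because their file names are not given in this text; the variable names `pkgs`, `installed`, and `missing_pkgs` are illustrative only.

```r
# Packages named in the exercise
pkgs <- c("car", "ggplot2", "dplyr", "lattice", "alr4")

# requireNamespace() returns FALSE (instead of stopping) when a package is missing
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "),
          " (install.packages() can add them)")
}

# Attach only the packages that are available
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```

Checking availability first keeps the script from stopping halfway when one package is absent.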
Determine the null hypothesis and check whether the data-set is balanced, for example using table.
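As a rough illustration of the balance check, the sketch below counts observations per cell with table. It uses R's built-in warpbreaks data (response breaks, factors wool and tension) as a stand-in, since the exercise data-sets are not reproduced here; that substitution is an assumption of this example.

```r
# Stand-in data (assumption): built-in warpbreaks, not the exercise data-sets
data(warpbreaks)

# Number of observations in each wool x tension cell
cell_counts <- table(warpbreaks$wool, warpbreaks$tension)
print(cell_counts)

# A design is balanced when every cell holds the same number of observations
is_balanced <- length(unique(as.vector(cell_counts))) == 1
print(is_balanced)
```

Balance matters because, with unequal cell sizes, the sums of squares in a two-way ANOVA depend on the order in which terms enter the model.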
Produce descriptive statistical summaries and data visualizations: histogram, boxplot, and coplot. What can be inferred from the visualizations?
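The summaries and plots in this step could look roughly like the following base R sketch. It again assumes the built-in warpbreaks data as a stand-in for the exercise data-sets, and the choice of summary statistics (n, mean, sd) is illustrative.

```r
# Stand-in data (assumption): built-in warpbreaks
data(warpbreaks)

# Per-cell sample size, mean, and standard deviation of the response
summary_stats <- aggregate(
  breaks ~ wool + tension, data = warpbreaks,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)
print(summary_stats)

# Overall distribution of the response
hist(warpbreaks$breaks, main = "Distribution of breaks", xlab = "breaks")

# Response by every combination of the two factors
boxplot(breaks ~ wool * tension, data = warpbreaks,
        xlab = "wool.tension", ylab = "breaks")

# Conditioning plot: breaks against tension level (coded 1-3), one panel per wool
coplot(breaks ~ as.integer(tension) | wool, data = warpbreaks)
```

When reading the plots, look for skewed distributions, outliers, and boxes of clearly different height, since these hint at problems with the assumptions checked in the next step.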
Check normality and homogeneity of variances using qqnorm, qqline, the Shapiro-Wilk test, and Levene's test. As a rule of thumb, the normality assumption is reasonable when the points in the qqnorm plot follow the qqline and the Shapiro-Wilk test gives p > 0.05; likewise, p > 0.05 for Levene's test gives no evidence against equal variances. We’ll discuss this further on the answer page.
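One possible way to run these checks is sketched below, under a few assumptions: the built-in warpbreaks data stands in for the exercise data-sets, normality is assessed on the model residuals rather than the raw response, and Levene's test is written out by hand in base R as a one-way ANOVA on absolute deviations from the cell medians (the median-centered variant). With the car package installed, leveneTest offers a packaged version of the same test.

```r
# Stand-in data (assumption): built-in warpbreaks
data(warpbreaks)

# Residuals of the full two-way model
fit <- aov(breaks ~ wool * tension, data = warpbreaks)
res <- residuals(fit)

# Visual check: points should fall close to the reference line
qqnorm(res)
qqline(res)

# Shapiro-Wilk test of normality
sw <- shapiro.test(res)
print(sw$p.value)

# Levene's test by hand: ANOVA on absolute deviations from each cell median
cell <- interaction(warpbreaks$wool, warpbreaks$tension)
abs_dev <- abs(warpbreaks$breaks - ave(warpbreaks$breaks, cell, FUN = median))
levene <- anova(lm(abs_dev ~ cell))
levene_p <- levene[["Pr(>F)"]][1]
print(levene_p)
```

Both p-values are compared against 0.05, but with small samples the plots are usually more informative than the tests alone.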
Check for interaction between the explanatory variables using an interaction plot and/or xyplot, and select the appropriate ANOVA model based on the interaction.
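The interaction check and model choice might be sketched as follows. This assumes the built-in warpbreaks data as a stand-in, a 0.05 threshold for the interaction term, and a simple rule (keep the interaction if it is significant, otherwise use the additive model); none of these choices come from the exercise itself.

```r
# Stand-in data (assumption): built-in warpbreaks
data(warpbreaks)

# Roughly parallel lines suggest little or no interaction
with(warpbreaks, interaction.plot(tension, wool, breaks,
                                  xlab = "tension", ylab = "mean breaks",
                                  trace.label = "wool"))

# Lattice alternative: points plus cell averages, one panel per wool type
if (requireNamespace("lattice", quietly = TRUE)) {
  print(lattice::xyplot(breaks ~ tension | wool, data = warpbreaks,
                        type = c("p", "a")))
}

# Compare the additive model with the model that includes the interaction
fit_add <- aov(breaks ~ wool + tension, data = warpbreaks)
fit_int <- aov(breaks ~ wool * tension, data = warpbreaks)
cmp <- anova(fit_add, fit_int)
p_interaction <- cmp[["Pr(>F)"]][2]

# Illustrative rule: keep the interaction term only if it is significant
chosen <- if (p_interaction < 0.05) fit_int else fit_add
print(formula(chosen))
```

The F-test comparing the two nested models is equivalent to testing the interaction term in the full model.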
Plot the residuals vs. fitted values for model validation.
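A small base R sketch of this validation plot, assuming the built-in warpbreaks data and the two-way model with interaction as stand-ins for the exercise data and your chosen model:

```r
# Stand-in data and model (assumption): warpbreaks with interaction
data(warpbreaks)
fit <- aov(breaks ~ wool * tension, data = warpbreaks)

# Residuals should scatter evenly around zero with no funnel or curve
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)
```

A spread that widens with the fitted values points to unequal variances, in which case a transformation of the response may be worth trying.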
Do you reject or fail to reject the null hypothesis? What is the conclusion?
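To connect the ANOVA table to a decision, a sketch like the one below can help. It assumes the built-in warpbreaks data as a stand-in and a significance level of 0.05; the names `alpha` and `decision` are illustrative.

```r
# Stand-in data and model (assumption): warpbreaks with interaction
data(warpbreaks)
fit <- aov(breaks ~ wool * tension, data = warpbreaks)

# ANOVA table: one row per term plus the residual row
tab <- summary(fit)[[1]]
print(tab)

# p-value for each term (the residual row has none)
p_values <- setNames(tab[["Pr(>F)"]], trimws(rownames(tab)))
p_values <- p_values[!is.na(p_values)]

# Decision for each null hypothesis at the chosen significance level
alpha <- 0.05
decision <- ifelse(p_values < alpha, "reject H0", "fail to reject H0")
print(decision)
```

When the interaction term is significant, the main effects should be interpreted with care, since the effect of one factor then depends on the level of the other.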