Below are the solutions to the One Way Analysis of Variance exercises.

```
####################
# #
# Exercise 1 #
# #
####################
#Read in the cancer survival.csv data
setwd("H:/datasets")
cancer.survival = read.csv("cancer survival.csv", header = TRUE)
#Inspect structure of the data
head(cancer.survival)
```

```
## Survival Organ
## 1 124 Stomach
## 2 42 Stomach
## 3 25 Stomach
## 4 45 Stomach
## 5 412 Stomach
## 6 51 Stomach
```

```
####################
# #
# Exercise 2 #
# #
####################
#Get summary statistics for each organ
#You need to install library psych
library(psych)
```

`## Warning: package 'psych' was built under R version 3.3.1`

`describeBy(cancer.survival$Survival,cancer.survival$Organ)`

```
## group: Breast
## vars n mean sd median trimmed mad min max range skew
## X1 1 11 1395.91 1238.97 1166 1280.33 662.72 24 3808 3784 0.81
## kurtosis se
## X1 -0.7 373.56
## --------------------------------------------------------
## group: Bronchus
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 17 211.59 209.86 155 181.2 133.43 20 859 839 1.75 2.66
## se
## X1 50.9
## --------------------------------------------------------
## group: Colon
## vars n mean sd median trimmed mad min max range skew
## X1 1 17 457.41 427.17 372 394.2 244.63 20 1843 1823 1.96
## kurtosis se
## X1 3.76 103.6
## --------------------------------------------------------
## group: Ovary
## vars n mean sd median trimmed mad min max range skew
## X1 1 6 884.33 1098.58 406 884.33 386.96 89 2970 2881 1.01
## kurtosis se
## X1 -0.75 448.49
## --------------------------------------------------------
## group: Stomach
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 13 286 346.31 124 234.64 121.57 25 1112 1087 1.27 0.25
## se
## X1 96.05
```
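If the `psych` package is not available, the core per-group statistics can be approximated with base R. The sketch below is only an illustration: `toy` is a made-up data frame standing in for `cancer.survival`, and `group.summary` is a hypothetical name.

```r
# Hypothetical toy data standing in for cancer.survival (values are made up)
toy <- data.frame(
  Survival = c(10, 20, 30, 100, 200, 300, 5, 15, 40),
  Organ    = rep(c("A", "B", "C"), each = 3)
)

# Per-group n, mean, sd and median using base R only
group.summary <- do.call(rbind, lapply(split(toy$Survival, toy$Organ), function(x) {
  data.frame(n = length(x), mean = mean(x), sd = sd(x), median = median(x))
}))
print(group.summary)
```

This covers only a few of the columns that `describeBy()` reports; skew, kurtosis and the trimmed mean would need extra code.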

```
####################
# #
# Exercise 3 #
# #
####################
#Create boxplots to identify any outliers
library(ggplot2)
```

`## Warning: package 'ggplot2' was built under R version 3.3.1`

```
##
## Attaching package: 'ggplot2'
```

```
## The following objects are masked from 'package:psych':
##
## %+%, alpha
```

`ggplot(cancer.survival, aes(x = Organ, y = Survival, color = Organ)) + geom_boxplot() + stat_summary(fun = mean, geom = "point", shape = 23, size = 4) + ggtitle("Survival time of patients affected by different cancers")`

```
####################
# #
# Exercise 4 #
# #
####################
#Check for normality in each group
with(cancer.survival,tapply(Survival,Organ,shapiro.test))
```

```
## $Breast
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.86857, p-value = 0.07431
##
##
## $Bronchus
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.76596, p-value = 0.0007186
##
##
## $Colon
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.76056, p-value = 0.0006134
##
##
## $Ovary
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.76688, p-value = 0.029
##
##
## $Stomach
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.75473, p-value = 0.002075
```
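The list printed by `tapply()` is long. A more compact view, sketched here on simulated data, keeps only one p-value per group. The data frame `toy` and the name `shapiro.p` are assumptions made for this illustration and are not part of the original solution.

```r
# Hypothetical toy data (simulated); stands in for cancer.survival
set.seed(1)
toy <- data.frame(
  Survival = c(rexp(15, rate = 1/200), rexp(15, rate = 1/800)),
  Organ    = rep(c("A", "B"), each = 15)
)

# One Shapiro-Wilk p-value per group, returned as a named numeric vector
shapiro.p <- sapply(split(toy$Survival, toy$Organ),
                    function(x) shapiro.test(x)$p.value)
print(round(shapiro.p, 4))

# Flag groups where normality is rejected at the 5% level
print(shapiro.p < 0.05)
```

Applied to the real data, the same pattern would use `cancer.survival$Survival` and `cancer.survival$Organ` in place of the toy columns.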

```
####################
# #
# Exercise 5 #
# #
####################
#Check for equality of variance
library(car)
```

`## Warning: package 'car' was built under R version 3.3.1`

```
##
## Attaching package: 'car'
```

```
## The following object is masked from 'package:psych':
##
## logit
```

`leveneTest(Survival~Organ, data = cancer.survival)`

```
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 4 4.4524 0.003271 **
## 59
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
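To see what the test is doing, Levene's test with `center = median` can be reproduced as a one-way ANOVA on the absolute deviations from each group median. The sketch below uses made-up values in a hypothetical data frame `toy`, so the numbers do not match the output above.

```r
# Hypothetical toy data (made-up values), not the cancer data
toy <- data.frame(
  y     = c(1, 2, 3, 4, 5,  10, 20, 30, 40, 50,  2, 4, 6, 8, 10),
  group = factor(rep(c("A", "B", "C"), each = 5))
)

# Absolute deviation of each observation from its own group median
toy$abs.dev <- abs(toy$y - ave(toy$y, toy$group, FUN = median))

# A one-way ANOVA on those deviations gives the Levene (Brown-Forsythe) F test
levene.by.hand <- anova(lm(abs.dev ~ group, data = toy))
print(levene.by.hand)
```

The degrees of freedom follow the same rule as in the output above: number of groups minus one, and number of observations minus number of groups.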

```
####################
# #
# Exercise 6 #
# #
####################
#Apply a log transformation to survival time and check for normality and equality of variance.
cancer.survival$log.survival = log(cancer.survival$Survival)
with(cancer.survival,tapply(log.survival,Organ,shapiro.test))
```

```
## $Breast
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.802, p-value = 0.009995
##
##
## $Bronchus
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.98047, p-value = 0.9613
##
##
## $Colon
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.92636, p-value = 0.1891
##
##
## $Ovary
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.983, p-value = 0.9655
##
##
## $Stomach
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.92837, p-value = 0.3245
```

`leveneTest(log.survival~Organ, data = cancer.survival)`

```
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 4 0.6685 0.6164
## 59
```

```
####################
# #
# Exercise 7 #
# #
####################
#Perform one way anova
aov1 = aov(log.survival~Organ,cancer.survival)
summary(aov1)
```

```
## Df Sum Sq Mean Sq F value Pr(>F)
## Organ 4 24.49 6.122 4.286 0.00412 **
## Residuals 59 84.27 1.428
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
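The F value and p-value in this table follow directly from the sums of squares and degrees of freedom. As a quick arithmetic check, using the rounded values printed above (so the result agrees only up to rounding):

```r
# Values copied from the ANOVA table above
ss.organ <- 24.49; df.organ <- 4
ss.resid <- 84.27; df.resid <- 59

ms.organ <- ss.organ / df.organ      # mean square between groups
ms.resid <- ss.resid / df.resid      # mean square within groups
f.value  <- ms.organ / ms.resid      # F statistic

# Upper-tail probability of the F distribution gives the p-value
p.value <- pf(f.value, df.organ, df.resid, lower.tail = FALSE)
print(round(c(F = f.value, p = p.value), 5))
```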

```
####################
# #
# Exercise 8 #
# #
####################
#Perform a Tukey HSD comparison
TukeyHSD(aov1)
```

```
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = log.survival ~ Organ, data = cancer.survival)
##
## $Organ
## diff lwr upr p adj
## Bronchus-Breast -1.60543320 -2.906741 -0.3041254 0.0083352
## Colon-Breast -0.80948110 -2.110789 0.4918267 0.4119156
## Ovary-Breast -0.40798703 -2.114754 1.2987803 0.9615409
## Stomach-Breast -1.59068365 -2.968399 -0.2129685 0.0158132
## Colon-Bronchus 0.79595210 -0.357534 1.9494382 0.3072938
## Ovary-Bronchus 1.19744617 -0.399483 2.7943753 0.2296079
## Stomach-Bronchus 0.01474955 -1.224293 1.2537924 0.9999997
## Ovary-Colon 0.40149407 -1.195435 1.9984232 0.9540004
## Stomach-Colon -0.78120255 -2.020245 0.4578403 0.3981146
## Stomach-Ovary -1.18269662 -2.842480 0.4770864 0.2763506
```
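Because the model was fitted to the natural log of survival time, each `diff` is a difference in mean log survival. Exponentiating it gives a ratio of geometric means, which is often easier to interpret. A minimal sketch using the Bronchus-Breast row from the output above (the name `bronchus.vs.breast` is only illustrative):

```r
# Bronchus-Breast row copied from the TukeyHSD output above (log scale)
bronchus.vs.breast <- c(diff = -1.60543320, lwr = -2.906741, upr = -0.3041254)

# exp() turns a difference of mean logs into a ratio of geometric means
ratio <- exp(bronchus.vs.breast)
print(round(ratio, 3))
```

A ratio of roughly 0.2 suggests that the geometric mean survival time in the Bronchus group is about one fifth of that in the Breast group. The back-transformed interval stays below 1, in line with the adjusted p-value below 0.05.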

```
####################
# #
# Exercise 9 #
# #
####################
#Interpret results
#The raw data departed from normality and from equality of variance. With an unbalanced design (unequal group sizes), ANOVA is more sensitive to unequal variances.
#A Kruskal-Wallis test on the raw data would still not be appropriate, because comparing groups with it assumes distributions of similar shape and spread.
#A log transformation was useful in stabilizing the variance.
#Normality was still violated in the Breast group after transformation, but ANOVA is robust to slight deviations from normality.
#Differences between groups were statistically significant; Tukey HSD showed that Bronchus and Stomach each differed from Breast.
#The Kruskal-Wallis test in Exercise 10 leads to the same conclusion.
####################
# #
# Exercise 10 #
# #
####################
#use a kruskal-wallis test
kruskal.test(log.survival~Organ,cancer.survival)
```

```
##
## Kruskal-Wallis rank sum test
##
## data: log.survival by Organ
## Kruskal-Wallis chi-squared = 14.954, df = 4, p-value = 0.004798
```
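Note that the Kruskal-Wallis test works on ranks, and `log()` is strictly increasing, so running it on `log.survival` or on `Survival` should give the same statistic. The sketch below illustrates this on simulated data; `toy`, `kw.raw` and `kw.log` are hypothetical names.

```r
# Hypothetical toy data (simulated), not the cancer data
set.seed(42)
toy <- data.frame(
  Survival = c(rexp(10, 1/100), rexp(10, 1/400), rexp(10, 1/900)),
  Organ    = factor(rep(c("A", "B", "C"), each = 10))
)
toy$log.survival <- log(toy$Survival)

kw.raw <- kruskal.test(Survival ~ Organ, data = toy)
kw.log <- kruskal.test(log.survival ~ Organ, data = toy)

# log() is strictly increasing, so the ranks (and the test) are unchanged
print(c(raw = unname(kw.raw$statistic), log = unname(kw.log$statistic)))
```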


wgray says

I like these exercises — thanks. But I am getting some different numbers than you in xcise #07 and #08. the F and p-values are the same but the SS and MS differ:

summary(aov1)

# Df SumSq MeanSq Fvalue Pr(>F)

# Organ 4 4.618 1.1546 4.286 0.00412 **

# Residuals 59 15.894 0.2694

I also get different numbers from xcise #08 — TukeyHSD(aov1) — though here again the p-values are equal.

> TukeyHSD(aov1)

Tukey multiple comparisons of means

95% family-wise confidence level

Fit: aov(formula = logSurvival ~ Organ, data = cs)

$Organ

diff lwr upr p adj

Bronchus-Breast -0.697230780 -1.2623816 -0.13207998 0.0083352

Colon-Breast -0.351553176 -0.9167040 0.21359762 0.4119156

Ovary-Breast -0.177186517 -0.9184261 0.56405310 0.9615409

Stomach-Breast -0.690825131 -1.2891592 -0.09249106 0.0158132

Colon-Bronchus 0.345677604 -0.1552750 0.84663024 0.3072938

Ovary-Bronchus 0.520044263 -0.1734933 1.21358179 0.2296079

Stomach-Bronchus 0.006405649 -0.5317038 0.54451510 0.9999997

Ovary-Colon 0.174366659 -0.5191709 0.86790419 0.9540004

Stomach-Colon -0.339271955 -0.8773814 0.19883750 0.3981146

Stomach-Ovary -0.513638614 -1.2344732 0.20719600 0.2763506
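
The numbers in this comment are consistent with a base-10 logarithm (`log10()`) having been used instead of the natural logarithm (`log()`) used in the solution. This is an inference from the figures, not something the comment states. A quick check on values copied from both outputs:

```r
# Values copied from the solution (natural log) and from the comment
diff.solution <- -1.60543320   # Bronchus-Breast, solution output
diff.comment  <- -0.697230780  # Bronchus-Breast, comment output
ss.solution   <- 24.49         # Organ Sum Sq, solution output
ss.comment    <- 4.618         # Organ Sum Sq, comment output

k <- log10(exp(1))             # log10(x) = k * log(x), k is about 0.4343

print(c(diff.ratio = diff.comment / diff.solution, k = k))
print(c(ss.ratio = ss.comment / ss.solution, k.squared = k^2))
```

Rescaling the response by a constant multiplies both mean squares by the same factor, which would explain why the F value and the p-values are identical in the two sets of output.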

Juan Acosta says

Very well explained, thanks.

Just a question: how can we perform a barplot marking with an * groups where we have found statistical differences?
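
One possible approach, sketched in base R, is to draw the bars with `barplot()` and place the asterisks with `text()`. The means below are copied from the `describeBy()` output, and the starred groups are those whose Tukey comparison with Breast had an adjusted p-value below 0.05 on the log scale. The names `means`, `stars` and `mids` are only illustrative, and the layout choices are assumptions.

```r
# Group means copied from the describeBy() output above (raw scale)
means <- c(Breast = 1395.91, Bronchus = 211.59, Colon = 457.41,
           Ovary = 884.33, Stomach = 286)

# Groups whose Tukey comparison with Breast had p adj < 0.05 (log scale)
stars <- c(Breast = "", Bronchus = "*", Colon = "", Ovary = "", Stomach = "*")

pdf(NULL)  # draw to a null device; drop this line to see the plot on screen
mids <- barplot(means, ylim = c(0, max(means) * 1.15),
                ylab = "Mean survival time",
                main = "* differs from Breast (Tukey HSD, log scale)")
# barplot() returns the bar midpoints, which position the labels
text(x = mids, y = means + max(means) * 0.05, labels = stars, cex = 2)
invisible(dev.off())
```

Since the tests were run on log survival while the bars show raw means, plotting the group means of `log.survival` instead may be a more consistent choice.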