Below are the solutions to these exercises on nonparametric tests.

####################
#                  #
#    Exercise 1    #
#                  #
####################
# Chi-square goodness of fit
# H0: f1=f2=f3=f4=f5=f6=f7=f8
# H1: at least one fi differs
chisq.test(c(102, 300, 102, 100, 205, 105, 71, 92))

##
##  Chi-squared test for given probabilities
##
## data:  c(102, 300, 102, 100, 205, 105, 71, 92)
## X-squared = 314.74, df = 7, p-value < 2.2e-16
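To see which stores drive this rejection, it helps to inspect the expected counts and Pearson residuals that `chisq.test` returns — a quick sketch using the same sales vector:

```r
# Same sales figures as in Exercise 1
sales <- c(102, 300, 102, 100, 205, 105, 71, 92)
fit <- chisq.test(sales)

# Under H0 every store gets an equal share of the total (1077/8 = 134.625)
fit$expected

# Pearson residuals (obs - exp) / sqrt(exp): stores 2 and 5 sell far more than expected
round(fit$residuals, 2)
```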

# p < 0.05 => sales are not equally distributed among the stores

####################
#                  #
#    Exercise 2    #
#                  #
####################
# Mann-Whitney test
# H0: Me1=Me2
# H1: Me1!=Me2
x <- c(50, 50, 60, 70, 75, 80, 90, 85)
y <- c(55, 75, 80, 90, 105, 65)
wilcox.test(x, y, correct=FALSE, paired=FALSE)

## Warning in wilcox.test.default(x, y, correct = FALSE, paired = FALSE):
## cannot compute exact p-value with ties

##
##  Wilcoxon rank sum test
##
## data:  x and y
## W = 17.5, p-value = 0.3993
## alternative hypothesis: true location shift is not equal to 0
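As a plain-language check on the location shift being tested, one can simply compare the sample medians of the two groups:

```r
x <- c(50, 50, 60, 70, 75, 80, 90, 85)
y <- c(55, 75, 80, 90, 105, 65)
median(x)  # 72.5
median(y)  # 77.5
```

The observed difference of 5 is small relative to the spread of the data, which is consistent with the non-significant result above.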

# p > 0.05 => H0 can't be rejected, i.e. the income doesn't depend on the type of store

####################
#                  #
#    Exercise 3    #
#                  #
####################
# Wilcoxon's paired test
# H0: S(+) >= S(-)
# H1: S(+) < S(-)
x <- c(509, 517, 502, 629, 830, 911, 847, 803, 727, 853, 757, 730, 774, 718, 904)
y <- c(517, 508, 523, 730, 821, 940, 818, 821, 842, 842, 709, 688, 787, 780, 901)
wilcox.test(x, y, correct=FALSE, paired=TRUE)

## Warning in wilcox.test.default(x, y, correct = FALSE, paired = TRUE):
## cannot compute exact p-value with ties

##
##  Wilcoxon signed rank test
##
## data:  x and y
## V = 45.5, p-value = 0.41
## alternative hypothesis: true location shift is not equal to 0
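The call above is two-sided, so it only detects a change in either direction. If the research question is specifically whether the campaign increased the values (taking x as the "before" and y as the "after" measurements — an assumption about the data's orientation), a one-sided sketch would be:

```r
x <- c(509, 517, 502, 629, 830, 911, 847, 803, 727, 853, 757, 730, 774, 718, 904)
y <- c(517, 508, 523, 730, 821, 940, 818, 821, 842, 842, 709, 688, 787, 780, 901)

# H1: the "before" values are shifted below the "after" values
wilcox.test(x, y, correct = FALSE, paired = TRUE, alternative = "less")
```

The one-sided p-value is roughly half the two-sided one here, so the conclusion does not change.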

# p > 0.05 => H0 can't be rejected, i.e. the campaign was not successful

####################
#                  #
#    Exercise 4    #
#                  #
####################
# Kruskal-Wallis test
# H0: Me1=Me2=Me3
# H1: at least one median differs
x <- c(510, 720, 930, 754, 105)
y <- c(925, 735, 753, 685)
z <- c(730, 745, 875, 610)
kruskal.test(list(x, y, z))

##
##  Kruskal-Wallis rank sum test
##
## data:  list(x, y, z)
## Kruskal-Wallis chi-squared = 0.47473, df = 2, p-value = 0.7887
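A glance at the per-group sample medians makes the non-significant result plausible — all three colors sell at a similar level:

```r
x <- c(510, 720, 930, 754, 105)
y <- c(925, 735, 753, 685)
z <- c(730, 745, 875, 610)
sapply(list(x = x, y = y, z = z), median)  # 720.0, 744.0, 737.5
```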

# p > 0.05 => H0 can't be rejected, i.e. color doesn't influence the sales

####################
#                  #
#    Exercise 5    #
#                  #
####################
# Cochran's Q test
# H0: Pi1=Pi2=Pi3=Pi4
# H1: at least one proportion differs
library("CVST")
data <- read.csv("https://www.r-exercises.com/wp-content/uploads/2016/11/tv-station.csv")
cochranq.test(data)

##
##  Cochran's Q Test
##
## data:  data
## Cochran's Q = 1.2, df = 3, p-value = 0.753
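Since `cochranq.test` comes from the CVST package and the CSV isn't reproduced here, the statistic can also be computed from first principles. A minimal sketch on a small hypothetical 0/1 matrix (4 respondents by 3 conditions; the data and dimensions are made up for illustration):

```r
# Cochran's Q from column totals Cj, row totals Ri, grand total N, k columns:
# Q = (k-1) * (k*sum(Cj^2) - N^2) / (k*N - sum(Ri^2)), df = k-1
cochran_q <- function(m) {
  k  <- ncol(m)
  Cj <- colSums(m); Ri <- rowSums(m); N <- sum(m)
  Q  <- (k - 1) * (k * sum(Cj^2) - N^2) / (k * N - sum(Ri^2))
  c(Q = Q, df = k - 1, p.value = pchisq(Q, df = k - 1, lower.tail = FALSE))
}

# Hypothetical binary responses: rows = respondents, columns = conditions
m <- matrix(c(1, 1, 1, 0,
              1, 0, 1, 1,
              0, 0, 1, 0), nrow = 4)
cochran_q(m)  # Q = 2.667, df = 2
```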

# p > 0.05 => H0 can't be rejected, i.e. there are no differences in satisfaction among the four measurements

####################
#                  #
#    Exercise 6    #
#                  #
####################
# McNemar's test
# H0: Pi1=Pi2
# H1: Pi1!=Pi2
satisfaction <- matrix(c(32, 68, 48, 52), nrow=2)
mcnemar.test(satisfaction)

##
##  McNemar's Chi-squared test with continuity correction
##
## data:  satisfaction
## McNemar's chi-squared = 3.1121, df = 1, p-value = 0.07771
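The McNemar statistic depends only on the discordant cells, so the 3.1121 above can be reproduced by hand (assuming the off-diagonal entries 48 and 68 count the respondents who changed their answer between the two occasions):

```r
satisfaction <- matrix(c(32, 68, 48, 52), nrow = 2)
n12 <- satisfaction[1, 2]  # 48
n21 <- satisfaction[2, 1]  # 68

# Continuity-corrected chi-squared on the discordant pairs
stat <- (abs(n12 - n21) - 1)^2 / (n12 + n21)
stat                                      # 3.1121, matching mcnemar.test()
pchisq(stat, df = 1, lower.tail = FALSE)  # 0.07771
```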

# p > 0.05 => H0 can't be rejected, i.e. there is no difference in satisfaction before and after the improvement

####################
#                  #
#    Exercise 7    #
#                  #
####################
# Chi-square test for homogeneity
# H0: oij=eij for all cells
usage <- matrix(c(151, 802, 753, 252, 603, 55, 603, 404, 408), nrow=3)
chisq.test(usage)

##
##  Pearson's Chi-squared test
##
## data:  usage
## X-squared = 822.12, df = 4, p-value < 2.2e-16
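As in Exercise 1, the expected counts and Pearson residuals show where the homogeneity breaks down — a sketch reusing the same matrix:

```r
usage <- matrix(c(151, 802, 753, 252, 603, 55, 603, 404, 408), nrow = 3)
fit <- chisq.test(usage)

round(fit$expected)      # counts expected if all cities behaved alike
round(fit$residuals, 2)  # Pearson residuals: the cells driving the rejection
```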

# p < 0.05 => H0 is rejected, i.e. city size has a significant influence on the frequency of buying the product

####################
#                  #
#    Exercise 8    #
#                  #
####################
data <- read.csv("https://www.r-exercises.com/wp-content/uploads/2016/11/ab-consumption.csv")
shapiro.test(as.numeric(data$A))

##
##  Shapiro-Wilk normality test
##
## data:  as.numeric(data$A)
## W = 0.97147, p-value = 0.02867

shapiro.test(as.numeric(data$B))

##
##  Shapiro-Wilk normality test
##
## data:  as.numeric(data$B)
## W = 0.97673, p-value = 0.07367

# since variable A is not normally distributed (p < 0.05), we use Spearman's correlation coefficient
cor(data, method="spearman")

##            A          B
## A 1.00000000 0.03736654
## B 0.03736654 1.00000000
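`cor` only returns the coefficient; `cor.test` additionally tests whether rho differs from zero. Since the CSV isn't reproduced here, a self-contained sketch on two short hypothetical vectors standing in for data$A and data$B:

```r
# Hypothetical consumption figures, chosen without ties so the exact p-value is available
a <- c(3, 1, 4, 2, 5, 9, 7, 6)
b <- c(2, 7, 1, 8, 3, 9, 4, 6)
cor.test(a, b, method = "spearman")
```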

# practically, there is no correlation between the consumption of products A and B

####################
#                  #
#    Exercise 9    #
#                  #
####################
# Chi-square test for independence
# H0: oij=eij for all cells
m <- matrix(data = c(301, 353, 558, 502, 155, 153), nrow = 3)
chisq.test(m)

##
##  Pearson's Chi-squared test
##
## data:  m
## X-squared = 289.71, df = 2, p-value < 2.2e-16

# p < 0.05 => H0 is rejected, i.e. gender influences the decision to buy

####################
#                  #
#   Exercise 10    #
#                  #
####################
# Contingency coefficient
h <- chisq.test(m)$statistic
n <- sum(m)
cobs <- sqrt(h/(h+n))
cobs

## X-squared
##  0.3540099

r <- NROW(m)
c <- NCOL(m)
cmax <- ((r-1)/r*(c-1)/c)^(1/4)
cobs/cmax

## X-squared
##  0.4659032
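The ratio above rescales the contingency coefficient by its maximum attainable value. A common alternative effect size is Cramér's V, which is already normalized to [0, 1]; a sketch with the same matrix:

```r
m <- matrix(data = c(301, 353, 558, 502, 155, 153), nrow = 3)
h <- unname(chisq.test(m)$statistic)
n <- sum(m)

# Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1)))
v <- sqrt(h / (n * (min(nrow(m), ncol(m)) - 1)))
v  # ~0.379, a moderate association
```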

stockie says

In the 2nd exercise, when the p-value is greater than .05, you can only fail to reject the null hypothesis, but cannot accept it as shown in the answer. Please correct this, unless I’m missing something.

Miodrag Sljukic says

Since this is a test of difference of medians, the null hypothesis says that there is no difference between them, i.e. the difference between the two medians equals 0 (Me1=Me2). The result is p > 0.05, which means that the probability of obtaining such a result given the data is larger than the 5% threshold we set, so we can assume that the differences observed in our sample have most probably occurred by chance. Thus, based on the result, we can’t claim that the difference exists in the population while remaining within the 5% significance level.

This is why the only correct conclusion in the case of exercise #2 is that there is no difference in the population, i.e. that the difference we observed in our sample is a consequence of a random event. This means that the null hypothesis is correct, while the alternative is not.

jake says

Hi,

In Ex3, shouldn’t we test for “positive outcome of the campaign”, that is:

wilcox.test(x, y, correct=FALSE, paired=TRUE, alternative='less')

wilcox.test(x, y, correct=FALSE, paired=TRUE) is just testing that the campaign has changed something (for better or worse).

jake says

Hi,

In Ex4, we should first check if the variances are equal. This is a prerequisite for the Kruskal-Wallis test.