In this set of exercises we will practice multivariate analysis of variance (MANOVA).
We will test whether the combination of export and bank reserves differs depending on the status of the banking sector (whether or not there is a crisis). The data set is fictitious and serves educational purposes only. It consists of the variable crisis, a factor indicating whether or not a banking crisis exists, and the variables export and reserves, both in billions of currency units. You can download it here.
In this set of exercises we use two packages: MVN and heplots. If you haven’t already installed them, do so using the following code:
install.packages(c("MVN", "heplots"))
and load them into the session using the following code:
library("MVN")
library("heplots")
before proceeding.
Answers to the exercises are available here.
If you have a different solution, feel free to post it.
Exercise 1
Is the sample size large enough for conducting MANOVA? (Tip: You should have at least 2 cases for each cell.)
- Yes
- No
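As a minimal sketch of the check suggested in the tip, the cell sizes can be counted with table(). The data frame below is simulated and only stands in for the downloadable file; the name bankcrisis and the generated values are assumptions made for illustration.

```r
# Simulated stand-in for the exercise data (values are made up)
set.seed(1)
bankcrisis <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, mean = 50, sd = 5), rnorm(20, mean = 42, sd = 5)),
  reserves = c(rnorm(20, mean = 30, sd = 4), rnorm(20, mean = 25, sd = 4))
)

# Number of cases in each cell (each level of the factor)
cell_counts <- table(bankcrisis$crisis)
print(cell_counts)

# The tip's rule: at least 2 cases in every cell
enough_cases <- all(cell_counts >= 2)
print(enough_cases)
```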
Exercise 2
Are there univariate and multivariate outliers?
- There are univariate, but not multivariate outliers
- There are no univariate outliers, but there are multivariate outliers
- There are both univariate and multivariate outliers
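One possible way to look for both kinds of outliers with base R is sketched below. Univariate outliers are taken from the out component returned by boxplot(), and multivariate outliers are flagged with Mahalanobis distances compared against a chi-square cutoff. The data are simulated, the extreme value in row 1 is planted on purpose, and the 0.999 quantile is a commonly used but arbitrary threshold, so none of this reflects the real data set.

```r
# Simulated stand-in for the exercise data (values are made up)
set.seed(1)
bankcrisis <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, mean = 50, sd = 5), rnorm(20, mean = 42, sd = 5)),
  reserves = c(rnorm(20, mean = 30, sd = 4), rnorm(20, mean = 25, sd = 4))
)
bankcrisis$export[1] <- 120  # planted outlier, for illustration only

# Univariate outliers: points beyond the whiskers of a box plot
# (plot = FALSE returns the statistics without drawing the chart)
out_export   <- boxplot(bankcrisis$export, plot = FALSE)$out
out_reserves <- boxplot(bankcrisis$reserves, plot = FALSE)$out
print(out_export)
print(out_reserves)

# Multivariate outliers: squared Mahalanobis distance from the centroid
Y  <- as.matrix(bankcrisis[, c("export", "reserves")])
d2 <- mahalanobis(Y, center = colMeans(Y), cov = cov(Y))

# Assumed cutoff: 0.999 quantile of chi-square with df = number of variables
cutoff        <- qchisq(0.999, df = ncol(Y))
multi_outlier <- which(d2 > cutoff)
print(multi_outlier)
```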
Exercise 3
How do you assess the univariate and multivariate normality of the dependent variables?
- Both variables are univariate normal, but they are not multivariate normally distributed
- Neither variable is univariate normal, and hence there is no multivariate normality
- Both variables are univariate normal and the data is multivariate normally distributed
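For the univariate part, one base R option is the Shapiro-Wilk test applied to each dependent variable within each group, as sketched below on simulated data (the data frame and its values are assumptions). The multivariate part needs a dedicated test from a package such as MVN and is not covered by this sketch.

```r
# Simulated stand-in for the exercise data (values are made up)
set.seed(1)
bankcrisis <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, mean = 50, sd = 5), rnorm(20, mean = 42, sd = 5)),
  reserves = c(rnorm(20, mean = 30, sd = 4), rnorm(20, mean = 25, sd = 4))
)

# Shapiro-Wilk p-value for every variable within every group
# (rows: levels of crisis, columns: dependent variables)
shapiro_p <- sapply(c("export", "reserves"), function(v) {
  tapply(bankcrisis[[v]], bankcrisis$crisis,
         function(x) shapiro.test(x)$p.value)
})
print(round(shapiro_p, 3))
```

A p-value above the chosen significance level means the test gives no evidence against normality for that variable and group.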
Exercise 4
Using a matrix of scatter plots, check for linearity between the dependent variables export and reserves for each category of the independent variable.
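A possible sketch of this check splits the dependent variables by group and draws one scatter plot matrix per group with pairs(). The data frame is simulated and only stands in for the real file.

```r
# Simulated stand-in for the exercise data (values are made up)
set.seed(1)
bankcrisis <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, mean = 50, sd = 5), rnorm(20, mean = 42, sd = 5)),
  reserves = c(rnorm(20, mean = 30, sd = 4), rnorm(20, mean = 25, sd = 4))
)

# One data frame of dependent variables per level of crisis
groups <- split(bankcrisis[, c("export", "reserves")], bankcrisis$crisis)

# One scatter plot matrix per group
for (g in names(groups)) {
  pairs(groups[[g]], main = paste("crisis =", g))
}
```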
Exercise 5
Calculate the correlation between the dependent variables export and reserves. Does it justify conducting MANOVA?
- Yes
- No
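The correlation itself can be obtained with cor.test(), sketched here on simulated data (the data frame and values are assumptions). As a common rule of thumb, not a strict criterion, MANOVA is considered most useful when the dependent variables are moderately correlated, while a very high correlation points to redundancy between them.

```r
# Simulated stand-in for the exercise data (values are made up)
set.seed(1)
bankcrisis <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, mean = 50, sd = 5), rnorm(20, mean = 42, sd = 5)),
  reserves = c(rnorm(20, mean = 30, sd = 4), rnorm(20, mean = 25, sd = 4))
)

# Pearson correlation between the two dependent variables, with its test
ct <- cor.test(bankcrisis$export, bankcrisis$reserves)
r  <- unname(ct$estimate)
print(r)
print(ct$p.value)
```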
Exercise 6
Is there equality of covariances of the dependent variables export and reserves across the groups? (Tip: You should perform Box’s M test of equality of covariance matrices.)
- Yes
- No
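A sketch of Box’s M test using boxM() from the heplots package follows. It runs the test only if heplots is installed, and it uses simulated data in place of the real file (the data frame and values are assumptions).

```r
# Simulated stand-in for the exercise data (values are made up)
set.seed(1)
bankcrisis <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, mean = 50, sd = 5), rnorm(20, mean = 42, sd = 5)),
  reserves = c(rnorm(20, mean = 30, sd = 4), rnorm(20, mean = 25, sd = 4))
)

# Box's M test of equal covariance matrices across groups
if (requireNamespace("heplots", quietly = TRUE)) {
  box_m <- heplots::boxM(cbind(export, reserves) ~ crisis, data = bankcrisis)
  print(box_m)
} else {
  message("Package 'heplots' is not installed; skipping Box's M test.")
}
```

A p-value above the chosen significance level gives no evidence against equal covariance matrices.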
Exercise 7
Is there equality of variances of the dependent variables export and reserves across groups? (Tip: Use Levene’s test of equality of error variances.)
- Yes
- No
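Levene’s test is available in add-on packages (for example car::leveneTest), but it can also be sketched in base R as a one-way ANOVA on absolute deviations from the group medians, which is the median-centered (Brown-Forsythe) variant. The helper levene_p below is a hypothetical name introduced for this sketch, and the data are simulated.

```r
# Simulated stand-in for the exercise data (values are made up)
set.seed(1)
bankcrisis <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, mean = 50, sd = 5), rnorm(20, mean = 42, sd = 5)),
  reserves = c(rnorm(20, mean = 30, sd = 4), rnorm(20, mean = 25, sd = 4))
)

# Hypothetical helper: median-centered Levene test, returns the p-value
levene_p <- function(y, g) {
  dev <- abs(y - ave(y, g, FUN = median))  # absolute deviation from group median
  anova(lm(dev ~ g))[["Pr(>F)"]][1]        # one-way ANOVA on the deviations
}

# One test per dependent variable
levene_results <- sapply(bankcrisis[c("export", "reserves")],
                         levene_p, g = bankcrisis$crisis)
print(round(levene_results, 3))
```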
Exercise 8
At the 0.05 significance level, does banking crisis have an effect on the combination of export and banking reserves?
- Yes
- No
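The test itself can be sketched with manova() from the stats package. The example uses simulated data (the data frame and values are assumptions) and Wilks’ lambda; summary() also accepts "Pillai", "Hotelling-Lawley" and "Roy" as the test argument.

```r
# Simulated stand-in for the exercise data (values are made up)
set.seed(1)
bankcrisis <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, mean = 50, sd = 5), rnorm(20, mean = 42, sd = 5)),
  reserves = c(rnorm(20, mean = 30, sd = 4), rnorm(20, mean = 25, sd = 4))
)

# One-way MANOVA: two dependent variables, one factor
fit <- manova(cbind(export, reserves) ~ crisis, data = bankcrisis)
res <- summary(fit, test = "Wilks")
print(res)

# p-value of the multivariate test for the crisis effect
p_value <- res$stats["crisis", "Pr(>F)"]
print(p_value < 0.05)
```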
Exercise 9
How much of the variance in the dependent variables export and reserves is explained by banking crisis?
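One common effect size for MANOVA is the multivariate partial eta squared, computed from Wilks’ lambda as

$$\eta^2_p = 1 - \Lambda^{1/s}, \qquad s = \min(p,\ df_{\text{effect}}),$$

where p is the number of dependent variables. The sketch below computes it by hand on simulated data (the data frame and values are assumptions); the heplots package also provides an etasq() function for this purpose.

```r
# Simulated stand-in for the exercise data (values are made up)
set.seed(1)
bankcrisis <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, mean = 50, sd = 5), rnorm(20, mean = 42, sd = 5)),
  reserves = c(rnorm(20, mean = 30, sd = 4), rnorm(20, mean = 25, sd = 4))
)

fit   <- manova(cbind(export, reserves) ~ crisis, data = bankcrisis)
wilks <- summary(fit, test = "Wilks")$stats["crisis", "Wilks"]

# s = min(number of dependent variables, degrees of freedom of the effect)
s <- min(2, nlevels(bankcrisis$crisis) - 1)

# Multivariate partial eta squared
eta_sq <- 1 - wilks^(1 / s)
print(eta_sq)
```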
Exercise 10
Does export differ when the banking sector is in crisis compared to when it is not? What about reserves?
- Only export differs
- Only reserves differ
- Both export and reserves differ
- Neither of them differs
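A possible follow-up is a separate univariate ANOVA for each dependent variable, which summary.aov() produces from the fitted MANOVA object. The sketch uses simulated data (the data frame and values are assumptions) and adds a Bonferroni adjustment for the two tests as one common, conservative choice.

```r
# Simulated stand-in for the exercise data (values are made up)
set.seed(1)
bankcrisis <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, mean = 50, sd = 5), rnorm(20, mean = 42, sd = 5)),
  reserves = c(rnorm(20, mean = 30, sd = 4), rnorm(20, mean = 25, sd = 4))
)

fit <- manova(cbind(export, reserves) ~ crisis, data = bankcrisis)

# One ANOVA table per dependent variable
uni <- summary.aov(fit)
print(uni)

# p-value of the crisis effect in each table, then Bonferroni-adjusted
p_raw <- sapply(unclass(uni), function(tab) tab[["Pr(>F)"]][1])
p_adj <- p.adjust(p_raw, method = "bonferroni")
print(p_adj)
```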
This is a brand new area for me. I have no foundation for what the questions are referring to and have never seen most of the functions used.
What is the purpose of the $out in boxplot(data$export)$out? I am assuming something is being exported somewhere, but what, where, and why?
Any good reference materials you can refer me to would be appreciated.
Thank you for your comment, Carl.
The key idea of this set of exercises is to show how you can investigate the influence of a categorical independent variable on two continuous dependent variables. This technique is called multivariate analysis of variance (MANOVA). Basically, we check whether the continuous variables, taken together, differ between subgroups. In these exercises we put the analysis in the context of banking crises, asking whether national export and the level of banking reserves behave differently when a banking crisis occurs. In order to conduct this kind of analysis, some assumptions must be met, i.e. the data must satisfy certain conditions. We explore these conditions in exercises 1 through 7. Exercises 8 through 10 ask you to find the result and do post-hoc analysis. My intention with this set of exercises was to cover the entire process of MANOVA step by step, so that the solutions can be used in practical work. If you are interested in learning more about MANOVA, I can suggest the book “Applied Multivariate Statistics for the Social Sciences” by James P. Stevens, although I’m sure there are other great books and online resources as well.
There are many ways to solve these exercises. The solutions given here are just one of them. For example, in order to assess univariate normality of the data, you can use the ks.test or shapiro.test functions, which are included in the stats package. But in order to test multivariate normality, another package has to be used. I found that the MVN package is good (although there might be others which are even better) and, since it also has a function for testing univariate normality which gives more information than those in the stats package, I used it to test univariate normality too. This doesn’t mean that using some other function is wrong; in most cases it is just a matter of taste. By doing it this way I wanted to show that there are different options and to encourage the reader to explore R’s compelling richness of possibilities.
Besides drawing a box plot, R’s boxplot function returns some basic statistics about the data it plots. You can see them if you run summary(boxplot(your_data)). It lists the components of the returned object, among them out, which contains the outlier values for the data plotted, which is what exercise 2 asks for. So the statement boxplot(data$export)$out actually does two things: it plots the chart and displays the outliers. You can find more information about the boxplot function in R help by running ?boxplot.
Thanks a lot, Miodrag, for investing the time and energy to make these exercises. I really appreciate that.
For Exercise 3, do you think the following would be right?
# check univariate/multivariate normality
subset1 <- as.numeric(bankcrisis[, 2])
shapiro.test(subset1)
subset2 <- as.numeric(bankcrisis[, 3])
shapiro.test(subset2)
# OR
install.packages("mvnormtest")
library(mvnormtest)
subset3 <- as.matrix(bankcrisis[, 2:3])
qqnorm(subset3)
qqnorm(subset1)
qqnorm(subset2)
Yes, your solution is perfectly good for testing univariate normality, but for multivariate normality you have to use different functions. I gave an example of three tests contained in the mvnormtest library.
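As a minimal sketch of one multivariate option, assuming the mvnormtest package loaded in the question is installed, its mshapiro.test function performs a multivariate Shapiro-Wilk test. It expects the variables in rows, so the data matrix has to be transposed. The data frame below is simulated and only stands in for the real bankcrisis data.

```r
# Simulated stand-in for the exercise data (values are made up)
set.seed(1)
bankcrisis <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, mean = 50, sd = 5), rnorm(20, mean = 42, sd = 5)),
  reserves = c(rnorm(20, mean = 30, sd = 4), rnorm(20, mean = 25, sd = 4))
)

# Dependent variables as a numeric matrix (cases in rows)
Y <- as.matrix(bankcrisis[, c("export", "reserves")])

# Multivariate Shapiro-Wilk test; t() puts the variables in rows
if (requireNamespace("mvnormtest", quietly = TRUE)) {
  mv_test <- mvnormtest::mshapiro.test(t(Y))
  print(mv_test)
} else {
  message("Package 'mvnormtest' is not installed; skipping the test.")
}
```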