In this set of exercises we will practice multivariate analysis of variance – MANOVA.
We shall try to find out whether the combination of export and bank reserves differs depending on the status of the banking sector (whether or not there is a crisis). The data set is fictitious and serves educational purposes only. It consists of the variables crisis, which is a factor indicating whether or not a banking crisis exists, and export and reserves, both in billions of currency units. You can download it here.
In this set of exercises we use two packages: MVN and heplots. If you haven’t already installed them, do it using the following code:
and load them into the session using the following code:
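The code itself did not survive on this page, so here is a minimal sketch of what it could look like. It assumes the second package is MVN, which the author names in the comments as the package used for the normality tests; only heplots is named at this point in the text.

```r
# Assumption: the two packages are MVN and heplots
# (MVN is inferred from the author's comment further down).
pkgs <- c("MVN", "heplots")

# Install only the packages that are not installed yet.
to_install <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(to_install) > 0) install.packages(to_install)

# Load both packages into the session.
library(MVN)
library(heplots)
```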
Answers to the exercises are available here.
If you have a different solution, feel free to post it.
Is the sample size large enough for conducting MANOVA? (Tip: You should have at least 2 cases for each cell.)
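One possible way to look at the cell sizes is a simple frequency table of the factor. The sketch below uses a simulated stand-in for the exercise data (the object name data and the column names crisis, export and reserves follow the post; the values are made up).

```r
# Hypothetical stand-in for the exercise data; values are simulated.
set.seed(1)
data <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, 50, 5), rnorm(20, 45, 5)),
  reserves = c(rnorm(20, 30, 4), rnorm(20, 26, 4))
)

# Number of cases in each cell (here: each level of crisis).
cell_counts <- table(data$crisis)
print(cell_counts)

# TRUE if every cell has at least 2 cases, as the tip requires.
enough_cases <- all(cell_counts >= 2)
print(enough_cases)
```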
Are there univariate and multivariate outliers?
- There are univariate, but not multivariate outliers
- There doesn’t exist a univariate outlier, but there are multivariate outliers
- There exist both univariate and multivariate outliers
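A minimal sketch of one way to look for both kinds of outliers with base R only, on simulated stand-in data. Univariate outliers are taken from the box-plot rule; multivariate outliers are flagged by Mahalanobis distance against a chi-square cutoff. The 0.999 quantile is a common convention and an assumption here, not necessarily what the original solutions use.

```r
# Hypothetical stand-in for the exercise data; values are simulated.
set.seed(1)
data <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, 50, 5), rnorm(20, 45, 5)),
  reserves = c(rnorm(20, 30, 4), rnorm(20, 26, 4))
)

# Univariate outliers: values beyond the box-plot whiskers.
uni_out <- lapply(data[c("export", "reserves")],
                  function(x) boxplot.stats(x)$out)
print(uni_out)

# Multivariate outliers: squared Mahalanobis distance of each case
# from the centroid, compared with a chi-square quantile.
Y      <- as.matrix(data[c("export", "reserves")])
d2     <- mahalanobis(Y, colMeans(Y), cov(Y))
cutoff <- qchisq(0.999, df = ncol(Y))   # assumed cutoff level
multi_out <- which(d2 > cutoff)
print(multi_out)
```

The same checks can be run separately within each level of crisis if you prefer to judge outliers per group.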
How do you assess univariate and multivariate normality of the dependent variables?
- Both variables are univariate normal, but they are not multivariate normally distributed
- None of the variables is univariate normal, and hence there doesn’t exist multivariate normality
- Both variables are univariate normal and the data is multivariate normally distributed
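As a rough illustration using base R only, univariate normality can be tested with shapiro.test, and multivariate normality can be inspected graphically with a chi-square Q-Q plot of Mahalanobis distances. This is a swapped-in graphical check, not the formal multivariate tests from the MVN package that the author mentions in the comments; the data are a simulated stand-in.

```r
# Hypothetical stand-in for the exercise data; values are simulated.
set.seed(1)
data <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, 50, 5), rnorm(20, 45, 5)),
  reserves = c(rnorm(20, 30, 4), rnorm(20, 26, 4))
)

# Univariate normality: Shapiro-Wilk p-value for each dependent variable.
shapiro_p <- sapply(data[c("export", "reserves")],
                    function(x) shapiro.test(x)$p.value)
print(shapiro_p)

# Multivariate normality (graphical): under multivariate normality the
# squared Mahalanobis distances roughly follow a chi-square distribution
# with df equal to the number of variables.
Y  <- as.matrix(data[c("export", "reserves")])
d2 <- mahalanobis(Y, colMeans(Y), cov(Y))
qqplot(qchisq(ppoints(nrow(Y)), df = ncol(Y)), d2,
       xlab = "Chi-square quantiles",
       ylab = "Squared Mahalanobis distance")
abline(0, 1)  # points close to this line suggest multivariate normality
```

This sketch checks the pooled sample; running the same checks within each level of crisis is a reasonable alternative.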
Using the matrix of scatter plots, check for linearity between the dependent variables export and reserves for each category of the independent variable.
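One way this could be done with base graphics is to draw a scatter plot matrix per group with pairs. The sketch uses simulated stand-in data.

```r
# Hypothetical stand-in for the exercise data; values are simulated.
set.seed(1)
data <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, 50, 5), rnorm(20, 45, 5)),
  reserves = c(rnorm(20, 30, 4), rnorm(20, 26, 4))
)

# One scatter plot matrix of the dependent variables per level of crisis.
groups <- levels(data$crisis)
for (g in groups) {
  pairs(data[data$crisis == g, c("export", "reserves")],
        main = paste("crisis =", g))
}
```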
Calculate the correlation between the dependent variables export and reserves. Does it justify conducting MANOVA?
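A minimal sketch with cor.test from the stats package, on simulated stand-in data. Pearson correlation is the function's default and an assumption here.

```r
# Hypothetical stand-in for the exercise data; values are simulated.
set.seed(1)
data <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, 50, 5), rnorm(20, 45, 5)),
  reserves = c(rnorm(20, 30, 4), rnorm(20, 26, 4))
)

# Pearson correlation between the two dependent variables,
# with a test of the null hypothesis that it is zero.
ct <- cor.test(data$export, data$reserves)
r  <- unname(ct$estimate)
print(r)
print(ct$p.value)
```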
Is there equality of covariances of the dependent variables export and reserves across the groups? (Tip: You should perform Box’s M test of equality of covariance matrices.)
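The tip can be sketched with the boxM function from the heplots package, here called with a matrix of dependent variables and a grouping factor. The data are a simulated stand-in, so the result says nothing about the real exercise data.

```r
library(heplots)

# Hypothetical stand-in for the exercise data; values are simulated.
set.seed(1)
data <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, 50, 5), rnorm(20, 45, 5)),
  reserves = c(rnorm(20, 30, 4), rnorm(20, 26, 4))
)

# Box's M test: the null hypothesis is that the covariance matrices
# of export and reserves are equal across the levels of crisis.
box_res <- boxM(data[, c("export", "reserves")], data$crisis)
print(box_res)
```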
Is there equality of variances of the dependent variables export and reserves across groups? (Tip: Use Levene’s test of error variances.)
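A possible sketch using leveneTest from the car package. The use of car is an assumption, since the post does not say which implementation the solutions rely on. The data are a simulated stand-in.

```r
library(car)

# Hypothetical stand-in for the exercise data; values are simulated.
set.seed(1)
data <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, 50, 5), rnorm(20, 45, 5)),
  reserves = c(rnorm(20, 30, 4), rnorm(20, 26, 4))
)

# Levene's test is univariate, so it is run once per dependent variable.
lev_export   <- leveneTest(export ~ crisis, data = data)
lev_reserves <- leveneTest(reserves ~ crisis, data = data)
print(lev_export)
print(lev_reserves)
```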
At the 0.05 level of significance, is there an effect of banking crisis on the combination of export and bank reserves?
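A minimal sketch of fitting the model with manova from the stats package, on simulated stand-in data. Pillai's trace is the default test statistic of summary and is an assumption here; the original solutions may report a different one.

```r
# Hypothetical stand-in for the exercise data; values are simulated.
set.seed(1)
data <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, 50, 5), rnorm(20, 45, 5)),
  reserves = c(rnorm(20, 30, 4), rnorm(20, 26, 4))
)

# One-way MANOVA: both dependent variables on the left-hand side.
fit <- manova(cbind(export, reserves) ~ crisis, data = data)
res <- summary(fit, test = "Pillai")
print(res)

# p-value of the multivariate test, compared with the 0.05 level.
p_value <- res$stats["crisis", "Pr(>F)"]
print(p_value < 0.05)
```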
How much of the variance in the dependent variables export and reserves is explained by banking crisis?
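One common effect-size measure for MANOVA is partial eta squared, computed as 1 - Λ^(1/s), where Λ is Wilks' lambda and s is the smaller of the number of dependent variables and the effect degrees of freedom. With two groups s = 1, so it reduces to 1 - Λ. Whether the original solutions use this measure is an assumption; the data below are a simulated stand-in.

```r
# Hypothetical stand-in for the exercise data; values are simulated.
set.seed(1)
data <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, 50, 5), rnorm(20, 45, 5)),
  reserves = c(rnorm(20, 30, 4), rnorm(20, 26, 4))
)

fit <- manova(cbind(export, reserves) ~ crisis, data = data)

# Wilks' lambda for the crisis effect.
wilks <- summary(fit, test = "Wilks")$stats["crisis", "Wilks"]

# s = min(number of dependent variables, effect df) = min(2, 1) = 1.
s <- 1
eta_sq <- 1 - wilks^(1 / s)
print(eta_sq)
```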
Does export differ when the banking sector is in crisis compared to when it is not? What about reserves?
- Only export differs
- Only reserves differ
- Both export and reserves differ
- Neither of them differs
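A sketch of one way to follow up the multivariate test: summary.aov applied to the manova fit gives a separate univariate ANOVA for each dependent variable. The data are a simulated stand-in, and any correction for multiple comparisons is left out of this sketch.

```r
# Hypothetical stand-in for the exercise data; values are simulated.
set.seed(1)
data <- data.frame(
  crisis   = factor(rep(c("no", "yes"), each = 20)),
  export   = c(rnorm(20, 50, 5), rnorm(20, 45, 5)),
  reserves = c(rnorm(20, 30, 4), rnorm(20, 26, 4))
)

fit <- manova(cbind(export, reserves) ~ crisis, data = data)

# One univariate ANOVA table per dependent variable.
uni <- summary.aov(fit)
print(uni)
```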
Carl Sutton says
This is a brand new area for me. I have no foundation for what the questions are referring to and have never seen most of the functions used.
What is the purpose of the $out in boxplot(data$export)$out? I am assuming something is being exported somewhere, but what, where, and why?
Any good reference materials you can refer me to would be appreciated.
Miodrag Sljukic says
Thank you for your comment Carl.
The key idea of this set of exercises is to show how you can investigate the influence of a categorical independent variable on two continuous dependent variables. This technique is called multivariate analysis of variance (MANOVA). Basically, we check whether the continuous variables, taken together, differ between subgroups. In these exercises we put the analysis in the context of banking crises, asking whether national export and the level of bank reserves differ when a crisis of the banking system happens. In order to conduct this kind of analysis, some assumptions must be met, i.e. the data must satisfy certain conditions. We explore these conditions in exercises 1 through 7. Exercises 8 through 10 ask you to find the result and do post-hoc analysis. My intention with this set of exercises was to cover the entire process of MANOVA step by step, so that the solutions can be used in practical work. If you are interested in learning more about MANOVA, I can suggest the book “Applied Multivariate Statistics for the Social Sciences” by James P. Stevens, although I’m sure there are other great books and online resources as well.
There are many ways to solve these exercises. The solutions given here are just one of them. For example, in order to assess univariate normality of the data, you can use functions such as shapiro.test, which are included in the stats package. But in order to test multivariate normality, another package has to be used. I found that the MVN package is good (although there might be others which are even better) and, since it also has a function for testing univariate normality which gives more information than those in the stats package, I used it to test univariate normality too. This doesn’t mean that using some other function is wrong; in most cases it is just a matter of taste. By doing it this way I wanted to show that there are different options and to encourage the reader to exploit R’s compelling richness of possibilities.
Besides drawing a box plot, R’s boxplot function returns some basic statistics about the data it plots. You can see them if you run summary(boxplot(your_data)). It lists the components of the returned object, among them out, which contains the outlier values for the data plotted, which is what exercise 2 asked for. So the statement boxplot(data$export)$out actually does two things: it plots the chart and displays the outliers. You can find more information about the boxplot function in R help.
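To make this concrete, here is a small sketch on a made-up vector (the values are hypothetical and chosen so that one of them is clearly an outlier). The argument plot = FALSE only suppresses the drawing so that the returned object can be inspected on its own.

```r
# Made-up values; 40 lies far above the rest.
x <- c(10, 11, 12, 13, 14, 15, 40)

# boxplot() returns a list; plot = FALSE skips the drawing.
b <- boxplot(x, plot = FALSE)

# Components of the returned list.
print(names(b))

# Values beyond the whiskers, i.e. the outliers.
print(b$out)  # 40
```

The $ operator simply extracts the out component from the list that boxplot returns; nothing is exported anywhere.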
Thanks a lot Miodrag for investing time, energy and intentions to make these exercises. I really appreciate that.
For Question 3, do you think the following would be right?
#check univariate/multivariate normality
Miodrag Sljukic says
Yes, your solution is perfectly good for testing univariate normality, but for multivariate normality you have to use different functions. I gave an example of three tests contained in mvnormtest library.