The first thing you should do when you start working with new data is to explore it to learn what it contains. The easiest way to do this is by visualization: distributions, point plots, and so on. These plots are very helpful, but creating them for each variable or pair of variables can be time-consuming. That’s where GGally comes in handy. It extends ggplot2, adding a few very useful functions for drawing multiple plots at once.
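As a rough illustration of drawing several plots in one call, here is a minimal sketch. It assumes both packages are installed, and it uses the built-in mtcars data with three arbitrarily chosen columns (the names cars_subset and pair_plot are made up for this example), so the exercises below stay untouched.

```r
# Sketch only: assumes the ggplot2 and GGally packages are installed.
library(ggplot2)
library(GGally)

# Keep three numeric columns of the built-in mtcars data (an illustrative choice).
cars_subset <- mtcars[, c("mpg", "hp", "wt")]

# One call builds a 3 x 3 matrix of plots, one panel per pair of columns.
pair_plot <- ggpairs(cars_subset)

# Printing the object renders the whole matrix.
print(pair_plot)
```

Producing the same panels one at a time would take a separate ggplot call for each pair of columns, which is the repetitive work described above.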
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Exercise 1
Load the ggplot2 and GGally packages. Use the ggpairs function to explore the iris dataset.
Exercise 2
Customize the plot by setting a different color for each species of iris and adjusting alpha to make the plot more readable.
Exercise 3
Change the plot to apply the colors and alpha from exercise 2 only to the lower triangle of the plot.
Exercise 4
Put the variable names on the diagonal of the plot.
Exercise 5
Create a custom plotting function that uses the geom_quasirandom function from the ggbeeswarm package, and use it for pairs with a categorical X and a continuous Y in the upper triangle of the plot.
Exercise 6
Use the ggscatmat function on iris. Color it by species. How does the result differ from ggpairs?
Exercise 7
Draw a parallel coordinate plot of the continuous columns in iris. Color it by species.
Exercise 8
Fit a linear model of Sepal.Length against all other columns. Using GGally, display the coefficients of the fit.
Exercise 9
Modify the plot from exercise 8 by adding vertical endings to the error bars and making the size of the points depend on the p-value.
Exercise 10
Plot all available model diagnostics with ggnostic.