The first thing to do when you start working with new data is to explore it and learn what it contains. The easiest way to do this is through visualization: distributions, point plots, and so on. These plots are very helpful, but drawing them for every variable or pair of variables can be time-consuming. That’s where GGally comes in handy. It extends ggplot2 by adding a few very useful functions for drawing multiple plots at once.
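To give a feel for what "multiple plots at once" means, here is a minimal sketch. It deliberately uses the built-in mtcars data rather than the dataset from the exercises, and the three columns picked are an arbitrary, illustrative assumption. It assumes both ggplot2 and GGally are already installed.

```r
# Minimal sketch: mtcars and the chosen columns are illustrative assumptions,
# not part of the exercises.
library(ggplot2)
library(GGally)

# Hypothetical selection of three continuous columns
cols <- c("mpg", "wt", "hp")

# A single call builds a matrix of plots, one panel per pair of columns
p <- ggpairs(mtcars, columns = cols)

# Printing the object draws the whole plot matrix
print(p)
```

With the default settings, this should produce a grid with one panel for every pair of the selected columns, which is the kind of overview that would otherwise take a separate ggplot2 call per pair.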
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
GGally packages. Use the ggpairs function to explore the
Customize the plot by setting a different color for each species of iris and adjusting alpha to make the plot more readable.
Change the plot to apply the colors and alpha from exercise 2 only to the lower triangle of the plot.
Put variable names on the diagonal of the plot.
Create a custom plotting function that uses the geom_quasirandom function from the ggbeeswarm package, and use it for pairs with a categorical X and a continuous Y in the upper triangle of the plot.
ggscatmat function in iris. Color it by species. What are the differences with
Draw a parallel coordinate plot of the continuous columns in iris. Color it by species.
Fit a linear model of Sepal.Length against all other columns. Using GGally, display the coefficients of the fit.
Modify the plot from exercise 8 by adding vertical endings to the error bars and making the size of the points depend on the p-value.
Plot all available model diagnostics with