Descriptive Analytics is the examination of data or content, usually manually performed, to answer the question “What happened?”.
To solve this set of exercises you should have completed part 0, part 1, part 2, part 3, and part 4 of this series, and you should also run this script, which contains some more data cleaning. In case you haven’t, run this script on your machine; it contains the lines of code we used to modify our data set.
This is the sixth set of exercises in a series that aims to provide a descriptive analytics solution to the ‘2008’ data set from here. This data set, which contains the arrival and departure information for all domestic flights in the US in 2008, has become the “iris” data set of Big Data.
The goal of descriptive analytics is to inform the user about what is going on in the data set. A fast and effective way to do that is data visualisation. Data visualisation is also a form of art: it has to be simple, comprehensible and full of information. In this set of exercises we will explore different ways of visualising continuous variables using the famous ggplot2 package.
Before proceeding, it might be helpful to look over the help pages for ggplot, geom_histogram, scale_fill_gradient, geom_point, geom_line, geom_boxplot, coord_flip, and geom_violin.
For this set of exercises you will need to install and load the packages ggplot2 and dplyr.
install.packages('ggplot2')
library(ggplot2)
install.packages('dplyr')
library(dplyr)
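Before starting, it may help to confirm that both packages load and work together. The following is only a minimal sketch: it uses R's built-in mtcars data rather than the flights data, and the names avg_mpg and p are hypothetical and not part of the exercises.

```r
library(ggplot2)
library(dplyr)

# mtcars ships with R, so this check does not depend on the flights data.
# avg_mpg and p are illustrative names only.
avg_mpg <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

# A ggplot object is built in layers: data, aesthetic mapping, then a geom.
p <- ggplot(avg_mpg, aes(x = factor(cyl), y = mean_mpg)) +
  geom_point()

print(p)
```

If a plot with one point per cylinder count appears, both packages are installed and loaded correctly.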
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Exercise 1
Develop a histogram which illustrates the TaxiIn variable.
Exercise 2
Let’s make things a bit fancier. Plot the histogram of the TaxiIn variable over the range 0 to 50 with breaks every 2 units, fill the highest values with red and the lowest with green, and finally add a title.
Exercise 3
Make a scatter plot of ArrDelay with respect to Full_Date while illustrating each carrier with a different colour.
Exercise 4
Create a new variable called mean_delay which is the mean of ArrDelay for each carrier every day.
Now make a scatter plot of Mean_ArrDelay with respect to Full_Date while illustrating each carrier with a different colour.
Exercise 5
Make the previous plot a bit more appealing by changing the alpha parameter of the data points, changing the theme, and adding names to the x-axis and y-axis.
Exercise 6
With the same variables, plot a line chart.
Hint: set the group parameter in order to proceed.
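As a hedged illustration of the hint, the sketch below uses a small made-up data frame (toy, with hypothetical columns day, series and value) rather than the flights data. It shows what the group aesthetic does in geom_line: it tells ggplot2 which rows belong to the same line, which matters when the x variable is discrete.

```r
library(ggplot2)

# Made-up data: two series measured on three days (not the flights data).
toy <- data.frame(
  day    = rep(c("Mon", "Tue", "Wed"), times = 2),
  series = rep(c("A", "B"), each = 3),
  value  = c(1, 3, 2, 2, 4, 5)
)

# day is a discrete x variable, so by default every day/series combination
# would form its own one-point group. Setting group = series tells geom_line
# to connect the points of each series into a single line.
p_lines <- ggplot(toy, aes(x = day, y = value, group = series, colour = series)) +
  geom_line()

print(p_lines)
```

Dropping group = series from this toy example is a quick way to see why the hint is needed.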
Exercise 7
Create a box plot which illustrates the daily mean of ArrDelay for every day of the week.
Exercise 8
Modify the box plot by setting a colour and a size for the outliers, and illustrate every day of the week with a different colour. If your screen is not big enough, you may also remove the legend.
Exercise 9
While a box plot is a great way to show a distribution, a violin plot is an even better one. Create a violin plot with the same data.
Exercise 10
Modify the violin plot: use a different colour for every day of the week, remove the trim and remove the legends.
Vasileios,
I like your “Exercised” series, and have tried to go through each one. I think they have a lot to bring to the beginner-intermediate R user, so kudos to you!!
One detail I found lacking, though, is the incomplete treatment of the x-axis in answers 3-6 in the set above. When you leave the axis unreadable, this should never be seen as a “correct” answer. I know some may disagree, but this is not a small point: if you cannot read the graph, then it doesn’t matter how pretty it is or how efficiently it was generated. It is not doing its job of conveying information.
You could leave the first graph, and then ask: ok, what’s wrong with the previous answer and how do we fix it?
Anyway, good job and keep it up!
best,
Mike
Hello Mike,
I can’t thank you enough for the feedback! I will keep that in mind for the next exercises to come.
Cheers !
I think the link to your prerequisite script is missing a “.R” at the end
Original link: https://github.com/VasTsak/r-exercises/blob/master/data_visualisation_prerequisite
Should be: https://github.com/VasTsak/r-exercises/blob/master/data_visualisation_prerequisite.R
Hello Brent,
Thanks a lot for noticing. I fixed it.
Cheers!