The aggregate() function splits data frames and time series into subsets, then computes summary statistics for each subset. The structure of the aggregate() function is aggregate(x, by, FUN).
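The call pattern aggregate(x, by, FUN) can be illustrated with a minimal sketch. It uses a small made-up data frame rather than “airquality”, so the exercise answers are left to you. It assumes that by is given as a list, that naming the list element sets the header of the grouping column, and that extra arguments such as na.rm = TRUE are passed on to FUN.

```r
# A small, made-up data frame with one missing value
df <- data.frame(group = c("a", "a", "b", "b"),
                 value = c(1, NA, 3, 5))

# by must be a list; the name "group" becomes the first column header.
# na.rm = TRUE is forwarded to mean() through aggregate()'s ... argument.
res <- aggregate(df$value, by = list(group = df$group), FUN = mean, na.rm = TRUE)
print(res)
```

When x is a plain vector, the summarised column is labelled “x”. If the list element is left unnamed, the grouping column is labelled “Group.1”.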
Answers to the exercises are available here.
Exercise 1
Aggregate the “airquality” data by “airquality$Month”, returning the means of each of the numeric variables. Also, remove “NA” values.
Exercise 2
Aggregate the “airquality” data by the variable “Day”, remove “NA” values, and return the means of each of the numeric variables.
Exercise 3
Aggregate “airquality$Solar.R” by “Month”, returning the means of “Solar.R”. The header of column 1 should be “Month”. Remove “not available” (NA) values.
Exercise 4
Apply the standard deviation function to the data aggregation from Exercise 3.
Exercise 5
The structure of the aggregate() formula interface is aggregate(formula, data, FUN).
The structure of the formula is y ~ x. The “y” variables are numeric data. The “x” variables, usually factors, are grouping variables that subset the “y” variables.
aggregate.formula allows for one-to-one, one-to-many, many-to-one, and many-to-many aggregation.
Use aggregate.formula for a one-to-one aggregation of “airquality”: the mean of “Ozone” by the grouping variable “Day”.
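The four aggregation shapes listed for aggregate.formula can be sketched on the built-in “mtcars” dataset, which is not part of the exercises. This is a minimal illustration, assuming cbind() on the left-hand side to summarise several numeric variables and “+” on the right-hand side to combine grouping variables.

```r
# One-to-one: one numeric variable, one grouping variable
one_to_one <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Many-to-one: several numeric variables, bound with cbind(), one grouping variable
many_to_one <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)

# One-to-many: one numeric variable, several grouping variables joined with +
one_to_many <- aggregate(mpg ~ cyl + am, data = mtcars, FUN = mean)

# Many-to-many: several numeric variables and several grouping variables
many_to_many <- aggregate(cbind(mpg, hp) ~ cyl + am, data = mtcars, FUN = mean)

print(one_to_one)
print(many_to_many)
```

Unlike the default method, the formula method drops rows containing NA before calling FUN, because its na.action argument defaults to na.omit.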
Exercise 6
Use aggregate.formula for a many-to-one aggregation of “airquality”: the means of “Solar.R” and “Ozone” by the grouping variable “Month”.
Exercise 7
Dot notation can replace the “y” or “x” variables in aggregate.formula: “.” stands for all variables in the data that are not already used on the other side of the formula. Use “.” dot notation to find the means of the numeric variables in “airquality”, with the grouping variable “Month”.
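Dot notation on either side of the formula can be sketched on small column subsets of “mtcars”, so the “airquality” answers are left to you. The subset variable names below are illustrative choices, not part of the exercises.

```r
# Keep only a few columns so each result stays small
num_cols <- mtcars[, c("mpg", "hp", "cyl")]
grp_cols <- mtcars[, c("mpg", "cyl", "am")]

# "." on the left: every column not on the right (mpg, hp) is summarised by cyl
by_cyl <- aggregate(. ~ cyl, data = num_cols, FUN = mean)

# "." on the right: every column except mpg (cyl, am) becomes a grouping variable
mpg_by_rest <- aggregate(mpg ~ ., data = grp_cols, FUN = mean)

print(by_cyl)
print(mpg_by_rest)
```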
Exercise 8
Use dot notation to find the means of the “airquality” variables, with the grouping variables “Day” and “Month”. Display only the first 6 resulting observations.
Exercise 9
Use dot notation to find the means of “Temp”, with the remaining “airquality” variables as grouping variables.
Exercise 10
aggregate.ts is the time series method for aggregate().
Using R's built-in time series dataset “AirPassengers”, compute the average annual standard deviation.
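A minimal sketch of aggregate.ts, using the built-in monthly “co2” series instead of “AirPassengers” so the exercise is left to you. It assumes nfrequency = 1 to collapse the 12 observations per year into one value per year, with FUN applied to each yearly block.

```r
# co2: monthly Mauna Loa CO2 concentrations, 1959-1997 (frequency 12)
# aggregate() dispatches to aggregate.ts because co2 is a "ts" object
annual <- aggregate(co2, nfrequency = 1, FUN = mean)

print(frequency(annual))  # 1
print(length(annual))     # 39
```

The result is itself a time series with one value per year, so further functions can be applied to it directly.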
Image by Averater (Own work) [CC BY-SA 3.0], via Wikimedia Commons.
Very helpful! Thanks a lot.
Thank you for your compliment!
In Exercise 10, why is aggregate.ts given as a hint if the explanation is not to use it?
Hi Huw,
To answer Exercise 10, you can use both aggregate and aggregate.ts, it is up to you to decide!
Have fun!
Thank you!
Looking at it, is there a functional difference between:
aggregate(airquality$Ozone, list(airquality$Day), mean, na.rm=TRUE)
and
aggregate(Ozone ~ Day, airquality, mean)
It seems that rather than including the ‘list(airquality)’ section, it would be easier to do everything as X~Y and use dot notation, but I'm not sure if I'm missing something in the background.
Yes. “aggregate(airquality$Ozone, list(airquality$Day), mean, na.rm=TRUE)” was not a suggested answer, because the Exercise referred to the formula interface of aggregate().
Exercise 10 asks for the average annual standard deviation. Should the solution not be: mean(aggregate(AirPassengers,nfrequency=1,sd)) ?
Thanks for inquiring, Jan!
I believe I was referring to aggregate() finding the means of a set of observations.
However, I may modify the question.
Best Regards,
John Akwei, Data Scientist
Where can we download the AirPassengers dataset?