Uncertainty is the biggest enemy of a profitable business. That is especially true of small businesses, which often don’t have enough resources to survive an unexpected drop in revenue or to capitalize on a sudden increase in demand. In this context, it is especially important to predict changes in the market accurately, so that you can make better decisions and stay competitive.
This series of posts will teach you how to use data to make sound predictions. In this first set of exercises, you’ll learn the essential concepts we’ll use throughout the series: how to use the fundamental R tools for time series analysis, how to verify whether a time series is stationary, and how to make predictions in that context.
Answers to the exercises are available here.
To do these exercises, you need to have the R packages forecast and tseries installed.
Use the data function to load the treering dataset from the R library. This dataset is loaded as an R time series object, which is a vector whose values are ordered chronologically. Look at the structure of this ts object and use another function to find the number of observations in the dataset.
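As a rough sketch of this first step, the snippet below loads the dataset and inspects it. Using length to count the observations and the name n_obs are choices made for this illustration; other functions would do the job as well.

```r
# treering ships with base R's datasets package.
data(treering)

# str() shows the class, the time span and the first few values of the ts object.
str(treering)

# length() is one way to count the observations (n_obs is a name picked for this sketch).
n_obs <- length(treering)
print(n_obs)

# start(), end() and frequency() expose the time attributes of a ts object.
print(start(treering))
print(end(treering))
print(frequency(treering))
```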
Use the function window(ts, start, end) to select the observations in the treering dataset from the year 1500 to the year 2000.
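Here is a minimal sketch of how window behaves with these bounds. The variable name recent is hypothetical. Note that the treering series shipped with R stops at 1979, so R warns that the end value is not changed and returns the observations up to the last available year; the warning is silenced here only to keep the output tidy.

```r
data(treering)

# Select the observations between the two years named in the exercise.
# The series ends in 1979, so the upper bound is effectively clipped there.
recent <- suppressWarnings(window(treering, start = 1500, end = 2000))

# Check the time span that was actually kept.
print(start(recent))
print(end(recent))
print(length(recent))
```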
Do a basic plot of the treering dataset and use the abline function to add a horizontal red line representing the mean of the dataset.
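One possible way to draw this plot is sketched below. The title, the axis label and the name series_mean are assumptions made for the illustration.

```r
data(treering)

# Mean of the whole series (series_mean is a name picked for this sketch).
series_mean <- mean(treering)

# plot() on a ts object draws the values against time.
plot(treering, main = "treering", ylab = "Ring width")

# h = ... draws a horizontal line at that height.
abline(h = series_mean, col = "red")
```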
Looking at the previous plot, we get the sense that these values are randomly distributed around the red line, which represents the mean of the dataset. The magnitude of the random fluctuations also seems to stay stable over time. Such a time series is called “stationary”, and it is a property we prefer to observe in a time series when we want to make predictions.
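To put rough numbers on that visual impression, one informal check is to cut the series into consecutive blocks and compare the mean and standard deviation of each block. This is only an eyeball diagnostic, not a formal test, and the choice of 8 blocks is arbitrary.

```r
data(treering)

# Assign each observation to one of 8 consecutive blocks of roughly equal length.
block <- cut(seq_along(treering), breaks = 8, labels = FALSE)

# Mean and standard deviation within each block.
values <- as.numeric(treering)
block_mean <- tapply(values, block, mean)
block_sd <- tapply(values, block, sd)

# For a stationary series, both columns should stay roughly constant across blocks.
print(round(cbind(mean = block_mean, sd = block_sd), 3))
```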
To make sure that the time series is stationary, we’ll draw the autocorrelation plot and run the Box-Ljung test, Kwiatkowski-Phillips-Schmidt-Shin test and the Augmented Dickey–Fuller test on the dataset.
Load the forecast package and use the Acf function to draw the autocorrelation plot of the time series.
Use the Box.test function to apply the Box-Ljung test to the dataset. Set the parameter lag to the maximum value of Lag in the previous plot.
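The two steps above can be sketched as follows. To keep the example free of extra packages, it uses acf from base R's stats package as a stand-in for Acf from forecast, so the largest lag on your own Acf plot may differ. The names ac, max_lag and lb are hypothetical.

```r
data(treering)

# Base R autocorrelation plot; the returned object stores the lags that were drawn.
ac <- acf(treering, plot = TRUE)

# Largest lag shown on the plot.
max_lag <- max(ac$lag)

# Ljung-Box test; the null hypothesis is that there is no autocorrelation up to max_lag.
lb <- Box.test(treering, lag = max_lag, type = "Ljung-Box")
print(lb)
```

A small p-value here points to autocorrelation in the series. On its own, that says nothing about whether the series is stationary.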
Load the tseries package and apply the Kwiatkowski-Phillips-Schmidt-Shin test by using the kpss.test function on the data.
Use the adf.test function to apply the Augmented Dickey–Fuller t-statistic test to the dataset. Set the argument alternative to “stationary”.
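A minimal sketch of both tests follows, assuming the tseries package is installed; the requireNamespace guard and the names kpss_res and adf_res are additions for this illustration. Keep in mind that the two tests have opposite null hypotheses: for the KPSS test the null is stationarity, while for the Augmented Dickey–Fuller test the null is a unit root.

```r
data(treering)

kpss_res <- NULL
adf_res <- NULL

if (requireNamespace("tseries", quietly = TRUE)) {
  # KPSS test: a large p-value is consistent with level stationarity.
  kpss_res <- tseries::kpss.test(treering)

  # ADF test: a small p-value favours the "stationary" alternative.
  adf_res <- tseries::adf.test(treering, alternative = "stationary")

  print(kpss_res)
  print(adf_res)
} else {
  message("The tseries package is not installed; skipping both tests.")
}
```

Both functions may warn that the reported p-value is smaller or greater than the tabulated range; that only means the true p-value lies beyond the printed bound.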
Use the Holt-Winters filtering method to apply exponential smoothing to the time series. Use the function HoltWinters with the parameter gamma set to FALSE to select exponential smoothing, and start the function at the first observation of the dataset. Store the result in a variable.
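As a hedged sketch of this step: the call below also sets beta to FALSE, which is an assumption on my part. It switches off the trend term so that HoltWinters fits simple exponential smoothing, whereas gamma = FALSE alone only removes the seasonal component. The variable name hw_fit is hypothetical.

```r
data(treering)

# gamma = FALSE: no seasonal component.
# beta = FALSE: no trend component (an assumption added for this sketch).
# l.start: initial level, set here to the first observation of the series.
hw_fit <- HoltWinters(treering, beta = FALSE, gamma = FALSE,
                      l.start = treering[1])

# alpha is the smoothing parameter estimated by minimizing the squared one-step errors.
print(hw_fit$alpha)
print(hw_fit$SSE)
```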
Use the forecast.HoltWinters function to make predictions for the next 5 years, store the results in a variable named prediction and print it to the screen.
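If you want to see the idea without the forecast package, base R's predict method for HoltWinters objects produces comparable output. The sketch below is a stand-in for the exercise's function, not the same call, and it refits a simple exponential smoothing model (beta = FALSE is an assumption, and hw_fit is a hypothetical name) so that it runs on its own.

```r
data(treering)

# Refit a simple exponential smoothing model so this block is self-contained.
hw_fit <- HoltWinters(treering, beta = FALSE, gamma = FALSE)

# Forecast 5 steps ahead (the series is yearly, so 5 years) with prediction bounds.
prediction <- predict(hw_fit, n.ahead = 5, prediction.interval = TRUE)
print(prediction)

# plot() on the fitted object can overlay the predicted values on the data.
plot(hw_fit, prediction)
```

With simple exponential smoothing the point forecast is flat, so the five fitted values are identical and only the prediction bounds widen with the horizon.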
Use the plot.forecast function from the forecast package to plot your predictions.