Uncertainty is the biggest enemy of a profitable business. That is especially true of small businesses, which don’t have enough resources to survive an unexpected drop in revenue or to capitalize on a sudden increase in demand. In this context, it is especially important to predict changes in the market accurately in order to make better decisions and stay competitive.
This series of posts will teach you how to use data to make sound predictions. In the previous post, we covered the fundamental models of time series: what white noise is, what a random walk is, what seasonality is, and which transformations make a time series stationary. Today, we’ll see some basic methods for predicting the values of a time series and how to select which method to use.
To do these exercises, you need to have the forecast and smooth packages installed. Also, when you generate a random number or process, set the seed to 42 to get results identical to those in the solution set.
Answers to the exercises are available here.
Create an artificial time series, a random walk, which we’ll use to test the different forecasting methods covered today. This random walk should have a white noise component with a mean of 0.5 and a standard deviation of 1, and should have 365 observations with a frequency and deltat of 1.
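One possible way to build such a series is to take the cumulative sum of the white noise, as in the minimal sketch below. The names noise and rw are assumptions made for this illustration, and the solution set may construct the series differently, so its numbers may not match.

```r
# Assumption: the random walk is the cumulative sum of the white noise.
set.seed(42)                                # seed requested by the exercise set

noise <- rnorm(365, mean = 0.5, sd = 1)     # white noise: mean 0.5, sd 1
rw <- ts(cumsum(noise), start = 1, frequency = 1)  # 365 observations, frequency 1

plot(rw, main = "Random walk", xlab = "t", ylab = "value")
```

Because the noise has a positive mean, the resulting series drifts upward over time, which matters for the methods compared later.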
The most basic way to predict the future value of a time series is to suppose that the next value will be the same as the last. Imagine you are the manager of a grocery store, that you have to order bread for next week, and that you don’t have the time to do some fancy R programming. Using last week’s bread sales as a reference for your order could be quite enough. This method is called the naive method, for good reason! Use this method to predict the value of the observations from t=331 to t=365, calculate the forecasting error, and save the results in a naive.forecast data frame.
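A hand-rolled version of the naive method could look like the sketch below. It assumes the series was built as a cumulative sum of the white noise, and the column names t, observed, forecast and error are choices made for this illustration only.

```r
# Assumption: rw is a random walk built from the white noise described above.
set.seed(42)
rw <- ts(cumsum(rnorm(365, mean = 0.5, sd = 1)))

last.value <- as.numeric(rw[330])           # last value known before the forecast window

naive.forecast <- data.frame(
  t        = 331:365,
  observed = as.numeric(rw[331:365]),
  forecast = last.value                     # same prediction for every future step
)

# Forecasting error: what actually happened minus what was predicted
naive.forecast$error <- naive.forecast$observed - naive.forecast$forecast

head(naive.forecast)
```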
The forecast package provides the naive() function to apply this method. Use this function to make predictions over the time window t=331 to t=365 with confidence intervals of 90% and 95%, then plot the forecast.
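As a rough sketch, the call could look like this. The train and fc.naive names are assumptions, and the series is again assumed to be a cumulative sum of the white noise.

```r
library(forecast)

# Assumption: rw is a random walk built from the white noise described above.
set.seed(42)
rw <- ts(cumsum(rnorm(365, mean = 0.5, sd = 1)))

train <- window(rw, end = 330)              # observations known at forecast time

# h = number of steps ahead, level = interval levels in percent
fc.naive <- naive(train, h = 35, level = c(90, 95))

plot(fc.naive)
lines(window(rw, start = 331), col = "red") # observed values, for comparison
```

The shaded bands widen with the horizon, which reflects how quickly uncertainty accumulates for a random walk.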
Looking back at the grocery store manager example, using the naive method can make you order too much or too little bread if last week’s sales were unusual. It gets worse for food whose price can change drastically depending on the period. For example, lobster gets cheap during the fishing season, so this method would underestimate sales at the beginning of the fishing season and overestimate sales during the week following it. The naive method gives good results only when the time series is stable, that is, only when forecasting is not really useful!
Calculate three averages: the average of all the observations, of observations 115 to 330, and of observations 329 to 330. Use those three values as predictions of the values from t=331 to t=365 and calculate the estimation error. Save the results in three different data frames.
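The three averages can be sketched with a small helper, shown below. The helper mean.forecast() and the data frame names are hypothetical, and "all the observations" is interpreted here as observations 1 to 330, so that no value from the forecast window leaks into the average; that interpretation is an assumption.

```r
# Assumption: rw is a random walk built from the white noise described above.
set.seed(42)
rw <- as.numeric(cumsum(rnorm(365, mean = 0.5, sd = 1)))

observed <- rw[331:365]

# Hypothetical helper: forecast t = 331..365 with the mean of rw[from:to]
mean.forecast <- function(from, to) {
  value <- mean(rw[from:to])
  data.frame(
    t        = 331:365,
    observed = observed,
    forecast = value,
    error    = observed - value
  )
}

mean.all   <- mean.forecast(1, 330)         # every observation before the window
mean.long  <- mean.forecast(115, 330)       # observations 115 to 330
mean.short <- mean.forecast(329, 330)       # last two observations only
```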
By computing the mean of the whole data set and using this value as the forecast, we get a bigger estimation error than with the naive method, but the fewer observations we use to compute the mean, the better the forecast becomes. Using the mean of the time series as the forecast value is called the mean method. It is useful for white noise processes, but it is not accurate for non-stationary time series, since their mean changes over time. We can better estimate that changing mean by using only a small number of recent observations in the computation. This method is called the moving average and can be used to smooth the time series and easily make a prediction.
Use the mean method with the meanf() function to predict the last 35 observations of the time series with confidence intervals of 90% and 95%.
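A minimal sketch of the meanf() call follows, with the same assumptions as before about how the series is built. The train and fc.mean names are illustrative.

```r
library(forecast)

# Assumption: rw is a random walk built from the white noise described above.
set.seed(42)
rw <- ts(cumsum(rnorm(365, mean = 0.5, sd = 1)))

train <- window(rw, end = 330)

# Every forecast equals the mean of the training observations
fc.mean <- meanf(train, h = 35, level = c(90, 95))

plot(fc.mean)
lines(window(rw, start = 331), col = "red") # observed values, for comparison
```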
Use the sma() function from the smooth library to predict the last 35 observations of the time series with the moving average method. This function tries different sizes of n-day samples to compute the mean, selects the model that best fits the data, and makes the prediction.
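To make the idea of trying several window sizes more concrete, here is a base R sketch that does not call sma(). It picks the window size with the lowest in-sample one-step-ahead squared error and then uses a flat forecast. Both the selection criterion and the flat forecast are simplifying assumptions for illustration and are not meant to reproduce how the smooth package selects or forecasts its model.

```r
# Assumption: rw is a random walk built from the white noise described above.
set.seed(42)
rw <- as.numeric(cumsum(rnorm(365, mean = 0.5, sd = 1)))
train <- rw[1:330]

# In-sample one-step-ahead mean squared error for a window of n observations:
# each train[i] is predicted by the mean of the n values just before it.
one.step.mse <- function(n) {
  idx  <- (n + 1):length(train)
  pred <- sapply(idx, function(i) mean(train[(i - n):(i - 1)]))
  mean((train[idx] - pred)^2)
}

orders     <- 1:30                          # candidate window sizes (arbitrary range)
mse        <- sapply(orders, one.step.mse)
best.order <- orders[which.min(mse)]

# Flat forecast: the mean of the last best.order training observations
sma.sketch <- data.frame(
  t        = 331:365,
  observed = rw[331:365],
  forecast = mean(tail(train, best.order))
)
sma.sketch$error <- sma.sketch$observed - sma.sketch$forecast

best.order
```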
The moving average can also be used to remove the randomness from the time series curve to better see the trend in the data. Plot the time series and add two red lines: one made by a moving average with an n-day sample of 150 and another with an n-day sample of 5. You can use the ma() function to compute the moving average.
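A possible way to draw this plot is sketched below, using ma() from the forecast package. The object names and the line styles are illustrative choices.

```r
library(forecast)

# Assumption: rw is a random walk built from the white noise described above.
set.seed(42)
rw <- ts(cumsum(rnorm(365, mean = 0.5, sd = 1)))

ma.150 <- ma(rw, order = 150)               # heavy smoothing: long-run trend
ma.5   <- ma(rw, order = 5)                 # light smoothing: follows the series closely

plot(rw, main = "Random walk and moving averages", xlab = "t", ylab = "value")
lines(ma.150, col = "red", lwd = 2)
lines(ma.5, col = "red", lty = 2)
```

Since ma() computes a centered average, both smoothed lines are undefined near the ends of the series, and the 150-observation line is noticeably shorter than the 5-observation one.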
With the rwf() function, make predictions for the observations between t=331 and t=365.
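A sketch of the rwf() call is shown below. Setting drift = TRUE is an assumption made here because the simulated white noise has a non-zero mean; with the default drift = FALSE, rwf() gives the same point forecasts as the naive method.

```r
library(forecast)

# Assumption: rw is a random walk built from the white noise described above.
set.seed(42)
rw <- ts(cumsum(rnorm(365, mean = 0.5, sd = 1)))

train <- window(rw, end = 330)

# drift = TRUE (an assumption) extends the average historical change into the future
fc.rwf <- rwf(train, h = 35, drift = TRUE, level = c(90, 95))

plot(fc.rwf)
lines(window(rw, start = 331), col = "red") # observed values, for comparison
```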
Calculate the RMSE of the predictions you made with each model and make a bar plot of the results. Which would be the best model to use to forecast this time series?
In practice, when deciding which model to use to make a prediction, you should follow the process used in this exercise set: separate your historical data into a training set and a test set, fit the model on the training set, make predictions on the test set, calculate the prediction error, choose a metric like RMSE or MASE to evaluate the models, and select the model that predicts your data most effectively.
Note that in practice the distribution of your time series will change over time, so you must repeat this process periodically to make sure that you make predictions with the model best suited to the new data.
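The comparison can be sketched in base R as follows. The point forecasts are recomputed by hand so that the block is self-contained; the rmse() helper, the model labels, and the choice of models shown are assumptions for illustration, and the drift forecast uses the average historical change as its slope.

```r
# Assumption: rw is a random walk built from the white noise described above.
set.seed(42)
rw <- as.numeric(cumsum(rnorm(365, mean = 0.5, sd = 1)))
train    <- rw[1:330]
observed <- rw[331:365]

# Root mean squared error between observed values and a forecast
rmse <- function(observed, forecast) sqrt(mean((observed - forecast)^2))

slope <- (train[330] - train[1]) / 329      # average historical change per step

forecasts <- list(
  naive        = rep(train[330], 35),
  mean.all     = rep(mean(train), 35),
  mean.115.330 = rep(mean(train[115:330]), 35),
  mean.329.330 = rep(mean(train[329:330]), 35),
  drift        = train[330] + (1:35) * slope
)

scores <- sapply(forecasts, function(f) rmse(observed, f))

barplot(sort(scores), las = 2, ylab = "RMSE", main = "Forecast error by model")
```

The shortest bar identifies the model with the lowest error on this particular test window; a different window or series could rank the models differently.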
Follow the steps of the last exercise, but this time use only the observations
- between t=331 and t=335
- between t=361 and t=365
Are the predictions more precise in the short term or in the long term?
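One way to compare the two windows is to compute the RMSE on each slice of the forecast horizon, as in the sketch below. The window.rmse() helper and the two models shown are illustrative assumptions; the same approach applies to any of the other forecasts.

```r
# Assumption: rw is a random walk built from the white noise described above.
set.seed(42)
rw <- as.numeric(cumsum(rnorm(365, mean = 0.5, sd = 1)))
train    <- rw[1:330]
observed <- rw[331:365]

slope <- (train[330] - train[1]) / 329      # average historical change per step

forecasts <- list(
  naive = rep(train[330], 35),
  drift = train[330] + (1:35) * slope
)

# Hypothetical helper: RMSE restricted to observations from..to (t = 331..365)
window.rmse <- function(forecast, from, to) {
  idx <- (from:to) - 330                    # position inside the forecast window
  sqrt(mean((observed[idx] - forecast[idx])^2))
}

horizon.rmse <- rbind(
  short.term = sapply(forecasts, window.rmse, from = 331, to = 335),
  long.term  = sapply(forecasts, window.rmse, from = 361, to = 365)
)

horizon.rmse
```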