Uncertainty is the biggest enemy of a profitable business. That is especially true of small businesses, which don't have enough resources to survive an unexpected drop in revenue or to capitalize on a sudden increase in demand. In this context, it is especially important to be able to accurately predict changes in the market in order to make better decisions and stay competitive.
This series of posts will teach you how to use data to make sound predictions. In the first set of exercises, we saw how to make predictions when the mean and the variance of the time series are constant. Now, we'll see how to deal with time series whose mean changes over time.
The most basic type of non-stationary time series is the random walk. This random process is obtained by adding each random fluctuation of a stationary time series (white noise) to the sum of all the previous ones. As a consequence, such a time series is not stationary: its mean changes over time whenever the white noise has a non-zero mean, and its variance grows with the number of steps, even though the variance of each individual fluctuation stays the same. You can find more information about random walks here. Once you get the hang of how to use random walks, make sure to read part three of this series of exercises, where you will see how to make predictions with this kind of model.
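The paragraph above can be written as a short derivation. The notation is our own, not the post's. We assume the walk starts at X_0 = 0 and the white-noise steps ε_i are independent with mean μ and variance σ²:

$$
\begin{aligned}
X_t &= X_{t-1} + \varepsilon_t = \sum_{i=1}^{t} \varepsilon_i, \qquad \varepsilon_i \ \text{iid with}\ \mathbb{E}[\varepsilon_i] = \mu,\ \operatorname{Var}(\varepsilon_i) = \sigma^2,\\
\mathbb{E}[X_t] &= t\mu, \qquad \operatorname{Var}(X_t) = t\sigma^2, \qquad \operatorname{Var}(X_t - X_{t-1}) = \operatorname{Var}(\varepsilon_t) = \sigma^2.
\end{aligned}
$$

The mean drifts linearly whenever μ ≠ 0, and the spread of the walk grows with t. Only the differences X_t − X_{t−1} keep a constant mean and variance.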
To do these exercises, you need to have installed the packages
Answers to the exercises are available here.
Set the seed to 42 and generate 100 random points from a normal distribution with mean 0 and standard deviation 1 to simulate white noise. Then use the cumsum() function to cumulatively sum the points of the white noise and simulate a random walk. Finally, use the as.ts() function with the correct argument to create a time series named
Do a basic plot of
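Here is a minimal base-R sketch of exercise 1. The variable name the exercise asks for is not shown in this copy, so `rw` below is a placeholder of our own, as is `noise`:

```r
# Exercise 1 sketch: simulate white noise, then accumulate it into a random walk.
set.seed(42)                                   # reproducible draws
noise <- rnorm(100, mean = 0, sd = 1)          # 100 points of Gaussian white noise
rw <- as.ts(cumsum(noise))                     # running sum = random walk; `rw` is a placeholder name
plot(rw, main = "Random walk (mean 0, sd 1)", ylab = "value")
```

Because `cumsum()` builds each observation from all previous draws, differencing `rw` returns the white noise (minus its first point).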
Follow the steps of exercise 1, but this time generate 100 points from a normal distribution with a mean of 0.5 and a standard deviation of 1. Save the time series in a variable called random.walk2, then plot it.
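The same recipe with a shifted mean gives a random walk with drift. This is a sketch; `noise2` is our own helper name:

```r
# Exercise 3 sketch: white noise with mean 0.5, accumulated into random.walk2.
set.seed(42)
noise2 <- rnorm(100, mean = 0.5, sd = 1)       # white noise with a positive mean (drift)
random.walk2 <- as.ts(cumsum(noise2))          # random walk with drift
plot(random.walk2, main = "Random walk (mean 0.5, sd 1)", ylab = "value")
```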
If we look at the previous plot, we see that there is a general positive trend in the data. This results from the fact that the white noise component of the random walk has a mean greater than 0 and a relatively small standard deviation. As a consequence, most of those values are positive, and since they are added together, each observation of the random walk has a good chance of being greater than its predecessor.
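To put a number on "a good chance": if the steps are normal with mean 0.5 and sd 1, the probability that a given step is positive is Φ(0.5) ≈ 0.69. The sketch compares that theoretical value with the share of upward moves in a walk simulated with the exercise 3 settings. The names `steps`, `walk`, `p_theory` and `p_observed` are ours:

```r
set.seed(42)
steps <- rnorm(100, mean = 0.5, sd = 1)        # white-noise increments, as in exercise 3
walk <- as.ts(cumsum(steps))

p_theory <- pnorm(0, mean = 0.5, sd = 1, lower.tail = FALSE)  # P(step > 0) = Phi(0.5)
p_observed <- mean(diff(walk) > 0)             # share of observations above their predecessor
round(c(theory = p_theory, observed = p_observed), 3)
```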
To get a sense of how the mean and the standard deviation of the white noise affect the shape of the resulting time series, generate 100 points of a random walk with a white noise component of:
- mean = -0.5 and sd = 1
- mean = 2 and sd = 1
- mean = 10 and sd = 1
- mean = 0.5 and sd = 10
Again, set the seed to 42 and plot your results.
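One way to produce the four walks side by side is sketched below. We read "set the seed to 42" as resetting it before each series, so every setting reuses the same underlying draws. The names `settings` and `walks` are ours:

```r
# Four drift/noise settings, seed reset before each so the draws are comparable.
settings <- data.frame(mean = c(-0.5, 2, 10, 0.5), sd = c(1, 1, 1, 10))

walks <- lapply(seq_len(nrow(settings)), function(i) {
  set.seed(42)                                 # same underlying draws for every setting
  as.ts(cumsum(rnorm(100, mean = settings$mean[i], sd = settings$sd[i])))
})

old_par <- par(mfrow = c(2, 2))                # 2 x 2 grid of plots
for (i in seq_along(walks)) {
  plot(walks[[i]], ylab = "value",
       main = sprintf("mean = %g, sd = %g", settings$mean[i], settings$sd[i]))
}
par(old_par)                                   # restore the previous layout
```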
From the first plot, we can see that the sign of the mean of the white noise component determines the sign of the trend of the curve: if the mean is positive, the trend tends to be positive, and vice versa. From the other plots, we see how the relative magnitude of the mean and the standard deviation affects the shape of the curve. Since the standard deviation of the white noise determines the degree of randomness of the observations, the larger this value is compared to the mean, the harder it is to see a trend in the data. Conversely, the larger the mean is compared to the standard deviation, the more closely the curve of the time series follows the trend line.
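A back-of-the-envelope way to quantify this assumes independent steps and a walk that starts at 0. After t steps the expected drift is tμ, while the random spread has standard deviation σ√t, so their ratio is √t·μ/σ. The sketch evaluates that ratio at t = 100 for the four settings above. The helper name `trend_to_noise` is ours:

```r
# Ratio of expected drift (t * mu) to random spread (sigma * sqrt(t)) after t steps.
trend_to_noise <- function(mu, sigma, t) (t * mu) / (sigma * sqrt(t))

mu    <- c(-0.5, 2, 10, 0.5)
sigma <- c(1, 1, 1, 10)
ratio <- trend_to_noise(mu, sigma, t = 100)
ratio   # -5, 20, 100, 0.5
```

A large |ratio| (mean = 10) means the curve hugs its trend line. A ratio below 1 (mean = 0.5, sd = 10) means the noise dominates over 100 steps.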
In the last set of exercises, we saw how to make predictions on a stationary time series. Since a random walk doesn't have a constant mean, we cannot directly follow the same steps.
Create a new time series by subtracting the previous observation x_(i-1) from each observation x_i of the random.walk2 time series from exercise 3. Name this time series white.noise and plot it. As usual, set the seed to 42.
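In base R, `diff()` performs exactly the subtraction x_i - x_(i-1). This sketch rebuilds random.walk2 first so that it runs on its own. `noise2` is our own name:

```r
set.seed(42)
noise2 <- rnorm(100, mean = 0.5, sd = 1)
random.walk2 <- as.ts(cumsum(noise2))

# x_i - x_(i-1) for i = 2..100; diff() does exactly this subtraction
white.noise <- diff(random.walk2)
plot(white.noise, main = "Differenced random walk", ylab = "x_i - x_(i-1)")
```

The result has 99 points. The first white-noise draw is only present as the starting value x_1, so it cannot be recovered as a difference.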
Since the random walk is the cumulative sum of the individual white noise components, this subtraction recovers exactly those random components. Since white noise is stationary, we should be able to use the same steps we saw in the last set of exercises to make predictions on this time series.
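The reverse direction can be checked with `diffinv()` from the stats package. Integrating the differenced series, anchored at the first observation of the walk, should rebuild the walk up to floating-point error. `rebuilt` is our own name:

```r
set.seed(42)
random.walk2 <- as.ts(cumsum(rnorm(100, mean = 0.5, sd = 1)))
white.noise <- diff(random.walk2)

# diffinv() undoes diff(): cumulative sum of the differences, starting from x_1
rebuilt <- diffinv(white.noise, xi = random.walk2[1])
all.equal(as.numeric(rebuilt), as.numeric(random.walk2))   # TRUE up to floating-point tolerance
```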
Confirm that the time series white.noise is indeed stationary by drawing its autocorrelation plot.
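A sketch using `acf()` from the stats package. For contrast, it also computes the autocorrelations of the undifferenced walk, whose slowly decaying autocorrelation is a classic sign of non-stationarity. The names `acf_walk` and `acf_noise` are ours:

```r
set.seed(42)
random.walk2 <- as.ts(cumsum(rnorm(100, mean = 0.5, sd = 1)))
white.noise <- diff(random.walk2)

acf(white.noise, main = "ACF of white.noise")  # most spikes should stay inside the dashed bands

# Lag-1 autocorrelation: near 1 for the walk, near 0 for its differences
acf_walk  <- acf(random.walk2, plot = FALSE)
acf_noise <- acf(white.noise, plot = FALSE)
c(walk_lag1 = acf_walk$acf[2], noise_lag1 = acf_noise$acf[2])
```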
Use the Box-Ljung test, the KPSS test and the ADF test to check whether the time series is stationary. You can use part 1 of this series as a reference.
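A sketch of the three tests. `Box.test()` is in base R's stats package, while `kpss.test()` and `adf.test()` come from the tseries package, so they run only if it is installed. The choice of `lag = 10` for the Ljung-Box test is our own assumption:

```r
set.seed(42)
random.walk2 <- as.ts(cumsum(rnorm(100, mean = 0.5, sd = 1)))
white.noise <- diff(random.walk2)

# Ljung-Box: H0 = no autocorrelation up to `lag` (lag = 10 is our choice)
lb <- Box.test(white.noise, lag = 10, type = "Ljung-Box")
print(lb)

# KPSS (H0: level-stationary) and ADF (H0: unit root) live in the tseries package
if (requireNamespace("tseries", quietly = TRUE)) {
  print(tseries::kpss.test(white.noise, null = "Level"))
  print(tseries::adf.test(white.noise, alternative = "stationary"))
} else {
  message("Install the 'tseries' package to run the KPSS and ADF tests.")
}
```

Note that the null hypotheses point in opposite directions. Large Ljung-Box and KPSS p-values are consistent with stationary white noise, whereas for the ADF test it is a small p-value that rejects the unit-root hypothesis.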
Use the function forecast.HoltWinters to apply exponential smoothing to the white.noise time series and make predictions for the next 5 points. Save the result in a variable called prediction, then print and plot it.
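The exercise uses `forecast.HoltWinters` from the forecast package. As a dependency-free sketch, the block below fits the model with `stats::HoltWinters()` and forecasts with base R's `predict()`. Trend (`beta`) and seasonality (`gamma`) are switched off, which gives simple exponential smoothing; that is our assumption about the intended model. With the forecast package installed, `forecast::forecast(hw, h = 5)` dispatches to `forecast.HoltWinters` and returns an object with `$mean`, `$lower` and `$upper` components.

```r
set.seed(42)
random.walk2 <- as.ts(cumsum(rnorm(100, mean = 0.5, sd = 1)))
white.noise <- diff(random.walk2)

# Simple exponential smoothing: no trend (beta) and no seasonality (gamma)
hw <- HoltWinters(white.noise, beta = FALSE, gamma = FALSE)

# Next 5 points with 80% prediction intervals (columns: fit, upr, lwr)
prediction <- predict(hw, n.ahead = 5, prediction.interval = TRUE, level = 0.8)
print(prediction)
plot(hw, prediction, main = "SES forecast of white.noise")
```

With simple exponential smoothing, the five point forecasts are identical: a flat line at the last smoothed level, with intervals that widen as the horizon grows.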
Knowing that the predictions are stored in the $mean component of the prediction variable, and that the lower and upper limits of the 80% prediction intervals are stored in $lower[,1] and $upper[,1] respectively, plot random.walk2 and the predictions.
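The predictions describe the white noise, that is, the differences, not the level of random.walk2. To draw them on the walk's scale, one option is to add their running sum to the last observed value. This is a sketch under that assumption. `step_fc` and `level_fc` are our own names. The interval bounds are left out because summing bounds step by step does not give a valid interval for the level.

```r
set.seed(42)
random.walk2 <- as.ts(cumsum(rnorm(100, mean = 0.5, sd = 1)))
white.noise <- diff(random.walk2)
hw <- HoltWinters(white.noise, beta = FALSE, gamma = FALSE)
step_fc <- predict(hw, n.ahead = 5)            # forecasts of the *differences*

# Undo the differencing: last observed level plus the running sum of forecast steps
last_level <- random.walk2[length(random.walk2)]
level_fc <- ts(last_level + cumsum(as.numeric(step_fc)),
               start = end(random.walk2)[1] + 1)

plot(random.walk2, xlim = c(1, end(level_fc)[1]),
     ylim = range(random.walk2, level_fc), ylab = "value",
     main = "random.walk2 with 5-step forecast")
lines(level_fc, col = "red", lwd = 2)
```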
So far, we have made the following assumptions about the random walk in order to make predictions:
- The mean of the white noise component is constant.
- The variance of the white noise component is constant.
- The white noise components are added together to create the random walk.
- The white noise is the only random component of the random walk.
If we use the method shown today to make predictions on a time series that violates at least one of these conditions, the results shouldn't be trusted.
Load the EuStockMarkets time series from the datasets package and plot the first time series of this set. Then use the diff() function to create a white.noise2 time series containing the white noise component and plot it.
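A base-R sketch of this exercise. `EuStockMarkets` ships with R's datasets package, and its first column is the DAX index. `dax` is our own name:

```r
# EuStockMarkets ships with base R's datasets package; column 1 is the DAX index
dax <- EuStockMarkets[, 1]
white.noise2 <- diff(dax)                      # day-to-day changes: the white noise component

old_par <- par(mfrow = c(2, 1))
plot(dax, main = "DAX (EuStockMarkets[, 1])", ylab = "index")
plot(white.noise2, main = "diff(DAX)", ylab = "daily change")
par(old_par)
```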
Looking at the time series plot, we see a curve that looks a lot like the random walks we've seen before, but by plotting the white noise component, we see that the amplitude of its random variations increases over time. In this situation, the method we've seen today wouldn't produce accurate predictions. In the next post in this series of exercises, we'll see how to transform a random walk so that it respects all the necessary assumptions.
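A rough numeric check of the growing amplitude is to compare the standard deviation of the daily DAX changes in the first and second halves of the sample. This split-half comparison is our own quick diagnostic, not a formal test of constant variance. The variable names are ours:

```r
dax_changes <- diff(EuStockMarkets[, "DAX"])
n <- length(dax_changes)
half <- n %/% 2

# Spread of daily changes early vs. late in the sample
sd_first  <- sd(dax_changes[1:half])
sd_second <- sd(dax_changes[(half + 1):n])
c(first_half = sd_first, second_half = sd_second)
```

A clearly larger second-half value is consistent with what the plot shows: the variance of the white noise component is not constant, so the second assumption above fails.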