The standard ARIMA (autoregressive integrated moving average) model makes forecasts based only on the past values of the forecast variable. The model assumes that future values of a variable depend linearly on its past values, as well as on the values of past (stochastic) shocks. The ARIMAX model is an extended version of the ARIMA model that also includes other independent (predictor) variables. The model is also referred to as the vector ARIMA or the dynamic regression model.
The ARIMAX model is similar to a multivariate regression model, but takes advantage of the autocorrelation that may be present in the residuals of the regression to improve forecast accuracy.
This set of exercises provides practice in using the auto.arima function from the forecast package to make forecasts with the ARIMAX model. A function from the lmtest package is also used to check the statistical significance of regression coefficients.
The exercises make use of the Icecream dataset from the Ecdat package. The dataset contains the following variables:
The number of observations is 30. They correspond to four-weekly periods spanning March 18, 1951 to July 11, 1953 (download here).
For other parts of the series follow the tag forecasting.
Answers to the exercises are available here.
Exercise 1
Load the dataset, and plot the variables cons (ice cream consumption), temp (temperature), and income.
Exercise 2
Estimate an ARIMA model for the data on ice cream consumption using the auto.arima function. Then pass the model as input to the forecast function to get a forecast for the next 6 periods (both functions are from the forecast package).
Exercise 3
Plot the obtained forecast with the autoplot.forecast function from the forecast package.
Exercise 4
Use the accuracy function from the forecast package to find the mean absolute scaled error (MASE) of the fitted ARIMA model.
Exercise 5
Estimate an extended ARIMA model for the consumption data with the temperature variable as an additional regressor (using the auto.arima function). Then make a forecast for the next 6 periods (note that this forecast requires an assumption about the expected temperature; assume that the temperature for the next 6 periods will be represented by the following vector: fcast_temp <- c(70.5, 66, 60.5, 45.5, 36, 28)).
Plot the obtained forecast.
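To illustrate the idea behind this exercise (not the solution itself), here is a minimal sketch of an ARIMA model with an external regressor, using base R's arima() and predict() on synthetic data rather than auto.arima and the real Icecream dataset; the temperature and consumption values are made up:

```r
# Synthetic stand-in for the exercise: AR(1) consumption driven by temperature
set.seed(42)
temp <- rnorm(30, mean = 50, sd = 15)                 # made-up temperature series
cons <- 0.2 + 0.003 * temp + arima.sim(list(ar = 0.5), n = 30, sd = 0.03)

# base-R equivalent of passing xreg to auto.arima: an AR(1) model with a regressor
fit <- arima(cons, order = c(1, 0, 0), xreg = temp)

# Forecasting requires assumed future values of the regressor
fcast_temp <- c(70.5, 66, 60.5, 45.5, 36, 28)
pred <- predict(fit, n.ahead = 6, newxreg = fcast_temp)
pred$pred
```

With forecast::auto.arima the regressor is passed the same way through the xreg argument, and the ARIMA order is then selected automatically.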
Exercise 6
Print a summary of the obtained forecast. Find the coefficient for the temperature variable, its standard error, and the MASE of the forecast. Compare the MASE with that of the initial forecast.
Exercise 7
Check the statistical significance of the temperature variable coefficient using the coeftest function from the lmtest package. Is the coefficient statistically significant at the 5% level?
Exercise 8
The function that estimates the ARIMA model can take multiple additional regressors, but only in the form of a matrix. Create a matrix with the following columns:
Print the matrix.
Note: the last three columns can be created by prepending two NA’s to the vector of values of the income variable, and using the obtained vector as an input to the embed function (with the dimension parameter equal to the number of columns to be created).
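The embed() trick can be sketched with a short made-up income vector (the values below are placeholders, not the Icecream data):

```r
# Prepend two NAs, then let embed() build current and lagged columns
income <- c(80, 82, 84, 86, 88)            # hypothetical income values
padded <- c(NA, NA, income)
inc_mat <- embed(padded, dimension = 3)    # columns: t, t-1, t-2
colnames(inc_mat) <- c("income", "income_lag1", "income_lag2")
inc_mat
```

Each row of the matrix holds an observation together with its one- and two-period lags, which is exactly the shape the xreg argument expects.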
Exercise 9
Use the obtained matrix to fit three extended ARIMA models that use the following variables as additional regressors:
Examine the summary for each model, and find the model with the lowest value of the Akaike information criterion (AIC).
Note that the AIC cannot be used for comparison of ARIMA models with different orders of integration (expressed by the middle terms in the model specifications) because of a difference in the number of observations. For example, an AIC value from a non-differenced model, ARIMA(p, 0, q), cannot be compared to the corresponding value of a differenced model, ARIMA(p, 1, q).
Exercise 10
Use the model found in the previous exercise to make a forecast for the next 6 periods, and plot the forecast. (The forecast requires a matrix of the expected temperature and income for the next 6 periods; create the matrix using the fcast_temp variable, and the following values for expected income: 91, 91, 93, 96, 96, 96.)
Find the mean absolute scaled error of the model, and compare it with the ones from the first two models in this exercise set.
Uncertainty is the biggest enemy of a profitable business. That is especially true of small businesses, which don’t have enough resources to survive an unexpected drop in revenue or to capitalize on a sudden increase in demand. In this context, it is especially important to be able to predict market changes accurately in order to make better decisions and stay competitive.
This series of posts will teach you how to use data to make sound predictions. In the previous post we saw the fundamental models of time series: what white noise and random walks are, what seasonality is, and which transformations can make a time series stationary. Today, we’ll see some basic methods for predicting the values of a time series and how to select which method to use.
To be able to do these exercises, you have to have installed the packages forecast and smooth. Also, when you generate a random number or process, set the seed to 42 to get results identical to those in the solution set.
Answers to the exercises are available here.
Exercise 1
Create an artificial time series, which we’ll use to test the different forecasting methods we’ll use today. This random walk should have a white noise component with a mean of 0.5 and a standard deviation of 1, and should have 365 observations with a frequency and deltat of 1.
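One possible way to build such a random walk in base R (a sketch, not necessarily identical to the official solution) is to take the cumulative sum of Gaussian white noise:

```r
# Random walk = cumulative sum of white noise with mean 0.5 and sd 1
set.seed(42)
wn <- rnorm(365, mean = 0.5, sd = 1)
rw <- ts(cumsum(wn), frequency = 1)   # 365 observations, deltat = 1/frequency = 1
```

The positive mean of the noise gives the walk an upward drift, which matters for the forecasting methods compared below.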
Exercise 2
The basic way to predict the future value of a time series is to suppose that the next value will be the same as the last. Just imagine you are the manager of a grocery store, that you have to order bread for the next week, and that you don’t have the time to do some fancy R programming. Using the bread sales of last week as a reference for your order could be quite enough. This method is called the naive method, for good reason! Use this method to predict the value of the observations from t=331 to t=365, calculate the forecasting error and save the results in a naive.forecast data frame.
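A minimal base-R sketch of the naive method on a generic random walk (the series here is synthetic, and the column layout of the data frame is an assumption):

```r
# Naive method: the forecast for t is simply the observed value at t-1
set.seed(42)
x <- cumsum(rnorm(365, mean = 0.5, sd = 1))
pred   <- x[330:364]            # previous values serve as forecasts
actual <- x[331:365]
naive.forecast <- data.frame(t = 331:365, actual = actual,
                             forecast = pred, error = actual - pred)
head(naive.forecast)
```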
Exercise 3
The forecast package provides the naive() function to implement this method. Use this function to make predictions over the window t=331 to t=365 with confidence intervals of 90% and 95%, then plot the forecast.
Looking back at the grocery store manager example, using the naive method can make you order too much or too little bread if last week’s sales were unusual. It gets worse for foods whose prices can change drastically depending on the period. For example, lobster gets cheap during fishing season, and using this method would underestimate sales at the beginning of the fishing season and overestimate sales during the weeks following that period. The naive method gives good results only when the time series is stable, that is, only when forecasting is not really useful!
Exercise 4
Calculate three averages: the average of all the observations, of observations 115 to 330, and of observations 329 to 330. Use those three values as predictions of the values from 331 to 365 and calculate the estimation error. Save the results in 3 different data frames.
By computing the mean of the whole data set and using this value for the forecast, we get a bigger estimation error than with the naive method, but the fewer observations we use to compute the mean, the better the forecast becomes. Using the mean of the time series as the forecast value is called the mean method and is useful for white noise processes, but it is not precise for non-stationary time series, since their mean changes over time. We can better estimate that changing mean by using a small number of observations in our computation. This method is called the moving average and can be used to smooth the time series and easily make a prediction.
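The effect described above can be demonstrated with a quick base-R sketch on a synthetic drifting random walk (the split points mirror the exercise but the data is made up):

```r
# Mean method: whole-sample mean vs. a short recent average as forecast
set.seed(42)
x <- cumsum(rnorm(365, mean = 0.5, sd = 1))
train <- x[1:330]; test <- x[331:365]
f_mean_all  <- mean(train)            # mean of all training observations
f_mean_last <- mean(train[329:330])   # mean of only the last two observations
err_all  <- test - f_mean_all
err_last <- test - f_mean_last
# For a series with drift, the short average tracks the current level far better
c(all = mean(abs(err_all)), last = mean(abs(err_last)))
```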
Exercise 5
Use the mean method with the meanf() function to predict the last 35 observations of the time series, with confidence intervals of 90 and 95 percent.
Exercise 6
Use the sma() function from the smooth library to predict the last 35 observations of the time series with the moving average method. This function tries different sample sizes for computing the mean, selects the model that best fits the data, and makes the prediction.
Exercise 7
The moving average can also be used to remove the randomness of the time series curve to better see the trend in the data. Plot the time series and add two red lines: one moving average computed with a sample of 150 observations and another with a sample of 5. You can use the ma() function to compute the moving averages.
Exercise 8
Use the rwf() function to make predictions for the observations between t=331 and t=365.
Exercise 9
Calculate the RMSE of the predictions made with each model and make a bar plot of the results. Which model would be the best to use to make forecasts for this time series?
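The RMSE computation and bar plot can be sketched in base R; the two error vectors below come from the synthetic naive and mean forecasts, not from all of the models in the exercise:

```r
# RMSE: root of the mean squared forecast error
rmse <- function(error) sqrt(mean(error^2))

set.seed(42)
x <- cumsum(rnorm(365, mean = 0.5, sd = 1))
actual <- x[331:365]
errors <- list(naive    = actual - x[330:364],
               mean_all = actual - mean(x[1:330]))
scores <- sapply(errors, rmse)
barplot(scores, main = "RMSE by forecasting method")
```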
In practice, when deciding which model to use to make a prediction, you should follow the process used in this exercise set: separate your historical data into a training set and a test set, fit each algorithm on the training set, make predictions on the test set, calculate the prediction error with a metric like RMSE or MASE, and select the model that predicts your data most effectively.
Note that in practice the distribution of your time series will change over time, so you must repeat this process periodically to make sure you are making predictions with the model best suited to the new data.
Exercise 10
Follow the steps of the last exercise, but this time use only the observations
Are the predictions more precise in the short term or the long term?
In the previous exercises of this series, forecasts were based only on an analysis of the forecast variable. Another approach to forecasting is to use external variables, which serve as predictors. This set of exercises focuses on forecasting with the standard multivariate linear regression.
Running regressions may appear straightforward but this method of forecasting is subject to some pitfalls:
(1) a basic difficulty is selection of predictor variables (which is more of an art than a science),
(2) a possible problem is the dependence of a forecast on assumptions about expected values of predictor variables,
(3) another problem can arise if autocorrelation is present in regression residuals (it implies, among other things, that not all information, which could be used for forecasting, was retrieved from the forecast variable).
This set of exercises allows you to practice using the regsubsets function from the leaps package to run sets of regressions, making and plotting forecasts from a multivariate regression, and testing residuals for autocorrelation (which requires the lmtest package to be installed). The model selection is based on the Bayesian information criterion (BIC).
The exercises make use of the quarterly data on light vehicles sales (in thousands of units), real disposable personal income (per capita, in chained 2009 dollars), civilian unemployment rate (in percent), and finance rate on personal loans at commercial banks (24 month loans, in percent) in the USA for 1976-2016 from FRED, the Federal Reserve Bank of St. Louis database (download here).
For other parts of the series follow the tag forecasting.
Answers to the exercises are available here.
Exercise 1
Load the dataset, and plot the sales variable.
Exercise 2
Create the trend variable (by assigning a successive number to each observation), and lagged versions of the variables income, unemp, and rate (lagged by one period). Add them to the dataset.
(Note that the base R libraries do not include functions for creating lags of non-time-series data, so the variables can be created manually.)
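Creating a one-period lag manually amounts to shifting the vector and padding with NA; a small base-R sketch with made-up values:

```r
# One-period lag for a plain (non-ts) vector: drop the last value,
# prepend NA so the result aligns with the original observations
lag1 <- function(x) c(NA, x[-length(x)])

income     <- c(100, 102, 105, 103)   # hypothetical values
income_lag <- lag1(income)            # NA 100 102 105
income_lag
```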
Exercise 3
Run all possible linear regressions with sales as the dependent variable and the others as independent variables using the regsubsets function from the leaps package (pass a formula with all possible independent variables, and the dataset, as inputs to the function).
Plot the output of the function.
Exercise 4
Note that regsubsets returns only one “best” model (in terms of BIC) for each possible number of independent variables. Run all regressions again, but increase the number of returned models for each size to 2.
Plot the output of the function.
Exercise 5
Look at the plots from the previous exercises and find the model with the lowest value of BIC. Run a linear regression for the model, save the result in a variable, and print its summary.
Exercise 6
Load an additional dataset with assumptions about future values of the predictor variables. Use the dataset and the model obtained in the previous exercise to make a forecast for the next 4 quarters with the forecast function (from the package of the same name). Note that the names of the lagged variables in the assumptions data have to be identical to the names of the corresponding variables in the main dataset.
Plot the summary of the forecast.
Exercise 7
The plot function does not automatically draw plots for forecasts obtained from regression models with multiple predictors, but such plots can be created manually. As the first step, create a vector from the sales variable, and append the forecast (mean) values to this vector. Then use the ts function to transform the vector into a quarterly time series that starts in the first quarter of 1976.
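The append-and-rebuild step can be sketched as follows; the sales history and the four forecast means below are made-up placeholders standing in for the real data:

```r
# 164 quarters of hypothetical sales: 1976 Q1 through 2016 Q4
set.seed(42)
sales_vec  <- rnorm(164, mean = 3000, sd = 200)
fcast_mean <- c(3100, 3150, 3200, 3250)   # hypothetical 4-quarter forecast

# Rebuild a single quarterly ts covering both history and forecast
extended <- ts(c(sales_vec, fcast_mean), start = c(1976, 1), frequency = 4)
end(extended)   # the combined series runs through 2017 Q4
```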
Exercise 8
Plot the forecast in the following steps:
(1) create an empty plot for the period from the first quarter of 2000 to the fourth quarter of 2017,
(2) plot a black line for the sales time series for the period 2000-2016,
(3) plot a thick blue line for the sales time series for the fourth quarter of 2016 and all quarters of 2017.
Note that a line can be plotted using the lines function, and a subset of a time series can be obtained with the window function.
Exercise 9
Perform the Breusch-Godfrey test (the bgtest function from the lmtest package) to test the linear model obtained in Exercise 5 for autocorrelation of residuals. Set the maximum order of serial correlation to be tested to 4.
Is the autocorrelation present?
(Note that the null hypothesis of the test is the absence of autocorrelation of the specified orders).
Exercise 10
Use the Pacf function from the forecast package to explore the autocorrelation of the residuals of the linear model obtained in Exercise 5. Find at which lags the partial correlation between lagged values is statistically significant at the 5% level.
Residuals can be obtained from the model using the residuals function.
Forecasting for small business Exercises (Part-3)
Tue, 25 Apr 2017 16:05:14 +0000
Uncertainty is the biggest enemy of a profitable business. That is especially true of small businesses, which don’t have enough resources to survive an unexpected drop in revenue or to capitalize on a sudden increase in demand. In this context, it is especially important to be able to predict market changes accurately in order to make better decisions and stay competitive.
This series of posts will teach you how to use data to make sound predictions. In the last set of exercises, we’ve seen how to make predictions on a random walk by isolating the white noise component via differencing. But this approach is valid only if the random components of the time series follow a normal distribution with constant mean and variance, and if those components are added together at each iteration to create the new observations.
Today, we’ll see some transformations we can apply to a time series to make it stationary, in particular how to stabilise the variance and how to detect and remove seasonality.
To be able to do these exercises, you have to have installed the packages forecast and tseries.
Answers to the exercises are available here.
Exercise 1
Use the data() function to load the EuStockMarkets dataset from the R library. Then use the diff() function on EuStockMarkets[,1] to isolate the random component, and plot it. Differencing is the most commonly used transformation for time series.
We can see that the mean of the random component of the time series seems to stay constant over time, but the variance seems to get bigger near 1997.
Exercise 2
Apply the log() function to EuStockMarkets[,1] and repeat the steps of Exercise 1. The logarithmic transformation is often used to stabilise a non-constant variance.
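The two transformations combine naturally: taking the log first and then differencing yields (approximately) the daily percentage changes, which usually have a much more stable variance. A base-R sketch with the built-in EuStockMarkets data:

```r
# Log then diff: stabilise variance, then isolate the random component
data(EuStockMarkets)
dax <- EuStockMarkets[, 1]        # the DAX index, the first column
log_returns <- diff(log(dax))     # approximately the daily relative changes
plot(log_returns)
```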
Exercise 3
Use the adf.test() function from the tseries package to test whether the time series you obtained in the last exercise is stationary. Use a lag of 1.
Exercise 4
Load and plot the co2 dataset from the R library datasets. Use the lowess() function to create a trend line and add it to the plot of the time series.
By looking at the last plot, we can see that the time series oscillates from one side of the trend line to the other with a constant period. That characteristic is called seasonality and is often observed in time series. Just think about temperature, which changes predictably from season to season.
Exercise 5
To eliminate the upward trend in the data, use the diff() function and save the resulting time series in a variable called diff.co2. Plot the autocorrelation plot of diff.co2.
Exercise 6
This last autocorrelation plot has years as its unit, which is not really intuitive in our scenario. Make another autocorrelation plot where the x axis has months as units. Looking at this plot, can you tell what the seasonal period of this time series is?
Another way to verify whether the time series shows seasonality is to use the tbats function from the forecast package. As its name says, this function fits a TBATS model to the time series and returns a smooth curve that fits the data. We’ll learn more about that model in a future post.
Exercise 7
Use the tbats function on the co2 time series and store the result in a variable called tbats.model. If the time series shows signs of seasonality, the $seasonal.periods value of tbats.model will store the period value; otherwise the result will be NULL.
Exercise 8
Use the diff() function with the appropriate lag to remove the seasonality of the co2 time series, then use it again to remove the trend. Plot the resulting random component and the autocorrelation plot.
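Since co2 is a monthly series, the seasonal lag is 12; a sketch of the two differencing steps in base R:

```r
# Seasonal difference (lag 12) removes the yearly cycle of the monthly co2 data,
# a regular difference then removes the remaining trend
data(co2)
deseason    <- diff(co2, lag = 12)
random_comp <- diff(deseason)
acf(random_comp)
```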
Exercise 9
Apply the ADF test, the KPSS test and the Ljung-Box test to the result of the last exercise to make sure that the random component is stationary.
Exercise 10
An interesting way to analyse a time series is to use the decompose() function, which uses a moving average to estimate the seasonal, random and trend components of a time series. With that in mind, use this function and plot each component of the co2 time series.
Forecasting: Exponential Smoothing Exercises (Part-3)
Mon, 17 Apr 2017 15:50:37 +0000
Exponential smoothing is a method of finding patterns in time series, which can be used to make forecasts. In its simple form, exponential smoothing is a weighted moving average: each smoothed value is a weighted average of all past time series values (with weights decreasing exponentially from the most recent to the oldest values). In more complicated forms, exponential smoothing is applied to a time series recursively to allow for a trend and seasonality. In that case, the model is said to consist of three components – error, trend, and seasonality, from which another notation for exponential smoothing (“ETS”) is derived.
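The simple recursive form described above can be sketched in a few lines of base R; the toy series and the smoothing parameter alpha are made up for illustration:

```r
# Simple exponential smoothing by hand:
# s[t] = alpha * x[t] + (1 - alpha) * s[t-1],
# so each smoothed value is a weighted average of all past observations
# with exponentially decaying weights
ses_smooth <- function(x, alpha = 0.3) {
  s <- numeric(length(x))
  s[1] <- x[1]                    # initialise with the first observation
  for (t in 2:length(x)) {
    s[t] <- alpha * x[t] + (1 - alpha) * s[t - 1]
  }
  s
}

x <- c(10, 12, 11, 13, 12, 14)    # made-up series
s <- ses_smooth(x)
s[length(s)]                       # the last smoothed value is the 1-step forecast
```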
This set of exercises focuses primarily on the ets function from the forecast package. The function can be used to apply various exponential smoothing methods (including Holt’s and Holt-Winters’ methods), and allows both automatic and manual selection of the model structure (for example, whether the model includes trend and seasonal components). The exercises are based on monthly data on the US civilian unemployment rate as a percentage of the labor force for 2012-2017, retrieved from FRED, the Federal Reserve Bank of St. Louis database (download here).
For other parts of the series follow the tag forecasting.
Answers to the exercises are available here.
Exercise 1
Load the data, transform it to the ts type (indicating that the data is monthly and the first period is January 2012), and plot it.
Exercise 2
Use the ses function from the forecast package to get a forecast based on simple exponential smoothing for the next 12 months, and plot the forecast.
Exercise 3
Estimate an exponential smoothing model using the ets function with default parameters. Then pass the model as input to the forecast function to get a forecast for the next 12 months, and plot the forecast (both functions are from the forecast package).
Exercise 4
Print a summary of the model estimated in the previous exercise, and find the automatically estimated structure of the model. Does it include trend and seasonal components? If those components are present, are they additive or multiplicative?
Exercise 5
Use the ets function to estimate an exponential smoothing model with a damped trend. Make a forecast based on the model for the next 12 months, and plot it.
Exercise 6
Use the ets function to estimate another model that does not include a trend component. Make a forecast based on the model for the next 12 months, and plot it.
Exercise 7
Find a function in the forecast package that estimates the BATS model (an exponential smoothing state space model with Box-Cox transformation, ARMA errors, trend and seasonal components). Use it to estimate a model with a damped trend, and make a forecast. Plot the forecast.
Exercise 8
Use the accuracy function from the forecast package to get a matrix of accuracy measures for the forecast obtained in the previous exercise. Explore the structure of the matrix, and save the mean absolute error (MAE) measure in a variable.
Exercise 9
Write a function that inputs a time series and a list of model estimation functions, calculates forecasts for the next 12 periods using each of the functions (with default parameters), and outputs the forecast with the smallest mean absolute error.
Run the function using the unemployment time series and a list of functions that includes ets, bats, and auto.arima. Plot the obtained result.
Exercise 10
Modify the function written in the previous exercise so that it prints the mean absolute error for each forecasting model along with the name of that model (the name can be retrieved from the forecast object).
Forecasting: Linear Trend and ARIMA Models Exercises (Part-2)
Sat, 15 Apr 2017 15:50:24 +0000
There are two main approaches to time series forecasting. One of them is to find persistent patterns in a time series itself and extrapolate them. Another approach is to discover how a series depends on other variables, which serve as predictors.
This set of exercises focuses on the first approach, while the second one will be considered in a later set. The present set provides practice in applying three forecasting models:
– a naive model, which provides probably the simplest forecasting technique, but still can be useful as a benchmark for evaluating other methods,
– a linear trend model (based on a simple linear regression),
– the ARIMA model, a more sophisticated and popular model, which assumes a linear dependence of a time series on its past values and random shocks.
The exercises do not require a deep understanding of the underlying theories, and make use of the automatic model estimation functions included in the forecast package. The set also helps you practice retrieving useful data from forecasts (confidence intervals, forecast errors), and comparing the predictive accuracy of different models. The exercises are based on data on e-commerce retail sales in the USA for 1999-2016 retrieved from FRED, the Federal Reserve Bank of St. Louis database (download here). The data represent quarterly sales volumes in millions of dollars.
For other parts of the series follow the tag forecasting.
Answers to the exercises are available here.
Exercise 1
Read the data from the file, and transform it into a time series (ts) object (given that the data is quarterly and the starting period is the fourth quarter of 1999).
Plot the obtained series.
Exercise 2
Make a naive forecast for the next 8 periods using the appropriate function from the forecast package (i.e. create an object of the class forecast using the function that implements the naive method of forecasting). (Note that this method sets all forecast values equal to the last known time series value.)
Exercise 3
Plot the forecast values.
Exercise 4
Make a forecast for the next 8 periods based on a linear model in two steps:
(1) create a linear regression model for the forecast using the tslm function from the forecast package (use the series as the dependent variable, and trend and season as independent variables),
(2) make a forecast based on the model using the forecast function from the same package.
Plot the forecast.
Exercise 5
Retrieve forecast errors (residuals) from the linear model based forecast and save them as a separate variable.
Exercise 6
Make a forecast for the next 8 periods based on the ARIMA model in two steps:
(1) create a model using the auto.arima function from the forecast package,
(2) make a forecast based on the model using the forecast function from the same package.
Plot the forecast.
Exercise 7
Print the summary of the forecast based on the ARIMA model.
Exercise 8
Explore the structure of the forecast summary. Find the forecast value for the last period, and its 95% confidence interval values.
Exercise 9
Retrieve forecast errors (residuals) from the ARIMA based forecast.
Exercise 10
Use the errors from the ARIMA-based forecast and the errors from the linear-model-based forecast to compare the predictive accuracy of the two models with the Diebold-Mariano test (implemented as a function in the forecast package). Test the hypothesis that the ARIMA-based forecast is more accurate than the linear-model-based forecast.
Forecasting for small business Exercises (Part-1)
Thu, 13 Apr 2017 15:50:50 +0000
Uncertainty is the biggest enemy of a profitable business. That is especially true of small businesses, which don’t have enough resources to survive an unexpected drop in revenue or to capitalize on a sudden increase in demand. In this context, it is especially important to be able to predict market changes accurately in order to make better decisions and stay competitive.
This series of posts will teach you how to use data to make sound predictions. In this first set of exercises, you’ll learn the essential concepts we’ll use throughout the series: how to use the fundamental R tools for time series analysis, how to verify whether a time series is stationary, and how to make predictions in that context.
Answers to the exercises are available here.
To be able to do these exercises, you have to have installed the R packages forecast and tseries.
Exercise 1
Use the data function to load the treering dataset from the R library. This dataset is loaded as an R time series object, which is a vector whose values are ordered chronologically. Look at the structure of this ts object and use another function to find the number of observations in the dataset.
Exercise 2
Use the function window(ts, start, end) to select the observations in the treering dataset from the year 1500 to the year 2000.
Exercise 3
Do a basic plot of the treering dataset and use the abline function to add a horizontal red line representing the mean of the dataset.
Looking at the previous plot, we get the sense that these values are randomly distributed around the red line, which represents the mean of the dataset. Also, the magnitude of the random fluctuation of the points seems to stay stable over time. Such a time series is called “stationary”, and it is a property we prefer to observe in a time series when we want to make predictions.
To make sure that the time series is stationary, we’ll draw the autocorrelation plot and run the Box-Ljung test, the Kwiatkowski-Phillips-Schmidt-Shin test and the Augmented Dickey-Fuller test on the dataset.
Exercise 4
Load the forecast package and use the Acf function to draw the autocorrelation plot of the time series.
Exercise 5
Use the Box.test function to apply the Box-Ljung test to the data set. Set the lag parameter to the maximum value of Lag in the previous plot.
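A sketch of the call on the built-in treering series (the lag value of 25 below is illustrative; the exercise asks you to read it off your own plot):

```r
# Ljung-Box test: the null hypothesis is that the series is independently
# distributed (no autocorrelation up to the chosen lag)
data(treering)
lb <- Box.test(treering, lag = 25, type = "Ljung-Box")
lb
```

A small p-value is evidence of autocorrelation; a large one is consistent with white noise.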
Exercise 6
Load the tseries package and apply the Kwiatkowski-Phillips-Schmidt-Shin test by using the kpss.test function on the data.
Exercise 7
Use the adf.test function to apply the Augmented Dickey-Fuller t-statistic test to the dataset. Set the argument alternative to “stationary”.
Exercise 8
Use the Holt-Winters filtering method to apply exponential smoothing to the time series. Use the HoltWinters function with the parameters beta and gamma set to FALSE to select simple exponential smoothing, and start the function at the first observation of the dataset. Store the result in a variable named HW.
Exercise 9
With the forecast.HoltWinters function, make predictions for the next 5 years, store the results in a variable named prediction, and print it to the screen.
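Exercises 8 and 9 can be sketched with base R alone, using predict() in place of forecast.HoltWinters (the base function gives the same point forecasts):

```r
# Simple exponential smoothing: disabling beta (trend) and gamma (seasonality)
# leaves only the level component
data(treering)
HW <- HoltWinters(treering, beta = FALSE, gamma = FALSE)
prediction <- predict(HW, n.ahead = 5)   # 5 steps ahead, one per year
prediction
```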
Exercise 10
Use the plot.forecast function from the forecast package to plot your predictions.
Forecasting: Time Series Exploration Exercises (Part-1)
Mon, 10 Apr 2017 15:50:21 +0000
R provides powerful tools for forecasting time series data such as sales volumes, population sizes, and earthquake frequencies. A number of those tools are also simple enough to be used without mastering the sophisticated underlying theories. This set of exercises is the first in a series offering practice in the use of such tools, which include the ARIMA model, exponential smoothing models, and others.
The first set provides training in the exploration of regularly spaced time series data (such as weekly, monthly, and quarterly series), which may be useful for selecting a predictive model. The set covers:
– visual inspection of time series,
– estimation of trend and seasonal patterns,
– finding whether a series is stationary (i.e. whether it has a constant mean and variance),
– examination of correlation between lagged values of a time series (autocorrelation).
The exercises make use of functions from the packages forecast and tseries. Exercises are based on a dataset of retail sales volumes of US clothing and clothing accessory stores for 1992-2016, retrieved from FRED, the Federal Reserve Bank of St. Louis database (download here). The data represent monthly sales in millions of dollars.
For other parts of the series follow the tag forecasting.
Answers to the exercises are available here.
Exercise 1
Read the data from the file sales.csv.
Exercise 2
Transform the data into a time series object of the ts type (indicate that the data is monthly, and the starting period is January 1992).
Print the data.
Exercise 3
Plot the time series. Ensure that the y axis starts from zero.
Exercise 4
Use the gghistogram function from the forecast package to visually inspect the distribution of time series values. Add a kernel density estimate and a normal density function to the plot.
Exercise 5
Use the decompose function to break the series into seasonal, trend, and irregular components (apply multiplicative decomposition).
Plot the decomposed series.
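A sketch of the decomposition step, using the built-in monthly co2 series as a stand-in for the sales data (which is not bundled with R):

```r
# Multiplicative decomposition: series = trend * seasonal * random
data(co2)
dec <- decompose(co2, type = "multiplicative")
plot(dec)
# The seasonal coefficients repeat every 12 months
round(head(dec$seasonal, 12), 4)
```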
Exercise 6
Explore the structure of the decomposed object, and find the seasonal coefficients (multiples). Identify the three months with the greatest coefficients, and the three months with the smallest. (Note that the coefficients are equal across different years.)
Exercise 7
Check whether the time series is trend-stationary (i.e. its mean and variance are constant with respect to a trend) using the kpss.test function from the tseries package. (Note that the null hypothesis of the test is that the series is trend-stationary.)
Exercise 8
Use the diff function to create a differenced time series (i.e. a series that consists of the differences between the values of the original series), and test it for trend stationarity.
Exercise 9
Plot the differenced time series.
Exercise 10
Use the Acf and Pacf functions from the forecast package to explore the autocorrelation of the differenced series. Find at which lags the correlation between lagged values is statistically significant at the 5% level.