Introduction

Automatic forecasts of large numbers of univariate time series are often needed in business. It is common to have over one thousand product lines that need forecasting at least monthly. Even when a smaller number of forecasts are required, there may be nobody suitably trained in the use of time series models to produce them. In these circumstances, an automatic forecasting algorithm is an essential tool. Automatic forecasting algorithms must determine an appropriate time series model, estimate the parameters and compute the forecasts. They must be robust to unusual time series patterns, and applicable to large numbers of series without user intervention. The most popular automatic forecasting algorithms are based on either exponential smoothing or ARIMA models.

In this article, we discuss the implementation of two automatic univariate forecasting methods in the package for . We also briefly describe some univariate forecasting methods that are part of the package.

The package for the system for statistical computing ( Development Core Team 2008) is available from the Comprehensive Archive Network at . Version 8.23.0.9000 of the package was used for this paper. The package contains functions for univariate forecasting and a few examples of real time series data. For more extensive testing of forecasting methods, the package contains the 90 data sets from Spyros Makridakis, Wheelwright, and Hyndman (1998), the package contains 24 data sets from Hyndman et al. (2008), and the package contains the 1001 time series from the M-competition (S. Makridakis et al. 1982) and the 3003 time series from the M3-competition (Spyros Makridakis and Hibon 2000).

The package implements automatic forecasting using exponential smoothing, ARIMA models, the Theta method (Assimakopoulos and Nikolopoulos 2000), cubic splines (Hyndman, King, et al. 2005), as well as other common forecasting methods. In this article, we primarily discuss the exponential smoothing approach (in Section ) and the ARIMA modelling approach (in Section ) to automatic forecasting. In Section , we describe the implementation of these methods in the package, along with other features of the package.

Exponential smoothing

Although exponential smoothing methods have been around since the 1950s, a modelling framework incorporating procedures for model selection was not developed until relatively recently. J. K. Ord, Koehler, and Snyder (1997), Hyndman et al. (2002) and Hyndman, Koehler, et al. (2005) have shown that all exponential smoothing methods (including non-linear methods) are optimal forecasts from innovations state space models.

Exponential smoothing methods were originally classified by Pegels’ (1969) taxonomy. This was later extended by Gardner (1985), modified by Hyndman et al. (2002), and extended again by Taylor (2003), giving a total of fifteen methods seen in the following table.

Some of these methods are better known under other names. For example, cell (N,N) describes the simple exponential smoothing (or SES) method, cell (A,N) describes Holt’s linear method, and cell (A,N) describes the damped trend method. The additive Holt-Winters’ method is given by cell (A,A) and the multiplicative Holt-Winters’ method is given by cell (A,M). The other cells correspond to less commonly used but analogous methods.

Point forecasts for all methods

We denote the observed time series by y₁, y₂, …, y_n. A forecast of y_t + h based on all of the data up to time t is denoted by ŷ_t + h|t. To illustrate the method, we give the point forecasts and updating equations for method (A,A), the Holt-Winters’ additive method: where m is the length of seasonality (e.g., the number of months or quarters in a year), ℓ_t represents the level of the series, b_t denotes the growth, s_t is the seasonal component, ŷ_t + h|t is the forecast for h periods ahead, and h_m⁺ = [(h − 1) mod m] + 1. To use method , we need values for the initial states ℓ₀, b₀ and s_1 − m, …, s₀, and for the smoothing parameters α, β^* and γ. All of these will be estimated from the observed data.

Equation is slightly different from the usual Holt-Winters equations such as those in Spyros Makridakis, Wheelwright, and Hyndman (1998) or Bowerman, O’Connell, and Koehler (2005). These authors replace with s_t = γ^*(y_t − ℓ_t) + (1 − γ^*)s_t − m. If ℓ_t is substituted using , we obtain s_t = γ^*(1 − α)(y_t − ℓ_t − 1 − b_t − 1) + {1 − γ^*(1 − α)}s_t − m. Thus, we obtain identical forecasts using this approach by replacing γ in with γ^*(1 − α). The modification given in was proposed by J. K. Ord, Koehler, and Snyder (1997) to make the state space formulation simpler. It is equivalent to Archibald’s (1990) variation of the Holt-Winters’ method.

Table gives recursive formulae for computing point forecasts h periods ahead for all of the exponential smoothing methods. Some interesting special cases can be obtained by setting the smoothing parameters to extreme values. For example, if α = 0, the level is constant over time; if β^* = 0, the slope is constant over time; and if γ = 0, the seasonal pattern is constant over time. At the other extreme, naïve forecasts (i.e., ŷ_t + h|t = y_t for all h) are obtained using the (N,N) method with α = 1. Finally, the additive and multiplicative trend methods are special cases of their damped counterparts obtained by letting ϕ = 1.

Innovations state space models

For each exponential smoothing method in Table , Hyndman et al. (2008) describe two possible innovations state space models, one corresponding to a model with additive errors and the other to a model with multiplicative errors. If the same parameter values are used, these two models give equivalent point forecasts, although different prediction intervals. Thus there are 30 potential models described in this classification.

Historically, the nature of the error component has often been ignored, because the distinction between additive and multiplicative errors makes no difference to point forecasts.

We are careful to distinguish exponential smoothing from the underlying state space . An exponential smoothing method is an algorithm for producing point forecasts only. The underlying stochastic state space model gives the same point forecasts, but also provides a framework for computing prediction intervals and other properties.

To distinguish the models with additive and multiplicative errors, we add an extra letter to the front of the method notation. The triplet (E,T,S) refers to the three components: error, trend and seasonality. So the model ETS(A,A,N) has additive errors, additive trend and no seasonality—in other words, this is Holt’s linear method with additive errors. Similarly, ETS(M,M,M) refers to a model with multiplicative errors, a damped multiplicative trend and multiplicative seasonality. The notation ETS(⋅,⋅,⋅) helps in remembering the order in which the components are specified.

Once a model is specified, we can study the probability distribution of future values of the series and find, for example, the conditional mean of a future observation given knowledge of the past. We denote this as $\mu_{t+h|t} = \E(y_{t+h} \mid \bm{x}_t)$, where x_t contains the unobserved components such as ℓ_t, b_t and s_t. For h = 1 we use μ_t ≡ μ_t + 1|t as a shorthand notation. For many models, these conditional means will be identical to the point forecasts given in Table , so that μ_t + h|t = ŷ_t + h|t. However, for other models (those with multiplicative trend or multiplicative seasonality), the conditional mean and the point forecast will differ slightly for h ≥ 2.

We illustrate these ideas using the damped trend method of Gardner and McKenzie (1985).

Let μ_t = ŷ_t = ℓ_t − 1 + b_t − 1 denote the one-step forecast of y_t assuming that we know the values of all parameters. Also, let ε_t = y_t − μ_t denote the one-step forecast error at time t. From the equations in Table , we find that We simplify the last expression by setting β = αβ^*. The three equations above constitute a state space model underlying the damped Holt’s method. Note that it is an state space model (Anderson and Moore 1979; Aoki 1987) because the same error term appears in each equation. We an write it in standard state space notation by defining the state vector as x_t = (ℓ_t, b_t)′ and expressing – as The model is fully specified once we state the distribution of the error term ε_t. Usually we assume that these are independent and identically distributed, following a normal distribution with mean 0 and variance σ², which we write as ε_t ∼ NID(0, σ²).

A model with multiplicative error can be derived similarly, by first setting ε_t = (y_t − μ_t)/μ_t, so that ε_t is the relative error. Then, following a similar approach to that for additive errors, we find or Again we assume that ε_t ∼ NID(0, σ²).

Of course, this is a nonlinear state space model, which is usually considered difficult to handle in estimating and forecasting. However, that is one of the many advantages of the innovations form of state space models — we can still compute forecasts, the likelihood and prediction intervals for this nonlinear model with no more effort than is required for the additive error model.

State space models for all exponential smoothing methods

There are similar state space models for all 30 exponential smoothing variations. The general model involves a state vector x_t = (ℓ_t, b_t, s_t, s_t − 1, …, s_{t − m + 1})′ and state space equations of the form where {ε_t} is a Gaussian white noise process with mean zero and variance σ², and μ_t = w(x_t − 1). The model with additive errors has r(x_t − 1) = 1, so that y_t = μ_t + ε_t. The model with multiplicative errors has r(x_t − 1) = μ_t, so that y_t = μ_t(1 + ε_t). Thus, ε_t = (y_t − μ_t)/μ_t is the relative error for the multiplicative model. The models are not unique. Clearly, any value of r(x_t − 1) will lead to identical point forecasts for y_t.

All of the methods in Table can be written in the form and . The specific form for each model is given in Hyndman et al. (2008).

Some of the combinations of trend, seasonality and error can occasionally lead to numerical difficulties; specifically, any model equation that requires division by a state component could involve division by zero. This is a problem for models with additive errors and either multiplicative trend or multiplicative seasonality, as well as for the model with multiplicative errors, multiplicative trend and additive seasonality. These models should therefore be used with caution.

The multiplicative error models are useful when the data are strictly positive, but are not numerically stable when the data contain zeros or negative values. So when the time series is not strictly positive, only the six fully additive models may be applied.

The point forecasts given in Table are easily obtained from these models by iterating equations and for t = n + 1, n + 2, …, n + h, setting ε_n + j = 0 for j = 1, …, h. In most cases (notable exceptions being models with multiplicative seasonality or multiplicative trend for h ≥ 2), the point forecasts can be shown to be equal to $\mu_{t+h|t} = \E(y_{t+h} \mid \bm{x}_t)$, the conditional expectation of the corresponding state space model.

The models also provide a means of obtaining prediction intervals. In the case of the linear models, where the forecast distributions are normal, we can derive the conditional variance $v_{t+h|t} = \VAR (y_{t+h} \mid \bm{x}_t)$ and obtain prediction intervals accordingly. This approach also works for many of the nonlinear models. Detailed derivations of the results for many models are given in Hyndman, Koehler, et al. (2005).

A more direct approach that works for all of the models is to simply simulate many future sample paths conditional on the last estimate of the state vector, x_t. Then prediction intervals can be obtained from the percentiles of the simulated sample paths. Point forecasts can also be obtained in this way by taking the average of the simulated values at each future time period. An advantage of this approach is that we generate an estimate of the complete predictive distribution, which is especially useful in applications such as inventory planning, where expected costs depend on the whole distribution.

Estimation

In order to use these models for forecasting, we need to know the values of x₀ and the parameters α, β, γ and ϕ. It is easy to compute the likelihood of the innovations state space model , and so obtain maximum likelihood estimates. J. K. Ord, Koehler, and Snyder (1997) show that is equal to twice the negative logarithm of the likelihood function (with constant terms eliminated), conditional on the parameters θ = (α, β, γ, ϕ)′ and the initial states x₀ = (ℓ₀, b₀, s₀, s₋₁, …, s_−m + 1)′, where n is the number of observations. This is easily computed by simply using the recursive equations in Table . Unlike state space models with multiple sources of error, we do not need to use the Kalman filter to compute the likelihood.

The parameters θ and the initial states x₀ can be estimated by minimizing L^*. Most implementations of exponential smoothing use an ad hoc heuristic scheme to estimate x₀. However, with modern computers, there is no reason why we cannot estimate x₀ along with θ, and the resulting forecasts are often substantially better when we do.

We constrain the initial states x₀ so that the seasonal indices add to zero for additive seasonality, and add to m for multiplicative seasonality. There have been several suggestions for restricting the parameter space for α, β and γ. The traditional approach is to ensure that the various equations can be interpreted as weighted averages, thus requiring α, β^* = β/α, γ^* = γ/(1 − α) and ϕ to all lie within (0, 1). This suggests 0 < α < 1, 0 < β < α, 0 < γ < 1 − α, and 0 < ϕ < 1. However, Hyndman, Akram, and Archibald (2008) show that these restrictions are usually stricter than necessary (although in a few cases they are not restrictive enough).

Model selection

Forecast accuracy measures such as mean squared error (MSE) can be used for selecting a model for a given set of data, provided the errors are computed from data in a hold-out set and not from the same data as were used for model estimation. However, there are often too few out-of-sample errors to draw reliable conclusions. Consequently, a penalized method based on the in-sample fit is usually better.

One such approach uses a penalized likelihood such as Akaike’s Information Criterion: $$\mbox{AIC} = L^*(\hat{\bm\theta},\hat{\bm{x}}_0) + 2q, $$ where q is the number of parameters in θ plus the number of free states in x₀, and $\hat{\bm\theta}$ and $\hat{\bm{x}}_0$ denote the estimates of θ and x₀. We select the model that minimizes the AIC amongst all of the models that are appropriate for the data.

The AIC also provides a method for selecting between the additive and multiplicative error models. The point forecasts from the two models are identical so that standard forecast accuracy measures such as the MSE or mean absolute percentage error (MAPE) are unable to select between the error types. The AIC is able to select between the error types because it is based on likelihood rather than one-step forecasts.

Obviously, other model selection criteria (such as the BIC) could also be used in a similar manner.

Automatic forecasting

We combine the preceding ideas to obtain a robust and widely applicable automatic forecasting algorithm. The steps involved are summarized below.

Hyndman et al. (2002) applied this automatic forecasting strategy to the M-competition data (S. Makridakis et al. 1982) and the IJF-M3 competition data (Spyros Makridakis and Hibon 2000) using a restricted set of exponential smoothing models, and demonstrated that the methodology is particularly good at short term forecasts (up to about 6 periods ahead), and especially for seasonal short-term series (beating all other methods in the competitions for these series).

ARIMA models

A common obstacle for many people in using Autoregressive Integrated Moving Average (ARIMA) models for forecasting is that the order selection process is usually considered subjective and difficult to apply. But it does not have to be. There have been several attempts to automate ARIMA modelling in the last 25 years.

Hannan and Rissanen (1982) proposed a method to identify the order of an ARMA model for a stationary series. In their method the innovations can be obtained by fitting a long autoregressive model to the data, and then the likelihood of potential models is computed via a series of standard regressions. They established the asymptotic properties of the procedure under very general conditions.

Gómez (1998) extended the Hannan-Rissanen identification method to include multiplicative seasonal ARIMA model identification. Gómez and Maravall (1998) implemented this automatic identification procedure in the software and . For a given series, the algorithm attempts to find the model with the minimum BIC.

Liu (1989) proposed a method for identification of seasonal ARIMA models using a filtering method and certain heuristic rules; this algorithm is used in the software. Another approach is described by Mélard and Pasteels (2000) whose algorithm for univariate ARIMA models also allows intervention analysis. It is implemented in the software package ``Time Series Expert’’ ().

Other algorithms are in use in commercial software, although they are not documented in the public domain literature. In particular, (Goodrich 2000) is well-known for its excellent automatic ARIMA algorithm which was used in the M3-forecasting competition (Spyros Makridakis and Hibon 2000). Another proprietary algorithm is implemented in (Reilly 2000). K. Ord and Lowe (1996) provide an early review of some of the commercial software that implement automatic ARIMA forecasting.

Choosing the model order using unit root tests and the AIC

A non-seasonal ARIMA(p, d, q) process is given by ϕ(B)(1 − B^d)y_t = c + θ(B)ε_t where {ε_t} is a white noise process with mean zero and variance σ², B is the backshift operator, and ϕ(z) and θ(z) are polynomials of order p and q respectively. To ensure causality and invertibility, it is assumed that ϕ(z) and θ(z) have no roots for |z| < 1 (Brockwell and Davis 1991). If c ≠ 0, there is an implied polynomial of order d in the forecast function.

The seasonal ARIMA(p, d, q)(P, D, Q)_m process is given by Φ(B^m)ϕ(B)(1 − B^m)^D(1 − B)^dy_t = c + Θ(B^m)θ(B)ε_t where Φ(z) and Θ(z) are polynomials of orders P and Q respectively, each containing no roots inside the unit circle. If c ≠ 0, there is an implied polynomial of order d + D in the forecast function.

The main task in automatic ARIMA forecasting is selecting an appropriate model order, that is the values p, q, P, Q, D, d. If d and D are known, we can select the orders p, q, P and Q via an information criterion such as the AIC: AIC = −2log (L) + 2(p + q + P + Q + k) where k = 1 if c ≠ 0 and 0 otherwise, and L is the maximized likelihood of the model fitted to the data (1 − B^m)^D(1 − B)^dy_t. The likelihood of the full model for y_t is not actually defined and so the value of the AIC for different levels of differencing are not comparable.

One solution to this difficulty is the ``diffuse prior’’ approach which is outlined in Durbin and Koopman (2001) and implemented in the function (Ripley 2002) in . In this approach, the initial values of the time series (before the observed values) are assumed to have mean zero and a large variance. However, choosing d and D by minimizing the AIC using this approach tends to lead to over-differencing. For forecasting purposes, we believe it is better to make as few differences as possible because over-differencing harms forecasts (Smith and Yadav 1994) and widens prediction intervals. (Although, see Hendry 1997 for a contrary view.)

Consequently, we need some other approach to choose d and D. We prefer unit-root tests. However, most unit-root tests are based on a null hypothesis that a unit root exists which biases results towards more differences rather than fewer differences. For example, variations on the Dickey-Fuller test (Dickey and Fuller 1981) all assume there is a unit root at lag 1, and the HEGY test of Hylleberg et al. (1990) is based on a null hypothesis that there is a seasonal unit root. Instead, we prefer unit-root tests based on a null hypothesis of no unit-root.

For non-seasonal data, we consider ARIMA(p, d, q) models where d is selected based on successive KPSS unit-root tests (Kwiatkowski et al. 1992). That is, we test the data for a unit root; if the test result is significant, we test the differenced data for a unit root; and so on. We stop this procedure when we obtain our first insignificant result.

For seasonal data, we consider ARIMA(p, d, q)(P, D, Q)_m models where m is the seasonal frequency and D = 0 or D = 1 depending on an extended Canova-Hansen test (Canova and Hansen 1995). Canova and Hansen only provide critical values for 2 < m < 13. In our implementation of their test, we allow any value of m > 1. Let C_m be the critical value for seasonal period m. We plotted C_m against m for values of m up to 365 and noted that they fit the line C_m = 0.269m^0.928 almost exactly. So for m > 12, we use this simple expression to obtain the critical value.

We note in passing that the null hypothesis for the Canova-Hansen test is not an ARIMA model as it includes seasonal dummy terms. It is a test for whether the seasonal pattern changes sufficiently over time to warrant a seasonal unit root, or whether a stable seasonal pattern modelled using fixed dummy variables is more appropriate. Nevertheless, we have found that the test is still useful for choosing D in a strictly ARIMA framework (i.e., without seasonal dummy variables). If a stable seasonal pattern is selected (i.e., the null hypothesis is not rejected), the seasonality is effectively handled by stationary seasonal AR and MA terms.

After D is selected, we choose d by applying successive KPSS unit-root tests to the seasonally differenced data (if D = 1) or the original data (if D = 0). Once d (and possibly D) are selected, we proceed to select the values of p, q, P and Q by minimizing the AIC. We allow c ≠ 0 for models where d + D < 2.

A step-wise procedure for traversing the model space

Suppose we have seasonal data and we consider ARIMA(p, d, q)(P, D, Q)_m models where p and q can take values from 0 to 3, and P and Q can take values from 0 to 1. When c = 0 there is a total of 288 possible models, and when c ≠ 0 there is a total of 192 possible models, giving 480 models altogether. If the values of p, d, q, P, D and Q are allowed to range more widely, the number of possible models increases rapidly. Consequently, it is often not feasible to simply fit every potential model and choose the one with the lowest AIC. Instead, we need a way of traversing the space of models efficiently in order to arrive at the model with the lowest AIC value.

We propose a step-wise algorithm as follows. There are several constraints on the fitted models to avoid problems with convergence or near unit-roots. The constraints are outlined below.

The algorithm is guaranteed to return a valid model because the model space is finite and at least one of the starting models will be accepted (the model with no AR or MA parameters). The selected model is used to produce forecasts.

Comparisons with exponential smoothing

There is a widespread myth that ARIMA models are more general than exponential smoothing. This is not true. The two classes of models overlap. The linear exponential smoothing models are all special cases of ARIMA models—the equivalences are discussed in Hyndman, Akram, and Archibald (2008). However, the non-linear exponential smoothing models have no equivalent ARIMA counterpart. On the other hand, there are many ARIMA models which have no exponential smoothing counterpart. Thus, the two model classes overlap and are complimentary; each has its strengths and weaknesses.

The exponential smoothing state space models are all non-stationary. Models with seasonality or non-damped trend (or both) have two unit roots; all other models—that is, non-seasonal models with either no trend or damped trend—have one unit root. It is possible to define a stationary model with similar characteristics to exponential smoothing, but this is not normally done. The philosophy of exponential smoothing is that the world is non-stationary. So if a stationary model is required, ARIMA models are better.

One advantage of the exponential smoothing models is that they can be non-linear. So time series that exhibit non-linear characteristics including heteroscedasticity may be better modelled using exponential smoothing state space models.

For seasonal data, there are many more ARIMA models than the 30 possible models in the exponential smoothing class of Section . It may be thought that the larger model class is advantageous. However, the results in Hyndman et al. (2002) show that the exponential smoothing models performed better than the ARIMA models for the seasonal M3 competition data. (For the annual M3 data, the ARIMA models performed better.) In a discussion of these results, Hyndman (2001) speculates that the larger model space of ARIMA models actually harms forecasting performance because it introduces additional uncertainty. The smaller exponential smoothing class is sufficiently rich to capture the dynamics of almost all real business and economic time series.

The forecast package

The algorithms and modelling frameworks for automatic univariate time series forecasting are implemented in the package in . We illustrate the methods using the following four real time series shown in Figure .

$Four time series showing point forecasts and 80\% \& 95\% prediction intervals obtained using exponential smoothing state space models.$

Four time series showing point forecasts and 80% & 95% prediction intervals obtained using exponential smoothing state space models.

Implementation of the automatic exponential smoothing algorithm

The innovations state space modelling framework described in Section is implemented via the function in the package. (The default settings of do not allow models with multiplicative trend, but they can be included using .) The models chosen via the algorithm for the four data sets were:

Although there is a lot of computation involved, it can be handled remarkably quickly on modern computers. Each of the forecasts shown in Figure took no more than a few seconds on a standard PC. The US electricity generation series took the longest as there are no analytical prediction intervals available for the ETS(M,M,N) model. Consequently, the prediction intervals for this series were computed using simulation of 5000 future sample paths.

To apply the algorithm to the US net electricity generation time series , we use the following command.

etsfit <- ets(usnetelec)

The object is of class ``’’ and contains all of the necessary information about the fitted model including model parameters, the value of the state vector x_t for all t, residuals and so on. Printing the object shows the main items of interest.

etsfit

## ETS(M,A,N) 
## 
## Call:
## ets(y = usnetelec)
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 0.2191 
## 
##   Initial states:
##     l = 254.9338 
##     b = 38.3125 
## 
##   sigma:  0.0259
## 
##      AIC     AICc      BIC 
## 634.0437 635.2682 644.0803

Some goodness-of-fit measures (defined in Hyndman and Koehler 2006) are obtained using .

accuracy(etsfit)

##                    ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 1.162583 52.00363 36.77721 0.2629582 1.942062 0.5211014
##                     ACF1
## Training set 0.006113498

There are also , , , , and methods for objects of class ``’’. The function shows time plots of the original time series along with the extracted components (level, growth and seasonal).

The function computes the required forecasts which are then plotted as in Figure (b).

Printing the object gives a table showing the prediction intervals.

fcast

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2004       3900.329 3770.801 4029.857 3702.233 4098.425
## 2005       3952.650 3747.279 4158.022 3638.562 4266.738
## 2006       4004.972 3725.589 4284.355 3577.692 4432.251
## 2007       4057.293 3701.885 4412.701 3513.743 4600.842
## 2008       4109.614 3674.968 4544.259 3444.881 4774.347
## 2009       4161.935 3644.367 4679.503 3370.383 4953.487
## 2010       4214.256 3609.881 4818.632 3289.944 5138.569
## 2011       4266.577 3571.428 4961.726 3203.439 5329.716
## 2012       4318.898 3528.985 5108.812 3110.830 5526.967
## 2013       4371.220 3482.552 5259.888 3012.119 5730.320

The function also provides the useful feature of applying a fitted model to a new data set. For example, we could withhold 10 observations from the data set when fitting, then compute the one-step forecast errors for the out-of-sample data.

fit <- ets(usnetelec[1:45])
test <- ets(usnetelec[46:55], model = fit)
accuracy(test)

We can also look at the measures of forecast accuracy where the forecasts are based on only the fitting data.

accuracy(forecast(fit,10), usnetelec[46:55])

The HoltWinters() function

There is another implementation of exponential smoothing in via the function (Meyer 2002) in the package. It implements only the (N,N), (A,N), (A,A) and (A,M) methods. The initial states x₀ are fixed using a heuristic algorithm. Because of the way the initial states are estimated, a full three years of seasonal data are required to implement the seasonal forecasts using . (See Hyndman and Kostenko (2007) for the minimal sample size required.) The smoothing parameters are optimized by minimizing the average squared prediction errors, which is equivalent to minimizing in the case of additive errors.

There is a method for the resulting object which can produce point forecasts and prediction intervals. Although it is nowhere documented, it appears that the prediction intervals produced by for an object of class are based on an equivalent ARIMA model in the case of the (N,N), (A,N) and (A,A) methods, assuming additive errors. These prediction intervals are equivalent to the prediction intervals that arise from the (A,N,N), (A,A,N) and (A,A,A) state space models. For the (A,M) method, the prediction interval provided by appears to be based on Chatfield and Yar (1991) which is an approximation to the true prediction interval arising from the (A,A,M) model. Prediction intervals with multiplicative errors are not possible using the function.

Implementation of the automatic ARIMA algorithm

$Four time series showing point forecasts and 80\% \& 95\% prediction intervals obtained using ARIMA models.$

Four time series showing point forecasts and 80% & 95% prediction intervals obtained using ARIMA models.

The algorithm of Section is applied to the same four time series. Unlike the exponential smoothing algorithm, the ARIMA class of models assumes homoscedasticity, which is not always appropriate. Consequently, transformations are sometimes necessary. For these four time series, we model the raw data for series (a)–(c), but the logged data for series (d). The prediction intervals are back-transformed with the point forecasts to preserve the probability coverage.

To apply this algorithm to the US net electricity generation time series , we use the following commands.

arimafit <- auto.arima(usnetelec)
fcast <- forecast(arimafit)
plot(fcast)

The function implements the algorithm of Section and returns an object of class . The resulting forecasts are shown in Figure . The fitted models are as follows:

Note that the parameterization has θ(B) = (1 + θ₁B + … + θ_qB) and ϕ(B) = (1 − ϕ₁B + … − ϕ_qB), and similarly for the seasonal terms.

A summary of the forecasts is available, part of which is shown below.

Forecast method: ARIMA(2,1,2) with drift
Series: usnetelec

Coefficients:
          ar1      ar2     ma1     ma2    drift
      -1.3032  -0.4332  1.5284  0.8340  66.1585
s.e.   0.2122   0.2084  0.1417  0.1185   7.5595

sigma^2 estimated as 2262:  log likelihood=-283.34
AIC=578.67   AICc=580.46   BIC=590.61

Error measures:
                   ME   RMSE    MAE      MPE   MAPE    MASE     ACF1
Training set 0.046402 44.894 32.333 -0.61771 2.1012 0.45813 0.022492

Forecasts:
     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
2004       3968.957 3908.002 4029.912 3875.734 4062.180
2005       3970.350 3873.950 4066.751 3822.919 4117.782
2006       4097.171 3971.114 4223.228 3904.383 4289.959
2007       4112.332 3969.691 4254.973 3894.182 4330.482
2008       4218.671 4053.751 4383.591 3966.448 4470.894
2009       4254.559 4076.108 4433.010 3981.641 4527.476
2010       4342.760 4147.088 4538.431 4043.505 4642.014
2011       4393.306 4185.211 4601.401 4075.052 4711.560
2012       4470.261 4248.068 4692.455 4130.446 4810.077
2013       4529.113 4295.305 4762.920 4171.535 4886.690

The training set error measures for the two models are very similar. Note that the information criteria are not comparable.

The package also contains the function which is largely a wrapper to the function in the package. The function in the package makes it easier to include a drift term when d + D = 1. (Setting in the function from the package will only work when d + D = 0.) It also provides the facility for fitting an existing ARIMA model to a new data set (as was demonstrated for the function earlier).

One-step forecasts for ARIMA models are now available via a function. We also provide a new function which returns the original time series after adjusting for regression variables. If there are no regression variables in the ARIMA model, then the errors will be identical to the original series. If there are regression variables in the ARIMA model, then the errors will be equal to the original series minus the effect of the regression variables, but leaving in the serial correlation that is modelled with the AR and MA terms. In contrast, provides true residuals, removing the AR and MA terms as well.

The generic functions , , and apply to models obtained from either the or functions.

The forecast() function

The function is generic and has S3 methods for a wide range of time series models. It computes point forecasts and prediction intervals from the time series model. Methods exist for models fitted using , , , , , and .

There is also a method for a object. If a time series object is passed as the first argument to , the function will produce forecasts based on the exponential smoothing algorithm of Section .

In most cases, there is an existing function which is intended to do much the same thing. Unfortunately, the resulting objects from the function contain different information in each case and so it is not possible to build generic functions (such as and ) for the results. So, instead, acts as a wrapper to , and packages the information obtained in a common format (the class). We also define a default method which is used when no existing function exists, and calls the relevant function. Thus, methods parallel methods, but the latter provide consistent output that is more usable.

The output from the function is an object of class ``’’ and includes at least the following information:

There are , and methods for the ``’’ class. Figures and were produced using the method.

The prediction intervals are, by default, computed for 80% and 95% coverage, although other values are possible if requested. Fan charts (Wallis 1999) are possible using the combination .

Other functions

We now briefly describe some of the other features of the package. Each of the following functions produces an object of class ``’’.

: implements the method of Croston (1972) for intermittent demand forecasting. In this method, the time series is decomposed into two separate sequences: the non-zero values and the time intervals between non-zero values. These are then independently forecast using simple exponential smoothing and the forecasts of the original series are obtained as ratios of the two sets of forecasts. No prediction intervals are provided because there is no underlying stochastic model (Shenstone and Hyndman 2005).

: provides forecasts from the Theta method (Assimakopoulos and Nikolopoulos 2000). Hyndman and Billah (2003) showed that these were equivalent to a special case of simple exponential smoothing with drift.

: gives cubic-spline forecasts, based on fitting a cubic spline to the historical data and extrapolating it linearly. The details of this method, and the associated prediction intervals, are discussed in Hyndman, King, et al. (2005).

: returns forecasts based on the historical mean.

: gives ``naïve’’ forecasts equal to the most recent observation assuming a random walk model. This function also allows forecasting using a random walk with drift.

In addition, there are some new plotting functions for time series.

: provides a time plot along with an ACF and PACF.

: produces a seasonal plot as described in Spyros Makridakis, Wheelwright, and Hyndman (1998).

Bibliography

Anderson, B. D. O., and J. B. Moore. 1979. Optimal Filtering. Englewood Cliffs: Prentice-Hall.

Aoki, Masanao. 1987. State Space Modeling of Time Series. Berlin: Springer-Verlag.

Assimakopoulos, V., and K. Nikolopoulos. 2000. “The Theta Model: A Decomposition Approach to Forecasting.” International Journal of Forecasting 16: 521–30.

Bowerman, B. L., R. T. O’Connell, and Anne B. Koehler. 2005. Forecasting, Time Series and Regression: An Applied Approach. Belmont CA: Thomson Brooks/Cole.

Brockwell, P. J., and R. A Davis. 1991. Time Series: Theory and Methods. 2nd ed. New York: Springer-Verlag.

Canova, F., and B. E. Hansen. 1995. “Are Seasonal Patterns Constant over Time? A Test for Seasonal Stability.” Journal of Business and Economic Statistics 13: 237–52.

Chatfield, Chris, and Mohammad Yar. 1991. “Prediction Intervals for Multiplicative Holt-Winters.” International Journal of Forecasting 7: 31–37.

Croston, J. D. 1972. “Forecasting and Stock Control for Intermittent Demands.” Operational Research Quarterly 23 (3): 289–304.

Development Core Team. 2008. : A Language and Environment for Statistical Computing. Vienna, Austria: Foundation for Statistical Computing. http://www.R-project.org/.

Dickey, D. A., and W. A. Fuller. 1981. “Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root.” Econometrica 49: 1057–71.

Durbin, J, and Siem J Koopman. 2001. Time Series Analysis by State Space Methods. Oxford: Oxford University Press.

Gardner, Everette S., Jr. 1985. “Exponential Smoothing: The State of the Art.” Journal of Forecasting 4: 1–28.

Gardner, Everette S., Jr, and Ed McKenzie. 1985. “Forecasting Trends in Time Series.” Management Science 31 (10): 1237–46.

Gómez, Victor. 1998. “Automatic Model Identification in the Presence of Missing Observations and Outliers.” Working paper D-98009. Ministerio de Economı́a y Hacienda, Dirección General de Análisis y Programación Presupuestaria.

Gómez, Victor, and Agustín Maravall. 1998. “Programs and , Instructions for the Users.” Working paper 97001. Beta version. Ministerio de Economı́a y Hacienda, Dirección General de Análisis y Programación Presupuestaria.

Goodrich, Robert L. 2000. “The Methodology.” International Journal of Forecasting 16 (4): 533–35.

Hannan, E. J., and J. Rissanen. 1982. “Recursive Estimation of Mixed Autoregressive-Moving Average Order.” Biometrika 69 (1): 81–94.

Hendry, David F. 1997. “The Econometrics of Macroeconomic Forecasting.” The Economic Journal 107 (444): 1330–1357.

Hylleberg, S., R. Engle, C. Granger, and B. Yoo. 1990. “Seasonal Integration and Cointegration.” Journal of Econometrics 44: 215–38.

Hyndman, Rob J. 2001. “It’s Time to Move from ‘What’ to ‘Why’—Comments on the M3-Competition.” International Journal of Forecasting 17 (4): 567–70.

Hyndman, Rob J, Md Akram, and Blyth C Archibald. 2008. “The Admissible Parameter Space for Exponential Smoothing Models.” Annals of the Institute of Statistical Mathematics 60 (2): 407–26.

Hyndman, Rob J, and Baki Billah. 2003. “Unmasking the Theta Method.” International Journal of Forecasting 19 (2): 287–90.

Hyndman, Rob J, and Yeasmin Khandakar. 2008. “Automatic Time Series Forecasting: The Forecast Package for r.” Journal of Statistical Software 27.

Hyndman, Rob J, Maxwell L. King, Ivet Pitrun, and Baki Billah. 2005. “Local Linear Forecasts Using Cubic Smoothing Splines.” Australian & New Zealand Journal of Statistics 47 (1): 87–99.

Hyndman, Rob J, and Anne B Koehler. 2006. “Another Look at Measures of Forecast Accuracy.” International Journal of Forecasting 22: 679–88.

Hyndman, Rob J, Anne B Koehler, J Keith Ord, and Ralph D Snyder. 2005. “Prediction Intervals for Exponential Smoothing Using Two New Classes of State Space Models.” Journal of Forecasting 24: 17–37.

———. 2008. Forecasting with Exponential Smoothing: The State Space Approach. Springer-Verlag. http://www.exponentialsmoothing.net/.

Hyndman, Rob J, Anne B Koehler, Ralph D Snyder, and Simone Grose. 2002. “A State Space Framework for Automatic Forecasting Using Exponential Smoothing Methods.” International Journal of Forecasting 18 (3): 439–54.

Hyndman, Rob J, and Andrey V Kostenko. 2007. “Minimum Sample Size Requirements for Seasonal Forecasting Models.” Foresight: The International Journal of Applied Forecasting 6: 12–15.

Kwiatkowski, Denis, Peter C. B. Phillips, Peter Schmidt, and Yongcheol Shin. 1992. “Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root.” Journal of Econometrics 54: 159–78.

Liu, L. M. 1989. “Identification of Seasonal Arima Models Using a Filtering Method.” Communications in Statistics: Theory & Methods 18: 2279–88.

Makridakis, S., A. Anderson, R. Carbone, R. Fildes, M. Hibon, R. Lewandowski, J. Newton, E. Parzen, and R. Winkler. 1982. “The Accuracy of Extrapolation (Time Series) Methods: Results of a Forecasting Competition.” Journal of Forecasting 1: 111–53.

Makridakis, Spyros, and Michele Hibon. 2000. “The M3-Competition: Results, Conclusions and Implications.” International Journal of Forecasting 16: 451–76.

Makridakis, Spyros, Steven C. Wheelwright, and Rob J Hyndman. 1998. Forecasting: Methods and Applications. 3rd ed. New York: John Wiley & Sons. http://www.robhyndman.info/forecasting/.

Mélard, G., and J.-M Pasteels. 2000. “Automatic ARIMA Modeling Including Intervention, Using Time Series Expert Software.” International Journal of Forecasting 16: 497–508.

Meyer, David. 2002. “Naive Time Series Forecasting Methods.” News 2 (2): 7–10. http://CRAN.R-project.org/doc/Rnews/.

Ord, J. Keith, Anne B. Koehler, and Ralph D. Snyder. 1997. “Estimation and Prediction for a Class of Dynamic Nonlinear Statistical Models.” Journal of the American Statistical Association 92: 1621–29.

Ord, Keith, and Sam Lowe. 1996. “Automatic Forecasting.” The American Statistician 50 (1): 88–94.

Reilly, David. 2000. “The System.” International Journal of Forecasting 16 (4): 531–33.

Ripley, Brian D. 2002. “Time Series in 1.5.0.” News 2 (2): 2–7. http://CRAN.R-project.org/doc/Rnews/.

Shenstone, Lydia, and Rob J Hyndman. 2005. “Stochastic Models Underlying Croston’s Method for Intermittent Demand Forecasting.” Journal of Forecasting 24: 389–402.

Smith, Jeremy, and Sanjay Yadav. 1994. “Forecasting Costs Incurred from Unit Differencing Fractionally Integrated Processes.” International Journal of Forecasting 10 (4): 507–14.

Taylor, James W. 2003. “Exponential Smoothing with a Damped Multiplicative Trend.” International Journal of Forecasting 19: 715–25.

Wallis, K. F. 1999. “Asymmetric Density Forecasts of Inflation and the Bank of England’s Fan Chart.” National Institute Economic Review 167 (1): 106–12.