2.1.5.1.2 Algorithm for ARIMA

An ARIMA model is an autoregressive integrated moving average model; it may include autoregressive (AR), moving average (MA), and differencing terms. In this app, the NAG function nag_tsa_multi_inp_model_estim (g13bec) is used to fit an ARIMA model [1], and the NAG function nag_tsa_multi_inp_model_forecast (g13bjc) is used to forecast future values from a fitted ARIMA model [2].

ARIMA Model

For a general ARIMA model,

\begin{equation}\tag{1}
\begin{split}
\nabla ^d \nabla_s^D y_t &= c + w_t \\
w_t &= \Phi_1 w_{t-s} + \Phi_2 w_{t-2s} + ... + \Phi_P w_{t-Ps} + e_t - \Theta_1 e_{t-s} - \Theta_2 e_{t-2s} - ... - \Theta_Q e_{t-Qs} \\
e_t &= \phi_1 e_{t-1} + \phi_2 e_{t-2} + ... + \phi_p e_{t-p} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - ... - \theta_q a_{t-q}
\end{split}
\end{equation}

where y_t is the input time series (t = 1, ..., n); P, Q, D, p, q, and d are the orders of the seasonal autoregressive, seasonal moving average, seasonal differencing, autoregressive, moving average, and differencing terms, respectively; s is the seasonal period; c is the mean of the differenced series; \Phi_i (i = 1, ..., P), \Theta_i (i = 1, ..., Q), \phi_i (i = 1, ..., p), and \theta_i (i = 1, ..., q) are the seasonal autoregressive, seasonal moving average, autoregressive, and moving average coefficients, respectively; and a_t is the residual.
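The recursion in equation (1) can be sketched directly. The following Python snippet is an illustration only, not the NAG implementation; the function name, the argument layout, and the zero initialization of pre-sample terms are assumptions. It recovers w_t, e_t, and the residuals a_t from a series and a given set of parameters:

```python
import numpy as np

def arima_recursion(y, c, d, D, s, phi, theta, Phi, Theta):
    """Run the recursion of equation (1) and return (w, e, a).

    Minimal sketch: pre-sample values of w, e, and a are taken as zero
    rather than back-forecast as a full implementation would.
    """
    z = np.asarray(y, dtype=float)
    for _ in range(d):                       # ordinary differencing, d times
        z = z[1:] - z[:-1]
    for _ in range(D):                       # seasonal differencing, D times
        z = z[s:] - z[:-s]

    w = z - c                                # w_t = nabla^d nabla_s^D y_t - c
    n = len(w)
    e = np.zeros(n)
    a = np.zeros(n)
    for t in range(n):
        # Seasonal ARMA line of equation (1), solved for e_t.
        e[t] = (w[t]
                - sum(Phi[i] * w[t - (i + 1) * s] for i in range(len(Phi)) if t - (i + 1) * s >= 0)
                + sum(Theta[i] * e[t - (i + 1) * s] for i in range(len(Theta)) if t - (i + 1) * s >= 0))
        # Non-seasonal ARMA line of equation (1), solved for the residual a_t.
        a[t] = (e[t]
                - sum(phi[i] * e[t - i - 1] for i in range(len(phi)) if t - i - 1 >= 0)
                + sum(theta[i] * a[t - i - 1] for i in range(len(theta)) if t - i - 1 >= 0))
    return w, e, a
```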

Estimation

The residual series a_t can be obtained from y_t via equation (1). The sum of squares of the residuals is:

\begin{equation}\tag{2}S = \sum_{t=-\infty}^{n} a_t^2 \end{equation}
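Continuing the sketch above (the series y and all parameter values here are purely illustrative), the sum of squares in equation (2) reduces to:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=120).cumsum()            # an illustrative observed series

# Residuals from the sketch above, then the objective of equation (2)
# (summed over the observed span only in this sketch).
_, _, a = arima_recursion(y, c=0.0, d=1, D=0, s=12, phi=[0.5], theta=[0.2], Phi=[], Theta=[])
S = np.sum(a ** 2)
```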

Estimation Criterion

Three criteria are available:

  • Least Squares
 D = S

Iterate by minimizing D.

  • Exact Likelihood

y_i, \; i = 0, -1, ..., are treated as unobserved random variables with a known distribution.

 D = M \times S

where the multiplier M is a function of the ARIMA model parameters.

Minimizing D is equivalent to maximizing the exact likelihood of the data.

  • Marginal Likelihood
 D = M \times S

but with a different value of M. It differs from the exact likelihood method only if a mean term is included in the model.

In this app, the Marquardt method [4] is used to minimize the objective function D.
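As an illustration only, SciPy's Levenberg-Marquardt implementation (scipy.optimize.least_squares with method="lm") can stand in for the Marquardt iteration. The sketch below fits the simple least-squares criterion D = S (no multiplier M), using the arima_recursion helper and the illustrative series y from the previous sketches; the orders and starting values are assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def residual_vector(params, y, d, D, s, p, q, P, Q):
    """Unpack the parameter vector and return a_t, so that the implicit
    objective sum(a_t^2) is the least-squares criterion D = S."""
    c     = params[0]
    phi   = params[1:1 + p]
    theta = params[1 + p:1 + p + q]
    Phi   = params[1 + p + q:1 + p + q + P]
    Theta = params[1 + p + q + P:]
    return arima_recursion(y, c, d, D, s, phi, theta, Phi, Theta)[2]

# Fit a (p, d, q) = (1, 1, 1) model with no seasonal part (illustrative orders)
# to the illustrative series y, starting from zero parameter values.
x0 = np.zeros(3)                             # [c, phi_1, theta_1]
fit = least_squares(residual_vector, x0, method="lm",
                    args=(y, 1, 0, 12, 1, 1, 0, 0))
c_hat, phi_hat, theta_hat = fit.x
```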

Quantities

  • Residual a_t

Residuals are available for t \ge 1 + d + s \times D.

  • Fitted y

\hat{y}_t = y_t - a_t

  • Residual Degrees of Freedom

The length of the differenced series is N = n - d - s \times D, and df = N - (\text{number of parameters}).

  • Residual Variance

erv = \frac{S}{df}

  • Covariance Matrix of Parameters

C = erv \times H^{-1}

where H is the linearised least squares matrix in the final iteration. These quantities are illustrated in the sketch below.
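Under the assumptions of the previous sketches (fit is the least_squares result, fit.fun and fit.jac are the residuals and Jacobian at the optimum, and J^T J stands in for the linearised least squares matrix H), the quantities above might be computed as:

```python
import numpy as np

a_hat = fit.fun                              # residuals a_t at the optimum
y_hat = y[-len(a_hat):] - a_hat              # fitted values, y_hat_t = y_t - a_t
N     = len(a_hat)                           # length of the differenced series
df    = N - len(fit.x)                       # residual degrees of freedom
erv   = np.sum(a_hat ** 2) / df              # residual variance
H     = fit.jac.T @ fit.jac                  # linearised least-squares matrix
C     = erv * np.linalg.inv(H)               # covariance matrix of the parameters
```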

Forecast

To predict the time series y_t at t = n + 1, ..., n + L, set a_t = 0 for t = n + 1, ..., n + L, and calculate the predicted values by reversing equation (1):

\begin{equation}\tag{3}
\begin{split}
e_t &= \phi_1 e_{t-1} + \phi_2 e_{t-2} + ... + \phi_p e_{t-p} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - ... - \theta_q a_{t-q} \\
w_t &= \Phi_1 w_{t-s} + \Phi_2 w_{t-2s} + ... + \Phi_P w_{t-Ps} + e_t - \Theta_1 e_{t-s} - \Theta_2 e_{t-2s} - ... - \Theta_Q e_{t-Qs} \\
y_t &= (\nabla ^d \nabla_s^D)^{-1} (c + w_t)
\end{split}
\end{equation}
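A minimal sketch of this forecast recursion, reusing the arima_recursion helper, the illustrative series y, and the fitted values c_hat, phi_hat, theta_hat from the earlier sketches (again, not the NAG implementation; pre-sample terms are taken as zero):

```python
import numpy as np

def arima_forecast(y, L, c, d, D, s, phi, theta, Phi, Theta):
    """Forecast y_{n+1}, ..., y_{n+L} via equation (3) with future a_t = 0."""
    y = np.asarray(y, dtype=float)
    w, e, a = (list(v) for v in arima_recursion(y, c, d, D, s, phi, theta, Phi, Theta))
    n = len(w)

    for t in range(n, n + L):                # extend the recursion L steps ahead
        a.append(0.0)                        # future shocks are set to zero
        e.append(sum(phi[i] * e[t - i - 1] for i in range(len(phi)) if t - i - 1 >= 0)
                 - sum(theta[i] * a[t - i - 1] for i in range(len(theta)) if t - i - 1 >= 0))
        w.append(sum(Phi[i] * w[t - (i + 1) * s] for i in range(len(Phi)) if t - (i + 1) * s >= 0)
                 + e[t]
                 - sum(Theta[i] * e[t - (i + 1) * s] for i in range(len(Theta)) if t - (i + 1) * s >= 0))

    z_new = np.array(w[n:]) + c              # forecasts of the differenced series

    # Undo the differencing, inverting the last-applied operator first.
    stages, z = [], y
    for _ in range(d):
        stages.append((1, z))
        z = z[1:] - z[:-1]
    for _ in range(D):
        stages.append((s, z))
        z = z[s:] - z[:-s]
    for step, prev in reversed(stages):
        prev = list(prev)
        for v in z_new:
            prev.append(v + prev[-step])     # e.g. y_{n+1} = z_{n+1} + y_{n+1-step}
        z_new = np.array(prev[-L:])
    return z_new

# Forecast 12 steps ahead with the illustrative parameters estimated above.
forecasts = arima_forecast(y, 12, c_hat, 1, 0, 12, [phi_hat], [theta_hat], [], [])
```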

The forecast error variance of y_{n+L} can be calculated as:

S_L^2 = V_n \times (\psi_0^2 + \psi_1^2 + ... + \psi_{L-1}^2)

where V_n is the residual variance of the ARIMA model, and \psi_i are the "psi-weights" of the model as defined in [3].
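The psi-weights can be obtained by writing the model in operator form, \phi(B)\Phi(B^s)\nabla^d\nabla_s^D y_t = c + \theta(B)\Theta(B^s) a_t, and equating coefficients of powers of B. The sketch below is one way to compute them; the helper name and argument layout are assumptions, and erv from the earlier sketch stands in for V_n.

```python
import numpy as np

def psi_weights(L, d, D, s, phi, theta, Phi, Theta):
    """First L psi-weights of the ARIMA model, following [3] (a sketch).

    The model is written as phi(B) Phi(B^s) (1-B)^d (1-B^s)^D y_t = theta(B) Theta(B^s) a_t,
    so the differencing operators are folded into the AR polynomial.
    """
    def poly(coeffs, step):                  # 1 - c_1 B^step - c_2 B^{2 step} - ...
        p = np.zeros(step * len(coeffs) + 1)
        p[0] = 1.0
        for i, ci in enumerate(coeffs, start=1):
            p[i * step] = -ci
        return p

    ar = np.convolve(poly(phi, 1), poly(Phi, s))
    for _ in range(d):
        ar = np.convolve(ar, [1.0, -1.0])    # multiply by (1 - B)
    for _ in range(D):
        ar = np.convolve(ar, poly([1.0], s)) # multiply by (1 - B^s)
    ma = np.convolve(poly(theta, 1), poly(Theta, s))

    psi = np.zeros(L)
    psi[0] = 1.0
    for j in range(1, L):                    # coefficient matching: psi(B) ar(B) = ma(B)
        mj = ma[j] if j < len(ma) else 0.0
        psi[j] = mj - sum(ar[k] * psi[j - k] for k in range(1, min(j, len(ar) - 1) + 1))
    return psi

# Forecast error variance of y_{n+L}, with erv from the earlier sketch as V_n.
L = 12
psi = psi_weights(L, d=1, D=0, s=12, phi=[phi_hat], theta=[theta_hat], Phi=[], Theta=[])
S_L2 = erv * np.sum(psi ** 2)
```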

Reference

  1. nag_tsa_multi_inp_model_estim (g13bec)
  2. nag_tsa_multi_inp_model_forecast (g13bjc)
  3. George E. P. Box and Gwilym M. Jenkins (1976). Time Series Analysis: Forecasting and Control (Revised Edition). Holden-Day.
  4. D. W. Marquardt (1963). "An algorithm for least-squares estimation of nonlinear parameters". J. Soc. Indust. Appl. Math. 11, 431-441.