
Fast estimation methods for time series models
in state-space form
Alfredo Garcia-Hiernaux∗
José Casals
Miguel Jerez
March 28, 2007
Abstract

We propose two new, fast and stable methods to estimate time series models written in their equivalent state-space form. They are useful both to obtain adequate initial conditions for a maximum-likelihood iteration and to provide final estimates when maximum likelihood is considered inadequate or computationally expensive. The state-space foundation of these procedures provides flexibility, as they can be applied to any linear fixed-coefficients model, such as ARIMA, VARMAX or structural time series models. A simulation exercise shows that their computational costs and finite-sample performance are very good.

Keywords: State-space models, subspace methods, Kalman filter, system identification, maximum likelihood
∗
Corresponding author. Departamento de Fundamentos del Análisis Económico II.
Facultad de Ciencias Económicas. Campus de Somosaguas. 28223 Madrid (SPAIN).
email: [email protected], tel: (+34) 91 394 26 26, fax: (+34) 91 394 25 91. The
authors gratefully acknowledge financial support from Ministerio de Educación y Ciencia,
ref. SEJ2005-07388.
1 Introduction
Many model estimation methods are based either on Least Squares (LS) or Maximum Likelihood (ML). The LS approach has clear advantages in terms of computational simplicity, stability and cost. However, it is limited in scope, as it cannot be directly applied to many models, such as those with moving average terms or multiplicative seasonality. On the other hand, ML typically requires iterative optimization, so its statistical efficiency and wide scope are somewhat balanced by increased instability, complexity and computational cost. Specifically, when modeling high-frequency time series, such as those generated by financial markets, it may be hard to accept the assumptions and costs implied by gaussian ML.
As the pros and cons of LS and ML are complementary, many works focus on devising methods that provide the best of both worlds. For example, Spliid (1983), Koreisha and Pukkila (1989, 1990), Flores and Serrano (2002) and Dufour and Pelletier (2004) extend ordinary and generalized LS to VARMAX modeling. Also, Lütkepohl and Poskitt (1996) propose an LS-based method to specify and estimate a canonical VARMA structure. Finally, Francq and Zakoïan (2000) provide weak GARCH representations that can be estimated using two-stage LS. All these methods share two substantial limitations. First, they avoid iteration when there are no exclusion constraints, but imposing such restrictions requires iterating, and this partially offsets their computational advantage. Second, these procedures are typically devised for specific parameterizations, such as VARMAX, and their implementation in other frameworks is not trivial.
In this paper we propose two fast and stable algorithms to estimate time series models written in their equivalent State-Space (SS) form. They are useful either to obtain adequate initial conditions for an ML iteration or to provide final estimates when ML is considered inadequate or computationally expensive. Moreover, their SS foundation provides a wide scope, as they can be applied to any linear fixed-coefficients model with an equivalent SS form.
Both algorithms are based on subspace methods; see Favoreel et al. (2000), Katayama (2005), or Bauer (2005). The first method, SUBEST1, consists of estimating the parametric matrices in the SS model by solving a weighted reduced-rank LS problem defined over a set of subspace regressions. Iteration is required when there are constraints relating the parameters in the standard and SS representations of the model. Any exclusion or fixed-value constraint is treated in the same way. Monte Carlo experiments show that this method is very fast and provides estimates close to those of ML in finite samples. However, when compared with ML, it tends to underestimate the MA parameters.
The second and more sophisticated procedure, SUBEST2, provides a gaussian ML solution to the subspace regressions. Simulations show that it improves over SUBEST1, both in precision and in dispersion of the estimates, while keeping a substantial advantage over ML in terms of computational cost.
The structure of the paper is as follows. Section 2 defines the basic notation and previous results. Building on them, Section 3 derives the estimation
algorithms and Section 4 discusses their performance in finite samples. In
Section 5 we discuss the conditions under which each method is more adequate and indicate how to obtain a free MATLAB Toolbox implementing
them.
2 Notation and previous results

2.1 General state-space model
Consider an m-vector of endogenous outputs, zt , which is related to its
past and to the current and past values of an r-vector of exogenous inputs,
ut , through a time series model depending on a vector of unknown constant
parameters, β. Assume also that this model can be written in an equivalent
SS representation, such as:
x_{t+1} = Φ x_t + Γ u_t + E w_t                                                     (1a)
z_t = H x_t + D u_t + C v_t                                                         (1b)
where the matrices Φ, Γ, E, H, D and C depend on β; xt ∈ R^n is the state vector, and wt ∈ R^n and vt ∈ R^m are uncorrelated sequences of gaussian errors, independent of ut, with zero mean and such that E(wt wt') = Q, E(vt vt') = R and E(vt wt') = S. We will also assume this model to be minimal and stable. Minimality means that there is no SS model that can realize zt with fewer than n states. Stability requires all the eigenvalues of Φ to lie inside the unit circle.
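To fix ideas, the following minimal sketch (in Python, with arbitrary illustrative matrices that are not taken from the paper) simulates data from a model of the form (1a)-(1b) with no inputs and with wt and vt independent; it is only intended to generate test series for the procedures discussed below.

    import numpy as np

    def simulate_ss(Phi, H, E, C, Q, R, T, rng):
        """Draw T observations z_t from (1a)-(1b) with u_t = 0 and S = 0."""
        n, m = Phi.shape[0], H.shape[0]
        x = np.zeros(n)
        z = np.zeros((m, T))
        for t in range(T):
            w = rng.multivariate_normal(np.zeros(n), Q)   # state noise w_t
            v = rng.multivariate_normal(np.zeros(m), R)   # observation noise v_t
            z[:, t] = H @ x + C @ v                       # equation (1b)
            x = Phi @ x + E @ w                           # equation (1a)
        return z

    rng = np.random.default_rng(0)
    Phi = np.array([[.4, -.3], [1.0, 0.0]])   # illustrative stable transition matrix
    H = np.array([[1.0, 0.0]])
    E, C = np.eye(2), np.eye(1)
    Q, R = 0.5 * np.eye(2), 0.1 * np.eye(1)
    z = simulate_ss(Phi, H, E, C, Q, R, T=300, rng=rng)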
Subspace methods derive from a matrix representation of the system (1a-1b). To obtain it, note that a recursive substitution in (1a) yields,

x_t = Φ^t x_0 + Σ_{j=0}^{t-1} Φ^j E w_{t-j-1} + Σ_{j=0}^{t-1} Φ^j Γ u_{t-j-1}       (2)

and substituting (2) into the observation equation (1b), we get:

z_t = H Φ^t x_0 + H Σ_{j=0}^{t-1} Φ^j Γ u_{t-j-1} + D u_t + H Σ_{j=0}^{t-1} Φ^j E w_{t-j-1} + C v_t    (3)
so the endogenous variable, zt , depends on the initial state vector, x0 , the
present and past values of the inputs, ut , and both errors wt and vt . Under
these conditions, equation (3) can be written in matrix form as,
Z_f = O_i X_f + H_i^u U_f + H_i^w W_f + H_i^v V_f                                   (4)
The regression model (4) includes both data-dependent and parameter-dependent matrices. The data-dependent matrices are the Block-Hankel Matrices (BHMs) Zf, Uf, Wf and Vf, as well as the state sequence Xf. The row space of these matrices, which is denoted by their subscript and will be defined in (6), depends on the parameter i. In this work, we determine i as the nearest integer to log T, where T is the sample size. This heuristic choice is supported by empirical experience, but our procedures could be implemented with any other choice, such as those suggested by Van Overschee and De Moor (1994), Deistler et al. (1995), Ljung (1999) or Bauer (2005).
The BHM Zf organizes the “future” values of the outputs from t = i onwards. Its structure is:

Z_f = Z_{i:2i-1} = [ z_i        z_{i+1}    ...    z_{T-i}
                     z_{i+1}    z_{i+2}    ...    z_{T-i+1}
                      ...        ...                ...
                     z_{2i-1}   z_{2i}     ...    z_{T-1}   ]                       (5)
and the complement of (5) is the matrix of “past” values Zp, whose structure is:

Z_p = Z_{0:i-1} = [ z_0        z_1      ...    z_{T-2i}
                    z_1        z_2      ...    z_{T-2i+1}
                     ...        ...              ...
                    z_{i-1}    z_i      ...    z_{T-i-1}  ]                         (6)
The matrix Zp will be employed in Section 3. The other BHMs Uf, Wf and Vf have the same structure as Zf but depend, respectively, on ut, wt and vt instead of zt. As shorthand notation, for any BHM A we write: A = A_{0:2i-1}, A_{p+} = A_{0:i}, A_{pr} = A_{i:i} and A_{f+} = A_{i+1:2i-1}.
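As an illustration, the following minimal sketch (in Python; the helper block_hankel is hypothetical and is not part of the E4 toolbox) builds Zp and Zf from an observed m × T output series, with i chosen as the nearest integer to log T, as used throughout the paper.

    import numpy as np

    def block_hankel(z, first, last, i):
        """Return the block-Hankel matrix A_{first:last} of (5)-(6).

        z is an (m, T) array; the result has (last-first+1)*m rows and
        T - 2*i + 1 columns, its k-th block row holding z_{first+k}, ...,
        z_{first+k+T-2i}."""
        m, T = z.shape
        ncols = T - 2 * i + 1
        return np.vstack([z[:, first + k: first + k + ncols]
                          for k in range(last - first + 1)])

    # Example with a simulated univariate series
    rng = np.random.default_rng(0)
    z = rng.standard_normal((1, 300))
    T = z.shape[1]
    i = int(round(np.log(T)))               # heuristic choice used in the paper
    Zp = block_hankel(z, 0, i - 1, i)       # "past" outputs, equation (6)
    Zf = block_hankel(z, i, 2 * i - 1, i)   # "future" outputs, equation (5)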
The last data-dependent matrix is the state sequence Xf, given by:

X_f = ( x_i   x_{i+1}   x_{i+2}   ...   x_{T-i} ),   with f = i.                    (7)
The states in Xf are unobserved, so they must be estimated from the
sample. In Section 3.1 we describe how to do this.
On the other hand, equation (4) also depends on the following parameter-related matrices: the Extended Observability matrix, O_i, the block lower-triangular Toeplitz matrices, H_i^u and H_i^w, and the block-diagonal matrix H_i^v. These matrices are defined by:
O_i = [ H'   (HΦ)'   (HΦ^2)'   ...   (HΦ^{i-1})' ]'  ∈ R^{im×n},                    (8)

H_i^u = [ D            0            0            ...   0
          HΓ           D            0            ...   0
          HΦΓ          HΓ           D            ...   0
           ...          ...          ...               ...
          HΦ^{i-2}Γ    HΦ^{i-3}Γ    HΦ^{i-4}Γ    ...   D  ]  ∈ R^{im×ir},           (9)

H_i^w = [ 0            0            0            ...   0
          HE           0            0            ...   0
          HΦE          HE           0            ...   0
           ...          ...          ...               ...
          HΦ^{i-2}E    HΦ^{i-3}E    HΦ^{i-4}E    ...   0  ]  ∈ R^{im×im},           (10)
and

H_i^v = [ C   0   ...   0
          0   C   ...   0
          ...  ...       ...
          0   0   ...   C ]  ∈ R^{im×im}                                            (11)

2.2 Steady-state innovations model
Our algorithms become simpler and more efficient when applied to a special SS representation known as the steady-state innovations model, hereafter "innovations model". An innovations model is given by:
x_{t+1} = Φ x_t + Γ u_t + E ψ_t                                                     (12a)
z_t = H x_t + D u_t + ψ_t                                                           (12b)
Note that this model can be viewed as a special case of (1a-1b) with C = Im, the m-by-m identity matrix, and wt = vt = ψt. Besides the assumptions made about model (1a-1b), we add a strict minimum-phase condition, so that all the eigenvalues of (Φ − EH) lie inside the unit circle.

The innovations form (12a-12b) is convenient because it is both general and computationally efficient.
It is general for two reasons. First, many time series models, including rational transfer functions and VARMAX, can be written directly in the form (12a-12b), see Hannan and Deistler (1988), since they have only one source of unobserved errors. Second, any model in the form (1a-1b), such as the Structural Time Series Models (STSM, Harvey, 1989), can be written under weak assumptions in its equivalent innovations form (see Casals et al., 1999, Theorem 1).

It is efficient because its structure has fewer parameters than the general form (1a-1b), which reduces complexity and computational cost and improves the conditioning of the likelihood. Specifically, when the model is written in the innovations form (12a-12b), equation (4) collapses to the simpler form:
Z_f = O_i X_f + H_i^u U_f + H_i^ψ Ψ_f                                               (13)
where H_i^w and H_i^v are replaced with H_i^ψ, which is just like (10) but with Im on the main block diagonal.
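For reference, the following sketch (assuming candidate values of Φ, Γ, H and D are available as numpy arrays; it is an illustration, not the E4 implementation) builds the parameter-dependent matrices O_i of (8) and H_i^u of (9) that enter the subspace regressions used below.

    import numpy as np

    def extended_observability(Phi, H, i):
        """O_i = [H; H Phi; ...; H Phi^{i-1}], an (i*m, n) matrix as in (8)."""
        blocks, HPhi = [], H.copy()
        for _ in range(i):
            blocks.append(HPhi)
            HPhi = HPhi @ Phi
        return np.vstack(blocks)

    def toeplitz_input(Phi, Gamma, H, D, i):
        """H_i^u of (9): D on the block diagonal, H Phi^{k-1} Gamma below it."""
        m, r = D.shape
        Hu = np.zeros((i * m, i * r))
        markov = [D] + [H @ np.linalg.matrix_power(Phi, k) @ Gamma
                        for k in range(i - 1)]
        for row in range(i):
            for col in range(row + 1):
                Hu[row * m:(row + 1) * m, col * r:(col + 1) * r] = markov[row - col]
        return Hu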
3 Main results
The basic problem consists of estimating β, the vector of parameters in the original model. The parametric matrices Φ, Γ, E, H, D, C, Q, R and S in the SS representation (1a-1b) of the model will be, in general, nonlinear functions of β. In this section we describe two new algorithms, based on subspace methods, to estimate all these parametric matrices by iterating on β. To this end, we start by discussing how one can compute the state sequences in the matrix (7).
3.1 Estimation of the state sequences
As the final result does not depend on the specific SS representation employed, we will develop it from the simpler regression model (13). This model implies that the expectation of future output values, conditional on U, which contains the past and future information of the observed inputs, and on Zp, which contains the past information of the output, is,
E(Z_f | U, Z_p) = O_i E(X_f | U, Z_p) + H_i^u E(U_f | U, Z_p) + H_i^ψ E(Ψ_f | U, Z_p)    (14)
where Uf is included in U . As we assumed in Section 2, the innovations ψt
are independent of the inputs and, by construction, E(zt−j ψt ) = 0 ∀j > 0,
so (14) simplifies to,
E(Z_f | U, Z_p) = O_i E(X_f | U, Z_p) + H_i^u U_f                                   (15)
Rearranging the terms of this equation, we obtain,
E(X_f | U, Z_p) = O_i^† [E(Z_f | U, Z_p) − H_i^u U_f]                               (16)
which is an expression for the states conditional on the past of the output and on the observed inputs. As all these variables are observable, from (16) we can estimate the states by means of the orthogonal projection:
X̂_f = O_i^† (Z_f Π_{U,Z_p} − H_i^u U_f)                                             (17)
where O_i^† is the Moore-Penrose pseudo-inverse of O_i. Expression (17) contains parameters in O_i^† and H_i^u, while Zf Π_{U,Zp} denotes the orthogonal projection of Zf onto the row space of the stacked matrix formed by U and Zp.
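A minimal sketch of this computation, assuming the block-Hankel matrices Zf and Uf, the stacked matrix of U and Zp (here called W), and candidate values of O_i and H_i^u are already available as numpy arrays, is:

    import numpy as np

    def project_rows(A, W):
        """Orthogonal projection of the rows of A onto the row space of W,
        i.e. A @ Pi_W with Pi_W = W'(W W')^+ W, computed via least squares."""
        coef, *_ = np.linalg.lstsq(W.T, A.T, rcond=None)   # solve W' coef ~ A'
        return (W.T @ coef).T

    def estimate_states(Zf, Uf, W, Oi, Hiu):
        """X_hat_f = Oi^+ (Zf Pi_{U,Zp} - Hiu Uf), as in equation (17)."""
        Zf_proj = project_rows(Zf, W)
        return np.linalg.pinv(Oi) @ (Zf_proj - Hiu @ Uf)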
3.2 Algorithm SUBEST1

3.2.1 Models in innovations form
We will now set up a generalized nonlinear least-squares problem for all the parameters β. To do so, we start by writing equation (13) as:
Z_{f+} = O_{i-1} [(Φ − EH) X_f + E Z_{pr} + (Γ − ED) U_{pr}] + H_{i-1}^u U_{f+} + H_{i-1}^ψ Ψ_{f+}    (18)
see Appendix A, where Xf and Ψf+ are unobserved components. The first addend in the square brackets corresponds to the states related to the past, Zp and Up (see equation A.3). The second and third addends are updating terms related to the present. Now, the expectation of (18) conditional on U and Zp+ is:
E(Z_{f+} | U, Z_{p+}) = O_{i-1} [(Φ − EH) E(X_f | U, Z_{p+}) + E E(Z_{pr} | U, Z_{p+}) + (Γ − ED) E(U_{pr} | U, Z_{p+})]
                        + H_{i-1}^u E(U_{f+} | U, Z_{p+}) + H_{i-1}^ψ E(Ψ_{f+} | U, Z_{p+})    (19)
where, as Xf is only related to the past, one may replace E(Xf | U, Zp+) with E(Xf | U, Zp). Furthermore, since Upr and Uf+ are included in U, Zpr is included in Zp+, and E(Ψf+ | U, Zp+) = 0, equation (19) simplifies to,
E(Z_{f+} | U, Z_{p+}) = O_{i-1} [(Φ − EH) E(X_f | U, Z_p) + E Z_{pr} + (Γ − ED) U_{pr}] + H_{i-1}^u U_{f+}    (20)
Finally, replacing E(Xf | U, Zp) with its estimate X̂f given in (17), we obtain an estimate of the expected value of Zf+ conditional on U and Zp+:
Ẑ_{f+} = O_{i-1} [(Φ − EH) X̂_f + E Z_{pr} + (Γ − ED) U_{pr}] + H_{i-1}^u U_{f+}     (21)
Note that this equation includes all the parametric matrices in (12a-12b), Φ, Γ, E, H and D, which can therefore be estimated simultaneously by solving the weighted least-squares problem:
min_{Φ,Γ,E,H,D}  ‖ Ω^{-1/2} (Z_{f+} Π_{U,Z_{p+}} − Ẑ_{f+}) ‖_F^2                    (22)
where ‖·‖_F denotes the Frobenius norm and Ω is the covariance matrix of the prediction errors, which can be written as Ω = Zf+ Π⊥_{U,Zp+} Zf+', because Π⊥_{U,Zp+} is idempotent, where Π⊥ = I − Π. Note that Ω and Zf+ Π_{U,Zp+} are projections of data-dependent matrices, while Ẑf+, defined in (21), depends on both the data and the parameters. These projections can be efficiently computed using the QR decomposition (see Golub and Van Loan, 1996).
In the same way that the matrices in model (12a-12b) depend on β, the parametric matrices in the subspace equation (13) depend on the parameters of the SS representation. Therefore, solving (22) requires iterative techniques such as those described in, e.g., Dennis and Schnabel (1983).
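A sketch of how the criterion (22) can be evaluated and minimized, assuming a user-supplied function build_zhat(beta) that maps β into Ẑ_{f+} of (21) (a hypothetical helper standing in for the mapping from β to the SS matrices), is:

    import numpy as np

    def subest1_loss(beta, Zfp_proj, Omega, build_zhat):
        """Weighted criterion (22): || Omega^{-1/2} (Zf+ Pi_{U,Zp+} - Zhat_f+) ||_F^2,
        evaluated through a Cholesky factor of Omega (assumed positive definite)."""
        L = np.linalg.cholesky(Omega)
        resid = np.linalg.solve(L, Zfp_proj - build_zhat(beta))
        return float(np.sum(resid ** 2))

    # The criterion can then be minimized iteratively, e.g.
    #   from scipy.optimize import minimize
    #   res = minimize(subest1_loss, np.zeros(k),
    #                  args=(Zfp_proj, Omega, build_zhat))
    # where k is the number of free parameters and beta = 0 is the starting value.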
Finally, we compute the residual vector, Z̃pr , by shifting the time subscripts in (13):
Z̃_{pr} = Z_{pr} − Ô_i X̂_f − Ĥ_i^u U_{pr}                                            (23)
Note that the block-row dimension of Zpr is equal to one so, using just the first block row of the matrices Ô_i and Ĥ_i^u, we obtain,

Z̃_{pr} = Z_{pr} − Ĥ X̂_f − D̂ U_{pr}                                                  (24)
and, consequently, Q̂ = (T − 2i + 1)^{-1} Z̃pr Z̃pr', where T − 2i + 1 is the number of columns of Z̃pr, is an estimate of the innovations variance.
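The following sketch illustrates (24) and this variance estimate, assuming Zpr, Upr, the state estimate X̂f and the first-stage estimates Ĥ and D̂ are available as numpy arrays:

    import numpy as np

    def innovation_variance(Zpr, Xf_hat, Upr, H_hat, D_hat):
        """Residual row (24) and the innovations variance estimate Q_hat."""
        Ztilde = Zpr - H_hat @ Xf_hat - D_hat @ Upr
        Q_hat = (Ztilde @ Ztilde.T) / Ztilde.shape[1]   # divide by T - 2i + 1 columns
        return Ztilde, Q_hat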
3.2.2 General SS models
SUBEST1 can be applied to SS models with more than one source of
unobserved errors by using the following two-stage variant of the algorithm.
In the first step, we estimate the parameters in Φ, Γ, H and D by solving,
min_{Φ,Γ,H,D}  ‖ Ω^{-1/2} [Z_f Π_{U,Z_p} − O_i X̂_f − H_i^u U_f] ‖_F^2               (25)
where Ω = Zf Π⊥_{U,Zp} Zf' and the expression for the states is given in equation (17). Note that, while in (22) E is jointly estimated with the rest of the parametric matrices, here it will be calculated in a second stage.
In the second step, we use the states resulting from the solution of (25) to obtain initial conditions for the Kalman filter gain, Ki, and the prediction error covariance, Bi (see Appendix B). Then, we compute the estimates of E, C, Q, R and S by solving the constrained problem:
min_{E,C,Q,R,S}  ‖ [K̂_i'  B̂_i']' − [K̄'  B̄']' ‖_F^2                                  (26)
where the barred matrices, K̄ and B̄, were previously estimated (see Appendix B), and the matrices K̂i and B̂i in (26) must also satisfy the covariance equations of the Kalman filter propagated i times:

K̂_i = (Φ̂ P̂_i Ĥ' + Ê Ŝ Ĉ') B̂_i^{-1}
P̂_{i+1} = Φ̂ P̂_i Φ̂' + Ê Q̂ Ê' − K̂_i B̂_i K̂_i'                                          (27)
B̂_i = Ĥ P̂_i Ĥ' + Ĉ R̂ Ĉ'
Therefore, in this second step we compute estimates of E, C, Q, R and S such that they (a) return K̂i and B̂i when equations (27) are propagated i times, and (b) are close to the values K̄ and B̄ resulting from the previous stage.
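A minimal sketch of this propagation, assuming conformable numpy arrays and a user-supplied initial state covariance P0 (an assumption of the sketch, not specified in the paper), is:

    import numpy as np

    def propagate_covariances(Phi, H, E, C, Q, R, S, P0, i):
        """Iterate the Kalman covariance recursions (27) i times and
        return (K_i, B_i, P_i)."""
        P = P0
        for _ in range(i):
            B = H @ P @ H.T + C @ R @ C.T
            K = (Phi @ P @ H.T + E @ S @ C.T) @ np.linalg.inv(B)
            P = Phi @ P @ Phi.T + E @ Q @ E.T - K @ B @ K.T
        return K, B, P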
3.3 Algorithm SUBEST2
Our second algorithm requires the SS model to be written in innovations form. Actually, this is not a limitation, because in each iteration we can transform the input model into its equivalent innovations form using the algorithm of Casals et al. (1999). Note that we could not do this in SUBEST1, since the noise-related matrices E, C, Q, R and S cannot be jointly estimated with the remaining parametric matrices, which is why SUBEST1 implements a two-stage estimation instead.
Equation (13) implies,
H_i^ψ Ψ_f = Z_f − O_i X_f − H_i^u U_f                                               (28)
where the states, Xf , will be replaced with the estimates given by (17). The
noise term Hiψ Ψf has a particular structure, with one-step-ahead errors in
the first row, two-step-ahead errors in the second row and so on. Under these
conditions, the gaussian loglikelihood function can then be written as,
log l(β) = − (im/2) log(2π) − (i/2) log(|Σ|) − (1/2) tr(Ψ_f' H_i^ψ' Σ^{-1} H_i^ψ Ψ_f)    (29)
where tr(·) is the trace operator and Σ is the Prediction Error (PE) covariance. The noise term is obviously autocorrelated, so Σ has the following structure,

Σ = O_i P_i O_i' + H_i^ψ (I_i ⊗ Q) H_i^ψ'                                           (30)
where ⊗ is the Kronecker product and Pi is the state covariance matrix. There are two components in (30): the first addend on the right-hand side refers to the error covariance of the states, while the second addend corresponds to the future output error variance conditional on the estimated states. Taking into account the structure of H_i^ψ Ψf, its covariance matrix can be defined as:

Σ = { Σ_{jk} ≡ covariance matrix of PE_j     when j = k
      Σ_{jk} ≡ cov(PE_j, PE_k)               when j ≠ k },      j, k = 1, 2, ..., i    (31)
where PEj denotes the j-step-ahead prediction errors. Computing expression (30) therefore requires propagating the Kalman filter covariance equations (27) i times, as in Subsection 3.2.2, to obtain an estimate of Pi.
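A sketch of evaluating this loglikelihood for candidate parameter values, assuming the stacked residual term of (28) and the covariance Σ of (30) have already been formed from those values, is:

    import numpy as np

    def subspace_loglik(errors, Sigma, i, m):
        """Gaussian loglikelihood, following the reconstruction of (29):
        -(i*m/2) log(2*pi) - (i/2) log|Sigma| - (1/2) tr(errors' Sigma^{-1} errors),
        where errors = H_i^psi Psi_f is the (i*m x N) residual matrix of (28)."""
        _, logdet = np.linalg.slogdet(Sigma)
        quad = np.trace(errors.T @ np.linalg.solve(Sigma, errors))
        return -0.5 * (i * m * np.log(2.0 * np.pi) + i * logdet + quad)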
4 Monte Carlo experiments
We will now analyze the performance of SUBEST1 and SUBEST2 in finite samples, using as a benchmark an SS-based ML algorithm (Casals et al., 1999) initialized with the true parameter values. In contrast, SUBEST1 and SUBEST2 will always be initialized with β = 0 and a value of i heuristically chosen as the nearest integer to log T, where T is the sample size. Obviously, these settings deliberately favor the ML algorithm.
Tables 1-8 summarize the main results. The simulations in Tables 1-6
refer to different homoskedastic models. For each data generating process,
we have obtained 1,000 samples with T = 50 and T = 300 observations,
after discarding the first 50 values in each case to improve randomization.
On the other hand, Tables 7-8 show the results obtained for two common
conditional heteroskedastic models. In these cases the sample sizes are increased to T = 500 and T = 3,000, to make them representative of the high-frequency series to which these models are typically applied.
Tables 1-2 show the results obtained assuming gaussian AR(2), ARMA(2,1)
and bivariate VARMA(2,1) data generating processes, respectively. In all
cases the AR parameters were chosen so that the roots are complex and far
from the unit circle. This avoids ill-conditioning situations due to approximate cancellation of real AR and MA roots. As could be expected:
1. The averages of ML estimates are closer to the true parameter values
for models with MA terms. However, if the model is pure AR, the root
mean-squared errors (RMSEs) of SUBEST2 estimates are comparable
to those of ML and generally better than those of SUBEST1.
2. The computational cost of ML is 3.5-4 times that of SUBEST1 for T = 50 and 8-10 times for T = 300. On the other hand, the cost of SUBEST2 typically doubles that of SUBEST1 for T = 50. This overhead decreases as the sample size grows.
[INSERT TABLES 1-2]
Table 3 shows the results corresponding to a gaussian ARMA(1,1)×(0,1)_s process with quarterly (s = 4) and monthly (s = 12) data. We deliberately deteriorated the conditioning of the loss function by choosing parameter values that generate a partial redundancy between the roots of the regular AR and MA factors. Note that ML keeps an advantage in precision, but its computational cost is 4-6 times that of SUBEST1 in the quarterly model and 9-14 times in the monthly model. This is due to the high dimension of the state vector, which also affects the relative performance of SUBEST2, and to the ill-conditioning of the loss function. With T = 300, ML and SUBEST2 provide similar results in terms of accuracy.
[INSERT TABLE 3]
The literature shows that, in small samples, ML estimates display a "Pile-Up Effect" (PUE). This effect means that the distribution of ML estimates of moving average terms is multimodal, with some modes on the invertibility boundary, see Ansley and Newbold (1980). To further analyze the properties of the estimation methods, we therefore need more details about this PUE. Table 4 shows the results of this analysis, whose main conclusions are: i) SUBEST1 and SUBEST2 present less PUE than ML; ii) discounting the PUE, estimates of the MA parameters are very similar for SUBEST2 and ML; and iii) SUBEST1 and SUBEST2 show slower convergence to the true seasonal MA parameters when the seasonal frequency is large, e.g. s = 12.
[INSERT TABLE 4]
The state-space foundation of the proposed methods makes them very flexible, in the sense that they can accommodate a wide class of models. To illustrate this flexibility, Tables 5-6 and 7-8 show, respectively, the results obtained with two unobserved variable models and two conditionally heteroskedastic formulations.
Specifically, Tables 5 and 6 show the results obtained with two typical formulations in the SS literature: an STSM with a low signal-to-noise ratio and an AR(2) model with observation errors, respectively. Table 5 shows that in this case SUBEST1 is not only very imprecise, but also the slowest method. This is due to the fact that the loss function considered in SUBEST1 does not depend on the error covariance matrix; therefore, its estimates do not take into account the null covariance restrictions. Also, the low signal-to-noise ratio makes matters worse, as it deteriorates the conditioning of the likelihood surface, inducing convergence problems. On the other hand, SUBEST2 and ML estimates are more precise, as they take into account this independence restriction, and their computational costs are also similar, due to the very small number of parameters to be estimated. Results in Table 6 are rather similar, the main difference being the improvement in SUBEST1 performance due to a higher signal-to-noise ratio.
[INSERT TABLES 5-6]
Finally, Tables 7 and 8 present the results for two common high-frequency financial models: the Autoregressive Stochastic Volatility (ARSV, see Harvey et al., 1994) and Generalized Autoregressive Conditional Heteroskedasticity (GARCH, see Bollerslev, 1986) models.
Table 7 displays an ARSV(1) model, where zt are the observed returns and σt are the corresponding volatilities. In this type of model, the variance of εt is assumed to be known and, therefore, is not estimated. Results of the simulation are close to those of the AR model with observation errors, because the SS representations of both models are very similar. However, when the sample size becomes larger, the computational cost of ML increases from 5 to 20 times the cost of SUBEST1 and from 3 to 12 times the cost of SUBEST2, for T = 500 and T = 3,000, respectively.
Table 8 shows that the computational disadvantage of ML in the GARCH
model is larger than in the ARSV. In particular, the load of ML is 49 times the
cost of SUBEST1 and more than 16 times the cost of SUBEST2 for T = 500.
When the sample size increases (T = 3, 000) SUBEST2 becomes the fastest
method. In fact, ML converges 90 and 79 times slower than SUBEST2 and
SUBEST1, respectively, while the three methods provide very similar results
in terms of precision.
[INSERT TABLES 7-8]
5 Concluding Remarks
We presented two new algorithms to estimate time series models. These procedures are fast, stable and flexible, as they can be applied to any model
written in its equivalent SS form.
Simulation results characterize the situations where each algorithm is more adequate. SUBEST1 is a good choice when the model does not include seasonal moving average terms and, as it is very fast, also when one wants to compute initial estimates for a subsequent ML iteration. On the other hand, SUBEST2 is more suitable as a final estimation method, because its estimates are comparable to those of ML for large gaussian samples, while maintaining a substantial advantage in terms of both computational cost and numerical stability.
SUBEST1 and SUBEST2 may be particularly useful when modeling high-frequency financial data. The computational cost and instability of ML, as well as the fact that such samples are far from the Gaussian assumption, strongly motivate the use of alternative estimation methods.
These algorithms are implemented in a MATLAB toolbox for time series modeling called E4. The source code of this toolbox is freely provided under the terms of the GNU General Public License and can be downloaded at www.ucm.es/info/icae/e4. This site also includes a complete user manual and other reference materials.
References
Ansley, C. F. and Newbold, P. (1980). Finite sample properties of estimators for autoregressive-moving average models. Journal of Econometrics, 13:159–183.
Bauer, D. (2005). Estimating linear dynamical systems using subspace methods. Econometric Theory, 21:181–211.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31:307–327.
Casals, J., Sotoca, S., and Jerez, M. (1999). A fast stable method to compute
the likelihood of time invariant state space models. Economics Letters,
65(3):329–337.
Deistler, M., Peternell, K., and Scherrer, W. (1995). Consistency and relative efficiency of subspace methods. Automatica, 31(12):1865–1875.
Dennis, J. E. and Schnabel, R. B. (1983). Numerical methods for unconstrained optimization and nonlinear equations. Prentice Hall, Englewood
Cliffs, NJ.
Dufour, J. M. and Pelletier, D. (2004). Linear estimation of weak VARMA models with macroeconomic applications. Technical report, Centre Interuniversitaire de Recherche en Economie Quantitative (CIREQ), Université de Montréal, Canada.
Favoreel, W., De Moor, B., and Van Overschee, P. (2000). Subspace state space system identification for industrial processes. Journal of Process Control, 10:149–155.
Flores, R. and Serrano, G. (2002). A generalized least squares estimation
method for VARMA models. Statistics, 36(4):303–316.
Francq, C. and Zakoïan, J. M. (2000). Estimating weak GARCH representations. Econometric Theory, 16:692–728.
Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations. Johns
Hopkins University Press, Baltimore, third edition.
Hannan, E. J. and Deistler, M. (1988). The Statistical Theory of Linear
Systems. John Wiley, New York.
Harvey, A. C. (1989). Forecasting, structural time series models and the
Kalman Filter. Cambridge University Press.
Harvey, A. C., Ruiz, E., and Shephard, N. (1994). Multivariate stochastic variance models. Review of Economic Studies, 61:247–264.
Katayama, T. (2005). Subspace Methods for System Identification. Springer
Verlag, London.
Koreisha, S. G. and Pukkila, T. H. (1989). Fast linear estimation methods
for vector autoregressive moving average models. Journal of Time Series
Analysis, 10:325–339.
Koreisha, S. G. and Pukkila, T. H. (1990). A generalized least-squares approach for estimation of autoregressive moving-average models. Journal of
Time Series Analysis, 11:139–151.
Ljung, L. (1999). System Identification, Theory for the User. PTR Prentice
Hall, New Jersey, second edition.
Lütkepohl, H. and Poskitt, D. S. (1996).
Specification of echelon form
VARMA models. Journal of Business and Economic Statistics, 14(1):69–
79.
Spliid, H. (1983). A fast estimation method for the vector autoregressive
moving average model with exogenous variables. Journal of the American
Statistical Association, 78(384):843–849.
Van Overschee, P. and De Moor, B. (1994). N4SID: Subspace algorithms
for the identification of combined deterministic-stochastic systems. Automatica, 30(1):75–93.
Appendix A
From equation (13), displacing time subscripts, we obtain:
Z_{f+} = O_{i-1} X_{i+1} + H_{i-1}^u U_{f+} + H_{i-1}^ψ Ψ_{f+}                      (A.1)
Note that Zf + includes only the future block information while Zf integrates
the present and the future blocks. On the other hand, substituting (12b) into
(12a) and propagating the state equation yields:
x_t = (Φ − EH)^i x_{t-i} + Σ_{j=0}^{i-1} (Φ − EH)^j [E z_{t-1-j} + (Γ − ED) u_{t-1-j}]    (A.2)
which can be written in subspace form as,
X_i = (Φ − EH)^i X_p + Ξ_i(Φ − EH, E) Z_p + Ξ_i(Φ − EH, Γ − ED) U_p                 (A.3)

where Ξ_i(A, B) = [A^{i-1}B   A^{i-2}B   ...   AB   B], for any matrices A, B.
Again, displacing time indices, we obtain:
X_{i+1} = (Φ − EH)^{i+1} X_p + Ξ_{i+1}(Φ − EH, E) Z_p + Ξ_{i+1}(Φ − EH, Γ − ED) U_p    (A.4)
or, in the same way,
X_{i+1} = (Φ − EH) X_i + E Z_{pr} + (Γ − ED) U_{pr}                                  (A.5)
Finally, substituting (A.5) into (A.1), we get (18).
Appendix B
To obtain the initial conditions K̄ and B̄, we use the state equation of the Kalman filter, which can be written in subspace form as:

X̂_{i+1} = Φ̂ X̂_i + Γ̂ U_{pr} + K_i Z̃_{pr}                                             (B.1)
where Φ̂, Γ̂ and X̂i are previously estimated in (25) and the residual vector,
Z̃pr , is exactly as in (24). Also, as in equation (17), the state sequence X̂i+1
is computed as:
X̂_{i+1} = O_{i-1}^† (Z_{f+} Π_{U,Z_{p+}} − H_{i-1}^u U_{f+})                        (B.2)
Thus, the only unknown matrix is Ki . From (B.1) we get K̄, an approximation for Ki , as:
K̄ = (X̂_{i+1} − Φ̂ X̂_i − Γ̂ U_{pr}) Z̃_{pr}^†                                          (B.3)
Note that K̄ is a least-squares estimate of the Kalman filter gain in equation (B.1).
Finally, an approximation of B_i is obtained as B̄ = (T − 2i + 1)^{-1} Z̃_{pr} Z̃_{pr}'.
Tables
Table 1†: Univariate nonseasonal models with gaussian errors.

Table 1.1 AR(2) model: (1 − .4B + .3B^2) z_t = a_t;  a_t ∼ iidN(0, 1.0).
                        SUBEST1                     SUBEST2                        ML
 T    True values    -.4     .3    1.0          -.4     .3    1.0           -.4     .3    1.0
      Average       -.384   .305   .931        -.367   .284   .943         -.386   .304   .946
 50   Std. Dev       .144   .144   .203         .136   .127   .198          .137   .135   .196
      RMSE           .145   .144   .203         .136   .127   .198          .138   .135   .196
      Time           100%                       216%                        416%
      Average       -.397   .306   .985        -.391   .296   .992         -.395   .300   .994
 300  Std. Dev       .059   .066   .083         .053   .055   .082          .053   .056   .082
      RMSE           .059   .066   .083         .054   .055   .083          .054   .065   .083
      Time           100%                       219%                       1040%
Table 1.2 ARMA(2,1) model: (1 − .4B + .3B^2) z_t = (1 − .8B) a_t;  a_t ∼ iidN(0, 1.0).
                           SUBEST1                             SUBEST2                               ML
 T    True values    -.4     .3    -.8    1.0           -.4     .3    -.8    1.0           -.4     .3    -.8    1.0
      Average       -.360   .289  -.671  1.020         -.307   .297  -.680  1.000         -.354   .313  -.800   .927
 50   Std. Dev       .305   .175   .291   .214          .222   .141   .254   .206          .196   .141   .197   .194
      RMSE           .308   .175   .318   .215          .240   .141   .281   .206          .201   .141   .197   .207
      Time           100%                               244%                               359%
      Average       -.398   .309  -.763  1.014         -.389   .304  -.805  1.001         -.395   .304  -.797   .993
 300  Std. Dev       .085   .071   .061   .085          .066   .061   .074   .083          .066   .061   .050   .081
      RMSE           .085   .072   .072   .086          .067   .062   .074   .083          .066   .062   .050   .081
      Time           100%                               230%                               881%

† The ML algorithm is initialized with the true parameter values. RMSE is the root mean-squared error. The smallest RMSEs are underlined.
Table 2†: Bivariate VARMA model with gaussian errors.

[ 1 − .70B + .60B^2            0           ] [ z_1t ]   [ 1 − .30B      −.90B   ] [ a_1t ]
[        0             1 − 1.30B + .50B^2  ] [ z_2t ] = [   .60B      1 − .80B  ] [ a_2t ]

( a_1t  a_2t )' ∼ iidN( 0 , [ .07  .02
                              .02  .05 ] )
Table 2.1: Estimation results with sample size T = 50
True values    -.70   -1.30    .60    .50   -.30    .60    -.90    -.80    .07    .02    .05
SUBEST1
  Average     -.671  -1.359   .545   .557  -.267   .431   -.743   -.736   .079   .019   .056
  Std. Dev     .137    .233   .104   .177   .200   .104    .177    .246   .017   .010   .014
  RMSE         .140    .240   .118   .186   .203   .199    .236    .254   .019   .010   .015
  Time         100%
SUBEST2
  Average     -.620  -1.297   .519   .500  -.331   .453   -.832   -.776   .073   .017   .056
  Std. Dev     .133    .202   .100   .151   .208   .121    .195    .251   .016   .010   .014
  RMSE         .155    .202   .129   .151   .210   .190    .207    .252   .017   .010   .015
  Time         348%
ML
  Average     -.686  -1.279   .589   .492  -.344   .654  -1.024   -.872   .060   .017   .044
  Std. Dev     .087    .188   .072   .155   .175   .122    .179    .220   .014   .010   .010
  RMSE         .088    .189   .073   .155   .181   .134    .218    .231   .017   .010   .012
  Time         398%

† The ML algorithm is initialized with the true parameter values. RMSE is the root mean-squared error. The smallest RMSEs are underlined.
Table 2.2: Estimation results with sample size T = 300
True values    -.70   -1.30    .60    .50   -.30    .60    -.90    -.80    .07    .02    .05
SUBEST1
  Average     -.693  -1.318   .580   .515  -.276   .536   -.829   -.769   .074   .021   .054
  Std. Dev     .044    .090   .038   .072   .084   .044    .063    .099   .006   .004   .005
  RMSE         .045    .092   .043   .074   .087   .078    .094    .104   .008   .004   .006
  Time         100%
SUBEST2
  Average     -.686  -1.311   .584   .506  -.289   .585   -.902   -.831   .072   .019   .053
  Std. Dev     .039    .076   .034   .062   .070   .043    .072    .089   .006   .004   .005
  RMSE         .042    .076   .037   .062   .070   .046    .072    .095   .006   .004   .006
  Time         342%
ML
  Average     -.699  -1.296   .599   .500  -.305   .605   -.913   -.804   .069   .019   .049
  Std. Dev     .030    .055   .023   .047   .062   .035    .052    .066   .006   .004   .004
  RMSE         .030    .055   .023   .047   .062   .035    .054    .066   .006   .004   .004
  Time         744%
Table 3†: Univariate seasonal ARMA models.

Table 3.1 ARMA(1,1)×(0,1)_4 model: (1 − .4B) z_t = (1 − .7B)(1 − .8B^4) a_t;  a_t ∼ iidN(0, 1.0)
                            SUBEST1                               SUBEST2                                  ML
 T    True values    -.4     -.7     -.8     1.0           -.4     -.7     -.8     1.0            -.4     -.7     -.8     1.0
      Average       -.208   -.390   -.456   1.174         -.294   -.632   -.742    .924          -.325   -.683   -.835    .886
 50   Std. Dev       .557    .585    .168    .336          .486    .482    .153    .210           .325    .329    .169    .204
      RMSE           .589    .662    .383    .378          .497    .486    .163    .223           .333    .330    .173    .233
      Time           100%                                  217%                                   441%
      Average       -.302   -.591   -.575   1.228         -.365   -.665   -.826    .998          -.388   -.695   -.800    .990
 300  Std. Dev       .250    .267    .088    .153          .147    .132    .064    .084           .146    .122    .042    .080
      RMSE           .269    .288    .242    .275          .151    .136    .069    .084           .146    .122    .042    .081
      Time           100%                                  227%                                   577%
Table 3.2 ARMA(1,1)×(0,1)_12 model: (1 − .4B) z_t = (1 − .7B)(1 − .8B^12) a_t;  a_t ∼ iidN(0, 1.0)
                            SUBEST1                               SUBEST2                                  ML
 T    True values    -.4     -.7     -.8     1.0           -.4     -.7     -.8     1.0            -.4     -.7     -.8     1.0
      Average       -.240   -.493   -.368   1.133         -.244   -.592   -.494   1.073          -.324   -.680   -.807    .902
 50   Std. Dev       .525    .534    .128    .261          .438    .454    .042    .237           .331    .323    .224    .225
      RMSE           .549    .572    .450    .293          .465    .467    .309    .248           .340    .323    .224    .245
      Time           100%                                  436%                                   907%
      Average       -.374   -.661   -.586   1.221         -.379   -.691   -.812    .970          -.386   -.694   -.805    .984
 300  Std. Dev       .221    .201    .055    .118          .143    .128    .073    .084           .135    .112    .048    .083
      RMSE           .222    .205    .221    .250          .145    .128    .074    .089           .136    .112    .048    .085
      Time           100%                                  371%                                  1415%

† The ML algorithm is initialized with the true parameter values. RMSE is the root mean-squared error. The smallest RMSEs are underlined.
Table 4: Measuring the Pile-Up Effect (PUE) in small samples (T = 50).

                                                       % of PUE               Estimates discounting PUE
Model                  Table   Parameter      SUBEST1  SUBEST2    ML         SUBEST1  SUBEST2     ML
ARMA(2,1)               1.2    θ1 = −.8         10.2      2.1    21.5          -.624    -.671   -.740
VARMA(2,1)              2.1    θ11 = −.3          .0       .0      .0          -.267    -.331   -.344
VARMA(2,1)              2.1    θ22 = −.8        10.2     15.9    26.6          -.695    -.704   -.777
ARMA(1,1)×(0,1)4        3.1    θ1 = −.7           .0      1.5    22.6          -.390    -.623   -.627
ARMA(1,1)×(0,1)4        3.1    Θ1 = −.8           .0      2.7    32.2          -.456    -.738   -.735
ARMA(1,1)×(0,1)12       3.2    θ1 = −.7          3.5       .6    23.9          -.472    -.590   -.590
ARMA(1,1)×(0,1)12       3.2    Θ1 = −.8           .0       .0    46.4          -.368    -.494   -.629

We consider that PUE occurs when an MA parameter estimate is less than or equal to −1.00.
Table 5†: Structural time series model.

[ T_{t+1} ]   [ 1  1 ] [ T_t ]   [ 0 ]
[ Δ_{t+1} ] = [ 0  1 ] [ Δ_t ] + [ 1 ] η_t

z_t = ( 1  0 ) ( T_t  Δ_t )' + ε_t ;     ( η_t  ε_t )' ∼ iidN( 0, diag(.01, 10) )
                      SUBEST1                    SUBEST2                      ML
 T    True values    .01       10              .01       10               .01       10
      Average       1.475    8.473             .022    9.726              .035    9.911
 50   Std. Dev      3.275    3.693             .040    1.406              .069    1.005
      RMSE          3.588    3.996             .042    1.432              .073    1.009
      Time           395%                      125%                       100%
      Average        .461   10.137             .019    9.977              .008   10.005
 300  Std. Dev       .146     .476             .014     .443              .006     .409
      RMSE           .474     .495             .017     .444              .007     .409
      Time           273%                      100%                       171%
Table 6†: AR(2) model with observation errors.

(1 − 1.5B + .8B^2) z*_t = a_t ;    z_t = z*_t + v_t ;    ( a_t  v_t )' ∼ iidN( 0, diag(1.0, .5) )
                          SUBEST1                              SUBEST2                                 ML
 T    True values    -1.5     .8    1.0     .5           -1.5     .8    1.0     .5            -1.5     .8    1.0     .5
      Average       -1.425   .731  1.030   .558         -1.429   .737  1.045   .558          -1.480   .781   .954   .469
 50   Std. Dev        .268   .187   .219   .202           .133   .121   .195   .177            .128   .119   .185   .169
      RMSE            .278   .199   .221   .210           .150   .136   .200   .186            .129   .121   .190   .172
      Time            100%                                204%                                 142%
      Average       -1.491   .798   .990   .530         -1.487   .788  1.010   .513          -1.492   .793   .997   .493
 300  Std. Dev        .057   .046   .081   .069           .047   .044   .075   .057            .044   .043   .072   .055
      RMSE            .057   .047   .081   .076           .048   .046   .076   .059            .045   .043   .072   .056
      Time            100%                                193%                                 253%

† The ML algorithm is initialized with the true parameter values. RMSE is the root mean-squared error. The smallest RMSEs are underlined.
Table 7†: ARSV(1) model in state-space form.

z_t = 1.5 σ_t ε_t
log(σ^2_{t+1}) = .95 log(σ^2_t) + η_t

where η_t ∼ iidN(0, .5), ε_t ∼ iidN(0, 1), and ε_t and η_t are independent.
                       SUBEST1                       SUBEST2                          ML
 T     True values    .95   1.50    .50            .95   1.50    .50              .95   1.50    .50
       Average       .932  1.414   .469           .945  1.453   .462             .933  1.471   .517
 500   Std. Dev      .164  1.105   .146           .041  1.027   .146             .034   .460   .104
       RMSE          .165  1.108   .149           .041  1.028   .151             .038   .461   .106
       Time          100%                         168%                           476%
       Average       .949  1.485   .497           .949  1.484   .494             .948  1.490   .501
 3000  Std. Dev      .010   .202   .044           .009   .198   .044             .009   .183   .038
       RMSE          .010   .202   .044           .009   .198   .044             .009   .184   .039
       Time          100%                         166%                          2059%
Table 8†: GARCH(1,1) model in ARMA form.

y_t = ε_t  with  ε_t | Ω_{t-1} ∼ iidN(0, σ^2_t),  where
ε^2_t = 1.0 + [(1 − .80B)/(1 − .97B)] v_t ,   with  v_t = ε^2_t − σ^2_t

                        SUBEST1                        SUBEST2                           ML
 T     True values    1.0    -.97   -.80            1.0    -.97   -.80             1.0     -.97   -.80
       Average       1.005   -.972  -.798          1.002   -.969  -.814           1.209   -.952  -.784
 500   Std. Dev       .766    .025   .076           .760    .031   .106           1.420    .037   .056
       RMSE           .766    .025   .076           .760    .031   .107           1.436    .041   .058
       Time           100%                          298%                          4916%
       Average        .999   -.974  -.810           .998   -.972  -.809           1.095   -.967  -.797
 3000  Std. Dev       .314    .015   .052           .313    .019   .068            .880    .011   .018
       RMSE           .314    .016   .053           .313    .019   .069            .885    .011   .018
       Time           114%                          100%                          9047%

† The ML algorithm is initialized with the true parameter values. RMSE is the root mean-squared error. The smallest RMSEs are underlined.