Fast estimation methods for time series models in state-space form

Alfredo Garcia-Hiernaux*, José Casals, Miguel Jerez

March 28, 2007

Abstract

We propose two new, fast and stable methods to estimate time series models written in their equivalent state-space form. They are useful both to obtain adequate initial conditions for a maximum-likelihood iteration and to provide final estimates when maximum likelihood is considered inadequate or computationally expensive. The state-space foundation of these procedures provides flexibility, as they can be applied to any linear fixed-coefficients model, such as ARIMA, VARMAX or structural time series models. A simulation exercise shows that their computational costs and finite-sample performance are very good.

Keywords: state-space models, subspace methods, Kalman filter, system identification, maximum likelihood

* Corresponding author. Departamento de Fundamentos del Análisis Económico II, Facultad de Ciencias Económicas, Campus de Somosaguas, 28223 Madrid (Spain). E-mail: [email protected], tel: (+34) 91 394 26 26, fax: (+34) 91 394 25 91. The authors gratefully acknowledge financial support from Ministerio de Educación y Ciencia, ref. SEJ2005-07388.

1 Introduction

Many model estimation methods are based either on Least Squares (LS) or Maximum Likelihood (ML). The LS approach has clear advantages in terms of computational simplicity, stability and cost. However, it is limited in scope, as it cannot be directly applied to many models, such as those with moving average terms or multiplicative seasonality. ML, on the other hand, typically requires iterative optimization, so its statistical efficiency and wide scope come at the price of increased instability, complexity and computational cost. In particular, when modeling high-frequency time series, such as those generated by financial markets, it may be hard to accept the assumptions and costs implied by gaussian ML.

As the pros and cons of LS and ML are complementary, many works focus on devising methods that provide the best of both worlds. For example, Spliid (1983), Koreisha and Pukkila (1989, 1990), Flores and Serrano (2002) and Dufour and Pelletier (2004) extend ordinary and generalized LS to VARMAX modeling. Also, Lütkepohl and Poskitt (1996) propose an LS-based method to specify and estimate a canonical VARMA structure. Finally, Francq and Zakoïan (2000) provide weak GARCH representations that can be estimated using two-stage LS. All these methods share two substantial limitations. First, they avoid iteration only when there are no exclusion constraints: imposing such restrictions requires iterating, which partially offsets their computational advantage. Second, these procedures are typically devised for specific parameterizations, such as VARMAX, and their implementation in other frameworks is not trivial.

In this paper we propose two fast and stable algorithms to estimate time series models written in their equivalent State-Space (SS) form. They are useful either to obtain adequate initial conditions for an ML iteration or to provide final estimates when ML is considered inadequate or computationally expensive. Moreover, their SS foundations provide a wide scope, as they can be applied to any linear fixed-coefficients model with an equivalent SS form. Both algorithms are based on subspace methods; see Favoreel et al. (2000), Katayama (2005) or Bauer (2005).
The first method, SUBEST1, estimates the parametric matrices of the SS model by solving a weighted reduced-rank LS problem defined over a set of subspace regressions. Iteration is required when there are constraints relating the parameters in the standard and SS representations of the model. Any exclusion or fixed-value constraint is treated in the same way. Monte Carlo experiments show that this method is very fast and provides estimates close to those of ML in finite samples. However, when compared with ML, it tends to underestimate the MA parameters.

The second and more sophisticated procedure, SUBEST2, provides a gaussian ML solution to the subspace regressions. Simulations show that it improves over SUBEST1, both in precision and in dispersion of the estimates, while keeping a substantial advantage over ML in terms of computational cost.

The structure of the paper is as follows. Section 2 defines the basic notation and previous results. Building on them, Section 3 derives the estimation algorithms and Section 4 discusses their performance in finite samples. In Section 5 we discuss the conditions under which each method is more adequate and indicate how to obtain a free MATLAB toolbox implementing them.

2 Notation and previous results

2.1 General state-space model

Consider an m-vector of endogenous outputs, z_t, which is related to its past and to the current and past values of an r-vector of exogenous inputs, u_t, through a time series model depending on a vector of unknown constant parameters, β. Assume also that this model can be written in an equivalent SS representation, such as:

  x_{t+1} = Φ x_t + Γ u_t + E w_t                                          (1a)
  z_t = H x_t + D u_t + C v_t                                              (1b)

where the matrices Φ, Γ, E, H, D and C depend on β; x_t ∈ R^n is the state vector, and w_t ∈ R^n and v_t ∈ R^m are uncorrelated sequences of gaussian errors, independent of u_t, with zero mean and such that E(w_t w_t') = Q, E(v_t v_t') = R and E(v_t w_t') = S. We will also assume this model to be minimal and stable. Minimality means that there is no SS model that can realize z_t with less than n states. Stability requires all the eigenvalues of Φ to lie inside the unit circle.

Subspace methods derive from a matrix representation of the system (1a-1b). To obtain it, note that a recursive substitution in (1a) yields

  x_t = Φ^t x_0 + Σ_{j=0}^{t-1} Φ^j E w_{t-j-1} + Σ_{j=0}^{t-1} Φ^j Γ u_{t-j-1}    (2)

and substituting (2) into the observation equation (1b), we get:

  z_t = H Φ^t x_0 + H Σ_{j=0}^{t-1} Φ^j Γ u_{t-j-1} + D u_t + H Σ_{j=0}^{t-1} Φ^j E w_{t-j-1} + C v_t    (3)

so the endogenous variable, z_t, depends on the initial state vector, x_0, the present and past values of the inputs, u_t, and both errors, w_t and v_t. Under these conditions, equation (3) can be written in matrix form as

  Z_f = O_i X_f + H_i^u U_f + H_i^w W_f + H_i^v V_f                        (4)

The regression model (4) includes both data-dependent and parameter-dependent matrices. The data-dependent matrices are the block-Hankel matrices (BHMs) Z_f, U_f, W_f and V_f, as well as the state sequence X_f. The row space of these matrices, denoted by their subscripts and defined in (5)-(6), depends on the parameter i. In this work we determine i as the nearest integer to log T, T being the sample size. This heuristic choice is supported by empirical experience, but our procedures could be implemented with any other choice, such as those suggested by Van Overschee and De Moor (1994), Deistler et al. (1995), Ljung (1999) or Bauer (2005).

The BHM Z_f organizes the "future" values of the outputs from t = i onwards. Its structure is:

  Z_f = Z_{i:2i-1} = [ z_i       z_{i+1}   ...  z_{T-i}
                       z_{i+1}   z_{i+2}   ...  z_{T-i+1}
                       ...       ...            ...
                       z_{2i-1}  z_{2i}    ...  z_{T-1}  ]                 (5)

and the complement of (5) is the matrix of "past" values, Z_p, whose structure is:

  Z_p = Z_{0:i-1} = [ z_0      z_1    ...  z_{T-2i}
                      z_1      z_2    ...  z_{T-2i+1}
                      ...      ...         ...
                      z_{i-1}  z_i    ...  z_{T-i-1}  ]                    (6)

The matrix Z_p will be employed in Section 3.
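To make the construction concrete, the following sketch builds Z_f and Z_p in (5)-(6) from a data matrix. It is a minimal illustration in Python/NumPy, not E4 code; the series z and the helper name block_hankel are ours.

    import numpy as np

    def block_hankel(z, first, last, T, i):
        """Stack the blocks z_first, ..., z_last of an (m x T) series z.

        Block row k holds z_k, ..., z_{k+T-2i}, so every BHM has T-2i+1 columns.
        """
        ncols = T - 2 * i + 1
        rows = [z[:, k:k + ncols] for k in range(first, last + 1)]
        return np.vstack(rows)

    T, i = 200, 5                             # i = nearest integer to log T
    z = np.random.randn(2, T)                 # a bivariate output, m = 2
    Zf = block_hankel(z, i, 2 * i - 1, T, i)  # "future" outputs, eq. (5)
    Zp = block_hankel(z, 0, i - 1, T, i)      # "past" outputs, eq. (6)
    print(Zf.shape, Zp.shape)                 # (10, 191) (10, 191)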
The other BHMs, U_f, W_f and V_f, have the same structure as Z_f but depend, respectively, on u_t, w_t and v_t instead of z_t. As shorthand notation, for any BHM A we denote: A = A_{0:2i-1}, A_{p+} = A_{0:i}, A_{pr} = A_{i:i} and A_{f+} = A_{i+1:2i-1}. The last data-dependent matrix is the state sequence X_f, given by:

  X_f = ( x_i  x_{i+1}  x_{i+2}  ...  x_{T-i} ),  with f = i.              (7)

The states in X_f are unobserved, so they must be estimated from the sample. In Section 3.1 we describe how to do this.

On the other hand, equation (4) also depends on the following parameter-related matrices: the extended observability matrix, O_i, the lower block-triangular Toeplitz matrices, H_i^u and H_i^w, and the block-diagonal matrix H_i^v. These matrices are defined by:

  O_i = [ H'  (HΦ)'  (HΦ²)'  ...  (HΦ^{i-1})' ]' ∈ R^{im×n},               (8)

  H_i^u = [ D           0            0            ...  0
            HΓ          D            0            ...  0
            HΦΓ         HΓ           D            ...  0
            ...         ...          ...               ...
            HΦ^{i-2}Γ   HΦ^{i-3}Γ    HΦ^{i-4}Γ    ...  D ] ∈ R^{im×ir},    (9)

  H_i^w = [ 0           0            0            ...  0
            HE          0            0            ...  0
            HΦE         HE           0            ...  0
            ...         ...          ...               ...
            HΦ^{i-2}E   HΦ^{i-3}E    HΦ^{i-4}E    ...  0 ] ∈ R^{im×im},    (10)

and

  H_i^v = [ C  0  ...  0
            0  C  ...  0
            ...          
            0  0  ...  C ] ∈ R^{im×im}.                                   (11)

2.2 Steady-state innovations model

Our algorithms become simpler and more efficient when applied to a special SS representation known as the steady-state innovations model, hereafter "innovations model". An innovations model is given by:

  x_{t+1} = Φ x_t + Γ u_t + E ψ_t                                          (12a)
  z_t = H x_t + D u_t + ψ_t                                                (12b)

Note that this model can be viewed as a special case of (1a-1b) with C = I_m, an m-by-m identity matrix, and w_t = v_t = ψ_t. Besides the assumptions made for model (1a-1b), we add a strict minimum-phase condition, so all the eigenvalues of (Φ - EH) lie inside the unit circle.

The innovations form (12a-12b) is convenient because it is both general and computationally efficient. It is general for two reasons. First, many time series models, including rational transfer functions and VARMAX, can be directly written in the form (12a-12b), see Hannan and Deistler (1988), since they present only one source of unobserved errors. Second, any model in the form (1a-1b), such as the Structural Time Series Models (STSM, Harvey, 1989), can be written under weak assumptions in its equivalent innovations form (see Casals et al., 1999, Theorem 1). It is efficient because its structure has fewer parameters than the general form (1a-1b), which reduces complexity and computational cost and improves the conditioning of the likelihood. Specifically, when the model is written in the innovations form (12a-12b), equation (4) collapses to the simpler form:

  Z_f = O_i X_f + H_i^u U_f + H_i^ψ Ψ_f                                    (13)

where H_i^w and H_i^v are replaced with H_i^ψ, which is just as (10) but with I_m in the main block diagonal.

3 Main results

The basic problem consists of estimating β, the vector of parameters in the original model. The parametric matrices Φ, Γ, E, H, D, C, Q, R and S in the SS representation (1a-1b) will be, in general, nonlinear functions of β. In this section we describe two new algorithms, based on subspace methods, to estimate all these parametric matrices by iterating on β. To this end, we start by discussing how one can compute the state sequences in the matrix (7).
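The algorithms below repeatedly evaluate the parameter-dependent matrices for trial values of β. As an illustration, this sketch assembles O_i in (8) and H_i^u in (9) with NumPy; H_i^ψ follows the same pattern with E in place of Γ and I_m on the main block diagonal. All names are ours and none of this is E4 toolbox code.

    import numpy as np

    def extended_observability(H, Phi, i):
        """O_i = [H; H Phi; ...; H Phi^(i-1)], an (im x n) matrix, eq. (8)."""
        blocks, HPhik = [], H.copy()
        for _ in range(i):
            blocks.append(HPhik)
            HPhik = HPhik @ Phi
        return np.vstack(blocks)

    def block_toeplitz_u(H, Phi, Gamma, D, i):
        """H_i^u: lower block-triangular Toeplitz with D on the diagonal, eq. (9)."""
        m, r = D.shape
        col = [D] + [H @ np.linalg.matrix_power(Phi, k) @ Gamma
                     for k in range(i - 1)]   # first block column of (9)
        Hu = np.zeros((i * m, i * r))
        for br in range(i):                   # block row
            for bc in range(br + 1):          # block column, lower triangle only
                Hu[br * m:(br + 1) * m, bc * r:(bc + 1) * r] = col[br - bc]
        return Hu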
3.1 Estimation of the state sequences

As the final result does not depend on the specific SS representation employed, we will develop it from the simpler regression model (13). It implies that the expectation of future output values, conditional on U, which contains the past and future information of the observed inputs, and on Z_p, which contains the past information of the output, is

  E(Z_f | U, Z_p) = O_i E(X_f | U, Z_p) + H_i^u E(U_f | U, Z_p) + H_i^ψ E(Ψ_f | U, Z_p)    (14)

where U_f is included in U. As we assumed in Section 2, the innovations ψ_t are independent of the inputs and, by construction, E(z_{t-j} ψ_t) = 0 for all j > 0, so (14) simplifies to

  E(Z_f | U, Z_p) = O_i E(X_f | U, Z_p) + H_i^u U_f                        (15)

Rearranging the terms of this equation, we obtain

  E(X_f | U, Z_p) = O_i^† [ E(Z_f | U, Z_p) - H_i^u U_f ]                  (16)

which is an expression for the states conditional on the past of the output and on the observed inputs. As all these variables are observable, from (16) we can estimate the states by means of the orthogonal projections:

  X̂_f = O_i^† ( Z_f Π_{U,Z_p} - H_i^u U_f )                                (17)

where O_i^† is the Moore-Penrose pseudo-inverse of O_i. Expression (17) contains parameters in O_i^† and H_i^u, while Z_f Π_{U,Z_p} denotes the orthogonal projection of the rows of Z_f onto the row space of the stacked matrix [U; Z_p].

3.2 Algorithm SUBEST1

3.2.1 Models in innovations form

We will now set up a generalized nonlinear least squares problem for all the parameters β. To do so, we start by writing equation (13) as:

  Z_{f+} = O_{i-1} [ (Φ - EH) X_f + E Z_{pr} + (Γ - ED) U_{pr} ] + H_{i-1}^u U_{f+} + H_{i-1}^ψ Ψ_{f+}    (18)

see Appendix A, where X_f and Ψ_{f+} are unobserved components. The first addend in the square brackets corresponds to the states related to the past, Z_p and U_p (see equation A.3). The second and third addends are updating terms related to the present. Now, the expectation of (18) conditional on Z_{p+} is:

  E(Z_{f+} | U, Z_{p+}) = O_{i-1} [ (Φ - EH) E(X_f | U, Z_{p+}) + E E(Z_{pr} | U, Z_{p+}) + (Γ - ED) E(U_{pr} | U, Z_{p+}) ]
                          + H_{i-1}^u E(U_{f+} | U, Z_{p+}) + H_{i-1}^ψ E(Ψ_{f+} | U, Z_{p+})    (19)

where, as X_f is only related to the past, one may substitute E(X_f | U, Z_{p+}) by E(X_f | U, Z_p). Furthermore, since U_{pr} and U_{f+} are included in U, Z_{pr} is included in Z_{p+}, and E(Ψ_{f+} | U, Z_{p+}) = 0, equation (19) simplifies to

  E(Z_{f+} | U, Z_{p+}) = O_{i-1} [ (Φ - EH) E(X_f | U, Z_p) + E Z_{pr} + (Γ - ED) U_{pr} ] + H_{i-1}^u U_{f+}    (20)

Finally, substituting E(X_f | U, Z_p) by its estimate X̂_f given in (17), we obtain an estimate for the expected value of Z_{f+} conditional on U and Z_{p+}:

  Ẑ_{f+} = O_{i-1} [ (Φ - EH) X̂_f + E Z_{pr} + (Γ - ED) U_{pr} ] + H_{i-1}^u U_{f+}    (21)

Note that this equation includes all the parametric matrices in (12a-12b), Φ, Γ, E, H and D, which can therefore be estimated simultaneously by solving the weighted least-squares problem:

  min_{Φ,Γ,E,H,D} ‖ Ω^{-1/2} ( Z_{f+} Π_{U,Z_{p+}} - Ẑ_{f+} ) ‖²_F          (22)

where ‖·‖_F denotes the Frobenius norm and Ω is the covariance matrix of the prediction errors, which can be written as Ω = Z_{f+} Π⊥_{U,Z_{p+}} Z_{f+}', because Π⊥_{U,Z_{p+}} is idempotent, with Π⊥ = I - Π. Note that Ω and Z_{f+} Π_{U,Z_{p+}} are projections of data-dependent matrices, while Ẑ_{f+}, defined in (21), depends both on the data and on the parameters. These projections can be efficiently computed using the QR decomposition (see Golub and Van Loan, 1996).

In the same way that the matrices in model (12a-12b) depend on β, the parametric matrices in the subspace equation (13) depend on the parameters of the SS representation. Therefore, solving (22) requires iterative techniques, such as those described in, e.g., Dennis and Schnabel (1983).
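To fix ideas, the following sketch evaluates the criterion (22) once, for a model without observed inputs so that all U-terms drop out; an iterative optimizer would rebuild Φ, E and H from β and re-evaluate this function at each step. Oi and Oim1 stand for O_i and O_{i-1}, built, e.g., as in the previous sketch. Everything here is an illustrative outline under those assumptions, not the authors' implementation.

    import numpy as np

    def row_project(Z, W):
        """Z Pi_W: project the rows of Z onto the row space of W, via least squares."""
        coef, *_ = np.linalg.lstsq(W.T, Z.T, rcond=None)
        return (W.T @ coef).T

    def subest1_criterion(Phi, E, H, Oi, Oim1, Zf, Zp, Zfp, Zpp, Zpr):
        # Zfp = Z_{f+}, Zpp = Z_{p+}, Zpr = Z_{i:i}, in the notation of Section 2.1
        Xf_hat = np.linalg.pinv(Oi) @ row_project(Zf, Zp)     # states, eq. (17)
        Zfp_hat = Oim1 @ ((Phi - E @ H) @ Xf_hat + E @ Zpr)   # prediction, eq. (21)
        Zfp_proj = row_project(Zfp, Zpp)                      # Z_{f+} Pi_{Z_{p+}}
        resid = Zfp_proj - Zfp_hat
        Omega = (Zfp - Zfp_proj) @ Zfp.T                      # Z_{f+} Pi_perp Z_{f+}'
        # || Omega^{-1/2} resid ||_F^2 = tr(resid' Omega^{-1} resid)
        return np.sum(resid * np.linalg.solve(Omega, resid))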
Finally, we compute the residual vector, Z̃_{pr}, by shifting the time subscripts in (13):

  Z̃_{pr} = Z_{pr} - Ô_i X̂_f - Ĥ_i^u U_{pr}                                 (23)

Note that Z_{pr} contains a single block of rows, so using just the first block row of the matrices Ô_i and Ĥ_i^u we obtain

  Z̃_{pr} = Z_{pr} - Ĥ X̂_f - D̂ U_{pr}                                       (24)

and, consequently, Q̂ = (T - 2i + 1)^{-1} Z̃_{pr} Z̃_{pr}', where T - 2i + 1 is the number of elements of the vector Z̃_{pr}, is an estimate of the innovation variance.

3.2.2 General SS models

SUBEST1 can be applied to SS models with more than one source of unobserved errors by using the following two-stage variant of the algorithm. In the first step, we estimate the parameters in Φ, Γ, H and D by solving

  min_{Φ,Γ,H,D} ‖ Ω^{-1/2} [ Z_f Π_{U,Z_p} - O_i X̂_f - H_i^u U_f ] ‖²_F     (25)

where Ω = Z_f Π⊥_{U,Z_p} Z_f' and the expression for the states is given in equation (17). Note that, while in (22) E is estimated jointly with the rest of the parametric matrices, here it will be calculated in a second stage.

In the second step, we use the states resulting from the solution of (25) to obtain initial conditions for the Kalman filter gain, K_i, and the prediction error covariance, B_i (see Appendix B). Then, we compute the estimates of E, C, Q, R and S by solving the constrained problem:

  min_{E,C,Q,R,S} ‖ [K̄; B̄] - [K̂_i; B̂_i] ‖²_F                               (26)

where the barred matrices, K̄ and B̄, were previously estimated, [·;·] denotes vertical stacking, and the matrices K̂_i and B̂_i in (26) must also satisfy the covariance equations of the Kalman filter propagated i times:

  K̂_i = ( Φ̂ P̂_i Ĥ' + Ê Ŝ Ĉ' ) B̂_i^{-1}
  P̂_{i+1} = Φ̂ P̂_i Φ̂' + Ê Q̂ Ê' - K̂_i B̂_i K̂_i'                               (27)
  B̂_i = Ĥ P̂_i Ĥ' + Ĉ R̂ Ĉ'

Therefore, in this second step we compute estimates of E, C, Q, R and S such that they (a) return K̂_i and B̂_i when equations (27) are propagated i times, and (b) are close to the values K̄ and B̄ resulting from the previous stage.

3.3 Algorithm SUBEST2

Our second algorithm requires the SS model to be written in innovations form. Actually, this is not a limitation, because in each iteration we can transform the input model to its equivalent innovations form using the algorithm of Casals et al. (1999). Note that we could not do this in SUBEST1, since there the matrices related to the noise, E, C, Q, R and S, cannot be estimated jointly with the remaining parametric matrices, requiring a two-stage estimation instead.

Equation (13) implies

  H_i^ψ Ψ_f = Z_f - O_i X_f - H_i^u U_f                                    (28)

where the states, X_f, will be replaced with the estimates given by (17). The noise term H_i^ψ Ψ_f has a particular structure, with one-step-ahead errors in the first block row, two-step-ahead errors in the second block row, and so on. Under these conditions, the gaussian log-likelihood function can be written as

  log l(β) = -(im/2) log(2π) - (i/2) log|Σ| - (1/2) tr( Ψ_f' H_i^{ψ'} Σ^{-1} H_i^ψ Ψ_f )    (29)

where tr(·) is the trace operator and Σ is the Prediction Error (PE) covariance. The noise term is obviously autocorrelated, so Σ has the following structure:

  Σ = O_i P_i O_i' + H_i^ψ ( I_i ⊗ Q ) H_i^{ψ'}                             (30)

where ⊗ is the Kronecker product and P_i is the state covariance matrix. There are two components in (30): the first addend on the right-hand side refers to the error covariance of the states, while the second addend corresponds to the future output error variance, conditional on the estimated states. Taking into account the structure of H_i^ψ Ψ_f, its covariance matrix can be described block-wise as:

  Σ_{jk} = covariance matrix of PE_j,   when j = k
  Σ_{jk} = cov(PE_j, PE_k),             when j ≠ k,    j, k = 1, 2, ..., i    (31)

where PE_j denotes the j-step-ahead prediction errors. Computing expression (30) requires, therefore, propagating the Kalman filter covariance equations (27) i times, as in Subsection 3.2.2, to get an estimate of P_i.
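As an illustration, the next sketch propagates the covariance equations (27) and assembles Σ in (30); P0 is an assumed initial state covariance and all names are ours, not the authors' code.

    import numpy as np

    def propagate_covariances(Phi, E, H, C, Q, R, S, P0, i):
        """Iterate the Kalman covariance equations (27) i times; return K_i, B_i, P_i."""
        P = P0
        for _ in range(i):
            B = H @ P @ H.T + C @ R @ C.T
            K = (Phi @ P @ H.T + E @ S @ C.T) @ np.linalg.inv(B)
            P = Phi @ P @ Phi.T + E @ Q @ E.T - K @ B @ K.T
        return K, B, P

    def pe_covariance(Oi, Hpsi, Pi, Q, i):
        """Sigma = O_i P_i O_i' + H_i^psi (I_i kron Q) H_i^psi', eq. (30)."""
        return Oi @ Pi @ Oi.T + Hpsi @ np.kron(np.eye(i), Q) @ Hpsi.T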
4 Monte Carlo experiments

We will now analyze the performance of SUBEST1 and SUBEST2 in finite samples, using as a benchmark an SS-based ML algorithm (Casals et al., 1999) initialized with the true parameter values. SUBEST1 and SUBEST2, on the other hand, will always be initialized with β = 0 and a value of i heuristically chosen as the nearest integer to log T, where T is the sample size. Obviously, these settings deliberately favor the ML algorithm.

Tables 1-8 summarize the main results. The simulations in Tables 1-6 refer to different homoskedastic models. For each data generating process we obtained 1,000 samples with T = 50 and T = 300 observations, after discarding the first 50 values in each case to improve randomization. Tables 7-8, in turn, show the results obtained for two common conditionally heteroskedastic models. In these cases the sample sizes are increased to T = 500 and T = 3,000, to make them representative of the high-frequency series to which these models are typically applied.

Tables 1 and 2 show the results obtained assuming gaussian AR(2), ARMA(2,1) and bivariate VARMA(2,1) data generating processes, respectively. In all cases the AR parameters were chosen so that the roots are complex and far from the unit circle. This avoids ill-conditioning due to the approximate cancellation of real AR and MA roots. As could be expected:

1. The averages of the ML estimates are closer to the true parameter values for models with MA terms. However, if the model is pure AR, the root mean-squared errors (RMSEs) of the SUBEST2 estimates are comparable to those of ML and generally better than those of SUBEST1.

2. The computational cost of ML ranges from 3.5-4 times the cost of SUBEST1 for T = 50 to 8-10 times for T = 300. The cost of SUBEST2, on the other hand, typically doubles that of SUBEST1 for T = 50, and this overhead decreases as the sample size grows.

[INSERT TABLES 1-2]

Table 3 shows the results corresponding to a gaussian ARMA(1,1)×(0,1)_s process with quarterly (s = 4) and monthly (s = 12) data. We deliberately deteriorated the conditioning of the loss function by choosing parameter values that generate a partial redundancy between the roots of the regular AR and MA factors. Note that ML keeps an advantage in precision, but its computational overhead multiplies the cost of SUBEST1 by a factor of 4-6 in the quarterly model and of 9-14 in the monthly model. This is due to the high dimension of the state vector, which also affects the relative performance of SUBEST2, and to the ill-conditioning of the loss function. With T = 300, ML and SUBEST2 provide similar results in terms of accuracy.

[INSERT TABLE 3]

The literature shows that, in small samples, ML estimates display a "pile-up effect" (PUE): the distribution of the ML estimates of moving average terms is multimodal, with some modes on the invertibility boundary; see Ansley and Newbold (1980). To further analyze the properties of the estimation methods we therefore need more details about this PUE.
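Measuring the PUE itself is simple. As a toy illustration of the criterion used below (an MA estimate counts as piled up when it is less than or equal to -1.00), the following lines compute the share of PUE and the average estimate discounting it; the values in theta_hat are made up.

    import numpy as np

    theta_hat = np.array([-0.78, -1.00, -0.64, -1.02, -0.81])  # toy MA estimates
    piled = theta_hat <= -1.00
    pue_share = 100.0 * piled.mean()              # "% of PUE", as in Table 4
    mean_discounted = theta_hat[~piled].mean()    # estimate discounting the PUE
    print(f"{pue_share:.1f}% PUE, mean without PUE = {mean_discounted:.3f}")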
Table 4 shows the results of this analysis, whose main conclusions are: (i) SUBEST1 and SUBEST2 present less PUE than ML; (ii) discounting the PUE, the estimates of the MA parameters are very similar for SUBEST2 and ML; and (iii) SUBEST1 and SUBEST2 show slower convergence to the true seasonal MA parameters when the seasonal frequency is large, e.g., s = 12.

[INSERT TABLE 4]

The state-space foundation of the proposed methods makes them very flexible, in the sense that they can accommodate a wide class of models. To illustrate this flexibility, Tables 5-6 and 7-8 show, respectively, the results obtained with two unobserved-component models and two conditionally heteroskedastic formulations.

Specifically, Tables 5 and 6 show the results obtained with two typical formulations in the SS literature: an STSM with a low signal-to-noise ratio and an AR(2) model with observation errors, respectively. Table 5 shows that in this case SUBEST1 is not only very imprecise but also the slowest method. This is due to the fact that the loss function considered in SUBEST1 does not depend on the error covariance matrix, so its estimates do not take into account the null covariance restrictions. Also, the low signal-to-noise ratio makes matters worse, as it deteriorates the conditioning of the likelihood surface, inducing convergence problems. SUBEST2 and ML estimates, on the other hand, are more precise, as they take this independence restriction into account, and their computational costs are similar, due to the very small number of parameters to be estimated. The results in Table 6 are rather similar, the main difference being the improvement in SUBEST1 performance due to a higher signal-to-noise ratio.

[INSERT TABLES 5-6]

Finally, Tables 7 and 8 present the results for two common high-frequency financial models: the Autoregressive Stochastic Volatility model (ARSV, see Harvey et al., 1994) and the Generalized Autoregressive Conditional Heteroskedasticity model (GARCH, see Bollerslev, 1986).

Table 7 displays an ARSV(1) model, where z_t are the observed returns and σ_t the corresponding volatilities. In this type of model the variance of ε_t is assumed to be known and, therefore, is not estimated. The simulation results are close to those of the AR model with observation errors, because the SS representations of both models are very similar. However, when the sample size becomes larger, the computational cost of ML increases to 5-20 times the cost of SUBEST1 and 3-12 times the cost of SUBEST2, for T = 500 and T = 3,000, respectively.

Table 8 shows that the computational disadvantage of ML in the GARCH model is larger than in the ARSV. In particular, the load of ML is 49 times the cost of SUBEST1 and more than 16 times the cost of SUBEST2 for T = 500. When the sample size increases (T = 3,000), SUBEST2 becomes the fastest method. In fact, ML converges 90 and 79 times more slowly than SUBEST2 and SUBEST1, respectively, while the three methods provide very similar results in terms of precision.

[INSERT TABLES 7-8]

5 Concluding Remarks

We have presented two new algorithms to estimate time series models. These procedures are fast, stable and flexible, as they can be applied to any model written in its equivalent SS form. The simulation results characterize the situations in which each algorithm is more adequate. SUBEST1 is a good choice when the model does not include seasonal moving average terms and, as it is very fast, also when one wants to compute initial estimates for a subsequent ML iteration.
SUBEST2, on the other hand, is more suitable as a final estimation method, because its estimates are comparable to those of ML for large gaussian samples, while maintaining a substantial advantage in terms of both computational cost and numerical stability. SUBEST1 and SUBEST2 may be particularly useful when modeling high-frequency financial data, where the computational cost and instability of ML, as well as the fact that the samples are far from the gaussian assumption, strongly motivate the use of alternative estimation methods.

These algorithms are implemented in a MATLAB toolbox for time series modeling called E4. The source code of this toolbox is freely provided under the terms of the GNU General Public License and can be downloaded at www.ucm.es/info/icae/e4. This site also includes a complete user manual and other reference materials.

References

Ansley, C. F. and Newbold, P. (1980). Finite sample properties of estimators for autoregressive moving average models. Journal of Econometrics, 13:159-183.

Bauer, D. (2005). Estimating linear dynamical systems using subspace methods. Econometric Theory, 21:181-211.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31:307-327.

Casals, J., Sotoca, S., and Jerez, M. (1999). A fast stable method to compute the likelihood of time invariant state space models. Economics Letters, 65(3):329-337.

Deistler, M., Peternell, K., and Scherrer, W. (1995). Consistency and relative efficiency of subspace methods. Automatica, 31(12):1865-1875.

Dennis, J. E. and Schnabel, R. B. (1983). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice Hall, Englewood Cliffs, NJ.

Dufour, J. M. and Pelletier, D. (2004). Linear estimation of weak VARMA models with macroeconomic applications. Technical report, Centre Interuniversitaire de Recherche en Économie Quantitative (CIREQ), Université de Montréal, Canada.

Favoreel, W., De Moor, B., and Van Overschee, P. (2000). Subspace state space system identification for industrial processes. Journal of Process Control, 10:149-155.

Flores, R. and Serrano, G. (2002). A generalized least squares estimation method for VARMA models. Statistics, 36(4):303-316.

Francq, C. and Zakoïan, J. M. (2000). Estimating weak GARCH representations. Econometric Theory, 16:692-728.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations. Johns Hopkins University Press, Baltimore, third edition.

Hannan, E. J. and Deistler, M. (1988). The Statistical Theory of Linear Systems. John Wiley, New York.

Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.

Harvey, A. C., Ruiz, E., and Shephard, N. (1994). Multivariate stochastic variance models. Review of Economic Studies, 61:247-264.

Katayama, T. (2005). Subspace Methods for System Identification. Springer Verlag, London.

Koreisha, S. G. and Pukkila, T. H. (1989). Fast linear estimation methods for vector autoregressive moving average models. Journal of Time Series Analysis, 10:325-339.

Koreisha, S. G. and Pukkila, T. H. (1990). A generalized least-squares approach for estimation of autoregressive moving-average models. Journal of Time Series Analysis, 11:139-151.

Ljung, L. (1999). System Identification: Theory for the User. PTR Prentice Hall, New Jersey, second edition.

Lütkepohl, H. and Poskitt, D. S. (1996). Specification of echelon form VARMA models. Journal of Business and Economic Statistics, 14(1):69-79.

Spliid, H. (1983). A fast estimation method for the vector autoregressive moving average model with exogenous variables. Journal of the American Statistical Association, 78(384):843-849.
Van Overschee, P. and De Moor, B. (1994). N4SID: Subspace algorithms for the identification of combined deterministic-stochastic systems. Automatica, 30(1):75-93.

Appendix A

From equation (13), displacing the time subscripts, we obtain:

  Z_{f+} = O_{i-1} X_{i+1} + H_{i-1}^u U_{f+} + H_{i-1}^ψ Ψ_{f+}            (A.1)

Note that Z_{f+} includes only the future block information, while Z_f integrates the present and the future blocks. On the other hand, substituting (12b) into (12a) and propagating the state equation yields:

  x_t = (Φ - EH)^i x_{t-i} + Σ_{j=0}^{i-1} (Φ - EH)^j [ E z_{t-1-j} + (Γ - ED) u_{t-1-j} ]    (A.2)

which can be written in subspace form as

  X_i = (Φ - EH)^i X_p + Ξ_i(Φ - EH, E) Z_p + Ξ_i(Φ - EH, Γ - ED) U_p       (A.3)

where Ξ_i(A, B) = [ A^{i-1}B  A^{i-2}B  ...  AB  B ] for any matrices A, B. Again, displacing the time indices, we obtain:

  X_{i+1} = (Φ - EH)^{i+1} X_p + Ξ_{i+1}(Φ - EH, E) Z_p + Ξ_{i+1}(Φ - EH, Γ - ED) U_p    (A.4)

or, equivalently,

  X_{i+1} = (Φ - EH) X_i + E Z_{pr} + (Γ - ED) U_{pr}                       (A.5)

Finally, substituting (A.5) into (A.1), we get (18).

Appendix B

To obtain the initial conditions K̄ and B̄, we use the state equation of the Kalman filter, which can be written in subspace form as:

  X̂_{i+1} = Φ̂ X̂_i + Γ̂ U_{pr} + K_i Z̃_{pr}                                  (B.1)

where Φ̂, Γ̂ and X̂_i were previously estimated in (25) and the residual vector, Z̃_{pr}, is exactly as in (24). Also, as in equation (17), the state sequence X̂_{i+1} is computed as:

  X̂_{i+1} = O_{i-1}^† ( Z_{f+} Π_{U,Z_{p+}} - H_{i-1}^u U_{f+} )            (B.2)

Thus, the only unknown matrix is K_i. From (B.1) we get K̄, an approximation to K_i, as:

  K̄ = ( X̂_{i+1} - Φ̂ X̂_i - Γ̂ U_{pr} ) Z̃_{pr}^†                              (B.3)

Note that K̄ is a least squares estimate of the Kalman filter gain from equation (B.1). Finally, an approximation to B_i is obtained as B̄ = (T - 2i + 1)^{-1} Z̃_{pr} Z̃_{pr}'.

Tables

Table 1†: Univariate nonseasonal models with gaussian errors.

Table 1.1. AR(2) model: (1 - .4B + .3B²) z_t = a_t;  a_t ~ iidN(0, 1.0).
True values: -.4, .3, 1.0.

SUBEST1, T = 50:   Average -.384, .305, .931;  Std. Dev .144, .144, .203;  RMSE .145, .144, .203;  Time 100%
SUBEST1, T = 300:  Average -.397, .306, .985;  Std. Dev .059, .066, .083;  RMSE .059, .066, .083;  Time 100%
SUBEST2, T = 50:   Average -.367, .284, .943;  Std. Dev .136, .127, .198;  RMSE .136, .127, .198;  Time 216%
SUBEST2, T = 300:  Average -.391, .296, .992;  Std. Dev .053, .055, .082;  RMSE .054, .055, .083;  Time 219%
ML, T = 50:        Average -.386, .304, .946;  Std. Dev .137, .135, .196;  RMSE .138, .135, .196;  Time 416%
ML, T = 300:       Average -.395, .300, .994;  Std. Dev .053, .056, .082;  RMSE .054, .065, .083;  Time 1040%

Table 1.2. ARMA(2,1) model: (1 - .4B + .3B²) z_t = (1 - .8B) a_t;  a_t ~ iidN(0, 1.0).
True values: -.4, .3, -.8, 1.0.

SUBEST1, T = 50:   Average -.360, .289, -.671, 1.020;  Std. Dev .305, .175, .291, .214;  RMSE .308, .175, .318, .215;  Time 100%
SUBEST1, T = 300:  Average -.398, .309, -.763, 1.014;  Std. Dev .085, .071, .061, .085;  RMSE .085, .072, .072, .086;  Time 100%
SUBEST2, T = 50:   Average -.307, .297, -.680, 1.000;  Std. Dev .222, .141, .254, .206;  RMSE .240, .141, .281, .206;  Time 244%
SUBEST2, T = 300:  Average -.389, .304, -.805, 1.001;  Std. Dev .066, .061, .074, .083;  RMSE .067, .062, .074, .083;  Time 230%
ML, T = 50:        Average -.354, .313, -.800, .927;  Std. Dev .196, .141, .197, .194;  RMSE .201, .141, .197, .207;  Time 359%
ML, T = 300:       Average -.395, .304, -.797, .993;  Std. Dev .066, .061, .050, .081;  RMSE .066, .062, .050, .081;  Time 881%

† The ML algorithm is initialized with the true parameter values. RMSE is the root mean-squared error. The smallest RMSEs are underlined.

Table 2†: Bivariate VARMA model with gaussian errors.

  [ 1 - .70B + .60B²   0                 ] [ z_{1t} ]   [ 1 - .30B   -.90B    ] [ a_{1t} ]
  [ 0                  1 - 1.30B + .50B² ] [ z_{2t} ] = [ .60B       1 - .80B ] [ a_{2t} ]

  ( a_{1t}, a_{2t} )' ~ iidN( 0, [ .07  .02 ; .02  .05 ] )

Table 2.1: Estimation results with sample size T = 50.
True values: -.70, -1.30, .60, .50, -.30, .60, -.90, -.80, .07, .02, .05.
SUBEST1:  Average -.671, -1.359, .545, .557, -.267, .431, -.743, -.736, .079, .019, .056;
          Std. Dev .137, .233, .104, .177, .200, .104, .177, .246, .017, .010, .014;
          RMSE .140, .240, .118, .186, .203, .199, .236, .254, .019, .010, .015;  Time 100%
SUBEST2:  Average -.620, -1.297, .519, .500, -.331, .453, -.832, -.776, .073, .017, .056;
          Std. Dev .133, .202, .100, .151, .208, .121, .195, .251, .016, .010, .014;
          RMSE .155, .202, .129, .151, .210, .190, .207, .252, .017, .010, .015;  Time 348%
ML:       Average -.686, -1.279, .589, .492, -.344, .654, -1.024, -.872, .060, .017, .044;
          Std. Dev .087, .188, .072, .155, .175, .122, .179, .220, .014, .010, .010;
          RMSE .088, .189, .073, .155, .181, .134, .218, .231, .017, .010, .012;  Time 398%

Table 2.2: Estimation results with sample size T = 300.
True values: -.70, -1.30, .60, .50, -.30, .60, -.90, -.80, .07, .02, .05.

SUBEST1:  Average -.693, -1.318, .580, .515, -.276, .536, -.829, -.769, .074, .021, .054;
          Std. Dev .044, .090, .038, .072, .084, .044, .063, .099, .006, .004, .005;
          RMSE .045, .092, .043, .074, .087, .078, .094, .104, .008, .004, .006;  Time 100%
SUBEST2:  Average -.686, -1.311, .584, .506, -.289, .585, -.902, -.831, .072, .019, .053;
          Std. Dev .039, .076, .034, .062, .070, .043, .072, .089, .006, .004, .005;
          RMSE .042, .076, .037, .062, .070, .046, .072, .095, .006, .004, .006;  Time 342%
ML:       Average -.699, -1.296, .599, .500, -.305, .605, -.913, -.804, .069, .019, .049;
          Std. Dev .030, .055, .023, .047, .062, .035, .052, .066, .006, .004, .004;
          RMSE .030, .055, .023, .047, .062, .035, .054, .066, .006, .004, .004;  Time 744%

† The ML algorithm is initialized with the true parameter values. RMSE is the root mean-squared error. The smallest RMSEs are underlined.

Table 3†: Univariate seasonal ARMA models.

Table 3.1. ARMA(1,1)×(0,1)_4 model: (1 - .4B) z_t = (1 - .7B)(1 - .8B⁴) a_t;  a_t ~ iidN(0, 1.0).
True values: -.4, -.7, -.8, 1.0.

SUBEST1, T = 50:   Average -.208, -.390, -.456, 1.174;  Std. Dev .557, .585, .168, .336;  RMSE .589, .662, .383, .378;  Time 100%
SUBEST1, T = 300:  Average -.302, -.591, -.575, 1.228;  Std. Dev .250, .267, .088, .153;  RMSE .269, .288, .242, .275;  Time 100%
SUBEST2, T = 50:   Average -.294, -.632, -.742, .924;  Std. Dev .486, .482, .153, .210;  RMSE .497, .486, .163, .223;  Time 217%
SUBEST2, T = 300:  Average -.365, -.665, -.826, .998;  Std. Dev .147, .132, .064, .084;  RMSE .151, .136, .069, .084;  Time 227%
ML, T = 50:        Average -.325, -.683, -.835, .886;  Std. Dev .325, .329, .169, .204;  RMSE .333, .330, .173, .233;  Time 441%
ML, T = 300:       Average -.388, -.695, -.800, .990;  Std. Dev .146, .122, .042, .080;  RMSE .146, .122, .042, .081;  Time 577%

Table 3.2. ARMA(1,1)×(0,1)_12 model: (1 - .4B) z_t = (1 - .7B)(1 - .8B¹²) a_t;  a_t ~ iidN(0, 1.0).
True values: -.4, -.7, -.8, 1.0.

SUBEST1, T = 50:   Average -.240, -.493, -.368, 1.133;  Std. Dev .525, .534, .128, .261;  RMSE .549, .572, .450, .293;  Time 100%
SUBEST1, T = 300:  Average -.374, -.661, -.586, 1.221;  Std. Dev .221, .201, .055, .118;  RMSE .222, .205, .221, .250;  Time 100%
SUBEST2, T = 50:   Average -.244, -.592, -.494, 1.073;  Std. Dev .438, .454, .042, .237;  RMSE .465, .467, .309, .248;  Time 436%
SUBEST2, T = 300:  Average -.379, -.691, -.812, .970;  Std. Dev .143, .128, .073, .084;  RMSE .145, .128, .074, .089;  Time 371%
ML, T = 50:        Average -.324, -.680, -.807, .902;  Std. Dev .331, .323, .224, .225;  RMSE .340, .323, .224, .245;  Time 907%
ML, T = 300:       Average -.386, -.694, -.805, .984;  Std. Dev .135, .112, .048, .083;  RMSE .136, .112, .048, .085;  Time 1415%

† The ML algorithm is initialized with the true parameter values. RMSE is the root mean-squared error. The smallest RMSEs are underlined.
Table 4: Measuring the Pile-Up Effect (PUE) in small samples (T = 50).

Model               Table   Parameter    % of PUE (SUBEST1, SUBEST2, ML)   Estimates discounting PUE (SUBEST1, SUBEST2, ML)
ARMA(2,1)           1.2     θ1 = -.8     10.2,  2.1, 21.5                  -.624, -.671, -.740
VARMA(2,1)          2.1     θ11 = -.3      .0,   .0,   .0                  -.267, -.331, -.344
VARMA(2,1)          2.1     θ22 = -.8    10.2, 15.9, 26.6                  -.695, -.704, -.777
ARMA(1,1)×(0,1)_4   3.1     θ1 = -.7       .0,  1.5, 22.6                  -.390, -.623, -.627
ARMA(1,1)×(0,1)_4   3.1     Θ1 = -.8       .0,  2.7, 32.2                  -.456, -.738, -.735
ARMA(1,1)×(0,1)_12  3.2     θ1 = -.7      3.5,   .6, 23.9                  -.472, -.590, -.590
ARMA(1,1)×(0,1)_12  3.2     Θ1 = -.8       .0,   .0, 46.4                  -.368, -.494, -.629

We consider that there is PUE when an MA parameter estimate is less than or equal to -1.00.

Table 5†: Structural time series model.

  [ T_{t+1} ]   [ 1  1 ] [ T_t ]   [ 0 ]
  [ Δ_{t+1} ] = [ 0  1 ] [ Δ_t ] + [ 1 ] η_t

  z_t = (1  0) [ T_t  Δ_t ]' + ε_t;   ( η_t, ε_t )' ~ iidN( 0, [ .01  0 ; 0  10 ] )

True values: .01, 10.

SUBEST1, T = 50:   Average 1.475, 8.473;  Std. Dev 3.275, 3.693;  RMSE 3.588, 3.996;  Time 395%
SUBEST1, T = 300:  Average .461, 10.137;  Std. Dev .146, .476;  RMSE .474, .495;  Time 273%
SUBEST2, T = 50:   Average .022, 9.726;  Std. Dev .040, 1.406;  RMSE .042, 1.432;  Time 125%
SUBEST2, T = 300:  Average .019, 9.977;  Std. Dev .014, .443;  RMSE .017, .444;  Time 100%
ML, T = 50:        Average .035, 9.911;  Std. Dev .069, 1.005;  RMSE .073, 1.009;  Time 100%
ML, T = 300:       Average .008, 10.005;  Std. Dev .006, .409;  RMSE .007, .409;  Time 171%

Table 6†: AR(2) model with observation errors.

  (1 - 1.5B + .8B²) z*_t = a_t;   z_t = z*_t + v_t;   ( a_t, v_t )' ~ iidN( 0, [ 1.0  0 ; 0  .5 ] )

True values: -1.5, .8, 1.0, .5.

SUBEST1, T = 50:   Average -1.425, .731, 1.030, .558;  Std. Dev .268, .187, .219, .202;  RMSE .278, .199, .221, .210;  Time 100%
SUBEST1, T = 300:  Average -1.491, .798, .990, .530;  Std. Dev .057, .046, .081, .069;  RMSE .057, .047, .081, .076;  Time 100%
SUBEST2, T = 50:   Average -1.429, .737, 1.045, .558;  Std. Dev .133, .121, .195, .177;  RMSE .150, .136, .200, .186;  Time 204%
SUBEST2, T = 300:  Average -1.487, .788, 1.010, .513;  Std. Dev .047, .044, .075, .057;  RMSE .048, .046, .076, .059;  Time 193%
ML, T = 50:        Average -1.480, .781, .954, .469;  Std. Dev .128, .119, .185, .169;  RMSE .129, .121, .190, .172;  Time 142%
ML, T = 300:       Average -1.492, .793, .997, .493;  Std. Dev .044, .043, .072, .055;  RMSE .045, .043, .072, .056;  Time 253%

Table 7†: ARSV(1) model in state-space form.

  z_t = 1.5 σ_t ε_t
  log(σ²_{t+1}) = .95 log(σ²_t) + η_t

where η_t ~ iidN(0, .5), ε_t ~ iidN(0, 1), and ε_t and η_t are independent.
True values: .95, 1.50, .50.

SUBEST1, T = 500:    Average .932, 1.414, .469;  Std. Dev .164, 1.105, .146;  RMSE .165, 1.108, .149;  Time 100%
SUBEST1, T = 3,000:  Average .949, 1.485, .497;  Std. Dev .010, .202, .044;  RMSE .010, .202, .044;  Time 100%
SUBEST2, T = 500:    Average .945, 1.453, .462;  Std. Dev .041, 1.027, .146;  RMSE .041, 1.028, .151;  Time 168%
SUBEST2, T = 3,000:  Average .949, 1.484, .494;  Std. Dev .009, .198, .044;  RMSE .009, .198, .044;  Time 166%
ML, T = 500:         Average .933, 1.471, .517;  Std. Dev .034, .460, .104;  RMSE .038, .461, .106;  Time 476%
ML, T = 3,000:       Average .948, 1.490, .501;  Std. Dev .009, .183, .038;  RMSE .009, .184, .039;  Time 2059%

Table 8†: GARCH(1,1) model in ARMA form.

  y_t = ε_t, with ε_t | Ω_{t-1} ~ iidN(0, σ²_t), where ε²_t = 1.0 + [(1 - .80B)/(1 - .97B)] v_t and v_t = ε²_t - σ²_t.

True values: 1.0, -.97, -.80.

SUBEST1, T = 500:    Average 1.005, -.972, -.798;  Std. Dev .766, .025, .076;  RMSE .766, .025, .076;  Time 100%
SUBEST1, T = 3,000:  Average .999, -.974, -.810;  Std. Dev .314, .015, .052;  RMSE .314, .016, .053;  Time 114%
SUBEST2, T = 500:    Average 1.002, -.969, -.814;  Std. Dev .760, .031, .106;  RMSE .760, .031, .107;  Time 298%
SUBEST2, T = 3,000:  Average .998, -.972, -.809;  Std. Dev .313, .019, .068;  RMSE .313, .019, .069;  Time 100%
ML, T = 500:         Average 1.209, -.952, -.784;  Std. Dev 1.420, .037, .056;  RMSE 1.436, .041, .058;  Time 4916%
ML, T = 3,000:       Average 1.095, -.967, -.797;  Std. Dev .880, .011, .018;  RMSE .885, .011, .018;  Time 9047%

† The ML algorithm is initialized with the true parameter values. RMSE is the root mean-squared error. The smallest RMSEs are underlined.
