Journal of Empirical Finance 3 (1996) 15-102

The econometrics of financial markets

Adrian Pagan
Economics Program, Research School of Social Sciences, Australian National University, Canberra, A.C.T. 0200, Australia

Abstract

The paper provides a survey of the work that has been done in financial econometrics in the past decade. It proceeds by first establishing a set of stylized facts that are characteristics of financial series and then by detailing the range of techniques that have been developed to model series which possess these characteristics. Both univariate and multivariate models are considered.

JEL classification: G11; G12; G13; G14

Keywords: Portfolio choice; Asset pricing; Contingent pricing; Futures pricing; Information; Market efficiency

1. Introduction

Financial econometrics has emerged as one of the most vibrant areas of the discipline in the past decade, featuring an explosion of theoretical and applied work. Perhaps more so than in any other part of econometrics, the two have gone together, with the development of sophisticated techniques being driven by the special features seen in financial data. Understanding these characteristics, and the problems that they pose for modeling, must therefore be at the top of any agenda purporting to describe "financial econometrics". It is worth thinking back to the econometrics of the fifties and sixties to realize the import of this. Then it would have been common to assume that any series being investigated could be regarded as stationary, independently, identically and (possibly) normally distributed, with moments that existed. ¹ To be sure, adjustments were sometimes made to allow for "complications", such as deterministically trending variables, but these were all treated as departures from a base model. Such a perspective has been slowly jettisoned by those dealing with financial series. Non-stationarity, a lack of independence, and non-normality have become the characteristics of the standard model, with the earlier set of assumptions now being regarded as the curiosa. How and why this development came about is the concern of the first section of this paper. The "how" is a mechanical task; explaining "why" is the more interesting one, but we will have to content ourselves with the observation that the models of interest to financial economists depended very heavily upon the validity of these assumptions, so it is scarcely surprising that a major effort was mounted to assess their validity.
As examples of this contention, witness the fact that mean-variance portfolio models and optimal hedging formulae gave equal attention to the first and second moments of the data; option pricing formulae emphasized the need for a correct description of either the conditional or unconditional density of returns; while wealth choices based on maximizing expected utility E(U(y_t | F_t)), where F_t was some information set, focused attention upon the nature of higher order conditional moments. ²

Having established a case for emphasizing features such as dependence and non-normality, Sections 2 and 3 seek to describe how the characteristics of the series have been captured by parametric models. Inevitably, this section is heavily oriented towards the construction, estimation and testing of models that explain the volatility of returns. Section 4 returns to the task of data description, now enquiring into what extra information is generated when moving from a univariate to a multivariate perspective. Our answer will be to concentrate upon locating the number and nature of the factors that are common to a set of returns. In Section 5, attention turns to economic models that have been proposed to account for the regularities seen in financial data. Because of the forward-looking nature of financial markets, most of these models place heavy reliance on inter-temporal optimization, and this has led to an extensive literature which seeks to discover how accurate the description of agents' behavior provided by these models is. Finally, Section 6 provides a short conclusion.

Financial data appears in many forms, each with its own idiosyncrasies. We will generally concentrate upon three representative series - stock prices, interest rates, and exchange rates - ignoring other series such as futures and options prices. Nevertheless, much of what is said about the three series in question applies to others. The series are the log of the monthly CRSP value-weighted price of shares on the NYSE over the period 1925/12 to 1989/12; the continuously compounded returns to one, three, six, and nine month zero coupon bonds over the period 1946(12) to 1987(12) (taken from McCulloch, 1989); and the log of the weekly exchange rate between the $US and the Swiss Franc over the period July 1973 to August 1985 (used by Engle and Bollerslev, 1986). The choice of the series was largely governed by their accessibility. Sometimes we will also refer to two much longer series of daily observations: stock returns computed from the S&P Composite Price Index, 1928-1987, as adjusted in Gallant et al. (1992), and U.S. daily stock returns from 1885-1987, constructed by Schwert (1990).

¹ Hence assumptions like T⁻¹X'X tending to a constant as T → ∞ and the prevalence of F-statistics for testing hypotheses.
² In this review we will concentrate upon time series of financial data, ignoring the fact that data is sometimes available upon a cross section at a point in time or even on a panel. There are special econometric problems arising in the latter circumstances, but generally they would appear in any cross section or panel, and are not specific to financial series. It is in the time dimension that financial series are particularly distinct, and it is for this reason that a very special set of techniques has arisen to deal with these features.
2. Properties of univariate financial series

It is useful to begin by considering what type of models one might adopt for a series g(y_t), where g(·) is some known transformation of a random variable y_t, e.g., g(y_t) = y_t² or log(y_t²). Throughout, we will assume that g(y_t) has a zero expected value. Three simple models from time series analysis would be, ³ where L is the lag operator,

ARMA(1,1):  g(y_t) = β₁ g(y_{t-1}) + e_t + α₁ e_{t-1}    (1)
FI:         (1 - L)^d g(y_t) = e_t                        (2)
UC:         g(y_t) = g_P(y_t) + g_T(y_t),
            (1 - β₁ L) g_P(y_t) = η_t,                    (3)
            g_T(y_t) = ξ_t,

where e_t, η_t and ξ_t all have conditional expectation zero and will be taken to be identically and independently distributed over time. ⁴

The ARMA(1,1) model in (1) has been a work horse of time series econometrics since its popularization by Box and Jenkins in the late sixties. By setting β₁ = 1 one can produce a process that is integrated of order one, I(1), so rendering g(y_t) no longer covariance stationary. The second model in (2) represents the class of fractionally integrated processes; provided d < 0.5 it is a covariance stationary process, but, if d = 1, it coincides with an I(1) series. Finally, (3) is an unobserved components model, composed of a permanent (g_P(y_t)) and a transitory (g_T(y_t)) part, each of which is generated as a stochastic process depending upon the innovations η_t and ξ_t. Generally it is assumed that η_t and ξ_t are independent of one another, and this yields a representation for g(y_t) that is ARMA(1,1) but with a negative moving average coefficient α₁. ⁵ In some instances η_t and ξ_t are assumed proportional, and this provides the type of decomposition set out in Beveridge and Nelson (1981).

2.1. Are financial series stationary?

Non-stationarity of a series could occur in many ways but, arising from the theory of efficient markets, it was natural for researchers to investigate whether there was a unit root associated with the log prices of financial assets, i.e., whether such a series, defined as y_t, was I(1) or not. A huge literature has been spawned on the question of unit roots, and the resulting tests have been extensively applied to financial data. These tests can be classified according to their responses to four issues.
1. Whether the null hypothesis is that the series is I(1) or I(0).
2. The model used to construct a test of this hypothesis.
3. Within a given model, which of the characteristics of an I(1) series is used to set up a test.
4. Whether the alternative distinguished in the testing procedure is composite or simple.
Table 1 summarizes the literature according to this four-way classification, culminating in five types of tests that we will focus our attention upon.

Table 1
Classification of unit root tests

Hypothesis   Model   Based on     Nature of H₁   Example          Index
H₀: I(1)     ARMA    a.c.f.       composite      ADF              (i)
                                  simple         point optimal    (ii)
                     variance     composite      variance ratio   (iii)
                                  simple         ?
             FI      degree (d)   composite      FI tests         (iv)
                                  simple         ?
H₀: I(0)     UC      σ_η²         composite      KPSS             (v)
                                  simple         ?

³ Of course these can all be generalized by allowing for higher order autoregressive and moving average processes, but there is little advantage to doing that in this survey.
⁴ We will write this as E_{t-1}(e_t) = 0, where the subscript denotes that the expectation is conditional upon some set of past-dated random variables. Generally these will be the past history of y_t.
In the table there are also question marks, which represent what seem to be tests that are not in the literature to our knowledge. ⁶

(i) Selecting (1) with g(y_t) = y_t, the process becomes

y_t = β₁ y_{t-1} + u_t,    (4)

and we might test H₀: β₁ = 1 vs. H₁: β₁ < 1 using the t-ratio from the regression of Δy_t on y_{t-1}. This leads to tests for a unit root of the Phillips-Perron, Dickey-Fuller (DF) type which feature in many econometric programs.

(ii) Instead of using the composite alternative one might put H₁: β₁ = β̄₁. In this situation the best test can be derived from the Neyman-Pearson Lemma as a likelihood ratio. This produces the class of point optimal tests examined in Dufour and King (1991) and Elliott et al. (1992). The latter also consider the case of near integration where β₁ = 1 - c/T, so that, as T → ∞, the alternative collapses towards the null.

(iii) The two tests just distinguished concentrate upon the autocorrelation coefficients of the series y_t, ρ_j(y_t) = cov(y_t, y_{t-j})/var(y_t), as these are all unity if the process is I(1), so that it is natural to test if β₁ is unity using ρ̂_j(y_t) = (Σ_{t=j+1}^T y_{t-j}²)⁻¹ Σ_{t=j+1}^T y_t y_{t-j}. Instead, one might focus upon other implications of a unit root, in particular the nature of the "variance" of the series. If (1) is a correct representation of the data, α₁ = 0 and e_t is assumed i.i.d. (0, σ_e²), then

y_t = y_{t-k} + e_t + e_{t-1} + ... + e_{t-k+1},    (5)

and the kth long difference of y_t, Δ_k y_t = y_t - y_{t-k}, will have variance k σ_e². ⁷ Taking the ratio h₁ = var(Δ_k y_t)/var(Δ₁ y_t), it would be k, and therefore a plot of h₁ against k should be an increasing straight line. Alternatively, h₂ = k⁻¹ h₁ should tend to unity as k increases. If, however, y_t does not have a unit root, both var(Δ_k y_t) and var(Δ₁ y_t) will be constants, so that the ratio h₁ does not change after k becomes large enough, while h₂ will tend to zero. Cochrane (1988) formulated this idea and gave asymptotic standard errors for ĥ₂ when estimated from the data. Lo and MacKinlay (1989) give the asymptotic distribution of T^{1/2}(ĥ₂ - 1) as N(0, 2(2k-1)(k-1)/3k) if e_t is N(0, σ²), but found in simulations that the approximation was poor when δ = k/T is large. Richardson and Stock (1989) determine the asymptotic distribution when k → ∞ and T → ∞ in such a way that δ is a constant. This turns out to involve functionals of Brownian motion, and the distribution has to be found by simulation. When δ = 1/3 they tabulate critical values of the statistic for various T. What is appealing about this test, from the perspective of financial data, is that, if y_t is the log of the price, the kth long difference can be interpreted as the returns to an asset held for k periods (assuming only capital gains and no dividends).

⁵ The error term e_t is now a linear combination of Σ_{j=0}^∞ φ_j η_{t-j} and Σ_{j=0}^∞ ψ_j ξ_{t-j} and so is no longer independently distributed even though η_t and ξ_t are (see McDonald and Darroch, 1983).
⁶ There is also a growing literature on Bayesian tests for unit roots that we will not summarize here, but which has sometimes used financial data to illustrate the techniques, e.g., Koop (1994) and Schotman and Van Dijk (1991).
⁷ With some modification, the results that follow also hold if e_t is correlated. Cochrane (1988) deals with the general case.
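As a concrete illustration of (iii), the variance ratio ĥ₂ and the Lo-MacKinlay normal approximation to its standard error can be computed in a few lines. The sketch below (in Python) is purely illustrative and is not the code underlying Table 2; the function name and the simulated random walk are our own devices.

import numpy as np

def variance_ratio(y, k):
    """h2 = var(y_t - y_{t-k}) / (k * var(y_t - y_{t-1})) and its standardized deviation from 1."""
    d1 = np.diff(y)                 # one-period differences (returns if y is a log price)
    dk = y[k:] - y[:-k]             # k-period long differences
    h2 = dk.var(ddof=1) / (k * d1.var(ddof=1))
    T = len(d1)
    # Lo-MacKinlay asymptotic std. error under i.i.d. normal increments; poor when k/T is large
    se = np.sqrt(2.0 * (2 * k - 1) * (k - 1) / (3.0 * k * T))
    return h2, (h2 - 1.0) / se

# illustration on a simulated random walk, where h2 should be close to one
rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(2000))
for k in (2, 4, 8, 16):
    print(k, variance_ratio(y, k))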
A disadvantage of the test is its presumption that e_t is normally distributed and that var(Δ₁ y_t) exists. Later in this survey both assumptions come under suspicion.

(iv) Instead of the alternative model used to construct tests being ARMA, it could be the fractionally integrated process in (2). It is possible to estimate d - see Diebold and Rudebusch (1989) and Sowell (1992) - but probably not very precisely, although it has been done for various series, e.g., Baillie and Bollerslev (1994) with the forward premium. Another strategy is to derive tests such as an LM test or a one-sided locally best invariant (LBI) test that d takes a specified value, in particular whether d = 1. Writing the FI model as (1 - L)^{d-1} Δy_t = e_t, we can test this by checking if (d - 1) is zero. Wu (1992) derives the LBI test for this hypothesis as -Σ_{k=1}^{T-1} (1/k) ρ̂_k(Δy_t). It is probably not a good idea to base a test on a ρ̂_k that has k close to T, suggesting that we instead use γ̂_K = -Σ_{k=1}^{K} (1/k) ρ̂_k(Δy_t), with K set to a large value. Under the assumption that e_t is white noise, T^{1/2} γ̂_K / s will be asymptotically N(0,1), where s² = Σ_{k=1}^{K} 1/k². ⁸ One might wish to use a one-tailed test given that the test statistic inherits its LBI properties from its one-sided nature. Another way of interpreting the test is to observe that it looks at the correlation of Δy_t with Σ_{k=1}^{K} (1/k) Δy_{t-k}, and so differs from the DF test in that it seeks to add Σ_{k=1}^{K} (1/k) Δy_{t-k} to a regression with Δy_t as dependent variable rather than y_{t-1}.

(v) Changing the model to that of an unobserved components process (3), with g(y_t) = y_t, we might test if β₁ is unity or, if the null hypothesis is taken to be that y_t is an I(0) process, by testing if the variance of η_t is zero. If the latter is zero then y_t = g_T(y_t), and there is no unit root in the series. Tests for this case have been developed by Tanaka (1990) and Kwiatkowski et al. (1992). They are based upon the summation of the squares of the partial sums of the mean-corrected prices (S_t = Σ_{j=1}^t y_j), and the statistic has the form KPSS = T⁻²[Σ_{t=1}^T S_t²]/v̂, where v̂ is an estimate of the "long-run variance", E[T⁻¹ S_T²]. As this term is essentially the spectral density for y_t at zero, it is necessary to specify a weighting function for the autocovariances as well as some lag truncation parameter in order to compute it. Note that KPSS = v̂⁻¹ Σ_{t=1}^T (t/T)² μ̂_t², where μ̂_t = (1/t) Σ_{j=1}^t y_j is the recursive estimate of the "mean" of y_t. Since an I(1) process does not have a mean, μ̂_t does not tend to zero as it would for a stationary random variable with μ = E(y_t) = 0.

Table 2 presents the ADF, variance ratio (at lag length k = T/3), KPSS and fractional integration tests for d = 1 for the series mentioned earlier in the paper.

⁸ The result comes from the fact that ρ̂_j and ρ̂_k are asymptotically uncorrelated with variance T⁻¹. The test can be made robust to heteroskedasticity in e_t by replacing T^{1/2} by a robust standard error, i.e., regress Δy_t against a constant and Δy_{t-1} and use White's standard errors.

Table 2
Tests for a unit root in selected financial series

Series         ρ̂₁      DF        ADF(4)    Var rat   KPSS   FI test
Log share pr   0.999   -0.007    0.07      0.73      1.00   -1.40
Int rate       0.975   -2.26     -1.98     0.23      1.02   0.43
$log US/SF     0.996   -1.318    -1.564    0.79      1.40   -1.77
Returns        0.111   -24.749   -11.751   0.004     0.08   -

Series are described in Section 1.
ADF(4) is the augmented Dickey-Fuller test with 4 lags. The variance ratio test is for lag k = T/3; its 0.05 critical value is 0.11, from Richardson and Stock (1989), Table 1. Critical values for the KPSS test are 0.46 (0.05) and 0.35 (0.10). To compute the KPSS test the weighting scheme was that used by Kwiatkowski et al. (1992), and 48 lags were used in reflection of the large sample sizes; conclusions are unaffected by this choice. The test for fractional integration is based on γ̂₄₀ and robust standard errors are used.

One item worth mentioning in connection with the latter test is that it is very sensitive to the inclusion of ρ̂₁. In fact, one might wish to exclude this autocorrelation coefficient on the grounds that a single large autocorrelation coefficient should not be taken as evidence of fractional integration; the essence of fractional integration must be that the persistence only shows up in the cumulated autocorrelations. Doing so changes the test statistics to -0.06, 1.06 and -1.41 respectively. Overall, the evidence for a unit root in asset prices is very strong, while returns do not seem to possess one. ⁹

2.2. Are financial series independently distributed over time?

All of this does suggest that there is indeed a unit root in asset prices, but the question of dependence in "returns", x_t = Δy_t, has not yet been addressed. ¹⁰ There are a number of ways that this question has been investigated.

(a) Since a series cannot be independently distributed if any of the ρ_j(x_t), j = 1, ..., ∞, are non-zero, this points to computation of the autocorrelation function (a.c.f.) of x_t followed by tests that the serial correlation coefficients are zero. Generally these are found to be zero, except if the data has been measured in such a way that there is an overlapping component. ¹¹ As well as returns, the same conclusion would be reached for certain "spreads", e.g., those between bid and ask prices.

(b) A different viewpoint arises by thinking of the impact of the news ε_t upon returns. With no dependence, i.e., x_t = ε_t, the short-run and long-run impacts of news are the same: ∂x_t/∂ε_t = 1 and the cumulative impact Σ_{j≥0} ∂x_{t+j}/∂ε_t = 1. However, if the x_t process has dependence, e.g., is an MA(1), ε_t + α₁ε_{t-1}, then ∂x_t/∂ε_t = 1 while the cumulative impact is (1 + α₁). In general, for x_t being an MA(q), ε_t + α₁ε_{t-1} + ... + α_qε_{t-q}, the cumulative impact is (1 + α₁ + α₂ + ... + α_q), leading to the idea of testing for dependence by testing H₀: α₁ + ... + α_q = 0. Testing if the sum of the α's is zero is likely to be more powerful than testing if the individual α's are zero, because only a scalar is being tested and it is more likely to be precisely estimated than any of its components. This has led some to estimate an MA(q) to get α̂₁, α̂₂, ..., α̂_q and then to test if the sum is zero - see the review in Christiano and Eichenbaum (1990).

⁹ Many of the arguments made to account for unit roots involving "breaking trends" do not seem to be as relevant for asset prices, where there are good theoretical reasons to expect a unit root.
¹⁰ In the remainder of the paper we will refer to the changes in the logs of stock prices, foreign exchange rates and the level of bond yields as returns. As a one month bond yield is the log of the price of a zero coupon bond, this usage seems consistent.
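A minimal sketch of this sum-of-MA-coefficients test, using statsmodels, is given below; it is our own illustration rather than the procedure reviewed in Christiano and Eichenbaum (1990), and the "ma.Lj" parameter labels are assumed to follow the naming convention of recent statsmodels releases.

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def ma_sum_test(x, q):
    """Fit an MA(q) to returns x and test H0: alpha_1 + ... + alpha_q = 0 by a Wald/t ratio."""
    res = ARIMA(x, order=(0, 0, q)).fit()
    names = res.model.param_names                      # e.g. ['const', 'ma.L1', ..., 'sigma2']
    idx = [i for i, n in enumerate(names) if n.startswith("ma.L")]
    w = np.asarray(res.params)[idx]                    # estimated MA coefficients
    V = np.asarray(res.cov_params())[np.ix_(idx, idx)]
    s = w.sum()                                        # the scalar being tested
    se = np.sqrt(np.ones(q) @ V @ np.ones(q))          # delta-method std. error of the sum
    return s, s / se

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)                          # white-noise "returns", so H0 is true
print(ma_sum_test(x, 4))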
Obviously this method has the disadvantage of assuming that the alternative is a moving average process.

(c) Fama and French (1988a) work with the UC model (3). Their idea is to form y_{t+k} - y_t and y_t - y_{t-k}, i.e., the kth forward and backward long differences of the series y_t, and to regress the former on the latter. With y_t as the log of stock prices, y_{t+k} - y_t is the continuously compounded kth period return ¹² and it will be the sum of k one-period returns, r_{t+k} = y_{t+k} - y_{t+k-1}, r_{t+k-1} = y_{t+k-1} - y_{t+k-2}, etc., so that y_{t+k} - y_t = Σ_{j=1}^k r_{t+j}. In large samples the numerator of the Fama and French regression coefficient will tend to E[(Σ_{j=1}^k r_{t+j})(Σ_{i=0}^{k-1} r_{t-i})], and they are therefore testing if this is zero. To appreciate what is being tested set k = 1, 2, 3, giving

E(r_{t+1} r_t) = γ₁    (k = 1)
E[(r_{t+1} + r_{t+2})(r_t + r_{t-1})] = γ₁ + 2γ₂ + γ₃    (k = 2)
E[(r_{t+1} + r_{t+2} + r_{t+3})(r_t + r_{t-1} + r_{t-2})] = γ₁ + 2γ₂ + 3γ₃ + 2γ₄ + γ₅    (k = 3),

where γ_j are the autocovariances of r_t. Thus one is essentially testing if a weighted average of the autocovariances is zero, rather than whether the autocovariances themselves are. One can think of Fama and French's test as a modification of the LM test for var(g_T(y_t)) = 0 in the UC model (3). This is done by observing that the composite error term for Δy_t in (3) (with β₁ = 1 and g(y_t) = y_t) is an MA(1), v_t + α v_{t-1}, and, when var(g_T(y_t)) = var(ξ_t) = 0, the true MA(1) coefficient α equals 0. Inverting the MA to produce an infinite AR, (1 - αL + α²L² - α³L³ + ...) x_t = v_t, creates a non-linear regression of the form x_t = (α x_{t-1} - α² x_{t-2} + α³ x_{t-3} - ...) + v_t = g(z_t, α) + v_t, and the LM test that α = 0 involves examining the covariance between x_t - g(z_t, α = 0) = x_t and [∂g(z_t, α)/∂α]_{α=0} = {∂[(αL - α²L² + α³L³ - ...) x_t]/∂α}_{α=0} = x_{t-1}, i.e., it examines the first order serial correlation coefficient of returns. Now, the LM test is a powerful test if the alternative is in the vicinity of the null, but may be dominated by others if the alternative is far from the null, i.e., α is not close to zero. This idea that one might use information that represents the alternative to improve the performance of the LM test has been exploited in the work of King (1985) on the design of point optimal tests. Under the null hypothesis, it is clear that any combination of past returns should be uncorrelated with current returns, and this indicates that we might form another test based on linear combinations of lagged returns that more closely approximates the alternative. Because the emphasis in Fama and French is upon the importance of the temporary component and, as var(g_T(y_t)) becomes large, α tends towards -1, this points to an examination of the covariance between [∂g(z_t, α)/∂α]_{α=-1} = -(x_{t-1} + 2x_{t-2} + 3x_{t-3} + ...) and x_t. As this is γ₁ + 2γ₂ + 3γ₃ + ...

¹¹ Many earnings series exhibit a weak first order dependence because of this measurement feature.
¹² Returns will include the dividend yield, so that y_t needs to be re-defined as beginning-of-period price plus dividend. In fact, the stability of the dividend yield means that most variation in returns is from the capital gains component.
(ignoring the sign), the argument suggests that Fama and French are potentially improving on the LM test by utilizing information about the likely alternative. Ultimately, looked at from a variable addition perspective, one is trying to find suitable regressors to add to the equation describing returns x_t, and there are many possibilities, of which Fama and French's is just one set. Power considerations will ultimately determine which variable addition test is best. ¹³

Let us term the coefficient estimated from the kth differenced regression β̂(k). Because the denominator of β̂(k) will tend to E[(Σ_{j=0}^{k-1} r_{t-j})²] = kγ₀ if there is no temporary component, it is clear that β̂(k) will tend to zero as k rises. If there is a temporary component, the denominator still tends to infinity and the numerator remains bounded, so β̂(k) still tends to zero. However, for small k, β̂(k) will generally be negative if there is a temporary component, owing to the fact that returns have Δ_k g_T(y_t) in them and η_t is uncorrelated. For example, if g_T(y_t) was uncorrelated, then the first autocovariance of returns will be -var(g_T(y_t)). Hence a temporary (mean-reverting) component makes β̂(k) negative, and a plot of β̂(k) against k should follow a U-shape with k. Indeed, Fama and French find that this is so for many stocks they examine (for market, decile and industrial portfolios).

β̂(k) can also be used as a measure of the importance of the temporary component. Because the R² from a regression of w_t on z_t is given by β̂² var(z_t)/var(w_t), setting w_t = y_{t+k} - y_t and z_t = y_t - y_{t-k}, and using the fact that the variances of these two are equal due to stationarity, it is apparent that the R² is identical to the square of the estimated coefficient from the regression. If there is no temporary component this should be zero in large samples, so that β̂(k) gives a measure of the importance of the temporary component at the kth period horizon. Because the β̂(k) are the first order serial correlation coefficients of k-period returns, and it is known that there are small sample biases in this estimated coefficient, Fama and French provide some bias corrections that seem to be quite important. One should also note that, if β(k) is zero, then the error in the regression is just the dependent variable Δ_k y_t, and this will be an MA(k-1) when r_t = Δy_t is white noise, necessitating the use of an autocorrelation consistent covariance matrix. Alternatively, one can generate standard errors by simulation methods; since the null hypothesis is the absence of temporary components, Δy_t can be taken to be white noise, samples can be generated from this specification, and the empirical distribution of β̂(k) may be determined. Note that the test is very easily formed if we have available to us regression programs that do forward and backward lags. Autocorrelation-consistent standard errors for β̂(k) can be computed in a number of ways.

¹³ It needs to be emphasized that the LM test for α = -1 is the KPSS test described earlier - Tanaka (1990) - so that Fama and French are not performing an LM test that α = -1. Rather, they use the information about what the model would look like if α was -1 in order to construct a test that α = 0.
For a regression model y_t = x_t'β + u_t, where E(x_t u_t) = 0, the asymptotic variance of T^{1/2}(β̂ - β) is

[E(x_t x_t')]⁻¹ [lim_{T→∞} T⁻¹ E{(Σ_t x_t u_t)(Σ_t x_t u_t)'}] [E(x_t x_t')]⁻¹.    (6)

Defining φ_t = x_t u_t, the middle term will be Ω₀ + Σ_{j=1}^{k-1}(Ω_j + Ω_j'), where Ω_j = E(φ_t φ_{t-j}'), when φ_t is an MA(k-1). Hansen and Hodrick (1980) used this approach, and it is the basis of Table 3, which presents β̂(k) and t-ratios formed with these asymptotic standard errors.

Table 3
Long return tests for CRSP data

Return length (k)   FF ser. corr.   FF t-stat   Bol/Hod t
12                  -0.002          -0.018      -0.72
24                  -0.159          -1.11       -1.78
36                  -0.261          -1.67       -1.58
48                  -0.200          -1.5        -0.79
60                  -0.089          -0.60       -0.07
72                  0.073           0.43        0.21
84                  0.160           0.76        0.44
96                  0.046           0.18        -0.36

Column 2 gives the first order serial correlation coefficient of the long returns of length given in the first column. Column 3 is the Fama-French (FF) t-statistic constructed with Hansen-Hodrick (1980) standard errors. Column 4 gives the t-ratio that the coefficient in the regression of one-period returns r_{t+1} upon Σ_{j=1}^{2k-1} ω_j r_{t+1-j} is zero.

The evidence for predictability is not strong. Several authors have argued that the asymptotic theory may not be a good approximation, e.g., Nelson and Kim (1993) and Bollerslev and Hodrick (1992), finding that the use of t-ratios formed in this way results in a strong tendency to reject the null that β(k) is zero. Richardson and Stock (1989) examine the distribution of sums of β̂(k) as well as that of a joint test that the β(k) are zero when k/T tends to a constant as T → ∞. The distribution is non-standard. Bollerslev and Hodrick suggest an alternative way of measuring the predictability of long-horizon returns that exploits the presumed covariance stationarity of returns. As observed previously, the numerator of β̂(k) involves

E[(Σ_{j=1}^k r_{t+j})(Σ_{i=0}^{k-1} r_{t-i})] = E[r_{t+1} (Σ_{j=1}^{2k-1} ω_j r_{t+1-j})],    (7)

where ω_j = j for 1 ≤ j ≤ k and ω_j = 2k - j for (k+1) ≤ j ≤ (2k-1). Thus, when k = 3, one is considering the covariance of r_{t+1} with (r_t + 2r_{t-1} + 3r_{t-2} + 2r_{t-3} + r_{t-4}), which equals γ₁ + 2γ₂ + 3γ₃ + 2γ₄ + γ₅, and, as was established earlier, represents the numerator of β̂(k) in large samples. A simple way to see the above for k = 3 is to write the LHS of (7) as

cov[{(1 + L⁻¹ + L⁻²) r_{t+1}}, {(1 + L + L²) r_t}]
= cov[r_{t+1}, (L⁻² + 2L⁻¹ + 3 + 2L + L²) r_t]
= cov[L⁻² r_{t+1}, {1 + 2L + 3L² + 2L³ + L⁴} r_t].

Hence they advocate a regression of r_{t+1} upon Σ_{j=1}^{2k-1} ω_j r_{t+1-j}, and a t-test on the coefficient of this regressor, observing that the error term in this regression is serially uncorrelated, and so no special formula needs to be used to take account of the serial correlation. Simulations in their paper reveal this test to be much better behaved in "small" samples. Table 3 gives t-statistics that the coefficient in the regression of r_{t+1} against Σ_{j=1}^{2k-1} ω_j r_{t+1-j} is zero for various values of k. The pattern turns out to be very similar to that observed with Fama and French's method, and one would draw essentially the same conclusions regardless of which type of test one adopted.
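The mechanics of the Fama-French regression with Hansen-Hodrick style standard errors can be sketched as follows. This is our own simplified illustration (no bias correction, truncated covariance sum up to lag k-1), not the code behind Table 3.

import numpy as np

def fama_french_beta(y, k):
    """Regress y_{t+k} - y_t on y_t - y_{t-k}; return beta(k) and a Hansen-Hodrick t-ratio."""
    fwd = y[2 * k:] - y[k:-k]          # y_{t+k} - y_t
    bwd = y[k:-k] - y[:-2 * k]         # y_t - y_{t-k}
    X = np.column_stack([np.ones_like(bwd), bwd])
    b = np.linalg.lstsq(X, fwd, rcond=None)[0]
    u = fwd - X @ b
    T = len(u)
    phi = X * u[:, None]               # x_t * u_t
    S = phi.T @ phi / T                # unweighted autocovariances up to lag k-1 (MA(k-1) error)
    for j in range(1, k):
        G = phi[j:].T @ phi[:-j] / T
        S += G + G.T
    XXinv = np.linalg.inv(X.T @ X / T)
    V = XXinv @ S @ XXinv / T
    return b[1], b[1] / np.sqrt(V[1, 1])

rng = np.random.default_rng(2)
y = np.cumsum(rng.standard_normal(800))    # a pure random walk, so beta(k) should be near zero
print(fama_french_beta(y, 12))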
Even if our series on (say) earnings had passed all the tests described above, it would still not follow that the series would be independent. The reason is that they are essentially tests of a lack of correlation in returns across time, and independence is a broader notion, referring to the ability to express a joint density as the product of marginals. One way to think of the relation of independence and correlation is to refer to "independence in the rth moment". Thus what we are investigating above is independence in the first moment, whereas complete independence requires that this be true for all other moments as well. In particular, let x_t and x_{t-j} (j ≠ 0) be independent with zero means. Then independence between x_t and x_{t-j} means that E[g(x_t)h(x_{t-j})] = E[g(x_t)]E[h(x_{t-j})], implying that cov(g(x_t), h(x_{t-j})) = 0 for any measurable functions g(·) and h(·). Of course the number of functions is vast, making it impossible to truly test independence with a finite sample. One could replace g(·) and h(·) by polynomial approximations, e.g., g(x_t) = α₀ + α₁x_t + α₂x_t² + ..., h(x_{t-j}) = δ₀ + δ₁x_{t-j} + δ₂x²_{t-j} + ..., and then test if all the pairwise terms, cov(x_t^k, x_{t-j}^f), are zero, allowing the order of the approximating polynomials to tend to infinity with the sample size. Such non-parametric tests have been suggested in the literature but rarely used - see Cameron and Trivedi (1993) for such an approach. Instead, attention has focused upon whether cov(x_t^k, x_{t-j}^f) is zero for certain specific values of k and f. Of particular interest have been the choices of k = f = 2 and k = 2, f = 1. These hypotheses could be tested in one of two ways. The first involves regressing x_t² against a constant and one of x²_{t-j} and x_{t-j}, while the second forms either x_t² x²_{t-j} or x_t² x_{t-j} and regresses these against a constant, testing if the intercept is zero. Table 4 presents these quantities for our financial series and it is clear from this that there is higher order dependence (STK-M and STK-D are the monthly and daily stock returns data respectively).

Table 4
Measures of non-linearity in series

ACF         STK-M   t     BOND    t     $US/SF   t     STK-D   t
Lag 1       0.292   8.1   0.298   6.5   0.114    2.9   0.189   24.0
Lag 2       0.159   4.4   0.308   6.8   0.118    3.0   0.174   22.1
Lag 3       0.225   6.3   0.207   4.6   0.169    4.3   0.140   17.8
Lag 4       0.122   3.4   0.275   6.1   0.212    5.4   0.092   11.7
Lag 5       0.060   1.7   0.200   4.4   0.083    2.1   0.142   18.0
Lag 6       0.093   2.6   0.251   5.6   0.110    2.8   0.113   14.4
Lag 7       0.100   2.8   0.285   6.3   0.161    4.1   0.076   9.6
Lag 8       0.339   9.5   0.212   4.7   0.012    0.3   0.083   10.5
Lag 9       0.335   9.4   0.173   3.9   0.102    2.6   0.076   9.6
Lag 10      0.201   5.6   0.158   3.6   0.102    0.1   0.074   9.0
LM test     429.5         301.6         111.9
BDS test    8.88          22.74         6.88
Tsay test   3.35          9.45          1.02

The BDS test was constructed by setting γ to 1.25 times the standard deviation of the series and using N = 4 histories. The test should be referred to an N(0,1) random variable. The Tsay test uses p = 5 lags and is referred to an F distribution with pm = p(p+1)/2 and T - pm - p - 1 degrees of freedom. For p = 5 the 5% critical value is approximately 2.07 (T = ∞). The LM test is T times the sum of the squared a.c.f. values for 12 lags. The 5% critical value for χ²(12) is 21.02.
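Quantities of the kind reported in the first block of Table 4 - autocorrelations of squared returns and the LM-type portmanteau statistic that aggregates them - can be computed with a short sketch like the one below. It is illustrative only, and the exact transformations used for Table 4 may differ in detail.

import numpy as np

def acf_of_squares(x, nlags=12):
    """Autocorrelations of x_t^2 and the portmanteau statistic T * sum(rho_j^2)."""
    s = x ** 2
    s = s - s.mean()
    T = len(s)
    denom = (s ** 2).sum()
    rho = np.array([(s[j:] * s[:-j]).sum() / denom for j in range(1, nlags + 1)])
    lm = T * (rho ** 2).sum()          # compare with a chi-squared(nlags) critical value
    t_ratios = np.sqrt(T) * rho        # rough individual t-ratios
    return rho, t_ratios, lm

rng = np.random.default_rng(3)
x = rng.standard_normal(1000)          # i.i.d. returns: LM should be near its chi-squared mean
print(acf_of_squares(x)[2])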
Generally, the argument has been that the lack of independence arises from the presence of "non-linearities" in the series. This is a somewhat imprecise description, and one that arises more by default than anything else. Since the a.c.f. of a series x_t = Δy_t tests if x_t has a linear structure, such as x_t = Σ_{j=0}^∞ β_j ε_{t-j}, any dependence in x_t manifest in higher order moments might cause us to say that x_t has a non-linear structure. Attempts have been made to construct various tests for general non-linear dependence. The best known of these are Tsay's test (Tsay, 1986) and the BDS test - Brock et al. (1987) - but many others could be designed. Tsay's test implicitly involves testing if the coefficients of w_t' = (x²_{t-1}, x_{t-1}x_{t-2}, ..., x²_{t-p}) are zero in the regression of x_t upon z_t' = (1, x_{t-1}, ..., x_{t-p}) and w_t. His test is not expressed in this way, but the interpretation just given may be reconciled with his format by starting with

x_t = z_t'δ + w_t'γ + e_t,    (8)

taking expectations with respect to z_t,

E(x_t | z_t) = z_t'δ + E(w_t | z_t)'γ,    (9)

and then subtracting (9) from (8) to leave

x_t - E(x_t | z_t) = (w_t - E(w_t | z_t))'γ + e_t.    (10)

If one assumes that the conditional expectations in (10) are linear in z_t and are estimated as the predictions, (x̂_t, ŵ_t), from the regressions of x_t and w_t upon z_t respectively, one gets Tsay's test, expressed as a regression of the residuals x_t - x̂_t against w_t - ŵ_t. ¹⁴ Looking at Tsay's test in this way emphasizes that it is a test for non-linearities in the conditional mean.

The BDS test can be shown to be testing for independence by focusing upon estimated marginal and joint densities. Consider two random variables X and Z. These are independent if

f_X(x) f_Z(z) = f_{XZ}(x, z),    (11)

where f(·) are the density functions of the random variables. Consider replacing x and z with the random variables X and Z themselves and, after that, taking the expectation. Then, due to independence,

E[f_X(X)] E[f_Z(Z)] = E[f_{XZ}(X, Z)],    (12)

and we will now show that the BDS test involves testing this condition. Constructing the test based on the sample moments gives

[T⁻¹ Σ_{t=1}^T f̂_X(x_t)] [T⁻¹ Σ_{t=1}^T f̂_Z(z_t)] - T⁻¹ Σ_{t=1}^T f̂_{XZ}(x_t, z_t).

The problem is that we don't know the densities and therefore we need to estimate them. Suppose we do this non-parametrically with symmetric kernels K_x and K_z, window widths h_x and h_z, using the leave-one-out estimator and a product kernel for the estimation of the joint density. ¹⁵ Then the test would be based on

[T⁻¹ Σ_{t=1}^T (h_x(T-1))⁻¹ Σ_{j≠t} K_x((x_t - x_j)/h_x)] × [T⁻¹ Σ_{t=1}^T (h_z(T-1))⁻¹ Σ_{j≠t} K_z((z_t - z_j)/h_z)]
  - T⁻¹ Σ_{t=1}^T (h_x h_z(T-1))⁻¹ Σ_{j≠t} K_x((x_t - x_j)/h_x) K_z((z_t - z_j)/h_z)
= [2/(h_x T(T-1)) Σ_{t<j} K_x((x_t - x_j)/h_x)] [2/(h_z T(T-1)) Σ_{t<j} K_z((z_t - z_j)/h_z)]
  - 2/(h_x h_z T(T-1)) Σ_{t<j} K_x((x_t - x_j)/h_x) K_z((z_t - z_j)/h_z).

In the BDS case Z is simply the lagged value of X and the marginal density will be the same for both variables, so it makes sense to set f̂_X(x) = f̂_Z(z) as well as making K_x = K_z = K and h_x = h_z = h. The effective sample size is now T₁ = T - 1 and t = 1 corresponds to the second observation of the original sample. Hence the test simplifies to

[2/(h T₁(T₁-1)) Σ_{t<j} K((x_t - x_j)/h)]² - 2/(h² T₁(T₁-1)) Σ_{t<j} K((x_t - x_j)/h) K((z_t - z_j)/h),

which can be written as C₁²/h² - C₂/h², where C_k = 2/(T₁(T₁-1)) Σ_{t<j} Π_{i=1}^k K((x_{ti} - x_{ji})/h), x_{t1} = x_t and x_{t2} = z_t. Eliminating the constant of proportionality h², and replacing the kernel by I(h/2 - |x_t - x_j|), where I(·) is an indicator function, gives the numerator of the BDS test when only x_t and its lag is used. ¹⁶

¹⁴ The fact that the conditional expectations may not be linear is unimportant under the null hypothesis, since then E(x_t | z_t) = 0 and γ = 0.
¹⁵ For all these concepts see Silverman (1986) and Härdle (1990).
Since the denominator is just the standard deviation of the numerator, the two test statistics are identical. Extension to higher order lags is straightforward. A choice of γ, the window width, and N (the maximum lag in x_t used to form z_t in constructing the test) needs to be made for the BDS test to be computed. In our situation we chose N = 4 and γ as 1.25 times the sample standard deviation of the variable under investigation. Applying each of these tests to the series supports the lack of independence found from the a.c.f. of squares (see Table 4). It should be mentioned that non-linearities in stock returns, exchange rates and interest rates have been documented before with these tests by Scheinkman and LeBaron (1989), Hsieh (1989b) and Kearns (1990b) respectively.

The connection between method of moments testing and the BDS test illustrated above is very useful for understanding many of the properties of the test. Defining v_t = E(f_X(X_t))E(f_Z(Z_t)) - f_{XZ}(X_t, Z_t), the BDS test can be regarded as testing if the sample mean of v_t is zero, i.e., whether the intercept in the regression of v_t against unity is zero. Clearly it is possible that E(v_t) = 0 but that the random variables are not independent: although (11) implies (12), the converse is not true. It is also interesting to observe that the t-statistic from such a regression is robust to heteroskedasticity in v_t. To see why, specialize (6) to the case where there is only heteroskedasticity to get White's heteroskedasticity consistent standard errors (White, 1980). These are (T⁻¹Σ x_t x_t')⁻¹ (T⁻¹Σ x_t x_t' û_t²) (T⁻¹Σ x_t x_t')⁻¹, where û_t are the residuals from the OLS regression of v_t on x_t. These are to be contrasted with the "formula variance" σ̂_u² (T⁻¹Σ x_t x_t')⁻¹ associated with OLS computer programs, where σ̂_u² = T⁻¹ Σ_{t=1}^T û_t². In the special case that x_t = 1 the two estimators coincide, and this is the situation here. Hence, this suggests that the BDS test is likely to be robust to heteroskedasticity (but not to serial correlation).

The implications of a lack of dependence between x_t and its past history raise the issue of how to summarize such dependence in a useful way. One way is to treat functions of x_t as being determined by models such as (1), (2) and (3). Thus, defining the expectation of a random variable conditioned upon its past history as E_{t-1}, these models make E_{t-1}(g(x_t)) a function of x_{t-j} (j > 0). When g(x_t) = x_t², and E_{t-1}(x_t) = 0, σ_t² = E_{t-1}(x_t²) is the conditional variance of x_t, and having x_t² generated by such models makes σ_t² potentially dependent upon the past. The distinction between a conditional variance, σ_t², and an unconditional variance, E(x_t²), was emphasized in Engle (1982), and it is a distinction that has proved to be more important in finance than in any other field of econometrics.

¹⁶ The numerator of the BDS test is C_N - C₁^N, where C_N = 2/(T_N(T_N - 1)) Σ_{t<j} Π_{k=0}^{N-1} I(γ - |x_{t+k} - x_{j+k}|) and T_N = T - N + 1. When N = 1 we get the formula in the text.
Prior to this work, many finance models were understood in terms of the unconditional moments of the series, e.g., the CAPM was stated in terms of the covariance of the return on an asset with the market return, but it soon became evident from the theoretical derivations that the proper items of investigation were actually the conditional moments, unless assets were being held for long periods of time, whereupon the relevant conditioning set for decisions is far in the past and the distinction between conditional and unconditional moments evaporates, i.e., for stationary series in which E(g(x_t)) exists, lim_{k→∞} E_{t-k}(g(x_t)) = E(g(x_t)).

Table 4 not only reveals that the squares of returns are correlated, but the slow decline in the autocorrelation coefficients may be used to argue that the correlation is very persistent. ¹⁷ Brockwell and Davis (1991) defined a short memory series as one whose autocorrelation coefficients were bounded like |ρ_j| < C a^j, as contrasted with a long memory process in which ρ_j declined like C j^{2d-1}, where d < 0.5 and C is a non-zero constant. Using this definition, the squares of the returns seem to possess long memory. Findings of long memory in the squares of stock returns are legion - examples being Ding et al. (1993) and De Lima and Crato (1994) - but it has also been found in other series such as the forward premium, Baillie and Bollerslev (1994). Many techniques have been used to investigate this property - see Baillie (1996) for a survey. By far the simplest would be to fit models (1), (2) and (3) to suitably transformed x_t, e.g., x_t² or |x_t|, and then to test for unit roots, fractional integration etc. as outlined earlier. Sometimes a "model independent" way of assessing long-range dependence is advocated in the form of the modified re-scaled range statistic, S⁻¹{max_{1≤t≤T} Σ_{i=1}^t (x_i - x̄) - min_{1≤t≤T} Σ_{i=1}^t (x_i - x̄)}, where S² is an estimate of the spectrum of x_t using q autocovariances and x̄ is the sample mean of x_t - see Lo (1991) and De Lima et al. (1994). The problem with this measure is the choice of q. As q grows the test statistic is made completely robust to serial correlation of any order, including long-range dependence, so that the power of the test eventually equals its size. Hence any conclusions obtained simply reflect the choice of q. If you want to find long-range dependence keep q short; if you don't, make q long.

Table 5 looks at the cross correlation between x_t² and x_{t-j} (j > 0), a relationship sometimes encapsulated as the "leverage effect": that volatility depends upon the sign of returns. There is evidence here of such a negative effect in stock returns and exchange rates, while for interest rates it is not as clear. If there is some relationship for interest rates it would be a positive rather than a negative one. The magnitude of the effect may not be large, but what is striking is its persistence, with many negative values for the correlation coefficients and a slow dying away of its values.

¹⁷ The same would be true if absolute returns were examined, see Ding et al. (1993).
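The cross-correlations shown in Table 5 are of the following kind; the sketch below is our own illustration of the computation, not the code used for the table.

import numpy as np

def leverage_cross_corr(x, nlags=12):
    """Cross-correlations corr(x_t^2, x_{t-j}) for j = 1..nlags (the 'leverage' pattern)."""
    s = x ** 2
    out = []
    for j in range(1, nlags + 1):
        a = s[j:] - s[j:].mean()
        b = x[:-j] - x[:-j].mean()
        out.append((a * b).mean() / (a.std() * b.std()))
    return np.array(out)

rng = np.random.default_rng(4)
x = rng.standard_normal(2000)     # i.i.d. returns: cross-correlations should hover near zero
print(np.round(leverage_cross_corr(x), 3))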
Table 5
Cross correlations between squared and level returns

Lag   STK-M   t     BOND    t     $US/SF   t     STK-D   t
1     0.02    0.6   0.04    0.9   -0.10    2.5   -0.07   8.7
2     -0.09   2.4   0.12    2.7   -0.04    1.0   -0.08   10.4
3     -0.13   3.6   0.08    1.7   -0.13    3.2   -0.06   7.2
4     -0.12   3.3   -0.02   0.5   -0.10    2.5   -0.05   6.4
5     -0.12   3.2   -0.07   1.6   -0.00    0.0   -0.04   5.6
6     -0.12   3.2   0.15    3.4   -0.03    0.8   -0.05   6.4
7     -0.10   2.8   -0.02   0.4   -0.02    0.5   -0.05   5.7
8     0.01    0.2   0.06    1.2   -0.08    2.1   -0.05   6.0
9     0.10    2.7   0.00    0.1   -0.08    2.1   -0.05   5.8
10    -0.06   1.5   0.06    3.6   0.02     0.5   -0.03   4.3
11    -0.18   5.0   -0.02   0.3   0.02     0.5   -0.03   3.5
12    -0.14   4.0   0.05    1.1   -0.05    1.3   -0.02   2.9

2.3. The existence of moments

Very early in the history of financial econometrics Mandelbrot (1963) raised the spectre that asset returns had second (unconditional) moments that did not exist. To assess that possibility he recommended that an examination of the recursive variance be made, and this has been done by Hols and De Vries (1991) for exchange rates, Pagan and Schwert (1990b) for stock returns, and Loretan and Phillips (1994) for a wide variety of returns. If the process x_t was covariance stationary then the sequence of recursive variances, μ̂_{2,τ} = τ⁻¹ Σ_{j=1}^τ x_j², τ = 1, 2, 3, ..., formed by adding on one observation at a time, should converge to E(x_t²) as τ → ∞. If E(x_t²) does not exist then there will be no convergence, and in fact one tends to observe "jumps" in the sequence μ̂_{2,τ}. Pagan and Schwert (1990b) describe a number of possible test statistics to formally test this feature. One involves testing if the mean of the sample variance over one part of the sample is the same as over the remainder. Another is based directly upon ψ(t) = (T v̂)^{-1/2} Σ_{j=1}^t (x_j² - μ̂_{2,T}), where v̂ is an estimate of the asymptotic variance of T^{-1/2} Σ(x_j² - μ₂) (in applications below set to γ̂₀ + 2 Σ_{j=1}^9 (1 - (j/9)) γ̂_j, where γ̂_j is the jth estimated autocovariance of x_t²). The logic of the test is seen by replacing μ̂_{2,T} by μ₂, which it estimates. Then the random variable T^{-1/2} Σ(x_j² - μ₂) should have variance v, and ψ(t) is an N(0, (t/T)(1 - t/T)) random variable under the null hypothesis of a constant variance. Fig. 1 shows a plot of ψ(t) along with the 99% confidence intervals for the monthly stock returns series. Stock returns do not seem to be covariance stationary. Loretan and Phillips (1994) examine a very wide range of series, all of which have this characteristic.

[Fig. 1. CUSUM of squares test, CRSP returns, 99% confidence intervals.]
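A minimal sketch of the ψ(t) cusum-of-squares plot follows; it is our own illustration, and the long-run variance estimate uses a Bartlett-type weighting over nine autocovariances that only approximates the scheme described above.

import numpy as np

def cusum_of_squares(x, q=9):
    """psi(t) = (T*v)^{-1/2} * sum_{j<=t}(x_j^2 - mean(x^2)), with a Bartlett-type v estimate."""
    s = x ** 2
    T = len(s)
    d = s - s.mean()
    v = (d ** 2).mean()
    for j in range(1, q + 1):
        v += 2.0 * (1.0 - j / (q + 1)) * (d[j:] * d[:-j]).mean()
    psi = np.cumsum(d) / np.sqrt(T * v)
    t = np.arange(1, T + 1) / T
    band = 2.576 * np.sqrt(t * (1.0 - t))    # pointwise 99% band under the constant-variance null
    return psi, band

rng = np.random.default_rng(5)
x = rng.standard_normal(1500)                # constant-variance returns: psi should stay inside the band
psi, band = cusum_of_squares(x)
print(np.abs(psi).max(), band.max())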
As Loretan and Phillips (1994) point out, what is at issue here is the shape of the density in the tails. Unless it declines sufficiently swiftly, moments will not exist, as should be familiar from analysis of the Cauchy density, e.g. see Spanos (1986, p. 70). More precisely, suppose that there are Pareto-type tails in the density. This means that Pr(X > x) = k x^{-α}, so that log[P(X > x)] = log k - α log x. If α > c then moments of order c exist, so that what we are interested in is the value of α for our data. After ordering the data to produce x_{(1)}, ..., x_{(T)}, where x_{(1)} > x_{(2)} > ... > x_{(T)}, they estimate P(X > x_{(j)}) by the empirical survivor function [#(x_t ≥ x_{(j)})/T] = j/T, and give a plot of log P̂(X > x_{(j)}) against log x_{(j)}. In Fig. 2 we show a plot of these quantities for the monthly stock return data as well as the regression line fitted to it over the last 68 data points (after the data has been re-ordered). The slope of this line is -2.4, which points to the fact that the variance might exist, but no higher order moments.

[Fig. 2. Right tail of empirical survivor function of CRSP returns; fitted line slope = -2.4.]

Loretan and Phillips also utilize more formal methods for estimating the inverse of the tail index, η = α⁻¹, and thereby α itself, of which the most popular is the proposal of Hill (1975) that η̂ = m⁻¹ Σ_{i=1}^m [log x_{(i)} - log x_{(m)}]. A problem is to determine m. Asymptotic analysis with x_t being i.i.d. yields m^{1/2}(α̂ - α) →d N(0, α²), showing that m needs to be large. But clearly the larger m is, the greater the extent to which one is sampling out of the range of the "tail" of the density, and thereby the larger the bias found in the estimator. The situation is made even more complex by the fact that x_t is not i.i.d.: in Kearns and Pagan (1992), where x_t is generated from heteroskedastic processes typical of daily stock returns, it is found that the standard deviation of α̂ is around seven times larger than predicted from the i.i.d. case. All of this points to the fact that very large data sets are needed to be able to accurately compute a tail index. Table 6 gives the tail index computation for the three monthly series we are working with, as well as that for U.S. daily stock data from 1885 to 1987.

Table 6
Estimates of the tail index for various financial series

             SRM     SRD     BR      SF/$
α̂ (left)     2.03    2.29    1.49    2.70
α̂ (right)    2.81    2.52    1.50    3.26
m            60      1600    40      50

SRM = monthly stock returns; SRD = daily stock returns; BR = change in bond rate; SF/$ = change in log SF/$ exchange rate.

The evidence points to the existence of the second but probably not the fourth moment for all series except interest rates, where the situation is clouded. Increasing m to 100 for this series produced estimates of 1.03 and 1.47 for α, so that it may be that the second moment does not exist for that series. ¹⁸ Of course it may be that it is higher order moments that fail to exist rather than the second; Loretan and Phillips point to the possibility that the fourth moment may behave in this way. ¹⁹ Since the variance of the estimated second moment depends upon the fourth moment, the failure of the latter to exist will generally mean that occasional large outliers will be found for estimates of the second moment, and this will show up as jumps in the recursive estimates of the second moment.

¹⁸ Although one might argue that all moments are finite, albeit large, the point is that the series is behaving as if the moments do not exist.
¹⁹ Loretan and Phillips observe that the failure of a fourth moment to exist can be a problem with the cusum test ψ(t). In particular, they allow the errors to be realizations from a stable law, in which case the quantity v in the denominator of the test may not exist. They propose modifications of the test that are robust to this feature. Mandelbrot (1963) also argued for stable processes as a way of inducing infinite variance into financial time series. The problem with this approach is that it fails to produce series that are dependent, and some additions need to be made to make the series exhibit the type of dependence seen in the squares of returns. From some simulations that Phil Kearns and I have run, it does not seem as if the assumption of stable laws is a good predictor of what the behavior of test statistics would be when moments fail to exist because of an IGARCH structure - see also Ghose and Kroner (1994).
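The Hill estimator itself is only a few lines of code. The sketch below is our own illustration on simulated Pareto data (the right and left tails of a returns series would be treated separately, e.g. by applying the function to x and to -x); the i.i.d. standard error it reports is, as noted above, far too optimistic for heteroskedastic financial data.

import numpy as np

def hill_alpha(x, m):
    """Hill (1975) tail index estimate from the m largest observations of x."""
    z = np.sort(x)[::-1][:m]                       # m largest order statistics, descending
    eta = np.mean(np.log(z[:-1]) - np.log(z[-1]))  # inverse tail index estimate
    alpha = 1.0 / eta
    se = alpha / np.sqrt(m)                        # i.i.d. asymptotics; much too small for GARCH-type data
    return alpha, se

rng = np.random.default_rng(6)
x = rng.pareto(3.0, 50000) + 1.0                   # Pareto tail with alpha = 3
print(hill_alpha(x, 500))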
One might ask the importance of this observation that moments might not exist for financial analysis. There are a number of responses that can be made. First, if true, it rules out certain models as candidates for the analysis of financial time series, in particular those that imply covariance stationarity for returns. Second, the inputs into many financial models pre-suppose the existence of certain moments; for example, the optimal hedging formula depends upon the variance of asset returns. If moments do not exist, attempts to estimate quantities that depend upon them will generally exhibit a lot of volatility, and it may be that this explains the observed fact that many estimates made of the optimal hedge display quite a bit of instability. More generally, this fact may be important to many econometric procedures, e.g., tests for unit roots such as the variance ratio test and the Dickey-Fuller test. The effects are likely to be particularly devastating for the former.

2.4. Are returns normally distributed?

Most tests of normality focus upon higher order moments of x_t, in particular inquiring if E(x_t³) = 0 and E(x_t⁴) = 3[E(x_t²)]², i.e., whether there is skewness or excess kurtosis. This leads to the two most widely used tests, τ̂₁ = T⁻¹ Σ_{t=1}^T (1/σ̂³) û_t³ and τ̂₂ = T⁻¹ Σ_{t=1}^T (1/σ̂⁴)(û_t⁴ - 3σ̂² û_t²), where û_t = x_t - μ̂. There are some difficulties with the use of these statistics to assess normality stemming from the fact that the quantities x_t^k exhibit dependence. Accordingly, one might want to make these tests robust to this dependence. Perhaps the simplest way to do this is to recognize that there are three moment conditions being used in the construction of each of the tests; e.g., taking the test for skewness these would be E((1/σ³)(x_t - μ)³ - τ₁) = 0, E(x_t - μ) = 0, and E((x_t - μ)² - σ²) = 0. Defining these as E(ψ(x_t; θ)) = 0, where θ' = [τ₁ μ σ²], the test for skewness is based upon the first element of T⁻¹ Σ_{t=1}^T ψ(x_t; θ̂), where θ̂ is the method of moments estimator of θ, i.e., upon τ̂₁, and it is well known from the theory of GMM estimators how to make tests concerning τ robust to dependence in ψ_t.
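One way to implement this kind of robustification is sketched below; the code is ours, it uses a Newey-West style long-run variance of the relevant moment contributions, and it omits the full GMM correction for the estimation of μ and σ², so it is only an approximation to the statistics reported in Table 7.

import numpy as np

def _long_run_var(g, q=12):
    """Newey-West style long-run variance of a zero-mean series g."""
    d = g - g.mean()
    v = (d ** 2).mean()
    for j in range(1, q + 1):
        v += 2.0 * (1.0 - j / (q + 1)) * (d[j:] * d[:-j]).mean()
    return v

def normality_tests(x, q=12):
    T = len(x)
    u = x - x.mean()
    s2 = (u ** 2).mean()
    g1 = u ** 3 / s2 ** 1.5                      # skewness contributions
    g2 = (u ** 4 - 3 * s2 * u ** 2) / s2 ** 2    # excess-kurtosis contributions
    out = {}
    for name, g, v0 in (("skew", g1, 6.0), ("kurt", g2, 24.0)):
        stat = g.mean()
        out[name + "_nonrobust"] = np.sqrt(T / v0) * stat   # classical variances (6, 24) under normality
        out[name + "_robust"] = np.sqrt(T) * stat / np.sqrt(_long_run_var(g, q))
    return out

rng = np.random.default_rng(7)
print(normality_tests(rng.standard_normal(2000)))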
The tests τ̂_j (j = 1, 2) given above then correspond to the non-robust "t-statistic" for whether the first element of T⁻¹ Σ_{t=1}^T ψ(x_t; θ̂) is zero. Table 7 presents both robust and non-robust tests of skewness and excess kurtosis. What is striking from this table is how the conclusions reached vary according to whether one has allowed for the dependence in x_t. If no allowance is made there are very emphatic rejections, but the converse tends to be true once some adjustment is performed. Only in the case of the kurtosis tests do the conclusions generally match up, and the presence of excess kurtosis is frequently interpreted as the density of returns possessing "fat tails".

Table 7
Testing departures from normality for returns data

                      STK ret-M   STK ret-D   Bond rates   Ex rate
Skewness t
  Non-robust          3.04        -13.1       -11.36       -1.14
  Robust              0.4         -0.78       -0.96        -0.46
Kurtosis t
  Non-robust          44.28       213.0       55.1         7.54
  Robust              2.11        2.17        1.65         2.04
Fraction around 0     0.14        0.14        0.27         0.14
f̂(0)                  0.52        0.56        0.99         0.52

The skewness and kurtosis tests should be referred to an N(0,1) random variable. Fraction is the fraction of standardized x_t lying between -0.1257 and 0.1257; this is 0.1 for an N(0,1) random variable. f̂(0) is the non-parametric estimate of the density of standardized x_t at the origin; f(0) for an N(0,1) density is 0.40.

Rather than focusing upon the higher order moments of returns, it is sometimes more useful to obtain a picture of the density for x_t by using non-parametric estimation methods, and to concentrate upon certain of the characteristics that stand out from such an inspection. An easy way to estimate the density non-parametrically is to use a kernel based estimator f̂(x) = (1/Th) Σ_{t=1}^T K((x_t - x)/h). Fig. 3, Fig. 4 and Fig. 5, computed with a Gaussian kernel and window width h = 0.9 σ̂_x T^{-1/5}, present non-parametric estimates of the densities of stock returns and the changes in the one month bond yield and the log of $US/SF respectively, standardized as σ̂⁻¹(x_t - μ̂). For comparison purposes the figures also show an N(0,1) density. The densities for stock returns and exchange rates tend to be fatter tailed than the normal and to possess marked peaks. It is sometimes difficult to see the fatter tails, but the range of the horizontal axis coincides with the minimum and maximum values observed in the data, events which would be highly improbable with a normal density.

[Fig. 3. NP estimate of density of standardized monthly CRSP returns.]
[Fig. 4. NP density of standardized returns on one month bond yields.]
[Fig. 5. NP density of returns to $US/SF.]
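The kernel density estimate just described can be written out directly; the sketch below is illustrative (our own), using the Gaussian kernel and the h = 0.9 σ̂ T^{-1/5} window width on standardized data.

import numpy as np

def kernel_density(x, grid):
    """Gaussian kernel density estimate of standardized returns on the supplied grid."""
    z = (x - x.mean()) / x.std()                  # standardize the returns
    T = len(z)
    h = 0.9 * z.std() * T ** (-0.2)               # window width 0.9 * sigma * T^(-1/5)
    u = (grid[:, None] - z[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (T * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(8)
x = rng.standard_t(df=4, size=2000)               # fat-tailed pseudo-returns
grid = np.linspace(-5, 5, 201)
f = kernel_density(x, grid)
print(f[100])                                     # estimated density at the origin, cf. 0.40 for N(0,1)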
An alternative to kernel based non-parametric density estimation is the "semi-non-parametric" (SNP) method advanced in Gallant et al. (1991) and other articles. Suppose that $z$ is a random variable with zero expectation and unit variance. The classical approach to the approximation of the density for $z$ would replace it by the product of the standard normal density $\phi(z) = (2\pi)^{-1/2}\exp(-z^2/2)$ and a polynomial $[P(z)]^2 = [1 + \alpha_1 z + \dots + \alpha_p z^p]^2$. To force this approximation to integrate to unity the approximating density must therefore be $\hat{f}(z) = P^2(z)\phi(z)/\int P^2(\psi)\phi(\psi)d\psi$. 20 If the density of $x$ is to be approximated, and $x$ has mean $\mu$ and standard deviation $\sigma$, Gallant finds the density of $x$ as $\sigma^{-1}\hat{f}(z)$, where $z = \sigma^{-1}(x - \mu)$. After these substitutions all unknown parameters, $\alpha_1,\dots,\alpha_p$, $\mu$ and $\sigma$, are chosen by maximizing $\sum_{t=1}^{T}\log\hat{f}(x_t)$. Computational details are available in Gallant et al. (1991) with an application to exchange rates, while Gallant et al. (1992) explore stock returns and the volume of trading. 21 All of these studies show that financial time series exhibit densities which have tails that are fatter than the normal, and have much higher peaks than the normal around zero, i.e., one has too many small and large returns to have come from a normal density.

20 In the classical approach $P^2(z)$ are the sum of Hermite polynomials - see Spanos (1986, p. 204) - but this simply represents a reparametrization.

21 In most of the SNP applications the density being estimated is the density of $x_t$ conditional upon $x_{t-1},\dots,x_{t-p}$ and the volume of trading rather than the unconditional density, and the latter is derived from the former. The procedure described for estimation remains the same however, except that $\sigma$ and $\mu$ are taken to be simple functions of $x_{t-1},\dots,x_{t-p}$ whose parameters are to be estimated along with the $\alpha_j$.

It is worth commenting here on the conventional wisdom that fat tails are the cause of excess kurtosis. Deleting the two largest and smallest observations in the data on one-month bond yields reduces $\hat{\tau}_2$ from 55.1 to 28.2, demonstrating the influence of "fat tails". However, if one further reduces the sample by eliminating the 200 smallest and largest observations, essentially leaving the observations around the mean, the statistic becomes -8.39, showing that the coefficient also responds to the strong peak or leptokurtosis in the density. 22 Because the leptokurtosis is so dramatic it is useful to have some measures of it that are easily computed, and we have found two of these to be useful. The first compares the fraction of observations of the standardized data that lie between $\pm 0.1257$; this should be 0.1 if the density was N(0,1). The second is the non-parametric density estimate of the standardized data at the origin, $\hat{f}(0)$. Table 7 shows these, and they replicate the visual impression.

22 This symmetric trimming strategy is strictly valid only if the data density is symmetric and Table 7 does indicate that there may be a mild departure from it.

3. Models for univariate series

3.1. Formulating models of volatility

Models need to be devised to reproduce some (or all) of the characteristics just listed. 23 A useful strategy is to view returns as being the product of two random variables, i.e., $x_t = \sigma_t\epsilon_t$, where $\epsilon_t$ is i.i.d. (0,1), and $\sigma_t$ is a random variable independent of $\epsilon_t$ whose properties will determine the nature of $x_t$. If $\sigma_t^2$ is assumed to be i.i.d. $(\sigma^2,v)$, then $E(x_t^2) = E(\sigma_t^2)E(\epsilon_t^2)$ and, if $\epsilon_t$ is n.i.d. (0,1), $E(x_t^4) = 3E(\sigma_t^4) > 3[E(\sigma_t^2)]^2 = 3\sigma^4$, which illustrates the fact that this model of returns produces the desired feature of excess kurtosis. Historically, such "mixture models" appealed for other reasons, principally the ability to interpret $\sigma_t$ as related to the quantity of "news" items arriving at a point in time. Such an interpretation means that the variance of returns at a point in time will be an increasing function of the amount of "news" arriving in the market. Volatility of returns will then be high when the market is exposed to large amounts of information. Studies have also appeared in which $\sigma_t$ was taken to be an increasing Poisson process while $\epsilon_t$ is n.i.d. $(\mu_t,\sigma^2)$. Such "jump" processes have been applied to stock returns, Friedman and Laibson (1989) and Jorion (1988), and to exchange rates by Nieuwland et al. (1994) and Feinstone (1987).

23 In an interesting paper Harrison (1994) shows that the characteristics of 18th century financial asset returns are the same as the 20th century ones. This is not to say that all series have these characteristics; Diebold and Lopez (1995) point to the fact that there is considerable variety in the behaviour of series. Nevertheless, whilst there are always exceptions to "stylized facts", they are useful as summary measures of what is distinctive about many financial series and therefore what must be generally accounted for.
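A small simulation makes the mixture-model point concrete. The sketch below (Python; the lognormal choice for $\sigma_t^2$ and all names are our own illustrative assumptions) draws $\sigma_t^2$ as an i.i.d. random variable independent of $\epsilon_t$ and reports the sample kurtosis together with the first autocorrelation of $x_t^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100_000

# Mixture model x_t = sigma_t * eps_t with sigma_t^2 i.i.d. (here lognormal) and eps_t ~ N(0,1)
sigma2 = rng.lognormal(mean=0.0, sigma=0.7, size=T)
eps = rng.standard_normal(T)
x = np.sqrt(sigma2) * eps

kurtosis = np.mean(x**4) / np.mean(x**2) ** 2        # 3 for a normal density; larger here
x2 = x**2
acf1 = np.corrcoef(x2[1:], x2[:-1])[0, 1]            # close to zero: the squares are i.i.d.

print(f"kurtosis = {kurtosis:.2f}, first autocorrelation of x^2 = {acf1:.3f}")
```

The excess kurtosis appears without any dependence in the squares, which is precisely the weakness taken up next.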
Perhaps the major defect of this literature was its inability to account for autocorrelation in $x_t^2$, as the assumptions make $x_t$ (and hence $x_t^2$) i.i.d. This need not lead one to jettison the model, but, instead, should concentrate attention upon the fact that $\sigma_t^2$ has to be made dependent upon the past in some way, and choosing suitable relations for $\sigma_t^2$ has been a major item of interest in financial econometrics during the past decade. 24 Broadly, there are two ways of doing this. One is to recognize that $\sigma_t^2$ is a random variable and to regard it as being generated from one of the processes (1), (2) and (3). An alternative is to allow $x_t^2$ to follow the same set of processes, i.e., to set $g(x_t) = x_t^2$, and then derive the imputed model for $\sigma_t^2$ that generates the requisite outcome. Both approaches are featured in the literature. As well as this division, there is a further one, in that, when following the first strategy, it is possible to distinguish between cases in which $\sigma_t^2$ is explicitly related to past returns, and those when it is treated as a random variable with no specific reference to past returns as a determinant of its evolutionary path. The latter is generally referred to as "stochastic volatility" and will be treated later in this section. 25

3.1.1. The GARCH class of models

Engle (1982) introduced the Autoregressive Conditional Heteroskedasticity model of order one (ARCH(1)) 26

$$\sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2,$$

which, using the identity $x_t^2 = E_{t-1}(x_t^2) + v_t = \sigma_t^2 + v_t$, where $E_{t-1}(v_t) = 0$, becomes an AR(1) in the squared returns,

$$x_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2 + v_t,$$

and this produced a simple way of capturing the dependence in $x_t^2$, or "volatility clustering". A quick glance at the implied a.c.f. for $x_t^2$ however shows that it is unlikely that such a simple model has managed to capture all the characteristics. The a.c.f. has both height and shape dimensions, and a minimum of two parameters might be expected to be needed to capture both features. In the ARCH(1) model a single parameter $\alpha_1$ has to account for both. As the autocorrelation function of an AR(1) is $\rho_j = \alpha_1^j$, a small estimated value of $\rho_1$ implies a low value for $\alpha_1$, while the observed persistence in the autocorrelation function of squared returns demands that $\alpha_1$ be set close to unity. Inevitably, it is impossible for the ARCH(1) model to reconcile these two features.

24 Another way of inducing some dependence, used by Nieuwland et al. (1994), is to have a jump process driving an AR(1) in $x_t$. This does not seem a very satisfactory solution as the dependence in $x_t$ itself is weak. In fact Nieuwland et al. find that they need $\sigma_t^2$ to be dependent as well.
25 Such a distinction does not bear up under close examination. Suppose that $\epsilon_t$ was distributed as Student's t and $\sigma_t$ was a function of past $x_{t-j}$. Now $\epsilon_t$ could be written as $\chi_t\eta_t$, where $\chi_t$ is an inverted gamma random variable and $\eta_t$ is N(0,1). Accordingly, $x_t = \sigma_t\chi_t\eta_t = \tilde{\sigma}_t\eta_t$, and now $\tilde{\sigma}_t$ is "unobservable". Paradoxes such as these require precise definitions to resolve them, and these have been supplied by Andersen (1992).

26 Engle actually has $x_t = w_t'b + e_t$ and $\sigma_t^2$ is related to $e_{t-1}^2$ but we will assume that there are no variables in $w_t$. The generalization is easily made.

The introduction of the Generalized ARCH (GARCH(1,1)) process by Bollerslev (1986) improved matters, being a two parameter model

$$\sigma_t^2 = \alpha_0 + \beta_1\sigma_{t-1}^2 + \alpha_1 x_{t-1}^2, \qquad (13)$$

and implying an ARMA(1,1) process for $x_t^2$ of

$$x_t^2 = \alpha_0 + (\alpha_1 + \beta_1)x_{t-1}^2 + v_t - \beta_1 v_{t-1}. \qquad (14)$$

In this model $\rho_1 = [1 - \beta_1(\alpha_1 + \beta_1)]\alpha_1/(1 - 2\beta_1(\alpha_1 + \beta_1) + \beta_1^2)$ and autocorrelations thereafter die away like $\rho_j = (\beta_1 + \alpha_1)^{j-1}\rho_1$ ($j \geq 2$). Thus one could have a small initial autocorrelation coefficient, and yet have the others slowly dying away, provided that $\alpha_1$ was small and $\beta_1 + \alpha_1$ large; such a combination effectively produces "root cancellation" in the AR and MA polynomials of (14) and the resulting acf of $x_t^2$ looks like that of white noise, i.e., flat.

Another way of introducing persistence into the autocorrelation function of squares is to invoke a components model as in (3). Engle and Lee (1994) do this, selecting the following model for $\sigma_t^2$,

$$\sigma_t^2 = \sigma_{Pt}^2 + \sigma_{Tt}^2, \qquad \sigma_{Pt}^2 = \omega + \sigma_{P,t-1}^2 + \phi v_{t-1}, \qquad (1 - (\alpha_1 + \beta_1)L)\sigma_{Tt}^2 = \alpha_1 v_{t-1}.$$

Because of the presence of the same error in both components this is the Beveridge-Nelson model of permanent and transitory effects (Beveridge and Nelson, 1981). In terms of observables it becomes

$$(1 - (\alpha_1 + \beta_1)L)\Delta x_t^2 = (1 - \alpha_1 - \beta_1)\omega + [1 - (1 + \beta_1 - \phi)L + (\beta_1 - \phi(\alpha_1 + \beta_1))L^2]v_t.$$

It is clear from this expression that, if $\alpha_1 + \beta_1 = 1$ and $\phi = 0$, there will be a unit root that effectively cancels, leaving an ARMA(1,1) process of the same form as the GARCH(1,1) model. Engle and Lee conclude that this model actually gives a better representation of the data than the GARCH(1,1) model. There are quite different long-run implications however. Unless $\alpha_1 + \beta_1 = 1$, i.e., the GARCH process is an Integrated GARCH (IGARCH) process, the ARMA(1,1) model implied by GARCH does not have a unit root, while the components model always does, so that a fraction of shocks into volatility persists.

An integrated process is one with very long memory, whereas a non-integrated ARMA process has only short memory. If it is held that the evidence is actually in favor of long memory in $x_t^2$ it is natural to try to directly model it. The UC model just discussed does so, at the expense of introducing a unit root into the $x_t^2$ series, but it might be felt that "treading more softly" would be advantageous. It seems natural therefore to seek to apply the fractional integration model to volatility. Baillie et al. (1996) have done this, naming the model FIGARCH.
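To see how the extra parameter helps, the following sketch (Python; purely illustrative parameter values of our own choosing) simulates an ARCH(1) and a GARCH(1,1) process and compares the autocorrelation functions of the squares, which for the GARCH(1,1) start low but die away slowly when $\alpha_1$ is small and $\alpha_1 + \beta_1$ is large.

```python
import numpy as np

def simulate_garch(T, a0, a1, b1, seed=0):
    """Simulate x_t = sigma_t*eps_t with sigma_t^2 = a0 + b1*sigma_{t-1}^2 + a1*x_{t-1}^2."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    x = np.zeros(T)
    s2 = a0 / (1.0 - a1 - b1)            # start at the unconditional variance
    for t in range(1, T):
        s2 = a0 + b1 * s2 + a1 * x[t - 1] ** 2
        x[t] = np.sqrt(s2) * eps[t]
    return x

def acf(y, lags):
    y = y - y.mean()
    return np.array([np.sum(y[k:] * y[:-k]) / np.sum(y * y) for k in lags])

lags = range(1, 11)
arch  = simulate_garch(50_000, a0=1.0, a1=0.30, b1=0.0)    # ARCH(1): acf dies at rate 0.30^j
garch = simulate_garch(50_000, a0=0.05, a1=0.05, b1=0.93)  # small alpha_1, alpha_1+beta_1 near 1
print("ARCH(1)    acf of x^2:", np.round(acf(arch**2, lags), 2))
print("GARCH(1,1) acf of x^2:", np.round(acf(garch**2, lags), 2))
```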
The second characteristic of the data which models may need to account for is the correlation between $x_t^2$ and $x_{t-j}$. Using the basic model, $E(x_t^2 x_{t-j}) = E(\epsilon_t^2\sigma_t^2\epsilon_{t-j}\sigma_{t-j}) = E(\sigma_t^2\epsilon_{t-j}\sigma_{t-j})$. If $\sigma_t^2$ is an even function of $\epsilon_{t-j}$ it follows that this covariance is zero, indicating that any model seeking to explain the cross correlation must specify the conditional variance as an asymmetric function of $\epsilon_{t-j}$. Most importantly, GARCH models will be incapable of replicating the feature, and this has led to a search for modifications of the GARCH model that would be satisfactory.

The first of these was the exponential GARCH (EGARCH) model of Nelson (1991), set up to incorporate the asymmetric effects of returns on volatility noted by Black (1976) and Christie (1982),

$$\ln\sigma_t^2 = \alpha_0 + \beta_1\ln\sigma_{t-1}^2 + \alpha_1 z_{t-1}, \qquad (15)$$

where $z_t = [|\epsilon_t| - (2/\pi)^{1/2}] + \delta\epsilon_t$. In this expression, $\epsilon_t$ is n.i.d. (0,1) so that $E|\epsilon_t| = (2/\pi)^{1/2}$, accounting for that choice of centering factor. Because of the use of both $\epsilon_t$ and $|\epsilon_t|$, $\sigma_t^2$ will be non-symmetric in $\epsilon_t$ and, for negative $\delta$, will exhibit higher volatility for large negative $\epsilon_t$.

The GARCH/EGARCH class of models make $\sigma_t^2$ specific functions of $x_{t-j}$, but perhaps greater flexibility in these functional forms is needed. A useful way of characterizing the possibilities is given by Engle and Ng (1993). Fixing $x_{t-j}$, $j = 2,\dots$, they consider the mapping between $\sigma_t^2$ and $x_{t-1}$, terming this the "news impact curve". They point out that two broad decisions need to be made: about the shape and the position of such a curve. In particular, one needs to indicate whether it is symmetric or not and, if it is, what value of $x_{t-1}$ it is symmetric about. Thus a GARCH process has an impact curve that is symmetric around zero, whereas an EGARCH model's curve is asymmetric around zero.

Other attempts have been made to produce models with generalized news impact curves. Hentschel (1991) advocates the absolute GARCH (AGARCH) model,

$$\sigma_t = \alpha_0 + \alpha_1|x_{t-1}| + \beta_1\sigma_{t-1},$$

or, if one wants to allow for asymmetry and a non-zero centering,

$$\sigma_t = \alpha_0 + \alpha_1[|x_{t-1} - b| - c(x_{t-1} - b)] + \beta_1\sigma_{t-1}.$$

The parameter $c$ accounts for the asymmetry in the news impact curve, and the value of $b$ determines what point it is asymmetric about. A related model is the QGARCH model of Sentana (1991): $\sigma_t^2 = \alpha_0 + \alpha_1(x_{t-1} - b)^2 + \beta_1\sigma_{t-1}^2$. Strictly speaking this produces a symmetric curve around $b$, but not around zero, which has been the traditional point of reference.

Some very general formulations have appeared which use the Box-Cox transformation as a way of producing a variety of functional forms. Higgins and Bera (1992) originally wrote

$$\sigma_t^2 = [\alpha_0 + \alpha_1(x_{t-1}^2)^{1/\lambda}]^{\lambda}, \qquad (16)$$

which could be regarded as a Box-Cox transformation applied to $x_{t-1}^2$. They refer to this specification as NARCH. Ding et al. (1993) apply the Box-Cox transformation to GARCH models, adding in an asymmetry, and designating the model A-PARCH (Asymmetric Power ARCH). One cannot be too optimistic about the popularity of this latter approach since Box-Cox transformations have rarely been applied to the conditional mean, as there seem to be numerical difficulties in the optimization. Perhaps its main use would be as a specification test. In all the above formulations the non-linearity was taken to be continuous.
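The news impact curve idea is easy to visualize: holding $\sigma_{t-1}^2$ fixed at some benchmark value, plot $\sigma_t^2$ against $x_{t-1}$ for each specification. The sketch below (Python, with illustrative parameter values of our own choosing rather than estimates from the paper) traces the symmetric GARCH(1,1) curve and an asymmetric EGARCH-style curve.

```python
import numpy as np

# Benchmark level at which lagged variance is held fixed
sigma2_bar = 1.0
x = np.linspace(-5, 5, 11)

# GARCH(1,1): news impact curve symmetric around zero
a0, a1, b1 = 0.05, 0.10, 0.85
nic_garch = a0 + b1 * sigma2_bar + a1 * x**2

# EGARCH-type curve: ln sigma_t^2 = alpha0 + beta1*ln(sigma2_bar) + alpha1*([|e| - sqrt(2/pi)] + delta*e)
# with e = x / sqrt(sigma2_bar); delta < 0 gives higher volatility after negative returns
alpha0, beta1, alpha1, delta = 0.0, 0.90, 0.20, -0.6
e = x / np.sqrt(sigma2_bar)
nic_egarch = np.exp(alpha0 + beta1 * np.log(sigma2_bar)
                    + alpha1 * (np.abs(e) - np.sqrt(2 / np.pi) + delta * e))

for xi, g, eg in zip(x, nic_garch, nic_egarch):
    print(f"x_(t-1) = {xi:5.1f}   GARCH: {g:6.2f}   EGARCH: {eg:6.2f}")
```

The asymmetry on the negative side of the EGARCH curve is what the cross correlation between $x_t^2$ and $x_{t-j}$ requires.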
Friedman and Laibson (1989) are also concerned with a non-linear response of $\sigma_t^2$ to past $x_t$, concentrating upon whether $\sigma_t^2$ varies continuously with $x_{t-1}^2$ up to a certain threshold but is constant thereafter. In their application they estimate the model

$$\sigma_t^2 = \alpha_0 + \alpha_1 g(x_{t-1}^2) + \beta_1\sigma_{t-1}^2, \qquad (17)$$

where $g(x_{t-1}^2) = \sin[a x_{t-1}^2]$ if $a x_{t-1}^2 < \pi/2$ and is unity if $a x_{t-1}^2 \geq \pi/2$. Such a modified ARCH (MARCH) model has the characteristic of subdued volatility changes in response to sharp movements in returns. By inspection it is clear that the distributed lag connecting $\sigma_t^2$ to $x_{t-1}^2$ will tend to be shorter for large changes than for small ones, so that the effect of events like the crash of October 1987 disappears relatively quickly. When applied to the excess stock yield - the S&P returns less the Treasury Bill return - they find $\alpha_0 = 0.0003$, $\alpha_1 = 0.0084$, $\beta_1 = 0.6147$, $a = 67$. Neither $a$ nor $\alpha_1$ were significantly different from zero.

Because the MARCH formulation is symmetric in $x_{t-1}$, one way to test for it is to estimate a GARCH model trimming out returns whose absolute value exceeds some specified number $\delta$. If there is no threshold effect the estimated parameters $\alpha_0$, $\alpha_1$, $\beta_1$ should remain much the same since the GARCH model would then be correctly specified. However, if there is a MARCH effect the GARCH estimates for $\alpha_0$, $\alpha_1$, $\beta_1$ should be different below the threshold than above. Kearns and Pagan (1993) applied this symmetric trimming idea to Australian stock return data, but found no evidence of a threshold as $\delta$ was varied through a wide range of the data.

Threshold GARCH (TARCH) models, in which $\alpha_1 g(x_{t-j}) = \alpha_1^{+}I(x_{t-j} > c) + \alpha_1^{-}I(x_{t-j} < c)$, where $I(\cdot)$ is an indicator function taking the value unity if the event in brackets is true and zero otherwise, have become increasingly popular - Glosten et al. (1993) and Zakoian (1994) being good examples. Hentschel (1994) allows for very general functional forms with the relation

$$\sigma_t^{k_1} = \alpha_0 + \beta_1\sigma_{t-1}^{k_2} + \alpha_1\sigma_{t-1}^{k_3}g(\epsilon_{t-1}),$$

and it is easily seen that most of the parametric forms described above can be located within this specification by suitable choice of the $k_i$ and $g(\cdot)$.

Non-parametric approaches are also to be found in the literature. Gouriéroux and Monfort (1992) suggest that the unknown relation between $\sigma_t^2$ and $x_{t-1}$ might be approximated by a series of steps, i.e., dummy variables are added that take the value unity within regions of possible values that $x_{t-1}$ might take, but are zero outside it. Pagan and Schwert (1990a) use non-parametric methods to directly estimate the news impact curve, i.e., they compute $E(x_t^2|x_{t-1})$. The major restriction on this approach is that the number of conditioning variables has to be small, and cannot reasonably be restricted to a few lags of $x_t$ due to the long memory in asset returns. Hence, in the concluding section of their paper they formulate and estimate a model in which $\sigma_t^2$ is a GARCH model augmented with Fourier terms $\cos(kx_{t-j})$, $\sin(kx_{t-j})$, $k = 1,2,3$. This formulation is Gallant's (1981) Flexible Fourier Form, which has been used in the production function literature to model producer behavior. The presence of sine terms means that $\sigma_t^2$ will be an asymmetric function of $x_{t-j}$. 27 Other types of approximating functions would be possible, e.g., one might use Laguerre polynomials, since these are always positive.

27 Sine terms raise the possibility of negative estimates of $\sigma_t^2$. One could avoid parameter regions that cause these by penalizing the log likelihood whenever parameter estimates are such that a negative $\sigma_t^2$ was observed.
The third main characteristic of asset prices is their leptokurtosis, as reflected in the fact that their density has a very large peak around zero, i.e., there are far too many small returns compared to what would be found with a normal density. It seems as if GARCH and EGARCH models allied with the assumption that $\epsilon_t$ is N(0,1) fail to adequately capture this leptokurtosis. As an example, we fitted such models to the monthly stock return series, and then computed $\hat{f}(0)$ with data simulated from them, producing values of 0.46 (GARCH) and 0.43 (EGARCH) versus the 0.52 seen in the data. There is an obvious failure to match this feature. Another way of addressing this correspondence has been to examine the skewness and excess kurtosis properties of the $x_t/\hat{\sigma}_t$. For the CRSP equity returns the normalized residuals have excess kurtosis of 1.46, while for the $US/SF rate Engle and Bollerslev (1986) found that the normalized residuals had excess kurtosis of 0.74.

The main response to this failure has been to change the density of $\epsilon_t$ from an N(0,1) to something else. Engle and Bollerslev made $\epsilon_t$ follow a Student's t-density rather than a standard normal, i.e.,

$$f(\epsilon_t) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\Gamma\left(\frac{v}{2}\right)}[\pi(v-2)]^{-1/2}\left[1 + \frac{\epsilon_t^2}{v-2}\right]^{-(v+1)/2}.$$

This adds an additional parameter, $v$, the degrees of freedom associated with Student's t, which can either be pre-set or estimated along with the other parameters of the model. When Engle and Bollerslev do this they find it to be around ten, and that the normalized residuals from this fitted model did not seem to have excess kurtosis. Other suggestions along these lines have also been made. Hsieh (1989a) and Nelson (1991) use the GED density

$$f(\epsilon_t) = \frac{v\exp\left(-\frac{1}{2}|\epsilon_t/\lambda|^v\right)}{\lambda 2^{(1+1/v)}\Gamma(1/v)},$$

where $\Gamma$ is the gamma function and $\lambda = [2^{-2/v}\Gamma(1/v)/\Gamma(3/v)]^{1/2}$ (when $v = 2$ this produces a normal density while $v > (<)\ 2$ is more thin (fat) tailed than a normal). Nelson estimates models up to an EGARCH(4,4) for stock returns and finds $v = 1.576$ for an EGARCH(2,1) process. Lye and Martin (1991) use the "generalized t distribution" $f(\epsilon_t) = z^{-1}\exp[\theta_1\tan^{-1}(\epsilon_t/\eta) + \theta_2\log(\eta^2 + \epsilon_t^2) + \sum_{i=3}^{M}\theta_i\epsilon_t^i]$, where $z$ is a normalizing constant.

It is useful to look at the potential for success of this solution. To do that we will derive an expression for the value of the unconditional density of $x_t$ at the origin, $f_x(0)$:

$$f_x(x) = \int f(x,\sigma_t)d\sigma_t = \int f(x|\sigma_t)f(\sigma_t)d\sigma_t = \int f_\epsilon(\epsilon = \sigma_t^{-1}x)\sigma_t^{-1}f(\sigma_t)d\sigma_t,$$

so that $f_x(0) = f_\epsilon(0)E[\sigma_t^{-1}]$. This result shows that manipulation of the density for $\epsilon_t$ will certainly enable one to control to some degree for the leptokurtosis in $f_x$, but that it is necessary that $E[\sigma_t^{-1}]$ does not change as a result. However, estimators of the parameters in the density $f_\epsilon$, and those in $\sigma_t^2$, will generally be correlated, and therefore there may be offsetting effects. Although it is hard to find analytical expressions for how $E(1/\sigma_t)$ changes with the values of the parameters $\theta$ contained in the function determining $\sigma_t^2$, numerical experiments with a GARCH(1,1) model point to $\partial E(1/\sigma_t)/\partial\theta$ being negative, so that any increase in these parameters will tend to depress the peak of the density of $x_t$ produced by the model.
In particular, there would be a conflict between the need to make $\beta_1 + \alpha_1$ high to account for the strong dependence in the series, and the need to make it low to account for the leptokurtosis. If there is a positive correlation between $\hat{f}_\epsilon(0)$ and $\hat{\theta}$, where these are estimated quantities from some data set, then it may be impossible to reproduce the observed degree of leptokurtosis simply by varying the nature of $f_\epsilon$. To illustrate this we fitted a GARCH(1,1) model to the monthly stock return data using both an N(0,1) and a t-density for the errors. 28 The estimated degrees of freedom for the t-density was 7, and the parameters of the conditional variance function were very similar, except for the intercept, which was $0.8\times 10^{-4}$ with a normal density but $1.3\times 10^{-4}$ with a t-density. Thus, whilst the use of a t-density has raised $f_\epsilon(0)$, the expected value of $\sigma_t^{-1}$ has declined, and these interact to produce a value of $f(0)$ that is very close in both models.

28 The program used was a beta test version of MICROFIT, Version 4.

The conflicts just detailed hint that, even after allowing the density for $\epsilon_t$ to deviate from normality and a use of general specifications for $\sigma_t^2$, existing models may not be rich enough for the statistical modeling of actual data. Both Gallant et al. (1991) and Hansen (1994) find that there is evidence that the higher order moments of $\epsilon_t$ also depend upon the past. Both propose models for this dependence. Gallant et al. do so by using their SNP approach to density estimation, which effectively makes the density of $\epsilon_t$ a polynomial in the past history of returns, while Hansen's ARCD (Autoregressive Conditional Density) models make some parameter of the density of $\epsilon_t$, say the degrees of freedom parameter in Student's t-density, change over time. He finds that the degrees of freedom coefficient exhibits quite a deal of variation for some financial series, including the excess holding yield on US Treasury Bills.

3.1.2. The stochastic volatility class of models

Although the GARCH class of models has met with substantial empirical success, one item hindering their widespread acceptance within financial circles has been the difficulty that their specification departs from the type of models encountered in finance texts. To some extent this is a consequence of the shift from continuous to discrete time, a subject discussed later, but it also reflects the fact that finance models see volatility as being driven by a process separate from returns per se. "Stochastic volatility" (SV) models have therefore arisen in which the EGARCH functional form is retained but the process driving the conditional variance is made an n.i.d. (0,1) process that is independent of $x_t$. Such a model has the format

$$\log\sigma_t^2 = \alpha_0 + \beta_1\log\sigma_{t-1}^2 + \sigma_\eta\eta_t. \qquad (18)$$

Because it is a "two parameter" model the stochastic volatility (SV) model has the potential for replicating the type of autocorrelation function for $x_t^2$ seen in the data. Defining $\mu = \alpha_0/(1 - \beta_1)$ and $v = \sigma_\eta^2/(1 - \beta_1^2)$ as respectively the mean and variance of $\log\sigma_t^2$, one has $E(x_t^2 x_{t-j}^2) = \exp(2\mu)\exp(v(1 + \beta_1^j))$, so that the autocorrelation function of $x_t^2$ is $\rho_j = [\exp(v\beta_1^j) - 1]/[3\exp(v) - 1]$ (see Jacquier et al. (1994) for a very clear derivation of this result). It tends to decline a little slower than in the GARCH models. However, the SV model does not fare as well on the correlation between $x_t^2$ and $x_{t-j}$.
Since $\mathrm{cov}(x_t^2,x_{t-j}) = E(\sigma_t^2\epsilon_t^2\sigma_{t-j}\epsilon_{t-j}) = 0$ from the independence assumptions made about $\epsilon_t$ and $\eta_t$, the SV model predicts a zero correlation between $x_t^2$ and $x_{t-j}$. No transformation of either $\epsilon_t$ or $\eta_t$ would change this situation, and the standard way to produce a correlation is to allow $\eta_t$ to be correlated with $\epsilon_t$. If $\epsilon_t$ and $\eta_t$ are jointly normal with correlation coefficient $\gamma$ we can write $\eta_t = \gamma\epsilon_t + \xi_t$, where $\xi_t$ is independent of $\epsilon_t$, so that $\gamma \neq 0$ means that $\sigma_t^2$ becomes a function of $\epsilon_{t-1}$ and so it yields the requisite odd function. 29

29 But at the expense of inducing a non-zero mean into $x_t$ since $E_{t-1}(x_t) = E_{t-1}(\sigma_t\epsilon_t) \neq 0$.

A variant of the SV model that has been extensively used by Hamilton in a variety of applications, e.g., to exchange rates - Engel and Hamilton (1990) - and interest rates - Hamilton (1988) - is to relate $\sigma_t^2$ to an unobserved state variable $z_t$ that can take either the value of zero or unity, with this variable evolving according to the first-order Markov process

$$\Pr[z_t = 1|z_{t-1} = 1] = p, \quad \Pr[z_t = 0|z_{t-1} = 1] = 1 - p, \quad \Pr[z_t = 0|z_{t-1} = 0] = q, \quad \Pr[z_t = 1|z_{t-1} = 0] = 1 - q. \qquad (19)$$

It can be shown by substitution that this scheme implies that $z_t$ is an AR(1) process

$$z_t = (1 - q) + \rho z_{t-1} + \eta_t, \qquad (20)$$

where $\rho = p + q - 1$ and, conditional upon $z_{t-1} = 1$,

$$\eta_t = (1 - p) \text{ with probability } p, \quad = -p \text{ with probability } (1 - p), \qquad (21)$$

while, conditional upon $z_{t-1} = 0$,

$$\eta_t = -(1 - q) \text{ with probability } q, \quad = q \text{ with probability } (1 - q). \qquad (22)$$

Defining $\sigma_t^2 = \phi_1 + \phi_2 z_t$ we may write

$$\sigma_t^2 = \rho\sigma_{t-1}^2 + (1 - \rho)\phi_1 + (1 - q)\phi_2 + \phi_2\eta_t, \qquad (23)$$

and, from (21) and (22),

$$E(\eta_t^2|z_{t-1}) = q(1 - q) + \delta z_{t-1} = q(1 - q) - \delta\phi_1\phi_2^{-1} + \phi_2^{-1}\delta\sigma_{t-1}^2,$$

where $\delta = p(1 - p) - q(1 - q)$, making this model differ from a standard SV model in two ways: the innovations driving the conditional variance are a discrete random variable with four states and the conditional variance of the innovations depends upon $\sigma_{t-1}^2$. The ability to replicate standard characteristics is similar to the standard SV model. Most importantly, persistence in the autocorrelation function of the squares is governed by the size of $\rho$, and, to ensure that it is close to unity, one needs to have both $p$ and $q$ close to one, i.e., once a state is entered into there is little tendency to leave it.

Hamilton's model has been applied to a variety of financial time series - to interest rates in Hamilton (1988) and to stock returns in Pagan and Schwert (1990a) and Sola and Timmermann (1994). The first of these is a particularly interesting application; there turn out to be two regimes, corresponding to the 1979-1982 years and the remainder of the sample, respectively. The three year period 1979-1982 coincides with the change in the Fed's operating procedures which led to very high volatility in interest rates. The other two papers allow for both shifts in the mean and variance of returns.
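A short simulation of the two-state scheme (19)-(23) shows how regime persistence translates into persistence in squared returns. The sketch below (Python; the transition probabilities and variance levels are illustrative choices of ours) simulates $z_t$, sets $\sigma_t^2 = \phi_1 + \phi_2 z_t$, and reports the autocorrelation of $x_t^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 50_000
p, q = 0.98, 0.98            # Prob[z_t=1|z_{t-1}=1] and Prob[z_t=0|z_{t-1}=0]
phi1, phi2 = 1.0, 8.0        # sigma_t^2 = phi1 + phi2*z_t: low and high volatility regimes

z = np.zeros(T, dtype=int)
for t in range(1, T):
    prob_one = p if z[t - 1] == 1 else 1 - q     # probability that z_t = 1 given z_{t-1}
    z[t] = rng.random() < prob_one

sigma2 = phi1 + phi2 * z
x = np.sqrt(sigma2) * rng.standard_normal(T)

x2 = x**2 - np.mean(x**2)
acf = [np.sum(x2[k:] * x2[:-k]) / np.sum(x2 * x2) for k in (1, 5, 10, 20)]
print("rho = p + q - 1 =", p + q - 1)
print("acf of x^2 at lags 1, 5, 10, 20:", np.round(acf, 3))
```

With $p$ and $q$ close to one the autocorrelations of the squares die away only slowly, mimicking the behavior that GARCH and SV models attribute to near-unit-root volatility dynamics.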
Instead of viewing the ARCH and latent state models as separate one might attempt to link them together. A motivation for doing this comes from the high degree of estimated persistence in volatility observed after fitting either GARCH or SV models, a concomitant of the behavior of the acf of squares. Another explanation for this phenomenon might be that there have been shifts in the parameters of the underlying processes. It is well known that shifts in the mean of a random variable will tend to show up as a unit root when an autoregressive process is fitted, and one might extend this argument to $x_t^2$ as well. Diebold (1986) and Lamoureux and Lastrapes (1990) maintained this interpretation and echoes of the theme may be found in Ding et al. (1993) and Diebold and Lopez (1995). As mentioned above, Hamilton's model can be used to allow the intercept in the variance equation to shift, but one might expect some extra autoregressive elements in it, even after allowing for that effect. Accordingly, Hamilton and Susmel (1994) and Cai (1994) consider switching ARCH (not GARCH) models in which the conditional variance is selected from a number of possible ARCH processes depending upon the state that eventuates. Such SWARCH models have been applied to stock returns by Hamilton and Susmel (1994) and to Treasury Bill yields by Cai (1994).

3.2. Estimating models of volatility

Five methods of estimation for the above models figure prominently in the financial econometrics literature:
1. Generalized method of moments.
2. Maximum likelihood.
3. Quasi maximum likelihood.
4. Indirect estimation.
5. Bayesian estimation.
Closely associated with the decline in computational costs has been a shift towards the use of the final two methods, and it seems likely that this will continue, although computer packages may well continue to feature the first three.

3.2.1. GMM estimation

GMM estimation of ARCH type models has been implemented by Glosten et al. (1993) and Rich et al. (1991), among others. To find GMM estimates one must have orthogonality conditions, and Rich et al. use $E\{z_t[x_t^2 - \sigma_t^2]\} = 0$, where $z_t$ are elements of $F_{t-1}$, e.g., $x_{t-1}$, $x_{t-1}^2$. However, there seems little advantage to using this estimator rather than the MLE. SV models were first estimated by method of moments - see Taylor (1986), Melino and Turnbull (1990) and Duffie and Singleton (1993) (the latter simulate the moments rather than deriving them analytically) - but after these first studies there has been declining interest in this way of doing estimation. The moments used typically involved the first four moments of $x_t$ as well as the a.c.f.'s of $x_t^2$ and $|x_t|$. In a recent Monte Carlo study Andersen and Sorensen (1994a) conclude that the results are sensitive to the number of moments used; if too many moments are used the sampling distributions of the estimators are adversely affected, even with quite large numbers of observations.

Under certain circumstances difficulties can be experienced with the GMM estimator. To appreciate these suppose that there is a single parameter $\theta$ to be estimated and the moment condition used for this is $E(z_t v_t(x_t;\theta)) = 0$, where $z_t$ is an instrument. In the case of an ARCH(1) model one might set $z_t = x_{t-1}^2$ and $v_t = x_t^2 - \alpha_0 - \alpha_1 x_{t-1}^2$, and assume that only one of $\alpha_0$ and $\alpha_1$ is unknown. The GMM estimator of $\theta$ solves $\sum z_t v_t(x_t;\hat{\theta}) = 0$ and $\hat{\theta} - \theta_0 \simeq -(\sum z_t v_{\theta t})^{-1}(\sum z_t v_t)$, where $\simeq$ means that only the linear terms from the expansion of $v_t(x_t;\theta)$ around $\theta_0$ are retained, as is common in asymptotic analysis, and $v_{\theta t} = \partial v_t/\partial\theta$ at $\theta = \theta_0$. 30 What makes for complications with the GMM estimator is clear from this expression: it is the ratio of two random variables that are generally not independent.
In standard analysis $T^{1/2}(\hat{\theta} - \theta_0)$ is asymptotically normal since $T^{-1}\sum z_t v_{\theta t} \to^p E(z_t v_{\theta t})$, which is a constant, and so the distribution is determined by the distribution of the numerator, $T^{-1/2}\sum z_t v_t$. But what if $E(z_t v_{\theta t}) = 0$? Then we would have $\hat{\theta} - \theta_0 = -(T^{-1/2}\sum z_t v_{\theta t})^{-1}(T^{-1/2}\sum z_t v_t)$ and both numerator and denominator would converge in distribution to normally distributed random variables with zero means. If the numerator and denominator were independent, this distribution would actually be Cauchy. Hence, the case when $E(z_t v_{\theta t}) = 0$, which corresponds to the situation where the instrument has a zero correlation with $v_{\theta t}$, would be expected to produce a GMM estimator whose distribution departs substantially from that predicted by asymptotic theory. Of course, this polar case is too extreme, even if it is suggestive. Accordingly, Staiger and Stock (1993) have performed an analysis when $E(z_t v_{\theta t})$ is "local to zero" of the form $\xi T^{-1/2}$. This allows $T^{1/2}(\hat{\theta} - \theta_0)$ to have a limiting distribution determined by the ratio of $N(0,\mathrm{var}(z_t v_t))$ to $N(\xi,\mathrm{var}(z_t v_{\theta t}))$ random variables, which can also depart substantially from a normal distribution unless $\xi$ is large relative to $\mathrm{std}(z_t v_{\theta t})$.

30 If $v_t$ was linear in $\theta$, say $v_t = x_t^2 - x_{1t}\theta$, then $v_{\theta t} = -x_{1t}$ and the GMM estimator would be the simple instrumental variables estimator.

Simulation studies by Fuhrer et al. (1995), Kocherlakota (1990), Pagan and Jung (1993), Mao (1990) and Staiger and Stock (1993) have all found that the asymptotic approximation can be very poor when instruments are "weak". All of this suggests that great care needs to be exercised when applying GMM estimators. Some guidelines for detecting cases in which there are likely to be problems are available, see Staiger and Stock (1993) and Pagan and Jung (1993), of which the most useful involves an examination of the "concentration ratio", a quantity related to the expected value of the denominator, and which can be computed from the $R^2$ of the regression of $v_{\theta t}$ against the instruments $z_t$ if $v_{\theta t}$ was linear in $\theta$. What to do about the problem is a much harder question. One might consider performing an iterated bootstrap in which the parameter estimates found with the data are used to simulate realizations which can then be used to study the properties of the GMM estimator. However, this requires one to be able to simulate from a known model for $x_t$ and that may not always be available. Fundamentally, one would like to make the denominator non-stochastic, i.e., to replace $\sum z_t v_{\theta t}$ by $TE(z_t v_{\theta t})$. For some applications of GMM which match moments, e.g., in SV models, this expectation can be found either analytically or numerically (since $z_t$ is actually unity). What makes it difficult to follow this strategy in many instances however is that the number of moments available, $E(\phi(x_t;\theta)) = 0$, exceeds the number of parameters in $\theta$. To use all available information, the larger number of moments are reduced to the correct number $E(\tilde{\phi}(x_t;\theta)) = 0$ by forming $\tilde{\phi}_t = W\phi_t$, where $W$ is a $(\dim(\theta),\dim(\phi))$ matrix. Hansen (1982) showed that the best choice of $W$ was $W = [E(T^{-1}\sum\partial\phi_t/\partial\theta')]'[T^{-1}\mathrm{var}(\sum\phi_t)]^{-1}$. It is clear that the resulting GMM estimator has the form $\hat{\theta} - \theta_0 = -(\sum W\partial\phi_t/\partial\theta)^{-1}(\sum W\phi_t)$ and, even if $\partial\phi_t/\partial\theta$ was non-stochastic, the denominator may become random if $W$ has to be estimated from data. 31

31 If the problem is such that artificial series can be generated allowing the estimation of $W$ to any desired degree of accuracy, that approach might be used. In Monte Carlo studies such an option is available and Andersen and Sorensen (1994a) used the technique in their study of the GMM estimator of the SV model to show that its properties are greatly improved if the randomness due to $W$ can be eliminated.
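To fix ideas, the orthogonality condition $E\{z_t[x_t^2 - \sigma_t^2]\} = 0$ can be turned into an estimator by solving the corresponding sample moment conditions. The sketch below (Python; our own just-identified illustration with $z_t = (1, x_{t-1}^2)$, not code from any of the cited studies) does this for an ARCH(1) model.

```python
import numpy as np

rng = np.random.default_rng(3)
T, a0, a1 = 20_000, 0.2, 0.4

# Simulate an ARCH(1): sigma_t^2 = a0 + a1*x_{t-1}^2
x = np.zeros(T)
for t in range(1, T):
    x[t] = np.sqrt(a0 + a1 * x[t - 1] ** 2) * rng.standard_normal()

# Just-identified GMM with instruments z_t = (1, x_{t-1}^2) and v_t = x_t^2 - a0 - a1*x_{t-1}^2:
# E[v_t] = 0 and E[x_{t-1}^2 * v_t] = 0 give two linear equations in (a0, a1).
y, w = x[1:] ** 2, x[:-1] ** 2
Z = np.column_stack([np.ones(T - 1), w])          # instruments
X = np.column_stack([np.ones(T - 1), w])          # regressors appearing in v_t
theta_hat = np.linalg.solve(Z.T @ X, Z.T @ y)     # solves sum_t z_t * v_t(theta) = 0
print("GMM estimates of (a0, a1):", np.round(theta_hat, 3))
```

With more moment conditions than parameters the moments would instead be combined through Hansen's optimal weighting matrix $W$, which is where the additional randomness discussed above enters.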
3.2.2. Maximum and quasi maximum likelihood

Estimation of the unknown parameters can be done by MLE using $f(x_t|X_{t-1})$, where $X_{t-1} = \{x_0,x_1,\dots,x_{t-1}\}$. To see how this item determines the log likelihood, write the joint density of returns as the product of a conditional and a marginal density, $f(x_t,X_{t-1}) = f(x_t|X_{t-1})f(X_{t-1})$. Building this up for all $T$ observations gives the joint density

$$f(x_1,\dots,x_T) = f(x_T|X_{T-1})f(x_{T-1}|X_{T-2})\cdots f(x_0),$$

making the log likelihood of $x_1,\dots,x_T$ equal to

$$L = \sum_{t=1}^{T}\log f(x_t|X_{t-1}) + \log f(x_0), \qquad (24)$$

and this may be maximized with respect to the unknown parameters. To estimate the unknown parameters of the ARCH process it is necessary to find $f(x_t|X_{t-1})$. But, if $\epsilon_t$ is n.i.d. (0,1), it follows that the density of $x_t$ conditional upon $X_{t-1}$ is $N(0,\sigma_t^2)$ and (24) would be

$$L = \sum_{t=1}^{T}\left\{-\frac{1}{2}\log 2\pi - \frac{1}{2}\log\sigma_t^2 - \frac{1}{2\sigma_t^2}x_t^2\right\} + \log f(x_0). \qquad (25)$$

After replacing $\sigma_t^2$ in (25) by any of the GARCH type specifications discussed earlier, one can proceed to maximize $L$ with respect to the unknown parameters in $\sigma_t^2$. The main difficulty here is the final term in (25). But, because it is a single term, it is dominated (for large $T$) by the summation, and so it has generally been neglected. In fact little is known about the density $f(x_0)$ - if the $x_t$ are assumed to be strictly stationary, then $f(x_0)$ is the unconditional density for the series $x_t$. To date no analytic expression for this is available, but we can determine it experimentally once particular values of the GARCH parameters are specified. 32

As mentioned earlier, models have also been estimated when $f_\epsilon$ is not normal. A useful way to characterize this search for flexibility in the density of $\epsilon_t$ is to observe that the "density score", $d\log f(\epsilon)/d\epsilon = f'(\epsilon)/f(\epsilon)$, is $-\epsilon$ for an N(0,1) density, i.e., it is linear in $\epsilon$. Different choices of the density for $\epsilon_t$ amount to making the "density score" a non-linear function of $\epsilon_t$. To appreciate why this is of interest observe that the log likelihood for any ARCH model is

$$L = -\frac{1}{2}\sum_{t=1}^{T}\log\sigma_t^2 + \sum_{t=1}^{T}\log f(\epsilon_t), \qquad (26)$$

and therefore the scores for $\theta$ will be

$$\frac{\partial L}{\partial\theta} = -\frac{1}{2}\sum_{t=1}^{T}\sigma_t^{-2}\frac{\partial\sigma_t^2}{\partial\theta}\left[\frac{f'(\epsilon_t)}{f(\epsilon_t)}\epsilon_t + 1\right]. \qquad (27)$$

Notice that this involves the unknown density score so that the various parametric approaches can just be viewed as alternative ways of estimating this quantity. 33

32 Diebold and Schuermann (1996) have included $\log f(x_0)$ in the maximization by forming non-parametric estimates of $f(\cdot)$. For samples of 100 the difference is minor.

33 Nelson (1990b) argues that one needs to be cautious in following this strategy since there will be an interaction between the assumed density and the specification of $\sigma_t^2$. In particular, it is possible that the estimated $\hat{\sigma}_t^2$ might imply thinner tails than the true $\sigma_t^2$ would, since some of the thickness will be absorbed by the assumed density. We have encountered this phenomenon earlier when setting out an expression for $f(0)$.
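A minimal sketch of maximizing (25) for a GARCH(1,1) specification is given below (Python with scipy's general-purpose optimizer; the starting values, the treatment of the initial variance and all names are our own choices, and the $\log f(x_0)$ term is neglected as discussed above).

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, x):
    """Negative of (25) for a GARCH(1,1), conditioning on a starting variance."""
    a0, a1, b1 = params
    if a0 <= 0 or a1 < 0 or b1 < 0 or a1 + b1 >= 1:
        return 1e10                              # crude way to keep the search in bounds
    s2 = np.empty_like(x)
    s2[0] = x.var()                              # initialize at the sample variance
    for t in range(1, len(x)):
        s2[t] = a0 + a1 * x[t - 1] ** 2 + b1 * s2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(s2) + x**2 / s2)

# x would be the demeaned return series; here a simulated placeholder
rng = np.random.default_rng(4)
x = np.zeros(5000)
s2 = 0.2 / (1 - 0.1 - 0.8)
for t in range(1, len(x)):
    s2 = 0.2 + 0.1 * x[t - 1] ** 2 + 0.8 * s2
    x[t] = np.sqrt(s2) * rng.standard_normal()

res = minimize(neg_loglik, x0=np.array([0.1, 0.05, 0.9]), args=(x,), method="Nelder-Mead")
print("MLE of (a0, a1, b1):", np.round(res.x, 3))
```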
Another approach, used by Engle and Gonzalez-Rivera (1991), is to try to produce estimates of $\theta$ that would be "robust" to the density function by replacing $f'(\epsilon)/f(\epsilon)$ in (27) with a non-parametric estimator of that quantity, i.e., to find a semi-parametric (SP) estimator of $\theta$. Within the SP literature a distinction is made between the parameters of interest $\theta$ and a set of "nuisance" parameters $\eta$ that may be thought of as indexing the possible set of densities, $f(\theta_0,\eta)$, with $f(\theta_0,\eta_0)$ being the true density. Because $\eta$ is infinite dimensional, analysis of the situation is more complex than in parametric problems where $\eta$ might (say) be the degrees of freedom parameter for the t-density. Nevertheless, it is the case that familiar propositions from the parametric literature tend to generalize to the semi-parametric case. In particular, it is well known that, to estimate $\theta$ without knowledge of $\eta$, it is necessary that $E[\partial\log f(\theta_0,\eta)/\partial\theta] = 0$ for all possible values of $\eta$, where the expectation is taken with respect to the true density. In the SP estimation context, if this condition holds $\theta$ is said to be adaptively estimable.

What are the prospects for this? It is clear from the definition of the score in (27) that adaption requires $E[\sigma_t^{-2}(\partial\sigma_t^2/\partial\theta)] = 0$. Taking the ARCH(1) model for illustrative purposes, and following the approach in Linton (1993), re-parameterize $\sigma_t^2$ as $\sigma_t^2 = e^{\xi}(1 + \omega(x_{t-1}^2 - a))$, where $e^{\xi} = \alpha_0 + \alpha_1 a$, $\omega = e^{-\xi}\alpha_1$ and $a = E[\sigma_t^{-2}x_{t-1}^2]/E[\sigma_t^{-2}]$. It is then clear that $E[\sigma_t^{-2}(\partial\sigma_t^2/\partial\omega)] = e^{\xi}E[\sigma_t^{-2}(x_{t-1}^2 - a)] = 0$. Hence, the above results point to the conclusion reached by Linton (1993) and Steigerwald (1992) that neither the intercept $\alpha_0$ nor the slope $\alpha_1$ are adaptively estimable. These results parallel what is known about the possibility of adaptively estimating the intercept and slope coefficients in a linear conditional expectation, see Pagan and Ullah (1995, Theorem 5.5). As discussed there it is also necessary that the covariance of the scores of $\xi$ and $\omega$ be zero, but this can be shown to be satisfied by the choice of $a$ in the text. However, there is one important difference to the conditional mean context: rarely is there much interest in the intercept in a conditional mean, but most uses of the conditional variance require $\sigma_t^2$ in its totality, i.e., the constant term as well as the ARCH part. Consequently, the quantity of interest, $\sigma_t^2$, cannot be adaptively estimated. Again this is a familiar result from the SP literature; it is impossible to adaptively estimate a scale parameter.

It is unwise to become too preoccupied with the possibilities of adaption. A more useful question is to ask whether a given SP estimator is "efficient". To answer this requires some definition of efficiency. In the parametric case this is provided by the Cramer-Rao lower bound for $\theta$ of $(I_{\theta\theta} - I_{\theta\eta}I_{\eta\eta}^{-1}I_{\eta\theta})^{-1}$, where $I$ is the information matrix. There is an analogue to this in the SP case that is termed the SP efficiency bound. Steigerwald (1992) and Drost and Klassen (1994) derive this bound for GARCH models. Provided the density of $\epsilon_t$ is symmetric, the estimator proposed by Engle and Gonzalez-Rivera attains this bound.
Instead of trying to find the MLE when $f_\epsilon$ is unknown, it has been suggested that one proceed as if $\epsilon_t$ was n.i.d. (0,1), i.e., maximize the function $-\frac{1}{2}\sum\log\sigma_{t|t-1}^2 - \frac{1}{2}\sum\sigma_{t|t-1}^{-2}[x_t - E(x_t|X_{t-1})]^2$, where $\sigma_{t|t-1}^2 = \mathrm{var}(x_t|X_{t-1})$. The resulting quasi MLE of $\theta$, $\hat{\theta}_{QMLE}$, will be consistent and asymptotically normally distributed, provided the scores under the normality assumption have zero expected value under the true density, i.e.,

$$E\left[\frac{1}{2}\sigma_t^{-2}\frac{\partial\sigma_t^2}{\partial\theta}(\epsilon_t^2 - 1)\right] = 0.$$

Clearly, this holds for any density. The major difficulty with the quasi MLE is that, whilst $\hat{\theta}$ is consistent, one has to be careful about the construction of the estimator of the covariance matrix. This should be $\mathrm{var}(\hat{\theta}) = H^{-1}VH^{-1}$, where $H = -E(\partial^2 L/\partial\theta\partial\theta')$ and $V = \mathrm{var}(\partial L/\partial\theta)$, which differs from the value, $H^{-1}$, that would obtain if the assumed density for $\epsilon_t$ coincides with the true one. One can replace $V$ with the "outer product" estimate $\hat{V} = T^{-1}\sum_{t=1}^{T}(\partial L_t/\partial\theta)(\partial L_t/\partial\theta')$, so that, in large samples,

$$V = \frac{1}{4}E\left[\sigma_t^{-4}\frac{\partial\sigma_t^2}{\partial\theta}\frac{\partial\sigma_t^2}{\partial\theta'}(\epsilon_t^2 - 1)^2\right] = \frac{1}{2}H\,E[(\epsilon_t^2 - 1)^2],$$

showing that the true covariance matrix is a multiple of the "formula" variance, $H^{-1}$, determined by the fourth moment of $\epsilon_t$. The quasi-MLE standard errors have become widely used. In the discussion above it is assumed that the density of the innovations generating the QMLE is normal, but the widespread use of the t-density in estimation prompts the question of the properties of QMLE's based on densities other than the normal. Newey and Steigerwald (1996) deal with this question, showing that, if both assumed and true densities are symmetric around zero, the parameters of the conditional mean are consistently estimated, while those in the conditional variance are consistently estimated only up to a scale factor. They also discuss how to modify the QMLE if symmetry does not hold.
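The sandwich form $H^{-1}VH^{-1}$ is straightforward to compute once the likelihood is coded. The sketch below (Python; a self-contained per-observation log likelihood, numerical derivatives, and function names that are all our own choices; step sizes may need tuning in practice) contrasts the "formula" and robust standard errors at a given QMLE.

```python
import numpy as np

def loglik_obs(params, x):
    """Per-observation Gaussian log likelihoods for a GARCH(1,1), as in (25) with x demeaned."""
    a0, a1, b1 = params
    s2 = np.empty_like(x)
    s2[0] = x.var()
    for t in range(1, len(x)):
        s2[t] = a0 + a1 * x[t - 1] ** 2 + b1 * s2[t - 1]
    return -0.5 * (np.log(2 * np.pi) + np.log(s2) + x**2 / s2)

def qmle_standard_errors(theta, x, h=1e-4):
    """'Formula' (H^{-1}) and sandwich (H^{-1} V H^{-1}) standard errors at the QMLE theta."""
    theta = np.asarray(theta, float)
    k = len(theta)
    ei = lambda i: h * np.eye(k)[i]
    # outer-product estimate of V from numerical per-observation scores
    scores = np.column_stack([(loglik_obs(theta + ei(i), x) - loglik_obs(theta - ei(i), x)) / (2 * h)
                              for i in range(k)])
    V = scores.T @ scores
    # minus the Hessian of the total log likelihood by central differences
    L = lambda p: loglik_obs(p, x).sum()
    H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            H[i, j] = -(L(theta + ei(i) + ei(j)) - L(theta + ei(i) - ei(j))
                        - L(theta - ei(i) + ei(j)) + L(theta - ei(i) - ei(j))) / (4 * h * h)
    Hinv = np.linalg.inv(H)
    return np.sqrt(np.diag(Hinv)), np.sqrt(np.diag(Hinv @ V @ Hinv))

# usage: se_formula, se_robust = qmle_standard_errors(theta_hat, x)
```

When $\epsilon_t$ is fat tailed, $E[(\epsilon_t^2 - 1)^2] > 2$ and the robust standard errors exceed the formula ones, as the expression above implies.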
Estimation of $f(x_t|X_{t-1})$ for SV models is very demanding computationally. To see why, observe that, since the SV model has $\sigma_t^2$ following a Markovian process,

$$f(x_t|X_{t-1}) = \int_{\sigma_t}\int_{\sigma_{t-1}} f(\sigma_t,\sigma_{t-1},x_t|X_{t-1})\,d\sigma_{t-1}\,d\sigma_t, \qquad (28)$$

and it becomes necessary to perform numerical integration at each observation in order to evaluate the likelihood. In the case of Hamilton's specification the burden is lessened by the fact that $\sigma_t^2$ is discrete, allowing the bivariate integration to be carried out by a dual sum over the four possible state values. Of course, this integration first requires that the density $f(\sigma_t,\sigma_{t-1},x_t|X_{t-1})$ be found, but this can be done recursively from the following relations between conditional and unconditional densities:

$$f(\sigma_t,\sigma_{t-1},x_t|X_{t-1}) = f(x_t|\sigma_t,\sigma_{t-1},X_{t-1})f(\sigma_t,\sigma_{t-1}|X_{t-1}). \qquad (29)$$

$$f(\sigma_t,\sigma_{t-1}|X_{t-1}) = f(\sigma_t|\sigma_{t-1},X_{t-1})f(\sigma_{t-1},\sigma_{t-2}|X_{t-1}). \qquad (30)$$

$$f(\sigma_t,\sigma_{t-1}|X_t) = f(\sigma_t,\sigma_{t-1},x_t|X_{t-1})/f(x_t|X_{t-1}). \qquad (31)$$

Beginning with some initial density $f(\sigma_1,\sigma_0|x_1)$ we can apply (30), (29), (28) and (31) in that order to find $f(\sigma_2,\sigma_1|X_2)$, whereupon the process is repeated. 34 Since $f(x_t|\sigma_t,\sigma_{t-1},X_{t-1})$ and $f(\sigma_t|\sigma_{t-1},X_{t-1})$ derive from the model, the circle is closed. The problem of having to perform numerical integration at each observation has dampened the enthusiasm of investigators for exact MLE (except in the case of Hamilton's model).

34 One possibility is to set $f(\sigma_1,\sigma_0|x_1)$ to the unconditional density $f(\sigma_1,\sigma_0)$; another is to estimate it. Hamilton (1989) uses the former whereas Hamilton (1990) proceeds under the latter assumption.

Some progress has been made in designing efficient ways of performing the numerical integrations, of which the best method seems to be the accelerated Gaussian importance sampler of Danielsson (1994) and Danielsson and Richard (1993), and more can be expected. Until recently therefore, the major way of estimating the SV model was by QMLE. To implement this method, square $x_t$ to get $\sigma_t^2\epsilon_t^2$ and then take logs, thereby summarizing the SV model (18) in the state space form

$$\log x_t^2 = k + \log\sigma_t^2 + \xi_t, \qquad \log\sigma_t^2 = \alpha_0 + \beta_1\log\sigma_{t-1}^2 + \sigma_\eta\eta_t,$$

where $k = E(\log\epsilon_t^2)$ and $\xi_t = \log\epsilon_t^2 - k$. If $\xi_t$ was normally distributed, and $\epsilon_t$ was independent of $\eta_t$, then $f(x_t|X_{t-1})$ could be found directly with the Kalman filter. 35 We can ignore the fact that $\xi_t$ is not normal and proceed as if it was so, i.e., use a quasi MLE, and this approach was suggested by Nelson (1988) and implemented by Harvey and Shephard (1993) and Harvey et al. (1994). 36 Its major defect is that it can be very inefficient, since the distribution of $\xi_t$ lies far from a normal density, having a very long left hand tail. Computationally, problems can arise when $\epsilon_t$ becomes small, as this makes $\log\epsilon_t^2$ very large in absolute value, and this "inlier" problem was dealt with in Harvey and Shephard by effectively trimming out these observations.

35 In the case of Hamilton's model the error term in the state dynamics equation has a time varying conditional variance that depends on unobserved quantities, as $E(\eta_t^2|z_{t-1})$ depends on $\sigma_{t-1}^2$. While the Kalman filter allows for the conditional variances of the errors to vary in a known way with the past history of $x_t$ it does not allow them to depend on the past unobserved states.

36 Ruiz (1994) suggests that the QMLE is more efficient than the GMM estimator of the parameters of SV models but Andersen and Sorensen (1994b) demonstrate that this conclusion is incorrect.
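A compact sketch of this QMLE is given below (Python; our own implementation of a scalar Kalman filter applied to $\log x_t^2$, with a small offset added to guard against the "inlier" problem rather than the trimming rule used by Harvey and Shephard). For an N(0,1) innovation, $k = E(\log\epsilon_t^2) \approx -1.27$ and $\mathrm{var}(\log\epsilon_t^2) = \pi^2/2$.

```python
import numpy as np

def sv_quasi_loglik(params, x, c=1e-6):
    """Gaussian quasi log likelihood for the SV model in state space form:
       log x_t^2 = k + h_t + xi_t,  h_t = a0 + b1*h_{t-1} + sig_eta*eta_t,  h_t = log sigma_t^2."""
    a0, b1, sig_eta = params
    y = np.log(x**2 + c)                  # offset c avoids log(0) at tiny returns
    k = -1.2704                           # E(log eps^2) for eps ~ N(0,1)
    R = np.pi**2 / 2                      # var(log eps^2) for eps ~ N(0,1)
    Q = sig_eta**2
    h, P = a0 / (1 - b1), Q / (1 - b1**2)  # start at the stationary mean/variance (|b1| < 1)
    ll = 0.0
    for yt in y:
        # prediction error decomposition, treating xi_t as if it were normal
        F = P + R
        v = yt - (k + h)
        ll += -0.5 * (np.log(2 * np.pi) + np.log(F) + v**2 / F)
        # measurement update, then time update
        K = P / F
        h, P = h + K * v, P * (1 - K)
        h, P = a0 + b1 * h, b1**2 * P + Q
    return ll

# maximize sv_quasi_loglik over (a0, b1, sig_eta), e.g. by minimizing its negative with scipy
```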
Drawing on ideas in Shephard (1994), Kim and Shephard (1994) show how the formulation above may be adapted to perform exact maximum likelihood. The idea is to replace the $\xi_t$ with a mixture of $m$ $N(\mu_i,v_i)$, $i = 1,\dots,m$, random variables whose probability of occurrence is $\pi_i$. Stochastically this can be thought of as making $\xi_t$ a function of a discrete random variable $\omega_t$ taking values $i = 1,\dots,m$, so that $f(\xi_t) = \sum_{i=1}^{m}\pi_i f(\xi_t|\omega_t) = \sum_{i=1}^{m}\pi_i N(\mu_i,v_i)$, e.g., if $\omega_t = 2$ is realized then we would draw $\xi_t$ from an $N(\mu_2,v_2)$ random variable. As Kim and Shephard point out, the weights $\pi_i$, $\mu_i$ and $v_i$ can be determined once $m$ is selected by finding what parameter values would make $\sum_{i=1}^{m}\pi_i N(\mu_i,v_i)$ as close a fit to the density of the log of a $\chi^2(1)$ as possible, and these weights are given in their paper for $m = 7$. 37 What makes this idea ingenious is that, conditional upon a realization $\omega = (\omega_1,\dots,\omega_T)$, the random variable in the observation equation is $\xi_t|\omega$, and this will be normal. Hence the Kalman filter can be applied. Of course, this immediately raises the issue of where $\omega$ comes from. Defining $\sigma^2 = (\sigma_1^2,\dots,\sigma_T^2)$, we could obtain a set of realizations on $\omega$ if we could simulate from $f(\sigma^2,\omega|X_T)$, but that can be done with the Gibbs sampler, for which we will need $f(\sigma^2|\omega,X_T)$ and $\Pr(\omega|\sigma^2,X_T)$. The first of these, sampling from the Kalman smoother, is accomplished by the algorithm in De Jong and Shephard (1993), leaving the task of getting realizations from $\Pr(\omega|\sigma^2,X_T) = \prod_{t=1}^{T}\Pr(\omega_t|\sigma_t^2,\log x_t^2)$, and this can be done using Bayes theorem, culminating in the requisite density being proportional to $\prod_{t=1}^{T}f(\log x_t^2|\omega_t,\sigma_t^2)\Pr(\omega_t)$. 38 Kim and Shephard report that the method works very quickly.

37 It is not necessary that $\epsilon_t$ be taken as an N(0,1) random variable as one can approximate the distribution of $\log\epsilon_t^2 - E(\log\epsilon_t^2)$ by a mixture of normals if $\epsilon_t$ follows other densities. Mahieu and Schotman (1994a), in estimating an SV model, allow $\mu_i$ and $v_i$ to be free parameters.

38 The constant of proportionality is found by summing $f(\log x_t^2|\omega_t = i,\sigma_t^2)\Pr(\omega_t = i)$ over all $i = 1,\dots,m$.

39 Both the quasi and exact MLE's use the state space format for squared returns but this will only be appropriate when $\xi_t$ and $\eta_t$ are independent, and it has already been argued that the cross correlations between $x_t^2$ and $x_{t-j}$ make this assumption suspect. When one allows for the correlation it will be necessary to estimate that parameter, but there is no information in $x_t^2$ to do so. Harvey and Shephard suggest that one use both $x_t^2$ and sgn($x_t$) for estimation.

3.2.3. Indirect estimation

A method of estimation whose popularity has grown in near exponential fashion since its introduction in 1992 has been that of indirect estimation. To understand the nature of this method it is useful to consider the simple example of estimating the MA(1) coefficient $\alpha$ in

$$y_t = e_t + \alpha e_{t-1}, \qquad (32)$$

where $e_t$ is n.i.d. (0,1), by using the AR(1)

$$y_t = \beta y_{t-1} + u_t.$$

We assume that $u_t$ is n.i.d. $(0,\sigma^2)$ and write down the sample scores for $\beta$ as $T^{-1}\sigma^{-2}\sum y_{t-1}(y_t - \beta y_{t-1})$. Equating these to zero gives the OLS estimator $\hat{\beta}$. Since this model is mis-specified it is known from the theory of estimation in mis-specified models that the estimator $\hat{\beta}$ converges to the pseudo-true value $\beta^*$ defined by

$$E_{MA}[y_{t-1}(y_t - \beta^* y_{t-1})] = 0, \qquad (33)$$

where $E_{MA}$ means that the expectation is taken with respect to the true model (the DGP). In this instance it is easy to solve for $\beta^*$, producing $\beta^* = E_{MA}(y_t y_{t-1})/E_{MA}(y_{t-1}^2) = \alpha/(1 + \alpha^2)$, and this shows the dependence of $\beta^*$ upon $\alpha$. As the value of $\alpha$ varies so will $\beta^*$, and we might write this dependence as $\beta = \phi(\alpha)$, a quantity that has been termed the "binding function". This further suggests that we might estimate $\alpha$ by using such a relation, i.e., we could solve

$$T^{-1}\sum y_{t-1}\left(y_t - \frac{\tilde{\alpha}}{1 + \tilde{\alpha}^2}y_{t-1}\right) = 0,$$

to get an estimate $\tilde{\alpha}$. Standard properties for $\tilde{\alpha}$ of consistency and asymptotic normality follow from $\hat{\beta} = \phi(\tilde{\alpha})$ and the fact that $\hat{\beta}$ is consistent and asymptotically normal around $\beta^*$. In this instance we know that $\tilde{\alpha}$ has these properties from the time series literature, as factorizing the autocovariance function was one of the earliest ways of estimating an MA(1) - see Durbin (1959). In this simple example we could find the binding function analytically, whereas in most instances this will be hard to do. However, all we need to be able to do is find the value of $\beta^*$, $\beta^*(\bar{\alpha})$, that is associated with any given value $\bar{\alpha}$ through the binding function, and to then choose a value of $\alpha$ that implies a $\beta^*$ that equals $\hat{\beta}$, i.e., we wish to choose a value of $\alpha$, $\hat{\alpha}$, such that $\hat{\beta} = \hat{\alpha}/(1 + \hat{\alpha}^2)$.
Now, because $\beta^*$ is defined from (33), it is clear that the value of $\hat{\beta}^*$ given by

$$\frac{1}{M}\sum_{m=1}^{M}\frac{1}{T}\sum_{t=1}^{T}\tilde{y}_{m,t-1}(\tilde{y}_{mt} - \hat{\beta}^*\tilde{y}_{m,t-1}) = 0,$$

where the $\tilde{y}_{mt}$ are values simulated from the MA(1) model at the $m$th replication, with $\alpha$ set to $\bar{\alpha}$, $m = 1,\dots,M$, will converge to $\beta^*(\bar{\alpha})$ as $M \to \infty$. Thus we might $\min_{\bar{\alpha}}(\hat{\beta} - \hat{\beta}^*)^2$, the value of $\hat{\beta}^*$ changing as new values for $\bar{\alpha}$ are tried. At the minimum value we have the "indirect estimator", $\hat{\alpha}$. It will not exactly equal the $\tilde{\alpha}$ found from $\hat{\beta} = \phi(\tilde{\alpha})$, unless $M$ was infinite, but we might expect it to be close. If there are more elements in $\beta$ than in $\alpha$ the distance function between $\hat{\beta}$ and $\hat{\beta}^*$ needs to be modified to $(\hat{\beta} - \hat{\beta}^*)'V(\hat{\beta} - \hat{\beta}^*)$, where $V$ is some weighting matrix. This describes the original proposal by Gouriéroux et al. (1993). A related method of indirect estimation is due to Gallant and Tauchen (1992), who would find the value of $\alpha$ that minimizes $[\frac{1}{M}\sum_{m=1}^{M}\frac{1}{T}\sum_{t=1}^{T}\tilde{y}_{m,t-1}(\tilde{y}_{mt} - \hat{\beta}\tilde{y}_{m,t-1})]^2$; this works since, as $M \to \infty$, the function tends to $[\alpha - \hat{\beta}(1 + \alpha^2)]^2$, which is minimized by choosing $\alpha$ such that $\hat{\beta} = \alpha/(1 + \alpha^2)$. 40 Therefore, computer simulation of the model we are ultimately interested in estimating, along with actual estimation of an entirely different model, will produce consistent and asymptotically normal estimators of the parameters of interest.

40 Although Gallant et al. (1994) refer to their method as "efficient method of moments" we feel that the descriptor used by Gouriéroux et al. is sufficiently evocative to justify calling all methods operating with this principle by that name.

The ideas described above generalize. Think of the model to be estimated as having some density for the random variable $y_t$, $g(y_t;\alpha)$, that is characterized by unknown parameters $\alpha$, while the auxiliary model used for estimation purposes satisfies a set of moment conditions $\frac{1}{T}\sum_{t=1}^{T}\psi(y_t;\hat{\beta}) = 0$. Then the pseudo-true value of $\beta$ solves $E_g(\frac{1}{T}\sum_{t=1}^{T}\psi(y_t;\beta^*)) = 0$, and the estimator of $\alpha$ that we seek is what solves this set of equations. To make this operational, we replace $E_g$ by an average using observations simulated from the model with a given value of $\alpha$, and replace $\beta^*$ with a consistent estimator of it, $\hat{\beta}$, leading to the indirect estimates of $\alpha$ being determined by

$$\min_{\alpha}\left[\frac{1}{M}\sum_{m=1}^{M}\frac{1}{T}\sum_{t=1}^{T}\psi(\tilde{y}_{mt};\hat{\beta})\right]'V\left[\frac{1}{M}\sum_{m=1}^{M}\frac{1}{T}\sum_{t=1}^{T}\psi(\tilde{y}_{mt};\hat{\beta})\right].$$

In this formula $\tilde{y}_{mt}$ are simulated values from the model described by $g(y_t;\alpha)$ for given values of $\alpha$, and $V$ is some weighting matrix. As this can be thought of as a GMM estimator, the literature on ways of selecting $V$ is relevant; for a good review of these see Den Haan and Levin (1994). Obviously, the success of the method depends on a number of factors. First, there must be some connection between $\beta^*$ and $\alpha$, i.e., varying values for $\alpha$ must show up in terms of changed values of $\beta^*$. Consequently, there would be no sense in trying to estimate parameters relating to a conditional variance of $y_t$ from the conditional mean, unless these were connected in some way. Second, the choice of the auxiliary model $\psi(y_t;\beta)$ (or score generator in Gallant and Tauchen's terminology, since they normally select $\psi(y_t;\beta)$ as the scores of some model) is important for the efficiency of the indirect estimator of $\alpha$. In terms of our simple example, the estimator of the MA(1) coefficient using an AR(1) as the auxiliary model is known to be very inefficient.
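A minimal sketch of the simulation-based procedure for this MA(1)/AR(1) pair is given below (Python; the grid search, the use of common simulated shocks across candidate values of $\alpha$, and all names are our own illustrative choices).

```python
import numpy as np

def ar1_coef(y):
    """OLS estimate of the auxiliary AR(1) coefficient beta."""
    return np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)

def ma1_from_shocks(alpha, e):
    """MA(1) series y_t = e_t + alpha*e_{t-1} built from given shock paths e (one row per path)."""
    return e[:, 1:] + alpha * e[:, :-1]

rng = np.random.default_rng(5)
T, alpha_true, M = 2000, 0.5, 10

e_obs = rng.standard_normal(T + 1)
y = e_obs[1:] + alpha_true * e_obs[:-1]          # "observed" MA(1) data
beta_hat = ar1_coef(y)

# Common simulated shocks, re-used for every candidate alpha (keeps the objective smooth)
shocks = rng.standard_normal((M, T + 1))

def distance(a):
    beta_star = np.mean([ar1_coef(s) for s in ma1_from_shocks(a, shocks)])
    return (beta_hat - beta_star) ** 2            # (beta_hat - simulated binding function)^2

grid = np.linspace(-0.95, 0.95, 381)
alpha_hat = grid[np.argmin([distance(a) for a in grid])]
print(f"beta_hat = {beta_hat:.3f}, indirect estimate of alpha = {alpha_hat:.2f}")
```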
Using a higher order AR, e.g., an AR(3), should improve the properties of the estimator, a result established many years ago in time series analysis. Clearly, what we are seeking in $\psi(y_t;\beta)$ should be the best statistical representation of the data that is possible. It is important to stress in this connection that the auxiliary model may be a mis-specified one.

Although it is computationally very demanding, the procedure is extremely attractive, particularly if it is hard to find MLE's of $\alpha$ but it is possible to find some representation of the data that may be easy to estimate. A good example is the stochastic volatility model. It is simple to generate data from the model once parameter values are selected, and it is known that GARCH models can be found that give reasonable fits to the data, making it natural to select these as candidates for an auxiliary model. Engle and Lee (1994), in estimating an SV model with non-zero correlation between $\epsilon_t$ and $\eta_t$, found indirect estimates using a GARCH(1,1) model with an asymmetric effect of returns on volatility as the auxiliary model. One might think that the scores of an EGARCH model would be a better candidate for $\psi(y_t;\beta)$ in this instance, since it has a similar functional form to the SV model. In perhaps the most comprehensive study of the indirect estimation of the SV model, Gallant et al. choose their SNP method of approximating densities to derive $\psi(\cdot)$. Because $\dim(\beta)$ therefore greatly exceeds $\dim(\alpha)$ it is possible for them to assess the adequacy of the SV model as a representation of the daily stock data used in this survey by testing how close the scores $\sum_{t=1}^{T}\psi(y_t;\hat{\beta}^*(\hat{\alpha}))$ are to zero. They find that the SV model is soundly rejected unless the density of $\eta_t$ is allowed to take a very general form and there is also a very flexible specification of the way in which $\sigma_t^2$ maps into $\eta_t$.

3.2.4. Bayesian estimation methods

Bayesian methods of estimation are becoming more common in financial econometrics. One reason is that the parameters $\theta$ are regarded as random variables. Therefore, they can be treated as extending the set of latent variables $\sigma_t$, and posterior densities for both will be simultaneously obtained. Some see this as a major advantage, i.e., instead of first determining $\hat{\theta}$, and then using this to construct a point estimate of $\sigma_t$, the complete density of the latter is found. As is well known it is easy to write down the posterior for $\theta$, given the data, in terms of the product of the likelihood and the prior density, but rather harder to give this an analytic expression. Fortunately, methods have become available that enable simulation of realizations from the posterior without explicitly knowing its form. Generally, these methods fall under the heading of Markov Chain Monte Carlo methods, of which leading examples are the Gibbs sampler and the Metropolis-Hastings algorithm. Surveys of this technology are available in Chib and Greenberg (1994a) and Chib and Greenberg (1994b). Jacquier et al. (1994) have estimated the SV model with a variant of the Metropolis algorithm, while Geweke (1994) has estimated both GARCH and SV models with the same approach. Geweke also shows how the prediction error decomposition (24) for the log likelihood may be used to easily compute posterior odds recursively, and, based on an examination of these, concludes that the SV model is superior to a GARCH model for an exchange rate data set.
Albert and Chib (1993) estimate Hamilton's model by these Bayesian methods.

3.3. Parametric models and moment existence

Estimation experience with GARCH models is now quite extensive. One of the most striking features has been the fact, documented in Bollerslev et al. (1992), that the model parameters $\beta_j$, $\alpha_j$ frequently sum to unity, i.e., for a GARCH(1,1) process $\beta_1 + \alpha_1 \approx 1$. For EGARCH models the sum of the autoregressive coefficients on lagged values of $\log\sigma_t^2$ can also be close to unity, e.g., Nelson (1989), Pagan and Schwert (1990a) and Kearns and Pagan (1993). One must therefore enquire into the consequences of this feature for estimation.

Let us begin with an EGARCH process that has the autoregressive coefficients summing to unity. This means that $\log\sigma_t^2$ has a unit root. Without an intercept, therefore, $\log\sigma_t^2$ would evolve as an integrated process; with an intercept it would be integrated with drift. In the former case it is known that $T^{-2}\sum_{t=1}^{T}\log\sigma_t^2$ would tend in law to a random variable, so that the log likelihood for such a process must eventually be dominated by the term involving $T^{-2}\sum_{t=1}^{T}\log\sigma_t^2$, since the other term $T^{-2}\sum_t \sigma_t^{-2}x_t^2 = T^{-2}\sum_t \varepsilon_t^2 = T^{-1}(T^{-1}\sum_t \varepsilon_t^2) \to_p 0$ as $T^{-1}\sum_t \varepsilon_t^2 \to 1$. Thus the estimation problem is not well defined, and we could not expect that MLEs obtained by maximizing (25) would have standard properties. If there is an intercept, $T^{-2}\sum\log\sigma_t^2$ would tend in probability to a constant that would not depend on the EGARCH parameters other than the intercept, and again this would create estimation problems. Hence it appears that an EGARCH process with unit roots would have to be avoided, and the finding of near unit root behavior is therefore very disturbing.

Turning to the GARCH(1,1) process in which $\alpha_1 + \beta_1 = 1$, termed an Integrated GARCH(1,1), or IGARCH(1,1), process by Engle and Bollerslev (1986), the situation is much less clear. Assume that $\varepsilon_t$ is i.i.d. with a symmetric unimodal density, monotonically increasing from $-\infty$ to zero and monotonically decreasing from zero to infinity, with zero third moment and finite moments up to order six. Nelson (1990a) showed that, if $E[\ln(\alpha_1\varepsilon_t^2 + \beta_1)] < 0$:
1. For the IGARCH(1,1) model, when the intercept in the equation for $\sigma_t^2$ is zero, $\sigma_t^2 \to 0$ almost surely.
2. For the IGARCH(1,1) model, with a non-zero intercept in the equation for $\sigma_t^2$, $\sigma_t^2$ has a strictly stationary and ergodic limiting distribution.
3. For the IGARCH(1,1) model, $E(\sigma_t^{2k}) < \infty$ if $E[(\alpha_1\varepsilon_t^2 + \beta_1)^k] < 1$. [41]

[41] Results for GARCH(p,q) models are available in Bougerol and Picard (1992).

Now, from (3), putting $k = 1$ gives $E(\alpha_1\varepsilon_t^2 + \beta_1) = \alpha_1 + \beta_1$, and therefore when the process is IGARCH ($\alpha_1 + \beta_1 = 1$), $E(\sigma_t^2)$ does not exist (this is not surprising given the solution for $E(\sigma_t^2)$ in the GARCH(1,1) case). Now $E(\sigma_t) \le \{E(\sigma_t^2)\}^{1/2}$ by Jensen's inequality, i.e., $E[(\alpha_1\varepsilon_t^2 + \beta_1)^{1/2}] \le [E(\alpha_1\varepsilon_t^2 + \beta_1)]^{1/2} = 1$ when $\alpha_1 + \beta_1 = 1$, and by (3) this guarantees the existence of the fractional moment $E(\sigma_t)$. Consequently, when one looks at the log likelihood for an IGARCH process, $T^{-1}\sum\log\sigma_t$ will converge to $E[\log\sigma_t]$ by the ergodic theorem, while $T^{-1}\sum\sigma_t^{-2}x_t^2 = T^{-1}\sum\varepsilon_t^2$ tends to unity. Therefore, unlike the EGARCH process, the log likelihood of an IGARCH process is well behaved asymptotically, and it is not surprising that the MLEs of the IGARCH parameters are asymptotically normal, converging at the standard rate of $T^{1/2}$. Lumsdaine (1991) and Lee and Hansen (1994) give formal proofs of this fact; the latter under the conditions just cited, the former under the assumption that moments of $\varepsilon_t$ exist up to the 32nd order.
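A small simulation, under assumed parameter values, illustrates the two features just noted: Nelson's condition $E[\ln(\alpha_1\varepsilon_t^2 + \beta_1)] < 0$ holds for an IGARCH(1,1) with a non-zero intercept, so the process is strictly stationary, yet the sample variance of the returns fails to settle down as the sample grows because $E(\sigma_t^2)$ does not exist.

```python
import numpy as np

rng = np.random.default_rng(2)

# IGARCH(1,1): sigma2_t = omega + alpha1*x_{t-1}^2 + beta1*sigma2_{t-1}, alpha1 + beta1 = 1.
# The parameter values are illustrative assumptions.
omega, alpha1, beta1 = 0.05, 0.3, 0.7
T = 200_000
x = np.empty(T)
sig2 = omega
for t in range(T):
    x[t] = np.sqrt(sig2) * rng.standard_normal()
    sig2 = omega + alpha1 * x[t] ** 2 + beta1 * sig2

# Nelson's strict-stationarity condition: the expectation below is negative
eps = rng.standard_normal(1_000_000)
print("E[ln(a1*eps^2 + b1)] =", np.mean(np.log(alpha1 * eps ** 2 + beta1)))

# Sample variances over expanding samples do not converge, reflecting the
# non-existence of E(sigma_t^2) even though the process is strictly stationary.
for n in (1_000, 10_000, 100_000, T):
    print(n, np.var(x[:n]))
```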
The IGARCH process is therefore one for which the variance of returns does not exist but for which returns are a strictly stationary process (the latter follows from $x_t = \sigma_t\varepsilon_t$, which is the product of two strictly stationary processes); i.e., the IGARCH(1,1) process is strictly stationary but not covariance stationary, whereas an IEGARCH(1,1) process would be neither covariance nor strictly stationary. Accordingly, because an IGARCH process is strictly stationary, it is somewhat misleading to attach the label "integrated" to it. This is not to say that it does not possess some of the characteristics of an integrated series. Using $x_t = \varepsilon_t\sigma_t$ it is possible to re-write (13) as

$\sigma_t^2 = \alpha_0 + [\beta_1 + \alpha_1 E_{t-2}(\varepsilon_{t-1}^2)]\sigma_{t-1}^2 + \alpha_1[\varepsilon_{t-1}^2 - E_{t-2}(\varepsilon_{t-1}^2)]\sigma_{t-1}^2,$

whereupon $E_{t-2}(\sigma_t^2) = \alpha_0 + (\beta_1 + \alpha_1)\sigma_{t-1}^2$, and $E_{t-2}(\sigma_t^2) = \alpha_0 + \sigma_{t-1}^2$ if the process is IGARCH; this would also be true of an I(1) series with drift. Another important feature of Nelson's results is that there must be an intercept in the variance equation, otherwise $\sigma_t^2$ will be degenerate for large $t$. Thus removal of an "insignificant" intercept would be rather unwise if the IGARCH process is to be used for forecasting. It is also important to observe that one can no longer "start up" $\sigma_0^2$ with $\alpha_0/(1-\alpha_1-\beta_1)$, as in the GARCH(1,1) case. Nevertheless, setting the initial value $\sigma_0^2$ to zero will mean that it is effectively incorporated into the intercept, so that the estimated intercept really has two components. It may well be that a small intercept is found if an IGARCH process is estimated with a GARCH program, because the starting value $\alpha_0/(1-\alpha_1-\beta_1)$ will be set to larger and larger numbers as the sum $\beta_1 + \alpha_1$ tends to unity.

3.4. Specification tests for ARCH models

A large range of models have been mentioned above as being useful for the modeling of financial series. Each model differs from the others in being able to replicate some particular characteristic of financial data. Consequently, it seems clear that one would want to develop some methods for assessing ex ante which models would be most useful in capturing the behavior of any particular series, as well as determining ex post how successful they have been. This is the realm of specification tests.

There are two philosophies that might be adopted when addressing issues of the appropriateness of any given model. One possibility is that of matching "stylized facts", i.e., a comparison is made of certain characteristics of the series being investigated with what is implied by the fitted model. At first glance this approach to model evaluation may not appear to be very effective, but there are many examples in which a simple stylized fact has proven remarkably effective in weeding out poor models. We have already encountered an example of this in the form of the asymmetric relation between stock volatility and the level of returns. As mentioned previously, this is a stylized fact that has proven to be very influential in delineating the class of models needed to fully describe stock returns. A second example is the "jump" behavior of recursive variances.
Unless a model is capable of replicating this feature one would be concerned about adopting it. A third example would be the ability of models to replicate the strong peak seen in the density of returns. If U.S. daily stock return data from 1835 to 1987 is used, the estimates of $f(0)$ are 0.63 (data), 0.46 (GARCH), 0.40 (EGARCH) and 0.45 (AGARCH); all of the models produce substantial understatements of the $f(0)$ seen in the data.

A more conventional approach to the evaluation question is to postulate an alternative specification to the one being examined and to construct Lagrange Multiplier (LM) tests for whether the alternative model is to be preferred. Of course, such a test can also reveal inadequacies in the maintained model even if the alternative is not correct, i.e., the diagnostic test can have power against alternatives other than the one specified. Let $\theta$ be the set of parameters in both the alternative and null models, where a sub-set of $\theta$ being set to zero produces the latter. When $\varepsilon_t$ is N(0,1) it is easily seen that the scores for $\theta$ are $\tfrac12\sigma_t^{-2}(\partial\sigma_t^2/\partial\theta)(\varepsilon_t^2 - 1)$ (or $\tfrac12\sigma_t^{-2}(\partial\sigma_t^2/\partial\theta)[-(f'(\varepsilon_t)/f(\varepsilon_t))\varepsilon_t - 1]$ when $\varepsilon_t$ has density $f(\varepsilon)$). Hence the LM test that the maintained model is correct is based on $TR^2$ from the regression of $(\hat\varepsilon_t^2 - 1)$ upon $\hat\sigma_t^{-2}\,\partial\hat\sigma_t^2/\partial\theta$, where the hats indicate that $\theta$ is replaced by the MLE of $\theta$ under the null hypothesis. In the special case when the null hypothesis is that there is no ARCH, $\sigma_t^2 = \sigma^2$ and $\hat\varepsilon_t = x_t/\hat\sigma$. Thus, if this is the null, and the alternative is an ARCH(1) process, $\sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2$, then $(\hat\varepsilon_t^2 - 1) = (x_t^2/\hat\sigma^2 - 1)$ is regressed against $\hat\sigma^{-2}\,\partial\sigma_t^2/\partial\alpha_0$ ($= \hat\sigma^{-2}$) and $\hat\sigma^{-2}\,\partial\sigma_t^2/\partial\alpha_1$ ($= \hat\sigma^{-2}x_{t-1}^2$), which is the same as the regression of $x_t^2$ upon an intercept and $x_{t-1}^2$, i.e., the test used previously to assess dependence in returns. [42] If instead we are checking that the model is ARCH(2), when ARCH(1) is maintained, then $\hat\sigma_t^2 = \hat\alpha_0 + \hat\alpha_1 x_{t-1}^2$, $\partial\sigma_t^2/\partial\alpha_0 = 1$, $\partial\sigma_t^2/\partial\alpha_1 = x_{t-1}^2$, $\partial\sigma_t^2/\partial\alpha_2 = x_{t-2}^2$, and the regression is now of $\hat\varepsilon_t^2 = \hat\sigma_t^{-2}x_t^2$ against unity, $\hat\varepsilon_{t-1}^2$ and $\hat\varepsilon_{t-2}^2$.

[42] Lee and King (1993) point out that, since $\alpha_1 \ge 0$, the LM test can be improved upon by taking account of this sign information.

The central problem therefore is to determine a suitable alternative model. Mostly, the proposals have related to different models for the conditional variance. Engle and Ng (1993) suggest that the extra terms be dummy variables of either an additive or a multiplicative nature, e.g., a dummy variable that takes the value unity only if $x_{t-1}$ is negative would imply the possibility of an asymmetric response if we add its product with $x_{t-1}$ to the existing model. Pagan and Schwert (1990a) argued for a non-parametric approach, approximating $\sigma_t^2$ by a GARCH process along with Fourier terms, and then testing the significance of the Fourier terms. There are clearly many alternative ways of generating diagnostic tests, varying by what we choose to add. Instead of trying different types of non-parametric approximations, an alternative is to derive tests based on a very general parametric model.
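Before turning to those more general alternatives, a minimal sketch of the simplest version of these diagnostics, the $TR^2$ regression of $x_t^2$ on an intercept and its own lags described above, is given below; the lag length is left to the user and nothing beyond ordinary least squares is involved.

```python
import numpy as np

def lm_arch_test(x, lags=1):
    """TR^2 version of the LM test for ARCH effects.

    Under the null of no ARCH, regress x_t^2 on an intercept and
    x_{t-1}^2, ..., x_{t-lags}^2 and compare T*R^2 with a chi^2(lags)
    critical value.
    """
    x2 = x ** 2
    y = x2[lags:]
    X = np.column_stack([np.ones(len(y))] +
                        [x2[lags - j - 1:len(x2) - j - 1] for j in range(lags)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid.var() / y.var()
    return len(y) * r2          # refer to a chi^2(lags) distribution
```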
For example, if one derives an LM test for the presence of the $j$th lag in $\sigma_t^2$ from the AGARCH model discussed earlier, it involves the regression of $x_t^2$ against unity, $x_{t-j}^2$, $|x_{t-j}|$, $x_{t-j}$ and $x_t|x_{t-j}|$, followed by a test that $T$ times the $R^2$ from this regression is zero. This result is easily established by looking at the derivatives of $\sigma_t^2$ with respect to the unknown parameters of the AGARCH model. A different perspective on such test procedures is had by recognising that the LM test is testing whether $E[\sigma_t^{-2}(\partial\sigma_t^2/\partial\theta)(\varepsilon_t^2 - 1)] = 0$, and this moment restriction holds under the null hypothesis because $(\varepsilon_t^2 - 1)$ has a zero expectation with respect to any function of past dated random variables. One such function is that selected by the LM test, viz. $\sigma_t^{-2}(\partial\sigma_t^2/\partial\theta)$. Pagan and Sabau (1992) proposed that other functions might be used, in particular those related to $\sigma_t^2$, since an estimate of this variable is available under the null hypothesis. They applied the test in a number of applications and found that the fitted models were rejected. Nelson (1989) also provides tests of this nature.

It should be noted that all of the tests described above require that $\varepsilon_t$ be N(0,1). If the density $f(\varepsilon)$ has some other form, then $\varepsilon_t^2 - 1$ is replaced by $\phi_t = -[f'(\varepsilon_t)/f(\varepsilon_t)]\varepsilon_t - 1$. However, since $f(\varepsilon)$ will typically involve some parameters that need to be estimated, e.g., the degrees of freedom parameter if $f(\varepsilon)$ is Student's t, it is not generally true that one can regress $\phi_t$ against quantities such as $\sigma_t^{-2}(\partial\sigma_t^2/\partial\theta)$ to produce the requisite test statistic, since the fact that the density parameters have been estimated means that some allowance needs to be made for this effect. To analyze this problem it is useful to note that all the tests discussed in this section examine a moment condition of the form $E[z_t\phi_t] = 0$, where $\phi_t$ is $\varepsilon_t^2 - 1$, etc., and are therefore based upon testing whether $\sum z_t\phi_t$ is zero. In order that one can ignore the fact that $\phi_t$ involves estimated parameters, it is necessary that $E[z_t(\partial\phi_t/\partial\theta)] = 0$, and this will rarely be the case. To account for this dependence one needs to follow Newey (1985) and Tauchen (1985), regressing $z_t\phi_t$ against an intercept and the scores for $\theta$, and testing if the intercept is zero. [43] The procedure is explained in Pagan and Vella (1989). Note that $\phi_t$ may also depend on estimated parameters if the conditional mean depends on ARCH parameters, as occurs with the ARCH-M models discussed later, so that a similar adjustment needs to be made for tests in that context.

[43] Typically $z_t$ will involve estimated parameters, so that the scores for these would also need to be added to the regression.

If one just considers whether the model might be GARCH or EGARCH, it is clear that the two formulations are not nested, owing to the use of the logarithmic transformation and the fact that $|\varepsilon_t| = |\sigma_t^{-1}x_t|$ and $\varepsilon_t = \sigma_t^{-1}x_t$ are the driving forces in the EGARCH model. Pagan and Schwert (1990a) compared the log likelihoods from both estimated models, and one could modify this criterion along the lines of Schwarz (1978) to reflect the extra parameters in EGARCH. Going beyond such simple comparisons, there are a number of ways that one might effect a non-nested comparison. The simplest is through "artificial nesting". To perform this, define $\sigma_{t,G}^2$ and $\sigma_{t,E}^2$ as the GARCH and EGARCH conditional variance specifications respectively, and set up the expanded model $\sigma_t^2 = \sigma_{t,G}^2 + \xi\sigma_{t,E}^2$, leading to an LM test that $\xi$ is zero (or unity). Testing if $\xi$ is zero, i.e., testing for the superiority of the GARCH model, would involve taking $T$ times the $R^2$ from the regression of the standardized GARCH errors $\hat\varepsilon_t^2$ upon unity, $\hat\sigma_{t,G}^{-2}\,\partial\hat\sigma_{t,G}^2/\partial\theta$, and $\hat\sigma_{t,G}^{-2}\hat\sigma_{t,E}^2$.
To fully account for the non-nested character of the models is much more difficult, as it is hard to evaluate what the expectation of a quantity such as the likelihood for one model would be if the other were correct. One way of doing this is by computer simulation, just as for the choice of linear and log-linear models in Pesaran and Pesaran (1993), i.e., one could estimate both models from the data, simulate $x_t$ from (say) the estimated GARCH model, compute whatever criterion is being used to compare the models, and find its empirical distribution by performing enough replications. [44] Kearns (1993) reports an application of this idea.

[44] In an interesting paper, West et al. (1993) compare different models for the conditional variance of exchange rate returns through the utility gained by an investor as the different models are used to determine portfolio allocations.

Apart from the issues of appropriate specification of the conditional variance function there has also been some concern over the stability of the conditional variance coefficients. Lamoureux and Lastrapes (1990) argued that the fact that parameter estimates seemed to indicate an IGARCH process might simply be symptomatic of instability in GARCH parameters, and they split the sample to test this hypothesis. Lee and Hansen (1991) provide a test of stability of the GARCH coefficients using Nyblom's approach (Nyblom, 1989). This is essentially an LM test of the hypothesis that the coefficients follow a random walk versus the alternative that they are constant, and involves the cumulative sums of the scores with respect to the GARCH coefficients thought to vary. The statistic used is $L^* = \sum_{t=1}^{T}(\sum_{j=1}^{t} d_{\theta,j})'\,V^{-1}(\sum_{j=1}^{t} d_{\theta,j})$, where $d_{\theta,t}$ are the scores for the GARCH parameters and $V$ is an estimate of the variance of $d_{\theta,t}$. Hansen (1990) gives tables for the distribution of $L^*$ under the null hypothesis that the coefficients are constant. Hansen (1994) also applied the test to whether the degrees of freedom parameter of the t-density varies, as a way of determining if an ARCD model is necessary. Chu (1995) also provides a test for the stability of GARCH parameters based upon the summation of the scores over sub-samples.
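A sketch of the cumulative-score statistic just described is given below. It takes as input the matrix of per-observation scores of the GARCH parameters evaluated at the MLE (obtaining those scores is model-specific and not shown). Normalization conventions differ across references, so the scaling used here is an assumption that should be checked against whichever tables, e.g. Hansen's (1990), are being used.

```python
import numpy as np

def cumulative_score_stat(scores):
    """Nyblom-type stability statistic built from cumulative sums of scores.

    `scores` is a (T x k) array whose t-th row holds the per-observation
    scores of the fitted GARCH parameters, evaluated at the MLE.
    """
    T, k = scores.shape
    V = scores.T @ scores                    # estimate of the scores' variation
    Vinv = np.linalg.inv(V)
    S = np.cumsum(scores, axis=0)            # partial sums of the scores
    quad = np.einsum("ti,ij,tj->t", S, Vinv, S)
    return quad.sum() / T                    # refer to tabulated critical values
```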
3.5. ARCH models and diffusion processes

GARCH models have been fitted to data with a wide range of frequencies, and it has been noticed that, while the GARCH effects seem to disappear the longer the sampling interval, they become more intense for very short intervals, in the sense that $\beta_1 + \alpha_1 \to 1$. Diebold (1988) showed that, if one aggregates GARCH processes over time, $y_t$ tends to resemble a normally distributed random variable as the sampling interval becomes large - see also Diebold and Lopez (1995). Nelson (1990c) and Drost and Nijman (1993) represent the most comprehensive work on the general topic of the effects of changing the sampling interval upon the nature of models for conditional volatility. Drost and Nijman focus upon the ARMA(1,1) process in (14) and study what happens to it as one aggregates over time. They distinguish between three types of GARCH processes, depending upon the nature of the errors $\varepsilon_t$ and $v_t$: strong GARCH, if $\varepsilon_t$ is i.i.d.; semi-strong GARCH, if $\varepsilon_t$ is a martingale difference; and weak GARCH, in which $E(v_t x_{t-j}^r) = 0$ for $j \ge 1$, $r = 0,1,2$. Only weak GARCH aggregates to a process of the same type. In particular, if one tries to aggregate strong GARCH, the equivalent innovations to $\varepsilon_t$ in the aggregated form will not be i.i.d. [45] Once one thinks of questions of aggregation of GARCH processes in terms of aggregation of an ARMA process it is clear why one gets some of the outcomes noted above. Consider an AR(1) for a stock variable $y_t$ with $y_t = \beta y_{t-1} + e_t$, where the $e_t$ are uncorrelated. If this is observed at intervals $mt$ ($m > 1$ being an integer), then $y_{mt} = \beta^m y_{m(t-1)} + \tilde e_{mt}$, with $\tilde e_{mt}$ being uncorrelated with $y_{m(t-1)},\dots$, leading to the results that, as $m \to \infty$, $y_t$ becomes uncorrelated, while, as $m \to 0$, $\beta^m \to 1$ and so a unit root tends to appear in the process.

[45] This suggests that it would be invalid to perform MLE on a series with the same assumptions about $\varepsilon_t$ for a number of different sampling intervals. However, based on simulation studies, Drost and Nijman conclude that "the asymptotic bias of the QMLE, if there is any, is small" (p. 922).

In the analysis above, the GARCH model is assumed to hold for some time interval and a study is made of what happens as the time intervals of observation expand or contract. Many models developed in the theoretical finance literature express the evolution of financial variables in continuous rather than discrete time, and sometimes this is very important to the solution of those models, e.g., in the Black-Scholes options pricing model. There are also some conceptual advantages to working in continuous time. For example, information about the prospects of companies continues to arrive when trading in shares is suspended, and this has implications for the modeling of volatility on a daily basis: one would expect share trading on a Monday to be different to a Tuesday, because a much longer interval of time has elapsed since the last trade, and hence there has been a longer interval for information to accumulate. Thus an investigation of the relationship between continuous and discrete time models seems important. Most of the continuous time processes used in finance are diffusions with the format

$dy_t = a\,dt + b y_t\,dt + \sigma_t y_t^{\gamma}\,dW_{1t}$   (34)
$d\log\sigma_t^2 = \alpha_0\,dt + (\beta_1 - 1)\log\sigma_t^2\,dt + c\,dW_{2t},$   (35)

where $dW_{1t}$ and $dW_{2t}$ are correlated Brownian motions. If the diffusion process had non-stochastic constant volatility, i.e., it were (34) only, it would be possible to find a form for the density function that could be used to set up the likelihood and hence produce MLEs of the unknown parameters, although numerical integration is involved to get the density in most instances. Moreover, it seems unlikely that one can assume that volatility is non-stochastic with financial data, so that other ways of estimating the parameters need to be explored. There are many ways to approximate stochastic differential equations - see Kloeden and Platen (1992) - of which the simplest is the Euler scheme, which discretizes the equation using an interval of length $h$ ($h \le 1$) to get

$\Delta y_{th} = ah + bh\,y_{(t-1)h} + \sigma_{th} y_{(t-1)h}^{\gamma} h^{1/2}\varepsilon_t$   (36)
$\Delta\log\sigma_{th}^2 = \alpha_0 h + (\beta_1 - 1)h\log\sigma_{(t-1)h}^2 + c h^{1/2}\eta_t,$

where $\varepsilon_t$ and $\eta_t$ are n.i.d.(0,1) random variables.
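A minimal sketch of the Euler scheme (36) is given below: the diffusion (34)-(35) is simulated on a fine grid of width $h$ and then sampled every $1/h$-th point to mimic discretely observed data, as in the indirect-estimation applications cited below. All parameter values, the correlation between the two Brownian motions and the crude positivity guard are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def euler_simulate(a, b, alpha0, beta1, c, gamma, rho, y0, logsig2_0, T, h):
    """Euler discretization of the diffusion (34)-(35) on a grid of width h."""
    n = int(T / h)
    y = np.empty(n + 1); logsig2 = np.empty(n + 1)
    y[0], logsig2[0] = y0, logsig2_0
    for t in range(n):
        e = rng.standard_normal()
        eta = rho * e + np.sqrt(1 - rho ** 2) * rng.standard_normal()  # correlated shocks
        sig = np.exp(0.5 * logsig2[t])
        y[t + 1] = y[t] + a * h + b * y[t] * h + sig * (y[t] ** gamma) * np.sqrt(h) * e
        y[t + 1] = max(y[t + 1], 1e-8)       # crude guard so y**gamma stays defined
        logsig2[t + 1] = logsig2[t] + alpha0 * h + (beta1 - 1) * logsig2[t] * h \
                         + c * np.sqrt(h) * eta
    return y, logsig2

# Simulate at a fine grid (h = 1/10) and keep every 10th point, so the retained
# series mimics data observed at unit intervals; parameter values are assumed.
y, ls2 = euler_simulate(a=0.1, b=-0.02, alpha0=-0.12, beta1=0.98, c=0.2,
                        gamma=0.5, rho=-0.3, y0=5.0, logsig2_0=-6.0, T=2000, h=0.1)
y_obs = y[::10]
```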
As $h \to 0$ this approximation should tend to the continuous time one. Of course, data on $y_{th}$ are not observable unless $h = 1$, and setting $h = 1$ produces a very coarse approximation that can result in substantial biases in the estimators of the parameters of the diffusion. However, as Duffie and Singleton (1993) pointed out, one can simulate from (34) and (35) by an approximation such as (36), and thereby compute the moments of the continuous processes, which in turn may be used to match the moments of the data via a GMM estimator. [46] Alternatively, one could use indirect estimation methods, simulating from the continuous time process but estimating with a discrete time auxiliary model. Gouriéroux et al. (1993) use the Euler approximation with $h = 1$ as the auxiliary model and $h = 1/10$ to simulate the data, while Engle and Lee (1994) use GARCH as the auxiliary model. [47]

[46] One might wish to use as accurate an approximation as possible; second order schemes such as the Milstein (1974) approximation might therefore be preferred.

[47] There are many other questions focussing on the relation between GARCH and continuous time processes that have been examined in the literature. Nelson (1990c) shows how to formulate a continuous time SV model that is the limit of a GARCH process. Also of interest has been the question of whether a GARCH process might be used to consistently estimate continuous time volatility. Because GARCH models average $\varepsilon_{t-j}^2$, Nelson and Foster (1992) interpreted them as a filter and showed that the answer is in the affirmative. A good summary of this literature is available in Bollerslev et al. (1994).

An interesting question that arises when one attempts to estimate models closer to the finance literature is the treatment of the parameter $\gamma$. Most investigators have set this to zero, although the literature on the term structure frequently works with models in which $\gamma = 0.5$; see Cox et al. (1985). If $\gamma \ne 0$, there is a "levels" effect on volatility, over and above whatever specification is adopted for $\sigma_t^2$. There is some interest therefore in estimating $\gamma$. Chan et al. (1992) use (36) with $h = 1$ and $\sigma_t = \sigma$, since then $E(u_t - \sigma^2 y_{t-1}^{2\gamma}) = 0$, where $u_t = (\Delta y_t - a - b y_{t-1})^2$, provides a moment condition to estimate $\gamma$. Brenner et al. (1994) generalize this to allow $\sigma_t^2$ to be a GARCH model, since it is possible for the levels effect on volatility to simply reflect mis-specification in $\sigma_t^2$. They find strong evidence of a levels effect. A similar conclusion is reached by Koedijk et al. (1993), who have $\sigma_t^2$ driven by $\varepsilon_{t-1}^2$ rather than $\sigma_{t-1}^2\varepsilon_{t-1}^2$. Pagan et al. (1995) and Broze et al. (1993) use indirect estimation to estimate $\gamma$ keeping $\sigma_t^2$ constant. In the first paper the auxiliary model was either the discrete Euler approximation with $h = 1$ or an EGARCH(1,1) model; in either case the value of $\gamma$ was around 0.7 for short-term interest rates. [48]

[48] Some of the volatility may be due to jump processes driving the diffusion. Das (1993) estimates such a combined model for bond yields.

3.6. Are means and variances related? The GARCH-M model

When modeling returns it is conventional to take these as the excess over some risk free rate, so that it is really a risk premium that is being explained. Similarly, the difference between spot and forward rates is a risk premium. Theoretical models, such as Merton (1973), make the market return a function of volatility, i.e., the risk premium should be larger when the asset return is more volatile.
Of course, for an agent making decisions at a point in time $t$, the appropriate concept of volatility is what the conditional variance of the asset return, $\sigma_t^2$, would be over the holding period for the asset, leading to the relation

$x_t = \delta g(\sigma_t^2) + e_t.$   (37)

In Merton's model, $g(\sigma_t^2) = \beta\sigma_t^2$, $\beta > 0$, but other functional forms for $g(\cdot)$ could emerge, possibly allowing the response to depend upon the sign and level of $\sigma_t^2$. Higher order conditional moments might also enter into the relation in certain types of utility maximizing models, although, in an interesting experiment pricing stocks in an economy with a stochastic dividend process, Gennotte and Marsh (1992) found that $g$ was linear.

Models such as (37) feature a conditional mean for returns that depends on higher order conditional moments. Early work on this topic, e.g., Pindyck (1984), made $\sigma_t^2$ a distributed lag of the squares of returns, i.e., $\sigma_t^2 = \sum_{k=0}^{K} w_k x_{t-k}^2$, and proceeded to regress returns on this constructed variable. [49] Many criticisms can be made of this practice - see Pagan and Ullah (1988) - but an important one is that the assumption implies a very arbitrary model for the conditional variance. This leads to the idea of expressing $\sigma_t^2$ as some parametric model in the GARCH class, and Engle et al. (1987) accomplished this with their ARCH in mean (ARCH-M) model, wherein (37) was augmented by an ARCH model for $\sigma_t^2$ such as

$\sigma_t^2 = \alpha_0 + \alpha_1 e_{t-1}^2.$   (38)

Maximum likelihood was used to estimate the unknown parameters (once a form for $g(\cdot)$ was specified). Assuming that $e_t$ is conditionally N(0,$\sigma_t^2$), it is easily seen that $x_t$ is conditionally normal with mean $\delta g(\alpha_0 + \alpha_1 e_{t-1}^2)$ and variance $\sigma_t^2$, allowing a likelihood to be easily written down and maximized. Extensions to allow $\sigma_t^2$ to follow GARCH or EGARCH forms are quite straightforward.

[49] More precisely, a weighted average of $(x_t - \bar x_t)^2$ was employed, where $\bar x_t$ was a rolling sample average.

Analysis of ARCH-M models is much more complex than was true of pure ARCH models. In the latter, whatever variables impinged upon the conditional mean could be regressed out, and the residuals from such a regression, $\hat e_t$, might then be used for diagnostic checking and specification; previously, the autocorrelation function of the squares of the residuals was a primary device for specification analysis. Now, however, it is impossible to estimate $e_t$ without first specifying a valid model for $\sigma_t^2$ (and $g(\cdot)$), so that pre-estimation specification analysis is very difficult, making post-estimation investigation very important. It was for ARCH-M models that the specification tests set out by Pagan and Sabau (1992) were developed. Since estimation is by MLE, specification tests involve setting up some moment restriction that is true under the null, $E(m_t(\theta)) = 0$, regressing $m_t(\hat\theta)$ against unity and the scores for $\theta$, and then testing if the intercept in such a regression is zero.
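For concreteness, a sketch of the likelihood calculation for the simplest ARCH(1)-M case, with $g(\sigma_t^2) = \sigma_t^2$ as in (39) below, is given here; the recursion for $e_t$ and $\sigma_t^2$ follows directly from (37)-(38), and the starting values in the commented call are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def negloglik_archm(params, x):
    """Negative log likelihood of an ARCH(1)-M model with g(s2) = s2:
    x_t = delta*sigma2_t + e_t,  sigma2_t = alpha0 + alpha1*e_{t-1}^2."""
    delta, alpha0, alpha1 = params
    if alpha0 <= 0 or alpha1 < 0:
        return np.inf
    e_prev = 0.0
    nll = 0.0
    for t in range(len(x)):
        sig2 = alpha0 + alpha1 * e_prev ** 2
        e = x[t] - delta * sig2                 # residual after the in-mean effect
        nll += 0.5 * (np.log(2 * np.pi * sig2) + e ** 2 / sig2)
        e_prev = e
    return nll

# With x a vector of excess returns, the MLE could be obtained by, e.g.,
# res = minimize(negloglik_archm, x0=[0.1, 0.8 * np.var(x), 0.2], args=(x,),
#                method="Nelder-Mead")
```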
) was the square root function then the regressor would be ~r, and one might weight the data with this term, as was done in the proofs for IGARCH, but then difficulties would arise in estimating an intercept in the equation. A theoretical analysis of this problem is important, given the prevalence of the GARCH-M model in the analysis of risk premia - see Bollerslev et ai. (1992) for a list of applications. Recently, Lee (1992) has established sufficient conditions for the asymptotic normality of the MLE of the GARCH-M (1,1) model when g(cr~) = (rt. An important condition is that E[([31 - o~ l~Et//[~l --[- o£ IE~ I,~-t_ 1) ] < 1. When et is n.i.d. (0,1) this becomes an unconditional expectation and, with ot z + 13~--1, the condition effectively implies that 8 has to be in the range - 3 < ~ < 3. An alternative approach that can sometimes pay dividends is to engage in a non-parametric analysis, estimating Et_l(X t) and cr~2 in this way rather than through parametric models. Pagan and Hong (1991) had some success with this procedure for the series on the excess yield on Treasury Bills used in Engle et al. (1987), but very little for the return to equity on the NYSE. In fact, the conditional variance of the latter was well estimated, and displayed the typical skewed response to the sign of returns mentioned earlier, but no relation was found between the conditional mean and variance of returns. Indeed, I think that this is a fair summary of the outcome of parametric investigations as well. In a marked departure from the consensus of the literature, Linton (1992), using semi-parametric techniques, finds that the g ( . ) function in (37) should not be linear, and that taking account of the non-linearity is crucial for finding an important impact of (r,2 upon the excess holding yield. A. Pagan~Journal of Empirical Finance 3 (1996) 15-102 67 A note of caution needs to be sounded regarding the use of non-parametrics in this context, as the presence of cr) in (37) means that E t_ i(xt) depends upon all past returns, even if ix) is just an A R C H ( l ) model. To see this let g(ty)) -- cr2, 2 i, so that (37) becomes and (r~ -- a 0 + ot i e ,x, = 8(or 0 + txle~_ l) + e, (39) By continued iteration it is evident that all past x t appear in the determination of x r In fact, the dependence is a very non-linear one, and conditions for x, to be covariance stationary have yet to be worked out. From this simple model it is apparent that A R C H - M models imply that the conditional mean of x t depends on an infinite number of conditioning elements, whereas the methods of non-parametric analysis are restricted to a finite number. Hence, a non-parametric analysis could not exactly replicate an A R C H - M model structure, and the quality of its approximation will be a function of the magnitude of coefficients such as ~xj and 131. 4. Statistical representations of multivariate data In an earlier section we described the properties of univariate data and statistical models that were useful descriptions of such data. However, financial analysis is largely concerned with the relationship between series, e.g., options prices and volatility, rates of return on different assets, forcing a proper analysis of the multivariate characteristics of data. By far the most important summary measures of how series relate to one another have been those concerned with the number of factors that are common to the data, and we will organize this section along those lines. 4.1. 
4.1. The common trend factor

Given that financial series are frequently I(1), determining whether series are co-integrated is a first step in much financial analysis. For example, as seen in Baillie (1989), the expectations theory of the term structure for $n$ maturities would imply that there are $(n-1)$ co-integrating vectors among these bond yields. There are a variety of methods for testing for the presence of co-integration and the number of co-integrating vectors. Johansen's technique is perhaps the one most in use, as it has been programmed into many microcomputer packages (Johansen, 1988). Testing whether the one month, two month, three month, four month and five year zero coupon bond yields in the McCulloch data set are co-integrated, Johansen's maximal eigenvalue test statistic values were (with 5% critical values in brackets): 72.89 (34.4) for r = 0 vs. r = 1; 51.94 (28.13) for r = 1 vs. r = 2; 39.89 (22.00) for r = 2 vs. r = 3; 31.28 (15.67) for r = 3 vs. r = 4; and 2.90 (9.24) for r = 4 vs. r = 5. Consequently, one could be comfortable with the conclusion that there are indeed four co-integrating vectors within this data.

However, in many applications the co-integrating vectors are predicted to take certain values by theory, so that one might want to impose these and then test whether the series formed from such linear combinations are integrated or not. An example is the term structure case, which predicts that any two yields are linked by a co-integrating vector [1 −1]. Thus, we could form the vector of series such as the yield differential between yields at all maturities and one-period yields, and then test if there are unit roots in these series. We might in fact just run a sequence of unit root tests, such as were discussed earlier, for each of these series separately, rather than explicitly looking at it in a multivariate framework. In any case, the techniques for testing for co-integration, and the rationale for them, require a separate treatment.

It is probably worth mentioning that care has to be taken when estimating co-integrating vectors, if more than one is expected, as their lack of uniqueness means that reduced form estimators such as Johansen's may produce linear combinations of the vectors rather than the vectors themselves. To illustrate this, Johansen's estimator was applied to a data set consisting of one month, 10 year and 20 year bond yields from the McCulloch data set. The expected co-integrating vectors were [1 −1 0] and [0 1 −1], i.e., the interest differentials are I(0), but the actual estimates found (by MFIT386) were [−1 2.8 −2.0] and [−1 −16.3 17.5]. Approximating these by [−1 3 −2] and [0 −1 1], it is seen that the estimates are the linear combination −1 × [1 −1 0] − 2 × [0 −1 1]. In this instance it is easily seen how to unscramble the final estimates to get back to a set of basic co-integrating vectors, but in those cases where there are unknown elements in the vectors that would not be possible. [50] For example, if one were estimating a demand elasticity for financial asset holdings using observations (say) on interest rates and an aggregate portfolio, there is a risk that the estimates found could actually be linear combinations of the demand and supply elasticities. The solution to such problems has to be to formulate and estimate structural models and not to attempt to recover structural parameters from a reduced form.

[50] For the McCulloch data the hypothesis that the co-integrating errors are the spreads is rejected. Pagan et al. (1995) discuss what might be the cause of this outcome. Possible reasons are non-linearities in the underlying processes, the presence of levels effects in volatility, and the effect of having near-integrated rather than strictly integrated series.
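As an illustration of how such calculations are carried out in practice, the sketch below applies the Johansen procedure using the coint_johansen routine from the Python statsmodels package; the data file, lag length and deterministic specification are placeholders and assumptions rather than those underlying the numbers quoted above.

```python
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

# `yields` is assumed to be a (T x 5) array of zero-coupon bond yields.
yields = np.loadtxt("yields.csv", delimiter=",")   # placeholder data source

# det_order=0 includes a constant; k_ar_diff is the number of lagged
# differences in the underlying VECM (both choices are assumptions).
res = coint_johansen(yields, det_order=0, k_ar_diff=2)

print("max-eigenvalue statistics:", res.lr2)    # one statistic per r = 0, 1, ...
print("5% critical values:      ", res.cvm[:, 1])
print("trace statistics:        ", res.lr1)
print("estimated cointegrating vectors (columns):")
print(res.evec)
```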
Because of the emphasis upon factors in financial analysis, in most instances it is useful to re-formulate co-integrating relations as the common trends idea of Stock and Watson (1988). Let $y_t$ be an $n \times 1$ vector of I(1) variables with $r$ co-integrating vectors. Then

$\Delta y_t = C(L)e_t$   (41)

would be an MA representation of the I(0) variable $\Delta y_t$, and this may be written as

$\Delta y_t = C(1)e_t + C^*(L)\Delta e_t,$   (42)

using the result that $C(L) = C(1) + \Delta C^*(L)$ ($C^*(L)$ is actually defined by this relation). If there are $r$ co-integrating relations among the $y_t$, defining a vector of stochastic trends $\tau_t$ of dimension $(n-r)$ which evolve as

$\tau_t = \tau_{t-1} + \phi_t,$   (43)

where $\tau_0 = 0$, we get the "common trends" representation

$y_t = J\tau_t + C^*(L)e_t.$   (44)

In (44) $y_t$ is driven by $(n-r)$ stochastic trends, and $\phi_t$ in (43) is white noise. [51] If there were no co-integrating vectors ($r = 0$) there would be $n$ trends driving the $n$ variables, i.e., there are no common trends and each variable exhibits separate trending behavior. For the term structure case $r = n-1$ and there should be a single common trend that drives all interest rates; for a small country this is probably the world interest rate. Many studies have been performed on the number and nature of common trends in various asset price series, e.g., Baillie and Bollerslev (1989) for exchange rates and Kasa (1992) for stock prices.

[51] We have suppressed the dependence of $y_t$ upon $y_0$.

4.2. Common factors in detrended series

It is possible that there may be common factors in series after these have been detrended in some way. There are two ways in which these common factors might be manifest. In the first, cov($u_t$), $u_t = C^*(L)e_t$, has full rank but with restrictions upon it induced by a factor structure, while the second features the case in which $u_t = C^*(L)e_t$ has a rank-deficient covariance matrix that can be further decomposed into $\tilde J\xi_t$, where $\rho(\mathrm{cov}(u_t)) = \rho(\mathrm{cov}(\xi_t)) = \rho(\tilde J) < n$, and $\rho(\cdot)$ denotes the rank of the matrix in brackets. This latter situation leads to what Vahid and Engle (1993) have called the "common-trend, common-cycle" representation

$y_t = J\tau_t + \tilde J\xi_t,$   (45)

and, cast into our taxonomy, the series can be regarded as being composed of two types of factors, the "common trends", $\tau_t$, and the non-trend factors, $\xi_t$. [52] This type of representation is very popular in the term structure literature, e.g., Cox et al. (1985), Chen and Scott (1993) and Duffie and Kan (1993). Two different views of the non-trend factors can now be elicited, depending upon how $y_t$ is "detrended" - the first concentrates upon returns $\Delta y_t$, while the second deals with the co-integrating errors $\tilde\xi_t = \alpha' y_t$, where $\alpha'$ is the $(r \times n)$ matrix of co-integrating vectors. The latter will generally be the "spreads" between asset prices.

[52] Although we have used the terminology of Vahid and Engle, the situation we are interested in is different to that of those authors. The var($u_t$) could be rank-deficient because either all elements in $C^*(L)$ or var($e_t$) are singular. Our focus is upon the latter; Vahid and Engle concentrate upon the former.
4.2.1. Factors in returns

The most famous factor model of returns $r_t = \Delta y_t$ is of course the "market model", in which $r_t$ is a vector of returns. [53] Suppose that the objective is to explain the return on the $j$th portfolio of assets, $r_{jt}$. If there is only one asset per portfolio this will simply be the return on that asset. "Portfolios" can be interpreted in diverse ways; for example, Harvey (1991) has $r_{jt}$ as the return on equity in the $j$th country. In the market model $r_{jt}$ is related to the return on the market or aggregate portfolio, $r_{mt}$. The latter may be a simple average of the returns to all stocks in the economy, or perhaps a weighted average, with weights depending on the value of the portfolio accounted for by each stock. A popular way to derive the factor structure is to impose the restriction that $r_{jt}$ and $r_{mt}$ are multivariate normal, from which it follows that

$E(r_{jt}\mid r_{mt}) = \alpha_j + \beta_j r_{mt},$   (46)

where $\beta_j = \mathrm{cov}(r_{jt}, r_{mt})/\mathrm{var}(r_{mt})$ and $\alpha_j = E(r_{jt}) - \beta_j E(r_{mt})$. Therefore,

$r_{jt} = \alpha_j + \beta_j r_{mt} + e_{jt}.$   (47)

Hence, $r_{jt}$ could be predicted once $r_{mt}$, $\alpha_j$ and $\beta_j$ are known. The object of interest therefore is the determination of $\beta_j$, as that parameter summarizes the relationship of the return on the $j$th asset to that of the market portfolio. Traditionally $\beta_j$ has been estimated by regression, meaning that the range of problems encountered with any linear regression model recurs here as well. For example, outliers in the data can affect the point estimate of $\beta_j$, which encourages robust estimation of the parameters, while the heteroskedasticity present in returns can also be present in the errors of (47), thereby demanding the computation of heteroskedasticity-robust standard errors in order to make valid inferences. Some have argued that a linear relationship might be inappropriate. Tracing the argument back to the source of linearity, however, the question becomes one concerning the underlying multivariate density. Linearity is associated with normality but not uniquely so; for example, Spanos (1986, pp. 122-125) points out that the Student's t-density yields a linear conditional expectation, while densities such as the bivariate Pareto result in non-linear conditional means. Moreover, models formed from densities such as the bivariate t also have implications for conditional variances. Non-parametric estimation methods can be usefully employed here to shed light on the conditional moments of an asset return given the market return.

[53] This is an example of a factor model in which cov($u_t$) is not singular.
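A minimal sketch of the traditional estimation of (47) is given below, with White-type heteroskedasticity-robust standard errors of the kind just mentioned; the variable names are illustrative and the returns are assumed to be held in NumPy arrays.

```python
import statsmodels.api as sm

def market_model_beta(r_j, r_m):
    """OLS estimate of (alpha_j, beta_j) in (47) with heteroskedasticity-robust
    (White/HC1) standard errors; r_j and r_m are return vectors."""
    X = sm.add_constant(r_m)
    fit = sm.OLS(r_j, X).fit(cov_type="HC1")
    return fit.params, fit.bse      # [alpha_j, beta_j] and their standard errors
```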
It is possible to produce multiple factors by concentrating upon the nature of $\beta_j$ in (47). There it was a constant equal to the ratio of cov($r_{jt}, r_{mt}$) to the variance of $r_{mt}$. However, in line with the distinction between conditional and unconditional moments, one might wish to consider models for $r_{jt}$ in which the conditional rather than the unconditional density of returns is utilized. To this end let $F_{t-1}$ be a set of conditioning variables, not including $r_{jt}$ and $r_{mt}$ but containing their past histories. Then the conditional capital asset pricing model has

$E[r_{jt}\mid F_{t-1}] = \beta_{jt} E[r_{mt}\mid F_{t-1}],$   (48)

where $\beta_{jt} = \mathrm{cov}(r_{jt}, r_{mt}\mid F_{t-1})/\mathrm{var}(r_{mt}\mid F_{t-1})$. Because the coefficients of the conditional market model are functions of the conditional moments of $r_{jt}$ and $r_{mt}$, it is necessary to model these in some way. Assuming that $r_{mt} = \sum_{j=1}^{K} r_{jt} = \iota' r_t$, where $r_t$ is the $(K \times 1)$ vector of returns $(r_{1t},\dots,r_{Kt})$, it is clear that $\mathrm{var}(r_{mt}\mid F_{t-1}) = \mathrm{var}(\iota' r_t\mid F_{t-1}) = \iota'\mathrm{cov}(r_t\mid F_{t-1})\iota$, while $\mathrm{cov}(r_{jt}, r_{mt}\mid F_{t-1})$ is the $j$th element of $\mathrm{cov}(r_t\mid F_{t-1})\iota$, showing that the conditional moments appearing in (48) are determined by the conditional means and variances of the returns $r_{jt}$, so that multiple factors are involved, albeit in a non-linear fashion. This is also true of the literature on optimal hedging, where the conditional variance of the optimally hedged portfolio is minimized by choosing as weights the ratio of the conditional covariance of spot and futures prices to the conditional variance of the futures price. Modeling these constituent moments was the concern of earlier sections. A natural way to proceed is to allow the conditional mean for $r_{jt}$ to depend on its conditional variance, as in GARCH-M models, and to then model cov($r_t\mid F_{t-1}$) by a multivariate GARCH or factor GARCH structure described later. Bollerslev et al. (1988) estimate such a model for stocks, bonds and bills. Estimation is just as for the GARCH-M model described earlier. A number of other applications of this model to financial data have been made and are described in Bollerslev et al. (1992). Baillie and Myers (1991) apply the GARCH framework to the determination of optimal hedge ratios.

Although the linearity of (48) seems to be intimately connected with a conditional normal density for returns, models have been suggested that take the linearity as a prior constraint and then proceed to specify other models for the conditional moments than those in the GARCH class. Harvey (1991) makes $E_t(r_{jt}) = z_t'\delta_j$ and $E_t(r_{mt}) = z_t'\delta_m$, where $z_t$ are elements in $F_{t-1}$, so that $e_{jt} = r_{jt} - z_t'\delta_j$ and $e_{mt} = r_{mt} - z_t'\delta_m$, allowing (48) to be written as

$z_t'\delta_j = (z_t'\delta_m)\, E[e_{jt}e_{mt}\mid z_t]/E[e_{mt}^2\mid z_t],$   (49)

from which

$E[e_{mt}^2\, z_t'\delta_j\mid z_t] - E[e_{jt}e_{mt}\, z_t'\delta_m\mid z_t] = 0.$   (50)

By definition $E[z_t e_{mt}] = E[z_t e_{jt}] = 0$, and these two moment conditions, along with (50), may be used to estimate $\delta_j$ and $\delta_m$ by GMM. No estimate of $\beta_{jt}$ can be made without being more specific about the conditional moments, but it is possible to test various hypotheses about the conditional CAPM using the GMM estimates, as well as to assess how adequate an explanation it provides of the data.

There are many variants of this idea of estimating $\beta$ from its conditioning set. For example, Schwert and Seguin's (1990) single index model of the conditional covariance between the $i$th and $j$th portfolios (see (63)) implies a covariance between the $j$th portfolio and the market portfolio that is a linear function of the market conditional variance, so that $\beta_{jt}$ can be estimated quite simply from their model. Braun et al. (1992) set up a model, initially based on (48), treating $r_{mt}$ and $r_{jt}$ as coming from a recursive bivariate EGARCH process in which the innovations driving $\sigma_{mt}^2$ are "aggregate shocks", $z_{mt}$, while $\sigma_{jt}^2$ depends on specific shocks to the $j$th portfolio, $z_{jt}$, as well as on the market shocks.
Subsequently, however, they adopt an ad hoc specification of $\beta_{jt}$ as in (51) (the definition of the $z$'s is given below (15)):

$\beta_{jt} = \lambda_0 + \lambda_4(\beta_{j,t-1} - \lambda_0) + \lambda_1 z_{m,t-1} z_{j,t-1} + \lambda_2 z_{m,t-1} + \lambda_3 z_{j,t-1},$   (51)

where the last two terms allow for leverage effects on conditional betas; if both are negative, conditional betas rise when returns fall. The distinctive feature of the approach just mentioned is the formulation of the conditional beta as an autoregression. Representing $\beta_{jt}$ as an autoregression seems to have been successful in practice (Rosenberg, 1985), and one might think of it as an approximation to whatever the true process for $\beta_{jt}$ is, hoping that by choosing the order of the autoregression large enough a good approximation would be achieved. Rosenberg formulates his model as

$r_{jt} = \mu + \beta_{jt} r_{mt} + e_{jt}$   (52)
$\beta_{jt} = \rho_j \beta_{j,t-1} + v_{jt},$   (53)

and estimates the unknown parameters ($\rho_j$ and the variances of $e_{jt}$ and $v_{jt}$) by varying coefficient regression techniques. Generally this is done by maximum likelihood. Defining $F_t$ as a conditioning set based on past data and $r_{mt}$, if $r_{jt}$ is N($\mu_t,\sigma_t^2$) conditional on $F_t$, the log likelihood of $r_{j1},\dots,r_{jT}$ is (ignoring terms that are asymptotically negligible)

$-\tfrac12\sum\log\sigma_t^2 - \tfrac12\sum\sigma_t^{-2}(r_{jt} - \mu_t)^2.$   (54)

Generally, $\mu_t$ and $\sigma_t^2$ can be computed by the Kalman filter for any given values of the unknown parameters, and so (54) can be maximized. There is obviously some difficulty in allowing $r_{mt}$ to appear in the information set, as it is a linear combination of all the portfolios, although for a large enough number of assets, and small portfolios, we might think of the market as solely reflecting the macro shock and it is therefore this that is being conditioned upon; the error term $e_{jt}$ then just reflects the micro shocks.
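A sketch of the Kalman filter recursions needed to evaluate (54) for the model (52)-(53) is given below, treating $\rho_j$ and the two variances as known; in practice they would be chosen by maximizing the returned log likelihood. The initial state and its variance are assumptions.

```python
import numpy as np

def kalman_tv_beta(r_j, r_m, mu, rho, sig2_e, sig2_v, beta0=1.0, P0=1.0):
    """Kalman filter for the varying-coefficient model (52)-(53):
    r_jt = mu + beta_t * r_mt + e_t,   beta_t = rho * beta_{t-1} + v_t.
    Returns filtered betas and the Gaussian log likelihood (54)."""
    beta, P = beta0, P0
    loglik = 0.0
    betas = []
    for y, x in zip(r_j, r_m):
        # prediction step
        beta_pred = rho * beta
        P_pred = rho ** 2 * P + sig2_v
        # one-step-ahead moments of r_jt given the conditioning set and r_mt
        mu_t = mu + beta_pred * x
        sig2_t = x ** 2 * P_pred + sig2_e
        loglik += -0.5 * (np.log(2 * np.pi * sig2_t) + (y - mu_t) ** 2 / sig2_t)
        # updating step
        gain = P_pred * x / sig2_t
        beta = beta_pred + gain * (y - mu_t)
        P = (1.0 - gain * x) * P_pred
        betas.append(beta)
    return np.array(betas), loglik
```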
4.2.2. Factors in spreads

The dichotomy in the ways that factors manifest themselves is most important when the detrending is done via the construction of "spreads". Mahieu and Schotman (1994b) consider a model of bilateral exchange rates in which the changes in the rates reflect differences in the news in each country. Taking a system of bilateral rates, it is clear that there is a common factor - the news occurring in the country whose currency is being used to express the rates. Thus, if we consider the yen/$, mark/$ and franc/$ rates, news about the U.S. will represent the common factor, and this imposes a restriction upon the covariance matrix of the returns which can be used to estimate the characteristics of the factors. Factors giving rise to a singularity in cov($u_t$) need to be given a separate treatment. A slightly different interpretation of the factors $\xi_t$ in (45) is available by multiplying (45) by the transposed matrix of co-integrating vectors, $\alpha'$, and imposing $\alpha'J = 0$, to get $\tilde\xi_t = \alpha'y_t = \alpha'\tilde J\xi_t$. Hence, $\rho(\mathrm{cov}(\tilde\xi_t)) = \min(\rho(\alpha'\tilde J), \rho(\mathrm{cov}(\xi_t))) = \min(r, \dim(\xi_t))$. Since the total number of factors must be less than $n$, $\dim(\xi_t) \le n - (n-r) = r$, and it is apparent that $\rho(\mathrm{cov}(\tilde\xi_t)) = \rho(\mathrm{cov}(\xi_t))$, i.e., the common factors $\xi_t$ show up in the cointegrating errors $\tilde\xi_t$. Generally, the co-integrating errors have a clear interpretation in financial data as "spreads", e.g., in the case of the term structure they are the spreads between yields of different maturities, while in relations between a spot and a forward market they would be the forward premium. Consequently, this result directs attention to the spreads in the search for common factors. [54]

[54] Vahid and Engle (1993) noted that, if there were common cycles in $y_t$, there exists a $\delta$ such that $\delta'\tilde J = 0$, meaning that $\delta'\Delta y_t = \delta'J\phi_t$ would be white noise, and a test for common cycles could be regarded as a test for serial correlation in the residuals $\hat\delta'\Delta y_t$, where $\hat\delta$ is an estimator of $\delta$. Obviously, determining $\rho(\mathrm{cov}(\tilde\xi_t))$ represents another way of finding the number of common cycles. There may be advantages to it, as this method does not require the use of instrumental variables as their test does.

Spreads between asset prices have been extensively studied in recent years. Dybvig (1989) applies principal components to $\tilde\xi_t$ to conclude that the number of factors in the term structure is much lower than the number of rates. Applying principal components to the data set composed of sp2, sp3, sp4, sp5, sp6 and sp9, where spj is the spread between the j-month and one month yields, the eigenvalues of the covariance matrix of these variables are 1.8, 0.05, 0.005, 0.002, 7.6 × 10^-5 and 1.3 × 10^-5, pointing to the fact that these six spreads can be summarized very well by two components (at most). [55] Restricting attention to sp3, sp6 and sp9, the eigenvalues are 1.2, 0.03 and 0.002, telling the same story. Knez et al. (1989) estimate a factor model for the excess yields over the repo rate by maximum likelihood. Other factor models, based on the Cox-Ingersoll-Ross model of the term structure, have been estimated by Chen and Scott (1993) and Pearson and Sun (1994).

[55] It is interesting to observe that the covariance matrix of bilateral returns given in Mahieu and Schotman (1994b, p. 282) does not reveal any singularity, emphasizing the need to distinguish between the two impacts a factor structure might have.

A difficulty in directly applying principal component methods is that $\tilde\xi_t$ will rarely be i.i.d., and some allowance should be made for that fact. To describe the autocorrelation structure of $\tilde\xi_t$, return to the levels $y_t$ and observe that the presence of co-integration among the series means that they follow a vector ECM (taking a first order system for simplicity), $\Delta y_t = \gamma\tilde\xi_{t-1} + v_t$, so that

$\alpha'\Delta y_t = \Delta\tilde\xi_t = \alpha'\gamma\tilde\xi_{t-1} + \alpha'v_t.$   (55)

It is worth dwelling on some of the implications of (55). In particular, the matrix $\gamma$ represents the influence of past spreads upon the returns $\Delta y_t$. Given the earlier evidence that it is hard to predict $\Delta y_t$ using past information, it is likely that $\gamma$ will be close to zero, and therefore $\tilde\xi_t$ will exhibit "near unit root" behavior. Strong persistence in spreads has indeed been observed - see Evans and Lewis (1994), Pagan et al. (1995) and Hejazi (1994) for the term structure and Baillie and Bollerslev (1994) and Crowder (1994) for the forward premium in exchange rates. Baillie and Bollerslev argue that the persistence seen in the premia is best represented as a fractionally integrated process rather than the I(1) structure proposed by Crowder. The same point may also apply to the term structure; Hejazi found that there was no evidence of a unit root in the excess holding yields but that, if one regressed them on a forward rate which was I(1), the resulting regression coefficient was non-zero. This suggests that the type of persistence apparent in holding yields is not of the I(1) variety.
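The principal-components calculation reported above amounts to the eigenvalue decomposition sketched below; the layout of the yield matrix and the choice of the one-month rate as the base for the spreads are assumptions.

```python
import numpy as np

def spread_eigenvalues(yields, base_col=0):
    """Eigenvalues of the covariance matrix of yield spreads.

    `yields` is a (T x n) array of zero-coupon yields; spreads are taken
    relative to the yield in `base_col` (e.g. the one-month rate).
    """
    spreads = np.delete(yields - yields[:, [base_col]], base_col, axis=1)
    eigvals = np.linalg.eigvalsh(np.cov(spreads, rowvar=False))
    return np.sort(eigvals)[::-1]   # a sharp drop after the first one or two
                                    # values suggests very few common factors
```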
4.3. Relations between financial series

If expected returns remain constant, market efficiency demands that all information has been used in pricing an asset, so that returns should be uncorrelated with information from sources other than just the past history of returns. Thus the autocorrelation function does not provide a sufficiently wide perspective to judge questions regarding efficient markets, and this has led to research demonstrating that returns are influenced by a wide variety of other series, e.g., future stock returns are explained by the dividend yield ($D_t/P_t$). [56] Much of this work involves just simple regressions, and little in the way of new econometric technique is needed.

[56] As Gennotte and Marsh (1992) point out, one should be careful in giving such correlations a causal interpretation. They simulated an artificial economy with fully rational actors, but the generated data exhibited a return-dividend correlation since the stochastic output process driving this economy affects both dividends and stock prices.

One relatively new development has been Fama and French's extension of their long horizon returns work (Fama and French, 1988b). They consider the regression of k-period returns on stocks, $r_{t+k,t}$, against the dividend yield and test whether this coefficient is zero or not. They sample data so as to avoid overlapping errors, and therefore work with traditional OLS standard errors when testing the hypothesis of no relation. This reduces the effective sample size quite significantly for large k. Hodrick (1991) points out an alternative approach to Fama and French. Because a k-period continuously compounded return is the sum of one-period returns, i.e., $r_{t+k,t} = r_{t+1} + \dots + r_{t+k}$, the regression coefficient is proportional (in large samples) to $E[\{r_{t+1} + \dots + r_{t+k}\}(D_t/P_t)]$, which is the probability limit of the numerator when $r_t$ is a stationary ergodic process. The denominator just represents a positive scaling factor, so a test of whether the parameter is zero is a test of whether the numerator is zero. But, under stationarity, this is identical to $E[r_{t+1}\{(D_t/P_t) + \dots + (D_{t-k+1}/P_{t-k+1})\}]$, so that we could interpret Fama and French's methodology as regressing $r_t$ against $\sum_{j=0}^{k-1}(D_{t-j}/P_{t-j})$ and testing whether the coefficient of the regressor is zero. This last regression has the advantage that it uses all observations and that there is no overlapping data problem. In practice Hodrick multiplies the sum of lagged dividend yields by $k^{-1}$, so that the slope coefficient measures the response of annualized expected returns over a given horizon to a change in the ex ante dividend yield. He uses heteroskedasticity and autocorrelation consistent standard errors, since there is no certainty that returns are i.i.d. under the null (and we actually know that this is extremely unlikely).

Bekaert and Hodrick (1992) mention an alternative procedure that reduces the loss of observations owing to the need to lag $(D/P)_t$ k times. They put the log of the first period return, the dividend yield, and the one month Treasury Bill yield ($rb_t$) into a vector $z_t$, and then fit a first order VAR

$z_t = A z_{t-1} + \eta_t$   (56)

to this trivariate system. The $j$th autocovariance of $z_t$ is given by $C(j) = A^j C(0)$, where $C(0)$ is the variance of $z_t$ and is a function of $A$ and var($\eta_t$). Hence the autocovariance between the log of the first period return and the $j$th lagged dividend yield is $e_1'C(j)e_2$, where $e_s$ is a (3 × 1) vector with unity in the $s$th place and zeros elsewhere.
Thereupon

$E[r_{t+1}\{(D_t/P_t) + \dots + (D_{t-k+1}/P_{t-k+1})\}] = e_1'C(1)e_2 + \dots + e_1'C(k-1)e_2 = e_1'AC(0)e_2 + e_1'A^2C(0)e_2 + \dots + e_1'A^{k-1}C(0)e_2 = e_1'A[I + A + \dots + A^{k-2}]C(0)e_2,$

and we can test whether this is zero by finding var$\{e_1'A(I + A + \dots + A^{k-2})C(0)e_2\}$. This variance can be found using the delta method. It would seem that the only advantage this method has is that it economizes on observations, since only one observation is lost in estimating the VAR, and the autocovariances are all then derived ones. For small numbers of observations, or for very large k, this may be a saving, but it is not so clear what the advantage is otherwise. Moreover, there is a potential cost if the process is not well represented by a linear VAR, in particular if there are non-linearities affecting various autocovariances.

Just as was true for univariate series, one might seek non-linear rather than linear relations between series. Suppose we want to estimate $E(y_t\mid x_t)$ without imposing any functional form upon this relation. By definition

$y_t = m(x_t) + u_t,$   (57)

where $m(x_t)$ is some unknown function of $x_t$ and $u_t$ is taken to be i.i.d. (0, $\sigma_u^2$). An example that has attracted some interest would be if $y_t$ is the price of a derivative while $x_t$ contains the price of the underlying asset ($p_t$), the strike price ($s_t$), volatility ($\sigma$), the risk free interest rate ($r_t$) and the maturity of the derivative ($\tau$). In Black and Scholes' formula $u_t = 0$ and a specific function linking these quantities is provided,

$m_{BS}(x_t) = p_t\Phi(d_1) - s_t e^{-r_t\tau}\Phi(d_2),$

where

$d_1 = \frac{\log(p_t/s_t) + (r_t + \sigma^2/2)\tau}{\sigma\sqrt{\tau}}, \qquad d_2 = d_1 - \sigma\sqrt{\tau},$

and $\Phi(\cdot)$ is the cumulative normal distribution function. However, it is known that this formula tends to break down for out-of-the-money and short-maturity options, where the degree of non-linearity seems more pronounced. Moreover, the derivation of the Black-Scholes formula assumes that volatility is constant, whereas implicit volatilities computed from options prices using the formula change over time and also seem to be a function of $p_t/s_t$ - the latter constituting the well known "smile" effect in volatility. Consequently, it is of interest to estimate $m(x_t)$ non-parametrically in an attempt to correct for these problems.

The basic idea behind non-parametric estimation of $E(y_t\mid x_t)$ is to approximate $m(x_t)$ arbitrarily closely by combining a set of basis functions $\sum_{j=1}^{J}\beta_j\varphi_j(x)$ and to then estimate the $\beta_j$, leading to $\hat m(x_t) = \sum_{j=1}^{J}\hat\beta_j\varphi_j(x_t)$. Hutchinson et al. (1994) fitted the options model above using neural networks, in which the $\varphi_j$ are logistic functions $(1 + \exp(-x_t'\delta_j))^{-1}$, to approximate $m(x_t)$. Kearns (1993) simulated data from an options pricing model with stochastic volatility and non-parametrically estimated the non-linear function implied by a particular pricing model using the flexible Fourier form of Gallant (1981), which has the $\varphi_j$ equal to 1, $x_t$, $x_t^2$, $\cos(jx_t)$ and $\sin(jx_t)$ ($j = 1,2,\dots$). After this model has been calibrated, it can be used with actual data to produce predictions of observed options prices. The prediction error may be used as a diagnostic device. The method seems to work quite well; in Kearns' case 98% of the variation in options prices was captured.
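A sketch of a flexible-Fourier-form fit of the kind used by Kearns is given below for a single standardized regressor; with several conditioning variables the basis would be built from suitable combinations of them, and the number of trigonometric terms is an assumption.

```python
import numpy as np

def fourier_design(x, n_terms=3):
    """Flexible-Fourier-form basis: 1, x, x^2, cos(jx), sin(jx), j = 1..n_terms.
    Here x is a single (standardized) regressor."""
    cols = [np.ones_like(x), x, x ** 2]
    for j in range(1, n_terms + 1):
        cols += [np.cos(j * x), np.sin(j * x)]
    return np.column_stack(cols)

def fit_fourier(y, x, n_terms=3):
    """Least-squares fit of m(x); returns coefficients and fitted values."""
    X = fourier_design(x, n_terms)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta
```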
Each of the approaches above has approximated the unknown function globally and then predicted what value it would have at given points $x_t = x$. An alternative method is to estimate $\alpha = m(x_t = x)$ by using only those observations whose value is close to $x$. Since we can think of $m(x)$ as locally a constant, the appropriate estimate is found by choosing $\hat\alpha$ to minimize $\sum_{t=1}^{T}(y_t - \alpha)^2K((x_t - x)/h)$, where $K(\cdot)$ is a kernel or weighting function that gives low weight to observations for which $x_t$ is far from $x$. The window-width parameter, $h$, determines exactly how far away the observation can be in order to be included in the computation. From standard least squares theory this gives

$$\hat m(x_t = x) = \hat\alpha = \frac{\sum_{t=1}^{T}y_tK(\psi_t)}{\sum_{t=1}^{T}K(\psi_t)}, \qquad (58)$$

where $\psi_t = (x_t - x)/h$. Diebold and Nason (1990) consider predicting exchange rates using a conditional mean estimated non-parametrically in this way, the conditioning variables being lagged values of the rate. Huang and Lin (1990) have analyzed non-linearities in the term premiums and forward rates with these methods. The Cox et al. (1985) model of the term structure implies a linear relation between expected term premiums on Treasury securities and forward interest rates. However, the linearity stems from the assumptions on the stochastic processes driving the model, which make the state variables linear Markov processes. If the forcing processes were non-linear, term premiums and forward rates would be related in a non-linear way. Huang and Lin apply kernel based methods to the data, but discover that the linearity assumption seems quite good. Pagan and Hong (1991) look at the conditional mean for excess holding yields on Treasury bills as a function of past yields and the yield differential, finding that the relation seems to be non-quadratic.

The principal disadvantages of such local analysis are that there are well known biases in the estimator of $m(x)$ and that, when $\dim(x_t)$ is large, very few observations will be used to determine each point. In recent years an attempt has been made to combine the two approaches by assuming that the function can be parametrically approximated around the point $x$ by, say, a linear polynomial, so that the optimization problem becomes one of choosing $\alpha$ and $\beta$ such that $\sum_{t=1}^{T}(y_t - \alpha - \beta(x_t - x))^2K(\psi_t)$ is minimized, e.g. see Fan et al. (1994), and it seems as if these locally parametric methods can produce big improvements in the properties of the local estimator. Of course, there is no reason to choose a polynomial in $x_t$. In fact, if one thought that the global function was likely to have a specific shape, then it would make sense to use that. Thus, for options pricing, one might use $m_{BS}(x_t - x)$, and this idea has been exploited by Bossaerts and Hillion (1994). Another important application in finance of non-parametrics has been to estimate the yield curve. McCulloch (1971) pioneered the application of spline functions to this - here $y_t$ are observed yields, $x_t$ is the maturity and the $\varphi_j$ are regression splines - while Fisher et al. (1994) use smoothing splines, which add extra constraints to penalize "roughness" in the estimated yield curve. Gouriéroux and Scaillet (1994) use the local parametric idea with the candidate parametric functions being factor models taken from the term structure literature, such as Vasicek's (Vasicek, 1977).
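A minimal sketch of the two local estimators just described - the kernel estimator in (58) and its locally linear extension - might look as follows; the Gaussian kernel and the fixed window width h are illustrative choices rather than anything prescribed by the papers cited.

```python
import numpy as np

def nadaraya_watson(x0, x, y, h):
    """Kernel regression estimate of m(x0) = E(y | x = x0) as in (58),
    using a Gaussian kernel with window width h."""
    psi = (x - x0) / h
    w = np.exp(-0.5 * psi ** 2)                 # kernel weights K(psi_t)
    return np.sum(w * y) / np.sum(w)

def local_linear(x0, x, y, h):
    """Locally linear estimate: choose (alpha, beta) to minimise
    sum_t (y_t - alpha - beta*(x_t - x0))^2 * K((x_t - x0)/h); alpha estimates m(x0)."""
    psi = (x - x0) / h
    w = np.exp(-0.5 * psi ** 2)
    X = np.column_stack([np.ones_like(x), x - x0])
    W = np.diag(w)
    coef = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return coef[0]
```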
Another way of proceeding is to model the joint density of $y_t$ and $x_t$ and to then derive the conditional moments from that. If one uses the SNP method, the joint density would be approximated as the product of a polynomial and a multivariate normal. Gallant et al. (1992) have used this approach to examine the relation between returns and the volume of trading. They show that the variance of returns conditional on both past returns and the volume of trading shows no asymmetry in returns, so that it appears that most of the asymmetry is due to the fact that the volume of trading is different in a bear than in a bull market.

4.4. Modeling multivariate volatility

It seems very likely that the conditional variance of the return on an asset would be related not only to the past history of its own return but also to those on other assets. Indeed, if one non-parametrically computes the variance of stock returns conditional upon the past history of returns as well as the volume of trading, it appears that the asymmetry between the variance and past returns noted previously disappears, emphasizing the need to look at multivariate relations between the variances of series.

One trend has been to engage in multivariate extensions of the univariate GARCH, EGARCH etc. models. Defining the conditional covariance matrix of an $n\times1$ vector of returns $x_t$ as $\Omega_t$, Bollerslev et al. (1988) defined the multivariate GARCH model to be

$$x_t = w_t'\beta + e_t, \qquad (59)$$
$$\mathrm{vech}(\Omega_t) = A_0 + \sum_{j=1}^{q} B_j\,\mathrm{vech}(\Omega_{t-j}) + \sum_{j=1}^{p} A_j\,\mathrm{vech}(e_{t-j}e_{t-j}'), \qquad (60)$$

where $e_t = \Omega_t^{1/2}\zeta_t$ and $\zeta_t$ is n.i.d. $(0, I_n)$. The log likelihood is easily written down as

$$-\frac{Tn}{2}\log 2\pi - \frac{1}{2}\sum_{t=1}^{T}\log|\Omega_t| - \frac{1}{2}\sum_{t=1}^{T}(x_t - w_t'\beta)'\Omega_t^{-1}(x_t - w_t'\beta), \qquad (61)$$

but maximizing (61) is a formidable challenge owing to the huge number of parameters entering it - from (60) alone there are $n(n+1)/2 + (p+q)n^2(n+1)^2/4$, and for (say) $n = 3$, $p = q = 1$, there are 78 GARCH parameters to be estimated. For this reason most applications have concentrated upon ways of restricting (60) in some sensible fashion to reduce the number of unknown parameters. A first suggestion was by Bollerslev et al. (1988), who set $p = q = 1$ and made the matrices $A_1$, $B_1$ diagonal, giving the simple model for the elements of $\Omega_t$,

$$\sigma_{ij,t} = \alpha_{0,ij} + \alpha_{1,ij}e_{i,t-1}e_{j,t-1} + \beta_{1,ij}\sigma_{ij,t-1}, \qquad i,j = 1,\ldots,n, \qquad (62)$$

for a total of $(p+q+1)(n(n+1)/2)$ parameters: when $n = 3$, there are 18 parameters. Later Bollerslev (1990) proposed that $\sigma_{ij,t} = \rho_{ij}\sigma_{ii,t}^{1/2}\sigma_{jj,t}^{1/2}$ $(i\neq j)$. In this model there are $n(n-1)/2$ correlation coefficients $\rho_{ij}$ as well as the parameters of the $n$ conditional variance equations to estimate. If each one of these variances was univariate GARCH(1,1), then the total number of parameters would be reduced to $3n + n(n-1)/2$, i.e., 12 if $n = 3$. Such a specification certainly looks attractive, as it does not seem unreasonable to restrict the conditional covariances to vary in line with the conditional variances. Nevertheless, there are still a very large number of parameters, and one might also wish to further reduce the number of unknowns by imposing some structure upon the $\rho_{ij}$.
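The attraction of Bollerslev's constant-correlation specification is that $\Omega_t$ can be assembled from $n$ univariate GARCH(1,1) recursions and a fixed correlation matrix. The sketch below does exactly that for given parameter values; estimating those parameters (for example by quasi-MLE) is a separate step, and the initialisation at the sample variances is an assumption of convenience.

```python
import numpy as np

def ccc_covariances(e, omega, alpha, beta, R):
    """Conditional covariance matrices under the constant-correlation model:
    each variance follows a univariate GARCH(1,1) and
    sigma_ij,t = rho_ij * sqrt(sigma_ii,t * sigma_jj,t).

    e      : (T, n) array of mean-adjusted returns
    omega, alpha, beta : length-n GARCH(1,1) parameters for each series
    R      : (n, n) constant conditional correlation matrix
    """
    T, n = e.shape
    sig2 = np.empty((T, n))
    sig2[0] = e.var(axis=0)                      # initialise at sample variances
    for t in range(1, T):
        sig2[t] = omega + alpha * e[t - 1] ** 2 + beta * sig2[t - 1]
    # Omega_t = D_t R D_t with D_t = diag(sigma_ii,t^{1/2})
    return np.array([np.diag(np.sqrt(s)) @ R @ np.diag(np.sqrt(s)) for s in sig2])
```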
Various other formulations have been tried; one which has the advantage of being both parsimonious and ensuring that the estimated $\Omega_t$ is positive definite is that given in Engle and Kroner (1995),

$$\Omega_t = C + \sum_{m=1}^{M}\sum_{j=1}^{q}B_{mj}\Omega_{t-j}B_{mj}' + \sum_{m=1}^{M}\sum_{j=1}^{p}A_{mj}e_{t-j}e_{t-j}'A_{mj}',$$

where the $B_{mj}$ and $A_{mj}$ each possess $n^2$ unknown parameters.

A different approach to reducing the number of unknown parameters is by changing the number of forcing variables, i.e., rather than operate upon $A_j$ and $B_j$, the idea is to replace $\mathrm{vech}(e_{t-j}e_{t-j}')$ by a smaller dimensional vector. Generally, this is done by concentrating upon a small number of "factors" that are thought to drive the conditional variances. One perspective on this is that it is derivative from Ross's arbitrage pricing model, in which a vector of returns $x_t$ is written as a linear combination of $K < n$ factors $f_t$, i.e., $x_t = \delta f_t + e_t$, and this idea is just being shifted to the second moment (Ross, 1976). The difficulty is to translate this idea into an operational model, since factors are generally unobserved. One strategy that does so is Diebold and Nerlove's formulation, which has $\Omega_t$ a function of a single latent factor, $f_t$, with the conditional variance of the factor, $\sigma^2_{ft}$, following a GARCH process, thereby endowing $\Omega_t$ with the structure $\Omega_t = C + \delta\delta'\sigma^2_{ft}$ (Diebold and Nerlove, 1989). Since $E(\Omega_t) = C + \delta\delta'\mathrm{var}(f_t)$, they used factor analysis on the unconditional covariance matrix of returns (they have no $w_t$'s) to determine estimates of $\delta$ and then applied these to estimate the GARCH model, producing $\hat\sigma^2_{ft}$. After the latter quantity is determined, MLE can proceed in the standard way. Harvey et al. (1992) point out that it is possible to estimate $E_{t-1}(x_t)$ and $\mathrm{var}_{t-1}(x_t)$ under this model and so the quasi-MLE can be applied. Actually, it would seem that the simplest way to estimate the model would be to perform simulated MLE, since $f_t$ can be generated for any given values of the GARCH parameters, and this would avoid the two step procedure.

Although not explicitly taking the viewpoint adopted here, one can view Schwert and Seguin's work in this way (Schwert and Seguin, 1990). In their case $x_t$ was a vector of monthly returns to different portfolios of stocks, and the common factor was taken as the market return, i.e., they set

$$\sigma_{ij,t} = \alpha_{0ij} + \alpha_{1ij}\sigma_t^2 + \alpha_{2ij}\sigma_t^{-2}, \qquad (63)$$

where $\sigma_t^2$ is the conditional variance of the market return, measured in their case by the volatility of daily returns to the S&P composite portfolio from 1928 to 1986. Thus, in this instance, the factor is rendered observable by the use of daily data, although it could also have been estimated by (say) making $\sigma_t^2$ a GARCH(1,1) and using the monthly data on the aggregate portfolio. Schwert and Seguin also proposed that $\sigma_{ij,t}$ be replaced by $\rho_{ij}\sigma_{ii,t}^{1/2}\sigma_{jj,t}^{1/2}$, i.e., the equivalent of what Bollerslev does in the general GARCH model. One can estimate (63) by OLS, replacing $\sigma_{ij,t}$ by $x_{it}x_{jt}$, although one needs to correct for heteroskedasticity in the errors when making inferences about the $\alpha$'s. 57 They find that small firm portfolios are four times more sensitive to market volatility changes than are large firm portfolios.

57 $x_{it}x_{jt}$ is replaced by the residuals $\hat e_{it}\hat e_{jt}$ if there are variables $w_t$ determining $x_t$.
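Estimating a relation like (63) by OLS with a heteroskedasticity-robust covariance matrix is straightforward. A sketch is given below, in which the cross product of returns proxies the conditional covariance and only the market variance enters as a regressor; further functions of $\sigma_t^2$ appearing in (63) could be added as extra columns in the same way.

```python
import numpy as np
import statsmodels.api as sm

def volatility_sensitivity(xi, xj, market_var):
    """Regress the cross product x_it * x_jt on a constant and the market
    conditional variance, using White heteroskedasticity-robust standard
    errors to judge how the covariance responds to market volatility."""
    y = xi * xj
    X = sm.add_constant(market_var)
    res = sm.OLS(y, X).fit(cov_type="HC0")
    return res.params, res.bse
```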
A generalization of this idea of making $\Omega_t$ a function of the variances of factors which are portfolios constructed from the assets in $x_t$ is available in Engle et al. (1990b), where the model is termed Factor-GARCH. Defining $f_{kt} = \gamma_k'x_t$ as the $k$th portfolio, let $\sigma^2_{f_kt}$ be its conditional variance. Engle et al. assume that $\Omega_t = \sum_{k=1}^{K}\delta_k\delta_k'\sigma^2_{f_kt} + \Omega$, an obvious extension of the APT, and then allow each of the factor conditional variances to be GARCH(1,1), i.e.,

$$\sigma^2_{f_kt} = \alpha_{0k} + \alpha_{1k}(\gamma_k'e_{t-1})^2 + \beta_{1k}\sigma^2_{f_k,t-1}. \qquad (64)$$

After substituting out $\sigma^2_{f_kt}$ in the expression for $\Omega_t$, one is left with

$$\Omega_t = C + \sum_{k=1}^{K}\delta_k\delta_k'\alpha_{1k}(\gamma_k'e_{t-1})^2 + \sum_{k=1}^{K}\beta_{1k}\delta_k\delta_k'\gamma_k'\Omega_{t-1}\gamma_k. \qquad (65)$$

In their terms this would be a "univariate portfolio representation", since $\sigma^2_{f_kt}$ depends only upon its own lagged value and a combination $(\gamma_k'e_{t-1})^2$ of the errors associated with its own portfolio. A "recursive portfolio representation" would allow some dependence across portfolios; as the name suggests, the conditional variances of portfolios are assumed to be ordered so that the dependence is upon information about other portfolios further down in the chain but not upon information above that point. Engle et al. apply the idea to excess returns on stocks and Treasury Bills with maturities ranging from one to twelve months. Principal component analysis of the data led them to the idea of having two portfolios as the factors - one comprising stocks, and the other a linear combination of Treasury Bills (equally weighted). The recursive portfolio representation in which stocks affect bills but not conversely seemed superior to the univariate one. In this work the portfolio weights are given and not estimated. Lin (1992) considers full maximum likelihood estimation with unknown weights.

In all the above analysis, it was assumed that the object of interest was to model the complete conditional covariance matrix. Sometimes, however, one might only be interested in the diagonal elements (or at least be prepared to concentrate upon them alone). This occurs in the Engle et al. (1990a) study of the effects of news upon return volatility as it is transmitted around the world's stock markets. Thus, the vector of returns might be returns on the New York, Milan, London, and Tokyo markets. Here a subtle modification needs to be made to the specification of ARCH models to reflect the fact that, when New York opens, London and Milan have already been open for a number of hours. Hence, if the model is one of intra-day volatility, $\sigma^2_{it}$ for New York should be a function of not only past returns, but also current returns in London and Milan, i.e., one would have an ARCH specification for the $i$th market (out of $n$) as

$$\sigma^2_{it} = \sigma^2_i + \sum_{j=1}^{i-1}\alpha_{ij}e^2_{jt} + \sum_{j=i}^{n}\alpha_{ij}e^2_{j,t-1} + \beta_{ii}\sigma^2_{i,t-1}. \qquad (66)$$

In addition, one might also be interested in the behavior of covariances, particularly given the belief that stock markets are now much more closely related than they were previously, but it would be a formidable task to allow these to vary as well.

All of the above represents generalizations of the simplest GARCH processes to multivariate models. There have been few attempts to estimate more complex models. Braun et al. (1992) consider a bivariate EGARCH model when estimating betas for stocks (as described earlier), but that is a rare exception. Multivariate stochastic volatility models have been proposed and estimated.
The extension to the multivariate case is conceptually quite easy, since one is simply dealing with a VAR rather than an AR process in the conditional variance. Harvey and Shephard (1993) point out that the quasi-MLE can be applied easily, as the Kalman filter is already formulated for multivariate problems. Danielsson (1993) extends the accelerated importance sampler to perform simulated MLE and applies the method to a bivariate system of exchange rates. Gallant et al. (1994) estimate a stochastic volatility trivariate system with an indirect estimator and an SNP multivariate density as the auxiliary model.

4.5. Are variances co-persistent?

Many financial series possess the property of integration and are also co-integrated. Integration can be interpreted as that characteristic wherein shocks to a series are permanent, or in which a perturbation in the initial condition never dies out of the series. Thus, for the AR(1) in (4), $\partial y_t/\partial y_0 = \beta_1^t$ for any given sequence of shocks $\{u_j\}$. When $\beta_1 = 1$ any changes to the initial condition are permanently embedded in $y_t$. In contrast, for an I(0) series, shocks are not persistent. Therefore, if a co-integrating error is formed by weighting each series by its parameter value in the co-integrating vector, this error is I(0) by definition, and shocks to it will not be persistent even though they are to the individual series from which it is constituted. Effectively, co-integration is a statement about the conditional means of variables. Hence it is not surprising that a similar distinction regarding the impact of shocks has been proposed by Bollerslev and Engle (1993) for variances. Thus, suppose that $y_t$ was GARCH(1,1), i.e., its conditional variance $\sigma_t^2$ is that given in (13). Iterating that equation backward in time,

$$\sigma_t^2 = \Big[\prod_{j=1}^{t-1}(\beta_1 + \alpha_1\varepsilon^2_{t-j})\Big]\sigma_1^2 + \alpha_0\Big[1 + \sum_{i=1}^{t-2}\prod_{j=1}^{i}(\beta_1 + \alpha_1\varepsilon^2_{t-j})\Big], \qquad (67)$$

and this relation enables one to assess the impact of a change in $\sigma_1^2$ on $\sigma_t^2$. One difficulty is that the impact is stochastic and so there is no unique way of measuring it. A simple measure would be just to consider $E(\partial\sigma_t^2/\partial\sigma_1^2) = E[\prod_{j=1}^{t-1}(\beta_1 + \alpha_1\varepsilon^2_{t-j})] = \prod_{j=1}^{t-1}(\beta_1 + \alpha_1) = (\beta_1 + \alpha_1)^{t-1}$. 58 When the process is IGARCH, $\beta_1 + \alpha_1 = 1$, and the shock is deemed persistent into $\sigma_t^2$, whereas if it is just GARCH there is no such persistence.

58 Bollerslev and Engle adopt a different definition involving the stochastic behavior of predictions of $\sigma_t^2$ as the prediction horizon lengthens; the current definition was chosen for its simplicity.

With the ideas relating to persistence in variance just stated, it is natural to define co-persistence in variance as occurring if individual series are IGARCH but there exists some linear combination of them which is GARCH. The terminology is a useful one, as there do seem to be instances in which this phenomenon occurs, e.g., spot and forward exchange rates may each have IGARCH in the variance, but the difference between them, the risk premium, may just display GARCH effects. It cannot be emphasized too strongly that there may be no relation between the vectors that co-integrate the levels of series and those that "co-integrate" the variances. Indeed, it is quite possible for two financial series to be both I(0), each featuring IGARCH variances, and yet for the difference (say) to display only GARCH.
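The persistence measure can be illustrated by Monte Carlo: the sketch below averages $\prod_{j=1}^{t-1}(\beta_1 + \alpha_1\varepsilon^2_{t-j})$ over draws of $\varepsilon_t$, which should reproduce $(\alpha_1 + \beta_1)^{t-1}$ - dying out for GARCH parameter values and staying at unity in the IGARCH case. The parameter values and the normality of $\varepsilon_t$ are illustrative assumptions.

```python
import numpy as np

def persistence_profile(alpha1, beta1, horizon, n_draws=20000, seed=0):
    """Monte Carlo estimate of E[prod_{j=1}^{t-1}(beta1 + alpha1*eps^2)] for
    t = 1,...,horizon, i.e. the average impact of a change in the initial
    conditional variance on sigma_t^2; it equals (alpha1 + beta1)^(t-1)."""
    rng = np.random.default_rng(seed)
    eps2 = rng.standard_normal((n_draws, horizon - 1)) ** 2
    prods = np.cumprod(beta1 + alpha1 * eps2, axis=1)
    return np.concatenate([[1.0], prods.mean(axis=0)])

# GARCH(1,1): the impact dies out; IGARCH (alpha1 + beta1 = 1): it does not
print(persistence_profile(0.10, 0.80, 6))   # roughly (0.9)^(t-1)
print(persistence_profile(0.10, 0.90, 6))   # stays near one
```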
Moreover, although co-persistence in variance may be an interesting empirical phenomenon, it does not seem to have any connection with theoretical models in the same way that co-integration does. There we have the notion that the co-integrating vector reflects an equilibrium relation from static theory, this idea deriving from the belief that deviations from an equilibrium should be stationary. However, even if we had a model indicating an equilibrium relation between variances, the fact that $\sigma_t^2$ remains an I(0) process even if it is IGARCH means that no argument can be made in favor of the combined variances being GARCH.

5. Economic models of multivariate financial data

In the previous section statistical models to describe financial data have been outlined. Ultimately, however, one seeks some economic explanation of the inter-relationships, and various models have been proposed for this task. In some instances, these involve considering what the first order conditions for optimizing consumers and producers would be, either in a partial or a general equilibrium framework, followed by a derivation of the implications of those conditions for relationships between financial series. In others, the optimality conditions come from general arguments considering the relation between the price of an asset and the flow of income that is associated with it. It is the latter which is the focus of the next sub-section, and the former is the concern of the following one. Mostly, the literature in this area has been concerned with testing whether the conditions for inter-temporal optimization hold rather than trying to calibrate the underlying models, and that explains the emphasis given to this question in what follows. In those cases where there is interest in estimating the parameters of an underlying model, estimation has proceeded with one of the techniques outlined earlier, e.g., Chen and Scott (1993) apply maximum likelihood to estimate a multifactor model of the term structure. Perhaps the most interesting development has been the use of indirect estimation for this purpose - see Bansal et al. (1994). There are many models in economics that are easy to simulate but for which it is difficult to write down the likelihood, e.g., models of exchange rates with "bands", see Ball and Roma (1994), and these are obvious candidates for indirect estimation methods, using as auxiliary models the statistical structures described in the previous section.

5.1. Inter-temporal relations from equilibrium conditions

From discounted present value theory it would be expected that beginning-of-period stock prices ($P_t$) would be related to dividends ($D_t$) by

$$P_t = \sum_{j=0}^{\infty}\beta^jE_t(D_{t+j}), \qquad (68)$$

where $\beta$ is an ex-ante constant real discount rate and $E_t$ indicates that the expectation is taken conditional on information available at time $t$ (this does not include $D_t$). Defining $S_t = P_t - \theta D_t = \sum_{j=0}^{\infty}\beta^j(E_t(D_{t+j}) - D_t)$, where $\theta = 1/(1-\beta)$, $S_t$ will be I(0) if $D_t$ is an integrated process, and therefore $(1\ \ -\theta)$ would be a co-integrating vector between $P_t$ and $D_t$. When tests for co-integration between $P_t$ and $D_t$ are applied, one frequently sees acceptance of the null hypothesis of no co-integration. Timmermann (1995) notes that the discount rate may vary over time, and therefore $\theta_t = \theta + v_t$ would be a random variable. The co-integrating relation between stock prices and dividends would then have an error that includes the term $v_tD_t$.
As $D_t$ is an I(1) process, this error will have a conditional (upon $D_t$) variance that depends upon $D_t$, i.e., there is a "levels" effect similar to that seen with yields. Timmermann constructs simulation experiments to show that the rejection of co-integration might simply stem from the fact that the error has substantial persistence in it, due to the documented fact that $v_t$ depends on the dividend/price ratio, which tends to be quite persistent.

Earlier work by Shiller (1981) sought to determine the validity of (68) by considering its implications for sample moments. In particular, he defined an ex-post rational price $P_t^*$ as

$$P_t^* = \sum_{j=0}^{\infty}\beta^jD_{t+j}. \qquad (69)$$

Then

$$P_t^* = \sum_{j=0}^{\infty}\beta^jE_t(D_{t+j}) + \sum_{j=0}^{\infty}\beta^j(D_{t+j} - E_t(D_{t+j})) \qquad (70)$$
$$= P_t + \sum_{j=0}^{\infty}\beta^jV_{t+j}, \qquad (71)$$

and, following from the fact that $E_t(V_{t+j}) = 0$, $E(P_tV_{t+j}) = 0$, producing

$$\mathrm{var}(P_t^*) \geq \mathrm{var}(P_t). \qquad (72)$$

Shiller took this variance bounds inequality and estimated both sides from sample data. The LHS poses difficulties since it depends upon dividends an infinite distance into the future, so he truncated this as $P_t^{**} = \sum_{j=0}^{T-t-1}\beta^jD_{t+j} + \beta^{T-t}P_T$, i.e., instead of $P_t^*$ being used to summarize future information he replaced it with the observed quantity $P_T$. He then discovered that the sample variance of $P_t$ exceeded that of $P_t^{**}$.

An immediate problem with this test, noted by Kleidon (1986), was that $P_t$ was very likely to be an integrated process, so that the "variance" does not exist. Hence, a comparison such as (72) can only be done with $P_t$ being a non-integrated process. This has led to other proposals for checking the validity of (68). One of these is to recognize that $P_t$ and $D_t$ should be co-integrated with vector $(1\ \ -\theta)$ (see the discussion in (a) above), so that if we formed

$$S_t^* = P_t^* - \theta D_t = \sum_{j=0}^{\infty}\beta^jD_{t+j} - \theta D_t \qquad (73)$$
$$= \sum_{j=0}^{\infty}\beta^j(D_{t+j} - D_t), \qquad (74)$$

$S_t^*$ will be an I(0) process when $D_t$ is integrated. Comparing this to $S_t = \sum_{j=0}^{\infty}\beta^j(E_t(D_{t+j}) - D_t)$ shows that a prediction of the present value model is that

$$\mathrm{var}(S_t^*) \geq \mathrm{var}(S_t), \qquad (75)$$

and a comparison can be made using the sample variances, owing to the fact that $S_t^*$ and $S_t$ are both I(0). 59

59 Of course $S_t^*$ is replaced by $S_t^{**}$ but, as $S_t^{**} = S_t^* + \beta^{T-t}(P_T - P_T^*)$, and $P_t - P_t^*$ should be I(0) when the present value model holds, that substitution should not be of concern, although the actual computed sample variance can be different if we decide to compute the ex-post rational price using a terminal date that is within the sample. In the remaining discussion we focus on $S_t^*$ only.

However, even if these variables are I(0), there are a number of factors that make an empirical comparison of the population quantities in (75) much more clouded. Flavin (1983) pointed out that there are small sample biases in estimating variances if the series have substantial serial correlation. Indeed, that must be the case for $S_t^*$, as it involves an infinite combination of $\beta^jD_{t+j}$. Moreover, as Shea (1989) shows in some simulation experiments, the comparison can be a problem if $\Delta D_t$ is close to being integrated: for such near integrated processes, statistics such as sample variances frequently behave more like I(1) than I(0) series if the sample is small. Shea presents some Monte Carlo evidence to show that the sample variance of $S_t$ can easily exceed that of $S_t^*$ even if the present value model is correct. Actually, what Shea examines is the properties of the Mankiw et al. (1985) statistic, which involves comparing $\mathrm{var}(P_t^* - P_t^0)$ to $\mathrm{var}(P_t - P_t^0)$, where $P_t^0$ is a "naive" forecast. However, as they set $P_t^0$ to $\theta D_t$, the experiment is relevant.
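The construction underlying these comparisons is easily coded; the sketch below builds the truncated ex-post rational price by backward recursion from the observed terminal price, after which the two sample variances in (72) can be compared. Treating the terminal value as $P_T$ itself and working with already detrended real series are assumptions of the illustration, and the small-sample caveats just discussed of course still apply.

```python
import numpy as np

def ex_post_rational_price(d, p_T, beta):
    """Shiller's ex-post rational price, truncated at the end of the sample:
    P**_t solves the backward recursion P**_t = d_t + beta * P**_{t+1},
    with the terminal value taken to be the observed terminal price p_T.

    Follows the convention of (69), in which P*_t includes the current dividend."""
    T = len(d)
    p_star = np.empty(T)
    p_star[-1] = p_T
    for t in range(T - 2, -1, -1):
        p_star[t] = d[t] + beta * p_star[t + 1]
    return p_star

# variance bound (72): var(P*) should be at least var(P) under the model
# p, d : observed (real, detrended) price and dividend series
# p_star = ex_post_rational_price(d, p[-1], beta)
# print(p_star.var(), p.var())
```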
Another variant is LeRoy and Parke (1987), who compare $\mathrm{var}(\tilde S_t^*)$ to $\mathrm{var}(\tilde S_t)$, where $\tilde S_t^* = S_t^*/D_t$ and $\tilde S_t = S_t/D_t$. Because $\tilde S_t^* = (P_t^*/D_t) - \theta$ and $\tilde S_t = (P_t/D_t) - \theta$, $\theta$ does not need to be estimated in their comparison. Generally, one has to conclude that these relative variance tests are very inefficient ways of examining the validity of a present value relation, and that direct testing of the model seems a more sensible strategy.

Another variance bounds test that does not suffer from problems arising out of integrated processes has been developed by West (1988). He modifies (68) to

$$P_t = \sum_{j=1}^{\infty}\beta^jE_t(D_{t+j}), \qquad (76)$$

where $P_t$ is now an end-of-period price rather than a beginning-of-period price. (76) implies that

$$P_t = \beta E_t(D_{t+1} + P_{t+1}), \qquad (77)$$

where the appropriate transversality condition is assumed satisfied. Now, suppose that the information set used in (76) is $I_t$ and that there is a sub-set of it designated $G_t$. Constructing the hypothetical quantities

$$x_{It} = E\Big[\sum_{j=0}^{\infty}\beta^jD_{t+j}\,\Big|\,I_t\Big], \qquad x_{Gt} = E\Big[\sum_{j=0}^{\infty}\beta^jD_{t+j}\,\Big|\,G_t\Big], \qquad (78)$$

he shows that, under the present value model,

$$\mathrm{var}(x_{Gt} - E(x_{Gt}\mid G_{t-1})) \geq \mathrm{var}(x_{It} - E(x_{It}\mid I_{t-1})). \qquad (79)$$

He proposes that both sides of the inequality be estimated from sample data and that the present value model be rejected if the inequality fails to hold. The main issue is how to construct sample estimates of both variances. From inspection of (78) and (76), $x_{It} = D_t + P_t$, and so the RHS of (79) is $\mathrm{var}(D_t + P_t - E(D_t + P_t\mid I_{t-1}))$. From (77) we can write

$$P_t = \beta(D_{t+1} + P_{t+1}) - \eta_t, \qquad (80)$$

where $\eta_t = \beta(D_{t+1} + P_{t+1} - E(D_{t+1} + P_{t+1}\mid I_t))$ has the property that $E(\eta_t\mid I_t) = 0$. Lagging $\eta_t$ once shows that its variance is the variance on the RHS of (79). Hence, it is simply a matter of determining the variance of $\eta_t$ from (80). By definition $E(\eta_t\mid I_t) = 0$, therefore

$$E[P_t - \beta(D_{t+1} + P_{t+1})\mid I_t] = 0 \qquad (81)$$

or

$$E[(1 - \beta(D_{t+1} + P_{t+1})/P_t)\mid I_t] = 0, \qquad (82)$$

as $I_t$ includes $P_t$ in the information set. The method of moments estimator of $\beta$ from (82) is just $\hat\beta$ such that

$$\sum_t\big(1 - \hat\beta[(D_{t+1} + P_{t+1})/P_t]\big) = 0 \qquad (83)$$

or $\hat\beta = 1\big/\big(T^{-1}\sum[(D_{t+1} + P_{t+1})/P_t]\big)$. As the denominator is one plus the sample mean of $\{(D_{t+1} + \Delta P_{t+1})/P_t\}$, this will give an estimated discount rate approximately equal to one minus the sample mean of $(D_{t+1} + \Delta P_{t+1})/P_t$, i.e., one minus the mean of ex-post returns. 60 Since $(D_{t+1} + \Delta P_{t+1})/P_t$ will be I(0), $\beta$ should therefore be consistently estimated in this way. Thereupon the variance of $\eta_t$ can be estimated as $T^{-1}\sum(P_t - \hat\beta(D_{t+1} + P_{t+1}))^2$. To determine the LHS of (79) requires that the information set $G_t$ be specified and a model for the dividend process be set up. Suppose $\Delta D_t = e_t$ is white noise and $G_t$ contains $D_t$. Then $x_{Gt} = (1-\beta)^{-1}D_t$, $E(x_{Gt}\mid G_{t-1}) = (1-\beta)^{-1}D_{t-1}$, and the LHS is just $\mathrm{var}[(1-\beta)^{-1}(D_t - D_{t-1})] = (1-\beta)^{-2}\mathrm{var}(\Delta D_t)$, from which an estimate can be constructed using $\hat\beta$ and the sample variance of $\Delta D_t$.

60 West suggests using other instruments, $w_t$, to estimate $\beta$, but if they are to be instruments they must be uncorrelated with $1 - \beta[(D_{t+1} + P_{t+1})/P_t]$, i.e., $E[w_t(1 - \beta[(D_{t+1} + P_{t+1})/P_t])] = 0$. Since there is a constant being used, $w_t$ can be assumed to have $E(w_t) = 0$, so that this condition requires $\beta E[w_t(D_{t+1} + P_{t+1})/P_t] = 0$. However, this means there is no information in $w_t$ for $\beta$, as the derivative of $1 - \beta[(D_{t+1} + P_{t+1})/P_t]$ with respect to $\beta$ has to be correlated with the instrument for the latter to be relevant. As discussed earlier in connection with the GMM estimator, the use of such weak instruments can have deleterious effects upon the properties of the instrumental variables estimator.
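A sketch of the sample calculations just described, under the white-noise assumption for $\Delta D_t$, is given below; the function returns the estimated left and right hand sides of (79) together with $\hat\beta$. Using simple arrays of prices and dividends and ignoring the standard errors discussed next are simplifications for exposition.

```python
import numpy as np

def west_bound(p, d):
    """Estimate the two sides of West's inequality (79) when Delta D_t is
    white noise: the RHS uses var(eta_t) from (80) and the LHS is
    (1 - beta)^{-2} var(Delta D_t), with beta estimated from (83)."""
    # method-of-moments discount factor: inverse of the mean gross return
    gross = (d[1:] + p[1:]) / p[:-1]
    beta_hat = 1.0 / gross.mean()
    # RHS: variance of eta_t = beta*(D_{t+1} + P_{t+1}) - P_t
    eta = beta_hat * (d[1:] + p[1:]) - p[:-1]
    rhs = np.mean(eta ** 2)
    # LHS: (1 - beta)^{-2} * var(Delta D_t) under the coarse information set
    lhs = (1.0 - beta_hat) ** -2 * np.var(np.diff(d))
    return lhs, rhs, beta_hat
```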
Since both quantities involve estimating $\beta$, $\mathrm{var}(\Delta D_t)$ and $\mathrm{var}(\eta_t)$ by method of moments, a standard error can be placed on the difference between the estimated $\mathrm{var}(x_{Gt} - E(x_{Gt}\mid G_{t-1}))$ and $\mathrm{var}(x_{It} - E(x_{It}\mid I_{t-1}))$, although this does not provide a solution to the problem of testing an inequality. With data like Shiller's on stocks and dividends the inequality was clearly rejected.

Restricted VAR's have been used to test other aspects of models of financial data. For example, if we return to the question of the present value pricing model for stocks or the term structure of interest rates, we have observed that the relationship should exhibit a co-integrating vector. Concentrating on stocks and dividends, $S_t = P_t - \theta D_t = \sum_{j=1}^{\infty}\beta^jE_t\Delta D_{t+j}$, and if we form a VAR from $S_t$ and $\Delta D_t$, restrictions are implied upon its parameters. In particular, writing it as (56), where $z_t' = (S_t\ \ \Delta D_t)$, the restrictions on a $p$th order VAR would be that $e_1' = e_{p+1}'\beta A(I - \beta A)^{-1}$, where $e_j$ is a vector with unity in the $j$th position and zeros elsewhere. One can test these restrictions by doing a Wald test. A better way to test the restrictions would be to test $e_1'(I - \beta A) = e_{p+1}'\beta A$, as this is testing a set of linear rather than non-linear restrictions, and Wald tests have better sampling properties when restrictions are close to linear - see Gregory and Veall (1985) and Phillips and Park (1988). Such tests have been used previously by Campbell and Shiller (1987) and Kearns (1990a) in studying term structure relations. They find that the restrictions are generally rejected.

If one observes rejection, the question of what caused the rejection arises. One candidate is that there are speculative bubbles. 61 A second is that the approximations employed to get relations such as the expectations hypothesis, $R_t = k^{-1}\sum_{j=0}^{k-1}E_tr_{t+j}$, where $R_t$ and $r_t$ are respectively the long and short term interest rates, are invalid; in particular, they involve linearizing the holding yield of an $n$-period bond paying a coupon $C$ around $(1+\bar R)^{-1}$, where $\bar R$ is taken to be the monthly average yield on long bonds - see Shiller (1979). When yields follow an integrated process, $\bar R$ will no longer converge to a constant, so that the linearization is being performed around a stochastic quantity. If the process is close to being integrated the same problem is likely to arise unless the sample size is very large.
Even if $\bar R$ can be taken as a constant, the linearization would fail if bond yields deviated substantially from $\bar R$. It has generally been argued that the linearization is a good approximation, but this is not sufficient for estimation purposes. In a regression, any approximation error in the conditional mean goes into the regression error and generally causes a correlation between regressors and errors, thereby inducing inconsistencies into parameter estimators. In turn, these may translate into rejections of hypotheses. Indeed, this seems to be true here. Both Kearns (1990a) and Shea (1992) simulate data from models in which the present value model holds but find substantial rejections of it. Whether this is due to poor sampling properties of the test statistics or to the linearization error remains to be determined. However, it seems more likely to be the latter. Following the Abel-Mishkin approach, Campbell and Shiller (1987, p. 1068) show that the linear restrictions on the VAR can be tested by regressing $z_t = (\Delta r_t + S_t - (S_{t-1}/\beta))$ on $\Delta r_{t-i}$ and $S_{t-i}$, and testing if the effects of these variables are zero. Consequently, the Wald test effectively involves adding variables to a linear regression, albeit with predetermined rather than strongly exogenous regressors, and therefore one would not expect such a test to exhibit the level of biases observed in the simulations (rejection rates of 25.3% when nominal 10% significance levels are used). 62

61 There is another literature that treats the presence of bubbles as arising from a failure of the transversality condition $\lim_{j\to\infty}\beta^jE_t(P_{t+j}) = 0$ to hold, so that the solution to the forward difference equation underlying (68), $P_t = \beta E_t(P_{t+1}) + D_t$, is not unique and can be augmented with any process satisfying $B_{t+1} = \beta^{-1}B_t + \zeta_{t+1}$, where $E_t(\zeta_{t+1}) = 0$. This results in a failure of co-integration between $P_t$ and $D_t$ - see e.g., Hamilton and Whiteman (1985). In fact, if the bubble is an integrated process, the Wald test based on the VAR would not be a consistent test for a rational bubble, as the test statistic can be shown to have a limiting distribution under the alternative and so will not reject the null with probability one. The main complication with this view is the difficulty of discriminating between a bubble and a mis-specification of the fundamentals process as the source of a lack of co-integration. Durlauf and Hall (1989) have made some progress in choosing between the two possibilities.

62 Of course, if $z_t$, $S_t$ and $\Delta r_t$ were near integrated, such over-rejection would be predicted.

5.2. Intertemporal models with varying IMRS

Perhaps the most distinctive feature of financial economics has been the development of theories of asset prices that are capable of being tested with data. Of these the most famous would have to be the Sharpe/Lintner Capital Asset Pricing Model. Stochastic intertemporal asset pricing theories have been developed by a number of authors, and a particularly influential one was Breeden (1979).
The central implication of these theories was that the first order conditions for an intertemporally optimizing consumer were

$$E_t[(\beta u'(C_{t+1})/u'(C_t))R_t] = 1, \qquad (84)$$

where $u'(C_t)$ is the marginal utility from consumption, $R_t$ is the gross return $(P_{t+1} + d_t)/P_t$, where $P_t$ is the real price of the asset, $d_t$ are real dividends and $E_t$ denotes that the expectation is taken with respect to information at time $t$. This model is also sometimes referred to as the consumption-CAPM, as it emphasizes that returns are a function of the covariance of the intertemporal marginal rate of substitution (IMRS) - "stochastic discount factor" - $\psi_t = \beta u'(C_{t+1})/u'(C_t)$ with the return $R_t$. 63 To see this, write $E(\psi_tR_t) = \mathrm{cov}(\psi_t, R_t) + E(R_t)E(\psi_t)$, so that (84) implies $E(R_t) = [1 - \mathrm{cov}(R_t,\psi_t)]/E(\psi_t)$, and it is the covariance with $\psi_t$ rather than the market portfolio which is important for asset prices (although if one follows Campbell (1992) and log-linearizes the consumer's budget constraint to find $E_t(\Delta c_{t+1}) = \mu_t + E_t(r_{m,t+1})$, where $c_t = \log(C_t)$, it is clear that there is implicitly a relation with the market return, $r_{m,t}$).

63 If $\psi_t$ is a constant and $d_t = 0$ one gets the implication that returns are unpredictable.

Under a special form for the utility function it is possible to write out a specific set of restrictions that are implied by (84). For example, if $u(C_t)$ has the constant relative risk aversion (CRRA) form $\gamma^{-1}(C_t^\gamma - 1)$, then (84) becomes

$$E_t[\beta(C_{t+1}/C_t)^\alpha R_t] = 1, \qquad (85)$$

where $\alpha = \gamma - 1$. Hansen and Singleton (1982) suggested that one estimate the parameters of (85) by GMM, using as "instruments" $z_t' = (1\ w_t')$, the information being conditioned upon in $E_t$, leading to the moment restrictions

$$E[\beta(C_{t+1}/C_t)^\alpha R_t] - 1 = 0 \qquad (86)$$
$$E[\beta w_t(C_{t+1}/C_t)^\alpha R_t] = 0, \qquad (87)$$

where we can assume without loss of generality that $E(w_t) = 0$ due to the presence of unity in $z_t$. Replacing the population mean in (86) by its sample value produces a method of moments estimator of $\beta$ of $\hat\beta = 1/(T^{-1}\sum(C_{t+1}/C_t)^{\hat\alpha}R_t)$, where $\hat\alpha$ is some estimator of $\alpha$ from (87), making $\hat\beta$ just the inverse of a sample mean. This estimation method has been applied to a wide variety of asset prices, both domestic and international. Because, at heart, it involves a GMM estimator, there are potential problems associated with it, which were outlined in an earlier section. As explained there, these stem from the possibility that $w_t$ is a poor instrument for $\partial v_t/\partial\theta$, where $v_t = \beta(C_{t+1}/C_t)^\alpha R_t - 1$ and $\theta' = (\beta\ \alpha)$, and this would be assessed by regressing $\partial v_t/\partial\theta$ against $w_t$. A small value of the $R^2$ from this regression signals likely poor performance of the GMM estimator of $\theta$. Unfortunately, $\partial v_t/\partial\theta$ depends upon $\theta$, leading to a circularity in the argument, in that accurate computation of the $R^2$ etc. depends upon a good estimate of $\theta$, and this will be poor if the $R^2$ is low (the situation one is attempting to detect!). Nevertheless, it would seem useful to compute measures such as the $R^2$ using the point estimates of $\theta$, as that should provide some insights into possible problems with the GMM estimator. Returning to the asset pricing model in (86) and (87), Hansen and Singleton (1982) fitted this with various instruments chosen from lags of consumption growth and gross returns.
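The moment conditions (86)-(87) translate directly into a one-step GMM estimator. The following sketch uses an identity weighting matrix and a derivative-free optimizer, whereas Hansen and Singleton's procedure uses an optimal weighting matrix, so this is only an illustration of the mechanics; the starting values and the instrument layout are assumptions of the sketch.

```python
import numpy as np
from scipy.optimize import minimize

def euler_moments(theta, cg, R, w):
    """Sample moments for (86)-(87): v_t = beta*(C_{t+1}/C_t)^alpha * R_t - 1
    interacted with the instruments z_t = (1, w_t)."""
    beta, alpha = theta
    v = beta * cg ** alpha * R - 1.0
    z = np.column_stack([np.ones_like(v), w])
    return (z * v[:, None]).mean(axis=0)

def gmm_crra(cg, R, w, start=(0.99, -1.0)):
    """One-step GMM with an identity weighting matrix; cg is gross consumption
    growth C_{t+1}/C_t, R the gross asset return and w the instruments."""
    obj = lambda th: euler_moments(th, cg, R, w) @ euler_moments(th, cg, R, w)
    return minimize(obj, start, method="Nelder-Mead").x
```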
As an exercise, we set $w_t$ to unity, $(C_t/C_{t-1})$ and $(C_{t-1}/C_{t-2})$, and take $R_t$ to be the gross market return on equities. This produces GMM estimates for $\beta$ and $\alpha$ of $\hat\beta = 0.998$ and $\hat\alpha = -1.02$ (assuming that $v_t$ is an MA(2)). Now $\partial v_t/\partial\beta = (C_{t+1}/C_t)^\alpha R_t$ and $\partial v_t/\partial\alpha = \beta\log(C_{t+1}/C_t)(C_{t+1}/C_t)^\alpha R_t$, and these derivatives (evaluated at the GMM estimates of $\beta$ and $\alpha$) can be regressed against the instruments, resulting in $R^2$ of 0.007 and 0.13 respectively. These results demonstrate that one would need to be very careful when using the GMM estimates, although the very tight distribution of the squared instrument for $\beta$ around its mean (the mean is 1.0015 and the standard deviation is 0.007) suggests that this parameter should be quite accurately estimated, a result in accord with existing simulation work - see Mao (1990).

If $\dim(w_t) > \dim(\theta)$ it is possible to test whether the moment conditions for inter-temporal optimization are satisfied using the J-test set out in Hansen (1982). Mostly, applications of this test have led to rejection of the model. There are a number of possible reasons for this. One is that the over-rejection is caused by the poor sampling distributions of the GMM estimator of $\theta$, as this has been observed in some simulation work, e.g., Mao (1990). Another is that the specifications used for the utility function are incorrect, and this has led to a literature exploring generalized forms, e.g., the introduction of inertia into the utility function as in Constantinides (1990).

Instead of performing a specification test directly upon the data, a popular way of assessing the quality of a model has been to utilize "Hansen-Jagannathan bounds". Defining $\rho$ as the correlation between $R_t$ and $\psi_t$, we have

$$\rho^2 = [\mathrm{cov}(R_t,\psi_t)]^2/(\sigma_R^2\sigma_\psi^2) = [1 - E(R_t)E(\psi_t)]^2/(\sigma_R^2\sigma_\psi^2)$$

if the asset pricing condition (84) holds. 64 Inverting this relation and recognizing that the maximum value of $\rho^2$ is unity gives

$$\sigma_\psi^2 \geq [1 - E(R_t)E(\psi_t)]^2/\sigma_R^2, \qquad (88)$$

and this is the bound provided by Hansen and Jagannathan (1991) (they work with a vector of gross returns so that a matrix representation of the above is needed). Using the equality one can trace out a relation between $E(\psi_t)$ and the minimum value of $\sigma_\psi$. For any given candidate for $\psi_t$, e.g., $\psi_t = \beta(C_{t+1}/C_t)^\alpha$, it is possible to derive estimates of $E(\psi_t)$ and $\sigma_\psi^2$ and to see if the bounds are satisfied. One nice feature of this approach is that it enables one to perform a sensitivity analysis with respect to parameters such as $\alpha$, i.e., one asks what value $\alpha$ would have to take in order to make the bound hold. If one does not wish to perform such a sensitivity analysis, then it would be necessary to take account of the fact that the point estimates of the left and right hand sides of (88) are both random variables, and one might therefore wish to attach some probability statement to whether the bounds are violated. Burnside (1994) and Cecchetti et al. (1994) do this using the method of deriving asymptotic distributions of non-linear functions of random variables.

64 The conditional expectation has been replaced by an unconditional expectation using the law of iterated expectations. Gallant et al. (1990) use the conditional expectations, estimating these with SNP densities.
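Checking the bound (88) for a single return series and a CRRA candidate discount factor requires only sample moments; a minimal sketch is below, with the caveat that it ignores the sampling variability that Burnside (1994) and Cecchetti et al. (1994) address.

```python
import numpy as np

def hj_bound_check(R, c_growth, beta, alpha):
    """Check the Hansen-Jagannathan bound (88) for the CRRA candidate
    psi_t = beta*(C_{t+1}/C_t)^alpha against a single gross return R_t."""
    psi = beta * c_growth ** alpha
    lhs = psi.var()                                       # sigma_psi^2
    rhs = (1.0 - R.mean() * psi.mean()) ** 2 / R.var()    # bound from (88)
    return lhs >= rhs, lhs, rhs
```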
One might well ask what the advantage of using these bounds as a formal specification test is over the direct test of the over-identifying restrictions. Sometimes the latter may not be available, as $\dim(w_t) = \dim(\theta)$, but, when it is, the simulation studies in Burnside (1994) suggest that it is the best test, although it could over-reject in some cases as the true values of $\beta$ and $\alpha$ changed, and this is most likely due to the changing quality of the instruments.

6. Conclusion

A striking feature from this survey is how a purely statistical modeling approach has been dominant in the analysis of financial data, particularly when the focus is a univariate one. One cannot help but feel that these statistical approaches to the modeling of financial series have possibly reached the limits of their usefulness. The models are having to be made increasingly complex so as to capture the nature of the conditional density of returns. Initially, it looked as if the ARCH class of processes would provide a simple explanation, but, as documented in these notes, it is becoming clear that this is not so. Ultimately one must pay more attention to whether simple economic models are capable of generating the complex behavior that is evident, i.e., is it possible to construct economic models that might be useful in explaining what is observed? To date progress in this area has been slow. Few models are capable of generating the type of ARCH one sees in the data. The same can be said about present value models. To find the type of dependence observed in returns one would need a very strong dependence in the dividends process (or endowments if one is looking at general equilibrium methods of generating asset prices), and the evidence on this is weak. Most of these studies are best summarized with the adage that "to get GARCH you need to begin with GARCH". Nevertheless, the search for models that will take as inputs moderate degrees of volatility and accentuate it has to be accorded a high priority if we are to make further progress in the modeling of this aspect of financial series. An interesting paper that does just this is Den Haan and Spear (1994), who manage to produce GARCH in interest rates by allowing for income distribution effects changing with the cycle.

The situation with multivariate modeling differs in that theoretical models have informed decisions about how to link the mean behavior of variables, although the explanation of co-movements in volatility is still the province of statistical models. To some extent this emphasis comes from the motivation for much empirical work in finance; if it works and one can make money from it, then the lack of a theoretical base is not accorded much importance. For many economists, however, it is desirable to be able to understand the phenomena one is witnessing, and this is generally best done through theoretical models. Of course this desire does not mean that the search for statistical models which fit the data should be abandoned. One of the nice features of the indirect estimation approach discussed in Section 3.2 is the emphasis placed upon the use of a statistical model that fits data well as a way of estimating the parameters of an underlying theoretical model. It is this interplay between statistics and theoretical work in economics which needs to become dominant in financial econometrics in the next decade.
7. Other sources

Abel and Mishkin (1983), Chesney and Scott (1989), Clark (1973), Engle and Lee (1992), Fama (1976), French et al. (1987), Gallant and Tauchen (1990), Hansen and Singleton (1990), Kearns (1992), Nelson and Startz (1990), Newey and West (1987), Pagan (1975), Summers (1986), Tauchen (1986), Tauchen and Pitts (1983), Taylor (1990) and Watson (1964).

Acknowledgements

I would like to thank Torben Andersen, Richard Baillie, Tim Bollerslev, Colin Cameron, Mardi Dungey, Frank Diebold, Phil Kearns, Walter Kramer, John Robertson, Allan Timmermann and anonymous referees for their comments on earlier versions of this survey.

References

Abel, A.B. and F.S. Mishkin, 1983, An integrated view of tests of rationality, market efficiency and short-run neutrality of monetary policy, Journal of Monetary Economics 11, 3-24.
Albert, J. and S. Chib, 1993, Bayesian analysis via Gibbs sampling of autoregressive time series subject to mean and variance Markov shifts, Journal of Business and Economic Statistics 11, 1-15.
Andersen, T.G., 1992, Volatility, Working paper No. 144, Kellogg Graduate School of Business, Northwestern University.
Andersen, T. and B.E. Sørensen, 1994a, GMM estimation of a stochastic volatility model: A Monte Carlo study, Working paper No. 94-6, Brown University.
Andersen, T.G. and B.E. Sørensen, 1994b, A note on Ruiz (1994): Quasi-maximum likelihood estimation of stochastic volatility models, Working paper No. 189, Kellogg Graduate School of Management, Northwestern University.
Baillie, R.T., 1989, Tests of rational expectations and market efficiency, Econometric Reviews 8, 151-186.
Baillie, R.T., 1996, Long memory processes and fractional integration in economics and finance, Journal of Econometrics (forthcoming).
Baillie, R. and T. Bollerslev, 1989, Common stochastic trends in a system of exchange rates, Journal of Finance 44, 167-182.
Baillie, R. and T. Bollerslev, 1994, The long memory of the forward premium, Journal of International Money and Finance 13, 565-572.
Baillie, R.T., T. Bollerslev and H.O. Mikkelsen, 1996, Fractionally integrated generalized autoregressive conditional heteroskedasticity, Journal of Econometrics (forthcoming).
Baillie, R.T. and R.J. Myers, 1991, Bivariate GARCH estimation of the optimal commodity futures hedge, Journal of Applied Econometrics 6, 109-124.
Ball, C.A. and A. Roma, 1994, Target zone modelling and estimation for European monetary system exchange rates, Journal of Empirical Finance 1, 385-420.
Bansal, R., A.R. Gallant, R. Hussey and G. Tauchen, 1994, Nonparametric estimation of structural models for high frequency market data, Journal of Econometrics (forthcoming).
Bekaert, G. and R.J. Hodrick, 1992, Characterizing predictable components in excess returns on equity and foreign exchange markets, Journal of Finance 47, 467-509.
Beveridge, S. and C.R. Nelson, 1981, A new approach to the decomposition of economic time series into permanent and transitory components with particular attention to measurement of the business cycle, Journal of Monetary Economics 7, 151-174.
Black, F., 1976, Studies in stock volatility changes, Proceedings of the 1976 Meetings of the Business and Economic Statistics Section, American Statistical Association, 177-181.
Bollerslev, T., 1986, Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics 31, 307-327.
Bollerslev, T., 1990, Modelling the coherence in short-run nominal exchange rates: A generalized ARCH model, Review of Economic Statistics 72, 498-505. Bollerslev, T. and R.F. Engle, 1993, Common persistence in conditional variances: Definition and representation, Econometrica 6 l, 167-186. 94 A. Pagan/Journal of Empirical Finance 3 (1996) 15-102 Bollerslev, T. and R.J. Hodrick, 1992, Financial market efficiency tests, Working paper No. 132, Kellogg Graduate School of Management, Northwestern University. Bollerslev, T., R.Y. Chou and K.F. Kroner, 1992, ARCH modeling in finance: A review of the theory and empirical evidence, Journal of Econometrics 52, 5-59. Bollerslev, T., R.F. Engle and D.B. Nelson, 1994, ARCH models, in: R.F. Engle and D. McFadden (eds.), Handbook of Econometrics, Vol. 4 (North-Holland). Bollerslev, T.R., F. Engle and J. Wooldridge, 1988, A capital asset pricing model with time varying covariances, Journal of Political Economy 96, 116-131. Bossaerts, P. and P. Hillion, 1994, Local parametric analysis of hedging in discrete time (mimeo, Tilburg University). Bougerol, P. and N. Picard, 1992, Stationarity of GARCH processes and of some non-negative time series, Journal of Econometrics 52, 115-127. Braun, P.A., D.B. Nelson and A.M. Sunier, 1992, Good news, bad news, volatility and betas (mimeo, University of Chicago). Breeden, D.T., 1979, An intertemporal asset pricing model with stochastic consumption and investment opportunities, Journal of Financial Economics 7, 265-296. Brenner, R.J., R.H. Harjes and K. Kroner, 1994, Another look at alternative models of the short-term interest rate (mimeo, University of Arizona). Brock, W., W.D. Dechert and J.A. Scheinkman, 1987, A test for independence based on the correlation dimension (mimeo, University of Wisconsin, Madison). Brockwell, P.J. and R.A. Davis, 1991, Time series: Theory and methods, 2nd ed. (Springer-Verlag, New York). Broze, L., O. Scaillet and J.M. Zakoian, 1993, Testing for continuous time models of the short-term interest rates, Discussion paper No. 9331, CORE. Burnside, C., 1994, Hansen-Jagannathan bounds as classical tests of asset pricing models, Journal of Business and Economic Statistics 12, 57-79. Cai, J., 1994, A Markov model of unconditional variance in ARCH, Journal of Business and Economic Statistics 12, 309-316. Cameron, A.C. and P.K. Trivedi, 1993, Tests of independence in parametric models with applications and illustrations, Journal of Business and Economic Statistics 11, 29-43. Campbell, J.Y., 1992, Intertemporal asset pricing without consumption data (mimeo, Princeton University). Campbell, J. and R.J. Shiller, 1987, Cointegration and tests of present value models, Journal of Political Economy 95, 1062-1088. Cecchetti, S.G., P-S Lam and N.C. Mark, 1994, Testing volatility restrictions on intertemporal marginal rates of substitution implied by Euler equations and asset returns, Journal of Finance, XLIX, 123-152. Chan, K.C., G.A. Karolyi, F.A. Longstaff and A.B. Sanders, 1992, An empirical comparison of alternative models of the short-term interest rate, Journal of Finance, XLVII, 1209-1227. Chen, R.R. and L. Scott, 1993, Maximum likelihood estimation for a multifactor equilibrium model of the term structure of interest rates, Journal of Fixed Income 3, 14-31. Chesney, M. and L.O. Scott, 1989, Pricing European currency options: A comparison of the modified Black-Scholes model and a random variance model, Journal of Financial and Quantitative Analysis 24, 267-284. Chib, S. and E. 
Greenberg, 1994a, Markov chain Monte Carlo simulation methods in econometrics (mimeo, Washington University). Chib, S. and E. Greenberg, 1994b, Understanding the Metropolis-Hastings algorithm (mimeo, Washington University). Christie, A.A., 1982, The stochastic behaviour of common stock variances: Value, leverage and interest rate effects, Journal of Financial Economics 10, 407-432. A. Pagan/Journal of Empirical Finance 3 (1996) 15-102 95 Christiano, L.J. and M. Eichenbaum, 1990, Unit roots in GNP: Do we "know and do we care?, Carnegie-Rochester series on public policy, 32, Spring, 7-61. Chu, C-S.J., 1995, Detecting parameter shift in GARCH models, Econometric Reviews 14, 241-266. Clark, P.K., 1973, A subordinated stochastic process model with finite variance for speculative prices, Econometrica 41, 135-155. Cochrane, J.H., 1988, How big is the random walk in GNP, Journal of Political Economy 96, 893-920. Constantinides, G.M., 1990, Habit formation: A resolution of the equity premium puzzle, Journal of Political Economy 98, 519-543. Cox, J., J. Ingersoll and S. Ross, 1985, A theory of the term structure of interest rates, Econometrica 53,385-407. Crowder, W.J., 1994, Foreign exchange market efficiency and common stochastic trends, Journal of International Money and Finance 13, 551-564. Danielsson, J., 1993, Multivariate stochastic volatility (mimeo, University of Iceland). Danielsson, J., 1994, Stochastic volatility in asset prices: Estimation with simulated maximum likelihood, Journal of Econometrics 64, 375-400. Danielsson, J. and J.F. Richard, 1993, Accelerated Gaussian importance sampler with application to dynamic latent variable models, Journal of Applied Econometrics 8, S153-S174. Das, S.R., 1993, Jump-hunting interest rates (mimeo, New York University). De Jong and N. Shephard, 1993, Efficient sampling from the smoothing density in time series models (mimeo, Nuffield College). De Lima, P. and N. Crato, 1994, Long range dependence in the conditional variance of stock retums, Economic Letters 45, 281-285. De Lima, P. and F.J. Breidt and N. Crato, 1994, Modelling long-memory stochastic volatility (mimeo, Johns Hopkins University). Den Haan, W.J. and S. Spear, 1994, Credit conditions, cross-sectional dispersion and ARCH effects in a dynamic equilibrium model (mimeo, University of California, San Diego). Den Haan, W.J. and A. Levin, 1994, Inferences from parametric and non-parametric covariance matrix estimation procedures (mimeo, University of California, San Diego). Diebold, F.X., 1986, Modeling the persistence of conditional variances: A comment, Econometric Reviews 5, 51-56. Diebold, F.X., 1988, Empirical modeling of exchange rate dynamics, (Springer-Verlag, New York). Diebold, F.X. and J.A. Lopez, 1995, ARCH models, in: K. Hoover (ed.), Macroeconometrics: Developments, tensions and prospects (Kluwer Publishing Co). Diebold, F.X. and M. Nerlove, 1989, The dynamics of exchange rate volatility: A multivariate latent-factor ARCH model, Journal of Applied Econometrics 4, 1-22. Diebold, F.X. and J. Nason, 1990, Nonparametric exchange rate prediction, Journal of International Economics 28,315-322. Diebold, F.X. and G.D. Rudebusch, 1989, Long memory and persistence in aggregate output, Journal of Monetary Economics 24, 189-209. Diebold, F.X. and T. Schuermann, 1996, Exact maximum likelihood estimation of observation-driven econometric models, in: R.S. Mariano, M. Weeks and T. 
Schuermarm (eds.), Simulation-based inference in econometrics: Methods and applications, (Cambridge University Press, Cambridge) (forthcoming). Ding, Z., C.W.J. Granger and R.F. Engle, 1993, A long memory property of stock market returns and a new model, Journal of Empirical Finance 1, 83-105. Drost, F.C., and C.A.J. Klassen, 1994, Adaptivity in semiparametric GARCH models (mimeo, University of Tilburg). Drost, F.C. and T.E. Nijman, 1993, Temporal aggregation of GARCH processes, Econometrica 61, 909-927. Duffle, D. and R. Kay, 1993, A yield-factor model of interest rates (mimeo, Graduate School of Business, Stanford University). 96 A. Pagan/Journal of Empirical Finance 3 (1996) 15-102 Duffle, P. and K.J. Singleton, 1993, Simulated moments estimation of Markov models of asset prices, Econometrica 61,929-952. Dufour, J.M. and M.L. King, 1991, Optimal invariant tests for the autocorrelation coefficient in linear regressions with stationary or non-stationary AR(1) errors, Journal of Econometrics 47, 115-143. Durbin, J., 1959, Efficient estimation of parameters in moving-average models, Biometrika 46, 306-316. Durlauf, S.N. and R.E. Hall, 1989, A signal extraction approach to recovering noise in expectations based models (mimeo, Stanford University). Dybvig, P.H., 1989, Bonds and bond option pricing based on the current term structure, Working paper, Washington University (St. Louis). Elliott, G., T.J. Rothenberg and J.H. Stock, 1992, Efficient tests for an autoregressive unit root (mimeo, Harvard University). Engel, C. and J.D. Hamilton, 1990, Long swings in the exchange rate: Are they in the data and do markets "know it, American Economic Review 80, 689-713. Engle, R.F., 1982, Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation, Econometrica 50, 987-1008. Engle, R.F. and T. Bollerslev, 1986, Modelling the persistence of conditional variances, Econometric Reviews 5, 1-50. Engle, R.F. and G. Gonzalez-Rivera, 1991, Semiparametric ARCH models, Journal of Business and Economic Statistics 9, 345-360. Engle, R.F. and K.F. Kroner, 1995, Multivariate simultaneous generalized GARCH, Econometric Theory I 1, 122-150. Engle, R.F. and G.G.J. Lee, 1992, A permanent and transitory component model of stock return volatility (mimeo, University of California at San Diego). Engle, R.F. and G.G.J. Lee, 1994, Estimating diffusion models of stochastic volatility (mimeo, University of California at San Diego). Engle, R.F. and V.K. Ng, 1993, Measuring and testing the impact of news on volatility, Journal of Finance 48, 1749-1778. Engle, R.F., T. Ito and W.L. Lin, 1990a, Meteor showers or heat waves? Heteroskedastic intra-daily volatility in the foreign exchange market, Econometrica 58 525-542. Engle, R.F., D.M. Lillien and R.P. Robins, 1987, Estimating time varying risk premia in the term structure, Econometrica 55, 391-407. Engle, R.F., V.K. Ng and M. Rothchild, 1990b, Asset pricing with a factor-ARCH covariance structure: Empirical estimates for treasury bills, Journal of Econometrics 45, 213-237. Evans, M.D.D. and K.L. Lewis, 1994, Do stationary risk premia explain it all? Evidence from the term structure, Journal of Monetary Economics 33, 285-318. Fama, E.F., 1976, Foundations of Finance (Basic Books, New York). Fama, E.F. and K.R. French, 1988a, Permanent and temporary components of stock prices, Journal of Political Economy 96, 246-273. Fama, E.F. and K.R. French, 1988b, Dividend yields and expected stock returns, Journal of Political Economy 96, 246-273. Fan, J., N.E. 
Feinstone, L.J., 1987, Minute by minute: Efficiency, normality and randomness in intra-daily asset prices, Journal of Applied Econometrics 2, 193-214.
Fisher, M., D. Nychka and D. Zervos, 1994, Fitting the term structure of interest rates with smoothing splines (mimeo, Board of Governors of the Federal Reserve System).
Flavin, M., 1983, Excess volatility in the financial markets: A reassessment of the empirical evidence, Journal of Political Economy 91, 89-111.
French, K.R., G.W. Schwert and R. Stambaugh, 1987, Expected stock returns and volatility, Journal of Financial Economics 19, 3-30.
Friedman, B.M. and D.I. Laibson, 1989, Economic implications of extraordinary movements in stock prices, Brookings Papers on Economic Activity 2/89, 137-189.
Fuhrer, J.C., G. Moore and S. Schuh, 1995, Maximum likelihood versus generalized method of moments estimation of the linear-quadratic inventory model, Journal of Monetary Economics 35, 115-157.
Gallant, A.R., 1981, On the bias in flexible functional forms and an essentially unbiased form: The Fourier flexible form, Journal of Econometrics 15, 211-244.
Gallant, A.R. and G. Tauchen, 1990, A nonparametric approach to nonlinear time series analysis: Estimation and simulation, IMA Volumes in Mathematics and its Applications (Springer-Verlag).
Gallant, A.R. and G. Tauchen, 1992, Which moments to match? (mimeo, Duke University).
Gallant, A.R., L.P. Hansen and G. Tauchen, 1990, Using conditional moments of asset payoffs to infer the volatility of intertemporal marginal rates of substitution, Journal of Econometrics 45, 141-179.
Gallant, A.R., D.A. Hsieh and G. Tauchen, 1991, On fitting a recalcitrant series: The pound/dollar exchange rate, 1974-83, in: W.A. Barnett, J. Powell and G.E. Tauchen (eds.), Nonparametric and semiparametric methods in econometrics and statistics (Cambridge University Press, Cambridge).
Gallant, A.R., D. Hsieh and G. Tauchen, 1994, Estimation of stochastic volatility models with diagnostics (mimeo, Duke University).
Gallant, A.R., P.E. Rossi and G. Tauchen, 1992, Stock prices and volume, Review of Financial Studies 5, 199-242.
Gennotte, G. and T.A. Marsh, 1992, Variations in economic uncertainty and risk premiums on capital markets (mimeo, University of California at Berkeley).
Geweke, J., 1994, Bayesian comparison of econometric models (mimeo, University of Minnesota).
Ghose, D. and K.F. Kroner, 1994, The relationship between GARCH and symmetric stable processes: Finding the source of fat tails in financial data, Journal of Empirical Finance (forthcoming).
Glosten, L.R., R. Jagannathan and D. Runkle, 1993, On the relation between the expected value and the volatility of the nominal excess return on stocks, Journal of Finance 48, 1779-1802.
Gouriéroux, C. and A. Monfort, 1992, Qualitative threshold ARCH models, Journal of Econometrics 52, 159-199.
Gouriéroux, C., A. Monfort and E. Renault, 1993, Indirect inference, Journal of Applied Econometrics 8, S85-S118.
Gouriéroux, C. and O. Scaillet, 1994, Estimation of the term structure from bond data, Working paper No. 9415, CEPREMAP.
Gregory, A. and M. Veall, 1985, Formulating Wald tests of non-linear restrictions, Econometrica 53, 1465-1468.
Hamilton, J.D., 1988, Rational-expectations econometric analysis of changes in regime: An investigation of the term structure of interest rates, Journal of Economic Dynamics and Control 12, 385-423.
Hamilton, J.D., 1989, A new approach to the economic analysis of nonstationary time series and the business cycle, Econometrica 57, 357-384.
Hamilton, J.D., 1990, Analysis of time series subject to changes in regime, Journal of Econometrics 45, 39-70.
Hamilton, J.D. and R. Susmel, 1994, Autoregressive conditional heteroskedasticity and changes in regime, Journal of Econometrics 64, 307-333.
Hamilton, J.D. and C.H. Whiteman, 1985, The observable implications of self-fulfilling expectations, Journal of Monetary Economics 16, 353-373.
Hansen, B.E., 1990, Lagrange multiplier tests for parameter instability in non-linear models (mimeo, University of Rochester).
Hansen, B.E., 1994, Autoregressive conditional density estimation, International Economic Review 35, 705-730.
Hansen, L.P., 1982, Large sample properties of generalized method of moments estimators, Econometrica 50, 1029-1054.
Hansen, L.P. and R.J. Hodrick, 1980, Forward exchange rates as optimal predictors of future spot rates: An econometric investigation, Journal of Political Economy 88, 829-853.
Hansen, L.P. and R. Jagannathan, 1991, Implications of security market data for models of dynamic economies, Journal of Political Economy 99, 225-262.
Hansen, L.P. and K.J. Singleton, 1982, Generalized instrumental variables estimation of non-linear rational expectations models, Econometrica 50, 1269-1285.
Hansen, L.P. and K.J. Singleton, 1990, Efficient estimation of linear asset pricing models with moving-average errors, NBER Technical paper No. 86.
Härdle, W., 1990, Applied non-parametric regression (Cambridge University Press, Cambridge).
Harrison, P., 1994, Are all financial time-series alike? (mimeo, Duke University).
Harvey, A.C. and N. Shephard, 1993, The econometrics of stochastic volatility (mimeo, London School of Economics).
Harvey, A., E. Ruiz and E. Sentana, 1992, Unobserved component time series models with ARCH disturbances, Journal of Econometrics 52, 129-157.
Harvey, A., E. Ruiz and N. Shephard, 1994, Multivariate stochastic variance models, Review of Economic Studies 61, 247-264.
Harvey, C., 1991, The world price of covariance risk, Journal of Finance 46, 111-157.
Hejazi, W., 1994, Are term premia stationary? (mimeo, University of Toronto).
Hentschel, L., 1991, The absolute value GARCH model and the volatility of U.S. stock returns (mimeo, Princeton University).
Hentschel, L., 1994, All in the family: Nesting asymmetric GARCH models (paper given to the Econometric Society Winter Meeting, Washington DC).
Higgins, M.L. and A.K. Bera, 1992, A class of nonlinear ARCH models: Properties, testing and applications, International Economic Review 33, 137-158.
Hill, B.M., 1975, A simple general approach to inference about the tail of a distribution, Annals of Statistics 3, 1163-1174.
Hodrick, R.J., 1991, Dividend yields and expected stock returns: Alternative procedures for inference and measurement, Finance Department working paper No. 88, Northwestern University.
Hols, M.C.A.B. and C. de Vries, 1991, The limiting distribution of extremal exchange rate changes, Journal of Applied Econometrics 6, 287-302.
Hsieh, D.A., 1989a, Modeling heteroskedasticity in daily foreign exchange rates, Journal of Business and Economic Statistics 7, 307-317.
Hsieh, D.A., 1989b, Testing for nonlinear dependence in daily foreign exchange rates, Journal of Business 62, 339-368.
Huang, R.D. and C.S.Y. Lin, 1990, An analysis of nonlinearities in term premiums and forward rates (mimeo, Vanderbilt University).
Hutchinson, J.M., A.W. Lo and T. Poggio, 1994, A nonparametric approach to pricing and hedging derivative securities via learning networks, Journal of Finance 49, 851-889.
Jacquier, E., N.G. Polson and P.E. Rossi, 1994, Bayesian analysis of stochastic volatility models, Journal of Business and Economic Statistics 12, 57-80.
Johansen, S., 1988, Statistical analysis of cointegration vectors, Journal of Economic Dynamics and Control 12, 231-254.
Jorion, P., 1988, On jump processes in the foreign exchange and stock markets, Review of Financial Studies 1, 427-445.
Kasa, K., 1992, Common stochastic trends in international stock markets, Journal of Monetary Economics 29, 95-124.
Kearns, P., 1990a, Testing speculative bubbles in interest rates with the present value model (mimeo, University of Rochester).
Kearns, P., 1990b, Non-linearities in the term structure (mimeo, University of Rochester).
Kearns, P., 1992, Pricing interest rate derivative securities when volatility is stochastic (mimeo, University of Rochester).
Kearns, P., 1993, Volatility and the pricing of interest rate derivative claims (unpublished Ph.D. thesis, University of Rochester).
Kearns, P. and A.R. Pagan, 1992, Estimating the density tail index for financial series (mimeo, Australian National University).
Kearns, P. and A.R. Pagan, 1993, Australian stock market volatility: 1875-1987, Economic Record 69, 163-178.
Kim, S. and N. Shephard, 1994, Stochastic volatility: Likelihood inference and comparison with ARCH models (mimeo, Nuffield College, Oxford).
King, M.L., 1985, A point optimal test for autoregressive disturbances, Journal of Econometrics 27, 21-37.
Kleidon, A.W., 1986, Variance bounds tests and stock price valuation models, Journal of Political Economy 94, 953-1001.
Kloeden, P. and E. Platen, 1992, The numerical solution of stochastic differential equations (Springer-Verlag).
Knez, P., R. Litterman and J. Scheinkman, 1989, Explorations into factors explaining money market returns, Discussion paper No. 6, Goldman Sachs and Co.
Kocherlakota, N., 1990, On tests of representative consumer asset pricing models, Journal of Monetary Economics 26, 285-304.
Koedijk, K.G., F.G.J.A. Nissen, P.C. Schotman and C.C.P. Wolff, 1993, The dynamics of short-term interest rate volatility reconsidered (mimeo, Limburg Institute of Financial Economics).
Koop, G., 1994, An objective Bayesian analysis of common stochastic trends in international stock prices and exchange rates, Journal of Empirical Finance 1, 343-364.
Kwiatkowski, D., P.C.B. Phillips, P. Schmidt and Y. Shin, 1992, Testing the null hypothesis of stationarity against the alternative of a unit root, Journal of Econometrics 54, 159-178.
Lamoureux, C.G. and W.D. Lastrapes, 1990, Persistence in variance, structural change and the GARCH model, Journal of Business and Economic Statistics 8, 225-234.
Lee, J. and M.L. King, 1993, A locally most mean powerful based score test for ARCH and GARCH regression disturbances, Journal of Business and Economic Statistics 11, 17-27.
Lee, S.W., 1992, Asymptotic properties of the maximum likelihood estimator of the GARCH-M and IGARCH-M models (mimeo, University of Rochester).
Lee, S.W. and B.E. Hansen, 1991, Asymptotic properties of the maximum likelihood estimator and test of the stability of parameters of the GARCH and IGARCH models (mimeo, University of Rochester).
Lee, S.W. and B.E. Hansen, 1994, Asymptotic theory for the GARCH(1,1) quasi-maximum likelihood estimator, Econometric Theory 10, 29-52.
LeRoy, S. and W.R. Parke, 1987, Stock price volatility: A test based on the geometric random walk (mimeo, University of California at Santa Barbara).
Lin, W.L., 1992, Alternative estimators for factor GARCH models - A Monte Carlo comparison, Journal of Applied Econometrics 7, 259-279.
Linton, O., 1992, The shape of the risk premium: Evidence from a semi-parametric, mean-exponential GARCH model (mimeo, Nuffield College, Oxford).
Linton, O., 1993, Adaptive estimation in ARCH models, Econometric Theory 9, 539-569.
Lo, A.W., 1991, Long-term memory in stock market prices, Econometrica 59, 1279-1313.
Lo, A.W. and A.C. MacKinlay, 1989, The size and power of the variance ratio test in finite samples: A Monte Carlo investigation, Journal of Econometrics 40, 203-238.
Loretan, M. and P.C.B. Phillips, 1994, Testing the covariance stationarity of heavy-tailed time series: An overview of the theory with applications to several financial data sets, Journal of Empirical Finance 1, 211-248.
Lumsdaine, R.L., 1991, Asymptotic properties of the maximum likelihood estimator in GARCH(1,1) and IGARCH(1,1) models (mimeo, Princeton University).
Lye, J.N. and V.L. Martin, 1991, Modelling and testing stationary exchange rate distributions: The case of the generalized Student's t distribution (mimeo, University of Melbourne).
McCulloch, J.H., 1971, Measuring the term structure of interest rates, Journal of Business 44, 19-31.
McCulloch, J.H., 1989, U.S. term structure data, 1946-1987, Handbook of monetary economics, 672-715.
McDonald, J. and J. Darroch, 1983, Consistent estimation of equations with composite moving average disturbance terms, Journal of Econometrics 23, 253-267.
Mahieu, R. and P. Schotman, 1994a, Stochastic volatility and the distribution of exchange rate news (mimeo, University of Limburg, Maastricht).
Mahieu, R. and P. Schotman, 1994b, Neglected common factors in exchange rate volatility, Journal of Empirical Finance 1, 279-311.
Mandelbrot, B., 1963, The variation of certain speculative prices, Journal of Business 36, 394-419.
Mankiw, N.G., D. Romer and M.D. Shapiro, 1985, An unbiased re-examination of stock market volatility, Journal of Finance 40, 677-687.
Mao, C.S., 1990, Hypothesis testing and finite sample properties of generalized method of moments estimators: A Monte Carlo study (mimeo, Federal Reserve Bank of Richmond).
Melino, A. and S. Turnbull, 1990, Pricing foreign currency options with stochastic volatility, Journal of Econometrics 45, 239-265.
Merton, R., 1973, An intertemporal capital asset pricing model, Econometrica 41, 867-888.
Milstein, G.N., 1974, Approximate integration of stochastic differential equations, Theory of Probability and its Applications 19, 557-562.
Nelson, C.R. and M.J. Kim, 1993, Predictable stock returns: The role of small sample bias, Journal of Finance 48, 641-661.
Nelson, C.R. and R. Startz, 1990, The distribution of the instrumental variables estimator and its t-ratio when the instrument is a poor one, Journal of Business 63, S125-S140.
Nelson, D.B., 1988, The time series behaviour of stock market volatility and returns (unpublished Ph.D. thesis, M.I.T.).
Nelson, D.B., 1989, Modeling stock market volatility changes, Proceedings of the American Statistical Association, Business and Economic Statistics Section.
Nelson, D.B., 1990a, Stationarity and persistence in the GARCH(1,1) model, Econometric Theory 6, 318-334.
Nelson, D.B., 1990b, A note on the normalized residuals from ARCH and stochastic volatility models (mimeo, University of Chicago).
Nelson, D.B., 1990c, ARCH models as diffusion approximations, Journal of Econometrics 45, 7-38.
Nelson, D.B., 1991, Conditional heteroskedasticity in asset returns: A new approach, Econometrica 59, 347-370.
Nelson, D.B. and D.P. Foster, 1992, Filtering and forecasting with misspecified ARCH models II: Making the right forecast with the wrong model (mimeo, Graduate School of Business, University of Chicago).
Newey, W.K., 1985, Maximum likelihood specification testing and conditional moment tests, Econometrica 53, 1047-1070.
Newey, W.K. and D.G. Steigerwald, 1996, Consistency of the quasi-MLE for models with conditional heteroskedasticity, Econometric Theory (forthcoming).
Newey, W.K. and K.D. West, 1987, A simple positive semi-definite heteroskedasticity and autocorrelation consistent covariance matrix, Econometrica 55, 703-708.
Nieuwland, F.G.M.C., W.F.C. Verschoor and C.C.P. Wolff, 1994, Stochastic trends and jumps in EMS exchange rates, Journal of International Money and Finance 13, 699-727.
Nyblom, J., 1989, Testing the constancy of parameters over time, Journal of the American Statistical Association 84, 223-230.
Pagan, A.R., 1975, A note on the extraction of components from time series, Econometrica 43, 163-168.
Pagan, A.R. and Y.S. Hong, 1991, Non-parametric estimation and the risk premium, in: W. Barnett, J. Powell and G. Tauchen (eds.), Semiparametric and nonparametric methods in econometrics and statistics (Cambridge University Press, Cambridge).
Pagan, A.R. and Y. Jung, 1993, Understanding the failure of some instrumental variables estimators (mimeo, Australian National University).
Pagan, A.R. and H. Sabau, 1992, Consistency tests for heteroskedasticity and risk models, Estudios Económicos 7, 3-30.
Pagan, A.R. and G.W. Schwert, 1990a, Alternative models for conditional stock volatility, Journal of Econometrics 45, 267-290.
Pagan, A.R. and G.W. Schwert, 1990b, Testing for covariance stationarity in stock market data, Economics Letters 33, 165-170.
Pagan, A.R. and A. Ullah, 1988, The econometric analysis of models with risk terms, Journal of Applied Econometrics 3, 87-105.
Pagan, A.R. and A. Ullah, 1995, Non-parametric econometrics (unpublished manuscript, Australian National University).
Pagan, A.R. and F.X. Vella, 1989, Diagnostic tests for models based on individual data, Journal of Applied Econometrics 4, S29-S59.
Pagan, A.R., V. Martin and A.D. Hall, 1995, Modelling the term structure, Working paper in Economics and Econometrics No. 284, Australian National University.
Pearson, N.D. and T.-S. Sun, 1994, Exploiting the conditional density in estimating the term structure: An application to the Cox, Ingersoll and Ross model, Journal of Finance 49, 1279-1304.
Pesaran, M.H. and B. Pesaran, 1993, A simulation approach to the problem of computing Cox's statistic for testing non-nested models, Journal of Econometrics 57, 377-392.
Phillips, P.C.B. and J.Y. Park, 1988, On the formulation of Wald tests of nonlinear restrictions, Econometrica 56, 1065-1083.
Pindyck, R.S., 1984, Risk, inflation and the stock market, American Economic Review 74, 335-351.
Rich, R., J. Raymond and J.S. Butler, 1991, Generalized instrumental variables estimation of autoregressive conditional heteroskedastic models, Economics Letters 35, 179-185.
Richardson, M. and J.H. Stock, 1989, Drawing inferences from statistics based on multiyear asset returns, Journal of Financial Economics 25, 323-348.
Rosenberg, B., 1985, Prediction of common stock betas, Journal of Portfolio Management 11, 5-14.
Ross, S.A., 1976, The arbitrage theory of capital asset pricing, Journal of Economic Theory 13, 341-360.
Ruiz, E., 1994, Quasi-maximum likelihood estimation of stochastic volatility models, Journal of Econometrics 63, 289-306.
Scheinkman, J. and B. LeBaron, 1989, Nonlinear dynamics and stock returns, Journal of Business 62, 311-337.
Schotman, P. and H.K. van Dijk, 1991, A Bayesian analysis of the unit root in real exchange rates, Journal of Econometrics 49, 195-238.
Schwarz, G., 1978, Estimating the dimension of a model, Annals of Statistics 6, 461-464.
Schwert, G.W., 1990, Indexes of stock prices from 1802 to 1987, Journal of Business 63, 399-426.
Schwert, G.W. and P.J. Seguin, 1990, Heteroskedasticity in stock returns, Journal of Finance 45, 1129-1155.
Sentana, E., 1991, Quadratic ARCH models: A potential re-interpretation of ARCH models (mimeo, London School of Economics).
Shea, G.S., 1989, Testing stock market efficiency with volatility statistics: Some exact finite sample results (mimeo, Pennsylvania State University).
Shea, G.S., 1992, Benchmarking the expectations hypothesis of the interest rate term structure: An analysis of cointegration vectors, Journal of Business and Economic Statistics 10, 347-366.
Shephard, N., 1994, Partial non-Gaussian state space, Biometrika 81, 115-131.
Shiller, R.J., 1979, The volatility of long-term interest rates and expectations models of the term structure, Journal of Political Economy 87, 1190-1219.
Shiller, R.J., 1981, Do stock prices move too much to be justified by subsequent changes in dividends?, American Economic Review 71, 421-436.
Silverman, B.W., 1986, Density estimation for statistics and data analysis (Chapman and Hall, New York).
Sola, M. and A. Timmermann, 1994, Fitting the moments: A comparison of ARCH and regime switching models for daily stock returns (mimeo, Birkbeck College).
Sowell, F., 1992, Maximum likelihood estimation of stationary univariate fractionally integrated time series models, Journal of Econometrics 53, 165-188.
Spanos, A., 1986, Statistical foundations of econometric modelling (Cambridge University Press, Cambridge).
Staiger, D. and J.H. Stock, 1993, Instrumental variables regression with weak instruments, NBER Technical working paper No. 151.
Steigerwald, D.G., 1992, Semi-parametric estimation in financial models with time varying variances (mimeo, University of California, Santa Barbara).
Stock, J.H. and M.W. Watson, 1988, Testing for common trends, Journal of the American Statistical Association 83, 1097-1107.
Summers, L.H., 1986, Does the stock market rationally reflect fundamental values?, Journal of Finance 41, 591-601.
Tanaka, K., 1990, Testing for a moving average unit root, Econometric Theory 6, 445-458.
Tauchen, G., 1985, Diagnostic testing and evaluation of maximum likelihood models, Journal of Econometrics 30, 415-443.
Tauchen, G., 1986, Statistical properties of generalized method-of-moments estimators of structural parameters obtained from financial market data, Journal of Business and Economic Statistics 4, 397-425.
Tauchen, G. and M. Pitts, 1983, The price variability-volume relationship on speculative markets, Econometrica 51, 485-505.
Taylor, S.J., 1986, Modelling financial time series (John Wiley, Chichester).
Taylor, S.J., 1990, Modelling stochastic volatility (mimeo, University of Lancaster).
Timmermann, A., 1995, Cointegration tests of present value models with a time-varying discount factor, Journal of Applied Econometrics 10.
Tsay, R., 1986, Nonlinearity tests for time series, Biometrika 73, 461-466.
Vahid, F. and R. Engle, 1993, Common trends and common cycles, Journal of Applied Econometrics 8, 341-360.
Vasicek, O., 1977, An equilibrium characterization of the term structure, Journal of Financial Economics 5, 177-188.
Watson, G.S., 1964, Smooth regression analysis, Sankhya, Series A, 26, 359-372.
West, K.D., 1988, Dividend innovations and stock price volatility, Econometrica 56, 37-61.
West, K.D., H.J. Edison and D. Cho, 1993, A utility-based comparison of some models of exchange rate volatility, Journal of International Economics 35, 23-45.
White, H., 1980, A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity, Econometrica 48, 817-838.
Wu, P., 1992, Testing fractionally integrated time series, Working paper, Victoria University of Wellington.
Zakoian, J.M., 1994, Threshold heteroskedastic models, Journal of Economic Dynamics and Control 18, 931-955.