Journal of Empirical Finance 3 (1996) 15-102
The econometrics of financial markets
Adrian Pagan
Economics Program, Research School of Social Sciences, Australian National University, Canberra, A.C.T. 0200, Australia
Abstract
The paper provides a survey of the work that has been done in financial econometrics in
the past decade. It proceeds by first establishing a set of stylized facts that are characteristics of financial series and then by detailing the range of techniques that have been
developed to model series which possess these characteristics. Both univariate and multivariate models are considered.
JEL classification: G11; G12; G13; G14
Keywords: Portfolio choice; Asset pricing; Contingent pricing; Futures pricing; Information; Market efficiency
1. Introduction
Financial econometrics has emerged as one of the most vibrant areas of the
discipline in the past decade, featuring an explosion of theoretical and applied
work. Perhaps more so than in any other part of econometrics, the two have gone
together, with the development of sophisticated techniques being driven by the
special features seen in financial data. Understanding these characteristics, and the
problems that they pose for modeling, must therefore be at the top of any agenda purporting to describe "financial econometrics". It is worth thinking back to the econometrics of the fifties and sixties to realize the import of this. Then it would have been common to assume that any series being investigated could be regarded as stationary, independently, identically and (possibly) normally distributed, with
moments that existed. ¹ To be sure, adjustments were sometimes made to allow for "complications", such as deterministically trending variables, but these were all treated as departures from a base model. Such a perspective has been slowly jettisoned by those dealing with financial series. Non-stationarity, a lack of independence, and non-normality have become the characteristics of the standard model, with the earlier set of assumptions now being regarded as the curiosa.
How and why this development came about is the concern of the first section of this paper. The "how" is a mechanical task; explaining "why" is the more interesting one, but we will have to content ourselves with the observation that the models of interest to financial economists depended very heavily upon the validity of these assumptions, so it is scarcely surprising that a major effort was mounted to assess their validity. As examples of this contention witness the fact that mean-variance portfolio models and optimal hedging formulae gave equal attention to the first and second moments of the data; option pricing formulae emphasized the need for a correct description of either the conditional or unconditional density of returns; while wealth choices based on maximizing expected utility $E(U(y_t) \mid F_t)$, where $F_t$ was some information set, focused attention upon the nature of higher order conditional moments. ²
Having established a case for emphasizing features such as dependence and
non-normality, Sections 2 and 3 seek to describe how the characteristics of the
series have been captured by parametric models. Inevitably, this section is heavily
oriented towards the construction, estimation and testing of models that explain the
volatility of returns. Section 4 returns to the task of data description, now
enquiring into what extra information is generated when moving from a univariate
to a multivariate perspective. Our answer will be to concentrate upon locating the
number and nature of the factors that are common to a set of returns. In Section 5,
attention turns to economic models that have been proposed to account for the
regularities seen in financial data. Because of the forward looking nature of
financial markets, most of these models place heavy reliance on inter-temporal
optimization, and this has led to an extensive literature which seeks to discover
how accurate the description of agents' behavior provided by these models is.
Finally, Section 6 provides a short conclusion.
Financial data appears in many forms, each with its own idiosyncrasies. We will generally concentrate upon three representative series - stock prices, interest rates, and exchange rates - ignoring other series such as futures and options prices.

¹ Hence assumptions like $T^{-1}X'X$ tending to a constant as $T \to \infty$ and the prevalence of F-statistics for testing hypotheses.
² In this review we will concentrate upon time series of financial data, ignoring the fact that data is sometimes available upon a cross section at a point in time or even on a panel. There are special econometric problems arising in the latter circumstances, but generally they would appear in any cross section or panel, and are not specific to financial series. It is in the time dimension that financial series are particularly distinct, and it is for this reason that a very special set of techniques has arisen to deal with these features.
Nevertheless, much of what is said about the three series in question applies to
others. The series are the log of the monthly CRSP value weighted price of shares
on the NYSE over the period 1 9 2 5 / 1 2 to 1989/12, the continuously compounded
returns to one, three, six, and nine month zero coupon bonds over the period
1946(12) to 1987(12) (taken from McCulloch, 1989); and the log of the weekly
exchange rate between the $US and the Swiss Franc over the period July 1973 to
August 1985 (used by Engle and Bollerslev, 1986). The choice of the series was
largely governed by their accessibility. Sometimes we will also refer to two much
longer series of daily observations, stock returns computed from the S & P
Composite Price Index, 1928-1987, as adjusted in Gallant et al. (1992) and U.S.
daily stock returns from 1885-1987, constructed by Schwert (1990).
2. Properties of univariate financial series
It is useful to begin by considering what type of models one might adopt for a series $g(y_t)$ where $g(\cdot)$ is some known transformation of a random variable $y_t$, e.g., $g(y_t) = y_t^2$ or $\log(y_t^2)$. Throughout, we will assume that $g(y_t)$ has a zero expected value. Three simple models from time series analysis would be, ³ where $L$ is the lag operator,

$$\text{ARMA}(1,1): \quad g(y_t) = \beta_1 g(y_{t-1}) + e_t + \alpha_1 e_{t-1} \qquad (1)$$

$$\text{FI}: \quad (1 - L)^d g(y_t) = e_t \qquad (2)$$

$$\text{UC}: \quad g(y_t) = g_P(y_t) + g_T(y_t), \qquad (1 - \beta_1 L)\, g_P(y_t) = \eta_t, \qquad g_T(y_t) = \xi_t \qquad (3)$$

where $e_t$, $\eta_t$ and $\xi_t$ all have conditional expectation zero and will be taken to be identically and independently distributed over time. ⁴
The ARMA(1,1) model in (1) has been a work horse of time series econometrics since its popularization by Box and Jenkins in the late sixties. By setting $\beta_1 = 1$ one can produce a process that is integrated of order one, I(1), so rendering $g(y_t)$ no longer covariance stationary. The second model in (2) represents the class of fractionally integrated processes; provided $d < 0.5$ it is a covariance stationary process, but, if $d = 1$, it coincides with an I(1) series. Finally, (3) is an unobserved components model, composed of a permanent ($g_P(y_t)$) and a transitory ($g_T(y_t)$) part, each of which is generated as a stochastic process depending upon innovations $\eta_t$ and $\xi_t$. Generally it is assumed that $\eta_t$ and $\xi_t$ are independent of one another, and this yields a representation for $g(y_t)$ that is ARMA(1,1) but with a negative coefficient $\alpha_1$. ⁵ In some instances $\eta_t$ and $\xi_t$ are assumed proportional and this provides the type of decomposition set out in Beveridge and Nelson (1981).

³ Of course these can all be generalized by allowing for higher order autoregressive and moving average processes, but there is little advantage to doing that in this survey.
⁴ We will write this as $E_{t-1}(v_t) = 0$, where the subscript denotes that the expectation is conditional upon some set of past-dated random variables. Generally these will be the past history of $v_t$.
2.1. Are financial series stationary?
Non-stationarity of a series could occur in many ways but, arising from the
theory of efficient markets, it was natural for researchers to investigate whether
there was a unit root associated with the log prices of financial assets, i.e., whether such a series, defined as $y_t$, was I(1) or not. A huge literature has been spawned
on the question of unit roots, and the resulting tests have been extensively applied
to financial data. These tests can be classified according to their responses to four
issues.
1. Whether the null hypothesis is that the series is I(1) or I(0).
2. The model used to construct a test of this hypothesis.
3. Within a given model, which of the characteristics of an I(1) series is used to
set up a test.
4. Whether the alternative distinguished in the testing procedure is composite or
simple.
Table 1 summarizes the literature according to this four way classification,
culminating in five types of tests that we will focus our attention upon. In the table
Table 1
Classification of unit root tests

Hypoth      Model   Based on     Nature H1   Example    Index
H0: I(1)    ARMA    a.c.f.       comp        ADF        (i)
                                 simp        pt opt     (ii)
                    Var          comp        var rat    (iii)
                                 simp        ?
            FI      Degree (d)   comp        FI tests   (iv)
                                 simp        ?
H0: I(0)    UC      σ²_η         comp        KPSS       (v)
                                 simp        ?
⁵ The error term $e_t$ is now a linear combination of $\sum_{j=0}^{\infty}\phi_j\eta_{t-j}$ and $\sum_{j=0}^{\infty}\psi_j\xi_{t-j}$ and so is no longer independently distributed even though $\eta_t$ and $\xi_t$ are (see McDonald and Darroch, 1983).
there are also question marks which represent what seem to be tests that are not in
the literature to our knowledge. 6
(i) Selecting (1) with $g(y_t) = y_t$ the process becomes

$$y_t = \beta_1 y_{t-1} + u_t, \qquad (4)$$

and we might test $H_0: \beta_1 = 1$ vs. $H_1: \beta_1 < 1$ using the t-ratio from the regression of $\Delta y_t$ on $y_{t-1}$. This leads to tests for a unit root of the Phillips-Perron, Dickey-Fuller (DF) type which feature in many econometric programs.
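As an illustration only (not code from the paper), the DF/ADF t-ratio just described can be computed with the `adfuller` routine in statsmodels; the series `y` below is a simulated random walk standing in for a log price series.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# y: simulated random walk standing in for a log price series
y = np.cumsum(np.random.normal(size=800))

# ADF(4): t-ratio on y_{t-1} in a regression of dy_t on a constant,
# y_{t-1} and 4 lagged differences
res = adfuller(y, maxlag=4, regression="c", autolag=None)
adf_stat, crit_values = res[0], res[4]
print(adf_stat, crit_values)
```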
(ii) Instead of using the composite alternative one might put $H_1: \beta_1 = \beta_1^{*}$ for some specified value $\beta_1^{*}$. In this situation the best test can be derived from the Neyman-Pearson Lemma as a likelihood ratio. This produces the class of point optimal tests examined in Dufour and King (1991) and Elliott et al. (1992). The latter also consider the case of near integration where $\beta_1 = 1 - c/T$, so that, as $T \to \infty$, the alternative collapses towards the null.
(iii) The two tests just distinguished concentrate upon the autocorrelation coefficients of the series $y_t$, $\rho_j(y_t) = \mathrm{cov}(y_t, y_{t-j})/\mathrm{var}(y_t)$, as these are all unity if the process is I(1), so that it is natural to test if $\beta_1$ is unity using $\hat\rho_j(y_t) = \left(\sum_{t=j+1}^{T} y_{t-j}^2\right)^{-1}\sum_{t=j+1}^{T} y_t y_{t-j}$. Instead, one might focus upon other implications of a unit root, in particular the nature of the "variance" of the series. If (1) is a correct representation of the data, $\alpha_1 = 0$ and $e_t$ is assumed i.i.d. $(0, \sigma_e^2)$, then

$$y_t = y_{t-k} + e_t + e_{t-1} + \cdots + e_{t-k+1}, \qquad (5)$$

and the kth long difference of $y_t$, $\Delta_k y_t = y_t - y_{t-k}$, will have variance $k\sigma_e^2$. ⁷ Taking the ratio $h_1 = \mathrm{var}(\Delta_k y_t)/\mathrm{var}(\Delta_1 y_t)$ it would be $k$, and therefore a plot of $h_1$ against $k$ should be an increasing straight line. Alternatively, $h_2 = k^{-1}h_1$ should tend to unity as $k$ increases. If, however, $y_t$ does not have a unit root both $\mathrm{var}(\Delta_k y_t)$ and $\mathrm{var}(\Delta_1 y_t)$ will be constants, so that the ratio $h_1$ does not change after $k$ becomes large enough, while $h_2$ will tend to zero. Cochrane (1988) formulated this idea and gave asymptotic standard errors for $\hat h_2$ when estimated from the data. Lo and MacKinlay (1989) give the asymptotic distribution of $T^{1/2}(\hat h_2 - 1)$ as $N(0, 2(2k-1)(k-1)/3k)$ if $e_t$ is $N(0, \sigma^2)$, but found in simulations that the approximation was poor when $\delta = k/T$ is large. Richardson and Stock (1989) determine the asymptotic distribution when $k \to \infty$, $T \to \infty$, in such a way that $\delta$ is a constant. This turns out to involve functionals of Brownian motion, and the distribution has to be found by simulation. When $\delta = 1/3$ they tabulate critical values of the statistic for various $T$. What is appealing about this test, from the perspective of financial data, is that, if $y_t$ is the log of the price, the kth long difference can be interpreted as the returns to an asset held for $k$ periods (assuming only capital gains and no dividends). A disadvantage of the test is its presumption that $e_t$ is normally distributed and that $\mathrm{var}(\Delta_1 y_t)$ exists. Later in this survey both assumptions come under suspicion.

⁶ There is also a growing literature on Bayesian tests for unit roots that we will not summarize here, but which have sometimes used financial data to illustrate the techniques, e.g., Koop (1994) and Schotman and Van Dijk (1991).
⁷ With some modification, the results that follow also hold if $e_t$ is correlated. Cochrane (1988) deals with the general case.
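A minimal sketch of the $h_2$ variance ratio and the Lo-MacKinlay standard error quoted above, assuming a log price series `y`; for $k$ as large as $T/3$ the Richardson-Stock critical values should be used rather than this normal approximation.

```python
import numpy as np

def variance_ratio(y, k):
    """h2 = var(y_t - y_{t-k}) / (k * var(y_t - y_{t-1})) for a log price series y."""
    d1 = np.diff(y)
    dk = y[k:] - y[:-k]
    h2 = dk.var(ddof=1) / (k * d1.var(ddof=1))
    T = len(d1)
    # Lo-MacKinlay asymptotic standard error of h2 under i.i.d. normal increments
    se = np.sqrt(2.0 * (2 * k - 1) * (k - 1) / (3.0 * k * T))
    z = (h2 - 1.0) / se
    return h2, z
```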
(iv) Instead of the alternative model used to construct tests being ARMA it could be the fractionally integrated process in (2). It is possible to estimate $d$ - see Diebold and Rudebusch (1989) and Sowell (1992) - but probably not very precisely, although it has been done for various series, e.g., Baillie and Bollerslev (1994) with the forward premium. Another strategy is to derive tests such as an LM test or a one-sided locally best invariant (LBI) test that $d$ takes a specified value, in particular whether $d = 1$. Writing the FI model as $(1 - L)^{d-1}\Delta y_t = e_t$ we can test this by checking if $(d - 1)$ is zero. Wu (1992) derives the LBI test for this hypothesis as $-\sum_{k=1}^{T-1}(1/k)\hat\rho_k(\Delta y_t)$. It is probably not a good idea to base a test on a $\hat\rho_k$ that has $k$ close to $T$, suggesting that we instead use $\hat\tau_K = -\sum_{k=1}^{K}(1/k)\hat\rho_k(\Delta y_t)$, with $K$ set to a large value. Under the assumption that $e_t$ is white noise, $T^{1/2}s^{-1}\hat\tau_K$ will be asymptotically N(0,1), where $s^2 = \sum_{k=1}^{K} 1/k^2$. ⁸ One might wish to use a one-tailed test given that the test statistic inherits its LBI properties from its one-sided nature. Another way of interpreting the test is to observe that it looks at the correlation of $\Delta y_t$ with $\sum_{k=1}^{K}(1/k)\Delta y_{t-k}$, and so differs from the DF test in that it seeks to add $\sum_{k=1}^{K}(1/k)\Delta y_{t-k}$ to a regression with $\Delta y_t$ as dependent variable rather than $y_{t-1}$.
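A sketch of the $\hat\tau_K$ statistic just described, under the white-noise assumption on $e_t$; the function name and the default $K = 40$ (the value used for Table 2 below) are illustrative choices, not the author's code.

```python
import numpy as np

def fi_lbi_test(y, K=40):
    """LBI-type test of d = 1: tau_K = -sum_{k=1..K} (1/k) rho_k(dy),
    scaled so that the statistic is asymptotically N(0,1) under white noise."""
    x = np.diff(y)
    x = x - x.mean()
    T = len(x)
    gamma0 = np.sum(x * x) / T
    rho = np.array([np.sum(x[k:] * x[:-k]) / (T * gamma0) for k in range(1, K + 1)])
    tau = -np.sum(rho / np.arange(1, K + 1))
    s = np.sqrt(np.sum(1.0 / np.arange(1, K + 1) ** 2))
    return np.sqrt(T) * tau / s
```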
(v) Changing the model to that of an unobserved components process (3), with $g(y_t) = y_t$, we might test if $\beta_1$ is unity or, if the null hypothesis is taken to be that $y_t$ is an I(0) process, by testing if the variance of $\eta_t$ is zero. If the latter is zero then $y_t = y_{Tt}$, and there is no unit root in the series $y_t$. Tests for this case have been developed by Tanaka (1990) and Kwiatkowski et al. (1992). They are based upon the summation of the squares of the partial sums of the mean corrected prices ($S_t = \sum_{j=1}^{t} y_j$), and the statistic has the form $\mathrm{KPSS} = T^{-2}\left[\sum_{t=1}^{T} S_t^2\right]/\hat v$, where $\hat v$ is an estimate of the "long-run variance", $E[T^{-1}S_T^2]$. As this term is essentially the spectral density for $y_t$ at zero, it is necessary to specify a weighting function for the autocovariances as well as some lag truncation parameter in order to compute it. Note that $\mathrm{KPSS} = \hat v^{-1}\sum_{t=1}^{T}(t/T)^2\hat\mu_t^2$, where $\hat\mu_t = (1/t)\sum_{j=1}^{t} y_j$ is the recursive estimate of the "mean" of $y_t$. Since an I(1) process does not have a mean, $\hat\mu_t$ does not tend to zero as it would for a stationary random variable with $E(y_t) = 0$.
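The KPSS statistic can be computed directly from its definition; the sketch below uses Bartlett weights for the long-run variance, one common choice consistent with the weighting-scheme remark following Table 2.

```python
import numpy as np

def kpss_stat(y, lags=48):
    """KPSS statistic T^{-2} sum_t S_t^2 / v, with S_t the partial sums of the
    demeaned series and v a Bartlett-weighted long-run variance estimate."""
    e = y - y.mean()
    T = len(e)
    S = np.cumsum(e)
    v = np.sum(e * e) / T
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)
        v += 2.0 * w * np.sum(e[j:] * e[:-j]) / T
    return np.sum(S ** 2) / (T ** 2 * v)
```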
Table 2 presents the ADF, variance ratio (at lag length $k = T/3$), KPSS and fractional integration tests for $d = 1$ for the series mentioned earlier in the paper.
⁸ The result comes from the fact that $\hat\rho_j$ and $\hat\rho_k$ are asymptotically uncorrelated with variance $T^{-1}$. The test can be made robust to heteroskedasticity in $e_t$ by replacing $T^{1/2}$ by a robust standard error, i.e., regress $\Delta y_t$ against a constant and $\Delta y_{t-1}$ and use White's standard errors.
Table 2
Tests for a unit root in selected financial series

Series          ρ̂₁        DF        ADF(4)    Var rat   KPSS    FI test
Log share pr    0.999     -0.007     0.07      0.73      1.00    -1.40
Int rate        0.975     -2.26     -1.98      0.23      1.02     0.43
$logUS/SF       0.996     -1.318    -1.564     0.79      1.40    -1.77
Returns         0.111    -24.749   -11.751     0.004     0.08

Series are described in Section 1. ADF(4) is the augmented Dickey-Fuller test with 4 lags.
The variance ratio test is for lag k = T/3; its 0.05 critical value is 0.11 from Richardson and Stock (1989), Table 1.
Critical values for the KPSS test are 0.46 (0.05) and 0.35 (0.1).
To compute the KPSS test the weighting scheme was that used by Kwiatkowski et al. (1992) and 48 lags were used in reflection of the large sample sizes. Conclusions are unaffected by this choice. The test for fractional integration is based on $\hat\tau_{40}$ and robust standard errors are used. One item worth mentioning in connection with the latter test is that it is very sensitive to the inclusion of $\hat\rho_1$. In fact, one might wish to exclude this autocorrelation coefficient on the grounds that a single large autocorrelation coefficient should not be taken as evidence of fractional integration; the essence of fractional integration must be that the persistence only shows up in the cumulated autocorrelations. Doing so changes the test statistics to -0.06, 1.06 and -1.41 respectively. Overall, the evidence for a unit root in asset prices is very strong while returns do not seem to possess one. ⁹
2.2. Are financial series independently distributed over time?
All of this does suggest that there is indeed a unit root in asset prices, but the question of dependence in "returns", $x_t = \Delta y_t$, has not yet been addressed. ¹⁰ There are a number of ways that this question has been investigated.
(a) Since a series cannot be independently distributed if any of the $\rho_j(x_t)$, $j = 1, \ldots, \infty$, are non-zero, this points to computation of the autocorrelation function (a.c.f.) of $x_t$ followed by tests that the serial correlation coefficients are zero. Generally these are found to be zero, except if data has been measured such that there is an overlapping component. ¹¹ As well as returns the same conclusion would be reached for certain "spreads", e.g., those between bid and ask prices.

⁹ Many of the arguments made to account for unit roots involving "breaking trends" do not seem to be as relevant for asset prices where there are good theoretical reasons to expect a unit root.
¹⁰ In the remainder of the paper we will refer to the changes in the logs of stock prices, foreign exchange rates and the level of bond yields as returns. As a one month bond yield is the log of the price of a zero coupon bond this usage seems consistent.
(b) A different viewpoint arises by thinking of the impact of the news $\varepsilon_t$ upon returns. With no dependence, i.e., $x_t = \varepsilon_t$, the short run and long run impact of news is the same, i.e., $\partial x_t/\partial\varepsilon_t = 1$, $\partial x_{t+\infty}/\partial\varepsilon_t = 1$. However, if the $x_t$ process has dependence, e.g., is an MA(1), $\varepsilon_t + \alpha_1\varepsilon_{t-1}$, then $\partial x_t/\partial\varepsilon_t = 1$, $\partial x_{t+\infty}/\partial\varepsilon_t = (1 + \alpha_1)$. In general, for $x_t$ being an MA(q), $\varepsilon_t + \alpha_1\varepsilon_{t-1} + \cdots + \alpha_q\varepsilon_{t-q}$, $\partial x_{t+\infty}/\partial\varepsilon_t = (1 + \alpha_1 + \alpha_2 + \cdots + \alpha_q)$, leading to the idea of testing for dependence by testing if $H_0: \alpha_1 + \alpha_2 + \cdots + \alpha_q = 0$. Testing if the sum of the α's is zero is likely to be more powerful than testing if the individual α's are zero, because only a scalar is being tested and it is more likely to be precisely estimated than any of its components. This has led some to estimate an MA(q) to get $\hat\alpha_1, \hat\alpha_2, \ldots, \hat\alpha_q$ and then to test if the sum is zero - see the review in Christiano and Eichenbaum (1990). Obviously this method has the disadvantage of assuming that the alternative is a moving average process.
(c) Fama and French (1988a) work with the UC model (3). Their idea is to form $y_{t+k} - y_t$ and $y_t - y_{t-k}$, i.e., the kth forward and backward long differences of the series $y_t$, and to regress the former on the latter. With $y_t$ as the log of stock prices, $y_{t+k} - y_t$ is the continuously compounded kth period return and it will be the sum of $k$ one-period returns, ¹² $r_{t+k} = y_{t+k} - y_{t+k-1}$, $r_{t+k-1} = y_{t+k-1} - y_{t+k-2}$, etc., so that $y_{t+k} - y_t = \sum_{j=1}^{k} r_{t+j}$. In large samples the numerator of the regression coefficient will tend to $E\left[\left(\sum_{j=1}^{k} r_{t+j}\right)\left(\sum_{i=0}^{k-1} r_{t-i}\right)\right]$ and Fama and French are therefore testing if this is zero. To appreciate what is being tested set $k = 1, 2, 3$, giving

$$E(r_{t+1} r_t) = \gamma_1 \qquad (k = 1)$$
$$E[(r_{t+1} + r_{t+2})(r_t + r_{t-1})] = \gamma_1 + 2\gamma_2 + \gamma_3 \qquad (k = 2)$$
$$E[(r_{t+1} + r_{t+2} + r_{t+3})(r_t + r_{t-1} + r_{t-2})] = \gamma_1 + 2\gamma_2 + 3\gamma_3 + 2\gamma_4 + \gamma_5 \qquad (k = 3)$$

where $\gamma_j$ are the autocovariances of $r_t$. Thus one is essentially testing if a weighted average of the autocovariances is zero, rather than whether the autocovariances themselves are.
One can think of Fama and French's test as a modification of the LM test for $\mathrm{var}(y_{Tt}) = 0$ in the UC model (3). This is done by observing that the composite error term for $\Delta y_t$ in (3) (with $\beta_1 = 1$ and $g(y_t) = y_t$) is an MA(1), $\nu_t + \alpha\nu_{t-1}$, and, when $\mathrm{var}(y_{Tt}) = \mathrm{var}(\xi_t) = 0$, the true MA(1) coefficient $\alpha$ equals 0. Inverting the MA to produce an infinite AR, $(1 - \alpha L + \alpha^2 L^2 - \alpha^3 L^3 \cdots)x_t = \nu_t$, creates a non-linear regression of the form $x_t = (\alpha x_{t-1} - \alpha^2 x_{t-2} + \alpha^3 x_{t-3} - \cdots) + \nu_t = g(z_t, \alpha) + \nu_t$, and the LM test that $\alpha = 0$ involves examining the covariance between $x_t - g(z_t, \alpha = 0) = x_t$ and $[\partial g(z_t,\alpha)/\partial\alpha]_{\alpha = 0} = \{\partial[(\alpha L - \alpha^2 L^2 + \alpha^3 L^3 \cdots)x_t]/\partial\alpha\}_{\alpha = 0} = x_{t-1}$, i.e., it examines the first order serial correlation coefficient of returns. Now, the LM test is a powerful test if the alternative is in the vicinity of the null, but may be dominated by others if the alternative is far from the null, i.e., $\alpha$ is not close to zero. This idea that one might use information that represents the alternative to improve the performance of the LM test has been exploited in the work of King (1985) on the design of point optimal tests. Under the null hypothesis, it is clear that any combination of past returns should be uncorrelated with current returns, and this indicates that we might form another test based on linear combinations of lagged returns that more closely approximates the alternative. Because the emphasis in Fama and French is upon the importance of the temporary component and, as $\mathrm{var}(y_{Tt})$ becomes large, $\alpha$ tends towards $-1$, this points to an examination of the covariance between $[\partial g(z_t,\alpha)/\partial\alpha]_{\alpha = -1} = -(x_{t-1} + 2x_{t-2} + 3x_{t-3} + \cdots)$ and $x_t$. As this is $\gamma_1 + 2\gamma_2 + 3\gamma_3 + \cdots$ (ignoring the sign), the argument suggests that Fama and French are potentially improving on the LM test by utilizing information about the likely alternative. Ultimately, looked at from a variable addition perspective, one is trying to find suitable regressors to add to the equation describing returns $x_t$, and there are many possibilities, of which Fama and French's is just one set. Power considerations will ultimately determine which variable addition test is best. ¹³

¹¹ Many earnings series exhibit a weak first order dependence because of this measurement feature.
¹² Returns will include the dividend yield so that $y_t$ needs to be re-defined as beginning of period price plus dividend. In fact, the stability of the dividend yield means that most variation in returns is from the capital gains component.
Let us term the coefficient estimated from the kth differenced regression $\hat\beta(k)$. Because the denominator of $\hat\beta(k)$ will tend to $E\left[\left(\sum_{j=1}^{k} r_{t-j}\right)^2\right] = k\gamma_0$ if there is no temporary component, it is clear that $\hat\beta(k)$ will tend to zero as $k$ rises. If there is a temporary component, the denominator still tends to infinity and the numerator remains bounded, so $\hat\beta(k)$ still tends to zero. However, for small $k$, $\hat\beta(k)$ will generally be negative if there is a temporary component, owing to the fact that returns have $\Delta_k y_{Tt}$ in them and $\eta_t$ is uncorrelated. For example, if $y_{Tt}$ was uncorrelated, then the first autocovariance of returns will be $-\mathrm{var}(y_{Tt})$. Hence a temporary (mean-reverting) component makes $\hat\beta(k)$ negative, and a plot of $\hat\beta(k)$ against $k$ should follow a U-shape with $k$. Indeed, Fama and French find that this is so for many stocks they examine (for market, decile and industrial portfolios). $\hat\beta(k)$ can also be used as a measure of the importance of the temporary component. Because the $R^2$ from a regression of $w_t$ on $z_t$ is given by $\hat\beta^2\,\mathrm{var}(z_t)/\mathrm{var}(w_t)$, setting $w_t = y_{t+k} - y_t$ and $z_t = y_t - y_{t-k}$, and using the fact that the variances of these two are equal due to stationarity, it is apparent that the $R^2$ is identical to the square of the estimated coefficient from the regression. If there is no temporary component this should be zero in large samples, so that $\hat\beta(k)$ gives a measure of the importance of the temporary component at the kth period horizon.
Because the $\hat\beta(k)$ are the first order serial correlation coefficients of k-period returns, and it is known that there are small sample biases in this estimated coefficient, Fama and French provide some bias corrections that seem to be quite important. One should also note that, if $\beta(k)$ is zero, then the error in the regression is just the dependent variable $\Delta_k y_t$, and this will be an MA(k-1) when $r_t = \Delta y_t$ is white noise, necessitating the use of an autocorrelation consistent covariance matrix. Alternatively, one can generate standard errors by simulation methods; since the null hypothesis is the absence of temporary components, $\Delta y_t$ can be taken to be white noise and samples can be generated from this specification and the empirical distribution of $\hat\beta(k)$ may be determined. Note that the test is very easily formed if we have available to us regression programs that do forward and backward lags.

¹³ It needs to be emphasized that the LM test for $\alpha = -1$ is the KPSS test described earlier - Tanaka (1990) - so that Fama and French are not performing an LM test that $\alpha = -1$. Rather, they use the information about what the model would look like if $\alpha$ was $-1$ in order to construct a test that $\alpha = 0$.
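A sketch of the Fama-French long-horizon regression coefficient $\hat\beta(k)$ and the simulation approach to its null distribution mentioned above; `y` is assumed to be a log price series (with dividends added back), which is an assumption of this example.

```python
import numpy as np

def fama_french_beta(y, k):
    """Slope from regressing the forward long difference y_{t+k}-y_t on the
    backward long difference y_t - y_{t-k}."""
    w = y[2 * k:] - y[k:-k]      # y_{t+k} - y_t
    z = y[k:-k] - y[:-2 * k]     # y_t - y_{t-k}
    z_dm = z - z.mean()
    return np.sum(z_dm * (w - w.mean())) / np.sum(z_dm ** 2)

# simulate the null (white-noise returns) to obtain a reference distribution,
# as suggested in the text; 768 is roughly the monthly CRSP sample size
null_draws = [fama_french_beta(np.cumsum(np.random.normal(size=768)), 60)
              for _ in range(500)]
```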
Autocorrelation-consistent standard errors for $\hat\beta(k)$ can be computed in a number of ways. For a regression model $y_t = x_t'\beta + u_t$, where $E(x_t u_t) = 0$, the asymptotic variance of $T^{1/2}(\hat\beta - \beta)$ is

$$[E(x_t x_t')]^{-1}\; E\left[T^{-1}\left(\sum_t x_t u_t\right)\left(\sum_t x_t u_t\right)'\right]\; [E(x_t x_t')]^{-1}. \qquad (6)$$

Defining $\phi_t = x_t u_t$, the middle term will be $\Omega_0 + \sum_{j=1}^{k-1}(\Omega_j + \Omega_j')$, where $\Omega_j = E(\phi_t\phi_{t-j}')$, when $\phi_t$ is an MA(k-1). Hansen and Hodrick (1980) used this approach, and it is the basis of Table 3, which presents $\hat\beta(k)$ and t-ratios formed with these asymptotic standard
Table 3
Long return tests for CRSP data

Return length (k)   FF ser. corr.   FF t-stat   Bol/Hod t
12                  -0.002          -0.018      -0.72
24                  -0.159          -1.11       -1.78
36                  -0.261          -1.67       -1.58
48                  -0.200          -1.5        -0.79
60                  -0.089          -0.60       -0.07
72                   0.073           0.43        0.21
84                   0.160           0.76        0.44
96                   0.046           0.18       -0.36

Column 2 gives the first order serial correlation coefficient of the long returns of length given in the first column. Column 3 is the Fama-French (FF) t-statistic constructed with Hansen-Hodrick (1980) standard errors. Column 4 gives the t-ratio that the coefficient in the regression of one period returns $r_{t+1}$ upon $\sum_{j=1}^{2k-1}\omega_j r_{t+1-j}$ is zero.
errors. The evidence for predictability is not strong. Several authors have argued that the asymptotic theory may not be a good approximation, e.g., Nelson and Kim (1993) and Bollerslev and Hodrick (1992), finding that the use of t-ratios formed in this way results in a strong tendency to reject the null that $\beta(k)$ is zero. Richardson and Stock (1989) examine the distribution of sums of $\hat\beta(k)$ as well as that of a joint test that the $\hat\beta(k)$ are zero when $k/T$ tends to a constant as $T \to \infty$. The distribution is non-standard.
Bollerslev and Hodrick suggest an alternative way of measuring the predictability of long-horizon returns that exploits the presumed covariance stationarity of returns. As observed previously, the numerator of $\hat\beta(k)$ involves the covariance

$$E\left[\left(\sum_{j=1}^{k} r_{t+j}\right)\left(\sum_{i=0}^{k-1} r_{t-i}\right)\right] = E\left[r_{t+1}\sum_{j=1}^{2k-1}\omega_j r_{t+1-j}\right], \qquad (7)$$

where $\omega_j = j$ for $1 \le j \le k$ and $\omega_j = 2k - j$ for $(k+1) \le j \le (2k-1)$. Thus, when $k = 3$, one is considering the covariance of $r_{t+1}$ with $(r_t + 2r_{t-1} + 3r_{t-2} + 2r_{t-3} + r_{t-4})$, which equals $\gamma_1 + 2\gamma_2 + 3\gamma_3 + 2\gamma_4 + \gamma_5$, and, as was established earlier, represents the numerator of $\hat\beta(k)$ in large samples. A simple way to see the above for $k = 3$ is to write the LHS of (7) as

$$\mathrm{cov}\left[\{(1 + L^{-1} + L^{-2})r_{t+1}\}, \{(1 + L + L^2)r_t\}\right]$$
$$= \mathrm{cov}\left[r_{t+1}, (L^{-2} + 2L^{-1} + 3 + 2L + L^2)r_t\right]$$
$$= \mathrm{cov}\left[L^{-2}r_{t+1}, \{1 + 2L + 3L^2 + 2L^3 + L^4\}r_t\right].$$

Hence they advocate a regression of $r_{t+1}$ upon $\sum_{j=1}^{2k-1}\omega_j r_{t+1-j}$, and a t-test on the coefficient of this regressor, observing that the error term in this regression is serially uncorrelated, and so no special formula needs to be used to take account of the serial correlation. Simulations in their paper reveal this test to be much better behaved in "small" samples. Table 3 gives t-statistics that the coefficient in the regression of $r_{t+1}$ against $\sum_{j=1}^{2k-1}\omega_j r_{t+1-j}$ is zero for various values of $k$. The pattern turns out to be very similar to that observed with Fama and French's method and one would draw essentially the same conclusions regardless of which type of test one adopted.
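A sketch of the Bollerslev-Hodrick regression just described, with the triangular weights $\omega_j$ built explicitly; the plain OLS t-ratio is used because, as noted above, the regression error is serially uncorrelated under the null. The function name and interface are illustrative assumptions.

```python
import numpy as np

def bollerslev_hodrick_t(r, k):
    """OLS t-ratio on the coefficient from regressing r_{t+1} on
    sum_{j=1}^{2k-1} w_j r_{t+1-j}, w_j = j for j <= k and w_j = 2k - j otherwise."""
    w = np.array([j if j <= k else 2 * k - j for j in range(1, 2 * k)], dtype=float)
    z, dep = [], []
    for t in range(2 * k - 1, len(r)):
        z.append(np.dot(w, r[t - 1::-1][:2 * k - 1]))   # weighted sum of the 2k-1 latest lags
        dep.append(r[t])
    z, dep = np.array(z), np.array(dep)
    X = np.column_stack([np.ones_like(z), z])
    beta = np.linalg.lstsq(X, dep, rcond=None)[0]
    u = dep - X @ beta
    covb = np.linalg.inv(X.T @ X) * (u @ u) / (len(dep) - 2)
    return beta[1] / np.sqrt(covb[1, 1])
```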
Even if our series on (say) earnings had passed all the tests described above, it would still not follow that the series would be independent. The reason is that they are essentially tests of a lack of correlation in returns across time, and independence is a broader notion, referring to the ability to express a joint density as the product of marginals. One way to think of the relation of independence and correlation is to refer to "independence in the rth moment". Thus what we are investigating above is independence in the first moment, whereas complete independence requires that this be true for all other moments as well. In particular, let $x_t$ and $x_{t-j}$ ($j \neq 0$) be independent with zero means. Then independence
Table 4
Measures of non-linearity in series

ACF         STK-M    t      BOND     t      $US/SF   t      STK-D    t
Lag 1       0.292    8.1    0.298    6.5    0.114    2.9    0.189    24.0
Lag 2       0.159    4.4    0.308    6.8    0.118    3.0    0.174    22.1
Lag 3       0.225    6.3    0.207    4.6    0.169    4.3    0.140    17.8
Lag 4       0.122    3.4    0.275    6.1    0.212    5.4    0.092    11.7
Lag 5       0.060    1.7    0.200    4.4    0.083    2.1    0.142    18.0
Lag 6       0.093    2.6    0.251    5.6    0.110    2.8    0.113    14.4
Lag 7       0.100    2.8    0.285    6.3    0.161    4.1    0.076     9.6
Lag 8       0.339    9.5    0.212    4.7    0.012    0.3    0.083    10.5
Lag 9       0.335    9.4    0.173    3.9    0.102    2.6    0.076     9.6
Lag 10      0.201    5.6    0.158    3.6    0.102    0.1    0.074     9.0
LM test     429.5           301.6           111.9
BDS test      8.88           22.74            6.88
Tsay test     3.35            9.45            1.02

The BDS test was constructed by setting γ to 1.25 times the standard deviation of the series and using N = 4 histories. The test should be referred to an N(0,1) random variable. The Tsay test uses p = 5 lags and is referred to an F distribution with pm = p(p+1)/2 and T-pm-p-1 degrees of freedom. For p = 5 the 5% critical value is approximately 2.07 (T = ∞). The LM test is T times the sum of the squared acf values for 12 lags. The 5% critical value for χ²(12) is 21.02.
between $x_t$ and $x_{t-j}$ means that $E[g(x_t)h(x_{t-j})] = E[g(x_t)]E[h(x_{t-j})]$, implying that $\mathrm{cov}(g(x_t), h(x_{t-j})) = 0$ for any measurable functions $g(\cdot)$ and $h(\cdot)$. Of course the number of functions is vast, making it impossible to truly test independence with a finite sample. One could replace $g(\cdot)$ and $h(\cdot)$ by polynomial approximations, e.g., $g(x_t) = \alpha_0 + \alpha_1 x_t + \alpha_2 x_t^2 + \cdots$, $h(x_{t-j}) = \delta_0 + \delta_1 x_{t-j} + \delta_2 x_{t-j}^2 + \cdots$, and then test if all the pairwise terms, $\mathrm{cov}(x_t^k x_{t-j}^{\ell})$, are zero, allowing the order of approximating polynomials to tend to infinity with the sample size. Such non-parametric tests have been suggested in the literature but rarely used - see Cameron and Trivedi (1993) for such an approach. Instead, attention has focused upon whether $\mathrm{cov}(x_t^k x_{t-j}^{\ell})$ is zero for certain specific values of $k$ and $\ell$. Of particular interest have been the choices of $k = \ell = 2$ and $k = 2$, $\ell = 1$. These hypotheses could be tested in one of two ways. The first involves regressing $x_t^2$ against a constant and one of $x_{t-j}^2$ and $x_{t-j}$, while the second forms either $x_t^2 x_{t-j}^2$ or $x_t^2 x_{t-j}$ and regresses these against a constant, testing if the intercept is zero. Table 4 presents these quantities for our financial series and it is clear from this that there is higher order dependence (STK-M and STK-D are the monthly and daily stock returns data respectively).
Generally, the argument has been that the lack of independence arises from the presence of "nonlinearities" in the series. This is a somewhat imprecise description and one that arises more by default than anything else. Since the a.c.f. of a series $x_t = \Delta y_t$ tests if $x_t$ has a linear structure, such as $x_t = \sum_{j=0}^{\infty}\beta_j e_{t-j}$, any dependence in $x_t$ manifest in higher order moments might cause us to say that $x_t$ has a non-linear structure. Attempts have been made to construct various tests for general non-linear dependence. The best known of these are Tsay's test (Tsay, 1986) and the BDS test - Brock et al. (1987) - but many others could be designed.
Tsay's test implicitly involves testing if the coefficients of $w_t' = (x_{t-1}^2, x_{t-1}x_{t-2}, \ldots, x_{t-p}^2)$ are zero in the regression of $x_t$ upon $z_t' = (1, x_{t-1}, \ldots, x_{t-p})$ and $w_t$. His test is not expressed in this way but the interpretation just given may be reconciled with his format by starting with

$$x_t = z_t'\delta + w_t'\gamma + e_t, \qquad (8)$$

taking expectations with respect to $z_t$,

$$E(x_t \mid z_t) = z_t'\delta + E(w_t \mid z_t)'\gamma, \qquad (9)$$

and then subtracting (9) from (8) to leave

$$x_t - E(x_t \mid z_t) = (w_t - E(w_t \mid z_t))'\gamma + e_t. \qquad (10)$$

If one assumes that the conditional expectations in (10) are linear in $z_t$ and are estimated as the predictions, $(\hat x_t, \hat w_t)$, from the regression of $x_t$ and $w_t$ upon $z_t$ respectively, one gets Tsay's test, expressed as a regression of the residuals $x_t - \hat x_t$ against $w_t - \hat w_t$. ¹⁴ Looking at Tsay's test in this way emphasizes that it is a test for non-linearities in the conditional mean.
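A sketch of Tsay's test as interpreted above: purge $x_t$ and the $p(p+1)/2$ cross products of the linear part in $z_t$, then form an F statistic for the added nonlinear regressors. The degrees-of-freedom convention follows the note to Table 4; the function name is an assumption of this example.

```python
import numpy as np
from itertools import combinations_with_replacement

def tsay_test(x, p=5):
    """Tsay (1986)-style F test for non-linearity in the conditional mean."""
    T = len(x)
    Z = np.column_stack([np.ones(T - p)] + [x[p - i:T - i] for i in range(1, p + 1)])
    W = np.column_stack([x[p - i:T - i] * x[p - j:T - j]
                         for i, j in combinations_with_replacement(range(1, p + 1), 2)])
    y = x[p:]
    def resid(A, b):
        return b - A @ np.linalg.lstsq(A, b, rcond=None)[0]
    u = resid(Z, y)                                   # x_t purged of the linear part
    V = np.column_stack([resid(Z, W[:, c]) for c in range(W.shape[1])])
    e = resid(V, u)                                   # residual after adding the nonlinear terms
    m, n = W.shape[1], len(y)
    return ((u @ u - e @ e) / m) / ((e @ e) / (n - m - p - 1))
```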
The BDS test can be shown to be testing for independence by focusing upon estimated marginal and joint densities. Consider two random variables $X$ and $Z$. These are independent if

$$f_X(x)f_Z(z) = f_{XZ}(x,z), \qquad (11)$$

where $f(\cdot)$ are the density functions of the random variables. Consider replacing $x$ and $z$ with the random variables $X$ and $Z$ themselves, and, after that, taking the expectation. Then, due to independence,

$$E[f_X(X)]\,E[f_Z(Z)] = E[f_{XZ}(X,Z)], \qquad (12)$$

and we will now show that the BDS test involves testing this condition.

¹⁴ The fact that the conditional expectations may not be linear is unimportant under the null hypothesis, since then $E(x_t \mid z_t) = 0$ and $\gamma = 0$.

Constructing the test based on the sample moments gives

$$T^{-1}\sum_{t=1}^{T}\hat f_X(x_t)\; T^{-1}\sum_{t=1}^{T}\hat f_Z(z_t) \;-\; T^{-1}\sum_{t=1}^{T}\hat f_{XZ}(x_t, z_t).$$

The problem is that we don't know the densities and therefore we need to estimate them. Suppose we do this non-parametrically with symmetric kernels $K_x$
and $K_z$, window widths $h_x$ and $h_z$, using the leave-one-out estimator, and a product kernel for the estimation of the joint density. ¹⁵

¹⁵ For all these concepts see Silverman (1986) and Härdle (1990).

Then the test would be based on

$$T^{-1}\sum_{t=1}^{T}\left[\frac{1}{h_x(T-1)}\sum_{j=1, j\neq t}^{T} K_x\!\left(\frac{x_t - x_j}{h_x}\right)\right] \times T^{-1}\sum_{t=1}^{T}\left[\frac{1}{h_z(T-1)}\sum_{j=1, j\neq t}^{T} K_z\!\left(\frac{z_t - z_j}{h_z}\right)\right]$$
$$-\; T^{-1}\sum_{t=1}^{T}\left[\frac{1}{h_x h_z(T-1)}\sum_{j=1, j\neq t}^{T} K_x\!\left(\frac{x_t - x_j}{h_x}\right)K_z\!\left(\frac{z_t - z_j}{h_z}\right)\right]$$
$$= \frac{2}{h_x T(T-1)}\sum_{1\le t<j} K_x\!\left(\frac{x_t - x_j}{h_x}\right) \times \frac{2}{h_z T(T-1)}\sum_{1\le t<j} K_z\!\left(\frac{z_t - z_j}{h_z}\right)$$
$$-\; \frac{2}{h_x h_z T(T-1)}\sum_{1\le t<j} K_x\!\left(\frac{x_t - x_j}{h_x}\right)K_z\!\left(\frac{z_t - z_j}{h_z}\right).$$

In the BDS case $Z$ is simply the lagged value of $X$ and the marginal density will be the same for both variables, so it makes sense to set $\hat f_X(x) = \hat f_Z(z)$ as well as making $K_x = K_z = K$, and $h_x = h_z = h$. The effective sample size is now $T_1 = T - 1$ and $t = 1$ corresponds to the second observation of the original sample. Hence the test simplifies to

$$\left[\frac{2}{hT_1(T_1-1)}\sum_{1\le t<j} K\!\left(\frac{x_t - x_j}{h}\right)\right]^2 - \frac{2}{h^2 T_1(T_1-1)}\sum_{1\le t<j} K\!\left(\frac{x_t - x_j}{h}\right)K\!\left(\frac{z_t - z_j}{h}\right),$$

which can be written as

$$\frac{C_1^2}{h^2} - \frac{C_2}{h^2},$$

where $C_k = \frac{2}{T_1(T_1-1)}\sum_{1\le t<j}\prod_{i=1}^{k} K\!\left((x_{ti} - x_{ji})/h\right)$ and $x_{t1} = x_t$, $x_{t2} = z_t$. Eliminating the constant of proportionality $h^{2}$, and replacing the kernel by
$I(h/2 - |x_t - x_j|)$, where $I(\cdot)$ is an indicator function, gives the numerator of the BDS test when only $x_t$ and its lag is used. ¹⁶ Since the denominator is just the standard deviation of the numerator, the two test statistics are identical. Extension to higher order lags is straightforward.
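A sketch of the correlation-integral quantities $C_N$ and $C_1$ with the indicator kernel, giving the BDS numerator $C_N - C_1^N$ of footnote 16; the full BDS statistic additionally requires the standard deviation in the denominator, which is omitted here.

```python
import numpy as np

def bds_numerator(x, N=4, gamma=None):
    """C_N - C_1**N, with C_k the fraction of pairs of k-histories within gamma
    of each other (indicator kernel), as in footnote 16."""
    if gamma is None:
        gamma = 1.25 * x.std()
    TN = len(x) - N + 1
    H = np.column_stack([x[i:i + TN] for i in range(N)])   # N-histories
    def corr_integral(M):
        Tm, close = M.shape[0], 0
        for t in range(Tm - 1):
            d = np.max(np.abs(M[t + 1:] - M[t]), axis=1)   # sup-norm distance of histories
            close += np.sum(d < gamma)
        return 2.0 * close / (Tm * (Tm - 1))
    C_N = corr_integral(H)
    C_1 = corr_integral(x[:TN].reshape(-1, 1))
    return C_N - C_1 ** N
```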
A choice of $\gamma$, the window width, and $N$ (the maximum lag in $x_t$ used to form $z_t$ in constructing the test) needs to be made for the BDS test to be computed. In our situation we choose $N = 4$ and $\gamma$ as 1.25 times the sample standard deviation of the variable under investigation. Applying each of these tests to the series supports the lack of independence found from the a.c.f. of squares (see Table 4). It should be mentioned that non-linearities in stock returns, exchange rates and interest rates have been documented before with these tests by Scheinkman and LeBaron (1989), Hsieh (1989b) and Kearns (1990b) respectively.
The connection between method of moments testing and the BDS test illustrated above is very useful for understanding many of the properties of the test. Defining $v_t = E(f_X(X_t))E(f_Z(Z_t)) - f_{XZ}(X_t, Z_t)$, the BDS test can be regarded as testing if the sample mean of $v_t$ is zero, i.e., whether the intercept in the regression of $v_t$ against unity is zero. Clearly it is possible that $E(v_t) = 0$ but that the random variables are not independent: although (11) implies (12) the converse is not true. It is also interesting to observe that the t-statistic from such a regression is robust to heteroskedasticity in $v_t$. To see why, specialize (6) to the case where there is only heteroskedasticity to get White's heteroskedasticity consistent standard errors (White, 1980). These are $(T^{-1}\sum x_t^2)^{-1}(T^{-1}\sum x_t x_t'\hat u_t^2)(T^{-1}\sum x_t^2)^{-1}$, where $\hat u_t$ are the residuals from the OLS regression of $v_t$ on $x_t$. These are to be contrasted with the "formula variance" $\hat\sigma_u^2(T^{-1}\sum x_t^2)^{-1}$ associated with OLS computer programs, where $\hat\sigma_u^2 = T^{-1}\sum_{t=1}^{T}\hat u_t^2$. In the special case that $x_t = 1$ the two estimators coincide, and this is the situation here. Hence, this suggests that the BDS test is likely to be robust to heteroskedasticity (but not to serial correlation).
¹⁶ The numerator of the BDS test is $C_N - C_1^N$ where $C_N = \frac{2}{T_N(T_N-1)}\sum_{t<j}\prod_{k=0}^{N-1} I\!\left(\gamma - |x_{t+k} - x_{j+k}|\right)$ and $T_N = T - N + 1$. When $N = 1$ we get the formula in the text.

The implications of a lack of independence between $x_t$ and its past history raises the issue of how to summarize such dependence in a useful way. One way is to treat functions of $x_t$ as being determined by models such as (1), (2) and (3). Thus, defining the expectation of a random variable conditioned upon its past history as $E_{t-1}$, these models make $E_{t-1}(g(x_t))$ a function of $x_{t-j}$ ($j > 0$). When $g(x_t) = x_t^2$, and $E_{t-1}(x_t) = 0$, $\sigma_t^2 = E_{t-1}(x_t^2)$ is the conditional variance of $x_t$ and having $x_t^2$ generated by such models makes $\sigma_t^2$ potentially dependent upon the past. The distinction between a conditional, $\sigma_t^2$, and an unconditional variance, $E(x_t^2)$, was emphasized in Engle (1982), and it is a distinction that has proved to be more important in finance than in any other field of econometrics. Prior to this work,
many finance models were understood in terms of the unconditional moments of the series, e.g., the CAPM was stated in terms of the covariance of the return on an asset with the market return, but it soon became evident from the theoretical derivations that the proper items of investigation were actually the conditional moments, unless assets were being held for long periods of time, whereupon the relevant conditioning set for decisions is far in the past and the distinction between conditional and unconditional moments evaporates, i.e., for stationary series in which $E(g(x_t))$ exists, $\lim_{k\to\infty} E_{t-k}(g(x_t)) = E(g(x_t))$.
Table 4 not only reveals that the squares of returns are correlated, but the slow decline in the autocorrelation coefficients may be used to argue that the correlation is very persistent. ¹⁷ Brockwell and Davis (1991) defined a short memory series as one whose autocorrelation coefficients were bounded like $|\rho_j| < C a^j$, as contrasted with a long memory process in which $\rho_j$ declined like $C j^{2d-1}$, where $d < 0.5$ and $C$ is a non-zero constant. Using this definition, the squares of the returns seem to possess long memory. Findings of long memory in the squares of stock returns are legion - examples being Ding et al. (1993) and De Lima and Crato (1994) - but it has also been found in other series such as the forward premium, Baillie and Bollerslev (1994). Many techniques have been used to investigate this property - see Baillie (1996) for a survey. By far the simplest would be to fit models (1), (2) and (3) to suitably transformed $x_t$, e.g., $x_t^2$ or $|x_t|$, and then to test for unit roots, fractional integration etc. as outlined earlier. Sometimes a "model independent" way of assessing long-range dependence is advocated in the form of the modified re-scaled range statistic, $\hat S^{-1}\left\{\max_{1\le t\le T}\sum_{i=1}^{t}(x_i - \bar x) - \min_{1\le t\le T}\sum_{i=1}^{t}(x_i - \bar x)\right\}$, where $\hat S^2$ is an estimate of the spectrum of $x_t$ using $q$ autocovariances and $\bar x$ is the sample mean of $x_t$ - see Lo (1991) and De Lima et al. (1994). The problem with this measure is the choice of $q$. As $q$ grows the test statistic is made completely robust to serial correlation of any order, including long-range dependence, so that the power of the test eventually equals its size. Hence any conclusions obtained simply reflect the choice of $q$. If you want to find long range dependence keep $q$ short; if you don't, make $q$ long.
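A sketch of the modified re-scaled range statistic as written above, with a Bartlett-weighted estimate $\hat S^2$ using $q$ autocovariances; the choice of $q$ drives the answer, as the text emphasizes.

```python
import numpy as np

def modified_rescaled_range(x, q=10):
    """Range of partial sums of deviations from the mean, scaled by a long-run
    standard deviation built from q Bartlett-weighted autocovariances."""
    T = len(x)
    e = x - x.mean()
    S = np.cumsum(e)
    rng = S.max() - S.min()
    v = np.sum(e * e) / T
    for j in range(1, q + 1):
        w = 1.0 - j / (q + 1.0)
        v += 2.0 * w * np.sum(e[j:] * e[:-j]) / T
    return rng / np.sqrt(v)
```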
Table 5 looks at the cross correlation between $x_t^2$ and $x_{t-j}$ ($j > 0$), a relationship sometimes encapsulated as the "leverage effect" that volatility depends upon
the sign of returns. There is evidence here of such a negative effect in stock
returns and exchange rates, while for interest rates it is not as clear. If there is
some relationship for interest rates it would be a positive rather than negative one.
The magnitude of the effect may not be large but what is striking is its persistence,
with many negative values for the correlation coefficients and a slow dying away
of its values.
¹⁷ The same would be true if absolute returns were examined, see Ding et al. (1993).
Table 5
Cross correlations between squared and level returns

CROSS    STK-M    t      BOND     t      $US/SF   t      STK-D    t
Lag 1     0.02    0.6     0.04    0.9    -0.10    2.5    -0.07     8.7
Lag 2    -0.09    2.4     0.12    2.7    -0.04    1.0    -0.08    10.4
Lag 3    -0.13    3.6     0.08    1.7    -0.13    3.2    -0.06     7.2
Lag 4    -0.12    3.3    -0.02    0.5    -0.10    2.5    -0.05     6.4
Lag 5    -0.12    3.2    -0.07    1.6    -0.00    0.0    -0.04     5.6
Lag 6    -0.12    3.2     0.15    3.4    -0.03    0.8    -0.05     6.4
Lag 7    -0.10    2.8    -0.02    0.4    -0.02    0.5    -0.05     5.7
Lag 8     0.01    0.2     0.06    1.2    -0.08    2.1    -0.05     6.0
Lag 9     0.1     2.7     0.00    0.1    -0.08    2.1    -0.05     5.8
Lag 10   -0.06    1.5     0.06    3.6     0.02    0.5    -0.03     4.3
Lag 11   -0.18    5.0    -0.02    0.3     0.02    0.5    -0.03     3.5
Lag 12   -0.14    4.0     0.05    1.1    -0.05    1.3    -0.02     2.9
2.3. The existence of moments
Very early in the history of financial econometrics Mandelbrot (1963) raised the spectre that asset returns had second (unconditional) moments that did not exist. To assess that possibility he recommended that an examination of the recursive variance be made, and this has been done by Hols and De Vries (1991) for exchange rates, Pagan and Schwert (1990b) for stock returns, and Loretan and Phillips (1994) for a wide variety of returns. If the process $x_t$ was covariance stationary then the sequence of recursive variances, $\hat\mu_{2,\tau} = \tau^{-1}\sum_{j=1}^{\tau} x_j^2$, $\tau = 1, 2, 3, \ldots$, formed by adding on one observation at a time, should converge to $E(x_t^2)$ as $\tau \to \infty$. If $E(x_t^2)$ does not exist then there will be no convergence, and in fact one tends to observe "jumps" in the sequence $\hat\mu_{2,\tau}$. Pagan and Schwert (1990b) describe a number of possible test statistics to formally test this feature. One involves testing if the mean of the sample variance over one part of the sample is the same as over the remainder. Another is based directly upon $\psi(t) = (TD)^{-1/2}\sum_{j=1}^{t}(x_j^2 - \hat\mu_{2,T})$, where $D$ is an estimate of $v$, the asymptotic variance of $T^{-1/2}\sum(x_j^2 - \mu_2)$ (in applications below set to $\hat\gamma_0 + 2\sum_j\hat\gamma_j(1 - (j/9))$, where $\hat\gamma_j$ is the jth estimated autocovariance of $x_t^2$). The logic of the test is seen by replacing $\hat\mu_{2,T}$ by $\mu_2$, which it estimates. Then the random variable $T^{-1/2}\sum_{j=1}^{t}(x_j^2 - \mu_2)$ should have variance $(t/T)v$, and $\psi(t)$ is an $N(0, (t/T)(1 - t/T))$ random variable under the null hypothesis of a constant variance. Fig. 1 shows a plot of $\psi(t)$ along with the 99% confidence intervals for the monthly stock returns series. Stock returns do not seem to be covariance stationary. Loretan and Phillips (1994) examine a very wide range of series, all of which have this characteristic.
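A sketch of the cusum-of-squares statistic $\psi(t)$ and the pointwise 99% band used in Fig. 1, using the Bartlett-type weights $(1 - j/9)$ given above for $D$.

```python
import numpy as np

def cusum_of_squares(x, q=9):
    """psi(t) = (T*D)^{-1/2} sum_{j<=t}(x_j^2 - mean(x^2)); under a constant
    variance psi(t) is approximately N(0, (t/T)(1 - t/T))."""
    s = x ** 2
    T = len(s)
    e = s - s.mean()
    D = np.sum(e * e) / T
    for j in range(1, q + 1):
        w = 1.0 - j / q                               # the (1 - j/9) weights in the text
        D += 2.0 * w * np.sum(e[j:] * e[:-j]) / T
    psi = np.cumsum(e) / np.sqrt(T * D)
    t = np.arange(1, T + 1)
    band = 2.575 * np.sqrt((t / T) * (1 - t / T))     # 99% pointwise confidence band
    return psi, band
```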
As Loretan and Phillips (1994) point out, what is at issue here is the shape of
the density in the tails. Unless it declines sufficiently swiftly moments will not
exist, as should be familiar from analysis of the Cauchy density, e.g. see Spanos
2"3409
I
1'13331
~
.... Upper Confidence
-.074289~
-1.2819~
"'''-_
1
.............
193
385
int~e ~ ' " /
577
768
Fig. 1. C U S U M o f squares test, CRSPretums, 99%confidence intervals.
(1986, p. 70). More precisely, suppose that there are Pareto-type tails in the density. This means that $\Pr(X > x) = kx^{-\alpha}$, so that $\log[P(X > x)] = \log k - \alpha\log x$. If $\alpha > c$ then moments of order $c$ exist, so that what we are interested in is the question of the value of $\alpha$ for our data. After ordering the data to produce $x_{(1)}, \ldots, x_{(T)}$, where $x_{(1)} > x_{(2)} > \cdots > x_{(T)}$, they estimate $P(X > x_{(j)})$ by the empirical survivor function $[\#(x_t \ge x_{(j)})/T] = j/T$, and give a plot of the log $\hat P(X > x_{(j)})$
Fig. 2. Right tail of empirical survivor function of CRSP returns (fitted line slope = -2.4).
Table 6
Estimates of the tail index for various financial series

             SRM     SRD     BR      SF/$
α̂ (left)     2.03    2.29    1.49    2.70
α̂ (right)    2.81    2.52    1.50    3.26
m            60      1600    40      50

SRM = Monthly stock returns. SRD = Daily stock returns. BR = Change in bond rate. SF/$ = Change in log SF/$ exchange rate.
against log x(j). In Fig. 2 we show a plot of these quantities for the monthly stock
return data as well as the regression line fitted to it over the last 68 data points
(after data has been re-ordered). The slope of this line is - 2.4, which points to the
fact that the variance might exist, but no higher order moments.
Loretan and Phillips also utilize more formal methods for estimating the inverse of the tail index, $\eta = \alpha^{-1}$, and thereby $\alpha$ itself, of which the most popular is the proposal of Hill (1975) that

$$\hat\eta = (m-1)^{-1}\sum_{i=1}^{m-1}\log x_{(i)} - \log x_{(m)}.$$

A problem is to determine $m$. Asymptotic analysis with $x_t$ being i.i.d. yields $m^{1/2}(\hat\alpha - \alpha) \xrightarrow{d} N(0, \alpha^2)$, showing that $m$ needs to be large. But clearly the larger $m$ is the greater the extent to which one is sampling out of the range of the "tail" of the density, and thereby the larger the bias found in the estimator. The situation is made even more complex by the fact that $x_t$ is not i.i.d.: in Kearns and Pagan (1992), where $x_t$ is generated from heteroskedastic processes typical of daily stock returns, it is found that the standard deviation of $\hat\alpha$ is around seven times larger than predicted from the i.i.d. case. All of this points to the fact that very large data sets are needed to be able to accurately compute a tail index. Table 6 gives the tail index computation for the three monthly series we are working with, as well as that for U.S. daily stock data from 1885 to 1987. The evidence points to the existence of the second but probably not the fourth moment for all series except interest rates, where the situation is clouded. Increasing $m$ to 100 for this series produced estimates of 1.03 and 1.47 for $\alpha$, so that it may be that the second moment does not exist for that series. ¹⁸ Of course it may be that it is higher order moments that fail to exist rather than the second; Loretan and Phillips point to the possibility that the fourth moment may behave in this way. ¹⁹ Since the variance of the estimated second moment depends upon the fourth moment the failure to exist of the latter will generally mean that occasional large outliers will be found for estimates of the second moment and this will show up as jumps in the recursive estimates of the second moment.

¹⁸ Although one might argue that all moments are finite, albeit large, the point is that the series is behaving as if the moments do not exist.
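A sketch of the Hill estimator in the form given above; as the Kearns-Pagan results warn, the i.i.d. standard errors it implies can be badly misleading for heteroskedastic returns, so the sketch reports only the point estimate.

```python
import numpy as np

def hill_tail_index(x, m):
    """Hill (1975) estimate of the tail index alpha from the m largest observations;
    apply to x for the right tail and to -x for the left tail."""
    xs = np.sort(x)[::-1]                      # descending order statistics x_(1) >= x_(2) >= ...
    eta = np.mean(np.log(xs[:m - 1])) - np.log(xs[m - 1])
    return 1.0 / eta
```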
One might ask the importance of this observation that moments might not exist
for financial analysis. There are a number of responses that can be made. First, if
true, it rules out certain models as candidates for the analysis of financial time
series, in particular those that imply covariance stationarity for returns. Second,
the inputs into many financial models pre-suppose the existence of certain
moments, for example the optimal hedging formula depends upon the variance of
asset returns. If moments do not exist, attempts to estimate quantities that depend
upon them will generally exhibit a lot of volatility, and it may be that this explains
the observed fact that many estimates made of the optimal hedge display quite a
bit of instability. More generally, this fact may be important to many econometric
procedures, e.g., tests for unit roots such as the variance ratio test and the
Dickey-Fuller test. The effects are likely to be particularly devastating for the
former.
2.4. Are returns normally distributed?
Most tests of normality focus upon higher order moments of $x_t$, in particular inquiring if $E(x_t^3) = 0$, $E(x_t^4) = 3[E(x_t^2)]^2$, i.e., whether there is skewness or excess kurtosis. This leads to the two most widely used tests, $\hat\tau_1 = T^{-1}\sum_{t=1}^{T}(1/\hat\sigma^3)\hat u_t^3$ and $\hat\tau_2 = T^{-1}\sum_{t=1}^{T}(1/\hat\sigma^4)(\hat u_t^4 - 3\hat\sigma^2\hat u_t^2)$, where $\hat u_t = x_t - \bar x$. There are some difficulties with the use of these statistics to assess normality stemming from the fact that the quantities $x_t^k$ exhibit dependence. Accordingly, one might want to make these tests robust to this dependence. Perhaps the simplest way to do this is to recognize that there are three moment conditions being used in the construction of each of the tests, e.g., taking the test for skewness these would be $E((1/\sigma^3)(x_t - \mu)^3 - \tau_1) = 0$; $E(x_t - \mu) = 0$; and $E((x_t - \mu)^2 - \sigma^2) = 0$. Defining these as $E(\psi(x_t;\theta)) = 0$, where $\theta' = [\tau_1\ \mu\ \sigma^2]$, the test for skewness is based
¹⁹ Loretan and Phillips observe that the failure of a fourth moment to exist can be a problem with the cusum test $\psi(t)$. In particular, they allow the errors to be realizations from a stable law, in which case the quantity $v$ in the denominator of the test may not exist. They propose modifications of the test that are robust to this feature. Mandelbrot (1963) also argued for stable processes as a way of inducing infinite variance into financial time series. The problem with this approach is that it fails to produce series that are dependent, and some additions need to be made to make the series exhibit the type of dependence seen in the squares of returns. From some simulations that Phil Kearns and I have run, it does not seem as if the assumption of stable laws is a good predictor of what the behavior of test statistics would be when moments fail to exist because of an IGARCH structure - see also Ghose and Kroner (1994).
Table 7
Testing departures from normality for returns data

                    STK ret-M   STK ret-D   Bond rates   Ex rate
Skewness t
  Non-robust          3.04       -13.1       -11.36       -1.14
  Robust              0.4         -0.78       -0.96       -0.46
Kurtosis t
  Non-robust         44.28       213.0        55.1         7.54
  Robust              2.11         2.17        1.65         2.04
Fraction around 0     0.14         0.14        0.27         0.14
f(0)                  0.52         0.56        0.99         0.52

The skewness and kurtosis tests should be referred to an N(0,1) random variable.
Fraction is the fraction of standardized x_t lying between -0.1257 and 0.1257; this is 0.1 for an N(0,1) random variable. f(0) is the NP estimate of the density of standardized x_t at the origin; f(0) for an N(0,1) density is 0.40.
upon the first element of $T^{-1}\sum_{t=1}^{T}\psi(x_t;\hat\theta)$, where $\hat\theta$ is the method of moments estimator of $\theta$, i.e., upon $\hat\tau_1$, and it is well known from the theory of GMM estimators how to make tests concerning $\tau$ robust to dependence in $\psi_t$. The tests $\hat\tau_j$ ($j = 1,2$) given above then correspond to the non-robust "t-statistic" for whether the first element of $T^{-1}\sum_{t=1}^{T}\psi(x_t;\hat\theta)$ is zero. Table 7 presents both robust and non-robust tests of skewness and excess kurtosis. What is striking from this table is how the conclusions reached vary according to whether one has allowed for the dependence in $x_t$. If no allowance is made there are very emphatic rejections, but the converse tends to be true once some adjustment is performed. Only in the case of the kurtosis tests do the conclusions generally match up, and the presence of excess kurtosis is frequently interpreted as the density of returns possessing "fat tails".
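A simplified sketch of robust and non-robust skewness/kurtosis t-statistics: each moment contribution is regressed on a constant, with and without HAC standard errors. Unlike the GMM treatment described above, this shortcut ignores the sampling error in $\hat\mu$ and $\hat\sigma^2$, so it is only an approximation to the robust tests reported in Table 7; the function name and lag choice are assumptions.

```python
import numpy as np
import statsmodels.api as sm

def skew_kurt_tests(x, hac_lags=12):
    """Naive and HAC-robust t-statistics for zero skewness and zero excess kurtosis."""
    u = x - x.mean()
    s2 = u.var()
    m3 = u ** 3 / s2 ** 1.5                         # skewness contributions
    m4 = (u ** 4 - 3 * s2 * u ** 2) / s2 ** 2       # excess-kurtosis contributions (form used in the text)
    out = {}
    for name, g in [("skewness", m3), ("kurtosis", m4)]:
        X = np.ones((len(g), 1))
        naive = sm.OLS(g, X).fit()
        robust = sm.OLS(g, X).fit(cov_type="HAC", cov_kwds={"maxlags": hac_lags})
        out[name] = (naive.tvalues[0], robust.tvalues[0])
    return out
```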
Rather than focusing upon the higher order moments of returns, it is sometimes more useful to obtain a picture of the density for $x_t$ by using non-parametric estimation methods, and to concentrate upon certain of the characteristics that stand out from such an inspection. An easy way to estimate the density non-parametrically is to use a kernel based estimator $\hat f(x) = (1/Th)\sum_{t=1}^{T}K((x_t - x)/h)$. Fig. 3, Fig. 4, and Fig. 5, computed with a Gaussian kernel and $h = 0.9\hat\sigma_x T^{-1/5}$ for the window width, present non-parametric estimates of the densities of stock returns and the changes in the one month bond yield and the log of $US/SF respectively, standardized as $\hat\sigma^{-1}(x_t - \bar x)$. For comparison purposes the figures also have the density of an N(0,1) variable. The densities for stock returns and exchange rates tend to be fatter tailed than the normal and to possess marked peaks. It is sometimes difficult to see the fatter tails, but the range of the horizontal axis coincides with the minimum and maximum values observed in the data, events which would be highly improbable with a normal density.
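A sketch of the kernel density estimator and the $0.9\hat\sigma_x T^{-1/5}$ window width used for Figs. 3-5, including the $\hat f(0)$ peak measure reported in Table 7.

```python
import numpy as np

def gaussian_kde(x, grid):
    """f_hat(x) = (1/Th) sum_t K((x_t - x)/h) with a Gaussian kernel and
    the h = 0.9 * sigma * T^{-1/5} window width."""
    T = len(x)
    h = 0.9 * x.std() * T ** (-0.2)
    u = (x[None, :] - grid[:, None]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (T * h * np.sqrt(2 * np.pi))

# standardize returns and evaluate at the origin to reproduce the f(0) measure in Table 7:
# z = (x - x.mean()) / x.std(); f0 = gaussian_kde(z, np.array([0.0]))[0]
```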
An alternative to kernel based non-parametric density estimation is the "semi-
36
A. Pagan~Journal of Empirical Finance 3 (1996) 15-102
.56042
nsity
/
.35663
iI
iiI
.15284
Density
-.050947
-5.2719
I
-1.4040
2.4639
6.3319
Fig. 3. NP estimate of density of standardized monthly CRSP returns.
non-parametric" (SNP) method advanced in Gallant et al. (1991) and other
articles. Suppose that z is a random variable with zero expectation and unit
variance. The classical approach to the approximation of the density for z would
replace it by the product of the standard normal density q b ( z ) = ( 2 r r ) -~/2
exp( - z 2) and a polynomial [ P( z)] 2 = [1 + ~ l z + ... + c~p z P ] 2. To force this
approximation to integrate to unity the approximating density must therefore be
$\tilde{f}(z) = P^2(z)\phi(z)/\int P^2(u)\phi(u)\,du$. 20 If the density of x is to be approximated,
and x has mean $\mu$ and standard deviation $\sigma$, Gallant finds the density of x as
$\sigma^{-1}\tilde{f}(z)$, where $z = \sigma^{-1}(x - \mu)$. After these substitutions all unknown
parameters, $\alpha_1,\dots,\alpha_p$, $\mu$ and $\sigma$, are chosen by maximizing $\sum_{t=1}^{T}\log\tilde{f}(x_t)$.
Computational details are available in Gallant et al. (1991) with an application to exchange
rates, while Gallant et al. (1992) explore stock returns and the volume of
trading. 21 All of these studies show that financial time series exhibit densities
which have tails that are fatter than the normal, and have much higher peaks than
the normal around zero, i.e., one has too many small and large returns to have
come from a normal density.
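To make the SNP construction concrete, a small sketch (the polynomial order, the coefficient values and the numerical normalization over a finite grid are illustrative assumptions, not taken from Gallant et al.) builds the approximating density $[P(z)]^2\phi(z)$ and normalizes it:

```python
import numpy as np

def snp_density(z, alpha):
    """SNP density: [P(z)]^2 * phi(z), normalized to integrate to one.

    P(z) = 1 + alpha[0]*z + alpha[1]*z**2 + ...; phi is the standard normal pdf.
    The normalizing integral is computed numerically on a wide grid.
    """
    z = np.atleast_1d(z)
    phi = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    P = lambda u: 1 + sum(a * u ** (i + 1) for i, a in enumerate(alpha))
    grid = np.linspace(-10, 10, 4001)
    norm = np.trapz(P(grid) ** 2 * phi(grid), grid)
    return P(z) ** 2 * phi(z) / norm

# e.g. a quartic polynomial (p = 4) chosen to give a peaked, fat-tailed shape
alpha = [0.0, -0.3, 0.0, 0.1]
print(snp_density(np.array([0.0, 2.0, 4.0]), alpha))
```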
20 In the classical approach the $P^2(z)$ are the sum of Hermite polynomials - see Spanos (1986, p. 204) - but this simply represents a reparametrization.
21 In most of the SNP applications the density being estimated is the density of $x_t$ conditional upon $x_{t-1},\dots,x_{t-p}$ and the volume of trading rather than the unconditional density, and the latter is derived from the former. The procedure described for estimation remains the same however, except that $\sigma$ and $\mu$ are taken to be simple functions of $x_{t-1},\dots,x_{t-p}$ whose parameters are to be estimated along with the $\alpha_j$.

It is worth commenting here on the conventional wisdom that fat tails are the
cause of excess kurtosis. Deleting the two largest and smallest observations in the
data on one-month bond yields reduces $\hat{\tau}_2$ from 55.1 to 28.2, demonstrating the
influence of "fat tails". However, if one further reduces the sample by eliminating
the 200 smallest and largest observations, essentially leaving the observations
around the mean, the statistic becomes -8.39, showing that the coefficient also
responds to the strong peak or leptokurtosis in the density. 22 Because the
leptokurtosis is so dramatic it is useful to have some measures of it that are easily
computed, and we have found two of these to be useful. The first compares the
fraction of observations of the standardized data that lie between ±0.1257; this
should be 0.1 if the density was N(0,1). The second is the non-parametric density
estimate of the standardized data at the origin, $\hat{f}(0)$. Table 7 shows these, and they
replicate the visual impression.
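Both measures are simple to compute; a minimal sketch (the helper name and the simulated series are assumptions of the sketch) is:

```python
import numpy as np

def peakedness_measures(z, h=None):
    """Return (fraction of |z| < 0.1257, Gaussian-kernel density estimate at 0).

    For N(0,1) data these should be close to 0.1 and 0.40 respectively.
    """
    z = np.asarray(z, dtype=float)
    T = z.size
    if h is None:
        h = 0.9 * z.std(ddof=1) * T ** (-1 / 5)
    fraction = np.mean(np.abs(z) < 0.1257)
    f0 = np.mean(np.exp(-0.5 * (z / h) ** 2) / np.sqrt(2 * np.pi)) / h
    return fraction, f0

rng = np.random.default_rng(1)
print(peakedness_measures(rng.standard_normal(5000)))                    # roughly (0.1, 0.40)
print(peakedness_measures(rng.standard_t(df=4, size=5000) / np.sqrt(2))) # peaked, fat-tailed
```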
3. Models for univariate series
3.1. Formulating models of volatility
Models need to be devised to reproduce some (or all) of the characteristics just
listed. 23 A useful strategy is to view returns as being the product of two random
variables, i.e., $x_t = \sigma_t\varepsilon_t$, where $\varepsilon_t$ is i.i.d. (0,1), and $\sigma_t$ is a random variable
independent of $\varepsilon_t$ whose properties will determine the nature of $x_t$. If $\sigma_t^2$ is
assumed to be i.i.d. $(\sigma^2, v)$, then $E(x_t^2) = E(\sigma_t^2)E(\varepsilon_t^2)$ and, if $\varepsilon_t$ is n.i.d. (0,1),
$E(x_t^4) = 3E(\sigma_t^4) > 3[E(\sigma_t^2)]^2 = 3\sigma^4$, which illustrates the fact that this model of
returns produces the desired feature of excess kurtosis. Historically, such "mixture
models" appealed for other reasons, principally the ability to interpret $\sigma_t$ as
related to the quantity of "news" items arriving at a point in time. Such an
interpretation means that the variance of returns at a point in time will be an
increasing function of the amount of "news" arriving in the market. Volatility of
returns will then be high when the market is exposed to large amounts of
information. Studies have also appeared in which $\sigma_t$ was taken to be an increasing
Poisson process while $\varepsilon_t$ is n.i.d. $(\mu_t,\sigma^2)$. Such "jump" processes have been
applied to stock returns, Friedman and Laibson (1989) and Jorion (1988), and to
exchange rates by Nieuwland et al. (1994) and Feinstone (1987).
Perhaps the major defect of this literature was its inability to account for
autocorrelation in $x_t^2$, as the assumptions make $x_t$ (and hence $x_t^2$) i.i.d. This need
not lead one to jettison the model, but, instead, should concentrate attention upon
the fact that $\sigma_t^2$ has to be made dependent upon the past in some way, and
choosing suitable relations for $\sigma_t^2$ has been a major item of interest in financial
22 This symmetric trimming strategy is strictly valid only if the data density is symmetric and Table
7 does indicate that there may be a mild departure from it.
23 In an interesting paper Harrison (1994) shows that the characteristics of 18th century financial
asset returns are the same as the 20th century ones. This is not to say that all series have these
characteristics; Diebold and Lopez (1995) point to the fact that there is considerable variety in the
behaviour of series. Nevertheless, whilst there are always exceptions to "stylized facts", they are
useful as summary measures of what is distinctive about many financial series and therefore what must
be generally accounted for.
econometrics during the past decade. 24 Broadly, there are two ways of doing this.
One is to recognize that $\sigma_t^2$ is a random variable and to regard it as being
generated from one of the processes (1), (2) and (3). An alternative is to allow $x_t^2$
to follow the same set of processes, i.e., $g(x_t) = x_t^2$, and then derive the imputed
model for $\sigma_t^2$ that generates the requisite outcome. Both approaches are featured in
the literature. As well as this division, there is a further one, in that, when
following the first strategy, it is possible to distinguish between cases in which $\sigma_t^2$
is explicitly related to past returns, and those when it is treated as a random
variable with no specific reference to past returns as a determinant of its
evolutionary path. The latter is generally referred to as "stochastic volatility" and
will be treated later in this section. 25
3.1.1. The GARCH class of models
Engle (1982) introduced the Autoregressive Conditional Heteroskedasticity
model of order one (ARCH(1)) 26

$\sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2,$

which, using the identity $x_t^2 = E_{t-1}(x_t^2) + v_t = \sigma_t^2 + v_t$, where $E_{t-1}(v_t) = 0$, becomes an AR(1) in the squared returns,

$x_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2 + v_t,$

and this produced a simple way of capturing the dependence in $x_t^2$, or "volatility
clustering". A quick glance at the implied a.c.f. for $x_t^2$ however shows that it is
unlikely that such a simple model has managed to capture all the characteristics.
The a.c.f. has both height and shape dimensions, and a minimum of two
parameters might be expected to be needed to capture both features. In the
ARCH(1) model a single parameter $\alpha_1$ has to account for both. As the autocorrelation function of an AR(1) is $\rho_j = \alpha_1^j$, a small estimated value of $\rho_1$ implies a low
value for $\alpha_1$, while the observed persistence in the autocorrelation function of
squared returns demands that $\alpha_1$ be set close to unity. Inevitably, it is impossible
for the ARCH(1) model to reconcile these two features.
24 Another way of inducing some dependence, used by Nieuwland et al. (1994), is to have a jump
process driving an AR(1) in $x_t$. This does not seem a very satisfactory solution as the dependence in $x_t$
itself is weak. In fact Nieuwland et al. find that they need $\sigma_t^2$ to be dependent as well.
25 Such a distinction does not bear up under close examination. Suppose that $\varepsilon_t$ was distributed as
Student's t and $\sigma_t$ was a function of past $x_{t-j}$. Now $\varepsilon_t$ could be written as $\chi_t\eta_t$, where $\chi_t$ is an
inverted gamma random variable and $\eta_t$ is N(0,1). Accordingly, $x_t = \sigma_t\chi_t\eta_t = \tilde{\sigma}_t\eta_t$, and now $\tilde{\sigma}_t$ is
"unobservable". Paradoxes such as these require precise definitions to resolve them, and these have
been supplied by Andersen (1992).
26 Engle actually has $x_t = w_t'b + e_t$ and $\sigma_t^2$ is related to $e_{t-1}$ but we will assume that there are no
variables in $w_t$. The generalization is easily made.
The introduction of the Generalized ARCH (GARCH(1,1)) process by Bollerslev (1986) improved matters, being a two parameter model

$\sigma_t^2 = \alpha_0 + \beta_1\sigma_{t-1}^2 + \alpha_1 x_{t-1}^2,$    (13)

and implying an ARMA(1,1) process for $x_t^2$ of

$x_t^2 = \alpha_0 + (\alpha_1 + \beta_1)x_{t-1}^2 + v_t - \beta_1 v_{t-1}.$    (14)

In this model $\rho_1 = [1 - \beta_1(\alpha_1 + \beta_1)]\alpha_1/(1 - 2\beta_1(\alpha_1 + \beta_1) + \beta_1^2)$ and autocorrelations thereafter die away like $\rho_j = (\beta_1 + \alpha_1)^{j-1}\rho_1$ ($j \geq 2$). Thus one could have a
small initial autocorrelation coefficient, and yet have the others slowly dying
away, provided that $\alpha_1$ was small and $\beta_1 + \alpha_1$ large; such a combination
effectively produces "root cancellation" in the AR and MA polynomials of (14)
and the resulting acf of $x_t^2$ looks like that of white noise, i.e., flat.
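The "height versus shape" point can be checked numerically; the sketch below (the parameter values are illustrative assumptions) evaluates the implied autocorrelation functions of $x_t^2$ under ARCH(1) and GARCH(1,1):

```python
import numpy as np

def acf_sq_arch1(alpha1, lags):
    """Implied a.c.f. of x_t^2 for ARCH(1): rho_j = alpha1**j."""
    return alpha1 ** np.arange(1, lags + 1)

def acf_sq_garch11(alpha1, beta1, lags):
    """Implied a.c.f. of x_t^2 for GARCH(1,1): rho_j = (alpha1+beta1)**(j-1) * rho_1."""
    phi = alpha1 + beta1
    rho1 = alpha1 * (1 - beta1 * phi) / (1 - 2 * beta1 * phi + beta1 ** 2)
    return rho1 * phi ** np.arange(0, lags)

lags = 10
print(acf_sq_arch1(0.2, lags))            # dies out almost immediately
print(acf_sq_garch11(0.1, 0.85, lags))    # small rho_1 but very slow decay
```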
Another way of introducing persistence into the autocorrelation function of
squares is to invoke a components model as in (3). Engle and Lee (1994) do this,
selecting the following model for $\sigma_t^2$,

$\sigma_t^2 = \sigma_{pt}^2 + \sigma_{Tt}^2,$
$\sigma_{pt}^2 = \alpha_0 + \sigma_{p,t-1}^2 + \phi v_{t-1},$
$(1 - (\alpha_1 + \beta_1)L)\sigma_{Tt}^2 = \alpha_1 v_{t-1}.$

Because of the presence of the same error in both components this is the
Beveridge-Nelson model of permanent and transitory effects (Beveridge and
Nelson, 1981). In terms of observables it becomes

$(1 - (\alpha_1 + \beta_1)L)\Delta x_t^2 = (1 - \alpha_1 - \beta_1)\alpha_0 + [1 - (1 + \beta_1 - (1 - \alpha_1 - \beta_1)\phi)L + \beta_1 L^2]v_t.$

It is clear from this expression that, if $\alpha_1 + \beta_1 = 1$ and $\phi = 0$, there will be a
unit root that effectively cancels, leaving an ARMA(1,1) process of the same form
as the GARCH(1,1) model. Engle and Lee conclude that this model actually gives
a better representation of the data than the GARCH(1,1) model. There are quite
different long-run implications however. Unless $\alpha_1 + \beta_1 = 1$, i.e., the GARCH
process is an Integrated GARCH (IGARCH) process, the ARMA(1,1) model
implied by GARCH does not have a unit root, while the components model always
does, so that a fraction of shocks into volatility persists.
An integrated process is one with very long memory, whereas a non-integrated
ARMA process has only short memory. If it is held that the evidence is actually in
favor of long memory in $x_t^2$ it is natural to try to model it directly. The UC model
just discussed does so, at the expense of introducing a unit root into the $x_t^2$ series,
but it might be felt that "treading more softly" would be advantageous. It seems
natural therefore to seek to apply the fractional integration model to volatility.
Baillie et al. (1996) have done this, naming the model FIGARCH.
The second characteristic of the data which models may need to account for is
the correlation between $x_t^2$ and $x_{t-j}$. Using the basic model, $E(x_t^2 x_{t-j}) = E(\varepsilon_t^2\sigma_t^2\varepsilon_{t-j}\sigma_{t-j}) = E(\sigma_t^2\varepsilon_{t-j}\sigma_{t-j})$. If $\sigma_t^2$ is an even function of $\varepsilon_{t-j}$ it follows
that this covariance is zero, indicating that any model seeking to explain the cross
correlation must specify the conditional variance as an asymmetric function of
$\varepsilon_{t-j}$. Most importantly, GARCH models will be incapable of replicating the
feature, and this has led to a search for modifications of the GARCH model that
would be satisfactory. The first of these was the exponential GARCH (EGARCH)
model of Nelson (1991), set up to incorporate the asymmetric effects of returns on
volatility noted by Black (1976) and Christie (1982),

$\ln\sigma_t^2 = \alpha_0 + \beta_1\ln\sigma_{t-1}^2 + \alpha_1 z_{t-1},$    (15)

where $z_t = [\,|\varepsilon_t| - (2/\pi)^{1/2}\,] + \delta\varepsilon_t$. In this expression, $\varepsilon_t$ is n.i.d. (0,1) so that
$E|\varepsilon_t| = (2/\pi)^{1/2}$, accounting for that choice of centering factor. Because of the
use of both $\varepsilon_t$ and $|\varepsilon_t|$, $\sigma_t^2$ will be non-symmetric in $\varepsilon_t$ and, for negative $\delta$, will
exhibit higher volatility for large negative $\varepsilon_t$.
The GARCH/EGARCH class of models make $\sigma_t^2$ specific functions of $x_{t-j}$,
but perhaps greater flexibility in these functional forms is needed. A useful way of
characterizing the possibilities is given by Engle and Ng (1993). Fixing $x_{t-j}$,
$j = 2,\ldots$, they consider the mapping between $\sigma_t^2$ and $x_{t-1}$, terming this the
"news impact curve". They point out that two broad decisions need to be made:
about the shape and the position of such a curve. In particular, one needs to
indicate whether it is symmetric or not and, if it is, what value of $x_{t-1}$ it is
symmetric about. Thus a GARCH process has an impact curve that is symmetric
around zero, whereas an EGARCH model's curve is asymmetric around zero.
Other attempts have been made to produce models with generalized news
impact curves. Hentschel (1991) advocates the absolute GARCH (AGARCH)
model,

$\sigma_t = \alpha_0 + \alpha_1|x_{t-1}| + \beta_1\sigma_{t-1},$

or, if one wants to allow for asymmetry and a non-zero centering,

$\sigma_t = \alpha_0 + \alpha_1[\,|x_{t-1} - b| - c(x_{t-1} - b)\,] + \beta_1\sigma_{t-1}.$

The parameter c accounts for the asymmetry in the news impact curve, and the
value of b determines what point it is asymmetric about. A related model is the
QGARCH model of Sentana (1991):

$\sigma_t^2 = \alpha_0 + \alpha_1(x_{t-1} - b)^2 + \beta_1\sigma_{t-1}^2.$

Strictly speaking this produces a symmetric curve around b, but not around zero,
which has been the traditional point of reference. Some very general formulations
have appeared which use the Box-Cox transformation as a way of producing a
variety of functional forms. Higgins and Bera (1992) originally wrote

$\sigma_t^2 = [\alpha_0 + \alpha_1(x_{t-1}^2)^{\lambda}]^{1/\lambda},$    (16)
which could be regarded as a Box-Cox transformation applied to $x_{t-1}^2$. They refer
to this specification as NARCH. Ding et al. (1993) apply the Box-Cox transformation to GARCH models, adding in an asymmetry, and designating the model
A-PARCH (Asymmetric Power ARCH). One cannot be too optimistic about the
popularity of this latter approach since Box-Cox transformations have rarely been
applied to the conditional mean, as there seem to be numerical difficulties in the
optimization. Perhaps its main use would be as a specification test.
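A compact way to compare these specifications is to plot their news impact curves; the sketch below (all parameter values, including the EGARCH asymmetry parameter, are illustrative assumptions) holds the lagged variance fixed and maps $x_{t-1}$ into $\sigma_t^2$:

```python
import numpy as np

def nic_garch(x, a0=0.05, a1=0.1, b1=0.85, sig2=1.0):
    """GARCH(1,1) news impact curve: symmetric around zero."""
    return a0 + a1 * x ** 2 + b1 * sig2

def nic_egarch(x, a0=-0.1, a1=0.2, b1=0.95, delta=-0.6, sig2=1.0):
    """EGARCH news impact curve (Eq. (15)): asymmetric around zero."""
    eps = x / np.sqrt(sig2)
    z = (np.abs(eps) - np.sqrt(2 / np.pi)) + delta * eps
    return np.exp(a0 + b1 * np.log(sig2) + a1 * z)

def nic_qgarch(x, a0=0.05, a1=0.1, b1=0.85, b=0.5, sig2=1.0):
    """QGARCH news impact curve: symmetric, but around b rather than zero."""
    return a0 + a1 * (x - b) ** 2 + b1 * sig2

x = np.linspace(-5, 5, 11)
for f in (nic_garch, nic_egarch, nic_qgarch):
    print(f.__name__, np.round(f(x), 3))
```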
In all the above formulations the non-linearity was taken to be continuous.
Friedman and Laibson (1989) are also concerned with a non-linear response of $\sigma_t^2$
to past $x_t$, concentrating upon whether $\sigma_t^2$ varies continuously with $x_{t-1}^2$ up to a
certain threshold but is constant thereafter. In their application they estimate the
model

$\sigma_t^2 = \alpha_0 + \alpha_1 g(x_{t-1}^2) + \beta_1\sigma_{t-1}^2,$    (17)

where $g(x_{t-1}^2) = \sin[a x_{t-1}^2]$ if $a x_{t-1}^2 < \pi/2$ and is unity if $a x_{t-1}^2 > \pi/2$.
Such a modified ARCH (MARCH) model has the characteristic of subdued
volatility changes in response to sharp movements in returns. By inspection it is
clear that the distributed lag connecting $\sigma_t^2$ to $x_{t-1}^2$ will tend to be shorter for
large changes than for small ones, so that the effect of events like the crash of
October 1987 disappears relatively quickly. When applied to the excess stock yield
- the S&P returns less the Treasury Bill return - they find $\alpha_0 = 0.0003$,
$\alpha_1 = 0.0084$, $\beta_1 = 0.6147$, $a = 67$. Neither $a$ nor $\alpha_1$ were significantly different
from zero. Because the MARCH formulation is symmetric in $x_{t-1}$, one way to
test for it is to estimate a GARCH model trimming out returns whose absolute
value exceeds some specified number $\delta$. If there is no threshold effect the
estimated parameters $\alpha_0$, $\alpha_1$, $\beta_1$ should remain much the same since the GARCH
model would then be correctly specified. However, if there is a MARCH effect the
GARCH estimates of $\alpha_0$, $\alpha_1$, $\beta_1$ should be different below the threshold than
above. Kearns and Pagan (1993) applied this symmetric trimming idea to Australian stock return data, but found no evidence of a threshold as $\delta$ was varied
through a wide range of data. Threshold GARCH (TARCH) models, in which
$\alpha_1 g(x_{t-j}) = \alpha_1^{+}I(x_{t-j} > c) + \alpha_1^{-}I(x_{t-j} < c)$, where $I(\cdot)$ is an indicator function
taking the value unity if the event in brackets is true and zero otherwise, have
become increasingly popular - Glosten et al. (1993) and Zakoian (1994) being
good examples. Hentschel (1994) allows for very general functional forms with
the relation

$\sigma_t^{\lambda} = \alpha_0 + \beta_1\sigma_{t-1}^{\lambda} + \alpha_1\sigma_{t-1}^{\lambda}g(\varepsilon_{t-1}),$

and it is easily seen that most of the parametric forms described above can be
located within this specification by suitable choice of $\lambda$ and $g(\cdot)$.
Non-parametric approaches are also to be found in the literature. Gouriéroux
and Monfort (1992) suggest that the unknown relation between $\sigma_t^2$ and $x_{t-1}$
might be approximated by a series of steps, i.e., dummy variables are added that
take the value unity within regions of possible values that $x_{t-1}$ might take, but are
zero outside it. Pagan and Schwert (1990a) use non-parametric methods to directly
estimate the news impact curve, i.e., they compute $E(x_t^2 \mid x_{t-1})$. The major
restriction on this approach is that the number of conditioning variables has to be
small, and cannot reasonably be restricted to a few lags of $x_t$ due to the long
memory in asset returns. Hence, in the concluding section of their paper they
formulate and estimate a model in which $\sigma_t^2$ is a GARCH model augmented with
Fourier terms $\cos(k\varepsilon_{t-j})$, $\sin(k\varepsilon_{t-j})$, $k = 1,2,3$. This formulation is Gallant's
(1981) Flexible Fourier Form, which has been used in the production function
literature to model producer behavior. The presence of sine terms means that $\sigma_t^2$
will be an asymmetric function of $x_{t-j}$. 27 Other types of approximating functions
would be possible, e.g., one might use Laguerre polynomials, since these are
always positive.
The third main characteristic of asset prices is their leptokurtosis, as reflected in
the fact that their density has a very large peak around zero, i.e., there are far too
many small returns compared to what would be found with a normal density. It
seems as if GARCH and EGARCH models allied with the assumption that $\varepsilon_t$ is
N(0,1) fail to adequately capture this leptokurtosis. As an example, we fitted such
models to the monthly stock return series, and then computed $\hat{f}(0)$ with data
simulated from them, producing values of 0.46 (GARCH) and 0.43 (EGARCH)
versus the 0.52 seen in the data. There is an obvious failure to match this feature.
Another way of addressing this correspondence has been to examine the skewness
and excess kurtosis properties of the $x_t/\hat{\sigma}_t$. For the CRSP equity returns the
normalized residuals have excess kurtosis of 1.46, while for the $US/SF rate
Engle and Bollerslev (1986) found that the normalized residuals had excess
kurtosis of 0.74.
The main response to this failure has been to change the density of $\varepsilon_t$ from an
N(0,1) to something else. Engle and Bollerslev made $\varepsilon_t$ follow a Student's
t-density rather than a standard normal, i.e.,

$f(\varepsilon_t) = \frac{\Gamma((\nu+1)/2)}{\Gamma(\nu/2)}\,[\pi(\nu-2)]^{-1/2}\Big[1 + \frac{\varepsilon_t^2}{\nu-2}\Big]^{-(\nu+1)/2}.$

This adds an additional parameter, $\nu$, the degrees of freedom associated with
Student's t, which can either be pre-set or estimated along with the other
parameters of the model. When Engle and Bollerslev do this they find it to be
around ten, and that the normalized residuals from this fitted model did not seem
27 Sine terms raise the possibility of negative estimates of $\sigma_t^2$. One could avoid parameter regions
that cause these by penalizing the log likelihood whenever parameter estimates are such that a negative
$\sigma_t^2$ was observed.
to have excess kurtosis. Other suggestions along these lines have also been made.
Hsieh (1989a) and Nelson (1991) use the GED density

$f(\varepsilon_t) = \frac{\nu\exp\big(-\tfrac{1}{2}|\varepsilon_t/\lambda|^{\nu}\big)}{\lambda\,2^{(1+\nu^{-1})}\Gamma(\nu^{-1})},$

where $\Gamma$ is the gamma function and $\lambda = [2^{-(2/\nu)}\Gamma(1/\nu)/\Gamma(3/\nu)]^{1/2}$ (when $\nu = 2$
this produces a normal density while $\nu > (<)\,2$ is more thin (fat) tailed than a
normal). Nelson estimates models up to an EGARCH(4,4) for stock returns and
finds $\nu = 1.576$ for an EGARCH(2,1) process. Lye and Martin (1991) use the
"generalized t distribution" $f(\varepsilon_t) = z^{-1}\exp[\theta_1\tan^{-1}(\varepsilon_t/\eta) + \theta_2\log(\eta^2 + \varepsilon_t^2) + \sum_{i=3}^{M}\theta_i\varepsilon_t^{i}]$, where z is a normalizing constant.
It is useful to look at the potential for success of this solution. To do that we
will derive an expression for the value of the unconditional density of $x_t$ at the
origin, $f_x(0)$.

$f_x(x) = \int f(x,\sigma_t)\,d\sigma_t = \int f(x \mid \sigma_t)f(\sigma_t)\,d\sigma_t = \int f_\varepsilon(\varepsilon = \sigma_t^{-1}x)\,\sigma_t^{-1}f(\sigma_t)\,d\sigma_t,$

so that $f_x(0) = f_\varepsilon(0)E[\sigma_t^{-1}]$. This result shows that manipulation of the density for
$\varepsilon_t$ will certainly enable one to control to some degree for the leptokurtosis in $f_x$,
but it is necessary that $E[\sigma_t^{-1}]$ does not change as a result. However,
estimators of the parameters in the density $f_\varepsilon$, and those in $\sigma_t^2$, will generally be
correlated, and therefore there may be offsetting effects. Although it is hard to find
analytical expressions for how $E(1/\sigma_t)$ changes with the values of the parameters
$\theta$ contained in the function determining $\sigma_t^2$, numerical experiments with a
GARCH(1,1) model point to $\partial E(1/\sigma_t)/\partial\theta$ being negative, so that any increase in
these parameters will tend to depress the peak of the density of $x_t$ produced by the
model. In particular, there would be a conflict between the need to make $\beta_1 + \alpha_1$
high to account for the strong dependence in the series, and the need to make it
low to account for the leptokurtosis. If there is a positive correlation between $\hat{f}_\varepsilon(0)$
and $\hat{\theta}$, where these are estimated quantities from some data set, then it may be
impossible to reproduce the observed degree of leptokurtosis simply by varying
the nature of $f_\varepsilon$. To illustrate this we fitted a GARCH(1,1) model to the monthly
stock return data using both an N(0,1) and a t-density for the errors. 28 The
estimated degrees of freedom for the t-density was 7, and the parameters of the
conditional variance function were very similar, except for the intercept, which
was $0.8 \times 10^{-4}$ with a normal density but $1.3 \times 10^{-4}$ with a t-density. Thus,
whilst the use of a t-density has raised $f_\varepsilon(0)$, the expected value of $\sigma_t^{-1}$ has
declined, and these interact to produce a value of $f_x(0)$ that is very close in both
models.
28 The program used was a beta test version of MICROFIT, Version 4.
The conflicts just detailed hint that, even after allowing the density for $\varepsilon_t$ to
deviate from normality and a use of general specifications for $\sigma_t^2$, existing models
may not be rich enough for the statistical modeling of actual data. Both Gallant et
al. (1991) and Hansen (1994) find that there is evidence that the higher order
moments of $\varepsilon_t$ also depend upon the past. Both propose models for this dependence. Gallant et al. do so by using their SNP approach to density estimation,
which effectively makes the density of $\varepsilon_t$ a polynomial in the past history of
returns, while Hansen's ARCD (Autoregressive Conditional Density) models make
some parameter of the density of $\varepsilon_t$, say the degrees of freedom parameter in
Student's t-density, change over time. He finds that the degrees of freedom
coefficient exhibits quite a deal of variation for some financial series, including the
excess holding yield on US Treasury Bills.
3.1.2. The stochastic volatility class of models
Although the GARCH class of models has met with substantial empirical
success, one item hindering their widespread acceptance within financial circles
has been the difficulty that their specification departs from the type of models
encountered in finance texts. To some extent this is a consequence of the shift
from continuous to discrete time, a subject discussed later, but it also reflects the
fact that finance models see volatility as being driven by a process separate from
returns per se. "Stochastic volatility" (SV) models have therefore arisen in which
the EGARCH functional form is retained but the process driving the conditional
variance is made an n.i.d. (0,1) process that is independent of $x_t$. Such a model
has the format

$\log\sigma_t^2 = \alpha_0 + \beta_1\log\sigma_{t-1}^2 + \sigma_\eta\eta_t.$    (18)

Because it is a "two parameter" model the stochastic volatility (SV) model has
the potential for replicating the type of autocorrelation function for $x_t^2$ seen in the
data. Defining $\mu = \alpha_0/(1 - \beta_1)$ and $v = \sigma_\eta^2/(1 - \beta_1^2)$ as respectively the mean
and variance of $\log\sigma_t^2$, the autocorrelation function of $x_t^2$ is $\rho_j = [\exp(v\beta_1^j) - 1]/[3\exp(v) - 1]$
(see Jacquier et al. (1994) for a very clear derivation of this result).
It tends to decline a little slower than for the GARCH models. However, the SV
model does not fare as well on the correlation between $x_t^2$ and $x_{t-j}$. Since

$\mathrm{cov}(x_t^2, x_{t-j}) = E(\sigma_t^2\varepsilon_t^2\sigma_{t-j}\varepsilon_{t-j}) = 0$

from the independence assumptions made about $\varepsilon_t$ and $\eta_t$, the SV model predicts
a zero correlation between $x_t^2$ and $x_{t-j}$. No transformation of either $\varepsilon_t$ or $\eta_t$
would change this situation, and the standard way to produce a correlation is to
allow $\eta_t$ to be correlated with $\varepsilon_t$. If $\varepsilon_t$ and $\eta_t$ are jointly normal with correlation
coefficient $\gamma$ we can write $\eta_t = \gamma\varepsilon_t + \xi_t$, where $\xi_t$ is independent of $\varepsilon_t$, so that
$\gamma \neq 0$ means that $\sigma_t^2$ becomes a function of $\varepsilon_{t-1}$ and so it yields the requisite odd
function. 29

29 But at the expense of inducing a non-zero mean into $x_t$ since $E_{t-1}(x_t) = E_{t-1}(\sigma_t\varepsilon_t) \neq 0$.
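A simulation sketch of (18) (the parameter values, the N(0,1) assumption for $\varepsilon_t$, and the timing of the correlation are assumptions of the sketch) makes it easy to verify the slowly declining acf of $x_t^2$ and the near-zero correlation between $x_t^2$ and $x_{t-1}$ when $\gamma = 0$:

```python
import numpy as np

def simulate_sv(T, a0=-0.2, b1=0.97, sig_eta=0.15, gamma=0.0, seed=0):
    """Simulate x_t = sigma_t*eps_t with log sigma_t^2 following Eq. (18).

    gamma correlates eta_t with the previous period's eps, so sigma_t^2 becomes a
    function of eps_{t-1} when gamma is non-zero; the basic model has gamma = 0.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    xi = rng.standard_normal(T)
    eta = xi.copy()
    eta[1:] = gamma * eps[:-1] + np.sqrt(1 - gamma ** 2) * xi[1:]
    logs2 = np.empty(T)
    logs2[0] = a0 / (1 - b1)                        # start at the unconditional mean
    for t in range(1, T):
        logs2[t] = a0 + b1 * logs2[t - 1] + sig_eta * eta[t]
    return np.exp(0.5 * logs2) * eps

def acf(y, lags):
    y = y - y.mean()
    return np.array([np.sum(y[k:] * y[:-k]) for k in range(1, lags + 1)]) / np.sum(y * y)

x = simulate_sv(50_000)
print(acf(x ** 2, 5))                               # slowly declining autocorrelations
print(np.corrcoef(x[1:] ** 2, x[:-1])[0, 1])        # approximately zero when gamma = 0
```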
A variant of the SV model that has been extensively used by Hamilton in a
variety of applications, e.g., to exchange rates - Engel and Hamilton (1990) - and
interest rates - Hamilton (1988) - is to relate $\sigma_t^2$ to an unobserved state variable
$z_t$ that can take either the value of zero or unity, with this variable evolving
according to the first-order Markov process

$\mathrm{Prob}[z_t = 1 \mid z_{t-1} = 1] = p$
$\mathrm{Prob}[z_t = 0 \mid z_{t-1} = 1] = 1 - p$
$\mathrm{Prob}[z_t = 0 \mid z_{t-1} = 0] = q$
$\mathrm{Prob}[z_t = 1 \mid z_{t-1} = 0] = 1 - q.$    (19)

It can be shown by substitution that this scheme implies that $z_t$ is an AR(1)
process

$z_t = (1 - q) + \rho z_{t-1} + \eta_t,$    (20)

where $\rho = p + q - 1$ and, conditional upon $z_{t-1} = 1$,

$\eta_t = (1 - p)$ with probability $p$
$\eta_t = -p$ with probability $(1 - p),$    (21)

while, conditional upon $z_{t-1} = 0$,

$\eta_t = -(1 - q)$ with probability $q$
$\eta_t = q$ with probability $(1 - q).$    (22)

Defining $\sigma_t^2 = \phi_1 + \phi_2 z_t$ we may write

$\sigma_t^2 = \rho\sigma_{t-1}^2 + (1 - \rho)\phi_1 + (1 - q)\phi_2 + \phi_2\eta_t,$    (23)

and, from (21) and (22),

$E(\eta_t^2 \mid z_{t-1}) = q(1 - q) + \delta z_{t-1} = q(1 - q) - \delta\phi_1\phi_2^{-1} + \phi_2^{-1}\delta\sigma_{t-1}^2,$

where $\delta = p(1 - p) - q(1 - q)$, making this model differ from a standard SV
model in two ways: the innovations driving the conditional variance are a discrete
random variable with four states and the conditional variance of the innovations
depends upon $\sigma_{t-1}^2$. The ability to replicate standard characteristics is similar to
the standard SV model. Most importantly, persistence in the autocorrelation
function of the squares is governed by the size of $\rho$, and to ensure that it is
close to unity, one needs to have both p and q close to one, i.e., once a state is
entered into there is little tendency to leave it.
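The two-state scheme is straightforward to simulate; in the sketch below the values of p, q, $\phi_1$ and $\phi_2$ are illustrative assumptions:

```python
import numpy as np

def simulate_markov_switching_var(T, p=0.98, q=0.95, phi1=0.5, phi2=4.0, seed=0):
    """Two-state Markov switching variance: sigma_t^2 = phi1 + phi2 * z_t.

    z_t follows the transition probabilities in (19); persistence is rho = p + q - 1.
    """
    rng = np.random.default_rng(seed)
    z = np.empty(T, dtype=int)
    z[0] = 1
    u = rng.random(T)
    for t in range(1, T):
        prob_one = p if z[t - 1] == 1 else 1 - q    # probability that z_t = 1
        z[t] = int(u[t] < prob_one)
    sigma2 = phi1 + phi2 * z
    x = np.sqrt(sigma2) * rng.standard_normal(T)
    return x, z

x, z = simulate_markov_switching_var(10_000)
print("rho =", 0.98 + 0.95 - 1, " mean duration of state 1 =", 1 / (1 - 0.98))
print("sample corr(x_t^2, x_{t-1}^2) =", np.corrcoef(x[1:] ** 2, x[:-1] ** 2)[0, 1])
```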
Hamilton's model has been applied to a variety of financial time series - to
interest rates in Hamilton (1988) and to stock returns in Pagan and Schwert
(1990a) and Sola and Timmermann (1994). The first of these is a particularly
interesting application; there turn out to be two regimes, corresponding to the
1979-1982 years and the remainder of the sample, respectively. The three year
period 1979-1982 coincides with the change in the Fed's operating procedures
which led to very high volatility in interest rates. The other two papers allow for
both shifts in the mean and variance of returns.
Instead of viewing the ARCH and latent state models as separate one might
attempt to link them together. A motivation for doing this comes from the high
degree of estimated persistence in volatility observed after fitting either GARCH
or SV models, a concomitant of the behavior of the acf of squares. Another
explanation for this phenomenon might be that there have been shifts in the
parameters of the underlying processes. It is well known that shifts in the mean of
a random variable will tend to show up as a unit root when an autoregressive
process is fitted, and one might extend this argument to $x_t^2$ as well. Diebold
(1986) and Lamoureux and Lastrapes (1990) maintained this interpretation and
echoes of the theme may be found in Ding et al. (1993) and Diebold and Lopez
(1995). As mentioned above, Hamilton's model can be used to allow the intercept
in the variance equation to shift, but one might expect some extra autoregressive
elements in it, even after allowing for that effect. Accordingly, Hamilton and
Susmel (1994) and Cai (1994) consider switching ARCH (not GARCH) models in
which the conditional variance is selected from a number of possible ARCH
processes depending upon the state that eventuates. Such SWARCH models have
been applied to stock returns by Hamilton and Susmel (1994) and to Treasury Bill
yields by Cai (1994).
3.2. Estimating models of volatility
Five methods of estimation for the above models figure prominently in the
financial econometrics literature.
1. Generalized method of moments.
2. Maximum likelihood.
3. Quasi maximum likelihood.
4. Indirect estimation.
5. Bayesian estimation.
Closely associated with the decline in computational costs has been a shift
towards the use of the final two methods, and it seems likely that this will
continue, although computer packages may well continue to feature the first three.
3.2.1. GMM estimation
GMM estimation of ARCH type models has been implemented by Glosten et
al. (1993) and Rich et al. (1991) among others. To find GMM estimates one must
have orthogonality conditions and Rich et al. use $E\{z_t[x_t^2 - \sigma_t^2]\} = 0$, where $z_t$
are elements of $F_{t-1}$, e.g., $x_{t-1}$, $x_{t-1}^2$. However, there seems little advantage to
using this estimator rather than the MLE. SV models were first estimated by
method of moments - see Taylor (1986), Melino and Turnbull (1990) and Duffie
and Singleton (1993) (the latter simulate the moments rather than deriving them
analytically) - but after these first studies there has been declining interest in this
way of doing estimation. The moments used typically involved the first four
moments of $x_t$ as well as the a.c.f.'s of $x_t^2$ and $|x_t|$. In a recent Monte Carlo study
Andersen and Sorensen (1994a) conclude that the results are sensitive to the
number of moments used; if too many moments are used the sampling distributions of the estimators are adversely affected, even with quite large numbers of
observations.
Under certain circumstances difficulties can be experienced with the GMM
estimator. To appreciate these suppose that there is a single parameter $\theta$ to be
estimated and the moment condition used for this is $E(z_t v_t(x_t;\theta)) = 0$, where $z_t$ is
an instrument. In the case of an ARCH(1) model one might set $z_t = x_{t-1}^2$ and
$v_t = x_t^2 - \alpha_0 - \alpha_1 x_{t-1}^2$, and assume that only one of $\alpha_0$ and $\alpha_1$ is unknown. The
GMM estimator of $\theta$ solves $\sum z_t v_t(x_t;\hat{\theta}) = 0$ and $\hat{\theta} - \theta_0 \simeq -(\sum z_t v_{0t})^{-1}(\sum z_t v_t)$,
where $\simeq$ means that only the linear terms from the expansion of $v_t(x_t;\theta)$ around
$\theta_0$ are retained, as is common in asymptotic analysis, and $v_{0t} = \partial v_t/\partial\theta$ at
$\theta = \theta_0$. 30 What makes for complications with the GMM estimator is clear from
this expression: it is the ratio of two random variables that are generally not
independent. In standard analysis $T^{1/2}(\hat{\theta} - \theta_0)$ is asymptotically normal since
$T^{-1}\sum z_t v_{0t} \to_p E(z_t v_{0t})$, which is a constant, and so the distribution is determined
by the distribution of the numerator, $T^{-1/2}\sum z_t v_t$. But what if $E(z_t v_{0t}) = 0$? Then
we would have $\hat{\theta} - \theta_0 = -(T^{-1/2}\sum z_t v_{0t})^{-1}(T^{-1/2}\sum z_t v_t)$ and both numerator
and denominator would converge in distribution to normally distributed random
variables with zero means. If the numerator and denominator were independent,
this distribution would actually be Cauchy. Hence, the case when $E(z_t v_{0t}) = 0$,
which corresponds to the situation where the instrument has a zero correlation
with $v_{0t}$, would be expected to produce a GMM estimator whose distribution
departs substantially from that predicted by asymptotic theory. Of course, this
polar case is too extreme, even if it is suggestive. Accordingly, Staiger and Stock
(1993) have performed an analysis when $E(z_t v_{0t})$ is "local to zero" of the form
$\xi T^{-1/2}$. This allows $T^{1/2}(\hat{\theta} - \theta_0)$ to have a limiting distribution determined by
the ratio of $N(0, \mathrm{var}(z_t v_t))$ to $N(\xi, \mathrm{var}(z_t v_{0t}))$ random variables, which can also
30 If $v_t$ was linear in $\theta$, say $v_t = x_{2t} - x_{1t}\theta$, then $v_{0t} = -x_{1t}$ and the GMM estimator would be the
simple instrumental variables estimator.
depart substantially from a normal distribution unless $\xi$ is large relative to
$\mathrm{std}(z_t v_{0t})$. Simulation studies by Fuhrer et al. (1995), Kocherlakota (1990), Pagan
and Jung (1993), Mao (1990) and Staiger and Stock (1993) have all found that the
asymptotic approximation can be very poor when instruments are "weak".
All of this suggests that great care needs to be exercised when applying GMM
estimators. Some guidelines for detecting cases in which there are likely to be
problems are available, see Staiger and Stock (1993) and Pagan and Jung (1993),
of which the most useful involves an examination of the "concentration ratio", a
quantity related to the expected value of the denominator, and which can be
computed from the $R^2$ of the regression of $v_{0t}$ against the instruments $z_t$ if $v_{0t}$
was linear in $\theta$. What to do about the problem is a much harder question. One
might consider performing an iterated bootstrap in which the parameter estimates
found with the data are used to simulate realizations which can then be used to
study the properties of the GMM estimator. However, this requires one to be able
to simulate from a known model for $x_t$ and that may not always be available.
Fundamentally, one would like to make the denominator non-stochastic, i.e., to
replace $\sum z_t v_{0t}$ by $TE(z_t v_{0t})$. For some applications of GMM which match
moments, e.g., in SV models, this expectation can be found either analytically or
numerically (since $z_t$ is actually unity). What makes it difficult to follow this
strategy in many instances however is that the number of moments available,
$E(\phi(x_t;\theta)) = 0$, exceeds the number of parameters in $\theta$. To use all available
information, the larger number of moments is reduced to the correct number
$E(\psi_t(x_t;\theta)) = 0$ by forming $\psi_t = W\phi_t$, where W is a $(\dim(\theta), \dim(\phi))$ matrix.
Hansen (1982) showed that the best choice of W was $W = [E(T^{-1}\sum\partial\phi_t/\partial\theta)]'[T^{-1}\mathrm{var}(\sum\phi_t)]^{-1}$.
It is clear that the resulting GMM estimator has the
form $\hat{\theta} - \theta_0 = -(\sum W\partial\phi_t/\partial\theta)^{-1}(\sum W\phi_t)$ and, even if $\partial\phi_t/\partial\theta$ was non-stochastic,
the denominator may become random if W has to be estimated from data. 31
3.2.2. Maximum and quasi maximum likelihood
Estimation of the unknown parameters can be done by MLE using $f(x_t \mid X_{t-1})$,
where $X_{t-1} = \{x_0, x_1, \ldots, x_{t-1}\}$. To see how this item determines the log likelihood, write the joint density of returns as the product of a conditional and a
marginal density, $f(x_t, X_{t-1}) = f(x_t \mid X_{t-1})f(X_{t-1})$. Building this up for all T
observations gives the joint density

$f(x_1, \ldots, x_T) = f(x_T \mid X_{T-1})f(x_{T-1} \mid X_{T-2})\cdots f(x_0),$
31 If the problem is such that artificial series can be generated allowing the estimation of W to any
desired degree of accuracy, that approach might be used. In Monte Carlo studies such an option is
available and Andersen and Sorensen (1994a) used the technique in their study of the GMM estimator
of the SV model to show that its properties are greatly improved if the randomness due to W can be
eliminated.
making the log likelihood of $x_1, \ldots, x_T$ equal to

$L = \sum_{t=1}^{T}\log f(x_t \mid X_{t-1}) + \log f(x_0),$    (24)

and this may be maximized with respect to the unknown parameters.
To estimate the unknown parameters of the ARCH process it is necessary to
find $f(x_t \mid X_{t-1})$. But, if $\varepsilon_t$ is n.i.d. (0,1), it follows that the density of $x_t$
conditional upon $X_{t-1}$ is $N(0,\sigma_t^2)$ and (24) would be

$L = \sum_{t=1}^{T}\Big\{-\tfrac{1}{2}\log 2\pi - \tfrac{1}{2}\log\sigma_t^2 - \tfrac{1}{2\sigma_t^2}x_t^2\Big\} + \log f(x_0).$    (25)

After replacing $\sigma_t^2$ in (25) by any of the GARCH type specifications discussed
earlier, one can proceed to maximize L with respect to the unknown parameters in
$\sigma_t^2$. The main difficulty here is the final term in (25). But, because it is a single
term, it is dominated (for large T) by the summation, and so it has generally been
neglected. In fact little is known about the density $f(x_0)$ - if the $x_t$ are assumed
to be strictly stationary, then $f(x_0)$ is the unconditional density for the series $x_t$.
To date no analytic expression for this is available, but we can determine it
experimentally once particular values of the GARCH parameters are specified. 32
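For concreteness, a minimal sketch of maximizing the Gaussian likelihood (25) under a GARCH(1,1) specification (the initial condition for $\sigma_1^2$, the neglect of $\log f(x_0)$, the optimizer, and the placeholder data are all assumptions of the sketch):

```python
import numpy as np
from scipy.optimize import minimize

def garch11_loglik(params, x):
    """Gaussian log likelihood (25) with sigma_t^2 from the GARCH(1,1) recursion (13)."""
    a0, a1, b1 = params
    if a0 <= 0 or a1 < 0 or b1 < 0 or a1 + b1 >= 1:
        return -np.inf                      # keep the search in an admissible region
    T = x.size
    sig2 = np.empty(T)
    sig2[0] = x.var()                       # convenient initial condition
    for t in range(1, T):
        sig2[t] = a0 + b1 * sig2[t - 1] + a1 * x[t - 1] ** 2
    return -0.5 * np.sum(np.log(2 * np.pi) + np.log(sig2) + x ** 2 / sig2)

def fit_garch11(x):
    """Maximize L numerically by minimizing the negative log likelihood."""
    obj = lambda p: -garch11_loglik(p, x)
    res = minimize(obj, x0=[0.1 * x.var(), 0.05, 0.9], method="Nelder-Mead")
    return res.x

rng = np.random.default_rng(0)
x = rng.standard_normal(2000) * 0.01        # placeholder return series
print(fit_garch11(x))
```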
As mentioned earlier, models have also been estimated when $f_\varepsilon$ is not normal.
A useful way to characterize this search for flexibility in the density of $\varepsilon_t$ is to
observe that the "density score", $d\log f(\varepsilon)/d\varepsilon = f'(\varepsilon)/f(\varepsilon)$, is $-\varepsilon$ for an N(0,1)
density, i.e., it is linear in $\varepsilon$. Different choices of the density for $\varepsilon_t$ amount to
making the "density score" a non-linear function of $\varepsilon_t$. To appreciate why this is
of interest observe that the log likelihood for any ARCH model is

$L = -\frac{1}{2}\sum_{t=1}^{T}\log\sigma_t^2 + \sum_{t=1}^{T}\log f(\varepsilon_t),$    (26)

and therefore the scores for $\theta$ will be

$\frac{\partial L}{\partial\theta} = -\frac{1}{2}\sum_{t=1}^{T}\sigma_t^{-2}\,\frac{\partial\sigma_t^2}{\partial\theta}\Big[1 + \frac{f'(\varepsilon_t)}{f(\varepsilon_t)}\,\varepsilon_t\Big].$    (27)

Notice that this involves the unknown density score, so that the various
parametric approaches can just be viewed as alternative ways of estimating this
quantity. 33 Another approach, used by Engle and Gonzalez-Rivera (1991), is to
32 Diebold and Schuermann (1996) have included $\log f(x_0)$ in the maximization by forming
non-parametric estimates of $f(\cdot)$. For samples of 100 the difference is minor.
33 Nelson (1990b) argues that one needs to be cautious in following this strategy since there will be
an interaction between the assumed density and the specification of $\sigma_t^2$. In particular, it is possible that
the estimated $\hat{\sigma}_t^2$ might imply thinner tails than the true $\sigma_t^2$ would, since some of the thickness will be
absorbed by the assumed density. We have encountered this phenomenon earlier when setting out an
expression for $f_x(0)$.
try to produce estimates of $\theta$ that would be "robust" to the density function by
replacing $f'(\varepsilon)/f(\varepsilon)$ in (27) with a non-parametric estimator of that quantity, i.e.,
to find a semi-parametric (SP) estimator of $\theta$.
Within the SP literature a distinction is made between the parameters of interest
$\theta$ and a set of "nuisance" parameters $\eta$ that may be thought of as indexing the
possible set of densities, $f(\theta_0,\eta)$, with $f(\theta_0,\eta_0)$ being the true density. Because $\eta$
is infinite dimensional, analysis of the situation is more complex than in parametric
problems where $\eta$ might (say) be the degrees of freedom parameter for the
t-density. Nevertheless, it is the case that familiar propositions from the parametric
literature tend to generalize to the semi-parametric case. In particular, it is well
known that, to estimate $\theta$ without knowledge of $\eta$, it is necessary that $E[\partial\log f(\theta_0,\eta)/\partial\theta] = 0$
for all possible values of $\eta$, where the expectation is taken with
respect to the true density. In the SP estimation context, if this condition holds $\theta$
is said to be adaptively estimable. What are the prospects for this? It is clear from
the definition of the score in (27) that adaption requires $E[\sigma_t^{-2}(\partial\sigma_t^2/\partial\theta)] = 0$.
Taking the ARCH(1) model for illustrative purposes, and following the approach
in Linton (1993), re-parameterize $\sigma_t^2$ as $\sigma_t^2 = e^{\kappa}(1 + \omega(x_{t-1}^2 - a))$, where $e^{\kappa} = \alpha_0 + \alpha_1 a$,
$\omega = e^{-\kappa}\alpha_1$ and $a = E[\sigma_t^{-2}x_{t-1}^2]/E[\sigma_t^{-2}]$. It is then clear that

$E\Big[\sigma_t^{-2}\frac{\partial\sigma_t^2}{\partial\omega}\Big] = e^{\kappa}\big\{E[\sigma_t^{-2}x_{t-1}^2] - aE[\sigma_t^{-2}]\big\} = 0.$
Hence, the above results point to the conclusion reached by Linton (1993) and
Steigerwald (1992) that neither the intercept $\alpha_0$ nor the slope $\alpha_1$ are adaptively
estimable. These results parallel what is known about the possibility of adaptively
estimating the intercept and slope coefficients in a linear conditional expectation,
see Pagan and Ullah (1995, Theorem 5.5). As discussed there it is also necessary
that the covariance of the scores of $\kappa$ and $\omega$ be zero, but this can be shown to be
satisfied by the choice of a in the text. However, there is one important difference
to the conditional mean context: rarely is there much interest in the intercept in a
conditional mean, but most uses of the conditional variance require $\sigma_t^2$ in its
totality, i.e., the constant term as well as the ARCH part. Consequently, the
quantity of interest, $\sigma_t^2$, cannot be adaptively estimated. Again this is a familiar
result from the SP literature; it is impossible to adaptively estimate a scale
parameter.
It is unwise to become too preoccupied with the possibilities of adaption. A
more useful question is to ask whether a given SP estimator is "efficient". To
answer this requires some definition of efficiency. In the parametric case this is
provided by the Cramer-Rao lower bound for $\theta$ of $(I_{\theta\theta} - I_{\theta\eta}I_{\eta\eta}^{-1}I_{\eta\theta})^{-1}$, where I is
the information matrix. There is an analogue to this in the SP case that is termed
the SP efficiency bound. Steigerwald (1992) and Drost and Klassen (1994) derive
this bound for GARCH models. Provided the density of $\varepsilon_t$ is symmetric the
estimator proposed by Engle and Gonzalez-Rivera attains this bound.
Instead of trying to find the MLE when $f_\varepsilon$ is unknown, it has been suggested
that one proceed as if $\varepsilon_t$ was n.i.d. (0,1), i.e., maximize the function
$-\tfrac{1}{2}\sum\log\sigma_{t|t-1}^2 - \tfrac{1}{2}\sum\sigma_{t|t-1}^{-2}[x_t - E(x_t \mid X_{t-1})]^2$, where $\sigma_{t|t-1}^2 = \mathrm{var}(x_t \mid X_{t-1})$.
The resulting quasi MLE of $\theta$, $\hat{\theta}_{QMLE}$, will be consistent and asymptotically normally
distributed, provided the scores under the normality assumption have zero expected value under the true density, i.e.,
$E[\tfrac{1}{2}\sigma_t^{-2}(\partial\sigma_t^2/\partial\theta)(\varepsilon_t^2 - 1)] = 0$. Clearly, this holds for any density. The
major difficulty with the quasi MLE is that, whilst $\hat{\theta}$ is consistent, one has to be
careful about the construction of the estimator of the covariance matrix. This
should be $\mathrm{var}(\hat{\theta}) = H^{-1}VH^{-1}$, where $H = -E(\partial^2 L/\partial\theta\partial\theta')$ and $V = \mathrm{var}(\partial L/\partial\theta)$,
which differs from the value, $H^{-1}$, that would obtain if the assumed density for $\varepsilon_t$
coincides with the true one. One can replace V with the "outer product" estimate
$\hat{V} = T^{-1}\sum_{t=1}^{T}(\partial L_t/\partial\theta)(\partial L_t/\partial\theta)' = \tfrac{1}{4}T^{-1}\sum_{t=1}^{T}\sigma_t^{-4}(\partial\sigma_t^2/\partial\theta)(\partial\sigma_t^2/\partial\theta)'(\varepsilon_t^2 - 1)^2$,
so that, in large samples, $V = \tfrac{1}{2}H\,E[(\varepsilon_t^2 - 1)^2]$, showing that the true covariance matrix is a multiple of the
"formula" variance, $H^{-1}$, determined by the fourth moment of $\varepsilon_t$. The quasi-MLE
standard errors have become widely used. In the discussion above it is assumed
that the density of the innovations generating the QMLE is normal, but the
widespread use of the t-density in estimation prompts the question of the
properties of QMLE's based on densities other than the normal. Newey and
Steigerwald (1996) deal with this question, showing that, if both assumed and true
densities are symmetric around zero, the parameters of the conditional mean are
consistently estimated, while those in the conditional variance are consistently
estimated only up to a scale factor. They also discuss how to modify the QMLE if
symmetry does not hold.
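A generic numerical sketch of this sandwich covariance (the finite-difference derivatives and the per-observation log likelihood interface loglik_t are choices made for the sketch, not the paper's construction):

```python
import numpy as np

def qmle_sandwich_cov(loglik_t, theta, h=1e-5):
    """Quasi-MLE covariance estimate H^{-1} V H^{-1} / T.

    loglik_t(theta) must return the T-vector of per-observation contributions L_t;
    scores and the Hessian are obtained by central finite differences.
    """
    theta = np.asarray(theta, float)
    k = theta.size
    steps = h * np.eye(k)

    # per-observation scores (T x k)
    S = np.column_stack([(loglik_t(theta + steps[i]) - loglik_t(theta - steps[i])) / (2 * h)
                         for i in range(k)])
    T = S.shape[0]
    V = S.T @ S / T                                      # outer product estimate of V

    # H = -E[d2 L_t / dtheta dtheta']: differentiate the average score numerically
    mean_score = lambda th: np.array([np.mean(loglik_t(th + steps[i]) - loglik_t(th - steps[i])) / (2 * h)
                                      for i in range(k)])
    H = -np.column_stack([(mean_score(theta + steps[j]) - mean_score(theta - steps[j])) / (2 * h)
                          for j in range(k)])
    Hinv = np.linalg.inv(H)
    return Hinv @ V @ Hinv / T
```

Applied at the QMLE, with loglik_t returning, say, the per-observation Gaussian contributions from the GARCH(1,1) sketch above, the square roots of the diagonal elements give the robust standard errors.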
Estimation of $f(x_t \mid X_{t-1})$ for SV models is very demanding computationally.
To see why, observe that, since the SV model has $\sigma_t^2$ following a Markovian
process,

$f(x_t \mid X_{t-1}) = \int_{\sigma_t}\int_{\sigma_{t-1}} f(\sigma_t,\sigma_{t-1},x_t \mid X_{t-1}),$    (28)

and it becomes necessary to perform numerical integration at each observation in
order to evaluate the likelihood. In the case of Hamilton's specification the burden
is lessened by the fact that $\sigma_t^2$ is discrete, allowing the bivariate integration to be
carried out by a dual sum over the four possible state values. Of course, this
integration first requires that the density $f(\sigma_t,\sigma_{t-1},x_t \mid X_{t-1})$ be found, but this
can be done recursively from the following relations between conditional and
unconditional densities:

$f(\sigma_t,\sigma_{t-1},x_t \mid X_{t-1}) = f(x_t \mid \sigma_t,\sigma_{t-1},X_{t-1})\,f(\sigma_t,\sigma_{t-1} \mid X_{t-1}),$    (29)
$f(\sigma_t,\sigma_{t-1} \mid X_{t-1}) = f(\sigma_t \mid \sigma_{t-1},X_{t-1})\,f(\sigma_{t-1},\sigma_{t-2} \mid X_{t-1}),$    (30)
$f(\sigma_t,\sigma_{t-1} \mid X_t) = f(\sigma_t,\sigma_{t-1},x_t \mid X_{t-1})/f(x_t \mid X_{t-1}).$    (31)

Beginning with some initial density $f(\sigma_1,\sigma_0 \mid x_1)$ we can apply (30), (29), (28) and
(31) in that order to find $f(\sigma_2,\sigma_1 \mid X_2)$, whereupon the process is repeated. 34
Since $f(x_t \mid \sigma_t,\sigma_{t-1},X_{t-1})$ and $f(\sigma_t \mid \sigma_{t-1},X_{t-1})$ derive from the model the circle
is closed.
The problems of having to perform numerical integration at each observation
have dampened the enthusiasm of investigators for exact MLE (except in the case
of Hamilton's model). Some progress has been made in designing efficient ways
of performing the numerical integrations, of which the best method seems to be
the accelerated Gaussian importance sampler of Danielsson (1994) and Danielsson
and Richard (1993), and more can be expected. Until recently therefore, the major
way of estimating the SV model was by QMLE. To implement this method,
square $x_t$ to get $\sigma_t^2\varepsilon_t^2$ and then take logs, thereby summarizing the SV model (18)
in the state space form

$\log x_t^2 = \kappa + \log\sigma_t^2 + \xi_t,$
$\log\sigma_t^2 = \alpha_0 + \beta_1\log\sigma_{t-1}^2 + \sigma_\eta\eta_t,$

where $\kappa = E(\log\varepsilon_t^2)$ and $\xi_t = \log\varepsilon_t^2 - \kappa$. If $\xi_t$ was normally distributed, and $\varepsilon_t$
was independent of $\eta_t$, then $f(x_t \mid X_{t-1})$ could be found directly with the Kalman
filter. 35 We can ignore the fact that $\xi_t$ is not normal and proceed as if it was, i.e., use a quasi MLE, and this approach was suggested by Nelson (1988) and
implemented by Harvey and Shephard (1993) and Harvey et al. (1994). 36 Its
major defect is that it can be very inefficient, since the distribution of $\xi_t$ lies far
from a normal density, having a very long left hand tail. Computationally,
problems can arise when $\varepsilon_t$ becomes small, as this makes $\log\varepsilon_t^2$ very large, and
this "inlier" problem was dealt with in Harvey and Shephard by effectively
trimming out these observations.

34 One possibility is to set $f(\sigma_1,\sigma_0 \mid x_1)$ to the unconditional density $f(\sigma_1,\sigma_0)$; another is to estimate
it. Hamilton (1989) uses the former whereas Hamilton (1990) proceeds under the latter assumption.
35 In the case of Hamilton's model the error term in the state dynamics equation has a time varying
conditional variance that depends on unobserved quantities, as $E(\eta_t^2 \mid z_{t-1})$ depends on $\sigma_{t-1}^2$. While
the Kalman filter allows for the conditional variances of the errors to vary in a known way with the
past history of $x_t$ it does not allow them to depend on the past unobserved states.
36 Ruiz (1994) suggests that the QMLE is more efficient than the GMM estimator of the parameters
of SV models but Andersen and Sorensen (1994b) demonstrate that this conclusion is incorrect.
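A compact sketch of this QMLE (the scalar Kalman recursion, the moments $E(\log\varepsilon_t^2) = -1.2704$ and $\mathrm{var}(\log\varepsilon_t^2) = \pi^2/2$ for N(0,1) $\varepsilon_t$, and the small constant guarding against zero returns are assumptions of the sketch):

```python
import numpy as np

def sv_qmle_loglik(params, x):
    """Gaussian (quasi) log likelihood for log x_t^2 = kappa + h_t + xi_t,
    with h_t = a0 + b1*h_{t-1} + sig_eta*eta_t, evaluated by a scalar Kalman filter."""
    a0, b1, sig_eta = params
    y = np.log(x ** 2 + 1e-12)                 # crude guard against exact zeros
    kappa, var_xi = -1.2704, np.pi ** 2 / 2    # E(log eps^2), var(log eps^2) for N(0,1) eps
    # initialize at the unconditional mean and variance of h_t = log sigma_t^2
    h, P = a0 / (1 - b1), sig_eta ** 2 / (1 - b1 ** 2)
    ll = 0.0
    for yt in y:
        # prediction error decomposition for one observation
        F = P + var_xi
        v = yt - (kappa + h)
        ll += -0.5 * (np.log(2 * np.pi) + np.log(F) + v ** 2 / F)
        # measurement update, then propagate through the AR(1) state equation
        K = P / F
        h, P = h + K * v, P * (1 - K)
        h, P = a0 + b1 * h, b1 ** 2 * P + sig_eta ** 2
    return ll

rng = np.random.default_rng(0)
x = rng.standard_normal(1000) * 0.01           # placeholder return series
print(sv_qmle_loglik([-0.1, 0.95, 0.2], x))
```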
Drawing on ideas in Shephard (1994), Kim and Shephard (1994) show how the
formulation above may be adapted to perform exact maximum likelihood. The
idea is to replace the $\xi_t$ with a mixture of $m$ $N(\mu_i, v_i)$, $i = 1,\ldots,m$, random
variables whose probability of occurrence is $\pi_i$. Stochastically this can be thought
of as making $\xi_t$ a function of a discrete random variable $\omega_t$ taking values
$i = 1,\ldots,m$, so that $f(\xi_t) = \sum_{i=1}^{m}\pi_i f(\xi_t \mid \omega_t) = \sum_{i=1}^{m}\pi_i N(\mu_i, v_i)$, e.g., if $\omega_t = 2$ is
realized then we would draw $\xi_t$ from an $N(\mu_2, v_2)$ random variable. As Kim and
Shephard point out the weights $\pi_i$, $\mu_i$ and $v_i$ can be determined once m is
selected by finding what parameter values would make $\sum_{i=1}^{m}\pi_i N(\mu_i, v_i)$ as close a
fit to the density of the log of a $\chi^2(1)$ as possible, and these weights are given in
their paper for m = 7. 37 What makes this idea ingenious is that, conditional upon
a realization $\omega = (\omega_1,\ldots,\omega_T)$, the random variable in the observation equation is
$\xi_t \mid \omega$, and this will be normal. Hence the Kalman filter can be applied. Of course,
this immediately raises the issue of where $\omega$ comes from. Defining $\sigma^2 =
(\sigma_1^2,\ldots,\sigma_T^2)$ we could obtain a set of realizations on $\omega$ if we could simulate from
$f(\sigma^2,\omega \mid X_T)$, but that can be done with the Gibbs sampler, for which we will need
$f(\sigma^2 \mid \omega, X_T)$ and $\Pr(\omega \mid \sigma^2, X_T)$. The first of these, sampling from the Kalman
smoother, is accomplished by the algorithm in De Jong and Shephard (1993),
leaving the task of getting realizations from $\Pr(\omega \mid \sigma^2, X_T) =
\prod_{t=1}^{T}\Pr(\omega_t \mid \sigma_t^2, \log x_t^2)$, and this can be done using Bayes theorem, culminating
in the requisite density as being proportional to $\prod_{t=1}^{T} f(\log x_t^2 \mid \omega_t, \sigma_t^2)\Pr(\omega_t)$. 38
Kim and Shephard report that the method works very quickly.
3.2.3. Indirect estimation
A method of estimation whose popularity has grown in near exponential
fashion since its introduction in 1992 has been that of indirect estimation. To
understand the nature of this method it is useful to consider the simple example of
estimating the MA(1) coefficient $\alpha$ in

$y_t = e_t + \alpha e_{t-1},$    (32)

where $e_t$ is n.i.d. (0,1), by using the AR(1)

$y_t = \beta y_{t-1} + u_t.$

We assume that $u_t$ is n.i.d. $(0,\sigma^2)$ and write down the sample scores for $\beta$ as
$T^{-1}\sigma^{-2}\sum y_{t-1}(y_t - \beta y_{t-1})$.
Equating these to zero gives the OLS estimator $\hat{\beta}$. Since this model is mis-specified it is known from the theory of estimation in mis-specified models that the
estimator $\hat{\beta}$ converges to the pseudo-true value $\beta^*$ defined by

$E_{MA}[y_{t-1}(y_t - \beta^* y_{t-1})] = 0,$    (33)
37 It is not necessary that $\varepsilon_t$ be taken as an N(0,1) random variable as one can approximate the
distribution of $\log\varepsilon_t^2 - E(\log\varepsilon_t^2)$ by a mixture of normals if $\varepsilon_t$ follows other densities. Mahieu and Schotman
(1994a), in estimating an SV model, allow $\mu_i$ and $v_i$ to be free parameters.
38 The constant of proportionality is found by summing $f(\log x_t^2 \mid \omega_t = i, \sigma_t^2)\Pr(\omega_t = i)$ over all
$i = 1,\ldots,m$.
39 Both the quasi and exact MLE's use the state space format for squared returns but this will only be
appropriate when $\varepsilon_t$ and $\eta_t$ are independent, and it has already been argued that the cross correlations
between $x_t^2$ and $x_{t-j}$ make this assumption suspect. When one allows for the correlation it will be
necessary to estimate that parameter, but there is no information in $x_t^2$ to do so. Harvey and Shephard
suggest that one use both $x_t^2$ and $\mathrm{sgn}(x_t)$ for estimation.
where $E_{MA}$ means that the expectation is taken with respect to the true model (the
DGP). In this instance it is easy to solve for $\beta^*$, producing

$\beta^* = E_{MA}(y_t y_{t-1})/E_{MA}(y_{t-1}^2) = \alpha/(1 + \alpha^2),$

and this shows the dependence of $\beta^*$ upon $\alpha$. As the value of $\alpha$ varies so will
$\beta^*$, and we might write this dependence as $\beta = \phi(\alpha)$, a quantity that has been
termed the "binding function". This further suggests that we might estimate $\alpha$ by
using such a relation, i.e., we could solve

$T^{-1}\sum y_{t-1}\Big(y_t - \frac{\hat{\alpha}}{1 + \hat{\alpha}^2}\,y_{t-1}\Big) = 0,$

to get an estimate of $\alpha$. Standard properties for $\hat{\alpha}$ of consistency and asymptotic
normality follow from $\hat{\beta} = \phi(\hat{\alpha})$ and the fact that $\hat{\beta}$ is consistent and asymptotically normal around $\beta^*$. In this instance we know that $\hat{\alpha}$ has these properties from
the time series literature, as factorizing the autocovariance function was one of
the earliest ways of estimating an MA(1) - see Durbin (1959).
In this simple example we could find the binding function analytically, whereas
in most instances this will be hard to do. However, all we need to be able to do is
find the value of $\beta^*$, $\beta^*(\bar{\alpha})$, that is associated with any given value $\bar{\alpha}$ through the
binding function, and to then choose a value of $\alpha$ that implies a $\beta^*$ that equals $\hat{\beta}$,
i.e., we wish to choose a value of $\alpha$, $\hat{\alpha}$, such that $\hat{\beta} = \hat{\alpha}/(1 + \hat{\alpha}^2)$. Now, because
$\beta^*$ is defined from (33), it is clear that the value of $\beta$, $\tilde{\beta}^*$, given by
$\frac{1}{M}\sum_{m=1}^{M}\frac{1}{T}\sum_{t=1}^{T} y_{m,t-1}(y_{mt} - \tilde{\beta}^* y_{m,t-1}) = 0$, where the $y_{mt}$ are values simulated from the MA(1) model at the $m$th replication, with $\alpha$ set to $\bar{\alpha}$, $m = 1,\ldots,M$,
will converge to $\beta^*(\bar{\alpha})$ as $M \to \infty$. Thus we might $\min_{\bar{\alpha}}(\hat{\beta} - \tilde{\beta}^*)^2$; the value of
$\tilde{\beta}^*$ changing as new values for $\alpha$ are tried. At the minimum value we have the
"indirect estimator", $\hat{\alpha}$. It will not exactly equal the $\hat{\alpha}$ found from $\hat{\beta} = \phi(\hat{\alpha})$,
unless M was infinite, but we might expect it to be close. If there are more
elements in $\beta$ than in $\alpha$ the distance function between $\hat{\beta}$ and $\tilde{\beta}^*$ needs to be
modified to $(\hat{\beta} - \tilde{\beta}^*)'V(\hat{\beta} - \tilde{\beta}^*)$, where V is some weighting matrix. This
describes the original proposal by Gouriéroux et al. (1993). A related method of
indirect estimation is due to Gallant and Tauchen (1992), who would find the
value of $\alpha$ that minimizes $[\frac{1}{M}\sum_{m=1}^{M}\frac{1}{T}\sum_{t=1}^{T} y_{m,t-1}(y_{mt} - \hat{\beta}y_{m,t-1})]^2$; this
works since, as $M \to \infty$, the function tends to $[\bar{\alpha} - \hat{\beta}(1 + \bar{\alpha}^2)]^2$, which is
minimized by setting $\hat{\alpha} = \hat{\beta}(1 + \hat{\alpha}^2)$. 40 Therefore, computer simulation of the
model we are ultimately interested in estimating, along with actual estimation of
an entirely different model, will produce consistent and asymptotically normal
estimators of the parameters of interest.
40 Although Gallant et al. (1994) refer to their method as "efficient method of moments" we feel
that the descriptor used by Gouriéroux et al. is sufficiently evocative to justify calling all methods
operating with this principle by that name.
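A minimal sketch of the MA(1)/AR(1) example (the values of M and T and the use of a simple grid search over $\alpha$ are assumptions made for brevity):

```python
import numpy as np

def simulate_ma1(alpha, T, rng):
    """Simulate y_t = e_t + alpha * e_{t-1} with n.i.d.(0,1) errors."""
    e = rng.standard_normal(T + 1)
    return e[1:] + alpha * e[:-1]

def ar1_coef(y):
    """Auxiliary-model estimator: OLS slope of y_t on y_{t-1}."""
    return np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)

def indirect_estimate(y_obs, M=10, grid=np.linspace(0.0, 0.95, 96), seed=1):
    """Choose alpha so that the simulated binding function matches beta_hat from the data."""
    rng = np.random.default_rng(seed)
    beta_hat = ar1_coef(y_obs)
    T = y_obs.size
    best, best_dist = None, np.inf
    for a in grid:
        beta_sim = np.mean([ar1_coef(simulate_ma1(a, T, rng)) for _ in range(M)])
        dist = (beta_hat - beta_sim) ** 2
        if dist < best_dist:
            best, best_dist = a, dist
    return best, beta_hat

rng = np.random.default_rng(0)
y = simulate_ma1(0.5, 2000, rng)                    # "data" generated with alpha = 0.5
alpha_hat, beta_hat = indirect_estimate(y)
print(alpha_hat, beta_hat, 0.5 / (1 + 0.5 ** 2))    # beta_hat should be near alpha/(1+alpha^2)
```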
The ideas described above generalize. Think of the model to be estimated as
having some density for the random variable $y_t$, $g(y_t;\alpha)$, that is characterized by
unknown parameters $\alpha$, while the auxiliary model used for estimation purposes
satisfies a set of moment conditions $T^{-1}\sum_{t=1}^{T}\psi(y_t;\beta) = 0$. Then the pseudo-true
value of $\beta$ solves $E_g(T^{-1}\sum_{t=1}^{T}\psi(y_t;\beta^*)) = 0$, and the estimator of $\alpha$ that we seek
is what solves this set of equations. To make this operational, we replace $E_g$ by an
average using observations simulated from the model with a given value of $\alpha$, and
replace $\beta^*$ with a consistent estimator of it, $\hat{\beta}$, leading to the indirect estimates of
$\alpha$ being determined by

$\min_{\alpha}\Big[\frac{1}{M}\sum_{m=1}^{M}\frac{1}{T}\sum_{t=1}^{T}\psi(\tilde{y}_{mt};\hat{\beta})\Big]'V\Big[\frac{1}{M}\sum_{m=1}^{M}\frac{1}{T}\sum_{t=1}^{T}\psi(\tilde{y}_{mt};\hat{\beta})\Big].$

In this formula the $\tilde{y}_{mt}$ are simulated values from the model described by $g(y_t;\alpha)$ for
given values of $\alpha$, and V is some weighting matrix. As this can be thought of as a
GMM estimator, the literature on ways of selecting V is relevant; for a good
review of these see Den Haan and Levin (1994).
Obviously, the success of the method depends on a number of factors. First, there must be some connection between β* and α, i.e., varying values of α must show up as changed values of β*. Consequently, there would be no sense in trying to estimate parameters relating to the conditional variance of y_t from the conditional mean, unless these were connected in some way. Second, the choice of the auxiliary model ψ(y_t; β) (or "score generator" in Gallant and Tauchen's terminology, since they normally select ψ(y_t; β) as the scores of some model) is important for the efficiency of the indirect estimator of α. In terms of our simple example, the estimator of the MA(1) coefficient using an AR(1) as the auxiliary model is known to be very inefficient. Using a higher order AR, e.g., an AR(3), should improve the properties of the estimator, a result established many years ago in time series analysis. Clearly, what we are seeking in ψ(y_t; β) is the best statistical representation of the data that is possible. It is important to stress in this connection that the auxiliary model may be a mis-specified one.
Although it is computationally very demanding, the procedure is extremely attractive, particularly if it is hard to find MLEs of α but it is possible to find some representation of the data that is easy to estimate. A good example is the stochastic volatility model. It is simple to generate data from that model once parameter values are selected, and it is known that GARCH models can be found that give reasonable fits to the data, making it natural to select these as candidates for an auxiliary model. Engle and Lee (1994), in estimating an SV model with non-zero correlation between ε_t and η_t, found indirect estimates using a GARCH(1,1) model with an asymmetric effect of returns on volatility as the auxiliary model. One might think that the scores of an EGARCH model would be a better candidate for ψ(y_t; β) in this instance, since it has a similar functional form to the SV model. In perhaps the most comprehensive study of the indirect estimation of the SV model, Gallant et al. choose their SNP method of approximating densities to derive ψ(·). Because dim(β) then greatly exceeds dim(α), it is possible for them to assess the adequacy of the SV model as a representation of the daily stock data used in this survey by testing how close the scores Σ_{t=1}^T ψ(y_t; β̂*(α̂)) are to zero. They find that the SV model is soundly rejected unless the density of η_t is allowed to take a very general form and there is also a very flexible specification of the relation between σ_t² and η_t.
3.2.4. Bayesian estimation methods
Bayesian methods of estimation are becoming more common in financial
econometrics. One reason is that the parameters θ are regarded as random variables. Therefore, they can be treated as extending the set of latent variables σ_t, and posterior densities for both will be obtained simultaneously. Some see this as a major advantage, i.e., instead of first determining θ and then using this to construct a point estimate of σ_t, the complete density of the latter is found. As is well known, it is easy to write down the posterior for θ, given the data, in terms of
the product of the likelihood and the prior density, but rather harder to give this an
analytic expression. Fortunately, methods have become available that enable
simulation of realizations from the posterior without explicitly knowing its form.
Generally, these methods fall under the heading of Markov Chain Monte Carlo
methods, of which leading examples are the Gibbs sampler and Metropolis-Hastings algorithm. Surveys of this technology are available in Chib and Greenberg
(1994a) and Chib and Greenberg (1994b). Jacquier et al. (1994) have estimated the
SV model with a variant of the Metropolis algorithm, while Geweke (1994) has
estimated both GARCH and SV models with the same approach. Geweke also
shows how the prediction error decomposition (24) for the log likelihood may be
used to easily compute posterior odds recursively, and, based on an examination of
these, concludes that the SV model is superior to a GARCH model for an
exchange rate data set. Albert and Chib (1993) estimate Hamilton's model by
these Bayesian methods.
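As an illustration of the simulation machinery involved, the sketch below implements a generic random-walk Metropolis-Hastings sampler in Python for a posterior known only up to a normalizing constant; the target (the mean and log-variance of i.i.d. normal data under flat priors) and all names are illustrative assumptions, and the samplers actually used in the papers cited are considerably more elaborate.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(0.2, 1.5, size=500)            # artificial "returns"

    def log_posterior(theta):
        mu, log_sigma2 = theta
        sigma2 = np.exp(log_sigma2)
        # Gaussian log likelihood plus flat priors (constants dropped)
        return -0.5 * len(x) * log_sigma2 - 0.5 * np.sum((x - mu) ** 2) / sigma2

    theta = np.array([0.0, 0.0])
    current = log_posterior(theta)
    draws = []
    for _ in range(20000):
        proposal = theta + 0.05 * rng.standard_normal(2)    # random-walk proposal
        cand = log_posterior(proposal)
        if np.log(rng.uniform()) < cand - current:           # accept with prob min(1, ratio)
            theta, current = proposal, cand
        draws.append(theta.copy())

    draws = np.array(draws[5000:])                            # discard burn-in
    print("posterior means of (mu, log sigma^2):", draws.mean(axis=0))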
3.3. Parametric models and moment existence
Estimation experience with GARCH models is now quite extensive. One of the
most striking features has been the fact, documented in Bollerslev et al. (1992),
that the model parameters β_j, α_j frequently sum to unity, i.e., for a GARCH(1,1) process β₁ + α₁ = 1. For EGARCH models the sum of the autoregressive coefficients on lagged values of log σ_t² can also be close to unity, e.g., Nelson (1989), Pagan and Schwert (1990a) and Kearns and Pagan (1993). One must therefore
enquire into the consequences of this feature for estimation.
Let us begin with an EGARCH process that has the autoregressive coefficients summing to unity. This means that log σ_t² has a unit root. Without an intercept, therefore, log σ_t² would evolve as an integrated process; with an intercept it would be integrated with drift. In the former case it is known that T⁻² Σ_{t=1}^T log σ_t² would tend in law to a random variable, so that, after scaling by T⁻², the log likelihood for such a process must eventually be dominated by the term T⁻² Σ_{t=1}^T log σ_t², since the other term T⁻² Σ σ_t⁻² x_t² = T⁻² Σ ε_t² = T⁻¹(T⁻¹ Σ ε_t²) →p 0 as T⁻¹ Σ ε_t² →p 1. Thus the estimation problem is not well defined and we could not expect that MLEs obtained by maximizing (25) would have standard properties. If there is an intercept, T⁻² Σ log σ_t² would tend in probability to a constant that would not depend on the EGARCH parameters, other than the intercept, and again this would create estimation problems. Hence, it appears that an EGARCH process with unit roots would have to be avoided. Therefore the finding of near unit root behavior has to be very disturbing.
Turning to the GARCH(1,1) process in which α₁ + β₁ = 1, termed an Integrated GARCH(1,1), or IGARCH(1,1), process by Engle and Bollerslev (1986), the situation is much less clear. Assume that ε_t is i.i.d. with a symmetric unimodal density, monotonically increasing from −∞ to zero and monotonically decreasing from zero to infinity, with zero third moment and finite moments up to order six. Nelson (1990a) showed that, if E[ln(α₁ε_t² + β₁)] < 0,
1. For the IGARCH(1,1) model, when the intercept in the equation for σ_t² is zero, σ_t² → 0 almost surely.
2. For the IGARCH(1,1) model, with a non-zero intercept in the equation for σ_t², σ_t² has a strictly stationary and ergodic limiting distribution.
3. For the IGARCH(1,1) model, E(σ_t^{2k}) < ∞ if E[(α₁ε_t² + β₁)^k] < 1. ⁴¹
Now, from (3), putting k = 1 gives E(α₁ε_t² + β₁) = α₁ + β₁, and therefore when the process is IGARCH (α₁ + β₁ = 1), E(σ_t²) does not exist (this is not surprising given the solution for E(σ_t²) in the GARCH(1,1) case). Now E(σ_t) < {E(σ_t²)}^{1/2} by Jensen's inequality, i.e., E[(α₁ε_t² + β₁)^{1/2}] < [E(α₁ε_t² + β₁)]^{1/2} = 1 when α₁ + β₁ = 1, and by (3) this guarantees the existence of the fractional moment E(σ_t). Consequently, when one looks at the log likelihood for an IGARCH process, T⁻¹ Σ log σ_t will converge to E[log σ_t] by the ergodic theorem, while T⁻¹ Σ σ_t⁻² x_t² = T⁻¹ Σ ε_t² tends to unity. Therefore, unlike the EGARCH process, the log likelihood of an IGARCH process is well behaved asymptotically, and it is not surprising that the MLEs of the IGARCH parameters will be asymptotically normal, converging at the standard rate of T^{1/2}. Lumsdaine (1991) and Lee and Hansen (1994) give formal proofs of this fact; the latter under the conditions just cited, the former under the assumption that moments of ε_t up to the 32nd order exist.
The IGARCH process is therefore one for which the variance of returns does not exist but for which returns are a strictly stationary process (the latter follows from x_t = σ_t ε_t, which is the product of two strictly stationary processes), i.e., the IGARCH(1,1) process is strictly stationary but not covariance stationary, whereas an IEGARCH(1,1) process would be neither covariance nor strictly
⁴¹ Results for GARCH(p, q) models are available in Bougerol and Picard (1992).
stationary. Accordingly, because an IGARCH process is strictly stationary, it is somewhat misleading to attach the label "integrated" to it. This is not to say that it does not possess some of the characteristics of an integrated series. Using x_t = ε_t σ_t it is possible to re-write (13) as

σ²_{t+1} = α₀ + [β₁ + α₁E_{t−1}(ε_t²)]σ_t² + α₁[ε_t² − E_{t−1}(ε_t²)]σ_t²,

whereupon E_{t−1}(σ²_{t+1}) = α₀ + (β₁ + α₁)σ_t², so that E_{t−1}(σ²_{t+1}) = α₀ + σ_t² if the process is IGARCH, and this would also be true of an I(1) series with drift.
Another important feature of Nelson's results is that there must be an intercept in the variance equation, otherwise σ_t² will be degenerate for large t. Thus removal of an "insignificant" intercept would be rather unwise if the IGARCH process were to be used for forecasting. It is also important to observe that one cannot "start up" σ₀² with α₀/(1 − α₁ − β₁), as in the GARCH(1,1) case, any longer. Nevertheless, setting the initial value σ₀² to zero will mean that it is effectively incorporated into the intercept, so that the estimated intercept really has two components. It may well be that a small intercept is found if an IGARCH process is estimated with a GARCH program, because the initial value will be set to larger and larger numbers as the sum β₁ + α₁ tends to unity.
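Nelson's conditions are easy to examine numerically. The sketch below (Python, with illustrative parameter values only) checks the strict-stationarity condition E[ln(α₁ε² + β₁)] < 0 by simulation for an IGARCH(1,1) with α₁ = 0.3, β₁ = 0.7, and contrasts the behaviour of σ_t² with and without an intercept.

    import numpy as np

    rng = np.random.default_rng(2)
    a1, b1 = 0.3, 0.7                     # a1 + b1 = 1: an IGARCH(1,1)

    # strict-stationarity condition of Nelson (1990a), checked by Monte Carlo
    e = rng.standard_normal(1_000_000)
    print("E[ln(a1*e^2 + b1)] =", np.mean(np.log(a1 * e ** 2 + b1)))   # negative

    def simulate_igarch(a0, T, rng):
        sig2, path = 1.0, []
        for _ in range(T):
            x = np.sqrt(sig2) * rng.standard_normal()
            path.append(sig2)
            sig2 = a0 + a1 * x ** 2 + b1 * sig2
        return np.array(path)

    # zero intercept: sigma_t^2 collapses towards zero;
    # positive intercept: sigma_t^2 keeps fluctuating although E(sigma_t^2) does not exist
    print("zero intercept, final sigma^2:", simulate_igarch(0.0, 5000, rng)[-1])
    print("positive intercept, median sigma^2:", np.median(simulate_igarch(0.05, 5000, rng)))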
3.4. Specification tests for ARCH models
A large range of models have been mentioned above as being useful for the
modeling of financial series. Each model differs from others in being able to
replicate some particular characteristic of financial data. Consequently, it seems
clear that one would want to develop some methods for assessing ex ante which models would be most useful in capturing the behavior of any particular series, as well as determining ex post how successful they have been. This is the realm of
specification tests.
There are two philosophies that might be adopted when addressing issues of the
appropriateness of any given model. One possibility is that of matching "stylized
facts", i.e., a comparison is made of certain characteristics of the series being
investigated with what is implied by the fitted model. At first glance this approach
to model evaluation may not appear to be very effective, but there are many
examples in which a simple stylized fact has proven remarkably effective in
weeding out poor models. We have already encountered an example of this in the
form of the asymmetric relation between stock volatility and the level of returns.
As mentioned previously, this is a stylized fact that has proven to be very
influential in delineating the class of models needed to fully describe stock returns.
A second example is the "jump" behavior of recursive variances. Unless a model
is capable of replicating this feature one would be concerned about adopting it. A
third example would be the ability of models to replicate the strong peak seen in
the density of returns. If U.S. daily stock return data from 1835 to 1987 is used,
the estimates of f(0) are 0.63 (data); 0.46 (GARCH); 0.40 (EGARCH); and 0.45
(AGARCH); all of the models producing substantial understatements of the f(0)
seen in the data.
A more conventional approach to the evaluation question is to postulate an
alternative specification to the one being examined and to construct Lagrange
Multiplier (LM) tests for whether the alternative model is to be preferred. Of
course, such a test can also reveal inadequacies in the maintained model even if
the alternative is not correct, i.e., the diagnostic test can have power against
alternatives other than the one specified. Let θ be the set of parameters in both the alternative and null models, where a sub-set of θ being set to zero produces the latter. When ε_t is N(0,1) it is easily seen that the scores for θ are ½ Σ σ_t⁻²(∂σ_t²/∂θ)(ε_t² − 1) (or ½ Σ σ_t⁻²(∂σ_t²/∂θ)[−(f′(ε_t)/f(ε_t))ε_t − 1] when ε_t has density f(ε)). Hence the LM test that the maintained model is correct is just based on TR² from the regression of (ε̂_t² − 1) upon σ̂_t⁻² ∂σ̂_t²/∂θ, where the hats indicate that θ is replaced by the MLE of θ under the null hypothesis. In the special case when the null hypothesis is that there is no ARCH, σ_t² = σ² and ε̂_t = x_t/σ̂. Thus, if this is the null, and the alternative is an ARCH(1) process, σ_t² = α₀ + α₁x²_{t−1}, then (ε̂_t² − 1) = (x_t²/σ̂² − 1) is regressed against σ̂⁻² ∂σ_t²/∂α₀ (= σ̂⁻²) and σ̂⁻² ∂σ_t²/∂α₁ (= σ̂⁻² x²_{t−1}), which is the same as the regression of x_t² upon an intercept and x²_{t−1}, i.e., the test used previously to assess dependence in returns. ⁴² If instead we are checking that the model is ARCH(2), when ARCH(1) is maintained, then σ̂_t² = α̂₀ + α̂₁x²_{t−1}, ∂σ_t²/∂α₀ = 1, ∂σ_t²/∂α₁ = x²_{t−1}, ∂σ_t²/∂α₂ = x²_{t−2}, and the regression is now of ε̂_t² = σ̂_t⁻² x_t² against unity, ε̂²_{t−1} and ε̂²_{t−2}.
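In code the no-ARCH version of this test is just a T·R² statistic from an auxiliary regression; the sketch below (Python, illustrative names only) computes it for a simulated ARCH(1) series, where it should reject the null of no ARCH.

    import numpy as np
    from scipy import stats

    def arch_lm_test(x, lags=1):
        # T*R^2 from regressing x_t^2 on an intercept and lagged squares
        x2 = x ** 2
        y = x2[lags:]
        X = np.column_stack([np.ones(len(y))] +
                            [x2[lags - j - 1:len(x2) - j - 1] for j in range(lags)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r2 = 1.0 - resid.var() / y.var()
        stat = len(y) * r2
        return stat, 1.0 - stats.chi2.cdf(stat, df=lags)

    rng = np.random.default_rng(3)
    x, sig2 = np.empty(2000), 1.0
    for t in range(2000):                       # simulate an ARCH(1) process
        x[t] = np.sqrt(sig2) * rng.standard_normal()
        sig2 = 0.2 + 0.5 * x[t] ** 2

    print(arch_lm_test(x, lags=1))              # large statistic, tiny p-value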
The central problem therefore is to determine a suitable alternative model.
Mostly, the proposals have related to different models for the conditional variance.
Engle and Ng (1993) suggest that the extra terms be dummy variables of either an
additive or a multiplicative nature, e.g., a dummy variable that takes the value
unity only if x_{t−1} is negative would imply the possibility of an asymmetric response if we try to add the product of it with x_{t−1} to the existing model. Pagan and Schwert (1990a) argued for a non-parametric approach, approximating σ_t² by a GARCH process along with Fourier terms. They then tested the significance of the Fourier terms. There are clearly many alternative ways of generating diagnostic tests, varying by what we choose to add.
Instead of trying different types of non-parametric approximations, an alternative is to derive tests based on a very general parametric model. If one derives an LM test for the presence of the jth lag in σ_t² from the AGARCH model discussed earlier, then it involves the regression of x_t² against unity, x²_{t−j}, |x_{t−j}|, x_{t−j}, and x_t|x_{t−j}|, followed by a test that T times the R² from this regression is zero. This result is easily established by looking at the derivatives of σ_t² with respect to the
⁴² Lee and King (1993) point out that, since α₁ ≥ 0, the LM test can be improved upon by taking account of this sign information.
unknown parameters of the AGARCH model. A different perspective on such test procedures is to be had by recognising that the LM test is testing whether E[σ_t⁻²(∂σ_t²/∂θ)(ε_t² − 1)] = 0, and this moment restriction holds under the null hypothesis because (ε_t² − 1) has a zero expectation with respect to any function of past dated random variables. One of these functions is that selected by the LM test, viz. σ_t⁻²(∂σ_t²/∂θ). Pagan and Sabau (1992) proposed that other functions might be used, in particular those that were related to σ_t², since an estimate of this variable
was available under the null hypothesis. They applied the test to a number of
applications and found that the fitted models were rejected. Nelson (1989) also
provides tests of this nature.
It should be noted that all of the tests described above require that ε_t be N(0,1). If the density f(ε) has some other form, then ε_t² − 1 is replaced by φ_t = −[f′(ε_t)/f(ε_t)]ε_t − 1. However, since f(ε) will typically involve some parameters that need to be estimated, e.g., the degrees of freedom parameter if f(ε) is Student's t, it is not generally true that one can regress φ̂_t against quantities such as σ_t⁻²(∂σ_t²/∂θ) to produce the requisite test statistic, since the fact that the density parameters have been estimated means that some allowance needs to be made for this effect. To analyze this problem it is useful to note that all the tests discussed in this section are testing a moment condition of the form E[z_t φ_t] = 0, where φ_t is ε_t² − 1, etc., and are therefore based upon testing whether Σ ẑ_t φ̂_t is zero. In order that one can ignore the fact that φ̂_t involves estimated parameters, it is necessary that E[z_t(∂φ_t/∂θ)] = 0, and this will rarely be the case. To account for this dependence one needs to follow Newey (1985) and Tauchen (1985), regressing ẑ_t φ̂_t against an intercept and the scores for θ and testing if the intercept is zero. ⁴³ The procedure is explained in Pagan and Vella (1989). Note that φ_t may also depend on estimated parameters if the conditional mean depends on ARCH parameters, as occurs with the ARCH-M models discussed later, so that a similar adjustment needs to be made for tests in that context.
If one just considers whether the model might be GARCH or EGARCH, it is clear that the two formulations are not nested, owing to the use of the logarithmic transformation and the fact that |ε_t| = |σ_t⁻¹x_t| and ε_t = σ_t⁻¹x_t are the driving forces in the EGARCH model. Pagan and Schwert (1990a) compared the log likelihoods from both estimated models, and one could modify this criterion along the lines of Schwarz (1978) to reflect the extra parameters in EGARCH. Going beyond such simple comparisons, there are a number of ways that one might effect a non-nested comparison. The simplest is through "artificial nesting". To perform this, define σ²_{tG} and σ²_{tE} as the GARCH and EGARCH conditional variance specifications respectively, and set up the expanded model σ_t² = σ²_{tG} + λσ²_{tE}, leading to an LM test that λ is zero (or unity). Testing if λ is zero, i.e., testing for
⁴³ Typically z_t will involve estimated parameters, so that the scores for these would also need to be added to the regression.
the superiority of the GARCH model, would involve taking T times the R² from the regression of the squared standardized GARCH errors ε̂_t² upon unity, σ̂_{tG}⁻² ∂σ̂²_{tG}/∂θ, and σ̂_{tG}⁻² σ̂²_{tE}. To fully account for the non-nested character of the models is
much more difficult, as it is hard to evaluate what the expectation of a quantity
such as the likelihood for one model would be if the other is correct. One way of
doing this is by computer simulation, just as for the choice of linear and log linear
models in Pesaran and Pesaran (1993), i.e., one could estimate both models from
the data, simulate x_t from (say) the estimated GARCH model, compute whatever
criterion was being used to compare the models, and find its empirical distribution
by performing enough replications. 44 Kearns (1993) reports an application of this
idea.
Apart from the issues of appropriate specification of the conditional variance
function there has also been some concern over the stability of the conditional
variance coefficients. Lamoureux and Lastrapes (1990) argued that the fact that
parameter estimates seemed to indicate an IGARCH process might simply be
symptomatic of instability in GARCH parameters, and they split the sample to test
this hypothesis. Lee and Hansen (1991) provide a test of stability of the GARCH
coefficients using Nyblom's approach (Nyblom, 1989). This is essentially an LM test of the hypothesis that the coefficients are constant against the alternative that they follow a random walk, and involves the cumulative sums of the scores with respect to the GARCH coefficients thought to vary. The statistic used is L* = Σ_{t=1}^T (Σ_{j=1}^t d_{θ,j})′ V⁻¹ (Σ_{j=1}^t d_{θ,j}), where d_{θ,t} are the scores for the GARCH parameters and V is an estimate of the variance of d_{θ,t}. Hansen (1990) gives
tables for the distribution of L* under the null hypothesis that the coefficients are
constant. Hansen (1994) also applied the test to whether the degrees of freedom
parameter of the t-density varies as a way of determining if an ARCD model is
necessary. Chu (1995) also provides a test for the stability of GARCH parameters
based upon the summation of the scores over sub-samples.
3.5. ARCH models and diffusion processes
GARCH models have been fitted to data with a wide range of frequencies, and
it has been noticed that, while the GARCH effects seem to disappear the longer
the sampling interval, they become more intense for very short intervals, in the
sense that β₁ + α₁ → 1. Diebold (1988) showed that, if one aggregated GARCH processes over time, then as the sampling interval became large y_t tended to resemble a normally distributed random variable - see also Diebold and Lopez (1995).
Nelson (1990c) and Drost and Nijman (1993) represent the most comprehensive
⁴⁴ In an interesting paper West et al. (1993) compare different models for the conditional variance of exchange rate returns through the utility gained by an investor as the different models are used to determine portfolio allocations.
work on the general topic of the effects of changing the sampling interval upon the nature of models for conditional volatility. Drost and Nijman focus upon the ARMA(1,1) process in (14) and study what happens to it as one aggregates over time. They distinguish between three types of GARCH processes, depending upon the nature of the errors ε_t and v_t - strong GARCH if ε_t is i.i.d.; semi-strong GARCH if ε_t is a martingale difference; and weak GARCH in which E(v_t x^r_{t−j}) = 0 for j > 1, r = 0, 1, 2. Only weak GARCH aggregates to a process of the same type. In particular, if one tries to aggregate strong GARCH, the innovations corresponding to ε_t in the aggregated form will not be i.i.d. ⁴⁵
Once one thinks of questions of aggregation of GARCH processes in terms of aggregation of an ARMA process it is clear why one gets some of the outcomes noted above. Consider an AR(1) for a stock variable y_t with y_t = βy_{t−1} + e_t, where the e_t are uncorrelated. If this is observed at intervals mt (m > 1 being an integer), then y_{mt} = β^m y_{m(t−1)} + ε̃_{mt}, with ε̃_{mt} being uncorrelated with y_{m(t−1)}, ..., leading to the results that, as m → ∞, y_t becomes uncorrelated, while, as m → 0, β^m → 1 and so a unit root tends to appear in the process.
In the analysis above, the G A R C H model is assumed to hold for some time
interval and a study is made of what happens as the time intervals of observation
expand or contract. Many models developed in the theoretical finance literature
express the evolution of financial variables in continuous rather than discrete time
and sometimes this is very important to the solution of those models, e.g., in the
Black-Scholes option pricing model. There are also some conceptual advantages
to working in continuous time. For example, information about the prospects of
companies continues to arrive when trading in shares is suspended, and this has
implications for the modeling of volatility on a daily basis. For example, one
would expect share trading on a Monday to be different to a Tuesday, because a
much longer interval of time has elapsed since the last trade, and hence there has
been a longer interval for information to accumulate. Thus an investigation of the
relationship between continuous and discrete time models seems important.
Most of the continuous time processes used in finance are diffusions with the format

dy_t = a dt + b y_t dt + σ_t y_t^γ dW_{1t},    (34)

d log σ_t² = α₀ dt + (β₁ − 1) log σ_t² dt + c dW_{2t},    (35)

where dW_{1t} and dW_{2t} are correlated Brownian motions. If the diffusion process had non-stochastic constant volatility, i.e., it was (34) only, it was possible to find a form for the density function that could be used to set up the likelihood and hence produce MLEs of the unknown parameters. Numerical integration is
⁴⁵ This suggests that it would be illegal to perform MLE on a series with the same assumptions about ε_t for a number of different sampling intervals. However, based on simulation studies, Drost and Nijman conclude that "the asymptotic bias of the QMLE, if there is any, is small" (p. 922).
involved to get the density in most instances, however. Moreover, it seems unlikely that one can assume that volatility is non-stochastic with financial data, so that other ways of estimating the parameters need to be explored. There are many ways to approximate stochastic differential equations - see Kloeden and Platen (1992) - of which the simplest is the Euler scheme, which discretizes the equations using an interval of length h (h ≤ 1) to get

Δy_{th} = ah + bh y_{(t−1)h} + σ_{(t−1)h} y^γ_{(t−1)h} h^{1/2} ε_t,    (36)

Δ log σ²_{th} = α₀h + (β₁ − 1)h log σ²_{(t−1)h} + ch^{1/2} η_t,

where ε_t and η_t are n.i.d. (0,1) random variables.
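A minimal simulator for this Euler scheme is sketched below (Python; the parameter values, the guard on the fractional power, and all names are illustrative assumptions, not taken from the papers cited). Simulating at a fine step and retaining every tenth observation mimics the strategy of simulating from the continuous-time model while observing data at unit intervals.

    import numpy as np

    def euler_simulate(a, b, gamma, alpha0, beta1, c, rho, y0, h, n_steps, seed=0):
        # Euler discretisation of dy = (a + b*y)dt + sigma*y^gamma dW1,
        # dlog(sigma^2) = [alpha0 + (beta1 - 1)log(sigma^2)]dt + c dW2, corr(dW1, dW2) = rho
        rng = np.random.default_rng(seed)
        y, log_sig2, path = y0, 0.0, []
        for _ in range(n_steps):
            z1 = rng.standard_normal()
            z2 = rho * z1 + np.sqrt(1 - rho ** 2) * rng.standard_normal()
            sig = np.exp(0.5 * log_sig2)
            y += (a + b * y) * h + sig * (max(y, 1e-6) ** gamma) * np.sqrt(h) * z1
            log_sig2 += (alpha0 + (beta1 - 1) * log_sig2) * h + c * np.sqrt(h) * z2
            path.append(y)
        return np.array(path)

    # fine step h = 1/10; keep every 10th value to mimic data observed at unit intervals
    fine = euler_simulate(a=5.0, b=-0.5, gamma=0.5, alpha0=-0.1, beta1=0.95,
                          c=0.2, rho=-0.3, y0=10.0, h=0.1, n_steps=20000)
    observed = fine[9::10]
    print(observed[:5])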
As h → 0 this approximation should tend to the continuous time model. Of course, data on y_{th} are not observable unless h = 1, and setting h = 1 produces a very coarse approximation that can result in substantial biases in the estimators of the parameters of the diffusion. However, as Duffie and Singleton (1993) pointed out, one can simulate from (34) and (35) by an approximation such as (36), and thereby compute the moments of the continuous processes, which in turn may be used to match the moments of the data via a GMM estimator. ⁴⁶ Alternatively, one could use indirect estimation methods, simulating from the continuous time process but estimating using a discrete time auxiliary model. Gouriéroux et al. (1993) use the Euler approximation with h = 1 as the auxiliary model and h = 1/10 to simulate the data, while Engle and Lee (1994) use GARCH as the auxiliary model. ⁴⁷
An interesting question that arises when one attempts to estimate models closer to the finance literature is the treatment of the parameter γ. Most investigators have set this to zero, although the literature on the term structure frequently works with models in which γ = 0.5, see Cox et al. (1985). If γ ≠ 0, there is a "levels" effect on volatility, which is over and above whatever specification is adopted for σ_t². There is some interest therefore in estimating γ. Chan et al. (1992) use (36) with h = 1 and σ_t = σ, since then E(u_t − σ²y^{2γ}_{t−1}) = 0, where u_t = (Δy_t − a − by_{t−1})², provides a moment condition from which to estimate γ. Brenner et al. (1994) generalize this to allow σ_t² to be a GARCH model, since it is possible for the levels effect on volatility to simply reflect mis-specification in σ_t². They find
⁴⁶ One might wish to use as accurate an approximation as possible; second order schemes such as the Milshtein (1974) approximation might therefore be preferred.
⁴⁷ There are many other questions focussing on the relation between GARCH and continuous time processes that have been examined in the literature. Nelson (1990c) shows how to formulate a continuous time SV model that is the limit of a GARCH process. Also of interest has been the question of whether a GARCH process might be used to consistently estimate continuous time volatility. Because GARCH models average ε²_{t−j}, Nelson and Foster (1992) interpreted them as a filter and showed that the answer was in the affirmative. A good summary of this literature is available in Bollerslev et al. (1994).
strong evidence of a levels effect. A similar conclusion is reached by Koedijk et al. (1993), who have σ_t² driven by ε²_{t−1} rather than σ²_{t−1}ε²_{t−1}. Pagan et al. (1995) and Broze et al. (1993) use indirect estimation to estimate γ, keeping σ_t² constant. In the first paper the auxiliary model was either the discrete Euler approximation with h = 1 or an EGARCH(1,1) model; in either case the value of γ was around 0.7 for short-term interest rates. ⁴⁸
3.6. Are means and variances related? The GARCH-M model
When modeling returns it is conventional to take these as the excess over some risk-free rate, so that it is really a risk premium that is being explained. Similarly, the difference between spot and forward rates is a risk premium. Theoretical models, such as Merton (1973), make the market return a function of volatility, i.e., the risk premium should be larger when the asset return is more volatile. Of course, for an agent making decisions at a point in time t, the appropriate concept of volatility is what the conditional variance of the asset return, σ_t², would be over the holding period for the asset, leading to the relation

x_t = δ g(σ_t²) + e_t.    (37)

In Merton's model, g(σ_t²) = βσ_t², β > 0, but other functional forms for g(·) could emerge, possibly allowing the response to depend upon the sign and level of σ_t². Higher order conditional moments might also enter into the relation in certain types of utility maximizing models, although, in an interesting experiment pricing stocks in an economy with a stochastic dividend process, Gennotte and Marsh (1992) found that g was linear.
Models such as (37) feature a conditional mean for returns that depends on higher order conditional moments. Early work on this topic, e.g., Pindyck (1984), made σ_t² a distributed lag of the squares of returns, i.e., σ_t² = Σ_{k=0}^K w_k x²_{t−k}, ⁴⁹ and proceeded to regress returns on this constructed variable. Many criticisms can be made of this practice - see Pagan and Ullah (1988) - but an important one is that the assumption implies a very arbitrary model for the conditional variance. This leads to the idea of expressing σ_t² as some parametric model in the GARCH class, and Engle et al. (1987) accomplished this with their ARCH in mean (ARCH-M) model, wherein (37) was augmented by an ARCH model for σ_t² such as

σ_t² = α₀ + α₁ e²_{t−1}.    (38)

Maximum likelihood was used to estimate the unknown parameters (once a form for g(·) was specified). Assuming that e_t was conditionally N(0, σ_t²), it is easily
⁴⁸ Some of the volatility may be due to jump processes driving the diffusion. Das (1993) estimates such a combined model for bond yields.
⁴⁹ More precisely, a weighted average of (x_t − x̄_t)² was employed, where x̄_t was a rolling sample average.
seen that this means x_t is conditionally normal with mean δg(α₀ + α₁e²_{t−1}) and variance σ_t², allowing a likelihood to be easily written down and maximized. Extensions to allow σ_t² to follow GARCH or EGARCH forms are quite straightforward.
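As a concrete illustration, the sketch below writes down the Gaussian negative log likelihood of an ARCH(1)-M model with a linear g(·), which can then be handed to a numerical optimizer; the data, starting values and parameter names are illustrative only.

    import numpy as np
    from scipy.optimize import minimize

    def negative_loglik(params, x):
        # x_t = delta*sigma_t^2 + e_t,  sigma_t^2 = a0 + a1*e_{t-1}^2,  e_t|F_{t-1} ~ N(0, sigma_t^2)
        delta, a0, a1 = params
        e_prev, nll = 0.0, 0.0
        for obs in x:
            sig2 = a0 + a1 * e_prev ** 2          # conditional variance
            e = obs - delta * sig2                # residual given the mean-in-variance term
            nll += 0.5 * (np.log(2 * np.pi) + np.log(sig2) + e ** 2 / sig2)
            e_prev = e
        return nll

    rng = np.random.default_rng(4)
    x = 0.5 * rng.standard_normal(1000)           # placeholder for excess returns
    res = minimize(negative_loglik, x0=[0.1, 0.1, 0.1], args=(x,),
                   bounds=[(None, None), (1e-6, None), (0.0, 0.999)])
    print(res.x)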
Analysis of ARCH-M models is much more complex than was true of pure ARCH models. In the latter, whatever variables impinged upon the conditional mean could be regressed out, and then the residuals from such a regression, ê_t, might be used for diagnostic checking and specification; previously, the autocorrelation function of the squares of the residuals was a primary device for specification analysis. Now, however, it is impossible to estimate e_t without first specifying a valid model for σ_t² (and g(·)), so that pre-estimation specification analysis is now very difficult, making post-estimation investigation very important. It was
for ARCH-M models that the specification tests set out by Pagan and Sabau
(1992) were developed. Since estimation is by MLE, specification tests involve
setting up some moment restriction that is true under the null, E(m_t(θ)) = 0, regressing m_t(θ̂) against unity and the scores for θ, and then testing if the intercept in such a regression is zero.
The ARCH-M structure can also be troublesome when trying to establish a sampling theory for parametric estimators. Clearly, if σ_t² was to be made IGARCH, a very common finding in the applied literature, there would be a regressor in (37) that would have an infinite variance. If g(·) was the square root function then the regressor would be σ_t, and one might weight the data with this term, as was done in the proofs for IGARCH, but then difficulties would arise in estimating an intercept in the equation. A theoretical analysis of this problem is important, given the prevalence of the GARCH-M model in the analysis of risk premia - see Bollerslev et al. (1992) for a list of applications. Recently, Lee (1992) has established sufficient conditions for the asymptotic normality of the MLE of the GARCH-M(1,1) model when g(σ_t²) = σ_t. An important condition bounds a conditional expectation involving β₁, α₁, δ and ε_t below unity; when ε_t is n.i.d. (0,1) this becomes an unconditional expectation and, with α₁ + β₁ = 1, the condition effectively implies that δ has to be in the range −3 < δ < 3.
An alternative approach that can sometimes pay dividends is to engage in a non-parametric analysis, estimating E_{t−1}(x_t) and σ_t² in this way rather than through parametric models. Pagan and Hong (1991) had some success with this procedure for the series on the excess yield on Treasury Bills used in Engle et al. (1987), but very little for the return to equity on the NYSE. In fact, the conditional variance of the latter was well estimated, and displayed the typical skewed response to the sign of returns mentioned earlier, but no relation was found between the conditional mean and variance of returns. Indeed, I think that this is a fair summary of the outcome of parametric investigations as well. In a marked departure from the consensus of the literature, Linton (1992), using semi-parametric techniques, finds that the g(·) function in (37) should not be linear, and that taking account of the non-linearity is crucial for finding an important impact of σ_t² upon the excess holding yield.
A note of caution needs to be sounded regarding the use of non-parametrics in this context, as the presence of σ_t² in (37) means that E_{t−1}(x_t) depends upon all past returns, even if σ_t² is just an ARCH(1) model. To see this let g(σ_t²) = σ_t² and σ_t² = α₀ + α₁e²_{t−1}, so that (37) becomes

x_t = δ(α₀ + α₁e²_{t−1}) + e_t.    (39)

By continued iteration it is evident that all past x_t appear in the determination of x_t. In fact, the dependence is a very non-linear one, and conditions for x_t to be covariance stationary have yet to be worked out. From this simple model it is apparent that ARCH-M models imply that the conditional mean of x_t depends on an infinite number of conditioning elements, whereas the methods of non-parametric analysis are restricted to a finite number. Hence, a non-parametric analysis could not exactly replicate an ARCH-M model structure, and the quality of its approximation will be a function of the magnitude of coefficients such as α₁ and β₁.
4. Statistical representations of multivariate data
In an earlier section we described the properties of univariate data and
statistical models that were useful descriptions of such data. However, financial
analysis is largely concerned with the relationship between series, e.g., options
prices and volatility, rates of return on different assets, forcing a proper analysis of
the multivariate characteristics of data. By far the most important summary
measures of how series relate to one another have been those concerned with the
number of factors that are common to the data, and we will organize this section
along those lines.
4.1. The common trend factor
Given that financial series are frequently I(1), determining if series are
cointegrated is a first step in much financial analysis. For example, as seen in
Baillie (1989), the expectations theory of the term structure for n maturities would
imply that there are (n - 1) co-integrating vectors among these bond yields. There
are a variety of methods for testing for the presence of co-integration and the
number of co-integrating vectors. Johansen's technique is perhaps the one most in
use as it has been programmed into many microcomputer packages (Johansen,
1988). Testing if the one month, two month, three month, four month and five
year zero coupon bond yields in the McCulloch data set are co-integrated,
Johansen's maximal eigenvalue test statistic values were (with 5% significance
critical values in brackets): 72.89 (34.4) for r = 0 vs. r = 1; 51.94 (28.13) for
r = 1 vs. r = 2; 39.89 (22.00) for r = 2 vs. r = 3; 31.28 (15.67) for r = 3 vs. r = 4;
2.90 (9.24) for r = 4 vs. r = 5. Consequently, one could be comfortable with the
conclusion that there are indeed four co-integrating vectors within this data.
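For readers wishing to reproduce this kind of calculation, a hedged sketch is given below, assuming the coint_johansen routine currently available in statsmodels; the data file name is hypothetical and the McCulloch yields themselves are not reproduced here.

    import numpy as np
    from statsmodels.tsa.vector_ar.vecm import coint_johansen

    # yields: a T x 5 array whose columns are the 1, 2, 3, 4 month and 5 year zero-coupon yields
    yields = np.loadtxt("mcculloch_yields.csv", delimiter=",")   # hypothetical file

    result = coint_johansen(yields, det_order=0, k_ar_diff=1)
    # lr2 holds the maximal-eigenvalue statistics, cvm the 90/95/99% critical values
    for r, (stat, cv) in enumerate(zip(result.lr2, result.cvm)):
        print(f"H0: r = {r}   max-eig statistic = {stat:.2f}   5% critical value = {cv[1]:.2f}")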
However, in many applications, the co-integrating vectors are frequently predicted to be certain values by theory, so that one might want to impose these and
then test if the series formed from such linear combinations are integrated or not.
An example is the term structure case which predicts that any two yields are
linked by a co-integrating vector [1 - 1]. Thus, we could form the vector of series
such as the yield differential between yields at all maturities and one-period yields,
and then test if there are unit roots in these series. We might in fact just run a
sequence of unit root tests, such as were discussed earlier, for each of these series
separately, rather than explicitly looking at it in a multivariate framework. In any
case, the techniques for testing for co-integration, and the rationale for them, require a separate treatment. It is probably worth mentioning that care has to be
taken when estimating co-integrating vectors, if more than one is expected, as their
lack of uniqueness means that reduced form estimators such as Johansen's may
produce linear combinations of the vectors rather than the vectors themselves. To
illustrate this, Johansen's estimator was applied to a data set consisting of one
month, 10 year and 20 year bond yields from the McCulloch data set. The
expected co-integrating vectors were [1 - 1 0] and [0 1 - 1 ] , i.e., the interest
differentials are I(0), but the actual estimates found (by MFIT386) were [ - 1 2.8
- 2 . 0 ] and [ - 1 - 16.3 17.5]. Approximating these by [ - 1 3 - 2 ] and [0 - 1 1], it
is seen that the estimates are the linear combination - 1 × [1 - 1 0] - 2 × [0 - 1
1]. In this instance it is easily seen how to unscramble the final estimates to get
back to a set of basic co-integrating vectors, but in those cases where there are
unknown elements in the vectors that would not be possible. ⁵⁰ For example, if
one was estimating a demand elasticity for financial asset holdings using observations (say) on interest rates and an aggregate portfolio, there is a risk that the
estimates found could actually be linear combinations of the demand and supply
elasticities. The solution to such problems has to be to formulate and estimate
structural models and not to attempt to recover structural parameters from a
reduced form.
Because of the emphasis upon factors in financial analysis, in most instances it
is useful to re-formulate co-integrating relations as the c o m m o n trends idea in
Stock and Watson (1988). Let y_t be an n × 1 vector of I(1) variables with r cointegrating vectors. Then

Δy_t = C(L)e_t    (41)
⁵⁰ For the McCulloch data the hypothesis that the co-integrating errors are the spreads is rejected. Pagan et al. (1995) discuss what might be the cause of this outcome. Possible reasons are non-linearities in the underlying processes, the presence of levels effects in volatility, and the effect of having near-integrated rather than strictly integrated series.
would be an MA representation of the I(0) variable Δy_t, and this may be written as

Δy_t = C(1)e_t + C*(L)Δe_t,    (42)

using the result that C(L) = C(1) + ΔC*(L) (C*(L) is actually defined by this relation). If there are r co-integrating relations among the y_t, then, defining a vector of stochastic trends τ_t with dimension (n − r) which evolve as

τ_t = τ_{t−1} + φ_t,    (43)

where τ₀ = 0, we get the "common trends" representation

y_t = Jτ_t + C*(L)e_t.    (44)

In (44) y_t is driven by (n − r) stochastic trends and φ_t in (43) is white noise. ⁵¹ If
there were no co-integrating vectors ( r = 0) there will be n trends driving the n
variables, i.e., there are no common trends and each variable exhibits separate
trending behavior. For the term structure case r = n - 1 and there should be a
single common trend that drives all interest rates; for a small country this is
probably the world interest rate. Many studies have been performed on the number
and nature of common trends in various asset price series, e.g., Baillie and
Bollerslev (1989) for exchange rates and Kasa (1992) for stock prices.
4.2. Common factors in detrended series

It is possible that there may be common factors in series after these have been detrended in some way. There are two ways in which these common factors might be manifest. In the first, cov(u_t), u_t = C*(L)e_t, has full rank but with restrictions upon it induced by a factor structure, while the second features the case in which u_t = C*(L)e_t has a rank-deficient covariance matrix that can be further decomposed as J̃ξ_t, where ρ(cov(u_t)) = ρ(cov(ξ_t)) = ρ(J̃) < n, and ρ(·) denotes the rank of the matrix in brackets. This latter situation leads to what Vahid and Engle (1993) have called the "common-trend, common-cycle" representation

y_t = Jτ_t + J̃ξ_t,    (45)

and, cast into our taxonomy, the series can be regarded as being composed of two types of factors, the "common trends", τ_t, and the non-trend factors, ξ_t. ⁵² This type of representation is very popular in the term structure literature, e.g., Cox et al. (1985), Chen and Scott (1993) and Duffie and Kan (1993). Two different views
⁵¹ We have suppressed the dependence of y_t upon y₀.
⁵² Although we have used the terminology of Vahid and Engle, the situation we are interested in is different to that of those authors. The var(u_t) could be rank-deficient because either C*(L) or var(e_t) is singular. Our focus is upon the latter; Vahid and Engle concentrate upon the former.
of the non-trend factors can now be elicited, depending upon how y_t is "detrended" - the first concentrates upon returns Δy_t, while the second deals with the co-integrating errors ζ_t = α′y_t, where α′ is the (r × n) matrix of co-integrating vectors. The latter will generally be the "spreads" between asset prices.
4.2.1. Factors in returns
The most famous factor model of returns r_t = Δy_t is of course the "market model", in which r_t is a vector of returns. ⁵³ Suppose that the objective is to explain the return on the jth portfolio of assets, r_{jt}. If there is only one asset per portfolio this will simply be the return on that asset. "Portfolios" can be interpreted in diverse ways. For example, Harvey (1991) has r_{jt} as the return on equity in the jth country. In the market model r_{jt} is related to the return on the market or aggregate portfolio, r_{mt}. The latter may be a simple average of the returns to all stocks in the economy or perhaps a weighted average, with weights depending on the value of the portfolio accounted for by each stock.
A popular way to derive the factor structure is to impose the restriction that r_{jt} and r_{mt} are multivariate normal, from which it follows that

E(r_{jt} | r_{mt}) = α_j + β_j r_{mt},    (46)

where β_j = cov(r_{jt}, r_{mt})/var(r_{mt}) and α_j = E(r_{jt}) − β_j E(r_{mt}). Therefore,

r_{jt} = α_j + β_j r_{mt} + e_{jt}.    (47)

Hence, r_{jt} could be predicted once r_{mt}, α_j and β_j are known. The object of interest therefore is the determination of β_j, as that parameter summarizes the relationship of the return on the jth asset to that of the market portfolio.
Traditionally 13j has been estimated by regression, meaning that the range of
problems encountered with any linear regression model recur here as well.
Examples would be that outliers in the data can affect the point estimate of β_j, which encourages robust estimation of the parameters, while the heteroskedasticity present in returns can also be true of the errors in (47), thereby demanding the computation of heteroskedasticity-robust standard errors in order to make valid inferences. Some have argued that a linear relationship might be inappropriate. Tracing the argument back to the source of the linearity, however, the
question becomes one concerning the underlying multivariate density. Linearity is
associated with normality but not uniquely so, for example Spanos (1986, pp.
122-125) points out that the Student's t-density yields a linear conditional
expectation, while densities such as the bivariate Pareto result in non-linear
conditional means. Moreover, models formed from densities such as the bivariate t
also have implications for conditional variances. Non-parametric estimation methods can be usefully employed here to shed light on the conditional moments of an
asset return given the market return.
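In practice the estimation of (47) with heteroskedasticity-robust standard errors is a one-line affair; the sketch below uses statsmodels on artificial data, and all names and numbers are illustrative.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    r_m = 0.01 * rng.standard_normal(1000)                        # market returns (artificial)
    r_j = 0.001 + 1.2 * r_m + 0.02 * rng.standard_normal(1000)    # portfolio returns (artificial)

    fit = sm.OLS(r_j, sm.add_constant(r_m)).fit(cov_type="HC0")   # White standard errors
    print(fit.params)    # estimates of alpha_j and beta_j
    print(fit.bse)       # heteroskedasticity-robust standard errors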
⁵³ This is an example of a factor model in which cov(u_t) is not singular.
It is possible to produce multiple factors by concentrating upon the nature of β_j in (47). There it was a constant, equal to the ratio of cov(r_{jt}, r_{mt}) to the variance of r_{mt}. However, in line with the distinction between conditional and unconditional moments, one might wish to consider models for r_{jt} in which the conditional rather than the unconditional density of returns is utilized. To this end let F_{t−1} be a set of conditioning variables, not including r_{jt} and r_{mt} but containing their past histories. Then the conditional capital asset pricing model has

E[r_{jt} | F_{t−1}] = β_{jt} E[r_{mt} | F_{t−1}],    (48)

where β_{jt} = cov(r_{jt}, r_{mt} | F_{t−1})/var(r_{mt} | F_{t−1}).
Because the coefficients of the conditional market model are functions of the conditional moments of r_{jt} and r_{mt}, it is necessary to model these in some way. Assuming that r_{mt} = Σ_{j=1}^K r_{jt} = l′r_t, where r_t is the (K × 1) vector of returns (r_{1t}, ..., r_{Kt}) and l is a vector of ones, it is clear that var(r_{mt} | F_{t−1}) = var(l′r_t | F_{t−1}) = l′cov(r_t | F_{t−1})l, while cov(r_{jt}, r_{mt} | F_{t−1}) is the jth element of cov(r_t | F_{t−1})l, showing that the conditional moments appearing in (48) are determined by the conditional means and variances of the returns r_{jt}, so that multiple factors are involved, albeit in a non-linear fashion. This is also true for the literature on optimal hedging, where the conditional variance of the optimal hedged portfolio is minimized by choosing as weights the ratio of the conditional covariance of spot and futures prices to the conditional variance of the futures price. Modeling these constituent moments was the concern of earlier sections. A natural way to proceed is to allow the conditional mean for r_{jt} to depend on its conditional variance, as in GARCH-M models, and to subsequently model cov(r_t | F_{t−1}) by a multivariate GARCH or factor GARCH structure described later. Bollerslev et al. (1988) estimate such a model for stocks, bonds and bills. Estimation is just as for the GARCH-M model described earlier. A number of other applications of this model to financial data have been made and these are described in Bollerslev et al. (1992). Baillie and Myers (1991) apply the GARCH framework to the determination of optimal hedge ratios.
Although the linearity of (48) seems to be intimately connected with a
conditional normal density for returns, models have been suggested that take the
linearity as a prior constraint and then proceed to specify other models for the
conditional moments than those in the GARCH class. Harvey (1991) makes E_t(r_{jt}) = z_t′δ_j and E_t(r_{mt}) = z_t′δ_m, where the z_t are elements of F_{t−1}, so that e_{jt} = r_{jt} − z_t′δ_j and e_{mt} = r_{mt} − z_t′δ_m, allowing (48) to be written as

z_t′δ_j = (z_t′δ_m) E[e_{jt}e_{mt} | z_t] / E[e_{mt}² | z_t],    (49)

from which

E[e_{mt}² z_t′δ_j | z_t] − E[e_{jt}e_{mt} z_t′δ_m | z_t] = 0.    (50)
By definition E[z_t e_{mt}] = E[z_t e_{jt}] = 0, and these two moment conditions, along with (50), may be used to estimate δ_j and δ_m by GMM. No estimate of β_{jt} can be
made without being more specific about the conditional moments, but it is possible
to test various hypotheses about the conditional CAPM using the GMM estimates,
as well as to assess how adequate an explanation it provides of the data.
There are many variants of this idea of estimating β from its conditioning set, e.g., Schwert and Seguin's (1990) single index model of the conditional covariance between the ith and jth portfolios (see (63)) implies a covariance between the jth portfolio and the market portfolio that is a linear function of the market conditional variance, so that β_{jt} can be estimated quite simply from their model.
Braun et al. (1992) set up a model, initially based on (48), treating r_{mt} and r_{jt} as coming from a recursive bivariate EGARCH process in which the innovations driving σ²_{mt}, z_{mt}, are "aggregate shocks", while σ²_{jt} depends on specific shocks to the jth portfolio, z_{jt}, as well as on the market shocks. Subsequently, however, they adopt an ad-hoc specification of β_{jt} as in (51) (the definition of the z's is given below (15)),

β_{jt} = λ₀ + λ₄(β_{j,t−1} − λ₀) + λ₁ z_{m,t−1} z_{j,t−1} + λ₂ z_{m,t−1} + λ₃ z_{j,t−1},    (51)

where the last two terms allow for leverage effects on conditional betas; if both are negative, conditional betas rise when returns fall.
The distinctive feature of the approach just mentioned was the formulation of the conditional beta as an autoregression. Representing β_{jt} as an autoregression seems to have been successful in practice (Rosenberg, 1985), and one might think of it as an approximation to whatever the true process for β_{jt} is, hoping that by choosing the order of the autoregression large enough a good approximation would be achieved. Rosenberg formulates his model as

r_{jt} = μ + β_{jt} r_{mt} + e_{jt},    (52)

β_{jt} = ρ_j β_{j,t−1} + v_{jt},    (53)

and estimates the unknown parameters ρ_j, σ²_{vj} by varying coefficient regression techniques. Generally this is done by maximum likelihood. Defining F_t as a conditioning set based on past data and r_{mt}, if r_{jt} is N(μ_t, σ_t²) conditional on F_t, it follows that the log likelihood of r_{j1}, ..., r_{jT} is (ignoring terms that are asymptotically negligible)

−½ Σ log σ_t² − ½ Σ σ_t⁻²(r_{jt} − μ_t)².    (54)

Generally, μ_t and σ_t² can be computed by the Kalman filter for any given values of the unknown parameters, and so (54) can be maximized. There is obviously some difficulty in allowing r_{mt} to appear in the information set as it is a linear combination of all the portfolios, although for a large enough number of
assets, and small portfolios, we might think of the market as solely reflecting the
macro shock and it is therefore this that is being conditioned upon; the error term
e_{jt} then just reflects the micro shocks.
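A scalar Kalman filter for (52)-(53) is simple to code; the sketch below returns the Gaussian log likelihood of the form in (54), which could then be maximized over the unknown parameters. The initialization, parameter values and names are illustrative assumptions rather than Rosenberg's actual implementation.

    import numpy as np

    def varying_beta_loglik(params, r_j, r_m):
        # r_jt = mu + beta_t*r_mt + e_t (var sig_e2);  beta_t = rho*beta_{t-1} + v_t (var sig_v2)
        mu, rho, sig_e2, sig_v2 = params
        beta, P, loglik = 0.0, 1.0, 0.0            # crude diffuse-ish initial state
        for t in range(len(r_j)):
            beta_pred = rho * beta                  # prediction step
            P_pred = rho ** 2 * P + sig_v2
            f = mu + beta_pred * r_m[t]             # one-step-ahead forecast of the return
            F = r_m[t] ** 2 * P_pred + sig_e2       # and its variance
            v = r_j[t] - f
            loglik += -0.5 * (np.log(2 * np.pi) + np.log(F) + v ** 2 / F)
            K = P_pred * r_m[t] / F                 # updating step
            beta = beta_pred + K * v
            P = P_pred * (1 - K * r_m[t])
        return loglik

    rng = np.random.default_rng(6)                  # artificial data for illustration
    r_m = 0.01 * rng.standard_normal(500)
    true_beta = 1.0 + np.cumsum(0.02 * rng.standard_normal(500))
    r_j = 0.001 + true_beta * r_m + 0.01 * rng.standard_normal(500)
    print(varying_beta_loglik([0.001, 1.0, 1e-4, 4e-4], r_j, r_m))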
4.2.2. Factors in spreads
The dichotomy in the ways that factors manifest themselves is most important
when the detrending is done via the construction of " s p r e a d s " . Mahieu and
Schotman (1994b) consider a model of bilateral exchange rates in which the
changes in the rates reflect differences in the news in each country. Taking a
system of bilateral rates it is clear that there is a common factor - the news occurring in the country whose currency is being used to express the rates. Thus, if we consider the yen/$, mark/$ and franc/$ rates, news about the U.S. will represent the common factor, and this imposes a restriction upon the covariance matrix of the returns which can be used to estimate the characteristics of the factors.
Factors giving rise to a singularity in cov(u_t) need to be given a separate treatment. A slightly different interpretation of the factors ξ_t in (45) is available by multiplying (45) by the transposed matrix of co-integrating vectors, α′, and imposing α′J = 0, to get ζ_t = α′y_t = α′J̃ξ_t. Hence,

ρ(cov(ζ_t)) = min(ρ(α′J̃), ρ(cov(ξ_t))) = min(r, dim(ξ_t)).

Since the total number of factors must be less than n, dim(ξ_t) ≤ n − (n − r) = r, and it is apparent that ρ(cov(ζ_t)) = ρ(cov(ξ_t)), i.e., the common factors ξ_t show up in the cointegrating errors ζ_t. Generally, the co-integrating errors have a clear interpretation in financial data as "spreads", e.g., in the case of the term structure they are the spreads between yields of different maturities, while in relations between a spot and a forward market they would be the forward premium. Consequently, this result directs attention to the spreads in the search for common factors. ⁵⁴
Spreads between asset prices have been extensively studied in recent years. Dybvig (1989) applies principal components to ζ_t to conclude that the number of factors in the term structure is much lower than the number of rates. Applying principal components to the data set composed of sp₂, sp₃, sp₄, sp₅, sp₆ and sp₉, where sp_j is the spread between the j-month and one-month yields, the eigenvalues
⁵⁴ Vahid and Engle (1993) noted that, if there were common cycles in y_t, there exists a vector κ such that κ′J̃ = 0, meaning that κ′Δy_t = κ′Jφ_t would be white noise, and a test for common cycles could be regarded as a test for serial correlation in the residuals κ̂′Δy_t, where κ̂ is an estimator of κ. Obviously, determining ρ(cov(ζ_t)) represents another way of finding the number of common cycles. There may be advantages to it, as this method does not require the use of instrumental variables as their test does.
of the covariance matrix of these variables are 1.8, 0.05, 0.005, 0.002, 7.6 × 10⁻⁵ and 1.3 × 10⁻⁵, pointing to the fact that these six spreads can be summarized very well by (at most) two components. ⁵⁵ Restricting attention to sp₃, sp₆ and sp₉, the eigenvalues are 1.2, 0.03 and 0.002, telling the same story. Knez et al. (1989) estimate a factor model for the excess yields over the repo rate by maximum likelihood. Other factor models, based on the Cox-Ingersoll-Ross model of the term structure, have been estimated by Chen and Scott (1993) and Pearson and Sun (1994).
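The principal-components calculation itself amounts to an eigendecomposition of the covariance matrix of the spreads, as sketched below on artificial two-factor data (all values illustrative); with the actual spread data the eigenvalues would be those quoted above.

    import numpy as np

    rng = np.random.default_rng(7)
    factors = rng.standard_normal((1000, 2))             # two common factors (artificial)
    loadings = rng.standard_normal((2, 6))
    spreads = factors @ loadings + 0.01 * rng.standard_normal((1000, 6))

    eigvals = np.linalg.eigvalsh(np.cov(spreads, rowvar=False))[::-1]   # largest first
    print("eigenvalues:", np.round(eigvals, 4))
    print("variance share of two largest:", eigvals[:2].sum() / eigvals.sum())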
A difficulty in directly applying principal component methods is that ζ_t will rarely be i.i.d., and some allowance should be made for that fact. To describe the autocorrelation structure of ζ_t, return to the levels y_t, and observe that the presence of co-integration among the series means that they follow a vector ECM (taking a first order system for simplicity)

Δy_t = γζ_{t−1} + v_t,

so that

α′Δy_t = Δζ_t = α′γζ_{t−1} + α′v_t.    (55)

It is worth dwelling on some of the implications of (55). In particular, the matrix γ represents the influence of past spreads upon the returns Δy_t. Given the earlier evidence that it was hard to predict Δy_t using past information, it is likely that γ will be close to zero, and therefore ζ_t will exhibit "near unit root" behavior.
Strong persistence in spreads has indeed been observed - see Evans and Lewis
(1994), Pagan et al. (1995) and Hejazi (1994) for the term structure and Baillie
and Bollerslev (1994) and Crowder (1994) for the forward premium in exchange
rates. Baillie and Bollerslev argue that the persistence seen in the premia is best
represented as a fractionally integrated process rather than the I(1) structure
proposed by Crowder. The same point may also apply to the term structure; Hejazi
found that there was no evidence of a unit root in the excess holding yields but
that, if one regressed them on a forward rate which was I(1), the resulting
regression coefficient was non-zero. This suggests that the type of persistence
apparent in holding yields is not of the I(1) variety.
4.3. Relations between financial series
If expected returns remain constant, market efficiency demands that all information has been used in pricing an asset, so that returns should be uncorrelated with
information from other sources than just the past history of returns. Thus the
autocorrelation function does not provide a sufficiently wide perspective to judge
55 It is interesting to observe that the covariance matrix of bilateral returns given in Mahieu and
Schotman (1994b, p. 282) does not reveal any singularity, emphasizing the need to distinguish between
the two impacts a factor structure might have.
questions regarding efficient markets, and this has led to research demonstrating
that returns are influenced by a wide variety of other series, e.g., future stock
returns are explained by the dividend yield (D_t/P_t). ⁵⁶ Much of this work
involves just simple regressions and little in the way of new econometric technique is needed. One relatively new development has been Fama and French's
extension of their long horizon returns work (Fama and French, 1988b). They
consider the regression of k-period returns on stocks, r_{t+k,t}, against the dividend
yield and test if this coefficient is zero or not. They sample data so as to avoid
over-lapping errors, and therefore work with traditional OLS standard errors when
testing the hypothesis of no relation. This reduces the effective sample size quite
significantly for large k.
Hodrick (1991) points out an alternative approach to that of Fama and French. Because a k-period continuously compounded return is the sum of one-period returns, i.e., r_{t+k,t} = r_{t+1} + ... + r_{t+k}, the regression coefficient is proportional (in large samples) to

E[{r_{t+1} + ... + r_{t+k}}(D_t/P_t)],

which is the probability limit of the numerator when the r_t are stationary ergodic processes. The denominator just represents a positive scaling factor, so a test of whether the parameter is zero is a test of whether the numerator is zero. But, under stationarity, this is identical to E[r_{t+1}{(D_t/P_t) + ... + (D_{t−k+1}/P_{t−k+1})}], so that we could interpret Fama and French's methodology as regressing r_{t+1} against Σ_{j=0}^{k−1} [D_{t−j}/P_{t−j}] and testing if the coefficient of the regressor is zero. This last regression has the advantage that it uses all observations and that there is no overlapping data problem. In practice Hodrick multiplies the sum of lagged dividend yields by k⁻¹ so that the slope coefficient measures the response of annualized
expected returns over a given horizon to a change in the ex ante dividend yield. He
uses heteroskedastic and autocorrelation consistent standard errors, since there is
no certainty that returns are i.i.d, under the null (and we actually know that this is
extremely unlikely).
Bekaert and Hodrick (1992) mention an alternative procedure that reduces the loss of observations owing to the need to lag (D/P)_t k times. They put the log of the first period return, the dividend yield, and the one month Treasury Bill yield (rb_t) into a vector z_t, and then fit a first order VAR

z_t = A z_{t-1} + η_t,   (56)

to this trivariate system. The jth autocovariance of z_t is given by C(j) = A^j C(0), where C(0) is the variance of z_t and is a function of A and var(η_t). Hence the autocovariance between the log of the first period return and the jth lagged
56 As Gennotte and Marsh (1992) point out one should be careful in giving such correlations a causal
interpretation. They simulated an artificial economy with fully rational actors but the generated data
exhibited a return-dividend correlation since the stochastic output process driving this economy affects
both dividends and stock prices.
dividend yield is e_1'C(j)e_2, where e_j is a (3 × 1) vector with unity in the jth place and zero elsewhere. Thereupon

E[r_{t+1}{(D_t/P_t) + ... + (D_{t-k+1}/P_{t-k+1})}] = e_1'C(1)e_2 + ... + e_1'C(k-1)e_2
  = e_1'AC(0)e_2 + e_1'A²C(0)e_2 + ... + e_1'A^{k-1}C(0)e_2
  = e_1'A[I + A + ... + A^{k-2}]C(0)e_2,

and we can test if this is zero by finding

var{e_1'A(I + A + ... + A^{k-2})C(0)e_2}.
This variance can be found using the δ method. It would seem that the only
advantage this method has is that it economizes on observations, since only one
observation is lost in estimating the VAR, and the autocovariances are all then
derived ones. For small numbers of observations, or for very large k, this may be a
saving, but it is not so clear what the advantage is otherwise. Moreover, there is a
potential cost if the process is not well represented by a linear VAR; in particular
if there are non-linearities affecting various autocovariances.
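To make the algebra above concrete, the following minimal sketch (entirely hypothetical: simulated series stand in for the log return, dividend yield and bill yield, and the horizon k is simply assumed) fits the first-order VAR in (56), solves for C(0), and forms the implied quantity e_1'A(I + A + ... + A^{k-2})C(0)e_2.

```python
import numpy as np

np.random.seed(0)
T, k = 600, 12                       # assumed sample size and return horizon
z = np.random.randn(T, 3) * 0.05     # stand-in data: [log return, dividend yield, bill yield]

# Fit the VAR(1) z_t = A z_{t-1} + eta_t by least squares
Y, X = z[1:], z[:-1]
A = np.linalg.lstsq(X, Y, rcond=None)[0].T       # 3 x 3 coefficient matrix
eta = Y - X @ A.T

# C(0): unconditional variance of z_t, solved from C(0) = A C(0) A' + var(eta)
Sigma_eta = np.cov(eta, rowvar=False)
n = A.shape[0]
vecC0 = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Sigma_eta.reshape(-1))
C0 = vecC0.reshape(n, n)

# Implied sum of autocovariances between returns and lagged dividend yields:
# e1' A (I + A + ... + A^(k-2)) C(0) e2
S = sum(np.linalg.matrix_power(A, j) for j in range(k - 1))
e1, e2 = np.eye(n)[:, 0], np.eye(n)[:, 1]
print("implied long-horizon numerator:", e1 @ (A @ S @ C0) @ e2)
```

A δ-method standard error for this quantity would then only require the joint covariance matrix of the VAR coefficient and residual-covariance estimates.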
Just as was true for univariate series one might seek non-linear relations between series rather than linear ones. Suppose we want to estimate E(y_t | x_t) without imposing any functional form upon this relation. By definition

y_t = m(x_t) + u_t,   (57)

where m(x_t) is some unknown function of x_t and u_t is taken to be i.i.d. (0, σ²).
An example that has attracted some interest would be if y_t is the price of a derivative while x_t contains the price of the underlying asset (p_t), the strike price (s_t), volatility (σ), the risk free interest rate (r_t) and the maturity of the derivative (τ). In Black and Scholes' formula u_t = 0 and a specific function linking these quantities is provided,

m_BS(x_t) = p_t Φ(d_1) − s_t e^{−r_t τ} Φ(d_2),

where

d_1 = [log(p_t/s_t) + (r_t + σ²/2)τ]/(σ√τ),
d_2 = d_1 − σ√τ,

and Φ(·) is the cumulative normal distribution function. However, it is known that this formula
tends to break down for out-of-the-money and short-maturity options, where the degree of non-linearity seems more pronounced. Moreover, the derivation of the Black-Scholes formula assumes that volatility is a constant, whereas implicit volatilities computed from options prices using the formula change over time and also seem to be a function of p_t/s_t - the latter constituting the well known "smile" effect in volatility. Consequently, it is of interest to estimate m(x_t)
non-parametrically in an attempt to correct for these problems.
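For reference, a minimal sketch of the Black-Scholes function m_BS(x_t) just written down; only the standard formula is assumed and the numerical inputs are purely illustrative.

```python
from math import log, sqrt, exp, erf

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def m_bs(p, s, sigma, r, tau):
    """Black-Scholes call price: underlying p, strike s, volatility sigma,
    risk-free rate r, time to maturity tau."""
    d1 = (log(p / s) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return p * norm_cdf(d1) - s * exp(-r * tau) * norm_cdf(d2)

# Example: an at-the-money option with six months to maturity
print(m_bs(p=100.0, s=100.0, sigma=0.2, r=0.05, tau=0.5))
```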
The basic idea behind nonparametric estimation of E(y_t | x_t) is to approximate m(x_t) arbitrarily closely by combining a set of basis functions Σ_{j=1}^{J} β_j φ_j(x) and to then estimate the β_j, leading to m̂(x_t) = Σ_{j=1}^{J} β̂_j φ_j(x_t). Hutchinson et al. (1994) fitted the options model above using neural networks, in which φ_{jt} = (1 + exp(−x_t'δ_j))^{-1}, to approximate m(x_t). Kearns (1993) simulated data from an options pricing model with stochastic volatility and non-parametrically estimated the non-linear function implied by a particular pricing model using the flexible Fourier form of Gallant (1981), which has φ_{jt} equal to 1, x_t, x_t², cos(jx_t), sin(jx_t) (j = 1, 2, ...). After this model has been calibrated, it can be used with actual data to produce predictions of observed options prices. The prediction error may be used as a diagnostic device. The method seems to work quite well. In Kearns' case 98% of the variation in options prices was captured.
Each of the approaches above has approximated the unknown function globally and then predicted what value it would have at given points x_t = x. An alternative method is to estimate α = m(x_t = x) by using only those observations whose value is close to x. Since we can think of m(x) locally as a constant, the appropriate estimate is found by choosing α to minimize Σ_{t=1}^{T}(y_t − α)² K((x_t − x)/h), where K(·) is a kernel or weighting function that gives low weight to observations for which x_t is far from x. The window-width parameter, h, determines exactly how far away the observation can be in order to be included in the computation. From standard least squares theory this gives

m̂(x_t = x) = Σ_{t=1}^{T} y_t K(ψ_t) / Σ_{t=1}^{T} K(ψ_t),   (58)

where ψ_t = (x_t − x)/h. Diebold and Nason (1990) consider predicting exchange
rates using a conditional mean estimated non-parametrically in this way, the
conditioning variables being the lagged value of the rate. Huang and Lin (1990)
have analyzed non-linearities in the term premiums and forward rates with these
methods. The Cox et al. (1985) model of the term structure implies a linear
relation between expected term premiums on Treasury securities and forward
interest rates. However, the linearity stems from the assumptions on the stochastic
processes driving the model, which make the state variables linear Markov
processes. If the forcing processes were non-linear, term premiums and forward
rates would be related in a non-linear way. Huang and Lin apply kernel based
methods to the data, but discover that the linearity assumption seems quite good.
Pagan and Hong (1991) look at the conditional mean for excess holding yields on
Treasury bills as a function of past yields and the yield differential, finding that the
relation seems to be non-quadratic.
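A minimal sketch of the local (kernel) estimator in (58), assuming a Gaussian kernel, simulated data, and an arbitrarily chosen window width h.

```python
import numpy as np

def kernel_regression(x_obs, y_obs, x_grid, h):
    """Local constant (Nadaraya-Watson) estimate of m(x) as in (58):
    a weighted average of y_t with weights K((x_t - x)/h)."""
    psi = (x_obs[None, :] - x_grid[:, None]) / h        # psi_t for every grid point
    K = np.exp(-0.5 * psi ** 2)                          # Gaussian kernel
    return (K * y_obs[None, :]).sum(axis=1) / K.sum(axis=1)

np.random.seed(1)
x = np.random.uniform(-2, 2, 400)
y = np.sin(2 * x) + 0.3 * np.random.randn(400)           # unknown m(x) = sin(2x)
grid = np.linspace(-2, 2, 9)
print(np.round(kernel_regression(x, y, grid, h=0.25), 2))
```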
The principal disadvantages of such local analysis are that there are well known biases in the estimator of m(x) and, when the dim(x_t) is large, very few observations will be used to determine each point. In recent years an attempt has been made to combine the two approaches by assuming that the function can be parametrically approximated around the point x by, say, a linear polynomial, so
that the optimization problem becomes one of choosing α and β such that Σ_{t=1}^{T}(y_t − α − β(x_t − x))² K(ψ_t) is minimized, e.g. see Fan et al. (1994), and it seems as if these locally parametric methods can produce big improvements in the properties of the local estimator. Of course, there is no reason to choose a polynomial in x_t. In fact, if one thought that the global function was likely to have a specific shape, then it would make sense to use that. Thus, for options pricing, one might use m_BS(x_t − x), and this idea has been exploited by Bossaerts and
Hillion (1994). Another important application in finance of non-parametrics has been to estimate the yield curve. McCulloch (1971) pioneered the application of spline functions to this - here y_t are observed yields, x_t is the maturity and the φ_j are regression splines - while Fisher et al. (1994) use smoothing splines, which add extra constraints to penalize "roughness" in the estimated yield curve. Gouriéroux and Scaillet (1994) use the local parametric idea with the candidate parametric functions being factor models taken from the term structure literature, such as Vasicek's (Vasicek, 1977).
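A sketch of the locally parametric idea in its simplest (local linear) form: at each point x, α and β are chosen to minimize Σ(y_t − α − β(x_t − x))²K(ψ_t), which is just kernel-weighted least squares. The data, kernel and window width below are invented for illustration.

```python
import numpy as np

def local_linear(x_obs, y_obs, x0, h):
    """Local linear estimate of m(x0): weighted least squares of y on
    [1, x_t - x0] with Gaussian kernel weights."""
    u = x_obs - x0
    w = np.exp(-0.5 * (u / h) ** 2)
    X = np.column_stack([np.ones_like(u), u])
    W = np.diag(w)
    coef = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_obs)
    return coef[0]                       # the intercept is m_hat(x0)

np.random.seed(2)
x = np.random.uniform(0, 1, 300)
y = x ** 2 + 0.05 * np.random.randn(300)
print(local_linear(x, y, x0=0.5, h=0.1))   # should be close to 0.25
```

Replacing the linear term by a candidate parametric shape such as m_BS would give the Bossaerts-Hillion variant described above.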
Another way of proceeding is to model the joint density of y, and x t and to
then derive the conditional moments from that. If one uses the SNP method, the
joint density would be approximated as the product of a polynomial and a
multivariate normal. Gallant et al. (1992) have used this approach to examine the
relation between returns and the volume of trading. They show that the variance of
returns conditional on both past returns and the volume of trading shows no
asymmetry in returns, so that it appears that most of the asymmetry is due to the
fact that the volume of trading is different in a bear than a bull market.
4.4. M o d e l i n g multivariate volatility
It seems very likely that the conditional variance of the return on an asset
would be related not only to the past history of its own return but also to those on
other assets. Indeed if one non-parametrically computes the variance of stock
returns conditional upon the past history of returns as well as the volume of
trading, it appears that the asymmetry between the variance and past returns noted
previously disappears, emphasizing the need to look at multivariate relations
between the variances of series.
One trend has been to engage in multivariate extensions of the univariate GARCH, EGARCH etc. models. Defining the conditional covariance matrix of an n × 1 vector of returns x_t as Ω_t, Bollerslev et al. (1988) defined the multivariate GARCH model to be

x_t = w_t'β + e_t   (59)

vech(Ω_t) = A_0 + Σ_{j=1}^{q} B_j vech(Ω_{t-j}) + Σ_{j=1}^{p} A_j vech(e_{t-j} e_{t-j}'),   (60)
where e_t = Ω_t^{1/2} ε_t and ε_t is n.i.d.(0, I_n). The log likelihood is easily written down as

−(Tn/2) log 2π − (1/2) Σ_t log|Ω_t| − (1/2) Σ_t (x_t − w_t'β)' Ω_t^{-1} (x_t − w_t'β),   (61)

but maximizing (61) is a formidable challenge owing to the huge number of parameters entering it - from (60) alone there are n(n + 1)/2 + (p + q)n²(n + 1)²/4, and for (say) n = 3, p = q = 1, there are 78 GARCH parameters to be estimated. For this reason most applications have concentrated upon ways of restricting (60) in some sensible fashion to reduce the number of unknown parameters.
A first suggestion was by Bollerslev et al. (1988) who set p = q = 1 and made the matrices A_1, B_1 diagonal, giving the simple model for the elements of Ω_t,

σ_{ij,t} = α_{0,ij} + α_{1,ij} e_{i,t-1} e_{j,t-1} + β_{1,ij} σ_{ij,t-1},   i, j = 1, ..., n,   (62)

for a total of (p + q + 1)(n(n + 1)/2) parameters: when n = 3, there are 18 parameters. Later Bollerslev (1990) proposed that σ_{ij,t} = ρ_{ij} σ_{ii,t}^{1/2} σ_{jj,t}^{1/2} (i ≠ j). In this model there are n(n − 1)/2 correlation coefficients ρ_{ij} as well as the parameters of the n conditional variance equations to estimate. If each one of these variances was univariate GARCH(1,1), then the total number of parameters would be reduced to 3n + (n(n − 1)/2), i.e., 12 if n = 3. Such a specification certainly looks attractive as it does not seem unreasonable to restrict the conditional covariances to vary in line with the conditional variances. Nevertheless, there are still a very large number of parameters and one might also wish to further reduce the number of unknowns by imposing some structure upon the ρ_{ij}.
Various other formulations have been tried; one which has the advantage of being both parsimonious and ensuring that the estimated Ω_t is positive definite is that given in Engle and Kroner (1995),

Ω_t = C + Σ_{m=1}^{M} Σ_{j=1}^{q} B_{mj} Ω_{t-j} B_{mj}' + Σ_{m=1}^{M} Σ_{j=1}^{p} A_{mj} e_{t-j} e_{t-j}' A_{mj}',

where the B_{mj} and A_{mj} each possess n² unknown parameters.
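To illustrate why the Engle-Kroner form keeps Ω_t positive definite, here is a sketch of the recursion with M = p = q = 1; the parameter matrices are invented for the example and are not estimates of anything.

```python
import numpy as np

def bekk_step(C, A1, B1, Omega_prev, e_prev):
    """One step of Omega_t = C + B1 Omega_{t-1} B1' + A1 e_{t-1} e_{t-1}' A1'.
    Each term is positive semi-definite, so Omega_t remains positive definite
    whenever C is."""
    return C + B1 @ Omega_prev @ B1.T + A1 @ np.outer(e_prev, e_prev) @ A1.T

n = 3
C = 0.05 * np.eye(n)          # invented intercept matrix (positive definite)
A1 = 0.3 * np.eye(n)          # invented "news" matrix
B1 = 0.9 * np.eye(n)          # invented "memory" matrix
Omega = np.eye(n)
rng = np.random.default_rng(0)
for _ in range(5):
    e = np.linalg.cholesky(Omega) @ rng.standard_normal(n)
    Omega = bekk_step(C, A1, B1, Omega, e)
print(np.linalg.eigvalsh(Omega))   # all eigenvalues stay positive
```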
A different approach to reducing the number of unknown parameters is by changing the number of forcing variables, i.e., rather than operate upon A_j and B_j, the idea is to replace vech(e_{t-j}e_{t-j}') by a smaller dimensional vector. Generally, this is done by concentrating upon a small number of "factors" that are thought to drive the conditional variances. One perspective on this is that it is derivative from Ross's arbitrage pricing model, in which a vector of returns x_t is written as a linear combination of K < n factors f_t, i.e., x_t = B f_t + e_t with B a matrix of factor loadings, and this idea is just being shifted to the second moment (Ross, 1976). The difficulty is to translate this idea into an operational model, since factors are generally unob-
served. One strategy that does so is Diebold and Nerlove's formulation which has Ω_t a function of a single latent factor, f_t, with the conditional variance of the factor, σ_{ft}², following a GARCH process, thereby endowing Ω_t with the structure Ω_t = C + δδ'σ_{ft}² (Diebold and Nerlove, 1989). Since E(Ω_t) = C + δδ'var(f_t), they used factor analysis on the unconditional covariance matrix of returns (they have no w_t's) to determine estimates of δ and then applied these to estimate the GARCH model, producing σ̂_{ft}². After the latter quantity is determined, MLE can proceed in the standard way. Harvey et al. (1992) point out that it is possible to estimate E_{t-1}(x_t) and var_{t-1}(x_t) under this model and so the quasi-MLE can be applied. Actually, it would seem that the simplest way to estimate the model would be to perform simulated MLE, since f_t can be generated for any given values of the GARCH parameters, and this would avoid the two step procedure.
Although not explicitly taking the viewpoint adopted here, one can view Schwert and Seguin's work in this way (Schwert and Seguin, 1990). In their case x_t was a vector of monthly returns to different portfolios of stocks, and the common factor was taken as the market return, i.e., they set

σ_{ijt} = α_{0ij} + α_{1ij} σ_t² + α_{2ij} σ_t^{-2},   (63)

where σ_t² is the conditional variance of the market return, measured in their case by the volatility of daily returns to the S&P composite portfolio from 1928 to 1986. Thus, in this instance, the factor is rendered observable by the use of daily data, although it could have also been estimated by (say) making σ_t² a GARCH(1,1) and using the monthly data on the aggregate portfolio. Schwert and Seguin also proposed that σ_{ijt} be replaced by ρ_{ij}σ_{ii,t}^{1/2}σ_{jj,t}^{1/2}, i.e., the equivalent of what Bollerslev does in the general GARCH model. One can estimate (63) by OLS, replacing σ_{ijt} by x_{it}x_{jt}, although one needs to correct for heteroskedasticity in the errors when making inferences about the α's. 57 They find that small firm portfolios are four times more sensitive to market volatility changes than are large firm portfolios.
A generalization of this idea of making Ω_t a function of the variances of factors which are portfolios constructed from the assets in x_t is available in Engle et al. (1990b), where the model is termed Factor-GARCH. Defining f_{kt} = γ_k'x_t as the kth portfolio, let σ_{fkt}² be its conditional variance. Engle et al. assume that Ω_t = Σ_{k=1}^{K} δ_kδ_k'σ_{fkt}² + C, an obvious extension of the APT, and then allow each of the factor conditional variances to be GARCH(1,1), i.e.,

σ_{fkt}² = a_{0k} + a_{1k}(γ_k'e_{t-1})² + β_{1k}σ_{fk,t-1}².   (64)

57 σ_{ijt} is replaced by the residuals ê_{it}ê_{jt} if there are variables w_t determining x_t.
After substituting out σ_{fkt}² in the expression for Ω_t, one is left with

Ω_t = C + Σ_{k=1}^{K} δ_kδ_k' a_{1k}(γ_k'e_{t-1})² + Σ_{k=1}^{K} β_{1k} δ_kδ_k' γ_k'Ω_{t-1}γ_k.   (65)

In their terms this would be a "univariate portfolio representation" since σ_{fkt}² depends only upon its own lagged value and a combination (γ_k'e_{t-1})² of the errors associated with its own portfolio. A "recursive portfolio representation" would allow some dependence across portfolios; as the name suggests, the conditional variances of portfolios are assumed to be ordered so that the dependence is upon information about other portfolios further down in the chain but not upon information above that point.
Engle et al. apply the idea to excess returns on stocks and Treasury Bills with
maturities ranging from one to twelve months. Principal component analysis of the
data led them to the idea of having two portfolios as the factors - one comprising
stocks, and the other a linear combination of Treasury Bills (equally weighted).
The recursive portfolio representation in which stocks affect bills but not conversely seemed superior to the univariate one. In this work the portfolio weights
are given and not estimated. Lin (1992) considers full maximum likelihood
estimation with unknown weights.
In all the above analysis, it was assumed that the object of interest was to
model the complete conditional covariance matrix. Sometimes, however, one
might only be interested in the diagonal elements (or at least be prepared to
concentrate upon them alone). This occurs in the Engle et al. (1990a) study of the
effects of news upon return volatility as it is transmitted around the world's stock
markets. Thus, the vector of returns might be returns on the New York, Milan,
London, and Tokyo markets. Here a subtle modification needs to be made to the specification of ARCH models to reflect the fact that, when New York opens, London and Milan have already been open for a number of hours. Hence, if the model is one of intra-day volatility, σ_t² for New York should be a function of not only past returns, but also current returns in London and Milan, i.e., one would have an ARCH specification for the ith market (out of n) as

σ_{it}² = σ_i² + Σ_{j=1}^{i-1} α_{ij} e_{jt}² + Σ_{j=i}^{n} α_{ij} e_{j,t-1}² + β_{ii} σ_{i,t-1}².   (66)
In addition, one might also be interested in the behavior of covariances, particularly given the belief that stock markets are now much more closely related than
they were previously, but it would be a formidable task to allow these to vary as
well.
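A sketch of the spillover specification (66): the conditional variance for market i uses same-day squared shocks from markets that have already traded (j < i) and previous-day shocks from the remainder. All coefficient values and the market ordering are illustrative only.

```python
import numpy as np

def spillover_variance(i, e_today, e_yesterday, sigma2_prev, omega, alpha, beta):
    """Equation (66): sigma2_it = omega_i + sum_{j<i} alpha_ij e_{jt}^2
    + sum_{j>=i} alpha_ij e_{j,t-1}^2 + beta_ii sigma2_{i,t-1}."""
    n = len(e_today)
    s = omega[i] + beta[i] * sigma2_prev[i]
    for j in range(n):
        shock = e_today[j] if j < i else e_yesterday[j]
        s += alpha[i, j] * shock ** 2
    return s

n = 4                                     # e.g. Tokyo, Milan, London, New York in trading order
omega = np.full(n, 0.01)
alpha = np.full((n, n), 0.05)
beta = np.full(n, 0.85)
rng = np.random.default_rng(0)
e_today, e_yesterday = rng.standard_normal(n), rng.standard_normal(n)
sigma2_prev = np.ones(n)
print([spillover_variance(i, e_today, e_yesterday, sigma2_prev, omega, alpha, beta)
       for i in range(n)])
```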
All of the above represent generalizations of the simplest GARCH processes to
multivariate models. There have been few attempts to estimate more complex
models. Braun et al. (1992) consider a bivariate EGARCH model when estimating
betas for stocks (as described earlier) but that is a rare exception. Multivariate
stochastic volatility models have been proposed and estimated. The extension to
the multivariate case is conceptually quite easy since one is simply dealing with a
VAR rather than an AR process in the conditional variance. Harvey and Shephard
(1993) point out that the quasi-MLE can be applied easily as the Kalman filter is
already formulated for multivariate problems. Danielsson (1993) extends the
accelerated importance sampler to perform simulated MLE and applies the method
to a bivariate system of exchange rates. Gallant et al. (1994) estimate a stochastic
volatility trivariate system with an indirect estimator and an SNP multivariate
density as the auxiliary model.
4.5. Are variances co-persistent?
Many financial series possess the property of integration and are also co-integrated. Integration can be interpreted as that characteristic wherein shocks to a series are permanent, or in which a perturbation in the initial condition never dies out of the series. Thus, for the AR(1) in (4), ∂y_t/∂y_0 = β_1^t for any given sequence of shocks {u_j}_{j=1}^{t}. When β_1 = 1 any changes to the initial condition are permanently embedded in y_t. In contrast, for an I(0) series, shocks are not persistent. Therefore, if a co-integrating error is formed by weighting each series by their parameter value in the cointegrating vector, this error is I(0) by definition, and shocks to it will not be persistent even though they are to the individual series from which it is constituted.
Effectively, cointegration is a statement about the conditional means of variables. Hence it is not surprising that a similar distinction regarding the impact of shocks has been proposed by Bollerslev and Engle (1993) for variances. Thus, suppose that y_t was GARCH(1,1), i.e., its conditional variance σ_t² is that given in (13). Iterating that equation backward in time,

σ_t² = [Π_{j=1}^{t}(β_1 + α_1 ε_{t-j}²)] σ_0² + α_0 Σ_{k=0}^{t-1} Π_{j=1}^{k}(β_1 + α_1 ε_{t-j}²),   (67)

and this relation enables one to assess the impact of a change in σ_0² on σ_t². One difficulty is that the impact is stochastic and so there is no unique way of measuring it. A simple measure would be just to consider E(∂σ_t²/∂σ_0²) = E[Π_{j=1}^{t}(β_1 + α_1 ε_{t-j}²)] = Π_{j=1}^{t}(β_1 + α_1) = (β_1 + α_1)^t. 58 When the process is IGARCH, β_1 + α_1 = 1, and the shock is deemed persistent into σ_t², whereas if it is just GARCH there is no such persistence.
With the ideas relating to persistence in variance just stated, it is natural to define co-persistence in variance as occurring if individual series are IGARCH but there exists some linear combination of them which is GARCH. The terminology is a useful one, as there do seem to be instances in which this phenomenon
58 Bollerslev and Engle adopt a different definition involving the stochastic behavior of predictions of σ_t² as the prediction horizon lengthens; the current definition was chosen for its simplicity.
occurs, e.g., spot and forward exchange rates may each have IGARCH in the
variance, but the difference between them, the risk premium, may just display
GARCH effects. It cannot be emphasized too strongly that there may be no
relation between the vectors that co-integrate the levels of series and those that
"co-integrate" the variances. Indeed it is quite possible for two financial series to
be both I(0), each featuring IGARCH variances, and yet for the difference (say) to
display only GARCH. Moreover, although co-persistence in variance may be an
interesting empirical phenomenon, it does not seem to have any connection with
theoretical models in the same way that co-integration does. There we have the
notion that the co-integrating vector reflects an equilibrium relation from static
theory; this idea deriving from the belief that deviations from an equilibrium
should be stationary. However, even if we had a model indicating an equilibrium
relation between variances, the fact that σ_t² remains an I(0) process even if it is
IGARCH means that no argument can be made in favor of the combined variances
being GARCH.
5. Economic models of multivariate financial data
In the previous section statistical models to describe financial data have been
outlined. Ultimately, however, one seeks some economic explanation of the
inter-relationships, and various models have been proposed for this task. In some
instances, these involve considering what the first order conditions for optimizing
consumers and producers would be either in a partial or a general equilibrium
framework, followed by a derivation of the implications of those conditions for
relationships between financial series. In others, the optimality conditions come
from general arguments considering the relation between the price of an asset and
the flow of income that is associated with it. It is the latter which is the focus of
the next sub-section, and the former is the concern of the following one. Mostly,
the literature in this area has been concerned with testing whether the conditions
for inter-temporal optimization hold rather than trying to calibrate the underlying
models, and that explains the emphasis given to this question in what follows. In
those cases where there is interest in estimating the parameters of an underlying
model, estimation has proceeded with one of the techniques outlined earlier, e.g.,
Chen and Scott (1993) apply maximum likelihood to estimate a multifactor model
of the term structure. Perhaps the most interesting development has been the use of
indirect estimation for this purpose - see Bansal et al. (1994). There are many
models in economics that are easy to simulate but for which it is difficult to write
down the likelihood, e.g., models of exchange rates with "bands", see Ball and
Roma (1994), and these are obvious candidates for indirect estimation methods,
using as auxiliary models the statistical structures described in the previous
section.
5.1. Inter-temporal relations from equilibrium conditions
From discounted present value theory it would be expected that the beginning of period stock prices (P_t) would be related to dividends (D_t) by

P_t = Σ_{j=0}^{∞} β^j E_t(D_{t+j}),   (68)

where β is an ex-ante constant real discount factor and E_t indicates that the expectation is taken conditional on information available at time t (this does not include D_t). Defining S_t = P_t − θD_t = Σ_{j=0}^{∞} β^j (E_t(D_{t+j}) − D_t), where θ = 1/(1 − β), S_t will be I(0) if D_t is an integrated process, and therefore (1, −θ) would be a co-integrating vector between P_t and D_t. When tests for co-integration between P_t and D_t are applied, one frequently sees acceptance of the null hypothesis of no-cointegration. Timmermann (1995) notes that the discount rate may vary over time, and therefore θ_t = θ + v_t would be a random variable. The relationship between stock prices and dividends would then involve P_t − θ_tD_t, and the error from using a constant θ would equal v_tD_t. As D_t is an I(1) process this error will have a conditional (upon D_t) variance that depends upon D_t, i.e., there is a "levels" effect similar to that seen with yields. Timmermann constructs simulation experiments to show that the rejection of co-integration might simply stem from the fact that v_tD_t has substantial persistence in it due to the documented fact that v_t depends on the dividend/price ratio, which tends to be quite persistent.
Earlier work by Shiller (1981) sought to determine the validity of (68) by considering its implications for sample moments. In particular, he defined an ex-post rational price P_t* as

P_t* = Σ_{j=0}^{∞} β^j D_{t+j}.   (69)

Then

P_t* = Σ_{j=0}^{∞} β^j E_t(D_{t+j}) + Σ_{j=0}^{∞} β^j (D_{t+j} − E_t(D_{t+j}))   (70)
     = P_t + Σ_{j=0}^{∞} β^j V_{t+j},   (71)

and, following from the fact that E_t(V_{t+j}) = 0, E(P_t V_{t+j}) = 0, producing

var(P_t*) ≥ var(P_t).   (72)
Shiller took this variance bounds inequality and estimated both sides from sample
data. The LHS poses difficulties since it depends upon dividends an infinite distance into the future, so he truncated this as P_t** = Σ_{j=0}^{T-t-1} β^j D_{t+j} + β^{T-t} P_T, i.e., instead of P_t* being used to summarize future information he replaced the unobserved tail with the observed terminal price P_T. He then discovered that the sample variance of P_t exceeded that of P_t**.
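A sketch of Shiller's construction: the truncated ex-post rational price satisfies the backward recursion P_t** = D_t + βP_{t+1}** with P_T** set to the observed terminal price. The dividend and price series below are simulated placeholders, so the variance comparison printed at the end is purely illustrative.

```python
import numpy as np

def ex_post_rational_price(D, P_T, beta):
    """Truncated ex-post rational price: P**_t = D_t + beta * P**_{t+1},
    with P**_T set to the observed terminal price."""
    T = len(D)
    P_star = np.empty(T)
    P_star[-1] = P_T
    for t in range(T - 2, -1, -1):
        P_star[t] = D[t] + beta * P_star[t + 1]
    return P_star

rng = np.random.default_rng(3)
beta = 0.95
D = np.cumsum(rng.normal(0.0, 0.1, 200)) + 10.0   # placeholder random-walk dividends
P = D / (1.0 - beta)                              # price if E_t(D_{t+j}) = D_t for all j
P_star = ex_post_rational_price(D, P[-1], beta)
print("var(P):", P.var(), " var(P**):", P_star.var())
```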
An immediate problem with this test, noted by Kleidon (1986), was that P_t was very likely to be an integrated process so that the "variance" does not exist. Hence, a comparison such as (72) can only be done with P_t being a non-integrated process. This has led to other proposals for checking the validity of (68). One of these is to recognize that P_t and D_t should be co-integrated with vector (1, −θ) (see the discussion in (a) above) so that if we formed
S_t* = P_t* − θD_t = Σ_{j=0}^{∞} β^j D_{t+j} − θD_t   (73)
     = Σ_{j=0}^{∞} β^j (D_{t+j} − D_t),   (74)

S_t* will be an I(0) process when D_t is integrated. Comparing this to S_t = Σ_{j=0}^{∞} β^j (E_t(D_{t+j}) − D_t) shows that a prediction of the present value model is that

var(S_t*) ≥ var(S_t),   (75)

and a comparison can be made using the sample variances owing to the fact that S_t* and S_t are both I(0). 59 However, even if these variables are I(0) there are a
number of factors that make an empirical comparison of the population quantities in (75) much more clouded. Flavin (1983) pointed out that there are small sample biases in estimating variances if the series have substantial serial correlation. Indeed that must be the case for S_t* as it involves an infinite combination of the ΔD_{t+j}. Moreover, as Shea (1989) shows in some simulation experiments, the comparison can be a problem if ΔD_t is close to being integrated: for such near integrated processes statistics such as sample variances frequently behave more like I(1) than I(0) series if the sample is small. Shea presents some Monte Carlo evidence to show that the sample variance of S_t can easily exceed that of S_t* even if the present value model is correct. Actually, what Shea examines is the properties of the Mankiw et al. (1985) statistic which involves comparing var(P_t* − P_t⁰) to var(P_t − P_t⁰), where P_t⁰ is a "naive" forecast. However, as they set P_t⁰ to θD_t, the experiment is relevant. Another variant is LeRoy and Parke (1987) who compare var(S̃_t*) to var(S̃_t), where S̃_t* = S_t*/D_t and S̃_t = S_t/D_t.
59 Of course S_t* is replaced by S_t** but, as S_t** = S_t* + β^{T-t}(P_T − P_T*), and P_T − P_T* should be I(0) when the present value model holds, that substitution should not be of concern, although the actual computed sample variance can be different if we decide to compute the ex-post rational price using a terminal date that is within the sample. In the remaining discussion we focus on S_t* only.
Because S̃_t* = (P_t*/D_t) − θ and S̃_t = (P_t/D_t) − θ, θ need not be estimated in their comparison. Generally, one has to conclude that these relative variance tests are very inefficient ways of examining the validity of a present value relation, and that direct testing of the model seems a more sensible strategy.
Another variance bounds test that does not suffer from problems arising out of integrated processes has been developed by West (1988). He modifies (68) to

P_t = E_t[Σ_{j=1}^{∞} β^j D_{t+j}],   (76)

where P_t is now an end-of-period price rather than a beginning-of-period price. (76) implies that

P_t = βE_t(D_{t+1} + P_{t+1}),   (77)

where the appropriate transversality condition is assumed satisfied. Now, suppose that the information set used in (76) is I_t and that there is a sub-set of it designated G_t. Constructing the hypothetical quantities

x_{It} = E[Σ_{j=0}^{∞} β^j D_{t+j} | I_t],   x_{Gt} = E[Σ_{j=0}^{∞} β^j D_{t+j} | G_t],   (78)

he shows that, under the present value model,

var(x_{Gt} − E(x_{Gt} | G_{t-1})) ≥ var(x_{It} − E(x_{It} | I_{t-1})).   (79)
He proposes that both sides of the inequality be estimated from sample data and that the present value model be rejected if the inequality fails to hold.

The main issue is how to construct sample estimates of both variances. From inspection of (78) and (76), x_{It} = D_t + P_t, and so the RHS of (79) is var(D_t + P_t − E(D_t + P_t | I_{t-1})). From (77) we can write

P_t = β(D_{t+1} + P_{t+1}) − η_t,   (80)

where η_t = β(D_{t+1} + P_{t+1} − E(D_{t+1} + P_{t+1} | I_t)) has the property that E(η_t | I_t) = 0. Lagging η_t once shows that its variance is the variance on the RHS of (79). Hence, it is simply a matter of determining the variance of η_t from (80). By definition E(η_t | I_t) = 0, so that

E[P_t − β(D_{t+1} + P_{t+1}) | I_t] = 0   (81)

or

E[1 − β(D_{t+1} + P_{t+1})/P_t | I_t] = 0,   (82)

as I_t includes P_t in the information set.
The method of moments estimator of β from (82) is just β̂ such that

Σ_t (1 − β̂[(D_{t+1} + P_{t+1})/P_t]) = 0,   (83)

or β̂ = 1/(T^{-1}Σ_t[(D_{t+1} + P_{t+1})/P_t]). As the denominator is one plus the sample mean of {(D_{t+1} + ΔP_{t+1})/P_t}, this will give an estimated discount factor approximately equal to 1 minus the sample mean of (D_{t+1} + ΔP_{t+1})/P_t, i.e., one minus the mean of ex-post returns. 60 Since (D_{t+1} + ΔP_{t+1})/P_t will be I(0), β should therefore be consistently estimated in this way. Thereupon the variance of η_t can be estimated as T^{-1}Σ_t(P_t − β̂(D_{t+1} + P_{t+1}))².
To determine the LHS of (79) requires that the information set G_t be specified and a model for the dividend process be set up. Suppose ΔD_t = e_t is white noise and G_t contains D_t. Then x_{Gt} = (1 − β)^{-1} D_t, E(x_{Gt} | G_{t-1}) = (1 − β)^{-1} D_{t-1} and the LHS is just var[(1 − β)^{-1}(D_t − D_{t-1})] = (1 − β)^{-2} var(ΔD_t), from which an estimate can be constructed using β̂ and the sample variance of ΔD_t. Since both quantities involve estimating β, var(ΔD_t) and var(η_t) by the method of moments, a standard error can be placed on the difference between the estimated var(x_{Gt} − E(x_{Gt} | G_{t-1})) and var(x_{It} − E(x_{It} | I_{t-1})), although this does not provide a solution to the problem of testing an inequality. With data like Shiller's on stocks and dividends the inequality was clearly rejected.
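A sketch of the moment calculations behind West's test: β̂ from (83), the estimate of var(η_t), and the G_t-based variance (1 − β)^{-2}var(ΔD_t) under the white-noise dividend model. The dividend and price series are simulated placeholders, so the printed numbers carry no empirical content.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 500
D = np.cumsum(rng.normal(0.0, 0.1, T)) + 20.0     # placeholder dividend series
P = 19.0 * D + rng.normal(0.0, 0.5, T)            # placeholder end-of-period prices

gross = (D[1:] + P[1:]) / P[:-1]
beta_hat = 1.0 / gross.mean()                     # beta from (83): inverse of a sample mean
eta_var = np.mean((P[:-1] - beta_hat * (D[1:] + P[1:])) ** 2)   # RHS variance estimate
lhs_var = (1.0 - beta_hat) ** -2 * np.var(np.diff(D))           # LHS when Delta D is white noise

print("beta_hat:", beta_hat)
print("LHS (coarse info set) variance:", lhs_var)
print("RHS (full info set) variance  :", eta_var)
```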
Restricted VAR's have been used to test other aspects of models of financial data. For example, if we return to the question of the present value pricing model for stocks or the term structure of interest rates, we have observed that the relationship should exhibit a co-integrating vector. Concentrating on stocks and dividends, S_t = P_t − θD_t = θ Σ_{j=1}^{∞} β^j E_t(ΔD_{t+j}), and if we form a VAR from S_t and ΔD_t, restrictions are implied upon its parameters. In particular, writing it as (56), where z_t' = (S_t, ΔD_t), the restrictions on a pth order VAR would be that e_{p+1}' = e_1'βA(I − βA)^{-1}, where e_j is a vector with unity in the jth position and zeros elsewhere. One can test these restrictions by doing a Wald test. A better way to test the restrictions would be to test e_{p+1}'(I − βA) = e_1'βA, as this is testing a set of linear rather than non-linear restrictions and Wald tests have better sampling properties when restrictions are close to linear - see Gregory and Veall (1985) and Phillips and Park (1988). Such tests have been used previously by Campbell and Shiller (1987) and Kearns (1990a) in studying the term structure relations. They find that the restrictions are generally rejected.
60 West suggests using other instruments, w_t, to estimate β, but if they are to be instruments they must be uncorrelated with 1 − β[(D_{t+1} + P_{t+1})/P_t], i.e., E[w_t(1 − β[(D_{t+1} + P_{t+1})/P_t])] = 0. Since there is a constant being used, w_t can be assumed to have E(w_t) = 0, so that this condition requires βE[w_t(D_{t+1} + P_{t+1})/P_t] = 0. However, this means there is no information in w_t for β, as the derivative of [1 − β(D_{t+1} + P_{t+1})/P_t] with respect to β has to be correlated with the instrument for the latter to be relevant. As discussed earlier in connection with the GMM estimator, the use of such weak instruments can have deleterious effects upon the properties of the instrumental variables estimator.
If one observes rejection, the question of what caused the rejection arises. One candidate is that there are speculative bubbles. 61 A second is that the approximations employed to get relations such as the expectations hypothesis R_t = k^{-1} Σ_{j=0}^{k-1} E_t r_{t+j}, where R_t and r_t are respectively the long and short term interest rates, are invalid; in particular they involve a linearization around (1 + R̄)^{-1}, where R̄ is taken to be the monthly average yield on long bonds. When yields follow an integrated process R̄ will no longer converge to a constant, so that the linearization of the holding yield of an n-period bond paying a coupon C,

h_t^{(n)} = [C + P_{t+1}^{(n-1)} − P_t^{(n)}]/P_t^{(n)},

where P_t^{(n)} denotes the bond price implied by the yield R_t^{(n)} - see Shiller (1979) - is being performed around a stochastic quantity. If the process is close to being integrated the same problem is likely to arise unless the sample size is very large. Even if R̄ can be taken as a constant the linearization would fail if bond yields deviated substantially from R̄. It has generally been argued that the linearization is a good approximation, but this is not sufficient for estimation purposes. In a regression, any approximation error in the conditional mean goes into the regression error and generally causes a correlation between regressors and errors, thereby inducing inconsistencies into parameter estimators. In turn, these may translate into rejections of hypotheses. Indeed this seems to be true here. Both Kearns (1990a) and Shea (1992) simulate data from models in which the present value model holds but find substantial rejections of it. Whether this is due to poor sampling properties of the test statistics or the linearization error remains to be determined. However, it seems more likely to be the latter.
Following the Abel-Mishkin approach, Campbell and Shiller (1987, p. 1068) show that the linear restrictions on the VAR can be tested by regressing z_t = (Δr_t + S_t − (S_{t-1}/β)) on Δr_{t-i} and S_{t-i}, and testing if the effects of these variables are zero. Consequently, the Wald test effectively involves adding variables to a
61 There is another literature that treats the presence of bubbles as arising from a failure of the transversality condition lim_{j→∞} β^j E_t(P_{t+j}) = 0 to hold, so that the solution to the forward difference equation underlying (68), P_t = βE_t(P_{t+1}) + D_t, is not unique and can be augmented with any process satisfying B_{t+1} = β^{-1}B_t + ζ_{t+1}, where E_t(ζ_{t+1}) = 0. This results in a failure of co-integration between P_t and D_t - see e.g., Hamilton and Whiteman (1985). In fact, if the bubble is an integrated process, the Wald test based on the VAR would not be a consistent test for a rational bubble as the test statistic can be shown to have a limiting distribution under the alternative and so will not reject the null with probability one. The main complication with this view is the difficulty of discriminating between a bubble and a mis-specification of the fundamentals process as the source of a lack of co-integration. Durlauf and Hall (1989) have made some progress in choosing between the two possibilities.
linear regression, albeit with predetermined rather than strongly exogenous regressors, and therefore one would not expect such a test to exhibit the level of biases observed in the simulations (rejection rates of 25.3% when the nominal 10% significance levels are used). 62
5.2. Intertemporal models with varying IMRS
Perhaps the most distinctive feature of financial economics has been the development of theories of asset prices that are capable of being tested with data. Of these the most famous would have to be the Sharpe/Lintner Capital Asset Pricing Model. Stochastic intertemporal asset pricing theories have been developed by a number of authors, and a particularly influential one was Breeden (1979). The central implication of these theories was that the first order conditions for an intertemporally optimizing consumer were
E_t[(βu'(C_{t+1})/u'(C_t))R_t] = 1,   (84)

where u'(C_t) is the marginal utility from consumption, R_t is the gross return (P_{t+1} + d_t)/P_t, where P_t is the real price of the asset, d_t are real dividends and E_t denotes that the expectation is taken with respect to information at time t. This model is also sometimes referred to as the consumption-CAPM as it emphasizes that returns are a function of the covariance of the intertemporal marginal rate of substitution (IMRS) - the "stochastic discount factor" - ψ_t = βu'(C_{t+1})/u'(C_t) with the return R_t. 63 To see this write E(ψ_tR_t) = cov(ψ_t, R_t) + E(R_t)E(ψ_t), so that (84) implies E(R_t) = [1 − cov(R_t, ψ_t)]/E(ψ_t), and it is the covariance with ψ_t rather than the market portfolio which is important for asset prices (although if one follows Campbell (1992) and log-linearizes the consumer's budget constraint to relate E_t(Δc_{t+1}) to E_t(r_{m,t+1}), where c_t = log(C_t), it is clear that there is implicitly a relation with the market return, r_{m,t}).
Under a special form for the utility function it is possible to write out a specific set of restrictions that are implied by (84). For example, if u(C_t) has the constant relative risk aversion (CRRA) form γ^{-1}(C_t^γ − 1), then (84) becomes

E_t[β(C_{t+1}/C_t)^α R_t] = 1,   (85)

where α = γ − 1. Hansen and Singleton (1982) suggested that one estimate the

62 Of course if z_t, S_t and Δr_t were near integrated, such over-rejection would be predicted.
63 If ψ_t is a constant and d_t = 0 one gets the implication that returns are unpredictable.
parameters of (85) by GMM, using as "instruments" z_t' = (1, w_t'), the information being conditioned upon in E_t, leading to the moment restrictions

E[β(C_{t+1}/C_t)^α R_t] − 1 = 0   (86)

E[βw_t(C_{t+1}/C_t)^α R_t] = 0,   (87)

where we can assume without loss of generality that E(w_t) = 0 due to the presence of unity in z_t. Replacing the population mean in (86) by its sample value produces a method of moments estimator of β of β̂ = 1/(T^{-1}Σ(C_{t+1}/C_t)^{α̂} R_t), where α̂ is some estimator of α from (87), making β̂ just the inverse of a sample mean.
This estimation method has been applied to a wide variety of asset prices, both domestic and international. Because, at heart, it involves a GMM estimator, there are potential problems associated with it, which were outlined in an earlier section. As explained there, these stem from the possibility that w_t is a poor instrument for ∂v_t/∂θ, where v_t = β(C_{t+1}/C_t)^α R_t − 1 and θ' = (β, α), and this would be assessed by regressing ∂v_t/∂θ against w_t. A small value of the R² from this regression signals likely poor performance of the GMM estimator of θ. Unfortunately, ∂v_t/∂θ depends upon θ, leading to a circularity in the argument, in that accurate computation of the R² etc. depends upon a good estimate of θ, and this will be poor if the R² is low (the situation one is attempting to detect!). Nevertheless, it would seem useful to compute measures such as the R² using the point estimates of θ, as that should provide some insights into possible problems with the GMM estimator.
Returning to the asset pricing model in (86) and (87), Hansen and Singleton (1982) fitted this with various instruments chosen from lags of consumption growth and gross returns. As an exercise, we set w_t to unity, (C_t/C_{t-1}) and (C_{t-1}/C_{t-2}) and take R_t to be the gross market return on equities. This produces GMM estimates for β and α of β̂ = 0.998 and α̂ = −1.02 (assuming that v_t is an MA(2)). Now ∂v_t/∂β = (C_{t+1}/C_t)^α R_t and ∂v_t/∂α = β log(C_{t+1}/C_t)(C_{t+1}/C_t)^α R_t, and these derivatives (evaluated at the GMM estimates of β and α) can be regressed against the instruments, resulting in R² of 0.007 and 0.13 respectively. These results demonstrate that one would need to be very careful when using the GMM estimates, although the very tight distribution of the squared instrument for β around its mean (the mean is 1.0015 and the standard deviation is 0.007) suggests that this parameter should be quite accurately estimated, a result in accord with existing simulation work - see Mao (1990).
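A sketch of the Hansen-Singleton moment conditions (86)-(87) and the instrument-quality diagnostic just described. Simulated consumption growth and returns stand in for actual data, and a crude grid search replaces a proper two-step GMM routine, so the output is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 1002
g = 1.005 + 0.01 * rng.standard_normal(N)      # placeholder gross consumption growth C_{t+1}/C_t
R = 1.03 + 0.05 * rng.standard_normal(N)       # placeholder gross equity return

# Align: the moment at t uses g[t], R[t]; instruments are lagged growth rates (demeaned)
gt, Rt = g[2:], R[2:]
w = np.column_stack([g[1:-1], g[:-2]])
w = w - w.mean(axis=0)
Z = np.column_stack([np.ones(len(gt)), w])      # z_t = (1, w_t')

def gmm_objective(beta, alpha):
    v = beta * gt ** alpha * Rt - 1.0           # Euler-equation error v_t
    m = Z.T @ v / len(v)                        # sample versions of (86)-(87)
    return m @ m

# Crude grid search in place of a proper GMM optimizer
grid_b = np.linspace(0.95, 1.01, 61)
grid_a = np.linspace(-3.0, 1.0, 81)
beta_hat, alpha_hat = min(((b, a) for b in grid_b for a in grid_a),
                          key=lambda p: gmm_objective(*p))
print("beta_hat, alpha_hat:", beta_hat, alpha_hat)

# Instrument-quality check: regress dv/dtheta on the instruments and inspect the R^2
dv_dbeta = gt ** alpha_hat * Rt
dv_dalpha = beta_hat * np.log(gt) * gt ** alpha_hat * Rt
for name, d in [("beta", dv_dbeta), ("alpha", dv_dalpha)]:
    coef, *_ = np.linalg.lstsq(Z, d, rcond=None)
    r2 = 1 - ((d - Z @ coef) ** 2).sum() / ((d - d.mean()) ** 2).sum()
    print(f"R^2 of dv/d{name} on instruments:", round(r2, 4))
```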
If the dim(w_t) > dim(θ) it is possible to test whether the moment conditions for inter-temporal optimization are satisfied using the J-test set out in Hansen (1982). Mostly, applications of this test have led to rejection of the model. There are a number of possible reasons for this. One is that the over-rejection is caused by the poor sampling distributions of the GMM estimator of θ, as this has been observed in some simulation work, e.g., Mao (1990). Another is that the specifications used for the utility function are incorrect, and this has led to a literature exploring
generalized forms, e.g., the introduction of inertia into the utility function as in
Constantinides (1990).
Instead of performing a specification test directly upon the data a popular way of assessing the quality of a model has been to utilize "Hansen-Jagannathan bounds". Defining ρ as the correlation between R_t and ψ_t, we have

ρ² = [cov(R_t, ψ_t)]²/(σ_R² σ_ψ²) = [E(R_tψ_t) − E(R_t)E(ψ_t)]²/(σ_R² σ_ψ²) = [1 − E(R_t)E(ψ_t)]²/(σ_R² σ_ψ²)

if the asset pricing condition (84) holds. 64 Inverting this relation and recognizing that the maximum value of ρ² is unity gives

σ_ψ² ≥ [1 − E(R_t)E(ψ_t)]²/σ_R²,   (88)

and this is the bound provided by Hansen and Jagannathan (1991) (they work with a vector of gross returns so that a matrix representation of the above is needed). Using the equality one can trace out a relation between E(ψ_t) and the minimum value of σ_ψ². For any given candidate for ψ_t, e.g., ψ_t = β(C_{t+1}/C_t)^α, it is possible to derive estimates of E(ψ_t) and σ_ψ² and to see if the bounds are satisfied. One
nice feature of this approach is that it enables one to perform a sensitivity analysis with respect to parameters such as α, i.e., one asks what value α would have to take in order to make the bound hold. If one does not wish to perform such a sensitivity analysis then it would be necessary to take account of the fact that the point estimates of the left and right hand sides of (88) are both random variables, and one might therefore wish to attach some probability statement to whether the bounds are violated. Burnside (1994) and Cecchetti et al. (1994) do this using the method of deriving asymptotic distributions of non-linear functions of random variables. One might well ask what the advantage of using these bounds as a formal specification test is over the direct test of the over-identifying restrictions. Sometimes the latter may not be available as dim(w_t) = dim(θ) but, when it is, the simulation studies in Burnside (1994) suggest that it is the best test, although it could over-reject in some cases as the true values of β and α changed, and this is most likely due to the changing quality of the instruments.
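A sketch of checking the bound (88) for the CRRA candidate ψ_t = β(C_{t+1}/C_t)^α. The data are simulated placeholders, and with a vector of gross returns the matrix version of the bound would be required instead.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 2000
g = 1.005 + 0.01 * rng.standard_normal(N)      # placeholder consumption growth
R = 1.03 + 0.05 * rng.standard_normal(N)       # placeholder gross return

def hj_check(beta, alpha):
    """Return (var of candidate SDF, lower bound from (88)) for psi = beta * g^alpha."""
    psi = beta * g ** alpha
    lower = (1.0 - R.mean() * psi.mean()) ** 2 / R.var()
    return psi.var(), lower

for alpha in (-1.0, -5.0, -20.0):
    var_psi, bound = hj_check(0.98, alpha)
    print(f"alpha={alpha:6.1f}  var(psi)={var_psi:.6f}  bound={bound:.6f}  "
          f"satisfied={var_psi >= bound}")
```

Sweeping α in this way is exactly the sensitivity analysis described above: one asks how extreme the risk-aversion parameter must be before the candidate discount factor is volatile enough to satisfy the bound.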
6. Conclusion
A striking feature from this survey is how a purely statistical modeling
approach has been dominant in the analysis of financial data, particularly when the
64 The conditional expectation has been replaced by an unconditional expectation using the law of
iterated expectations. Gallant et al. (1990) use the conditional expectations, estimating these with SNP
densities.
focus is a univariate one. One cannot help but feel that these statistical approaches
to the modeling of financial series have possibly reached the limits of their
usefulness. The models are having to be made increasingly complex so as to
capture the nature of the conditional density of returns. Initially, it looked as if the
ARCH class of processes would provide a simple explanation, but, as documented
in these notes, it is becoming clear that this is not so. Ultimately one must pay
more attention to whether simple economic models are capable of generating the
complex behavior that is evident, i.e., is it possible to construct economic models
that might be useful in explaining what is observed? To date progress in this area
has been slow. Few models are capable of generating the type of ARCH one sees
in the data. The same can be said about present value models. To find the type of
dependence observed in returns one would need a very strong dependence in the
dividends process (or endowments if one is looking at general equilibrium
methods of generating asset prices), and the evidence on this is weak. Most of
these studies are best summarized with the adage that "to get GARCH you need
to begin with GARCH". Nevertheless, the search for models that will take as
inputs moderate degrees of volatility and accentuate it has to be accorded a high
priority if we are to make further progress in the modeling of this aspect of
financial series. An interesting paper that does just this is Den Haan and Spear
(1994) who manage to produce GARCH in interest rates by allowing for income
distribution effects changing with the cycle.
The situation with multivariate modeling differs in that theoretical models have
informed decisions about how to link the mean behavior of variables, although the
explanation of co-movements in volatility is still the province of statistical models.
To some extent this emphasis comes from the motivation for much empirical work
in finance; if it works and one can make money from it then the lack of a
theoretical base is not accorded much importance. For many economists however
it is desirable to be able to understand the phenomena one is witnessing, and this is
generally best done through theoretical models. Of course this desire does not
mean that the search for statistical models which fit the data should be abandoned.
One of the nice features of the indirect estimation approach discussed in Section
3.2 is the emphasis placed upon the use of a statistical model that fits data well as
a way of estimating the parameters of an underlying theoretical model. It is this
interplay between statistics and theoretical work in economics which needs to
become dominant in financial econometrics in the next decade.
7. O t h e r s o u r c e s
Abel and Mishkin (1983), Chesney and Scott (1989), Clark (1973), Engle and Lee (1992), Fama (1976), French et al. (1987), Gallant and Tauchen (1990), Hansen and Singleton (1990), Kearns (1992), Nelson and Startz (1990), Newey and West (1987), Pagan (1975), Summers (1986), Tauchen (1986), Tauchen and Pitts (1983), Taylor (1990) and Watson (1964).
Acknowledgements
I would like to thank Torben Andersen, Richard Baillie, Tim Bollerslev, Colin
Cameron, Mardi Dungey, Frank Diebold, Phil Kearns, Walter Kramer, John
Robertson, Allan Timmermann and anonymous referees for their comments on
earlier versions of this survey.
References
Abel, A.B. and F.S. Mishkin, 1983, An integrated view of tests of rationality, market efficiency and
short-run neutrality of monetary policy, Journal of Monetary Economics 11, 3-24.
Albert, J. and S. Chib, 1993, Bayesian analysis via Gibbs sampling of autoregressive time series
subject to mean and variance Markov shifts, Journal of Business and Economic Statistics 11, 1-15.
Andersen, T.G., 1992, Volatility, Working paper No. 144, Kellogg Graduate School of Business,
Northwestern University.
Andersen, T. and B.E. Sørensen, 1994a, GMM estimation of a stochastic volatility model: A Monte Carlo study, Working paper No. 94-6, Brown University.
Andersen, T.G. and B.E. Sørensen, 1994b, A note on Ruiz (1994): Quasi-maximum likelihood estimation of stochastic volatility models, Working paper No. 189, Kellogg Graduate School of Management, Northwestern University.
Baillie, R.T., 1989, Tests of rational expectations and market efficiency, Econometric Reviews 8,
151-186.
Baillie, R.T., 1996, Long memory processes and fractional integration in economics and finance,
Journal of Econometrics (forthcoming).
Baillie, R. and T. Bollerslev, 1989, Common stochastic trends in a system of exchange rates, Journal of
Finance 44, 167-182.
Baillie, R. and T. Bollerslev, 1994, The long memory of the forward premium, Journal of International
Money and Finance 13, 565-572.
Baillie, R.T., T. Bollerslev and H.O. Mikkelson, 1996, Fractionally integrated autoregressive conditional heteroskedasticity, Journal of Econometrics (forthcoming).
Baillie, R.T. and R.J. Myers, 1991, Bivariate GARCH estimation of the optimal commodity futures
hedge, Journal of Applied Econometrics 6, 109-124.
Ball, C.A. and A. Roma, 1994, Target zone modelling and estimation for European monetary system
exchange rates, Journal of Empirical Finance 1, 385-420.
Bansal, R., A.R. Gallant, R. Hussey and G. Tauchen, 1994, Nonparametric estimation of structural
models for high frequency market data, Journal of Econometrics (forthcoming).
Bekaert, G. and R.J. Hodrick, 1992, Characterizing predictable components in excess returns on equity
and foreign exchange markets, Journal of Finance 47, 467-509.
Beveridge, S. and C.R. Nelson, 1981, A new approach to the decomposition of economic time series
into permanent and transitory components with particular attention to measurement of the business
cycle, Journal of Monetary Economics 7, 151-174.
Black, F., 1976, Studies in stock volatility changes, Proceedings of the 1976 Meetings of the Business
and Economics Statistics Section, American Statistical Association, 177-181.
Bollerslev, T., 1986, Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics 31, 307-327.
Bollerslev, T., 1990, Modelling the coherence in short-run nominal exchange rates: A generalized
ARCH model, Review of Economic Statistics 72, 498-505.
Bollerslev, T. and R.F. Engle, 1993, Common persistence in conditional variances: Definition and
representation, Econometrica 61, 167-186.
Bollerslev, T. and R.J. Hodrick, 1992, Financial market efficiency tests, Working paper No. 132,
Kellogg Graduate School of Management, Northwestern University.
Bollerslev, T., R.Y. Chou and K.F. Kroner, 1992, ARCH modeling in finance: A review of the theory
and empirical evidence, Journal of Econometrics 52, 5-59.
Bollerslev, T., R.F. Engle and D.B. Nelson, 1994, ARCH models, in: R.F. Engle and D. McFadden
(eds.), Handbook of Econometrics, Vol. 4 (North-Holland).
Bollerslev, T.R., F. Engle and J. Wooldridge, 1988, A capital asset pricing model with time varying
covariances, Journal of Political Economy 96, 116-131.
Bossaerts, P. and P. Hillion, 1994, Local parametric analysis of hedging in discrete time (mimeo,
Tilburg University).
Bougerol, P. and N. Picard, 1992, Stationarity of GARCH processes and of some non-negative time
series, Journal of Econometrics 52, 115-127.
Braun, P.A., D.B. Nelson and A.M. Sunier, 1992, Good news, bad news, volatility and betas (mimeo,
University of Chicago).
Breeden, D.T., 1979, An intertemporal asset pricing model with stochastic consumption and investment
opportunities, Journal of Financial Economics 7, 265-296.
Brenner, R.J., R.H. Harjes and K. Kroner, 1994, Another look at alternative models of the short-term
interest rate (mimeo, University of Arizona).
Brock, W., W.D. Dechert and J.A. Scheinkman, 1987, A test for independence based on the correlation
dimension (mimeo, University of Wisconsin, Madison).
Brockwell, P.J. and R.A. Davis, 1991, Time series: Theory and methods, 2nd ed. (Springer-Verlag,
New York).
Broze, L., O. Scaillet and J.M. Zakoian, 1993, Testing for continuous time models of the short-term
interest rates, Discussion paper No. 9331, CORE.
Burnside, C., 1994, Hansen-Jagannathan bounds as classical tests of asset pricing models, Journal of
Business and Economic Statistics 12, 57-79.
Cai, J., 1994, A Markov model of unconditional variance in ARCH, Journal of Business and Economic
Statistics 12, 309-316.
Cameron, A.C. and P.K. Trivedi, 1993, Tests of independence in parametric models with applications
and illustrations, Journal of Business and Economic Statistics 11, 29-43.
Campbell, J.Y., 1992, Intertemporal asset pricing without consumption data (mimeo, Princeton
University).
Campbell, J. and R.J. Shiller, 1987, Cointegration and tests of present value models, Journal of
Political Economy 95, 1062-1088.
Cecchetti, S.G., P-S Lam and N.C. Mark, 1994, Testing volatility restrictions on intertemporal
marginal rates of substitution implied by Euler equations and asset returns, Journal of Finance,
XLIX, 123-152.
Chan, K.C., G.A. Karolyi, F.A. Longstaff and A.B. Sanders, 1992, An empirical comparison of
alternative models of the short-term interest rate, Journal of Finance, XLVII, 1209-1227.
Chen, R.R. and L. Scott, 1993, Maximum likelihood estimation for a multifactor equilibrium model of
the term structure of interest rates, Journal of Fixed Income 3, 14-31.
Chesney, M. and L.O. Scott, 1989, Pricing European currency options: A comparison of the modified
Black-Scholes model and a random variance model, Journal of Financial and Quantitative Analysis
24, 267-284.
Chib, S. and E. Greenberg, 1994a, Markov chain Monte Carlo simulation methods in econometrics
(mimeo, Washington University).
Chib, S. and E. Greenberg, 1994b, Understanding the Metropolis-Hastings algorithm (mimeo, Washington University).
Christie, A.A., 1982, The stochastic behaviour of common stock variances: Value, leverage and interest
rate effects, Journal of Financial Economics 10, 407-432.
Christiano, L.J. and M. Eichenbaum, 1990, Unit roots in real GNP: Do we know, and do we care?,
Carnegie-Rochester Conference Series on Public Policy 32, 7-61.
Chu, C-S.J., 1995, Detecting parameter shift in GARCH models, Econometric Reviews 14, 241-266.
Clark, P.K., 1973, A subordinated stochastic process model with finite variance for speculative prices,
Econometrica 41, 135-155.
Cochrane, J.H., 1988, How big is the random walk in GNP, Journal of Political Economy 96, 893-920.
Constantinides, G.M., 1990, Habit formation: A resolution of the equity premium puzzle, Journal of
Political Economy 98, 519-543.
Cox, J., J. Ingersoll and S. Ross, 1985, A theory of the term structure of interest rates, Econometrica
53, 385-407.
Crowder, W.J., 1994, Foreign exchange market efficiency and common stochastic trends, Journal of
International Money and Finance 13, 551-564.
Danielsson, J., 1993, Multivariate stochastic volatility (mimeo, University of Iceland).
Danielsson, J., 1994, Stochastic volatility in asset prices: Estimation with simulated maximum
likelihood, Journal of Econometrics 64, 375-400.
Danielsson, J. and J.F. Richard, 1993, Accelerated Gaussian importance sampler with application to
dynamic latent variable models, Journal of Applied Econometrics 8, S153-S174.
Das, S.R., 1993, Jump-hunting interest rates (mimeo, New York University).
De Jong, P. and N. Shephard, 1993, Efficient sampling from the smoothing density in time series models
(mimeo, Nuffield College).
De Lima, P. and N. Crato, 1994, Long range dependence in the conditional variance of stock returns,
Economics Letters 45, 281-285.
De Lima, P., F.J. Breidt and N. Crato, 1994, Modelling long-memory stochastic volatility (mimeo,
Johns Hopkins University).
Den Haan, W.J. and S. Spear, 1994, Credit conditions, cross-sectional dispersion and ARCH effects in
a dynamic equilibrium model (mimeo, University of California, San Diego).
Den Haan, W.J. and A. Levin, 1994, Inferences from parametric and non-parametric covariance matrix
estimation procedures (mimeo, University of California, San Diego).
Diebold, F.X., 1986, Modeling the persistence of conditional variances: A comment, Econometric
Reviews 5, 51-56.
Diebold, F.X., 1988, Empirical modeling of exchange rate dynamics, (Springer-Verlag, New York).
Diebold, F.X. and J.A. Lopez, 1995, ARCH models, in: K. Hoover (ed.), Macroeconometrics:
Developments, tensions and prospects (Kluwer Publishing Co).
Diebold, F.X. and M. Nerlove, 1989, The dynamics of exchange rate volatility: A multivariate
latent-factor ARCH model, Journal of Applied Econometrics 4, 1-22.
Diebold, F.X. and J. Nason, 1990, Nonparametric exchange rate prediction, Journal of International
Economics 28, 315-322.
Diebold, F.X. and G.D. Rudebusch, 1989, Long memory and persistence in aggregate output, Journal
of Monetary Economics 24, 189-209.
Diebold, F.X. and T. Schuermann, 1996, Exact maximum likelihood estimation of observation-driven
econometric models, in: R.S. Mariano, M. Weeks and T. Schuermann (eds.), Simulation-based
inference in econometrics: Methods and applications, (Cambridge University Press, Cambridge)
(forthcoming).
Ding, Z., C.W.J. Granger and R.F. Engle, 1993, A long memory property of stock market returns and a
new model, Journal of Empirical Finance 1, 83-105.
Drost, F.C. and C.A.J. Klaassen, 1994, Adaptivity in semiparametric GARCH models (mimeo,
University of Tilburg).
Drost, F.C. and T.E. Nijman, 1993, Temporal aggregation of GARCH processes, Econometrica 61,
909-927.
Duffie, D. and R. Kan, 1993, A yield-factor model of interest rates (mimeo, Graduate School of
Business, Stanford University).
Duffie, D. and K.J. Singleton, 1993, Simulated moments estimation of Markov models of asset prices,
Econometrica 61, 929-952.
Dufour, J.M. and M.L. King, 1991, Optimal invariant tests for the autocorrelation coefficient in linear
regressions with stationary or non-stationary AR(1) errors, Journal of Econometrics 47, 115-143.
Durbin, J., 1959, Efficient estimation of parameters in moving-average models, Biometrika 46,
306-316.
Durlauf, S.N. and R.E. Hall, 1989, A signal extraction approach to recovering noise in expectations
based models (mimeo, Stanford University).
Dybvig, P.H., 1989, Bonds and bond option pricing based on the current term structure, Working
paper, Washington University (St. Louis).
Elliott, G., T.J. Rothenberg and J.H. Stock, 1992, Efficient tests for an autoregressive unit root (mimeo,
Harvard University).
Engel, C. and J.D. Hamilton, 1990, Long swings in the exchange rate: Are they in the data and do
markets know it?, American Economic Review 80, 689-713.
Engle, R.F., 1982, Autoregressive conditional heteroskedasticity with estimates of the variance of U.K.
inflation, Econometrica 50, 987-1008.
Engle, R.F. and T. Bollerslev, 1986, Modelling the persistence of conditional variances, Econometric
Reviews 5, 1-50.
Engle, R.F. and G. Gonzalez-Rivera, 1991, Semiparametric ARCH models, Journal of Business and
Economic Statistics 9, 345-360.
Engle, R.F. and K.F. Kroner, 1995, Multivariate simultaneous generalized GARCH, Econometric
Theory 11, 122-150.
Engle, R.F. and G.G.J. Lee, 1992, A permanent and transitory component model of stock return
volatility (mimeo, University of California at San Diego).
Engle, R.F. and G.G.J. Lee, 1994, Estimating diffusion models of stochastic volatility (mimeo,
University of California at San Diego).
Engle, R.F. and V.K. Ng, 1993, Measuring and testing the impact of news on volatility, Journal of
Finance 48, 1749-1778.
Engle, R.F., T. Ito and W.L. Lin, 1990a, Meteor showers or heat waves? Heteroskedastic intra-daily
volatility in the foreign exchange market, Econometrica 58, 525-542.
Engle, R.F., D.M. Lilien and R.P. Robins, 1987, Estimating time varying risk premia in the term
structure, Econometrica 55, 391-407.
Engle, R.F., V.K. Ng and M. Rothschild, 1990b, Asset pricing with a factor-ARCH covariance
structure: Empirical estimates for treasury bills, Journal of Econometrics 45, 213-237.
Evans, M.D.D. and K.L. Lewis, 1994, Do stationary risk premia explain it all? Evidence from the term
structure, Journal of Monetary Economics 33, 285-318.
Fama, E.F., 1976, Foundations of Finance (Basic Books, New York).
Fama, E.F. and K.R. French, 1988a, Permanent and temporary components of stock prices, Journal of
Political Economy 96, 246-273.
Fama, E.F. and K.R. French, 1988b, Dividend yields and expected stock returns, Journal of Financial
Economics 22, 3-25.
Fan, J., N.E. Heckman and M.P. Wand, 1994, Local polynomial kernel regression for generalized
linear models and quasi-likelihood functions, Annals of Statistics 22, 1346-1370.
Feinstone, L.J., 1987, Minute by minute: Efficiency, normality and randomness in intra-daily asset
prices, Journal of Applied Econometrics 2, 193-214.
Fisher, M., D. Nychka and D. Zervos, 1994, Fitting the term structure of interest rates with smoothing
splines (mimeo, Board of Governors of the Federal Reserve System).
Flavin, M., 1983, Excess volatility in the financial markets: A reassessment of the empirical evidence,
Journal of Political Economy 91, 89-111.
French, K.R., G.W. Schwert and R. Stambaugh, 1987, Expected stock returns and volatility, Journal of
Financial Economics 19, 3-30.
Friedman, B.M. and D.I. Laibson, 1989, Economic implications of extraordinary movements in stock
prices, Brookings Papers on Economic Activity, 2/89, 137-189.
Fuhrer, J., G. Moore and S. Schuh, 1995, Maximum likelihood versus generalized method of moments
estimation of the linear-quadratic inventory model, Journal of Monetary Economics 35, 115-157.
Gallant, R., 1981, On the bias in flexible functional forms and an essentially unbiased form: The
Fourier flexible form, Journal of Econometrics 15, 211-244.
Gallant, A.R. and G. Tauchen, 1990, A nonparametric approach to nonlinear time series analysis:
Estimation and simulation, IMA Volumes in Mathematics and its Application (Springer-Verlag).
Gallant, A.R. and G. Tauchen, 1992, Which moments to match? (mimeo, Duke University).
Gallant, A.R., L.P. Hansen and G. Tauchen, 1990, Using conditional moments of asset payoffs to infer
the volatility of intertemporal marginal rates of substitution, Journal of Econometrics 45, 141-179.
Gallant, A.R., D.A. Hsieh and G. Tauchen, 1991, On fitting a recalcitrant series: The pound/dollar
exchange rate, 1974-83, in: W.A. Barnett, J. Powell and G.E. Tauchen (eds.), Nonparametric and
semiparametric methods in econometrics and statistics (Cambridge University Press, Cambridge).
Gallant, A.R., D. Hsieh and G. Tauchen, 1994, Estimation of stochastic volatility models with
diagnostics (mimeo, Duke University).
Gallant, A.R., J. Rossi and G. Tauchen, 1992, Stock prices and volumes, Review of Financial Studies
5, 199-242.
Gennotte, G. and T.A. Marsh, 1992, Variations in economic uncertainty and risk premiums on capital
markets (mimeo, University of California at Berkeley).
Geweke, J., 1994, Bayesian comparison of econometric models (mimeo, University of Minnesota).
Ghose, D. and K.F. Kroner, 1994, The relationship between GARCH and symmetric stable processes:
Finding the source of fat tails in financial data, Journal of Empirical Finance (forthcoming).
Glosten, L.R., R. Jagannathan and D. Runkle, 1993, On the relation between the expected value and
the volatility of the nominal excess return on stocks, Journal of Finance 48, 1779-1802.
Gouriéroux, C. and A. Monfort, 1992, Qualitative threshold ARCH models, Journal of Econometrics
52, 159-199.
Gouriéroux, C., A. Monfort and E. Renault, 1993, Indirect inference, Journal of Applied Econometrics
8, S85-S118.
Gouriéroux, C. and O. Scaillet, 1994, Estimation of the term structure from bond data, Working paper
No. 9415, CEPREMAP.
Gregory, A. and M. Veall, 1985, Formulating tests of non-linear restrictions, Econometrica 53,
1465-1468.
Hamilton, J.D., 1988, Rational-expectations econometric analysis of changes in regime: An investigation of the term structure of interest rates, Journal of Economic Dynamics and Control 12,
385-423.
Hamilton, J.D., 1989, A new approach to the economic analysis of nonstationary time series and the
business cycle, Econometrica 57, 357-384.
Hamilton, J.D., 1990, Analysis of time series subject to changes in regime, Journal of Econometrics 45,
39-70.
Hamilton, J.D. and R. Susmel, 1994, Autoregressive conditional heteroskedasticity and changes in
regime, Journal of Econometrics 64, 307-333.
Hamilton, J.D. and C.H. Whiteman, 1985, The observable implications of self-fulfilling expectations,
Journal of Monetary Economics 16, 353-373.
Hansen, B.E., 1990, Lagrange multiplier tests for parameter instability in non-linear models (mimeo,
University of Rochester).
Hansen, B.E., 1994, Autoregressive conditional density estimation, International Economic Review 35,
705-730.
Hansen, L.P., 1982, Large sample properties of generalized method of moments estimators, Econometrica 50, 1029-1054.
Hansen, L.P. and R.J. Hodrick, 1980, Forward exchange rates as optimal predictors of future spot rates:
An econometric investigation, Journal of Political Economy 88, 829-853.
Hansen, L.P. and R. Jagannathan, 1991, Implications of security market data for models of dynamic
economies, Journal of Political Economy 99, 225-262.
Hansen, L.P. and K.J. Singleton, 1982, Generalized instrumental variables estimation of non-linear
rational expectations models, Econometrica 50, 1269-1285.
Hansen, L.P. and K.J. Singleton, 1990, Efficient estimation of linear asset pricing models with
moving-average errors, NBER Technical paper No. 86.
Härdle, W., 1990, Applied non-parametric regression (Cambridge University Press, Cambridge).
Harrison, P., 1994, Are all financial time series alike? (mimeo, Duke University).
Harvey, A.C. and N. Shephard, 1993, The econometrics of stochastic volatility (mimeo, London School
of Economics).
Harvey, A., E. Ruiz and E. Sentana, 1992, Unobserved component time series models with ARCH
disturbances, Journal of Econometrics 52, 129-157.
Harvey, A., E. Ruiz and N. Shephard, 1994, Multivariate stochastic variance models, Review of
Economic Studies 61, 247-264.
Harvey, C., 1991, The world price of covariance risk, Journal of Finance 46, 111-157.
Hejazi, W., 1994, Are term premia stationary? (mimeo, University of Toronto).
Hentschel, L., 1991, The absolute value GARCH model and the volatility of U.S. stock returns
(mimeo, Princeton University).
Hentschel, L., 1994, All in the family: Nesting asymmetric GARCH models (paper given to the
Econometric Society Winter Meeting, Washington DC).
Higgins, M.L. and A.K. Bera, 1992, A class of nonlinear ARCH models: Properties, testing and
applications, International Economic Review 33, 137-158.
Hill, B.M., 1975, A simple general approach to inference about the tail of a distribution, Annals of
Statistics 3, 1163-1174.
Hodrick, R.J., 1991, Dividend yields and expected stock returns: Alternative procedures for inference
and measurement, Finance Department working paper No. 88, Northwestern University.
Hols, M.C.A.B. and C. de Vries, 1991, The limiting distribution of extremal exchange rate changes,
Journal of Applied Econometrics 6, 287-302.
Hsieh, D.A., 1989a, Modeling heteroskedasticity in daily foreign exchange rates, Journal of Business
and Economic Statistics 7, 307-317.
Hsieh, D.A., 1989b, Testing for nonlinear dependence in daily foreign exchange rates, Journal of
Business 62, 339-368.
Huang, R.D. and C.S.Y. Lin, 1990, An analysis of nonlinearities in term premiums and forward rates
(mimeo, Vanderbilt University).
Hutchinson, J.M., A.W. Lo and T. Poggio, 1994, A nonparametric approach to pricing and hedging
derivative securities via learning networks, Journal of Finance, XLIX, 851-889.
Jacquier, E., N.G. Polson and P.E. Rossi, 1994, Bayesian analysis of stochastic volatility models,
Journal of Business and Economic Statistics 12, 57-80.
Johansen, S., 1988, Statistical analysis of cointegration vectors, Journal of Economic Dynamics and
Control 12, 231-254.
Jorion, P., 1988, On jump processes in the foreign exchange and stock markets, Review of Financial
Studies 1, 427-445.
Kasa, K., 1992, Common stochastic trends in international stock markets, Journal of Monetary
Economics 29, 95-124.
Kearns, P., 1990a, Testing speculative bubbles in interest rates with the present value model (mimeo,
University of Rochester).
Kearns, P., 1990b, Non-linearities in the term structure (mimeo, University of Rochester).
Kearns, P., 1992, Pricing interest rate derivative securities when volatility is stochastic (mimeo,
University of Rochester).
Kearns, P., 1993, Volatility and the pricing of interest rate derivative claims (unpublished Ph.D. thesis,
University of Rochester).
Kearns, P. and A.R. Pagan, 1992, Estimating the density tail index for financial series (mimeo,
Australian National University).
Kearns, P. and A.R. Pagan, 1993, Australian stock market volatility: 1875-1987, Economic Record 69,
163-178.
Kim, S. and N. Shephard, 1994, Stochastic volatility: Likelihood inference and comparison with ARCH
models (mimeo, Nuffield College, Oxford).
King, M.L., 1985, A point optimal test for autoregressive disturbances, Journal of Econometrics 27,
21-37.
Kleidon, A.W., 1986, Variance bounds tests and stock price valuation models, Journal of Political
Economy 94, 953-1001.
Kloeden, P. and E. Platen, 1992, The numerical solution of stochastic differential equations (Springer-Verlag).
Knez, P., R. Litterman and J. Scheinkman, 1989, Explorations into factors explaining money market
returns, Discussion paper No. 6, Goldman Sachs and Co.
Kocherlakota, N., 1990, On tests of representative consumer asset pricing models, Journal of Monetary
Economics 26, 285-304.
Koedijk, K.G., F.G.J.A. Nissen, P.C. Schotman and C.P. Wolff, 1993, The dynamics of short-term
interest rate volatility reconsidered (mimeo, Limburg Institute of Financial Economics).
Koop, G., 1994, An objective Bayesian analysis of common stochastic trends in international stock
prices and exchange rates, Journal of Empirical Finance 1, 343-364.
Kwiatkowski, D., P.C.B. Phillips, P. Schmidt and Y. Shin, 1992, Testing the null hypothesis of
stationarity against the alternative of a unit root, Journal of Econometrics 54, 159-178.
Lamoureux, C.G. and W.D. Lastrapes, 1990, Persistence in variance, structural change and the
GARCH model, Journal of Business and Economic Statistics 8, 225-234.
Lee, J. and M.L. King, 1993, A locally most mean powerful based score test for ARCH and GARCH
regression disturbances, Journal of Business and Economic Statistics 11, 17-27.
Lee, S.W., 1992, Asymptotic properties of the maximum likelihood estimator of the GARCH-M and
IGARCH-M models (mimeo, University of Rochester).
Lee, S.W. and B.E. Hansen, 1991, Asymptotic properties of the maximum likelihood estimator and test
of the stability of parameters of the GARCH and IGARCH models (mimeo, University of
Rochester).
Lee, S.W. and B.E. Hansen, 1994, Asymptotic theory for the GARCH (1,1) quasi maximum likelihood
estimator, Econometric Theory 10, 29-52.
LeRoy, S. and W.R. Parke, 1987, Stock price volatility: A test based on the geometric random walk
(mimeo, University of California at Santa Barbara).
Lin, W.L., 1992, Alternative estimators for factor GARCH models - A Monte Carlo comparison,
Journal of Applied Econometrics 7, 259-279.
Linton, O., 1992, The shape of the risk premium: Evidence from a semi-parametric, mean-exponential
GARCH Model (mimeo, Nuffield College, Oxford).
Linton, O., 1993, Adaptive estimation in ARCH models, Econometric Theory 9, 539-569.
Lo, A.W., 1991, Long-term memory in stock market prices, Econometrica 59, 1279-1313.
Lo, A.W. and A.C. MacKinlay, 1989, The size and power of the variance ratio test in finite samples: A
Monte Carlo investigation, Journal of Econometrics 40, 203-238.
Loretan, M. and P.C.B. Phillips, 1994, Testing the covariance stationarity of heavy-tailed time series:
An overview of the theory with applications to several financial data sets, Journal of Empirical
Finance 1, 211-248.
Lumsdaine, R.L., 1991, Asymptotic properties of the maximum likelihood estimator in GARCH (1,1)
and IGARCH (1,1) models (mimeo, Princeton University).
Lye, J.N. and V.L. Martin, 1991, Modelling and testing stationary exchange rate distributions: The case
of the generalized Student's t distribution (mimeo, University of Melbourne).
McCulloch, J.H., 1971, Measuring the term structure of interest rates, Journal of Business 44, 19-31.
McCulloch, J.H., 1989, U.S. term structure data, 1946-1987, Handbook of monetary economics,
672-715.
McDonald, J. and J. Darroch, 1983, Consistent estimation of equations with composite moving average
disturbance terms, Journal of Econometrics 23, 253-267.
Mahieu, R. and P. Schotman, 1994a, Stochastic volatility and the distribution of exchange rate news,
(mimeo, University of Limburg, Maastricht).
Mahieu, R. and P. Schotman, 1994b, Neglected common factors in exchange rate volatility, Journal of
Empirical Finance 1, 279-311.
Mandelbrot, B., 1963, The variation of certain speculative prices, Journal of Business 36, 394-419.
Mankiw, N.G., D. Romer and M.D. Shapiro, 1985, An unbiased re-examination of stock market
volatility, Journal of Finance 40, 677-687.
Mao, C.S., 1990, Hypothesis testing and finite sample properties of generalized method of moments
estimators: A Monte Carlo study (mimeo, Federal Reserve Bank of Richmond).
Melino, A. and S. Turnbull, 1990, Pricing foreign currency options with stochastic volatility, Journal of
Econometrics 45, 239-265.
Merton, R., 1973, An intertemporal capital asset pricing model, Econometrica 41, 867-888.
Milshtein, G.N., 1974, Approximate integration of stochastic differential equations, Theory of Probability and its Applications 19, 557-562.
Nelson, C.R. and M.J. Kim, 1993, Predictable stock returns: The role of small sample bias, Journal of
Finance 48, 641-661.
Nelson, C.R. and R. Startz, 1990, The distribution of the instrumental variables estimator and its t-ratio
when the instrument is a poor one, Journal of Business 63, S125-S140.
Nelson, D.B., 1988, The time series behaviour of stock market volatility and returns (unpublished
Ph.D. thesis, M.I.T.).
Nelson, D.B., 1989, Modeling stock market volatility changes, Proceedings of the American Statistical
Association, Business and Economic Statistics Section.
Nelson, D.B., 1990a, Stationarity and persistence in the GARCH (1,1) model, Econometric Theory 6,
318-334.
Nelson, D.B., 1990b, A note on the normalized residuals from ARCH and stochastic volatility models
(mimeo, University of Chicago).
Nelson, D.B., 1990c, ARCH models as diffusion approximations, Journal of Econometrics 45, 7-38.
Nelson, D.B. 1991, Conditional heteroskedasticity in asset returns: A new approach, Econometrica 59,
347-370.
Nelson, D.B. and D.P. Foster, 1992, Filtering and forecasting with misspecified ARCH models II:
Making the right forecast with the wrong model (mimeo, Graduate School of Business, University
of Chicago).
Newey, W.K., 1985, Maximum likelihood specification testing and conditional moment tests, Econometrica 53, 1047-1070.
Newey, W.K. and D.G. Steigerwald, 1996, Consistency of the quasi-MLE for models with conditional
heteroskedasticity, Econometric Theory (forthcoming).
Newey, W.K. and K.D. West, 1987, A simple positive semi-definite heteroskedasticity and autocorrelation consistent covariance matrix, Econometrica 55, 703-708.
Nieuwland, F.G.M.C., W.F.C. Verschoor and C.C.P. Wolff, 1994, Stochastic trends and jumps in EMS
exchange rates, Journal of International Money and Finance 13, 699-727.
Nyblom, J., 1989, Testing the constancy of parameters over time, Journal of the American Statistical
Association 84, 223-230.
Pagan, A.R., 1975, A note on the extraction of components from time series, Econometrica 43,
163-168.
Pagan, A.R. and Y.S. Hong, 1991, Non-parametric estimation and the risk premium, in: W. Barnett, J.
Powell and G. Tauchen (eds.), Semiparametric and nonparametric methods in econometrics and
statistics (Cambridge University Press, Cambridge).
Pagan, A.R. and Y. Jung, 1993, Understanding the failure of some instrumental variables estimators
(mimeo, Australian National University).
Pagan, A.R. and H. Sabau, 1992, Consistency tests for heteroskedasticity and risk models, Estudios
Económicos 7, 3-30.
Pagan, A.R. and G.W. Schwert, 1990a, Alternative models for conditional stock volatility, Journal of
Econometrics 45, 267-290.
Pagan, A.R. and G.W. Schwert, 1990b, Testing for covariance stationarity in stock market data,
Economics Letters 33, 165-170.
Pagan, A.R. and A. Ullah, 1988, The econometric analysis of models with risk terms, Journal of
Applied Econometrics 3, 87-105.
Pagan, A.R. and A. Ullah, 1995, Non-parametric econometrics (unpublished manuscript, Australian
National University).
Pagan, A.R. and F.X. Vella, 1989, Diagnostic tests for models based on individual data, Journal of
Applied Econometrics 45, 429-452.
Pagan, A.R., V. Martin and A.D. Hall, 1995, Modelling the term structure, Working paper in
Economics and Econometrics No. 284, Australian National University.
Pearson, N.D. and T.-S. Sun, 1994, Exploiting the conditional density in estimating the term structure:
An application to the Cox, Ingersoll and Ross model, Journal of Finance, XLIX, 1279-1304.
Pesaran, M.H. and B. Pesaran, 1993, A simulation approach to the problem of computing Cox's
statistic for testing non-nested models, Journal of Econometrics 57, 377-392.
Phillips, P.C.B. and J.Y. Park, 1988, On the formulation of Wald tests of nonlinear restrictions,
Econometrica 56, 1065-1083.
Pindyck, R.S., 1984, Risk, inflation and the stock market, American Economic Review 74, 335-351.
Rich, R., J. Raymond and J.S. Butler, 1991, Generalized instrumental variables estimation of
autoregressive conditional heteroskedastic models, Economics Letters 35, 179-185.
Richardson, M. and J.H. Stock, 1989, Drawing inferences from statistics based on multiyear asset
returns, Journal of Financial Economics 25, 323-348.
Rosenberg, B., 1985, Prediction of common stock betas, Journal of Portfolio Management 11, 5-14.
Ross, S.A., 1976, The arbitrage theory of capital asset pricing, Journal of Economic Theory 13,
341-360.
Ruiz, E., 1994, Quasi-maximum likelihood estimation of stochastic volatility models, Journal of
Econometrics 63, 289-306.
Scheinkman, J. and B. LeBaron, 1989, Nonlinear dynamics and stock returns, Journal of Business 62,
311-337.
Schotman, P. and H.K. van Dijk, 1991, A Bayesian analysis of the unit root in real exchange rates,
Journal of Econometrics 49, 195-238.
Schwarz, G., 1978, Estimating the dimension of a model, Annals of Statistics 6, 461-464.
Schwert, G.W., 1990, Indexes of stock prices from 1802 to 1897, Journal of Business 63, 399-426.
Schwert, G.W. and P.J. Seguin, 1990, Heteroskedasticity in stock returns, Journal of Finance 45,
1129-1155.
Sentana, E., 1991, Quadratic ARCH models: A potential re-interpretation of ARCH models (mimeo,
London School of Economics).
Shea, G.S., 1989, Testing stock market efficiency with volatility statistics: Some exact finite sample
results (mimeo, Pennsylvania State University).
Shea, G.S., 1992, Benchmarking the expectations hypothesis of the interest rate term structure: An
analysis of cointegration vectors, Journal of Business and Economic Statistics 10, 347-366.
Shephard, N., 1994, Partial non-Gaussian state space, Biometrika 81, 115-131.
Shiller, R.J., 1979, The volatility of long-term interest rates and expectations models of the term
structure, Journal of Political Economy 87, 1190-1219.
Shiller, R.J., 1981, Do stock prices move too much to be justified by subsequent changes in dividends?,
American Economic Review 71, 421-436.
Silverman, B.W., 1986, Density estimation for statistics and data analysis (Chapman and Hall, New York).
Sola, M. and A. Timmermann, 1994, Fitting the moments: A comparison of ARCH and regime
switching models for daily stock returns (mimeo, Birkbeck College).
Sowell, F., 1992, Maximum likelihood estimation of stationary univariate fractionally integrated time
series models, Journal of Econometrics 53, 165-188.
Spanos, A., 1986, Statistical foundations of econometric modelling (Cambridge University Press,
Cambridge).
Staiger, D. and J.H. Stock, 1993, Instrumental variables regression with weak instruments, NBER
Technical working paper No. 151.
Steigerwald, D.G., 1992, Semi-parametric estimation in financial models with time varying variances
(mimeo, University of California, Santa-Barbara).
Stock, J.H. and M.W. Watson, 1988, Testing for common trends, Journal of the American Statistical
Association 83, 1097-1107.
Summers, L.H., 1986, Does the stock market rationally reflect fundamental values?, Journal of Finance
41, 591-601.
Tanaka, K., 1990, Testing for a moving average unit root, Econometric Theory 6, 445-458.
Tauchen, G., 1985, Diagnostic testing and evaluation of maximum likelihood models, Journal of
Econometrics 30, 415-443.
Tauchen, G., 1986, Statistical properties of generalized method-of-moments estimators of structural
parameters obtained from financial market data, Journal of Business and Economic Statistics 4,
397-425.
Tauchen, G. and M. Pitts, 1983, The price variability-volume relationship on speculative markets,
Econometrica 51, 485-505.
Taylor, S.J., 1986, Modelling financial time series (John Wiley, Chichester).
Taylor, S.J., 1990, Modelling stochastic volatility (mimeo, University of Lancaster).
Timmermann, A., 1995, Cointegration tests of present value models with a time-varying discount
factor, Journal of Applied Econometrics 10.
Tsay, R., 1986, Nonlinearity tests for time series, Biometrika 73, 461-466.
Vahid, F. and R. Engle, 1993, Common trends and common cycles, Journal of Applied Econometrics
8, 341-360.
Vasicek, O., 1977, An equilibrium characterization of the term structure, Journal of Financial
Economics 5, 177-188.
Watson, G.S., 1964, Smooth regression analysis, Sankhya, Series A, 26, 359-372.
West, K.D., 1988, Dividend innovations and stock price volatility, Econometrica 56, 37-61.
West, K.D., H.J. Edison and D. Cho, 1993, A utility-based comparison of some models of exchange
rate volatility, Journal of International Economics 35, 23-45.
White, H., 1980, A heteroskedasticity-consistent covariance matrix estimator and a direct test for
heteroskedasticity, Econometrica 48, 817-838.
Wu, P., 1992, Testing fractionally integrated time series, Working paper, Victoria University of
Wellington.
Zakoian, J.M. 1994, Threshold heteroskedastic models, Journal of Economic Dynamics and Control 18,
931-955.