Pairs trading
Gesina Gorter
December 12, 2006
Contents

1 Introduction                                            3
  1.1 IMC                                                 3
  1.2 Pairs trading                                       4
  1.3 Graduation project                                  5
  1.4 Outline                                             6

2 Trading strategy                                        7
  2.1 Introductory example                                8
  2.2 Data                                               14
  2.3 Properties of pairs trading                        15
  2.4 Trading strategy                                   17
  2.5 Conclusion                                         26

3 Time series basics                                     27

4 Cointegration                                          35
  4.1 Introducing cointegration                          35
  4.2 Stock price model                                  39
  4.3 Engle-Granger method                               48
  4.4 Johansen method                                    55
  4.5 Alternative method                                 62

5 Dickey-Fuller tests                                    65
  5.1 Notions/facts from probability theory              66
  5.2 Dickey-Fuller case 1 test                          71
  5.3 Dickey-Fuller case 2 test                          76
  5.4 Dickey-Fuller case 3 test                          82
  5.5 Power of the Dickey-Fuller tests                   89
  5.6 Augmented Dickey-Fuller test                       94
  5.7 Power of the Augmented Dickey-Fuller case 2 test  105

6 Engle-Granger method                                  109
  6.1 Engle-Granger simulation with random walks        110
  6.2 Engle-Granger simulation with stock price model   117
  6.3 Engle-Granger with bootstrapping from real data   120
  6.4 Engle-Granger simulation with alternative method  126

7 Results                                               133
  7.1 Results trading strategy                          133
  7.2 Results testing price process I(1)                137
  7.3 Results Engle-Granger cointegration test          138
  7.4 Results Johansen cointegration test               140

8 Conclusion                                            143

9 Alternatives & recommendations                        145
  9.1 Alternative trading strategies                    145
  9.2 Recommendations for further research              150

Bibliography                                            152
Chapter 1
Introduction
1.1 IMC
IMC, International Marketmakers Combination, was founded in 1989. IMC is a diversified financial company. The company started as a market maker on the Amsterdam Options Exchange. Apart from its core business activity, trading, it is also active in asset management, brokerage, product development and derivatives consultancy. IMC Trading is IMC's largest operational unit and has been the core of the company for the past 17 years. IMC Trading trades solely for its own account and benefit. IMC is active in the major markets in Europe and the US and has offices in Amsterdam, Zug (Switzerland), Sydney and Chicago. By trading a large number of different securities in different markets, the company is able to keep its trading risk to a minimum.

The dealing room in Amsterdam is divided into two main sections: Marketmaking and Cash. Marketmaking's main focus is on option trading: a market maker for a certain option quotes both bid and offer prices on the option and makes profits from the bid-ask spread. The Cash or Equity desk is dedicated to the worldwide arbitrage of diverse financial instruments. Arbitrage is a trading strategy that takes advantage of two or more securities being mispriced relative to each other. Pairs trading is one of the many trading strategies used by the Cash desk.
1.2 Pairs trading
History Pairs trading, or statistical arbitrage, was first developed and put into practice by Nunzio Tartaglia while working for Morgan Stanley in the 1980s. Tartaglia formed a group of mathematicians, physicists and computer scientists to develop automated trading systems that detect and exploit mispricings in financial markets. One of the computer scientists on Tartaglia's team was the famous David Shaw. Pairs trading was one of the most profitable strategies developed by this team. As members of the team gradually spread to other firms, so did the knowledge of pairs trading. Vidyamurthy [15] presents a very insightful introduction to pairs trading.
Motivation The general 'rule of thumb' in trading is to sell overvalued securities and buy undervalued ones. It is only possible to determine that a security is over- or undervalued if its true value is known, and the true value can be very difficult to determine. Pairs trading is about relative pricing, so the true value of a security is not important. Relative pricing is based on the idea that securities with similar characteristics should be priced more or less the same. When the prices of two similar securities differ, either one security is overpriced with respect to its 'true value', the other is underpriced, or both.
Pure arbitrage is making risk-less use of a mispricing, which is why one could call it a deterministic money-making machine. The purest form of arbitrage is profitably buying and selling the exact same security on different exchanges. For example, one could buy a share in Royal Dutch on the Amsterdam exchange at €25.75 and sell the same share on the Frankfurt exchange at €26.00. Because shares in Royal Dutch are interchangeable between exchanges, such a trade would result in a flat position and thus risk-less money.
Although pairs trading is called an arbitrage strategy, it is not risk-free at all. The key to success in pairs trading lies in the identification of pairs and an efficient trading algorithm. Pairs trading is an arbitrage strategy that takes advantage of a mispricing between two securities. It involves putting on positions when there is a certain magnitude of mispricing: buying the lower-priced security and selling the higher-priced one. Hence, the portfolio consists of a long position in one security and a short position in the other. The
expectation is that the mispricing will correct itself, and when this happens
the positions are reversed. The higher the magnitude of mispricing when
positions are put on, the higher the profit potential.
Example Determining whether two securities form a pair is not trivial, but some securities are obvious pairs. One fundamentally obvious pair is Royal Dutch and Totalfina, both European oil-producing companies. One can easily argue that the value of both companies is largely determined by the oil price, and hence that the movements of the two securities should be closely related. In this example, let's assume that historically one share of Totalfina is worth 8 times one share of Royal Dutch. Assume at time t0 it is possible to trade Royal Dutch at €26.00 and Totalfina at €215.00. Because 8 times €26 is €208, we feel that Totalfina is overpriced, or Royal Dutch is underpriced, or both. So we sell one share of Totalfina and buy 8 shares of Royal Dutch, expecting Totalfina to become cheaper, or Royal Dutch to become more expensive, or both. If at t1 the prices are €26.00 and €208.00, we will have made a profit of €215 minus €208, which is €7. We would have made the same profit if at t1 the prices were €26.875 (215 divided by 8) and €215.00 respectively. In conclusion, this strategy does not say anything about the true value of the stocks, only about relative prices. In this example a predetermined ratio of 8 was used, based on historical data. How to use historical data to determine this ratio will be discussed in paragraph 2.4.
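The arithmetic of this example can be sketched in a few lines of Python. This is purely illustrative: the function name is hypothetical and the prices are the made-up numbers from the example above, with a fixed historical ratio of 8.

```python
def pair_pnl(ratio, rd_open, tf_open, rd_close, tf_close):
    """P&L of selling 1 Totalfina share and buying `ratio` Royal Dutch
    shares at t0, then unwinding both legs at t1."""
    tf_leg = tf_open - tf_close            # short 1 Totalfina share
    rd_leg = ratio * (rd_close - rd_open)  # long `ratio` Royal Dutch shares
    return tf_leg + rd_leg

# Scenario 1: Totalfina falls back to 8 * 26.00 = 208.
print(pair_pnl(8, 26.00, 215.00, 26.00, 208.00))   # 7.0
# Scenario 2: Royal Dutch rises to 215 / 8 = 26.875 instead.
print(pair_pnl(8, 26.00, 215.00, 26.875, 215.00))  # 7.0
```

Either way the mispricing closes, the profit is the same €7: only the relative price matters.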
1.3 Graduation project
The goal of this project is to apply statistical techniques to find relationships between stocks in all markets that IMC is active in, based solely on the history of the stocks' prices. The closing prices of these stocks, dating back two years, are the only data that will be used in this analysis. The goal is to find pairs of stocks whose movements closely follow each other.

IMC is already trading many pairs, which were found by fundamental analysis and by applying its trading strategy to historical data (backtesting). No statistical analysis was performed. From trading experience, IMC is able to distinguish good pairs from bad ones based on profits. IMC has provided a selection of ten pairs of differing quality.
The main focus of this project is modeling the relationships between stocks, such that we can identify a good pair based on statistical analysis instead of fundamental analysis or backtesting. The resulting relationships will be ranked by strength of co-movement and profitability.

Although one could study pairs trading between all sorts of financial instruments, such as options, bonds and warrants, this project focuses on trading pairs that consist of two stocks.
1.4 Outline
In the next chapter a trading strategy for pairs is derived; it illustrates how money is made and what properties a good pair has. In chapter 3 some basics of time series analysis are briefly stated, which we will need for the concept of cointegration. Chapter 4 discusses cointegration and two methods for testing it, the Engle-Granger and the Johansen method. In this chapter a start is also made with an alternative method. The Engle-Granger method makes use of a unit root test, the Dickey-Fuller test; the properties of this unit root test are derived in chapter 5. The properties of the Engle-Granger method are found by simulation in chapter 6. IMC has provided 10 pairs for investigation. The results of the trading strategy and the cointegration tests are stated in chapter 7, where the pairs are also ranked by profitability and cointegration. After the conclusions in chapter 8, some suggestions for alternative trading strategies are made in chapter 9. In this chapter we also give some recommendations for further research.
Chapter 2
Trading strategy
IMC first started to identify pairs of stocks based on fundamental analysis, which means they investigated similarities between companies in products, policies, dependence on market circumstances, etcetera.

When a pair is identified, the question remains how to make money. In this chapter, a trading strategy is explained that is quite similar to the strategy used by IMC. It is not exactly the same strategy, because IMC does not want to give away a ready-to-go-and-make-money trading strategy, but also because essential parts of their strategy, like the selection of parameters, are based on 'gut feeling' and lie in the hands of the trader. That makes it very difficult, at the least, to write down a general model of their trading strategy.
2.1 Introductory example
Assume we have two stocks X and Y that form a pair based on fundamental analysis. Also available are the closing prices of these stocks dating back 2 years, which form time series \{x_t\}_{t=0}^{T} and \{y_t\}_{t=0}^{T}, as shown in figure 2.1. In one year there are approximately 260 trading days, so two years of closing prices form a dataset of approximately 520 observations for each stock.
[Figure 2.1: Time series x_t and y_t. Closing prices in € (0 to 60) against t = 0, ..., 500.]
The first half of observations are used to determine certain parameters of the
trading strategy. The second half are used to backtest the trading strategy
based on these parameters, i.e., to test whether the strategy makes money
on this pair.
The average ratio of Y and X over the first 260 observations,

\bar{r} = \frac{1}{260} \sum_{t=0}^{259} \frac{y_t}{x_t},

is 1.36 in this example, which means that during this period one share of Y costs approximately 1.36 times one share of X. Although the average ratio is probably not the best estimator, we will use it in the trading strategy to calculate a quantity called the spread for each value of t:

s_t = y_t - \bar{r} x_t.
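The ratio and spread computations can be sketched in Python. This is a minimal illustration with made-up sample prices, not the thesis dataset; the function names are hypothetical.

```python
def average_ratio(prices_y, prices_x):
    """Average ratio r-bar of Y over X across the sample."""
    return sum(y / x for y, x in zip(prices_y, prices_x)) / len(prices_x)

def spread(prices_y, prices_x, r_bar):
    """Spread s_t = y_t - r_bar * x_t for every t."""
    return [y - r_bar * x for y, x in zip(prices_y, prices_x)]

# Tiny made-up sample standing in for the first half of the observations.
prices_x = [22.0, 23.0, 22.5, 23.5]
prices_y = [30.0, 31.5, 30.2, 32.4]
r_bar = average_ratio(prices_y, prices_x)
s = spread(prices_y, prices_x, r_bar)
```

In the thesis example, applying this to the first 260 observations yields r_bar ≈ 1.36; the spread series is then computed over the whole dataset.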
If the price processes of X and Y were perfectly correlated, that is, if X and Y changed in the same direction and in the same proportion (for every t > 0, y_t = \alpha x_t for some \alpha > 0, so the correlation coefficient is +1), the spread would be zero for all t and we could not make any money, because neither X nor Y would ever be over- or underpriced. However, perfect correlation is hard to find in real life. Indeed, in this example the stocks are not perfectly correlated, as we can see in figure 2.2.
[Figure 2.2: Spread s_t in € (roughly between −6 and 3) against t = 0, ..., 500.]
As mentioned before, we like to buy cheap and sell expensive. If the spread is below zero, stock Y is cheap relative to stock X. The other way around, if the spread is above zero, stock Y is expensive relative to stock X (another way to put it is that X is cheap in comparison with Y). So basically the trading strategy is to buy stock Y and sell stock X at the ratio 1:1.36 if the spread is a certain amount below zero; this amount we call the threshold Γ. When the spread comes back to zero, the position is flattened, which means we sell Y and buy X in the same ratio so there is no position left. In that case, we have made a profit of Γ. An important requirement is that we can sell shares we do not own, also called short selling. In summary, we put on a portfolio, containing one long position and one short position, if the spread is Γ or more away from zero. We flatten the portfolio when the spread comes back to zero. Just like the average ratio, Γ is determined from the first half of observations. In this example we determined a Γ of 0.40. The way Γ has been calculated will be discussed in paragraph 2.4.
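The rule just described (enter when the spread is at least Γ from zero, flatten when it crosses back through zero) can be sketched as a small simulation. This is a hypothetical illustration of strategy I, not IMC's actual algorithm; the toy spread values in the test are made up, and at most one trading instance per day is assumed.

```python
def strategy_one(spread, gamma):
    """Simulate strategy I on a spread series; return (profit, trades)."""
    position = 0     # +1: long Y / short X, -1: short Y / long X, 0: flat
    entry = 0.0      # spread level at which the position was put on
    profit = 0.0
    trades = 0
    for s in spread:
        if position == 0:
            if s >= gamma:
                position, entry, trades = -1, s, trades + 1  # sell Y, buy X
            elif s <= -gamma:
                position, entry, trades = +1, s, trades + 1  # buy Y, sell X
        elif (position == -1 and s <= 0) or (position == +1 and s >= 0):
            # Spread crossed back through zero: flatten the portfolio.
            profit += position * (s - entry)  # each round trip earns >= gamma
            position = 0
            trades += 1
    return profit, trades
```

Because we trade at closing prices, the spread at entry is usually slightly beyond ±Γ and the spread at exit slightly past zero, so each round trip earns at least Γ, exactly as in table 2.1.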
After determination of the parameters, the trading strategy is applied to the second half of observations in the dataset. This results in 13 times making a profit of Γ. In other words, the spread moves away from 0 by at least Γ and back to 0 thirteen times. Note that this involves 26 trading instances, since putting on a position and flattening it are two separate trading instances. Figure 2.3 and table 2.1 show all 26 trading instances. The profit made here is at least 13Γ: we use closing prices instead of intra-day data, so we do not trade at exactly −Γ, 0 and Γ, as we can see in table 2.1.
[Figure 2.3: Spread s_t and Γ, for t = 300, ..., 500.]
Table 2.1: Trading instances strategy I.

trade    t     s_t    position (Y,X)   price Y   price X   profit
  1     268   0.69    (-1,+1.36)        31.49     22.63
  2     282  -0.07    flat              31.37     23.11      0.76
  3     284  -0.47    (+1,-1.36)        30.54     22.79
  4     289   0.01    flat              31.43     23.10      0.48
  5     293   0.55    (-1,+1.36)        32.05     23.15
  6     300  -0.16    flat              32.81     24.23      0.71
  7     302  -1.05    (+1,-1.36)        33.57     25.44
  8     310   0.17    flat              33.56     24.54      1.22
  9     311   0.45    (-1,+1.36)        33.58     24.34
 10     420  -0.30    flat              40.33     29.85      0.75
 11     423  -1.15    (+1,-1.36)        40.79     30.82
 12     428   0.08    flat              43.15     31.65      1.23
 13     429   0.65    (-1,+1.36)        43.43     31.44
 14     432  -0.19    flat              42.60     31.45      0.84
 15     434  -0.47    (+1,-1.36)        42.16     31.33
 16     435   0.04    flat              42.61     31.28      0.51
 17     437   0.82    (-1,+1.36)        42.79     30.84
 18     440  -0.25    flat              44.01     32.52      1.07
 19     444  -1.33    (+1,-1.36)        46.53     35.17
 20     445   0.12    flat              46.17     33.84      1.45
 21     446   1.24    (-1,+1.36)        46.32     33.13
 22     449  -0.17    flat              45.89     33.85      1.41
 23     450  -0.63    (+1,-1.36)        45.46     33.87
 24     467   0.05    flat              46.19     33.91      0.68
 25     468   0.48    (-1,+1.36)        47.16     34.31
 26     519  -0.21    flat              44.95     33.19      0.69
                                            total profit   11.80
Rather than closing the position at 0, one could also choose to reverse the position when the spread reaches Γ in the other direction. Assume we have sold 1 Y and bought 1.36 X because the spread was larger than Γ. We could now wait until the spread reaches −Γ and then buy 2 times Y and sell 2 times 1.36 X. As a result, we are left with a portfolio of long 1 Y and short 1.36 X. This results in one initial trade and 12 trades reversing the position. Note that the profit of reversing the position is 2Γ, so the total profit is at least 12 times 2Γ. These trades are shown in table 2.2.
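Strategy II can be sketched in the same style as strategy I: after the initial entry, the position is reversed whenever the spread reaches Γ on the opposite side. This is again a hypothetical illustration under the assumptions stated above, not IMC's actual algorithm, and the sample spread values are made up.

```python
def strategy_two(spread, gamma):
    """Simulate strategy II on a spread series; return (profit, trades)."""
    position = 0     # +1: long Y / short X, -1: short Y / long X, 0: flat
    entry = 0.0      # spread level at which the current position was put on
    profit = 0.0
    trades = 0
    for s in spread:
        if position == 0:
            # Initial entry at the first threshold crossing.
            if s >= gamma:
                position, entry, trades = -1, s, trades + 1
            elif s <= -gamma:
                position, entry, trades = +1, s, trades + 1
        elif position == -1 and s <= -gamma:
            # Reverse: buy 2 Y, sell 2 * 1.36 X; earns at least 2 * gamma.
            profit += entry - s
            position, entry, trades = +1, s, trades + 1
        elif position == +1 and s >= gamma:
            # Reverse the other way: sell 2 Y, buy 2 * 1.36 X.
            profit += s - entry
            position, entry, trades = -1, s, trades + 1
    return profit, trades
```

Each reversal earns at least 2Γ, but the strategy is never flat after the first trade, which is exactly the trade-off discussed below.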
Table 2.2: Trading instances strategy II.

trade    t     s_t    position (Y,X)   price Y   price X   profit
  1     268   0.69    (-1,+1.36)        31.49     22.63
  2     284  -0.47    (+1,-1.36)        30.54     22.79      1.16
  3     293   0.55    (-1,+1.36)        32.05     23.15      1.02
  4     302  -1.05    (+1,-1.36)        33.57     25.44      1.60
  5     311   0.45    (-1,+1.36)        33.58     24.34      1.50
  6     423  -1.15    (+1,-1.36)        40.79     30.82      1.60
  7     429   0.65    (-1,+1.36)        43.43     31.44      1.80
  8     434  -0.47    (+1,-1.36)        42.16     31.33      1.12
  9     437   0.82    (-1,+1.36)        42.79     30.84      1.29
 10     444  -1.33    (+1,-1.36)        46.53     35.17      2.15
 11     446   1.24    (-1,+1.36)        46.32     33.13      2.57
 12     450  -0.63    (+1,-1.36)        45.46     33.87      1.87
 13     468   0.48    (-1,+1.36)        47.16     34.31      1.11
                                            total profit   18.79
This change of strategy reduces the number of trading instances on average by a factor of 2, and in doing so reduces trading costs. More importantly, if the spread moves back and forth around 0, strategy II will be more profitable. For example, with the first trade the spread has moved above Γ, so we sell 1 Y and buy 1.36 X. Trading according to strategy I, we flatten our position at 0, hold no position while the spread moves from 0 to −Γ, and do not profit from this movement. Trading according to strategy II, we are still short Y and long X while the spread moves to −Γ (i.e., X becomes more expensive relative to Y). This is shown in figures 2.4 and 2.5.
[Figure 2.4: Trading strategy I.]
[Figure 2.5: Trading strategy II.]
Unfortunately, strategy II involves a certain opportunity cost as well. If a pair has a tendency to move between 0 and +Γ or between 0 and −Γ, we might not reverse our position at all, whereas strategy I would take on and flatten a position time and again and make money. This is shown in figures 2.6 and 2.7.
[Figure 2.6: Trading strategy I.]
[Figure 2.7: Trading strategy II.]
In this report we will use a modified version of strategy II.
2.2 Data
The price data which IMC uses is provided by Bloomberg. Bloomberg is a
leading global provider of data, news and analytic tools. Bloomberg provides
real-time and archived financial and market data, pricing, trading, news and
communications tools in a single, integrated package to corporations, news
organizations, financial and legal professionals and individuals around the
world.
Historical closing prices of stocks are easily extracted from Bloomberg into Excel. One issue has to be considered, namely dividends. Companies normally pay out dividend to their shareholders once or twice a year; some companies pay out dividend four times a year. The amount of dividend is subtracted from the stock price on the day the dividend is paid out, called going ex-dividend. This usually results in a twist in the price process, as in figure 2.8.
[Figure 2.8: Dividend. A closing-price series (roughly €40 to €60) with a downward jump at the ex-dividend date.]
It is unlikely that different companies go ex-dividend on the same day. So the closing prices of stocks have to be corrected for dividend to make a good comparison with other stocks possible. In this report we will assume that the dividend is re-invested in the stock. So the correction is not simply adding the dividend to the closing price; it is an amount that grows proportionally with the stock price.
Example Consider the following ex-dividend dates and amounts of a certain stock.

date         amount
04/28/2003   1.20
04/30/2004   1.40
04/29/2005   1.70

Suppose we want to use data of this stock starting from 03/01/2004. So we extract from Bloomberg the closing prices from this date forward; actually we start at 03/02/2004, because the first of March was a Sunday. From 03/02/2004 until 04/29/2004 we use exactly these prices; the first ex-dividend date is not used. On 04/30/2004 the stock goes ex-dividend for the amount of 1.40. We calculate what percentage this is of the stock price, and from this date forward we keep multiplying the closing prices from Bloomberg by this factor until the next ex-dividend date. Then we calculate the percentage of the new dividend amount and add it to the previous factor, as shown in table 2.3.
2.3 Properties of pairs trading
Pairs trading is almost cash neutral; we do not have to invest a lot of money. We use the proceeds of short selling one stock to purchase the other stock. This usually does not sum up exactly to zero; to be precise, it sums up to ±Γ, a small positive or negative amount compared to the stock prices. Another aspect that makes pairs trading not entirely cash neutral is short selling. Short selling is selling something we do not own. The exchange on which we trade wants to be sure that we will not go bankrupt. We need to put money aside, called margin, to assure the exchange that there are no risks involved with short selling. Normally, this margin is a percentage of the value of the short sale, typically between 5% and 50%, depending on the credibility of the short seller. IMC's costs for short selling are relatively low, so pairs trading is almost cash neutral.
Table 2.3: Calculation of closing prices corrected for dividend.

date         Bloomberg   dividend   factor                  our prices
03/02/2004   44.00                  1                       44.00
03/03/2004   43.37                  1                       43.37
...
04/29/2004   43.85                  1                       43.85
04/30/2004   43.04       1.40       1+1.40/43.04=1.03       1.03*43.04=44.33
05/01/2004   42.90                  1.03                    1.03*42.90=44.19
...
04/28/2005   51.44                  1.03                    1.03*51.44=52.98
04/29/2005   50.11       1.70       1.03+1.70/50.11=1.07    1.07*50.11=53.62
04/30/2005   50.64                  1.07                    1.07*50.64=54.18
...
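The dividend correction of table 2.3 can be sketched as follows. The function name is hypothetical; the reinvestment rule follows the description above (the running factor grows by dividend/price on each ex-dividend day, and every closing price from that day on is multiplied by the current factor). Note that with a full-precision factor the first adjusted price is 44.44, whereas the table, rounding the factor to 1.03 for display, shows 44.33.

```python
def adjust_for_dividend(closes, dividends):
    """Correct raw closing prices for reinvested dividends.

    closes:    list of raw closing prices, in date order
    dividends: dict {index: amount} mapping ex-dividend days to the
               dividend paid that day
    """
    factor = 1.0
    adjusted = []
    for t, price in enumerate(closes):
        if t in dividends:
            factor += dividends[t] / price  # dividend as fraction of price
        adjusted.append(factor * price)
    return adjusted

# The first rows of table 2.3: ex-dividend of 1.40 on the fourth day.
closes = [44.00, 43.37, 43.85, 43.04, 42.90]
adj = adjust_for_dividend(closes, {3: 1.40})
```

Before the ex-dividend date the adjusted prices equal the raw prices; from that date on they run above the raw series, as in the 'our prices' column.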
Pairs trading is also market neutral: if the overall market goes up 10%, it has no consequences for the strategy and profits of pairs trading. The 10% loss in the short stock is compensated by a 10% gain in the long stock, and the other way around if the overall market goes down. We have no preference for up or down movements; we only look at relative pricing.
How to make money with pairs trading was explained in the example in paragraph 2.1. The amount of money made by trading a pair is a measure of the quality of the pair. Obviously, more money is better! We make profits if the spread oscillates around zero, often hitting Γ and −Γ. An important issue for the traders is that the spread should not stay away from zero for a long time. Traders are human, and they tend to get a bit nervous if they hold a big position for a long time. There is a chance that the spread will never return to zero, and in that case it costs money to flatten the position.
Example Consider figure 2.9 of the spread of pair X, Y. We put on a position the first time the spread hits −Γ, because there Y is, in our opinion, cheap relative to X. We reverse our position at +Γ and again at −Γ, making a profit of at least 4Γ. Then we would like the spread to go to +Γ, but instead it moves further and further away from zero, and we do not know whether it will ever come back. At this point, our portfolio is worth less than when we put it on: the value of the long position in Y decreases because Y is getting cheaper (relative to X), and the value of the short position in X decreases because X is now more expensive (relative to Y). So, if we want to flatten our portfolio, we have to sell Y for less than we bought it and/or buy X for more money than we sold it for.
[Figure 2.9: Spread s_t walks away. After oscillating between −Γ and +Γ, the spread drifts further and further from zero.]
In conclusion, a good pair has a spread that is rapidly mean-reverting, and the price processes of the stocks in the pair are tied together: they cannot get far away from each other.
2.4 Trading strategy
In this section we describe how the parameters in the introductory example (section 2.1) are determined. Then a few adjustments are made to strategy II to get the final trading strategy, which resembles the strategy used by IMC. Finally, we give the assumptions made for applying this strategy.
Parameters Assume we have two datasets of closing prices of two different stocks X and Y for a certain period, roughly two years, corrected for dividend:

xt and yt, for t = 0, ..., T.
The first half, t = 0, ..., ⌊T/2⌋, is considered as history and is used to determine the parameters ratio r̄ and threshold Γ.
The second half, t = ⌊T/2⌋ + 1, ..., T, is considered as the future and is used to determine the profit or loss that would be made trading the pair {X, Y} with these parameters.
The ratio r̄ is the average ratio of Y and X over the first half of observations:

r̄ = (1/(⌊T/2⌋ + 1)) · Σ_{t=0}^{⌊T/2⌋} yt/xt.
The threshold Γ is determined quite easily: we just try a few values on the 'history' and take the one that gives the best profit based on the 'history'. We calculate the maximum of the absolute spread over the first half of observations, denoted m:

m = max_{t=0,...,⌊T/2⌋} |yt − r̄xt|.
The values of Γ that we are going to try are percentages of m. Table 2.4 shows the percentages and the outcome for the introductory example of paragraph 2.1, where m = 2.01. Because of rounding to two digits it looks like several values of Γ give the same largest profit, but Γ = 0.40 gives the largest profit.
The profit is calculated by multiplying the number of trades minus one by two times Γ, except when no trades were made, in which case the profit is simply zero. It is the minimal profit if you always trade one spread, in this example one Y and 1.36 X. The first trading instance is to put on a position for the first time, denoted by t1; at that moment we do not make a profit yet:

t1 = min (t, such that |st| ≥ Γ).

The succeeding trading moments are:

If stn ≥ Γ: tn+1 = min (t, such that t > tn, st ≤ −Γ).
If stn ≤ −Γ: tn+1 = min (t, such that t > tn, st ≥ Γ).
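As a sketch, the parameter determination and trade counting described above can be written in Python. This is a simplified illustration under our own naming, not IMC's implementation; it also applies the practical cap Γ ≤ 0.5m mentioned below:

```python
import numpy as np

def grid_search_gamma(x, y):
    """Determine r_bar, m and the best threshold Gamma from the 'history'
    (first half) of two dividend-corrected closing-price series x, y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    half = len(x) // 2
    r_bar = np.mean(y[:half + 1] / x[:half + 1])     # average ratio of Y to X
    spread = y[:half + 1] - r_bar * x[:half + 1]
    m = np.max(np.abs(spread))                       # maximum absolute spread
    best_profit, best_gamma, best_trades = 0.0, None, 0
    for pct in range(5, 95, 5):                      # Gamma as a percentage of m
        gamma = pct / 100.0 * m
        if gamma > 0.5 * m:                          # in practice Gamma <= 0.5 m
            break
        trades, side = 0, 0                          # side: +1 = short the spread
        for s in spread:
            if side <= 0 and s >= gamma:             # spread rich: sell Y, buy X
                trades, side = trades + 1, 1
            elif side >= 0 and s <= -gamma:          # spread cheap: buy Y, sell X
                trades, side = trades + 1, -1
        profit = 2 * gamma * (trades - 1) if trades > 1 else 0.0
        if profit > best_profit:
            best_profit, best_gamma, best_trades = profit, gamma, trades
    return r_bar, m, best_gamma, best_trades, best_profit
```

The inner loop counts the alternating crossings of ±Γ exactly as in the definitions of t1 and tn+1 above; the profit (number of trades − 1) · 2Γ is the minimal profit of always trading one spread.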
Table 2.4: Profits with different Γ.

percentage   Γ      trades   profit
 5           0.10   15       2.81
10           0.20    9       3.21
15           0.30    9       4.82
20           0.40    7       4.82
25           0.50    3       2.01
30           0.60    3       2.41
35           0.70    3       2.81
40           0.80    3       3.21
45           0.90    3       3.61
50           1.00    3       4.02
55           1.10    3       4.42
60           1.20    3       4.82
65           1.30    2       2.61
70           1.40    2       2.81
75           1.51    2       3.01
80           1.61    2       3.21
85           1.71    1       0
90           1.81    0       0
To determine Γ we simply take the value with the largest profit based on the history, but in practice we do not take Γ larger than 0.5m. This profit is a gross profit; no transaction costs are accounted for. We neglected the transaction costs because it turned out they hardly had any influence on the value of Γ. This is because IMC does not trade one spread, which in this example was 1 Y and 1.36 X; they trade a large number of Y and X, for example 1,000 Y and 1,360 X. The costs that IMC incurs consist of two parts, a fixed amount a plus an amount b times the number of traded stocks. The costs of trading 1,000 Y and 1,360 X would be 2a + 2,360b. We always trade the same amount, no matter the value of Γ, so the costs per trade are exactly the same for all Γ. So the more trades, the more costs, but the costs are really small compared to the profit. When the profits for the different thresholds are not too close to each other, the Γ when considering
the net profits is the same as the Γ when neglecting the costs. Of all the pairs considered in this report, the pair from table 2.4 is unfortunately the only one where accounting for transaction costs would have made a difference. There are three thresholds, 0.30, 0.40 and 1.20, which result in almost the same profits. Therefore, accounting for the transaction costs would have resulted in the threshold with the lowest number of trades, Γ = 1.20. In the remainder of this report, we will neglect transaction costs.
Modified trading strategy There are pairs of stocks that work quite well for a certain time, but then the spread walks away from zero and starts to oscillate around a level different from zero. We can see an example in figure 2.10. If we do not do anything, we are probably going to hold a position for a long time, which is not desirable, as explained in paragraph 2.3. The figure shows us that the relation between the stocks in the pair has changed: the ratio r̄, determined by the past, is not good anymore. It would be a waste to lose money on these kinds of pairs by closing the position, or to exclude them from trading. A better way is to replace the average ratio r̄ with some kind of moving average ratio.
Figure 2.10: Spread oscillates around a new level.
Assume we have a dataset of closing prices; the first half is used in exactly the same way as described before, so we have the average ratio r̄ and threshold Γ. The backtest on the second half of the data set is slightly different, because we use a moving average ratio r̃t, instead of r̄, to calculate the spread.
The moving average ratio we use is:

r̃t = (1 − κ) r̃t−1 + κ rt,   t = ⌊T/2⌋ + 1, ..., T,

with r̃⌊T/2⌋ = r̄ and where rt is the actual ratio:

rt = yt/xt,   t = 0, ..., T.
The parameter κ is a percentage between 0 and 10% and is determined very simply from the first half of the data set: we count how many trades were made in the first half and use table 2.5 to find κ.
Table 2.5: Determining κ.

# trades   κ        # trades   κ
>15        0        4          6
10-15      1        3          7
8,9        2        2          8
7          3        1          9
6          4        0          10
5          5
If there were a lot of trades in the first half of observations, we do not expect to need a moving average ratio; the table reflects this. The use of a moving average ratio, and this way of determining its value, has some disadvantages which will be discussed later on.
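Table 2.5 amounts to a simple lookup: above 7 trades the rows are irregular, but for 7 or fewer trades it is just κ = 10 − n percent. A sketch (function name ours):

```python
def kappa_from_trades(n):
    """Adjustment parameter kappa (in percent) from the number of trades
    in the first half of the data, following table 2.5."""
    if n > 15:
        return 0        # many trades: no moving average ratio needed
    if n >= 10:
        return 1
    if n >= 8:
        return 2
    return 10 - n       # 7 -> 3, 6 -> 4, ..., 0 -> 10
```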
So the first half of the data set determines three parameters: average ratio r̄, threshold Γ and adjustment parameter κ. In the second half of the data set, the new spread is calculated as:

s̃κ,t = yt − r̃t xt.
Trading the pair goes in the same way as described before; the difference is that the position in X is no longer equal to r̄ but equal to r̃t. The following example will make this more clear.
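The update of r̃t and the modified spread s̃κ,t can be sketched as follows (a minimal illustration; names are ours, and κ is passed as a fraction rather than a percentage):

```python
def modified_spreads(x, y, r_bar, kappa):
    """Spreads for the second half of the data, using the moving average
    ratio r~_t = (1 - kappa) r~_{t-1} + kappa * y_t / x_t, started at r_bar."""
    r = r_bar                          # r~ at t = floor(T/2) equals the average ratio
    out = []
    for xt, yt in zip(x, y):
        r = (1 - kappa) * r + kappa * (yt / xt)
        out.append((yt - r * xt, r))   # (s~_{kappa,t}, r~_t)
    return out
```

With κ = 0 this reduces to the original spread yt − r̄xt; with κ = 1 the spread is identically zero, since r̃t then equals the actual ratio.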
We take the pair from figure 2.10; available are 520 closing prices of the two stocks. The first half of observations gives us three parameters:

r̄ = 1.86,   Γ = 0.77,   κ = 5%.

First we look at what the strategy without the modification does on the second half of observations; table 2.6 shows the trading instances. Two trades are made with a total profit of € 1.88. The strategy with the modification works better: 7 trades with a total profit of € 5.21. Table 2.7 shows all trading instances. The table also shows that the position in stock X is no longer constant in absolute sense. For example, with trade number 1 we put on a position of +1 Y and -1.85 X because r̃t at this time is 1.85. With the second trade we flatten this position and put on a position the other way around, but now r̃t is 1.81, so in total we sell 2 shares of stock Y and buy 1.85 + 1.81 = 3.66 shares of stock X. The profit of these two trades is calculated with the position that is flattened, i.e., (51.81 − 48.70) + 1.85 · (26.80 − 28.06) = 0.77.
Table 2.6: Trading instances strategy II.

trade   t     st      position (Y,X)   price Y   price X   profit
1       263   -1.10   (+1,-1.86)       48.70     26.80
2       285    0.78   (-1,+1.86)       52.33     27.74     1.88
total profit                                               1.88
Table 2.7 also shows that not all profits per trade are larger than Γ; one trade gave a relatively large loss. This happens because the ratio when the position was put on differs a lot from the ratio when the position is reversed, which in turn happens because the actual ratio rt is moving a lot. We can see all the ratios in figure 2.11. The solid line is the actual ratio rt, the dashed line is the moving average ratio r̃t and the straight dotted line is the average ratio r̄.
Table 2.7: Trading instances modified strategy.

trade   t     s̃κ,t    position (Y,X)   price Y   price X   r̃t     profit
1       263   -0.99   (+1,-1.85)       48.70     26.80     1.85
2       281    1.07   (-1,+1.81)       51.81     28.06     1.81    0.77
3       358   -0.82   (+1,-1.97)       51.52     26.56     1.97   -2.43
4       392    0.93   (-1,+1.96)       56.38     28.23     1.96    1.57
5       407   -0.94   (+1,-1.98)       55.45     28.52     1.98    1.52
6       459    0.97   (-1,+1.98)       55.27     27.47     1.98    1.92
7       476   -1.31   (+1,-1.99)       57.20     29.38     1.99    1.86
total profit                                                       5.21
Figure 2.11: Ratios rt, r̃t and r̄.
Figures 2.12 and 2.13 show the spread calculated with the average ratio r̄ and with the moving average ratio r̃t with κ = 5%, respectively.
Figure 2.12: Spread st .
Figure 2.13: Spread s̃κ, t , κ = 5%.
From figure 2.10 it is clear that the average ratio r̄ does not fit anymore: around t = 300 the stocks in the pair acquire another relation. Replacing the fixed average ratio r̄ by a moving average ratio r̃t resolves this. As we saw in the example, we can lose money if the moving average ratio used to calculate the spread differs a lot between trades. If there is some fundamental change, such a trade will happen once or twice, and the loss that is made will be compensated by good trades from that moment on. The advantage of the modified trading strategy is that when the relation between the stocks in a pair changes in some fundamental way, as in the example above, i.e., the spread oscillates around a new level, we are still able to trade the pair with a profit instead of making a loss by closing the position and excluding the pair from trading.
When there is no such fundamental change but we use the modified strategy, with κ > 0, it is possible that we throw away money with each trade. This happens if the moving average ratio differs a lot between each two succeeding trades. We consider an example; suppose we have 520 observations.
The first half is used to determine the three parameters r̄, Γ and κ:

r̄ = 1.00,   Γ = 0.62,   κ = 7%.

Figures 2.14 and 2.15 show the spread for the second half of observations calculated with the average ratio r̄, which is the same as κ = 0, and with κ = 7% respectively.
Figure 2.14: Spread s̃κ, t , κ = 0.
Figure 2.15: Spread s̃κ, t , κ = 7.
Trading the spread with κ = 0 results in four trades with a total profit of € 5.69. However, trading the spread with κ = 7% results in five trades with a total loss of € 4.03; table 2.8 shows the corresponding trading instances. In this example there is a loss on every trade if we use κ = 7%, but we make a substantial profit when we use κ = 0. This is a bit of an extreme example, but what is often seen is that when there is no fundamental change between the stocks in the pair, the profit is less when using the modified strategy (κ > 0) than with the original strategy (κ = 0). This is a big disadvantage of the modified strategy; it is at least very difficult to determine whether the relation between the stocks is fundamentally changing. In spite of this disadvantage we use the modified strategy, because we do not want to exclude pairs like the one in figure 2.10; we are willing to give up some profit on pairs that do not change much.
Table 2.8: Trading instances modified strategy.

trade   t     s̃κ,t    position (Y,X)   price Y   price X   r̃t     profit
1       270    0.71   (-1,+1.01)       10.72      9.94     1.01
2       328   -0.63   (+1,-1.18)       10.80      9.72     1.18   -0.30
3       378    0.72   (-1,+0.92)        9.93     10.50     0.92   -1.25
4       449   -0.87   (+1,-1.18)       11.59     10.55     1.18   -1.21
5       487    0.63   (-1,+0.90)        9.54      9.88     0.90   -1.27
total profit                                                      -4.03
Assumptions We apply the trading strategy to historical closing data to see if trading a pair of two stocks would have been profitable. This assumes that we could have traded at the closing price and that there was no bid-ask spread. It also assumes we could have traded any amount we wanted, including fractions. If it is decided to start trading a specific pair, it is going to be traded intra-day, so it would probably be better to apply the trading strategy to intra-day data, but that kind of data is difficult to get and difficult to handle. In real-life trading the numbers of stocks have to be integers. The assumption that we are allowed to trade fractions is not that bad, because trading a pair involves large quantities, so we can round the number of stocks to an integer without completely messing up the ratios.
2.5 Conclusion
In this chapter we have derived a trading strategy that resembles the strategy IMC uses. It is no longer necessary to do a fundamental analysis to find out if a pair of two stocks is profitable to trade as a pair. We can apply the trading strategy to historical data and see if we would have made a profit if we had actually traded the pair. In this way IMC identified a lot of pairs. We would like to see if we can identify pairs in a more statistical setting, again using historical data of two stocks, not to estimate profits, but to see if the two time series exhibit behavior that could make them a good pair. We will examine the concept of cointegration, but first we need some time series basics.
Chapter 3
Time series basics
This chapter briefly discusses some basics of time series that we will need later. More information can be found in [2] and [3].

White noise A basic stochastic time series {zt} is independent white noise if the zt are independent and identically distributed (i.i.d.) variables with mean 0 and variance σ², notation zt ∼ i.i.d.(0, σ²). A special case is Gaussian white noise, where each zt is independent and has a normal distribution N(0, σ²).
Stationarity A time series {zt} is covariance-stationary or weakly stationary if neither the expectation nor the autocovariances depend on time t:

E(zt) = µ,
E(zt − µ)(zt−j − µ) = γj,

for all t and j. Notice that if a process is covariance-stationary, the variance of zt is constant and the covariance between zt and zt−j depends only on the lag j. For example, a white noise process is covariance-stationary. Covariance-stationary is shortened to stationary in the remainder of this report.
A stationary process exhibits mean-reverting behavior: the process tends to remain near, or to return over time to, its mean value.
MA(q) A q-th order moving average process, denoted MA(q), is characterized by:

zt = µ + ut + θ1 ut−1 + θ2 ut−2 + · · · + θq ut−q,   (3.1)

where {ut} is white noise (∼ i.i.d.(0, σ²)) and µ and (θ1, θ2, . . . , θq) are constants. The expectation, variance and autocovariances of zt are given by:

E(zt) = µ,
γ0 = (1 + θ1² + θ2² + · · · + θq²)σ²,
γj = (θj + θj+1 θ1 + θj+2 θ2 + · · · + θq θq−j)σ²   if j = 1, . . . , q,
γj = 0   if j > q.

So an MA(q) process is stationary.
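These moment formulas are easy to check by simulation; for an MA(1) with θ = 0.6 and σ = 1 we expect γ0 = 1.36, γ1 = 0.6 and γ2 = 0 (a sketch with our own parameter choices, not part of the thesis):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n = 0.6, 200_000
u = rng.standard_normal(n + 1)          # white noise, sigma = 1
z = u[1:] + theta * u[:-1]              # MA(1): z_t = u_t + theta u_{t-1}, mu = 0

def sample_autocov(z, j):
    """Sample autocovariance at lag j."""
    zc = z - z.mean()
    return np.mean(zc[j:] * zc[:len(zc) - j])
```

The sample autocovariances at lags 0, 1 and 2 come out close to (1 + θ²)σ², θσ² and 0.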
AR(1) A first-order autoregressive process, denoted AR(1), satisfies the following difference equation:

zt = c + φzt−1 + ut,   (3.2)

where {ut} is independent white noise (∼ i.i.d.(0, σ²)). If |φ| ≥ 1, the consequences of the u's for z accumulate rather than die out over time. Perhaps it is not surprising that when |φ| ≥ 1, there does not exist a causal stationary process for zt with finite variance that satisfies (3.2). If |φ| > 1 the process zt can be written in terms of innovations in the future instead of innovations in the past; that is what is meant by 'there does not exist a causal stationary process'. If φ = 1 and c = 0 the process is called a random walk. When |φ| < 1, the AR(1) model defines a stationary process and has an MA(∞) representation:

zt = c/(1 − φ) + ut + φut−1 + φ²ut−2 + φ³ut−3 + · · · .
The expectation, variance and autocovariances of zt are given by:

µ = c/(1 − φ),
γ0 = σ²/(1 − φ²),
γj = σ²φ^j/(1 − φ²),

for j = 1, 2, . . .
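These moments can likewise be verified by simulation; with c = 1, φ = 0.5 and σ = 1 we expect µ = 2, γ0 = 4/3 and γ1 = 2/3 (a sketch with arbitrary parameter choices):

```python
import numpy as np

rng = np.random.default_rng(0)
c, phi, n = 1.0, 0.5, 200_000
z = np.empty(n)
z[0] = c / (1 - phi)                    # start at the stationary mean
for t in range(1, n):
    # AR(1) recursion z_t = c + phi z_{t-1} + u_t with u_t ~ N(0, 1)
    z[t] = c + phi * z[t - 1] + rng.standard_normal()
```

The sample mean, variance and lag-1 autocovariance of z agree with the formulas above to within simulation noise.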
AR(p) A p-th order autoregressive process, denoted AR(p), satisfies:

zt = c + φ1 zt−1 + φ2 zt−2 + · · · + φp zt−p + ut.   (3.3)

Suppose that the roots of

1 − φ1 x − φ2 x² − · · · − φp x^p = 0,   (3.4)

all lie outside the unit circle in the complex plane. This is the generalization of the stationarity condition |φ| < 1 for the AR(1) model. Then the expectation, variance and autocovariances of zt are given by:

µ = c/(1 − φ1 − φ2 − · · · − φp),
γ0 = φ1 γ1 + φ2 γ2 + · · · + φp γp + σ²,
γj = φ1 γj−1 + φ2 γj−2 + · · · + φp γj−p,

for j = 1, 2, . . .

If equation (3.4) has a root on the unit circle, we call that a unit root and the process that generates zt a unit root process.
Information Criteria In chapter 4 we want to fit an AR(p) model to a given dataset, with p unknown. An information criterion is designed to balance model fit against the number of parameters, in our case to keep p small. The criterion assigns a value to each model, depending on the model fit and the number of parameters in the model. The better the model fit, the smaller the value; the more parameters used, the larger the value. The model with the smallest value is the most suitable for the data according to that criterion. There are several information criteria; they differ in the penalty they give to each extra parameter and therefore have different properties.

The Akaike information criterion (AIC) formula is:

AIC(k) = −2 log L + 2k,   (3.5)

where k is the number of parameters and L is the likelihood function. The likelihood function assumes that the innovations ut are N(0, σ²).
The log likelihood for an AR(k) model is given by:

log L = −(T/2) log(2π) − (T/2) log(σ²) + (1/2) log |Vk^{-1}|
        − (1/(2σ²)) (zk − µk)′ Vk^{-1} (zk − µk)
        − Σ_{t=k+1}^{T} (zt − c − φ1 zt−1 − · · · − φk zt−k)² / (2σ²),

where σ²Vk denotes the covariance matrix of (z1, z2, . . . , zk):

σ²Vk =
[ E(z1 − µ)²          E(z1 − µ)(z2 − µ)   · · ·   E(z1 − µ)(zk − µ) ]
[ E(z2 − µ)(z1 − µ)   E(z2 − µ)²          · · ·   E(z2 − µ)(zk − µ) ]
[       ⋮                    ⋮             ⋱             ⋮          ]
[ E(zk − µ)(z1 − µ)   E(zk − µ)(z2 − µ)   · · ·   E(zk − µ)²        ]

µk denotes a (k × 1) vector with each element given by

µ = c/(1 − φ1 − φ2 − · · · − φk),

zk denotes the first k observations in the sample, (z1, z2, . . . , zk), and T denotes the sample size.
The first term in (3.5) measures the model fit; the second term penalizes each parameter. The Akaike information criterion is calculated for each model AR(k), with k = 1, 2, . . . , K. The k with the smallest value AIC(k) is the estimate for the model order.

Two other information criteria are the Schwarz-Bayesian and the Hannan-Quinn information criteria.

The Schwarz-Bayesian information criterion (BIC) formula is:

BIC(k) = −2 log L + k log(T),

where T denotes the number of observations in the data set.

The Hannan-Quinn information criterion (HIC) formula is:

HIC(k) = −2 log L + 2k log(log(T)).
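In practice the order selection can be sketched as follows, using OLS and the conditional Gaussian log-likelihood (the exact terms involving Vk above are dropped for simplicity; function name and details are ours):

```python
import numpy as np

def select_ar_order(z, K=6):
    """Choose the AR order k in 1..K minimizing AIC(k) = -2 log L + 2k,
    with log L the conditional Gaussian log-likelihood from an OLS fit."""
    z = np.asarray(z, float)
    best_k, best_aic = None, np.inf
    for k in range(1, K + 1):
        yk = z[k:]                                    # regress z_t on 1, z_{t-1..t-k}
        X = np.column_stack(
            [np.ones(len(yk))] + [z[k - j:len(z) - j] for j in range(1, k + 1)])
        beta, *_ = np.linalg.lstsq(X, yk, rcond=None)
        T = len(yk)
        sigma2 = np.sum((yk - X @ beta) ** 2) / T     # ML estimate of sigma^2
        loglik = -T / 2 * (np.log(2 * np.pi * sigma2) + 1)
        aic = -2 * loglik + 2 * (k + 2)               # k phis + intercept + sigma^2
        if aic < best_aic:
            best_k, best_aic = k, aic
    return best_k
```

On data simulated from a clear AR(2) process the selected order is typically 2, while k = 1 is strongly rejected by the fit term.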
First difference operator The first difference operator ∆ is defined by:
∆zt = zt − zt−1 .
I(d) A time series is integrated of order d, written as yt ∼ I(d), if the series is non-stationary but becomes stationary after differencing a minimum of d times. An already weakly stationary process is denoted I(0). If a time series generated by an AR(p) process is integrated of order d, then its autoregressive polynomial (equation (3.4)) has d roots on the unit circle.
Unit root test Statistical tests of the null hypothesis that a time series is non-stationary against the alternative that it is stationary are called unit root tests. In this report we consider the Dickey-Fuller test (DF) and the Augmented Dickey-Fuller test (ADF).
Dickey-Fuller test The Dickey-Fuller test tests whether a time series is stationary or not when the series is assumed to follow an AR(1) model. It is named after the statisticians D.A. Dickey and W.A. Fuller, who developed the test in [4].

The assumption of the DF test is that the time series zt follows an AR(1) model:

zt = c + ρzt−1 + ut,   (3.6)

with ρ ≥ 0. If ρ = 1, the series zt is non-stationary; if ρ < 1, it is stationary. The null hypothesis is that zt is non-stationary, more specifically that zt is integrated of order 1, against the alternative that zt is stationary:

H0 : zt ∼ I(1) against H1 : zt ∼ I(0),

which can be restated in terms of the parameters:

H0 : ρ = 1 against H1 : ρ < 1,

under the assumption that zt follows an AR(1) model.
The test statistic of the DF test S is the t ratio:

S = (ρ̂ − 1)/σ̂ρ̂,

where ρ̂ denotes the OLS estimate of ρ and σ̂ρ̂ denotes the standard error of the estimated coefficient.

The t ratio is commonly used to test whether the coefficient ρ is equal to ρ0 when the time series is stationary, i.e. ρ < 1. Then the test statistic

(ρ̂ − ρ0)/σ̂ρ̂

has a t-distribution. But we do not assume that the time series is stationary, because the null hypothesis is that ρ = 1. So the test statistic S need not have a t-distribution. We need to distinguish several cases to derive the distribution of the DF test statistic.
Case 1: The true process of zt is a random walk, i.e. zt = zt−1 + ut, and we estimate the model zt = ρzt−1 + ut. Notice that we only estimate ρ and not a constant c.

Case 2: The true process of zt is again a random walk, and we estimate the model zt = c + ρzt−1 + ut. Notice that now we do estimate a constant, but it is not present in the true process.

Case 3: The true process of zt is a random walk with drift, i.e. zt = c + zt−1 + ut, where the true value of c is not zero. We estimate the model zt = c + ρzt−1 + ut.

Although the differences between the three cases seem small, the effect on the asymptotic distributions of the test statistic is large, as we will see in chapter 5.
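The case 2 statistic, for instance, can be computed with ordinary least squares (a sketch; note that critical values, such as the asymptotic 5% point of about −2.86 for case 2, come from the Dickey-Fuller tables, not from the t-distribution):

```python
import numpy as np

def df_case2_stat(z):
    """Dickey-Fuller case 2 statistic: OLS of z_t on (1, z_{t-1}),
    then the t ratio S = (rho_hat - 1) / se(rho_hat)."""
    z = np.asarray(z, float)
    yk = z[1:]
    X = np.column_stack([np.ones(len(yk)), z[:-1]])
    beta, *_ = np.linalg.lstsq(X, yk, rcond=None)
    resid = yk - X @ beta
    s2 = resid @ resid / (len(yk) - 2)                # OLS residual variance
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])   # standard error of rho_hat
    return (beta[1] - 1) / se
```

For a stationary AR(1) the statistic is strongly negative; for a random walk it is typically much closer to zero.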
Augmented Dickey-Fuller test The Augmented Dickey-Fuller test tests
whether a time series is stationary or not when the time series follows an
AR(p) model. One of the assumptions of the Augmented Dickey-Fuller test
is that the time series zt follows an AR(p) model:
zt = c + φ1 zt−1 + · · · + φp zt−p + ut .
(3.7)
Like the regular Dickey-Fuller test, we test:
H0 : zt ∼ I(1) against H1 : zt ∼ I(0).
The null hypothesis is that the autoregressive polynomial
1 − φ1 x − φ2 x2 − · · · − φp xp = 0,
has exactly one unit root and all other roots are outside the unit circle.
Then the unit root cannot be a complex number with nonzero imaginary part:
the autoregressive polynomial has real coefficients, so if x = a + bi were a
unit root, then so would be its complex conjugate x̄ = a − bi, contradicting
the null hypothesis that there is exactly one unit root. Two possibilities
remain: the unit root is −1 or 1. The first possibility gives an alternating
series, which is not realistic for modeling the spread (this becomes clearer
in the chapter on cointegration). Thus the single unit root should be equal
to 1, which gives us
1 − φ1 − φ2 − · · · − φp = 0.   (3.8)
The AR(p) model (3.7) can be written as:
zt = c + ρzt−1 + β1 ∆zt−1 + · · · + βp−1 ∆zt−p+1 + ut ,   (3.9)
with
ρ = φ1 + · · · + φp ,
βi = −(φi+1 + · · · + φp ),
for i = 1, . . . , p − 1.
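As a quick numerical check of this rewriting, the following sketch (my own, not from the thesis; NumPy assumed) generates one AR(3) series with form (3.7) and the same series with form (3.9), using ρ = φ1 + · · · + φp and βi = −(φi+1 + · · · + φp ), and confirms the two coincide:

```python
import numpy as np

# Example AR(3) coefficients (hypothetical, chosen so the series is stable).
rng = np.random.default_rng(42)
c, phi = 0.5, np.array([0.4, 0.3, 0.2])
p, n = len(phi), 200
rho = phi.sum()                                   # rho = phi_1 + ... + phi_p
beta = np.array([-phi[i + 1:].sum() for i in range(p - 1)])

u = rng.standard_normal(n)
z37 = np.zeros(p + n)                             # form (3.7)
z39 = np.zeros(p + n)                             # form (3.9)
for t in range(p, p + n):
    lags = np.array([z37[t - i] for i in range(1, p + 1)])
    z37[t] = c + phi @ lags + u[t - p]            # z_t = c + sum phi_i z_{t-i} + u_t

    dz = np.array([z39[t - i] - z39[t - i - 1] for i in range(1, p)])
    z39[t] = c + rho * z39[t - 1] + beta @ dz + u[t - p]

assert np.allclose(z37, z39)   # the two parameterizations are equivalent
```

The same driving noise u is fed to both recursions, so any discrepancy between the two forms would show up immediately.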
The advantage of writing (3.7) in the equivalent form (3.9) is that under
the null hypothesis only one of the regressors, namely zt−1 , is I(1), whereas
all of the other regressors (∆zt−1 , ∆zt−2 , . . . , ∆zt−p+1 ) are stationary. Notice
that (3.8) implies that coefficient ρ is equal to 1. This leads to the same
hypotheses as with the regular Dickey-Fuller test:
H0 : ρ = 1 against H1 : ρ < 1,
and the same test statistic:
S = (ρ̂ − 1)/σ̂ρ̂ .
To derive the distribution of the ADF test statistic we need to distinguish
the same three cases as above, but now in the appropriate AR(p) form. As
we will see in chapter 5, the distributions are the same as the DF distributions,
without any corrections for the fact that lagged values of ∆z are included in
the regression.
One last note: if the null hypothesis that zt is non-stationary cannot be
rejected, it does not necessarily mean that zt is generated by an I(1) process. It may be non-stationary because it is generated by an I(2) process
or by an integrated process of even higher order. The next step could
be to repeat the procedure, but this time using ∆zt instead of zt . That is,
to test H0 : ∆zt ∼ I(1) against H1 : ∆zt ∼ I(0), which is equivalent to
H0 : zt ∼ I(2) against H1 : zt ∼ I(1), and so on.
Chapter 4
Cointegration
Empirical research in financial economics is largely based on time series.
Ever since Trygve Haavelmo's work it has been standard to view economic
and financial time series as realizations of stochastic processes. This approach allows the model builder to use statistical inference in constructing
and testing equations that characterize relationships between economic and
financial variables. The 2003 Nobel Prize in economics rewarded two such
contributions: the ARCH model of Robert Engle and cointegration of
Clive Granger.
This chapter discusses the concept of cointegration and two methods for testing for cointegration: the Engle-Granger method and the Johansen method. Other
methods are described in, for example, [13] and [14]. In the last section
of this chapter a start is made with an alternative method. In this report
this alternative method is used for generating cointegrated data, not for
testing for cointegration, although that would be possible.
4.1 Introducing cointegration
An (n × 1) vector time series yt is said to be cointegrated if each of the series
taken individually is I(1), integrated of order one, while some linear combination a′ yt of the series is stationary for some nonzero (n × 1) vector a,
called the cointegrating vector.
Cointegration means that although many developments can cause permanent
changes in the individual elements of yt , there is some long-run equilibrium
relation tying the individual components together, represented by the linear
combination a′ yt .
A simple example of a cointegrated vector process with n = 2, which was
taken from [1], is:
xt = wt + ǫx,t ,
yt = wt + ǫy,t ,
wt = wt−1 + ǫt ,

where the error processes ǫx,t , ǫy,t and ǫt are independent white noise processes.
The series wt is a random walk, so xt and yt are I(1) processes, though the
linear combination yt − xt is stationary. This means yt = (xt , yt ) is cointegrated with a = (−1, 1).
Figure 4.1 shows a realization of this example of a cointegrated process, where
the error processes are standard Gaussian white noise. Note that xt and yt
can each wander arbitrarily far from the starting value, but that they are
'tied together' in the long run. The figure also shows the corresponding
spread yt − xt of the realization.
Figure 4.1: Realization of cointegrated process and spread of realization.
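The example is simple to reproduce. A minimal simulation sketch (my own; NumPy assumed, all three noise processes standard Gaussian white noise as in figure 4.1):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1000
w = np.cumsum(rng.standard_normal(T))    # common random walk w_t
x = w + rng.standard_normal(T)           # x_t = w_t + eps_x,t
y = w + rng.standard_normal(T)           # y_t = w_t + eps_y,t

# The spread a'y_t with cointegrating vector a = (-1, 1):
spread = y - x                           # = eps_y,t - eps_x,t, stationary
```

Here x and y wander far from their starting value (their sample variance grows with the horizon), while the spread stays near zero with variance σx² + σy² = 2.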
Correlation Correlation is used in analysis of co-movements in assets but
also in analysis of co-movements in returns. Correlation measures the strength
and direction of linear relationships between variables. If xt denotes a price
process of a stock, the returns ht are defined by
ht = (xt − xt−1 )/xt−1 ,

and with log(1 + ǫ) ≈ ǫ as ǫ → 0, we can approximate this by

(xt − xt−1 )/xt−1 = xt /xt−1 − 1 ≈ log(xt /xt−1 ).
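A quick numerical illustration of this approximation (my own sketch with hypothetical closing prices; NumPy assumed): for a small price move the simple return is very close to the log return.

```python
import numpy as np

x_prev, x_now = 100.0, 101.5                  # hypothetical closing prices
simple_return = x_now / x_prev - 1            # x_t/x_{t-1} - 1 = 0.015
log_return = np.log(x_now / x_prev)           # log(x_t/x_{t-1})
# For a 1.5% move the two differ only in the fifth decimal.
```

The gap between the two grows with the size of the move, which is why the approximation is stated for small ǫ.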
Correlation can refer to co-movement in the stock returns as well as in the stock
prices themselves; cointegration refers to co-movements in the stock prices
themselves or in the logarithms of the stock prices. Cointegration and correlation are related, but they are different concepts. High correlation does not
imply cointegration, and neither does cointegration imply high correlation.
In fact, cointegrated series can have correlations that are quite low at times.
For example, a large and diversified portfolio of stocks which are also in
an equity index, where the weights in the portfolio are determined by their
weights in the index, should be cointegrated with the index itself. Although
the portfolio should move in line with the index in the long term, there will
be periods when stocks in the index that are not in the portfolio have exceptional price movements. Following this, the empirical correlations between
the portfolio and the index may be rather low for a time.
The simple example at the beginning of this section shows the same, that
is, cointegration does not imply high correlation. For illustration purposes it
is convenient to look at the differences, ∆xt and ∆yt , instead of the returns
or xt and yt themselves because in this example they do not have constant
variances. The variance of ∆xt is
Var(∆xt ) = Var(xt − xt−1 )
= Var(ǫt + ǫx,t − ǫx,t−1 )
= σ² + 2σx² ,

where σ², σx² and σy² denote the variances of ǫt , ǫx,t and ǫy,t respectively.
In the same way, Var(∆yt ) = σ² + 2σy² . The covariance of ∆xt and ∆yt is
given by

Cov(∆xt , ∆yt ) = E(∆xt ∆yt ) − E(∆xt )E(∆yt )
= E(ǫt²) − 0
= σ² .

The correlation between the difference processes is

Corr(∆xt , ∆yt ) = Cov(∆xt , ∆yt ) / √(Var(∆xt ) Var(∆yt ))
= σ² / √((σ² + 2σx²)(σ² + 2σy²)) .
The correlation between ∆xt and ∆yt is therefore less than 1, and when
the variances of ǫx,t and/or ǫy,t are much larger than the variance of ǫt the
correlation will be low even though xt and yt are cointegrated.
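This formula is easy to verify by simulation. A sketch of my own (NumPy assumed): with σ = 1 and σx = σy = 2 the theoretical correlation of the differences is 1/9 ≈ 0.11, even though xt and yt are cointegrated by construction.

```python
import numpy as np

rng = np.random.default_rng(2)
T, sigma, sx, sy = 200_000, 1.0, 2.0, 2.0
w = np.cumsum(sigma * rng.standard_normal(T))   # common random walk
x = w + sx * rng.standard_normal(T)             # noisy copies of w
y = w + sy * rng.standard_normal(T)

corr = np.corrcoef(np.diff(x), np.diff(y))[0, 1]
theory = sigma**2 / np.sqrt((sigma**2 + 2 * sx**2) * (sigma**2 + 2 * sy**2))
# corr and theory agree to within sampling error (~0.01 at this T).
```

The long sample length T is only there to make the empirical correlation settle close to its theoretical value.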
The converse also holds: there may be high correlation between the stock
prices and/or the returns without the stock prices being cointegrated. Figure 4.2 shows two stock price processes which are highly correlated, with
correlation 0.9957. The correlation between the returns is even equal to 1. But the price
processes are clearly not cointegrated: they are not tied together, instead
they diverge more and more as time goes on. So, correlation does not
tell us enough about the long-term relationship between two stocks: they
may or may not be moving together over long periods of time, i.e. they may
or may not be cointegrated.
From a trading point of view, the 'pair' in figure 4.2 is not a good
one. Figures 4.3 and 4.4 show the spread calculated with the average ratio
r̄ and with a 10% moving average ratio r̃t respectively. In figure
4.3 it is clear that this 'pair' is not a good one, because the spread is not
oscillating around zero. Figure 4.4 looks better, but actually we are losing
money with nearly every trade, because the ratios when positions were put
on differ a lot from the ratios when the positions were reversed. The ratios
differ a lot because the actual ratio rt is moving a lot, which is due to the
divergence between the stock prices. So, correlation is not a good way to
identify pairs.
Figure 4.2: Highly correlated stock prices.
Figure 4.3: Spread st , r̄ = 0.76.
Figure 4.4: Spread s̃α,t , α = 10%.
A better way to identify pairs is with cointegration, because we would like
the stock prices in a pair to be tied together. If two stocks in a pair are cointegrated, a certain linear combination of the two is stationary. This implies
that the spread, defined with the cointegrating vector a instead of the average ratio r̄ or the moving average ratio r̃t , is mean-reverting. In paragraph 2.3
it was explained that this is an important property.
4.2 Stock price model
In the preceding section cointegration was introduced; the question remains
how to test for cointegration. The test should be preceded by examining whether
each component of yt is I(1), because that is a requirement in the definition
of cointegration. Several books and articles have been written about modeling stock
prices. In this section we derive a commonly used model, which can be
found in, among others, [7]. This model is famous for its use in option valuation. We will use it to show that the logarithms of stock prices are
integrated of order one, and to show that it is more or less justified to assume
that stock prices themselves are integrated of order one.
In figure 4.5 the daily closing prices of Royal Dutch Shell are plotted. The
figure shows the jagged behavior that is common to stock prices.
Figure 4.5: Daily Royal Dutch Shell stock prices.
We first examine the returns of the Royal Dutch Shell stock. Figure 4.6
shows the estimated density of the daily returns with the N (0, 1) density
superimposed, figure 4.7 the empirical distribution function and figure 4.8
the normal QQ-plot. The daily returns were normalized to
ĥt = (ht − µ̂)/σ̂ ,

where µ̂ and σ̂² are the sample mean and sample variance. These figures
suggest that the marginal distribution of daily returns of the Royal Dutch
Shell stock is Gaussian. The QQ-plot indicates that the match is least accurate at the extremes of the range: the returns have fatter tails than the
normal distribution. Figure 4.9 shows the sample autocorrelation function of
the daily returns. The bounds ±1.96T^{−1/2} are displayed by the dashed lines;
here T = 520. The figure strongly suggests that the returns are uncorrelated.
Although uncorrelated does not imply independent, we suggest that for
modeling xt we take the returns as normally distributed i.i.d. samples, because the autocorrelation function of a sample from an i.i.d. noise sequence
looks similar to figure 4.9.
Figure 4.6: Estimated density.
Figure 4.7: Empirical distribution function.
Figure 4.8: Normal QQ-plot.
Figure 4.9: Autocorrelation function.
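The normalization and autocorrelation check described above can be sketched as follows (my own code, run on simulated i.i.d. returns as a stand-in for the Royal Dutch Shell data; NumPy assumed, normalization done with the sample standard deviation):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 520
h = 0.0005 + 0.011 * rng.standard_normal(T)    # hypothetical daily returns

h_hat = (h - h.mean()) / h.std()               # normalized returns

def sample_acf(z, max_lag):
    """Sample autocorrelation function at lags 1..max_lag."""
    z = z - z.mean()
    denom = z @ z
    return np.array([z[:-k] @ z[k:] / denom for k in range(1, max_lag + 1)])

acf = sample_acf(h_hat, 25)
bound = 1.96 / np.sqrt(T)                      # the +-1.96 T^{-1/2} bands
inside = np.mean(np.abs(acf) <= bound)         # ~95% should fall inside
```

For genuinely i.i.d. data roughly 95% of the sample autocorrelations fall inside the bands, which is the pattern figure 4.9 shows for the real returns.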
Given the stock price x(0) = x0 at time t = 0, we would like to come up with
a process that describes the stock price x(t) for all times 0 ≤ t ≤ T . As
a starting point for the model we note that the value of a risk-free investment D, like money in a savings account, changes over a small time
interval δ as

D(t + δ) = D(t) + µδD(t),

where µ is the interest rate.
The efficient market hypothesis says that the current stock price reflects all
the information known to investors, so any change in the price is due to new
information. We may build this into our model by adding a random fluctuation
to the interest rate equation.
Let t = iδ; the discrete-time model becomes

x(ti ) = x(ti−1 ) + µδ x(ti−1 ) + σ√δ ui x(ti−1 ) ,   (4.1)
where the parameter µ > 0 represents an annual upward drift of the stock
prices. The parameter σ ≥ 0 is a constant that determines the strength of
the random fluctuations and is called the volatility. The random fluctuations
u1 , u2 , . . . are i.i.d. N (0, 1). Notice that the returns [x(ti ) − x(ti−1 )]/x(ti−1 )
indeed form a normal i.i.d. sequence.
We consider the time interval [0, t] with t = Lδ. Assuming we know x(0) = x0 ,
the discrete model (4.1) gives us expressions for x(δ), x(2δ), . . . , x(t). To derive a continuous model for the stock price, we let δ → 0 to get a limiting
expression for x(t).
The discrete model says that over each time step δ the stock price gets multiplied by a factor 1 + µδ + σ√δ ui , hence

x(t) = x0 ∏_{i=1}^{L} (1 + µδ + σ√δ ui ).
Dividing by x0 and taking logarithms gives

log(x(t)/x0 ) = ∑_{i=1}^{L} log(1 + µδ + σ√δ ui ).
42
We are interested in the limit δ → 0, so we exploit the approximation
log(1 + ǫ) ≈ ǫ − ǫ²/2 + · · · for small ǫ:

log(x(t)/x0 ) ≈ ∑_{i=1}^{L} ( µδ + σ√δ ui − σ²δui²/2 ).

This is justifiable because E(ui²) is finite. We have ignored terms that involve
powers of δ^{3/2} or higher.
The expectation and the variance of each term are:

E(µδ + σ√δ ui − σ²δui²/2) = µδ − σ²δ/2 ,
Var(µδ + σ√δ ui − σ²δui²/2) = σ²δ + higher powers of δ .
The Central Limit Theorem, which can be found in section 5.1, suggests that
log(x(t)/x0 ) behaves like a normal random variable:

log(x(t)/x0 ) ∼ N ((µ − σ²/2)t, σ²t) .
The limiting continuous-time expression for the stock price at fixed time t
becomes

x(t) = x0 exp((µ − σ²/2)t + σ√t W ) ,

where W ∼ N (0, 1).
For non-overlapping time intervals, the normal random variables that describe the changes are independent. We can describe the evolution of the
stock over any sequence of time points 0 = t0 < t1 < t2 < · · · < tm by

x(ti ) = x(ti−1 ) exp((µ − σ²/2)(ti − ti−1 ) + σ√(ti − ti−1 ) Wi ) .   (4.2)

This model guarantees that the stock price is always positive if x0 > 0.
Model (4.2) is used a lot and is often referred to as geometric Brownian motion.
We want to model the daily closing prices, so we assume that the time intervals ti − ti−1 are equally spaced. That is, we set the time between Friday
evening and Monday evening equal to the time between Thursday evening
and Friday evening. We can then write (4.2) as

xt = xt−1 exp((µ − σ²/2)δ + σ√δ ut ) ,   (4.3)

with δ equal to 1/260, because there are approximately 260 trading days in
a year. This is basically the same as the discrete model (4.1).
From this model it follows that log xt is integrated of order one, because

log xt = log xt−1 + (µ − σ²/2)δ + σ√δ ut ,

hence

log xt − log xt−1 = (µ − σ²/2)δ + σ√δ ut
= constant + Gaussian white noise.
This difference process of log xt is I(0) and because the process log xt itself
is not, it follows that log xt is I(1). This is one of the reasons why cointegration tests are also applied to the logarithms of stock prices. Unfortunately,
translating cointegration between the logarithms of two stocks into a trading strategy is less intuitive than translating cointegration between
the stock prices themselves. When there is cointegration between the stock
prices, trading the pair is straightforward. Let yt = (xt , yt ) be two stock price
processes which are cointegrated with cointegrating vector a. We 'normalize'
this vector to (−α, 1), so yt − αxt is a stationary process with mean zero,
which means that yt is approximately αxt . It could be that there is a constant in the cointegrating relation; then yt − αxt does not have mean zero.
This will be discussed in the next section; for now we assume that the mean
is zero. We treat yt − αxt as the spread process described in chapter 2, so we
trade pair (x, y) in the constant ratio α : 1. This is exactly the same as the
trading strategy, if we do not use the average ratio to calculate the spread
but the least squares estimator.
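A sketch of this least-squares variant (my own, on simulated data; NumPy assumed): regress yt on xt without an intercept, consistent with the mean-zero assumption above, to estimate α, then form the spread yt − α̂xt .

```python
import numpy as np

rng = np.random.default_rng(5)
T, alpha_true = 1000, 1.5
w = np.cumsum(rng.standard_normal(T)) + 50.0    # common I(1) component
x = w + rng.standard_normal(T)                  # first stock price
y = alpha_true * w + rng.standard_normal(T)     # second stock, ratio 1.5 : 1

alpha_hat = (x @ y) / (x @ x)                   # least squares, no intercept
spread = y - alpha_hat * x                      # stationary, mean ~ 0
```

Because xt is I(1), the least squares estimator of α converges quickly (superconsistency), so the estimated spread is close to the true stationary combination.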
If the logarithms of the stock prices xt and yt are cointegrated with cointegrating vector b, we normalize this to (−β, 1); then log yt − β log xt is a
stationary process. So log yt is approximately β log xt . We cannot trade logarithms of stocks, so we would like to know the relation between xt and yt .
Let εt denote the residual process:

log yt − β log xt = εt .

The relation between xt and yt then becomes

yt = xt^β e^{εt} .
It is not clear how we can trade this relation, at least not with the strategy from
chapter 2. This is why we want to test for cointegration on the
stock prices and not on their logarithms; in order to do that we need xt and yt
to be integrated of order one. In chapter 9 we will make an attempt to come
up with a trading strategy for the case of cointegration between the logarithms
of the stock prices.
Model (4.3) does not imply that xt is I(1); this is more easily seen in (4.1).
The difference is

xt − xt−1 = µδ xt−1 + σ√δ ut xt−1 ,

which does not have a constant expectation, so according to the derived stock
price model the difference process is not I(0). Fortunately, we look at the stock
prices {xt}_{t=0}^{T} for fixed T ; µ is a small number between 0.01 and 0.1 and
typical values of σ are between 0.05 and 0.5, so it is not likely that xt−1
becomes very large or very small. That is why the differences divided by the
mean value x̄t−1 look a lot like the returns:

(xt − xt−1 )/x̄t−1 ≈ (xt − xt−1 )/xt−1 .
The returns are I(0); this indicates that the difference process ∆xt is also
more or less I(0) and the stock price process xt more or less I(1). We consider
a realization of model (4.3) with µ = 0.03, σ = 0.18 and x0 = 20,
shown in figure 4.10. The differences of this realization are shown in figure
4.11, and they look pretty stationary. This indicates that realizations of
model (4.3) behave like they are I(1), while strictly under the model they
are not.
Figure 4.10: Realization of model (4.3), µ = 0.03, σ = 0.18.
Figure 4.11: Differences of realization of model (4.3).
Another way to show that it is justifiable to assume the stock prices are
integrated of order one is to examine the differences instead of the returns.
At the beginning of this section we examined the returns of Royal Dutch
Shell; let us do the same for the differences ∆xt = xt − xt−1 . Figure 4.12
shows the estimated density of the daily differences with the N (0, 1) density
superimposed, figure 4.13 the empirical distribution function and figure 4.14
the normal QQ-plot. The daily differences were normalized to

∆x̂t = (∆xt − µ̂)/σ̂ ,

where µ̂ and σ̂ 2 are the sample mean and sample variance of the differences.
Figure 4.15 shows the sample autocorrelation function of the differences.
These figures look pretty much the same as the figures for the returns, which
suggests that it is justifiable to view the differences of a stock price process as
normally distributed i.i.d. samples. This implies that the differences are I(0)
and the stock prices I(1).
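These checks can be reproduced numerically. The sketch below (Python with NumPy only; the random walk is an illustrative stand-in, not the Royal Dutch Shell data) normalizes the differences of a simulated I(1) path and computes their sample autocorrelation function, mirroring figures 4.12–4.15:

```python
import numpy as np

def diff_diagnostics(prices, max_lag=20):
    """Normalize the daily differences of a price series and compute their
    sample autocorrelation function (cf. figures 4.12-4.15)."""
    dx = np.diff(prices)                    # differences dx_t = x_t - x_{t-1}
    z = (dx - dx.mean()) / dx.std()         # normalized differences
    zc = z - z.mean()
    acf = np.array([zc[: len(zc) - k] @ zc[k:] / (zc @ zc)
                    for k in range(max_lag + 1)])
    return z, acf

# illustrative random walk with i.i.d. normal increments
rng = np.random.default_rng(0)
prices = 50 + np.cumsum(rng.normal(0, 1, 520))
z, acf = diff_diagnostics(prices)
# for an I(1) series with i.i.d. increments the normalized differences
# should look like N(0, 1) samples and the acf should be near zero for k >= 1
```

For real data one would plot the estimated density, empirical distribution function and QQ-plot of `z` and compare them with the standard normal, exactly as in the figures above.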
Figure 4.12: Estimated density.
Figure 4.13: Empirical distribution function.
Figure 4.14: Normal QQ-plot.
Figure 4.15: Autocorrelation function.
So far we have discussed why it is likely that stock price processes are integrated of order one, but we can also apply unit root tests to the data we
want to test for cointegration. The unit root test we use in this report is the
(Augmented) Dickey-Fuller test, introduced in chapter 3. The first test is:

H0 : xt ∼ I(1) against H1 : xt ∼ I(0).

The outcome should be to not reject H0 . The second test is:

H0 : xt ∼ I(2) against H1 : xt ∼ I(1),

which is equivalent to:

H0 : ∆xt ∼ I(1) against H1 : ∆xt ∼ I(0).

The outcome of this second test should be to reject H0 , which makes it likely
that the price processes are I(1). Which case of the DF-test should be used
is discussed in the next section; the critical values of these tests are derived
in chapter 5 and the results of these tests for the data used in this report are
stated in chapter 7.
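The two tests can be sketched numerically. The snippet below (Python with NumPy; a non-augmented, p = 1 version of the case 2 regression on a simulated I(1) path, so only an illustration of the procedure, not of the thesis's own critical values) computes the Dickey-Fuller statistic for the levels and for the differences:

```python
import numpy as np

def df_case2_stat(x):
    """Dickey-Fuller case 2 statistic: regress x_t on a constant and
    x_{t-1} with OLS and return (rho_hat - 1) / se(rho_hat)."""
    y, ylag = x[1:], x[:-1]
    X = np.column_stack([np.ones_like(ylag), ylag])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - 2)        # error variance estimate
    cov = s2 * np.linalg.inv(X.T @ X)        # OLS covariance matrix
    return (beta[1] - 1.0) / np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)
x = 100 + np.cumsum(rng.normal(0, 1, 520))   # simulated I(1) price path

t_level = df_case2_stat(x)                   # test 1: H0: x_t ~ I(1)
t_diff = df_case2_stat(np.diff(x))           # test 2: H0: dx_t ~ I(1)
# compared with the case 2 critical value (about -2.86 at 5%), we expect
# not to reject in test 1 and to reject strongly in test 2, so x_t is
# classified as I(1)
```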
4.3 Engle-Granger method
The question remains how to test for cointegration. R.F. Engle and C.W.J. Granger were the
first to develop the key concepts of cointegration, which can be found in [5].
They received the Nobel Prize in economics in 2003 for their work on cointegration and ARCH models. The approach to testing for cointegration will be
to test the null hypothesis that there is no cointegration among the elements
of an (n × 1) vector yt . Rejection of the null hypothesis is then taken as
evidence of cointegration.
The Engle-Granger test is a two-step process, which should be preceded by
examining whether each component of yt is I(1), as was discussed in the previous
section. Let us assume that this condition is fulfilled. A vector process yt is
cointegrated if there exists a linear combination a′ yt of its components that
is stationary. The first step in the Engle-Granger test is to estimate a, which is
done with an OLS (Ordinary Least Squares) regression. The second step is
to test whether the residuals of the regression are stationary using a Dickey-Fuller test. If the residuals are stationary, the linear combination
a′ yt is stationary, which means yt is cointegrated with cointegrating vector a.
Looking from a pairs trading point of view we have two stock price processes, yt = (xt , yt ). We would like xt and yt to be cointegrated such that the
spread εt = yt − αxt oscillates around zero; again we have 'normalized' the
cointegrating vector a to (−α, 1). A stationary process has constant expectation, but this expectation is not necessarily zero. In order to get a stationary process
with mean zero, we can include a constant in the cointegration relation such
that the spread becomes:

εt = yt − αxt − α0 .

For example, consider the pair (xt , yt ) which is generated with the relation:

yt = 2xt + 20 + εt ,

so α = 2 and α0 = 20. Figure 4.16 shows xt and yt .
Figure 4.16: Paths xt and yt for α0 = 20.
Normally we do not know the exact values of α and α0 , so we have to estimate
them. According to the Engle-Granger method we do this with OLS. We have
two possibilities: regression with and without an intercept. In the first situation,
regression with intercept, we get

α̂α0 = 2.01, α̂0 = 19.67.

In the second, regression without intercept:

α̂¬α0 = 2.41.
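This estimation step can be illustrated with a simulated pair built to match the example's parameters (α = 2, α0 = 20); the series behind figure 4.16 itself is not reproduced here, so the numbers differ from 2.01 and 19.67:

```python
import numpy as np

rng = np.random.default_rng(42)
eps = rng.normal(0, 0.5, 520)               # stationary spread noise
x = 50 + np.cumsum(rng.normal(0, 1, 520))   # I(1) price path for x_t
y = 2 * x + 20 + eps                        # y_t = 2 x_t + 20 + eps_t

# regression with intercept: y_t = a0 + a * x_t
A = np.column_stack([np.ones(520), x])
(a0_hat, a_hat), *_ = np.linalg.lstsq(A, y, rcond=None)

# regression without intercept: y_t = a * x_t
a_noint = float(x @ y / (x @ x))

# with the intercept both alpha and alpha_0 are recovered accurately;
# without it the slope estimate is biased, because it must also absorb
# the level shift of 20
```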
Figure 4.17: Spread with and without α0 , α0 = 20.
Figure 4.17 shows the corresponding spread processes. The left is yt − α̂α0 xt −
α̂0 , the right is yt − α̂¬α0 xt . The left figure of 4.17 looks a lot better, but there
is a disadvantage. Section 2.3 on the properties of pairs trading
described that pairs trading is more or less cash neutral. The trading strategy is cash neutral up to Γ if we neglect costs for short selling; in other
words, each trade costs or provides us with Γ. The cash neutral property is
a property we would like to keep. If we trade the spread from the left figure of
4.17 it is not cash neutral anymore. Assume that the predetermined threshold Γ is equal to 1. The first time the spread is above 1, the value of x is
€ 43.73 and the value of y is € 108.59, so the spread at this
time equals 108.59 − 2.01·43.73 − 19.67 = 1.02. Then we sell y, which provides
us with € 108.59, and buy 2.01 x, which costs us 2.01·43.73 = 87.90. So we are
left with a positive difference in money of € 20.69. The first time the spread
is below −1, the value of x is € 52.66 and y is € 124.51, so the
spread at this time equals 124.51 − 2.01·52.66 − 19.67 = −1.01. Then we buy y,
which costs us € 124.51, and sell 2.01 x, providing us 2.01·52.66 = 105.85. So
this trade costs us € 18.66. This way of trading is not cash neutral; each
trade costs or provides us approximately α0 .
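The cash flows of this example can be replayed directly (Python; the prices and estimates are the ones quoted above):

```python
# replay of the two trades with the quoted prices and estimates
alpha_hat, alpha0_hat = 2.01, 19.67

# spread above +1: sell y, buy alpha_hat units of x
x1, y1 = 43.73, 108.59
spread_open = y1 - alpha_hat * x1 - alpha0_hat    # ~ +1.02
cash_open = y1 - alpha_hat * x1                   # ~ +20.69 received

# spread below -1: buy y, sell alpha_hat units of x
x2, y2 = 52.66, 124.51
spread_close = y2 - alpha_hat * x2 - alpha0_hat   # ~ -1.01
cash_close = -(y2 - alpha_hat * x2)               # ~ -18.66 paid

# each leg moves roughly alpha_0 in cash, so trading this spread is not
# cash neutral, although the round trip still nets at least 2 * Gamma
profit = cash_open + cash_close                   # ~ +2.03
```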
A possibility to resolve this is to neglect α0 , so we trade the spread from the
right figure of 4.17. In this example it is probably not worthwhile, because
the spread has a clear downward trend. Let us consider the two different
spreads for α0 = 1:
εt = yt − 2xt − 1,
shown in figure 4.18.
Figure 4.18: Spread with and without α0 , α0 = 1.
Now the spread where α0 is neglected looks almost as good as the spread with
α0 . In conclusion, in order to keep the cash neutral property and the trading
strategy from chapter 2, α0 should be close to zero, such that neglecting it still
gives a stationary spread process. So when testing real stock price processes
xt and yt for cointegration, we only estimate α and test the residual process
yt − α̂xt for stationarity. A suggestion for an alternative trading strategy,
which is able to trade a pair when α0 cannot be neglected, is given in chapter 9.
It is not standard to do OLS regression on non-stationary data: OLS regression applied to non-stationary data is quite likely to produce spurious
results. There is only one circumstance in which the OLS estimation gives a
consistent estimate of the cointegrating vector, and that is when there is a
cointegrating relation. Note that if εt = yt − αxt is stationary, then
(1/T ) Σ_{t=1}^{T} εt² = (1/T ) Σ_{t=1}^{T} (yt − αxt )² →P E(εt²).   (4.4)
By contrast, if (−1, α) is not a cointegrating vector between x and y, then
yt − αxt is I(1) and from proposition 2 in section 5.6,
(1/T ²) Σ_{t=1}^{T} (yt − αxt )² →D λ² ∫_0^1 W (r)² dr,
where W (r) is standard Brownian motion, which will be defined in chapter
5, and λ is a parameter determined by the autocovariances of ∆εt . Hence,
if (−1, α) is not a cointegrating vector, the statistic in (4.4) diverges
to +∞. This suggests that we can obtain a consistent estimate of a cointegrating vector by choosing α so as to minimize (4.4). It turns out that the
OLS estimator for α, also when α0 is included in the regression, converges at
rate T . This is analyzed by Phillips and Durlauf in [12].
Now that we have a method for estimating the cointegrating vector, the
second step in the Engle-Granger method is examining the residuals with a
Dickey-Fuller test. Chapter 3 described that there are several cases,
so the question remains which case we use when testing for cointegration.
Most literature about cointegration does not state which case is used and
why, but from the critical values used it can be seen that case 2 is used most
often. One discussion, found in Hamilton [6], is the following:
Which case is the 'correct' case to use to test the null hypothesis
of a unit root? The answer depends on why we are interested in
testing for a unit root. If the analyst has a specific null hypothesis about the process that generated the data, then obviously this
would guide the choice of test. In the absence of such guidance,
one general principle would be to fit a specification that is a plausible description of the data under both the null and the alternative.
This principle would suggest using case 4 for a series with an obvious trend and case 2 for a series without a significant trend. For
example, consider the nominal interest rate series used in the examples in
this section. There is no economic theory to suggest that nominal
interest rates should exhibit a deterministic time trend, and so a
natural null hypothesis is that the true process is a random walk
without trend. In terms of framing a plausible alternative, it is
difficult to maintain that these data could have been generated by
it = ρit−1 + ut with |ρ| significantly less than 1. If these data were
to be described by a stationary process, surely the process would
have a positive mean. This argues for including a constant term
in the estimated regression, even though under the null hypothesis
the true process does not contain a constant term. Thus, case 2
is a sensible approach for these data.
We do not have a specific null hypothesis, so according to this quote we
should use case 2, because there is no trend in spread processes. In the next
chapter we investigate the power of the three different tests, case 1 through
case 3; perhaps we will find another reason to use Dickey-Fuller case 2.
So far we have looked at cointegration between two stocks because of pairs
trading. However, pairs trading with three or more stocks in a 'pair' is also very
interesting. Cointegration is defined for an (n × 1) vector yt , and the trading
strategy is easily extended to three or more stocks. For example, consider a
pair of three stocks yt = (xt , yt , zt ) which are cointegrated with cointegrating
vector (−α1 , −α2 , 1). Then we calculate the spread as

st = zt − α1 xt − α2 yt ,

which we trade the same way as before. When the spread reaches Γ we sell
1 z and buy α1 times x and α2 times y. When the spread goes below −Γ we
reverse our position and lock in a profit of at least 2Γ. The threshold Γ is
determined in the same way as in chapter 2: we just try a few on historical
data and take the best one.
If the number of stocks in a pair is greater than two, n > 2, the Engle-Granger
method has a disadvantage. We estimate the cointegrating vector with OLS
regression: if yt = (y1t , y2t , . . . , ynt ) we regress y1t on (y2t , y3t , . . . , ynt ). So the
first element of the cointegrating vector is set to be unity. This normalization
is not harmless if the first variable y1t does not appear in the cointegrating
relation at all, in other words, if its coefficient is actually zero but is set to one.
A second disadvantage, which exists even when n = 2, is that the method
is not symmetric. Suppose n = 2 and we regress y1t on y2t :

y1t = αy2t + ut .

We might equally well have normalized the coefficient of y2t , so the regression
would be

y2t = βy1t + vt .
Then the OLS estimate β̂ is not simply the inverse of α̂, meaning that these
two regressions will give different estimates of the cointegrating vector. Thus,
choosing which variable to call y1 and which to call y2 might end up making
a difference for the evidence one finds for cointegration.
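A hypothetical numerical illustration of this asymmetry (simulated data, not the thesis pairs): the two normalizations produce slope estimates that are not exact reciprocals of one another.

```python
import numpy as np

rng = np.random.default_rng(7)
y2 = 50 + np.cumsum(rng.normal(0, 1, 520))   # I(1) process
y1 = 1.5 * y2 + rng.normal(0, 3, 520)        # cointegrated with y2

alpha_hat = float(y2 @ y1 / (y2 @ y2))       # regress y1t on y2t (no intercept)
beta_hat = float(y1 @ y2 / (y1 @ y1))        # regress y2t on y1t (no intercept)

# alpha_hat * beta_hat equals the squared (uncentered) correlation of the
# two series, which is strictly below 1, so beta_hat != 1 / alpha_hat
```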
For these reasons we discuss the Johansen method in the next section. First
a summary is given of testing for cointegration with the Engle-Granger
method:
Given is an (n × 1) vector yt = (y1t , y2t , . . . , ynt ).
Examine or assume that each individual variable yit is I(1).
Then yt is cointegrated if a′ yt is I(0) for some nonzero vector a.
Regress y1t on (y2t , . . . , ynt ); a constant may be included, but with our
trading strategy we do not want to include this constant. This regression gives the estimate â.
Then the residuals of this regression, which form our spread process, are
given by

et = y1t − â2 y2t − · · · − ân ynt ,

which resembles the real error process:

εt = y1t − a2 y2t − · · · − an ynt .
The Dickey-Fuller test assumes that εt follows an AR(p) model, with
a unit root and with or without a constant term. If we use case 2,
which is suggested by the above quote, we assume that the true model
does not have a constant but we include a constant in the estimated
model. So the Dickey-Fuller case 2 test assumes the true model of εt is
εt = ρεt−1 + β1 ∆εt−1 + · · · + βp−1 ∆εt−p+1 + ηt ,
with ρ = 1.
To estimate p we fit an AR(k) model with OLS on et for k = 1, . . . , K.
The value of k with the smallest information criterion values AIC(k), BIC(k)
and HIC(k) is the estimate p̂ of the model order. If the information
criteria give different values, we take the rounded mean.
We use the AR(p̂) fit
et = ĉ + ρ̂et−1 + β̂1 ∆et−1 + · · · + β̂p̂−1 ∆et−p̂+1 + nt ,
to calculate the Dickey-Fuller test statistic

(ρ̂ − 1)/σ̂ρ̂ ,
where ρ̂ is the OLS estimate of ρ and σ̂ρ̂ is the standard error for the
estimated coefficient.
Compare the outcome with the critical values of the Dickey-Fuller test.
The critical values of the Dickey-Fuller test will be derived and simulated in
the next chapter. Engle-Granger is a two-step method: first we do an OLS
regression and then a Dickey-Fuller test. In chapter 6 we will examine whether the
first step influences the critical values; in other words, are the critical values
for Engle-Granger really the same as for Dickey-Fuller?
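The summary above can be condensed into a short numerical sketch (Python with NumPy; for brevity the residual regression uses p = 1, i.e. no lagged difference terms, and the data are a simulated cointegrated pair rather than real prices):

```python
import numpy as np

def engle_granger_stat(x, y):
    """Two-step Engle-Granger sketch: (1) estimate alpha by OLS without an
    intercept, (2) compute the Dickey-Fuller case 2 statistic on the
    residual spread e_t, here with p = 1 (no lagged difference terms)."""
    alpha = float(x @ y / (x @ x))          # cointegrating vector (-alpha, 1)
    e = y - alpha * x                       # spread / residual process
    lhs, lag = e[1:], e[:-1]                # e_t = c + rho * e_{t-1} + eta_t
    X = np.column_stack([np.ones_like(lag), lag])
    b, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    r = lhs - X @ b
    s2 = r @ r / (len(lhs) - 2)
    se_rho = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return alpha, (b[1] - 1.0) / se_rho

rng = np.random.default_rng(3)
x = 50 + np.cumsum(rng.normal(0, 1, 520))   # I(1) path
y = 2 * x + rng.normal(0, 1, 520)           # cointegrated pair, alpha = 2
alpha, df_stat = engle_granger_stat(x, y)
# a strongly negative statistic is evidence that the spread is stationary,
# i.e. evidence of cointegration (critical values: chapters 5 and 6)
```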
4.4 Johansen method
The Johansen method, also known as 'full-information maximum likelihood',
was developed by Søren Johansen in [8] and [9]. This method allows us to test
for the number of cointegrating relations. An (n × 1) vector yt has h cointegrating relations if there exist h linearly independent vectors a1 , a2 , . . . , ah
such that a′i yt is stationary. If such vectors exist, their values are not unique,
since any linear combination of a1 , a2 , . . . , ah is also a cointegrating vector.
With the Engle-Granger method this was resolved by setting the first element of the cointegrating vector equal to one. As mentioned before, this has
some disadvantages. In this section the Johansen method is summarized; no
proofs or argumentation are given. These can be found in [8] and [9].
Let yt be an (n × 1) vector. The Johansen method assumes that yt follows a VAR(p) model

yt = c + Φ1 yt−1 + · · · + Φp yt−p + εt ,   (4.5)

where c is an (n × 1) vector and Φi is an (n × n) matrix.
Model (4.5) can be written as
yt = c + ρyt−1 + β 1 ∆yt−1 + · · · + β p−1 ∆yt−p+1 + εt ,
(4.6)
where
ρ = Φ1 + Φ2 + · · · + Φp ,
β i = − (Φi+1 + Φi+2 + · · · + Φp ) , for i = 1, 2, . . . , p − 1.
Subtracting yt−1 from both sides of (4.6) results in

∆yt = c + β 0 yt−1 + β 1 ∆yt−1 + · · · + β p−1 ∆yt−p+1 + εt ,   (4.7)

where β 0 = ρ − In , and where the errors satisfy

E(εt ) = 0,
E(εt ε′τ ) = Ω for t = τ , and 0 otherwise.
Johansen showed that under the null hypothesis of h cointegrating relations,
only h separate linear combinations of yt appear in (4.7). This implies that
β 0 can be written in the form
β 0 = −BA′ ,
(4.8)
for B an (n × h) matrix and A′ an (h × n) matrix.
If we consider a sample of T +p observations, denoted (y−p+1 , y−p+2 , . . . , yT ),
and if the errors εt are Gaussian, the log likelihood of (y1 , y2 , . . . , yT ) conditional on (y−p+1 , y−p+2 , . . . , y0 ) is given by
L(Ω, c, β 0 , β 1 , . . . , β p−1 ) = −(T n/2) log(2π) − (T /2) log |Ω|
− (1/2) Σ_{t=1}^{T} [ (∆yt − c − β 0 yt−1 − β 1 ∆yt−1 − · · · − β p−1 ∆yt−p+1 )′
× Ω−1 (∆yt − c − β 0 yt−1 − β 1 ∆yt−1 − · · · − β p−1 ∆yt−p+1 ) ].   (4.9)
The goal is to choose (Ω, c, β 0 , β 1 , . . . , β p−1 ) so as to maximize (4.9) subject
to the constraint that β 0 can be written in the form of (4.8). The Johansen
method calculates the maximum likelihood estimates of (Ω, c, β 0 , β 1 , . . . , β p−1 ).
The first step of the Johansen method is to estimate a VAR(p − 1) for
∆yt . That is, regress ∆yit on a constant and all elements of the vectors
∆yt−1 , . . . , ∆yt−p+1 with OLS. Collect the i = 1, 2, . . . , n regressions in vector form
∆yt = π̂ 0 + Π̂1 ∆yt−1 + · · · + Π̂p−1 ∆yt−p+1 + ût .
(4.10)
We also estimate a second regression: we regress yt−1 on a constant and
∆yt−1 , . . . , ∆yt−p+1 :
yt−1 = θ̂ + χ̂1 ∆yt−1 + · · · + χ̂p−1 ∆yt−p+1 + v̂t .
(4.11)
The second step is to calculate the sample covariance matrices of the OLS
residuals ût and v̂t :
Σ̂vv = (1/T ) Σ_{t=1}^{T} v̂t v̂t′ ,
Σ̂uu = (1/T ) Σ_{t=1}^{T} ût û′t ,
Σ̂uv = (1/T ) Σ_{t=1}^{T} ût v̂t′ ,
Σ̂vu = Σ̂′uv .
From these, find the eigenvalues of the matrix
Σ̂vv−1 Σ̂vu Σ̂uu−1 Σ̂uv ,   (4.12)
with the eigenvalues ordered λ̂1 > λ̂2 > · · · > λ̂n . The maximum value
attained by the log likelihood function subject to the constraint that there
are h cointegrating relations is given by
L∗0 = −(T n/2) log(2π) − T n/2 − (T /2) log |Σ̂uu | − (T /2) Σ_{i=1}^{h} log(1 − λ̂i ).   (4.13)
The third step is to calculate the maximum likelihood estimates of the parameters. Let â1 , . . . , âh denote the (n × 1) eigenvectors of (4.12) associated
with the h largest eigenvalues. These provide a basis for the space of cointegrating relations. That is, the maximum likelihood estimate is that any
cointegrating vector can be written in the form
a = b1 â1 + b2 â2 + · · · + bh âh ,
for some choice of scalars b1 , . . . , bh . Johansen suggests normalizing these
vectors âi such that â′i Σ̂vv âi = 1. Collect the first h normalized vectors in
an (n × h) matrix Â:

Â = [ â1 â2 · · · âh ] .

Then the maximum likelihood estimate of β 0 is given by

β̂ 0 = Σ̂uv ÂÂ′ .
The maximum likelihood estimate of c is

ĉ = π̂ 0 − β̂ 0 θ̂ .
Now we are ready for hypothesis testing. Under the null hypothesis that there
are exactly h cointegrating relations, the largest value that can be achieved
for the log likelihood function was given by (4.13). Consider the alternative
hypothesis that there are n cointegrating relations. This means that every
linear combination of yt is stationary, in which case yt−1 would appear in
(4.7) without constraints and no restrictions are imposed on β 0 . The value
for the log likelihood function in the absence of constraints is given by
L∗1 = −(T n/2) log(2π) − T n/2 − (T /2) log |Σ̂uu | − (T /2) Σ_{i=1}^{n} log(1 − λ̂i ).   (4.14)
A likelihood ratio test of
H0 : h relations against H1 : n relations,
can be based on
2(L∗1 − L∗0 ) = −T Σ_{i=h+1}^{n} log(1 − λ̂i ).   (4.15)
Another approach would be to test the null hypothesis of h cointegrating
relations against h + 1 cointegrating relations. A likelihood ratio test of
H0 : h relations against H1 : h + 1 relations,
can be based on
2(L∗1 − L∗0 ) = −T log(1 − λ̂h+1 ).
(4.16)
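These steps can be sketched numerically for the simplest setting p = 1, where the auxiliary regressions (4.10) and (4.11) reduce to regressing on a constant only, i.e. to demeaning (this corresponds to an unrestricted constant; the data are an illustrative simulated pair, and the sketch is not a substitute for a tested implementation):

```python
import numpy as np

def johansen_stats(Y):
    """Johansen sketch for p = 1: regressions (4.10) and (4.11) reduce to
    demeaning dy_t and y_{t-1}.  Returns the ordered eigenvalues of matrix
    (4.12) and the statistics -T * sum_{i>h} log(1 - lambda_i), h = 0..n-1."""
    dY = np.diff(Y, axis=0)                  # (T x n) first differences
    Ylag = Y[:-1]                            # (T x n) lagged levels
    T, n = dY.shape
    u = dY - dY.mean(axis=0)                 # residuals of regression (4.10)
    v = Ylag - Ylag.mean(axis=0)             # residuals of regression (4.11)
    Suu, Svv, Suv = u.T @ u / T, v.T @ v / T, u.T @ v / T
    # eigenvalues of Svv^{-1} Svu Suu^{-1} Suv, matrix (4.12)
    M = np.linalg.solve(Svv, Suv.T) @ np.linalg.solve(Suu, Suv)
    lam = np.sort(np.linalg.eigvals(M).real)[::-1]
    stats = [-T * np.sum(np.log(1.0 - lam[h:])) for h in range(n)]
    return lam, stats

# illustrative simulated pair with one cointegrating relation
rng = np.random.default_rng(5)
x = 50 + np.cumsum(rng.normal(0, 1, 520))
y = 2 * x + rng.normal(0, 1, 520)
lam, stats = johansen_stats(np.column_stack([x, y]))
# stats[0] is statistic (4.15) for H0: 0 relations; with one true relation
# it should be large relative to stats[1], which tests H0: 1 relation
```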
Like with the Dickey-Fuller test, we need to distinguish several cases. There
are also three cases for the Johansen method, but they are different from the
Dickey-Fuller cases:
Case 1 : The true value of the constant c in (4.7) is zero, meaning that
there is no intercept in any of the cointegrating relations and no deterministic time trend in any of the elements of yt . There is no constant term
included in the regressions (4.10) and (4.11).
Case 2 : The true value of the constant c in (4.7) is such that there are
no deterministic time trends in any of the elements of yt . There are no restrictions on the constant term in the estimation of the regressions (4.10)
and (4.11).
Case 3 : The true value of the constant c in (4.7) is such that one or more
elements of yt exhibit deterministic time trend. There are no restrictions on
the constant term in the estimation of the regressions (4.10) and (4.11).
For both tests, which are based on (4.15) and (4.16), the critical values for the three different cases can be found in [10] and [11]. Unfortunately,
the critical values are for a sample size of T = 400. Although the data of
the ten pairs IMC provided consist of 520 observations, these critical values
will be used when testing the ten pairs for cointegration. I assume that the
critical values are not that different for a sample size of 520. For case 1 this
is very likely, because Johansen showed that the asymptotic distribution of
test statistic (4.15) is the same as that of the trace of the matrix
Q = [ ∫_0^1 W(r) dW(r)′ ]′ [ ∫_0^1 W(r)W(r)′ dr ]−1 [ ∫_0^1 W(r) dW(r)′ ] ,
where W(r) is g-dimensional standard Brownian motion, with g = n − h.
And fortunately, case 1 is the case we will use, because we do not want
an intercept in the cointegrating relations, as was explained in the previous
section, and we assume there is no deterministic time trend in the price processes. The Johansen case 1 test can be compared with the Dickey-Fuller
case 1 test: there is no constant and we do not estimate one. There is not
really a Johansen case which can be compared to the Dickey-Fuller case 2:
with the Johansen case 2 test, the constant c is not necessarily equal to zero.
The critical values for case 1 and T = 400 for both test statistics (4.15) and
(4.16) are shown in tables 4.1 and 4.2 respectively.
Table 4.1: Critical values for test statistic (4.15), case 1.

g    1%      5%      10%
1    6.51    3.84    2.86
2    16.31   12.53   10.47
3    29.75   24.31   21.63
4    45.58   39.89   36.58
5    66.52   59.46   55.44
Table 4.2: Critical values for test statistic (4.16), case 1.

g    1%      5%      10%
1    6.51    3.84    2.86
2    15.69   11.44   9.52
3    22.99   17.89   15.59
4    28.82   23.80   21.58
5    35.17   30.04   27.62
Note that if g = 1, then n = h + 1. In this case the two tests are identical. For this reason the first rows of the tables are the same.
With two stocks in a pair, we can do several hypothesis tests:

1) H0 : 0 relations against H1 : 2 relations,
2) H0 : 0 relations against H1 : 1 relation,
3) H0 : 1 relation against H1 : 2 relations.
For the first test, we use the second row of table 4.1. We basically test the
null of no cointegration between the two stocks against the stocks themselves
being stationary. Although the alternative hypothesis does not imply 'real'
cointegration, because every linear combination of yt is stationary since yt is
already stationary, rejection of the null is taken as evidence of cointegration.
For the second test, we use the second row of table 4.2. We test the null
of no cointegration between the two stocks against the alternative of a single
cointegrating relation.
For the third test, we use the first row of either table. We test the null
of one cointegrating relation against the stock prices being stationary already. Basically we test if the relation is a ’real’ cointegrating relation.
If the third null hypothesis is rejected, the test indicates that there are two cointegrating relations, which means the stock prices themselves are stationary.
As we saw in section 4.2, we do not think that stock prices are stationary,
but if they are, we can trade them as a pair like any other pair. We could
even trade each stock as a spread process; that is, we apply the trading
strategy to the price process instead of the spread process. But this would
not be cash and market neutral anymore, and it is seen as far more risky. So
with two stocks in a pair, we would like there to be one or two cointegrating
relations, but we expect there is only one. In chapter 7 the results of the
different tests for the 10 pairs are given. They are compared with the results
from the Engle-Granger method.
In the previous section it was stated that the Johansen method has an advantage over the Engle-Granger method when there are more than
two stocks in a pair, n > 2. With Johansen we do not impose the first element of the cointegrating relation to be unity; we normalize the estimated
cointegrating relation such that the first element is unity or, as Johansen proposed, such that â′i Σ̂vv âi = 1. With three stocks in a pair, we
would like there to be one, two or three cointegrating relations, but we expect
that there are no more than two. With our pairs trading strategy it does not
matter how many relations there are as long as the stocks are cointegrated,
because we only trade one relation. This relation will be the eigenvector
corresponding to the largest eigenvalue of the matrix in (4.12) because, according to Hamilton [6], this results in the most stationary spread process.
4.5 Alternative method
In this section a start is made with an alternative method. Assume, as in the Engle-Granger and Johansen methods, that the price processes $x_t$ and $y_t$ are integrated of order one: $x_t, y_t \sim I(1)$. Denote by $z_t$ the vector of the differences of these price processes:
\[ z_t = \begin{pmatrix} x_t - x_{t-1} \\ y_t - y_{t-1} \end{pmatrix}. \]
Then each component of $z_t$ is I(0), i.e. stationary. Notice that
\[ \begin{pmatrix} x_t - x_0 \\ y_t - y_0 \end{pmatrix} = \sum_{i=1}^{t} z_i. \]
Two price processes are cointegrated if a linear combination of them is stationary, i.e. has constant mean, constant variance and autocovariances that do not depend on $t$. In this section we would like to find out whether $z_t$ can be represented as a VAR(p) or as a vector MA(q) process.
Engle and Granger showed that a cointegrated system can never be represented by a finite-order vector autoregression in the differenced data $\Delta y_t = z_t$. The outline of the argument is that if $z_t$ is causal, i.e. $z_t$ can be written as a linear combination of past innovations, and $(x_t, y_t)$ are cointegrated, then $z_t$ is non-invertible. This implies that if $(x_t, y_t)$ are cointegrated, $z_t$ cannot be represented by a VAR(p).
If we assume $z_t$ to be a vector MA(q) process, we can find restrictions on the parameters of the model ensuring that a stationary linear combination of $(x_t, y_t)$ exists. Let us examine this for $q = 2$:
\[ z_t = \Theta_2 w_{t-2} + \Theta_1 w_{t-1} + \Theta_0 w_t, \]
where $w_t$ is i.i.d. $N_2(0, \Sigma)$ and $\Theta_0 = I$. Notice that an MA(q) process is always stationary.
Then
\[ \sum_{i=1}^{t} z_i = \Theta_2 w_{-1} + (\Theta_2 + \Theta_1) w_0 + (\Theta_2 + \Theta_1 + \Theta_0) \sum_{i=1}^{t-2} w_i + (\Theta_1 + \Theta_0) w_{t-1} + \Theta_0 w_t. \]
If $v$ is a cointegrating vector, i.e. a vector such that $v \sum_{i=1}^{t} z_i$ is stationary, then every multiple of $v$ is also a cointegrating vector. We can apply some kind of normalization so that we can write $v = [-\alpha \; 1]$. For $t > 2$
\[ \begin{aligned} (y_t - \alpha x_t) - (y_0 - \alpha x_0) = [-\alpha \; 1] \sum_{i=1}^{t} z_i &= [-\alpha \; 1]\Theta_2 w_{-1} + [-\alpha \; 1](\Theta_2 + \Theta_1) w_0 && \text{(begin)} \\ &\quad + [-\alpha \; 1](\Theta_2 + \Theta_1 + \Theta_0) \sum_{i=1}^{t-2} w_i && \text{(middle)} \\ &\quad + [-\alpha \; 1](\Theta_1 + \Theta_0) w_{t-1} + [-\alpha \; 1]\Theta_0 w_t && \text{(end)} \end{aligned} \tag{4.17} \]
The mean of (4.17) is constant for every $\Theta_1$, $\Theta_2$ and $\alpha$. The variance, however, is not. The number of terms in (begin) and (end) is the same for every $t$, so only the variance of the (middle) part of (4.17) depends on $t$. To resolve this, $\Theta_2$, $\Theta_1$ and $\alpha$ have to satisfy:
\[ [-\alpha \; 1] (\Theta_2 + \Theta_1 + \Theta_0) = 0. \]
The matrix $(\Theta_2 + \Theta_1 + \Theta_0)$ must have an eigenvalue zero with eigenvector $[-\alpha \; 1]$. Then (4.17) is a stationary process.
The same argument goes for $q > 2$. So if the difference process $z_t$ is assumed to be an MA(q), then for $(x_t, y_t)$ to be cointegrated the parameters have to satisfy:
\[ \text{the matrix } (\Theta_q + \Theta_{q-1} + \cdots + \Theta_0) \text{ has eigenvalue } 0. \tag{4.18} \]
The corresponding eigenvector is the cointegrating relation. Now we have a method to generate cointegrated data that is unlikely to satisfy the assumptions of either the Engle-Granger method or the Johansen method: Engle-Granger assumes that $y_t - \alpha x_t$ is an AR(p) process, and Johansen assumes that the vector $(x_t, y_t)$ is a VAR(p). In section 6.4 we will see whether the Engle-Granger method is robust enough to identify data generated in the way described here as cointegrated.
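As an illustration, restriction (4.18) can be used to generate cointegrated data directly. The sketch below (Python, with arbitrarily chosen Θ matrices and a hypothetical α = 0.8, not taken from the text) builds Θ1 so that Θ2 + Θ1 + Θ0 has the left null vector [−α 1], simulates the MA(2) differences, and sums them to price levels; the spread y_t − αx_t then stays bounded while x_t wanders.

```python
import numpy as np

rng = np.random.default_rng(0)

alpha = 0.8                      # hypothetical cointegration coefficient
v = np.array([-alpha, 1.0])      # cointegrating vector [-alpha 1]

# Build M = Theta_2 + Theta_1 + Theta_0 with v M = 0: take the columns of M
# proportional to (1, alpha)', which is orthogonal to v.
M = np.outer(np.array([1.0, alpha]), np.array([0.3, -0.5]))
Theta0 = np.eye(2)
Theta2 = np.array([[0.2, 0.1], [-0.1, 0.4]])   # arbitrary choice
Theta1 = M - Theta2 - Theta0
assert np.allclose(v @ (Theta2 + Theta1 + Theta0), 0.0)

# Simulate the MA(2) difference process and integrate it to price levels.
T = 5000
w = rng.normal(size=(T + 2, 2))                               # w_t ~ N_2(0, I)
z = Theta2 @ w[:-2].T + Theta1 @ w[1:-1].T + Theta0 @ w[2:].T  # shape (2, T)
prices = np.cumsum(z, axis=1)                                 # x_0 = y_0 = 0
x, y = prices

spread = y - alpha * x           # stationary by construction
print(np.var(spread), np.var(x))
```

With these (illustrative) parameter choices the variance of the spread stays of order one, while the variance of the individual price path grows with the sample size.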
It should be possible to construct a new method for testing for cointegration. With real data we can obviously determine the difference process $z_t$. It is, however, quite difficult to estimate the parameters of the MA(q) with only 500 observations, especially when $q$ becomes large. But if we could, then we could base a hypothesis test on the estimated eigenvalue closest to zero of the estimated matrices. We do not pursue this in this report.
Chapter 5
Dickey-Fuller tests
In the literature that describes Dickey-Fuller tests there are a lot of differences in the critical values. Some authors do not state clearly which true model is used, so the null hypothesis is not clear. Sometimes it seems that different models are used at the same time, and sometimes the model is exactly the same but the critical values are just different. That is why this chapter discusses the asymptotic distributions of the (Augmented) Dickey-Fuller test statistic in order to find the critical values for this test. In other words, this chapter discusses the asymptotic distributions of OLS-estimated coefficients of unit root processes. They differ from those for stationary processes. The asymptotic distributions can be described in terms of functionals of Brownian motion. In the first section some notions and facts from probability theory, used to establish these distributions, are stated. In the next three sections the asymptotic distribution of the estimated coefficients for a first-order autoregression when the true process is a random walk is derived, i.e., the asymptotic distribution of the DF test statistic for cases 1 to 3. These distributions turn out to depend on whether a constant is included in the estimated regression. In section 5.5 the power of the three different cases is investigated. In section 5.6 the properties of the estimated coefficients for a pth-order autoregression are derived, i.e., the distributions of the ADF test statistics. The book of Hamilton [6] is used for the derivation of the asymptotic distributions; it clearly distinguishes the different models.
5.1 Notions/facts from probability theory
First we need some definitions and theorems. For the following three definitions we assume that $\{X_T\}$ is a sequence of random variables, $X$ is a random variable, and all of them are defined on the same probability space $(\Omega, \mathcal{F}, P)$.

Convergence almost surely:
The sequence of random variables $\{X_T\}_{T=1}^{\infty}$ converges almost surely to the random variable $X$ if
\[ P\left(\{\omega \in \Omega : \lim_{T \to \infty} X_T(\omega) = X(\omega)\}\right) = 1. \]
Notation: $X_T \to X$ a.s.

Convergence in probability:
The sequence of random variables $\{X_T\}_{T=1}^{\infty}$ converges in probability to the random variable $X$ if
\[ \forall \varepsilon > 0: \lim_{T \to \infty} P\left(\{\omega \in \Omega : |X_T(\omega) - X(\omega)| > \varepsilon\}\right) = 0. \]
Notation: $X_T \xrightarrow{P} X$.

Convergence in distribution:
The sequence of random variables $\{X_T\}_{T=1}^{\infty}$ converges in distribution to the random variable $X$ if for all bounded continuous functions $g$ it holds that
\[ \operatorname{E} g(X_T) \to \operatorname{E} g(X). \]
Notation: $X_T \xrightarrow{D} X$.
Central limit theorem:
Let $X_1, X_2, \ldots$ be a sequence of i.i.d. variables such that $\operatorname{E} X_1^2 < \infty$. Define $\operatorname{E} X_1 = \mu$ and $\operatorname{var}(X_1) = \sigma^2$. Then
\[ \sqrt{T}\,(\bar{X}_T - \mu) \xrightarrow{D} N(0, \sigma^2), \quad \text{for } T \to \infty, \]
where $\bar{X}_T = \frac{1}{T}\sum_{t=1}^{T} X_t$.
Law of large numbers:
Let $X_1, X_2, \ldots$ be a sequence of i.i.d. variables such that $\operatorname{E}|X_t| < \infty$, then
\[ \frac{1}{T}\sum_{t=1}^{T} X_t \to \operatorname{E} X_1 \quad \text{a.s. for } T \to \infty. \]

Continuous mapping theorem (random vectors):
Let $X_1, X_2, \ldots$ be a sequence of random $(n \times 1)$ vectors with $X_T \xrightarrow{D} X$, and let $g : \mathbb{R}^n \to \mathbb{R}^m$ be a continuous function, then
\[ g(X_T) \xrightarrow{D} g(X). \]

A similar result holds for sequences of random functions:

Continuous mapping theorem (random functions):
Let $\{S_T(\cdot)\}_{T=1}^{\infty}$ and $S(\cdot)$ be random functions, such that $S_T(\cdot) \xrightarrow{D} S(\cdot)$, and let $g$ be a continuous functional, then
\[ g(S_T(\cdot)) \xrightarrow{D} g(S(\cdot)). \]
Definition Brownian motion:
Standard Brownian motion $W(\cdot)$ is a continuous-time stochastic process, associating each time point $t \in [0, 1]$ with the scalar $W(t)$, such that:
(i) $W(0) = 0$;
(ii) for any time points $0 \le t_1 \le t_2 \le \ldots \le t_k \le 1$, the increments $[W(t_2) - W(t_1)], [W(t_3) - W(t_2)], \ldots, [W(t_k) - W(t_{k-1})]$ are independent multivariate Gaussian with $[W(s) - W(t)] \sim N(0, s - t)$;
(iii) $W(t)$ is continuous in $t$ with probability 1.
Although $W(t)$ is continuous in $t$, it cannot be differentiated using standard calculus: the direction of change at $t$ is likely to be completely different from that at $t + \delta$, no matter how small we make $\delta$.
Now we would like to derive something that is known as the functional central limit theorem. Let $u_t$ be i.i.d. variables with mean zero and finite variance $\sigma^2$. Given a sample size $T$, we can construct a variable $X_T(r)$ from the sample mean of the first $r$th fraction of observations, $r \in [0, 1]$, defined by
\[ X_T(r) = \frac{1}{T}\sum_{t=1}^{\lfloor Tr \rfloor} u_t, \]
where $\lfloor Tr \rfloor$ denotes the largest integer that is less than or equal to $T$ times $r$. For any given realization, $X_T(r)$ is a step function in $r$, with
\[ X_T(r) = \begin{cases} 0 & \text{for } 0 \le r < 1/T, \\ u_1/T & \text{for } 1/T \le r < 2/T, \\ (u_1 + u_2)/T & \text{for } 2/T \le r < 3/T, \\ \quad\vdots & \\ (u_1 + \cdots + u_T)/T & \text{for } r = 1. \end{cases} \]
Then
\[ \sqrt{T}\, X_T(r) = \frac{1}{\sqrt{T}}\sum_{t=1}^{\lfloor Tr \rfloor} u_t = \frac{\sqrt{\lfloor Tr \rfloor}}{\sqrt{T}} \cdot \frac{1}{\sqrt{\lfloor Tr \rfloor}}\sum_{t=1}^{\lfloor Tr \rfloor} u_t. \]
By the central limit theorem
\[ \frac{1}{\sqrt{\lfloor Tr \rfloor}}\sum_{t=1}^{\lfloor Tr \rfloor} u_t \xrightarrow{D} N(0, \sigma^2), \]
while $\sqrt{\lfloor Tr \rfloor}/\sqrt{T} \to \sqrt{r}$. Hence the asymptotic distribution of $\sqrt{T}\, X_T(r)$ is that of $\sqrt{r}$ times a $N(0, \sigma^2)$ random variable, or
\[ \sqrt{T}\,[X_T(r)/\sigma] \xrightarrow{D} N(0, r). \]
Consider the behavior of a sample mean based on observations $\lfloor Tr_1 \rfloor$ through $\lfloor Tr_2 \rfloor$ for $r_2 > r_1$; then we can conclude that this too is asymptotically normal:
\[ \sqrt{T}\,[X_T(r_2) - X_T(r_1)]/\sigma \xrightarrow{D} N(0, r_2 - r_1). \]
More generally, the sequence of stochastic functions $\{\sqrt{T}\, X_T(\cdot)/\sigma\}_{T=1}^{\infty}$ has an asymptotic probability law that is described by standard Brownian motion $W(\cdot)$:
\[ \sqrt{T}\, X_T(\cdot)/\sigma \xrightarrow{D} W(\cdot). \tag{5.1} \]
There is a difference between the expressions $X_T(\cdot)$ and $X_T(r)$: the first denotes a random function, while the second denotes the value that function assumes at time $r$, which is a random variable. Result (5.1) is known as the functional central limit theorem. The derivation here assumed that $u_t$ was i.i.d.
Proposition 1:
Suppose that $z_t$ follows a random walk without drift,
\[ z_t = z_{t-1} + u_t, \]
where $z_0 = 0$ and $u_t$ is i.i.d. with mean zero and finite variance $\sigma^2$. Then
\[ \begin{aligned} &\text{(i)} & T^{-1/2}\sum_{t=1}^{T} u_t &\xrightarrow{D} \sigma W(1), \\ &\text{(ii)} & T^{-3/2}\sum_{t=1}^{T} z_{t-1} &\xrightarrow{D} \sigma \int_0^1 W(r)\,dr, \\ &\text{(iii)} & T^{-2}\sum_{t=1}^{T} z_{t-1}^2 &\xrightarrow{D} \sigma^2 \int_0^1 W(r)^2\,dr, \\ &\text{(iv)} & T^{-1}\sum_{t=1}^{T} z_{t-1} u_t &\xrightarrow{D} \sigma^2 \left(W(1)^2 - 1\right)/2. \end{aligned} \]
Proof of proposition 1:
(i) follows from the central limit theorem. W (1) denotes a random variable with a N (0, 1) distribution, so σW (1) denotes a random variable with
a N (0, σ 2 ) distribution.
(ii): Note that $X_T(r)$ can be written as
\[ X_T(r) = \begin{cases} 0 & \text{for } 0 \le r < 1/T, \\ z_1/T & \text{for } 1/T \le r < 2/T, \\ z_2/T & \text{for } 2/T \le r < 3/T, \\ \quad\vdots & \\ z_T/T & \text{for } r = 1. \end{cases} \]
The area under this step function is the sum of $T$ rectangles, each with width $1/T$:
\[ \int_0^1 X_T(r)\,dr = z_1/T^2 + \cdots + z_{T-1}/T^2. \]
Multiplying both sides by $\sqrt{T}$:
\[ \int_0^1 \sqrt{T}\, X_T(r)\,dr = T^{-3/2}\sum_{t=1}^{T} z_{t-1}. \]
Statement (ii) follows by the functional central limit theorem and the continuous mapping theorem.
(iii): Define $S_T(r)$ as
\[ S_T(r) = T\,[X_T(r)]^2. \]
This can be written as
\[ S_T(r) = \begin{cases} 0 & \text{for } 0 \le r < 1/T, \\ z_1^2/T & \text{for } 1/T \le r < 2/T, \\ z_2^2/T & \text{for } 2/T \le r < 3/T, \\ \quad\vdots & \\ z_T^2/T & \text{for } r = 1. \end{cases} \]
It follows that
\[ \int_0^1 S_T(r)\,dr = z_1^2/T^2 + \cdots + z_{T-1}^2/T^2. \]
By the continuous mapping theorem:
\[ S_T(r) = \left[\sqrt{T}\, X_T(r)\right]^2 \xrightarrow{D} \sigma^2\,[W(\cdot)]^2. \]
Again applying this theorem:
\[ \int_0^1 S_T(r)\,dr \xrightarrow{D} \sigma^2 \int_0^1 W(r)^2\,dr, \]
which gives statement (iii).
(iv): Note that for a random walk
\[ z_t^2 = (z_{t-1} + u_t)^2 = z_{t-1}^2 + 2 z_{t-1} u_t + u_t^2; \]
summing over $t = 1, 2, \ldots, T$ results in
\[ \sum_{t=1}^{T} z_{t-1} u_t = \tfrac{1}{2}(z_T^2 - z_0^2) - \tfrac{1}{2}\sum_{t=1}^{T} u_t^2. \]
Recall that $z_0 = 0$; dividing by $T$ gives
\[ T^{-1}\sum_{t=1}^{T} z_{t-1} u_t = \frac{z_T^2}{2T} - \frac{1}{2T}\sum_{t=1}^{T} u_t^2 = \frac{S_T(1)}{2} - \frac{1}{2T}\sum_{t=1}^{T} u_t^2. \]
But $S_T(1) \xrightarrow{D} \sigma^2 W(1)^2$ and by the law of large numbers $T^{-1}\sum_{t=1}^{T} u_t^2 \xrightarrow{P} \sigma^2$, which proves (iv).
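The statements of proposition 1 can be checked with a small Monte Carlo experiment. A possible sketch (Python, with sample size and replication count chosen arbitrarily) approximates the means of the limits in (iii) and (iv); since $\operatorname{E}\int_0^1 W(r)^2 dr = 1/2$ and $\operatorname{E}(W(1)^2 - 1)/2 = 0$, the simulated means should be near $\sigma^2/2$ and $0$.

```python
import numpy as np

rng = np.random.default_rng(1)

T, reps, sigma = 1000, 2000, 1.0
stat3 = np.empty(reps)   # T^{-2} sum z_{t-1}^2
stat4 = np.empty(reps)   # T^{-1} sum z_{t-1} u_t
for i in range(reps):
    u = rng.normal(scale=sigma, size=T)
    z = np.cumsum(u)                          # random walk, z_0 = 0
    z_lag = np.concatenate(([0.0], z[:-1]))
    stat3[i] = (z_lag ** 2).sum() / T ** 2
    stat4[i] = (z_lag * u).sum() / T

# Expected limits: sigma^2 * E int W^2 = 1/2 and sigma^2 * E (W(1)^2 - 1)/2 = 0.
print(stat3.mean(), stat4.mean())
```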
Now we are ready to derive some asymptotic properties of OLS estimators of AR(1) processes when there is a unit root.
5.2 Dickey-Fuller case 1 test
Consider an AR(1) process
\[ z_t = \rho z_{t-1} + u_t, \quad \text{for } t = 1, \ldots, T, \tag{5.2} \]
with $\rho \ge 0$ and where $u_t$ is i.i.d. with mean zero and finite variance $\sigma^2$. The OLS estimate of $\rho$ is given by
\[ \hat{\rho} = \frac{\sum_{t=1}^{T} z_{t-1} z_t}{\sum_{t=1}^{T} z_{t-1}^2}. \]
The t statistic $S$, used for testing the null hypothesis that $\rho$ is equal to some particular value $\rho_0$, is given by
\[ S = \frac{\hat{\rho} - \rho_0}{\hat{\sigma}_{\hat{\rho}}}, \]
where $\hat{\sigma}_{\hat{\rho}}$ is the standard error of the OLS estimate of $\rho$:
\[ \hat{\sigma}_{\hat{\rho}} = \left( r_T^2 \Big/ \sum_{t=1}^{T} z_{t-1}^2 \right)^{1/2} \]
with
\[ r_T^2 = \frac{1}{T-1}\sum_{t=1}^{T} (z_t - \hat{\rho} z_{t-1})^2. \]
When (5.2) is stationary, i.e. $\rho < 1$, $S$ has a limiting Gaussian distribution:
\[ S \xrightarrow{D} N(0, 1). \]
But Dickey-Fuller tests the null hypothesis that $\rho = 1$, so we would like to know the limiting distribution of $S$ when $\rho = 1$. Then we can write $S$ as:
\[ S = \frac{\hat{\rho} - 1}{\hat{\sigma}_{\hat{\rho}}} = \frac{\hat{\rho} - 1}{\left( r_T^2 \big/ \sum_{t=1}^{T} z_{t-1}^2 \right)^{1/2}}. \tag{5.3} \]
The numerator of (5.3) can be written as:
\[ \hat{\rho} - 1 = \frac{\sum_{t=1}^{T} z_{t-1} u_t}{\sum_{t=1}^{T} z_{t-1}^2}. \tag{5.4} \]
Substituting this in (5.3):
\[ S = \frac{\sum_{t=1}^{T} z_{t-1} u_t}{\left(\sum_{t=1}^{T} z_{t-1}^2\right)^{1/2} (r_T^2)^{1/2}} = \frac{T^{-1}\sum_{t=1}^{T} z_{t-1} u_t}{\left(T^{-2}\sum_{t=1}^{T} z_{t-1}^2\right)^{1/2} (r_T^2)^{1/2}}. \]
Apart from the initial term $z_0$, which does not affect the asymptotic distribution (unfortunately it can affect the finite-sample distributions, as we will see later on), the variable $z_t$ is the same as in proposition 1. So it follows from proposition 1 (iii) and (iv), together with $r_T^2 \xrightarrow{P} \sigma^2$, that as $T \to \infty$:
\[ S \xrightarrow{D} \frac{\sigma^2\left(W(1)^2 - 1\right)/2}{(\sigma^2)^{1/2}\left(\sigma^2 \int_0^1 W(r)^2\,dr\right)^{1/2}} = \frac{\tfrac{1}{2}\left(W(1)^2 - 1\right)}{\left(\int_0^1 W(r)^2\,dr\right)^{1/2}}. \tag{5.5} \]
In conclusion, when the true model is a random walk without a constant
term (ρ = 1, c = 0) and we only estimate ρ and not a constant, basically a
regression without intercept, the t statistic S has limiting distribution (5.5).
This test statistic is referred to as the Dickey-Fuller case 1 test statistic.
Note that W (1) has a N (0, 1) distribution, meaning that W (1)2 has a χ2 (1)
distribution.
We can approximate this asymptotic distribution and the corresponding critical values by simulating a lot of paths $W$ on the interval $[0, 1]$:
Divide the interval $[0,1]$ in $n$ equal pieces.
Take $u_1, u_2, \ldots, u_n$ i.i.d. from a $N(0, 1/n)$ distribution.
Set $W(0) = 0$.
Build path $W$ by: $W(\tfrac{i}{n}) = W(\tfrac{i-1}{n}) + u_i$ for $i = 1, 2, \ldots, n$.
For each path the fraction on the right-hand side of (5.5) can be calculated, approximating the integrals with Riemann sums. Then the density of $S$ can be estimated by applying a Gaussian kernel estimator to all these values. Figure 5.1 shows the estimated density for 5,000 paths and $n = 500$.
Figure 5.1: Asymptotic density of DF case 1 test statistic.
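The recipe above is straightforward to implement. A minimal sketch (Python, using 5,000 paths and n = 500 as in the text) evaluates the fraction in (5.5) with Riemann sums and reads off the quantiles:

```python
import numpy as np

rng = np.random.default_rng(2)

n, paths = 500, 5000
# Each row is one Brownian path on [0, 1]: cumulative sums of N(0, 1/n) steps.
steps = rng.normal(scale=np.sqrt(1.0 / n), size=(paths, n))
W = np.cumsum(steps, axis=1)

W1 = W[:, -1]                                # W(1)
int_W2 = (W ** 2).mean(axis=1)               # Riemann sum for int_0^1 W(r)^2 dr
S = 0.5 * (W1 ** 2 - 1) / np.sqrt(int_W2)    # right-hand side of (5.5)

# 1%, 5% and 10% critical values (compare with table 5.1).
print(np.percentile(S, [1, 5, 10]))
```

The three printed quantiles should be close to the tabulated values −2.58, −1.95 and −1.62, up to Monte Carlo noise.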
We can approximate the 1%, 5% and 10% critical values by calculating the
corresponding quantiles of all the calculated fractions. Table 5.1 shows the
critical values according to this simulation and the values according to Hamilton [6].
Table 5.1: Critical values for DF case 1.

              1%     5%     10%
Hamilton    -2.58  -1.95  -1.62
simulation  -2.56  -1.95  -1.60
These critical values belong to the asymptotic distribution (5.5), which describes the distribution of the DF case 1 test statistic if the sample size T
goes to infinity.
We approximate the critical values for finite sample sizes $T$ by simulating in a different way:
Take $u_1, \ldots, u_T$ i.i.d. from a $N(0, \sigma^2)$ distribution.
Set $z_0 = 0$.
Build path $z_t$: $z_t = z_{t-1} + u_t$, for $t = 1, \ldots, T$.
Calculate $\hat{\rho}$.
Calculate $\hat{\sigma}_{\hat{\rho}}$.
Calculate the test statistic: $(\hat{\rho} - 1)/\hat{\sigma}_{\hat{\rho}}$.
Repeat the preceding steps 5,000 times.
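These steps translate directly into code. A possible sketch (Python, with T = 500, σ² = 1 and 5,000 replications, matching one column of table 5.2; the helper name is illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def df_case1_stat(z):
    """DF case 1 t statistic for a path z with z_0 = 0 (regression without intercept)."""
    z_lag = np.concatenate(([0.0], z[:-1]))
    rho_hat = (z_lag * z).sum() / (z_lag ** 2).sum()
    T = len(z)
    r2 = ((z - rho_hat * z_lag) ** 2).sum() / (T - 1)
    se = np.sqrt(r2 / (z_lag ** 2).sum())
    return (rho_hat - 1.0) / se

T, reps, sigma = 500, 5000, 1.0
stats = np.array([df_case1_stat(np.cumsum(rng.normal(scale=sigma, size=T)))
                  for _ in range(reps)])
# Finite-sample 1%, 5%, 10% critical values (compare with table 5.2).
print(np.percentile(stats, [1, 5, 10]))
```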
We can approximate the 1%, 5% and 10% critical values by calculating the
corresponding quantiles of the simulated test statistics. For finite T , the
critical values are exact only under the assumption of Gaussian innovations.
As T becomes large, these values also describe the asymptotic distribution
for non-Gaussian innovations. Table 5.2 shows the critical values according
to this simulation for different values of T and σ 2 . Table 5.3 shows the critical
values according to Hamilton [6]. The critical values should be independent of $\sigma$. Table 5.2 shows roughly the same values for different $\sigma^2$, but as $\sigma^2$ becomes large there is more dispersion. Figure 5.2 shows the estimated density of the simulated test statistics for different values of $\sigma^2$ and $T = 500$; the graph of figure 5.1 is also displayed.
Table 5.2: Simulated critical values for DF case 1.

           σ² = 1                 σ² = 5                 σ² = 10
 T      1%     5%    10%      1%     5%    10%      1%     5%    10%
100   -2.61  -1.98  -1.63   -2.62  -1.98  -1.63   -2.55  -1.94  -1.61
250   -2.61  -1.96  -1.62   -2.56  -1.94  -1.59   -2.54  -1.93  -1.59
500   -2.56  -1.95  -1.61   -2.59  -1.95  -1.62   -2.59  -2.00  -1.63

Table 5.3: Hamilton's critical values DF case 1.

 T      1%     5%    10%
100   -2.60  -1.95  -1.61
250   -2.58  -1.95  -1.62
500   -2.58  -1.95  -1.62

Figure 5.2: Estimated density of DF case 1 for different σ² and T = 500.
The initial term $z_0$ does not affect the asymptotic distribution. Unfortunately it does affect the distribution when the sample size is finite. With Dickey-Fuller case 1, we basically fit a line that goes through the origin. If the initial term is large, the slope of this line, $\hat{\rho}$, is closer to one than when the initial term is small. The standard error of $\hat{\rho}$, $\hat{\sigma}_{\hat{\rho}}$, is a lot smaller for a large initial value than for a small initial value. That is why the test statistic for a large initial value is likely to be larger than the test statistic for a small initial value. The estimated densities for initial values $z_0 = 0, 1, 10, 50, 100, 500$ are shown in figure 5.3. The solid lines correspond to $z_0 = 0, 1, 10$; the dashed lines correspond to $z_0 = 50, 100, 500$. We see a shift to the right as the initial value increases. The density found by simulating Brownian motion is not displayed; it lies among the three solid lines.
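This shift can be reproduced directly. A small sketch (Python, with illustrative sample and replication sizes, comparing z0 = 0 with z0 = 500) shows the mean of the case 1 statistic moving to the right for the large initial value:

```python
import numpy as np

rng = np.random.default_rng(4)

def df_case1_stat(z, z0):
    """DF case 1 t statistic for a path z with initial value z0 (no intercept)."""
    z_lag = np.concatenate(([z0], z[:-1]))
    rho_hat = (z_lag * z).sum() / (z_lag ** 2).sum()
    T = len(z)
    r2 = ((z - rho_hat * z_lag) ** 2).sum() / (T - 1)
    return (rho_hat - 1.0) / np.sqrt(r2 / (z_lag ** 2).sum())

T, reps = 500, 2000
means = {}
for z0 in (0.0, 500.0):
    stats = [df_case1_stat(z0 + np.cumsum(rng.normal(size=T)), z0)
             for _ in range(reps)]
    means[z0] = np.mean(stats)

# Mean for z0 = 500 sits clearly to the right of the mean for z0 = 0.
print(means[0.0], means[500.0])
```

For a very large initial value the statistic is, in fact, close to standard normal, so its mean is near zero, while the Dickey-Fuller distribution for z0 = 0 has a clearly negative mean.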
Figure 5.3: Estimated density of DF case 1 for different z0 and T = 500.
5.3 Dickey-Fuller case 2 test
In this section we consider the AR(1) process with a constant:
\[ z_t = c + \rho z_{t-1} + u_t, \quad \text{for } t = 1, \ldots, T, \]
where $u_t$ is i.i.d. with mean zero and finite variance $\sigma^2$. We are interested in the properties of the test statistic $S = (\hat{\rho} - 1)/\hat{\sigma}_{\hat{\rho}}$ under the null hypothesis that $c = 0$ and $\rho = 1$. The OLS estimates are given by
\[ \begin{bmatrix} \hat{c} \\ \hat{\rho} \end{bmatrix} = \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum z_t \\ \sum z_{t-1} z_t \end{bmatrix}, \]
where $\Sigma$ denotes summation over $t = 1, \ldots, T$.
The deviation of the estimates from the true values is
\[ \begin{bmatrix} \hat{c} \\ \hat{\rho} - 1 \end{bmatrix} = \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum u_t \\ \sum z_{t-1} u_t \end{bmatrix}. \tag{5.6} \]
The estimates $\hat{c}$ and $\hat{\rho}$ have different rates of convergence; a scaling matrix $Y$ is helpful in describing their limiting distributions. Note that if
\[ v = A^{-1} w, \]
then
\[ Y v = Y A^{-1} w = Y A^{-1} Y Y^{-1} w = (Y^{-1} A Y^{-1})^{-1} Y^{-1} w. \tag{5.7} \]
Here we use the scaling matrix
\[ Y = \begin{bmatrix} T^{1/2} & 0 \\ 0 & T \end{bmatrix}. \]
With (5.7), equation (5.6) results in
\[ \begin{bmatrix} T^{1/2}\hat{c} \\ T(\hat{\rho} - 1) \end{bmatrix} = \begin{bmatrix} 1 & T^{-3/2}\sum z_{t-1} \\ T^{-3/2}\sum z_{t-1} & T^{-2}\sum z_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} T^{-1/2}\sum u_t \\ T^{-1}\sum z_{t-1} u_t \end{bmatrix}. \tag{5.8} \]
From the proposition in section 5.1 it follows that the first term on the right side of (5.8) converges to
\[ \begin{bmatrix} 1 & T^{-3/2}\sum z_{t-1} \\ T^{-3/2}\sum z_{t-1} & T^{-2}\sum z_{t-1}^2 \end{bmatrix} \xrightarrow{D} \begin{bmatrix} 1 & \sigma\!\int\! W(r)dr \\ \sigma\!\int\! W(r)dr & \sigma^2\!\int\! W(r)^2 dr \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix} \begin{bmatrix} 1 & \int W(r)dr \\ \int W(r)dr & \int W(r)^2 dr \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix}, \tag{5.9} \]
where the integral sign denotes integration over $r$ from 0 to 1. The second term on the right side of (5.8) converges to
\[ \begin{bmatrix} T^{-1/2}\sum u_t \\ T^{-1}\sum z_{t-1} u_t \end{bmatrix} \xrightarrow{D} \begin{bmatrix} \sigma W(1) \\ \sigma^2 (W(1)^2 - 1)/2 \end{bmatrix} = \sigma \begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix} \begin{bmatrix} W(1) \\ (W(1)^2 - 1)/2 \end{bmatrix}. \tag{5.10} \]
Substituting (5.9) and (5.10) into (5.8) establishes
\[ \begin{bmatrix} T^{1/2}\hat{c} \\ T(\hat{\rho} - 1) \end{bmatrix} \xrightarrow{D} \begin{bmatrix} \sigma & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \int W(r)dr \\ \int W(r)dr & \int W(r)^2 dr \end{bmatrix}^{-1} \begin{bmatrix} W(1) \\ (W(1)^2 - 1)/2 \end{bmatrix}. \tag{5.11} \]
The second element of the vector in (5.11) states that
\[ T(\hat{\rho} - 1) \xrightarrow{D} \frac{\tfrac{1}{2}\left(W(1)^2 - 1\right) - W(1)\int W(r)dr}{\int W(r)^2 dr - \left[\int W(r)dr\right]^2}. \tag{5.12} \]
We would like to know the properties of the t statistic $S$:
\[ S = \frac{\hat{\rho} - 1}{\hat{\sigma}_{\hat{\rho}}}, \]
where
\[ \hat{\sigma}_{\hat{\rho}}^2 = r_T^2 \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \qquad r_T^2 = \frac{1}{T-2}\sum_{t=1}^{T} (z_t - \hat{c} - \hat{\rho} z_{t-1})^2. \tag{5.13} \]
If we multiply both sides of (5.13) by $T^2$, the result can be written as
\[ T^2 \hat{\sigma}_{\hat{\rho}}^2 = r_T^2 \begin{bmatrix} 0 & 1 \end{bmatrix} Y \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} Y \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \]
From (5.8) it follows that
\[ Y \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} Y \xrightarrow{D} \begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix}^{-1} \begin{bmatrix} 1 & \int W(r)dr \\ \int W(r)dr & \int W(r)^2 dr \end{bmatrix}^{-1} \begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix}^{-1}. \tag{5.14} \]
From equation (5.14) and $r_T^2 \xrightarrow{P} \sigma^2$ it follows that
\[ T^2 \hat{\sigma}_{\hat{\rho}}^2 \xrightarrow{D} \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \int W(r)dr \\ \int W(r)dr & \int W(r)^2 dr \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \frac{1}{\int W(r)^2 dr - \left[\int W(r)dr\right]^2}. \]
Finally, the asymptotic distribution of the test statistic $S$ is
\[ S = \frac{T(\hat{\rho} - 1)}{\left[T^2 \hat{\sigma}_{\hat{\rho}}^2\right]^{1/2}} \xrightarrow{D} \frac{\left(W(1)^2 - 1\right)/2 - W(1)\int W(r)dr}{\left(\int W(r)^2 dr - \left[\int W(r)dr\right]^2\right)^{1/2}}. \tag{5.15} \]
In conclusion, when the true model is a random walk without a constant
term (ρ = 1, c = 0) but we do estimate a constant c and of course ρ, the
t test statistic S has the asymptotic distribution described by (5.15). This
test statistic is referred to as the Dickey-Fuller case 2 test statistic.
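For reference, the case 2 statistic is just the usual OLS t statistic for ρ = 1 in a regression with an intercept. A hedged sketch (Python; `df_case2_stat` is an illustrative helper, with T = 500 and 5,000 replications) that reproduces critical values close to the tabulated ones:

```python
import numpy as np

rng = np.random.default_rng(5)

def df_case2_stat(z, z0=0.0):
    """DF case 2 t statistic: OLS regression of z_t on (1, z_{t-1})."""
    z_lag = np.concatenate(([z0], z[:-1]))
    X = np.column_stack([np.ones_like(z), z_lag])
    (c_hat, rho_hat), *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - c_hat - rho_hat * z_lag
    r2 = (resid ** 2).sum() / (len(z) - 2)
    cov = r2 * np.linalg.inv(X.T @ X)
    return (rho_hat - 1.0) / np.sqrt(cov[1, 1])

T, reps = 500, 5000
stats = np.array([df_case2_stat(np.cumsum(rng.normal(size=T)))
                  for _ in range(reps)])
# Roughly -3.4, -2.9, -2.6 at the 1%, 5%, 10% levels.
print(np.percentile(stats, [1, 5, 10]))
```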
We can find this asymptotic distribution and the corresponding critical values by simulating a lot of paths W in the same way as in the preceding
paragraph. The results are shown in figure 5.4 and table 5.4. Figure 5.4
shows that the distribution of the DF case 2 statistic is shifted more to the
left than the DF case 1 statistic.
Figure 5.4: Asymptotic density of DF case 2 test statistic.
Table 5.4: Critical values for DF case 2.

              1%     5%     10%
Hamilton    -3.43  -2.86  -2.57
simulation  -3.43  -2.85  -2.59
These critical values belong to the asymptotic distribution (5.15), which describes the distribution of the DF case 2 test statistic if the sample size $T$ goes to infinity. We find the critical values for finite sample sizes $T$ by simulating paths $z_t$ as in the preceding section. The only difference is the way we calculate $\hat{\rho}$. Again we simulate for different values of $\sigma^2$. The results are shown in table 5.5; table 5.6 shows the critical values for DF case 2 according to Hamilton [6]. Figure 5.5 shows the estimated density of the simulated test statistics for different values of $\sigma^2$ and $T = 500$.
Table 5.5: Simulated critical values for DF case 2.

           σ² = 1                 σ² = 5                 σ² = 10
 T      1%     5%    10%      1%     5%    10%      1%     5%    10%
100   -3.54  -2.85  -2.54   -3.50  -2.88  -2.56   -3.54  -2.89  -2.58
250   -3.40  -2.84  -2.53   -3.46  -2.87  -2.56   -3.46  -2.86  -2.56
500   -3.42  -2.86  -2.58   -3.40  -2.86  -2.55   -3.49  -2.87  -2.58

Table 5.6: Hamilton's critical values DF case 2.

 T      1%     5%    10%
100   -3.51  -2.89  -2.58
250   -3.46  -2.88  -2.57
500   -3.44  -2.87  -2.57
Figure 5.5: Estimated density of DF case 2 for different σ² and T = 500.
As in the preceding section, the initial term $z_0$ does not affect the asymptotic distribution. And fortunately, with case 2, it does not affect the finite-sample distribution either. With case 2 we estimate a constant even though it is not present in the true model; we basically fit a line which does not have to go through the origin. Then the slope of the line, $\hat{\rho}$, is not pulled closer to one when the initial value is large, as it is with case 1. That is why the test statistic for a large initial value is likely to be the same as the test statistic for a small initial value. The estimated densities for initial values $z_0 = 0, 1, 10, 50, 100, 500$ are shown in figure 5.6.
Figure 5.6: Estimated density of DF case 2 for different z0 and T = 500.
5.4 Dickey-Fuller case 3 test
In this section we again consider the AR(1) process with a constant,
\[ z_t = c + \rho z_{t-1} + u_t, \quad \text{for } t = 1, \ldots, T, \tag{5.16} \]
where $u_t$ is i.i.d. with mean zero and finite variance $\sigma^2$. We are interested in the properties of the test statistic $S = (\hat{\rho} - 1)/\hat{\sigma}_{\hat{\rho}}$ under the null hypothesis that $\rho = 1$ and $c \neq 0$.
The deviation of the estimates from the true values is
\[ \begin{bmatrix} \hat{c} - c \\ \hat{\rho} - 1 \end{bmatrix} = \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum u_t \\ \sum z_{t-1} u_t \end{bmatrix}, \tag{5.17} \]
where $\Sigma$ denotes summation over $t = 1, \ldots, T$. We examine the four different sum terms on the right side of (5.17) separately. First notice that (5.16) can be written as:
\[ z_t = z_0 + c\,t + (u_1 + u_2 + \ldots + u_t) = z_0 + c\,t + v_t, \]
where
\[ v_t = u_1 + \ldots + u_t, \quad \text{for } t = 1, \ldots, T, \quad \text{with } v_0 = 0. \]
Consider the behavior of the sum
\[ \sum_{t=1}^{T} z_{t-1} = \sum_{t=1}^{T} \left[ z_0 + c(t-1) + v_{t-1} \right]. \tag{5.18} \]
The first term in (5.18) is $T z_0$, so divided by $T$ it is a fixed value. The second term is equal to
\[ \sum_{t=1}^{T} c(t-1) = (T-1)T c/2. \]
In order to converge, this term has to be divided by $T^2$:
\[ \frac{1}{T^2}\sum_{t=1}^{T} c(t-1) \to c/2. \]
The third term in (5.18) converges when divided by $T^{3/2}$, according to proposition 1 (ii):
\[ T^{-3/2}\sum_{t=1}^{T} v_{t-1} \xrightarrow{D} \sigma \int_0^1 W(r)\,dr. \]
The order in probability of the three terms in (5.18) is:
\[ \sum_{t=1}^{T} z_{t-1} = \underbrace{\sum_{t=1}^{T} z_0}_{O(T)} + \underbrace{\sum_{t=1}^{T} c(t-1)}_{O(T^2)} + \underbrace{\sum_{t=1}^{T} v_{t-1}}_{O_p(T^{3/2})}. \]
The time trend $c(t-1)$ asymptotically dominates the other two components:
\[ \frac{1}{T^2}\sum_{t=1}^{T} z_{t-1} \xrightarrow{P} c/2. \]
In the same way, we have
\[ \sum_{t=1}^{T} z_{t-1}^2 = \sum_{t=1}^{T} \left[ z_0 + c(t-1) + v_{t-1} \right]^2 = \underbrace{\sum_{t=1}^{T} z_0^2}_{O(T)} + \underbrace{\sum_{t=1}^{T} c^2(t-1)^2}_{O(T^3)} + \underbrace{\sum_{t=1}^{T} 2 z_0 c(t-1)}_{O(T^2)} + \underbrace{\sum_{t=1}^{T} v_{t-1}^2}_{O_p(T^2)} + \underbrace{\sum_{t=1}^{T} 2 z_0 v_{t-1}}_{O_p(T^{3/2})} + \underbrace{\sum_{t=1}^{T} 2 c(t-1) v_{t-1}}_{O_p(T^{5/2})}, \]
where the order of the last term follows from
\[ T^{-5/2}\sum_{t=1}^{T} t\, v_{t-1} = T^{-3/2}\sum_{t=1}^{T} (t/T)\, v_{t-1} \xrightarrow{D} \sigma \int_0^1 r\, W(r)\,dr. \]
The time trend $c^2(t-1)^2$ is the only term that does not vanish asymptotically if we divide by $T^3$:
\[ \frac{1}{T^3}\sum_{t=1}^{T} z_{t-1}^2 \xrightarrow{P} c^2/3. \]
From the central limit theorem it follows that $\sum_{t=1}^{T} u_t$ is of order $O_p(T^{1/2})$. And finally
\[ \sum_{t=1}^{T} z_{t-1} u_t = \sum_{t=1}^{T} \left[ z_0 + c(t-1) + v_{t-1} \right] u_t = \underbrace{z_0 \sum_{t=1}^{T} u_t}_{O_p(T^{1/2})} + \underbrace{\sum_{t=1}^{T} c(t-1) u_t}_{O_p(T^{3/2})} + \underbrace{\sum_{t=1}^{T} v_{t-1} u_t}_{O_p(T)}, \]
from which
\[ T^{-3/2}\sum_{t=1}^{T} z_{t-1} u_t = T^{-3/2}\sum_{t=1}^{T} c(t-1) u_t + o_p(1). \]
This results in the deviations of the OLS estimates from their true values satisfying
\[ \begin{bmatrix} \hat{c} - c \\ \hat{\rho} - 1 \end{bmatrix} = \begin{bmatrix} O_p(T) & O_p(T^2) \\ O_p(T^2) & O_p(T^3) \end{bmatrix}^{-1} \begin{bmatrix} O_p(T^{1/2}) \\ O_p(T^{3/2}) \end{bmatrix}. \]
In this case the scaling matrix is
\[ Y = \begin{bmatrix} T^{1/2} & 0 \\ 0 & T^{3/2} \end{bmatrix}. \]
Using (5.7) we get
\[ \begin{bmatrix} T^{1/2}(\hat{c} - c) \\ T^{3/2}(\hat{\rho} - 1) \end{bmatrix} = \begin{bmatrix} 1 & T^{-2}\sum z_{t-1} \\ T^{-2}\sum z_{t-1} & T^{-3}\sum z_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} T^{-1/2}\sum u_t \\ T^{-3/2}\sum z_{t-1} u_t \end{bmatrix}, \tag{5.19} \]
where the first term on the right-hand side converges to
\[ \begin{bmatrix} 1 & T^{-2}\sum z_{t-1} \\ T^{-2}\sum z_{t-1} & T^{-3}\sum z_{t-1}^2 \end{bmatrix} \xrightarrow{P} \begin{bmatrix} 1 & c/2 \\ c/2 & c^2/3 \end{bmatrix} = A. \]
The second term of (5.19) satisfies
\[ \begin{bmatrix} T^{-1/2}\sum u_t \\ T^{-3/2}\sum z_{t-1} u_t \end{bmatrix} = \begin{bmatrix} T^{-1/2}\sum u_t \\ T^{-3/2}\sum c(t-1) u_t \end{bmatrix} + o_p(1). \tag{5.20} \]
Therefore
\[ \begin{bmatrix} T^{-1/2}\sum u_t \\ T^{-3/2}\sum z_{t-1} u_t \end{bmatrix} \xrightarrow{D} N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \sigma^2 \begin{bmatrix} 1 & c/2 \\ c/2 & c^2/3 \end{bmatrix} \right) = N(0, \sigma^2 A). \tag{5.21} \]
It follows from (5.19)-(5.21) that
· 1/2
¸
T (ĉ − c)
D
−→ N (0, A−1 σ 2 AA−1 ) = N (0, σ 2 A−1 )
3/2
T (ρ̂ − 1)
We would like to know the properties of the t statistic S:
$$S = \frac{\hat{\rho}-1}{\hat{\sigma}_{\hat{\rho}}} \,, \tag{5.22}$$
where
$$\hat{\sigma}_{\hat{\rho}}^2 = r_T^2 \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 1 \end{bmatrix} ,$$
with
$$r_T^2 = \frac{1}{T-2} \sum_{t=1}^T (z_t - \hat{c} - \hat{\rho} z_{t-1})^2 \,.$$
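The regression and the statistic S can be computed directly from a sample path. A minimal numpy sketch (the function name `df_tstat` and the implementation details are ours, not part of the thesis):

```python
import numpy as np

def df_tstat(z):
    """t statistic S = (rho_hat - 1)/se(rho_hat) for the regression
    z_t = c + rho z_{t-1} + u_t (the Dickey-Fuller case 2/3 regression)."""
    z = np.asarray(z, dtype=float)
    y, x = z[1:], z[:-1]
    T = len(y)
    X = np.column_stack([np.ones(T), x])       # regressors [1, z_{t-1}]
    beta = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimates (c_hat, rho_hat)
    resid = y - X @ beta
    rT2 = resid @ resid / (T - 2)              # r_T^2 from the text
    se_rho = np.sqrt(rT2 * np.linalg.inv(X.T @ X)[1, 1])
    return (beta[1] - 1.0) / se_rho

# Example: a random walk with drift c = 2.5 (the case 3 null model).
rng = np.random.default_rng(0)
z = np.cumsum(2.5 + rng.standard_normal(500))
print(df_tstat(z))
```

Under the case 3 null this value should behave roughly like a standard normal draw, in line with the asymptotic result derived below.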
Test statistic (5.22) can be written as:
$$S = \frac{T^{3/2}(\hat{\rho}-1)}{T^{3/2}\hat{\sigma}_{\hat{\rho}}} \,.$$
The denominator is:
$$\begin{aligned}
T^{3/2}\hat{\sigma}_{\hat{\rho}} &= \left( r_T^2 \begin{bmatrix} 0 & T^{3/2} \end{bmatrix} \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ T^{3/2} \end{bmatrix} \right)^{1/2} \\
&= \left( r_T^2 \begin{bmatrix} 0 & 1 \end{bmatrix} \mathbf{Y} \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} \mathbf{Y} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right)^{1/2} .
\end{aligned} \tag{5.23}$$
We have already shown that
$$\mathbf{Y} \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} \mathbf{Y} = \left( \mathbf{Y}^{-1} \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix} \mathbf{Y}^{-1} \right)^{-1} = \begin{bmatrix} 1 & T^{-2}\sum z_{t-1} \\ T^{-2}\sum z_{t-1} & T^{-3}\sum z_{t-1}^2 \end{bmatrix}^{-1} ,$$
where the matrix on the right converges in probability towards $\mathbf{A}$. Because $r_T^2 \xrightarrow{P} \sigma^2$, the denominator converges towards
$$T^{3/2}\hat{\sigma}_{\hat{\rho}} \xrightarrow{P} \sigma \sqrt{(\mathbf{A}^{-1})_{22}} = 2\sqrt{3}\,\sigma/|c| \,,$$
which is exactly the asymptotic standard deviation of the numerator. Thus, the test statistic S is asymptotically Gaussian. The regressor $z_{t-1}$ is asymptotically dominated by the time trend $c(t-1)$. In large samples, it is as if the explanatory variable $z_{t-1}$ were replaced by the time trend $c(t-1)$. That is why the asymptotic properties of $\hat{c}$ and $\hat{\rho}$ are the same as those for the deterministic time trend regression. Therefore, for finite T the test statistic S has a t distribution.
In conclusion, when the true model is a random walk with a constant term ($\rho = 1$, $c \neq 0$) and we estimate both $\rho$ and $c$, the t test statistic S has an asymptotic distribution equal to the standard Gaussian distribution:
$$S \xrightarrow{D} N(0,1) \,.$$
This test statistic is referred to as the Dickey-Fuller case 3 test statistic. The critical values for $T \to \infty$ are given in table 5.7.
Table 5.7: Critical values for DF case 3.

            1%      5%     10%
N(0,1)    -2.33   -1.64   -1.28
For finite T the Dickey-Fuller case 3 test statistic is t distributed, but the degrees of freedom are large, so it is almost standard normal. We can also find the critical values for finite T by simulating paths zt as in the preceding sections. Again we simulate for different values of σ². The results are shown in table 5.8. Figure 5.7 shows the estimated density of the simulated test statistics for different values of σ² while T = 500 and c = 2.5; the standard normal density is also displayed.
Table 5.8: Simulated critical values for DF case 3.

             σ² = 1                 σ² = 5                 σ² = 10
  T       1%     5%    10%      1%     5%    10%      1%     5%    10%
 100   -2.36  -1.76  -1.37   -2.51  -1.82  -1.46   -2.48  -1.84  -1.49
 250   -2.37  -1.68  -1.33   -2.41  -1.74  -1.35   -2.40  -1.79  -1.40
 500   -2.38  -1.75  -1.33   -2.41  -1.69  -1.35   -2.45  -1.76  -1.38

Figure 5.7: Estimated density of DF case 3 for different σ² and T = 500.
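The finite-sample simulation behind table 5.8 can be sketched in Python as follows (function names and defaults are ours; a hedged illustration of the experiment, not the code used for the thesis):

```python
import numpy as np

def df_case3_stat(z):
    """S = (rho_hat - 1)/se for the regression z_t = c + rho z_{t-1} + u_t."""
    y, x = z[1:], z[:-1]
    X = np.column_stack([np.ones(len(y)), x])
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    rT2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(rT2 * np.linalg.inv(X.T @ X)[1, 1])
    return (beta[1] - 1.0) / se

def simulated_quantiles(T=500, c=2.5, sigma=1.0, n_paths=2000, seed=1):
    """Empirical 1%/5%/10% quantiles of the statistic under the case 3 null."""
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_paths):
        # z_t = c + z_{t-1} + u_t with z_0 = 0
        z = np.cumsum(c + sigma * rng.standard_normal(T))
        stats.append(df_case3_stat(z))
    return np.quantile(stats, [0.01, 0.05, 0.10])

# Compare with the T = 500 row of table 5.8 and the N(0,1) values in table 5.7.
print(simulated_quantiles())
```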
To see if the value of the constant c has an impact on the finite sample distribution of the Dickey-Fuller case 3 statistic, we simulate paths zt for c = 0.1, 0.5, 1, 2.5, 10. The results are shown in figure 5.8; the standard normal density is also displayed. The leftmost graph corresponds with c = 0.1.
Figure 5.8: Estimated density of DF case 3 for different c and T = 500.
It looks like a small value of c causes a shift to the left. In figure 5.9 the estimated distributions for c = 0.01, 0.05, 0.1 are plotted with solid lines. The dashed line is the distribution of case 1 and the dotted line is the distribution of case 2. For c = 0.01 the estimated density is almost the same as the density found for case 2. This makes sense because the steps taken in the case 2 and case 3 tests are exactly the same; the only difference is that the true model of case 2 has no constant while that of case 3 does. So for a decreasing constant the case 3 test statistic converges to the case 2 statistic. The other way around is also valid: for a constant increasing in absolute value, the case 2 test statistic converges to the case 3 statistic, because the tests are the same but now there is a constant in the true model.
Figure 5.9: Estimated density of DF case 3 for small c and T = 500.
To be consistent, we also simulate for several different initial values z0 . Figure 5.10 shows the results for z0 = 0, 1, 10, 50, 100, 500 while T = 500 and
c = 2.5. The figure suggests that the initial value z0 does not affect the
density of the test statistic for finite sample sizes.
Figure 5.10: Estimated density of DF case 3 for different z0.
There also exists a case 4 of the Dickey-Fuller test, which includes a deterministic time trend in the true model. We are not interested in spread processes with deterministic trends, so we do not discuss this case.
5.5
Power of the Dickey-Fuller tests
In this section we investigate how 'powerful' the different cases of the Dickey-Fuller test are. We generate paths zt which do not have a unit root, i.e. ρ < 1 in
$$z_t = c + \rho z_{t-1} + u_t \,,$$
and see whether the different tests identify them as stationary; in other words, whether the outcome of the test is to reject the null hypothesis that there is a unit root, ρ = 1. For different values of c and z0 we generate 1,000 paths zt, t = 0, . . . , T, where T = 500, and count the number of rejections. The results are presented in tables.
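The power experiment just described can be sketched as follows. The helper below implements only the case 2 regression with its 5% critical value −2.87 for T = 500; all names and defaults are ours:

```python
import numpy as np

def rejection_count(rho, c=0.0, z0=0.0, T=500, n_paths=1000,
                    crit=-2.87, seed=0):
    """Count how often the DF case 2 test rejects H0: rho = 1 at the 5%
    level; -2.87 is the case 2 critical value for T = 500."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_paths):
        u = rng.standard_normal(T)
        z = np.empty(T + 1)
        z[0] = z0
        for t in range(1, T + 1):            # z_t = c + rho z_{t-1} + u_t
            z[t] = c + rho * z[t - 1] + u[t - 1]
        y, x = z[1:], z[:-1]
        X = np.column_stack([np.ones(T), x])
        beta = np.linalg.solve(X.T @ X, X.T @ y)
        resid = y - X @ beta
        rT2 = resid @ resid / (T - 2)
        se = np.sqrt(rT2 * np.linalg.inv(X.T @ X)[1, 1])
        if (beta[1] - 1.0) / se < crit:
            rejections += 1
    return rejections

print(rejection_count(0.9))   # stationary paths: rejected nearly always
print(rejection_count(1.0))   # unit root: rejected about 5% of the time
```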
First we summarize the previous sections.

Case 1: The true model of case 1 is zt = zt−1 + ut where ut ∼ i.i.d. with mean zero and finite variance σ². We estimate the model zt = ρzt−1 + ut. The critical values for test statistic S = (ρ̂ − 1)/σ̂ρ̂ when T = 500 are

  1%      5%     10%
 -2.58   -1.95   -1.62

Case 2: The true model of case 2 is zt = zt−1 + ut where ut ∼ i.i.d. with mean zero and finite variance σ². We estimate the model zt = c + ρzt−1 + ut. The critical values for test statistic S = (ρ̂ − 1)/σ̂ρ̂ when T = 500 are

  1%      5%     10%
 -3.44   -2.87   -2.57

Case 3: The true model of case 3 is zt = c + zt−1 + ut where ut ∼ i.i.d. with mean zero and finite variance σ² and c ≠ 0. We estimate the model zt = c + ρzt−1 + ut. The critical values for test statistic S = (ρ̂ − 1)/σ̂ρ̂ when T = 500 are

  1%      5%     10%
 -2.33   -1.64   -1.28
We have seen that the initial value z0 does affect the finite sample distribution of Dickey-Fuller case 1 but does not affect case 2 and case 3. The value of c does affect the distribution of case 3: as c becomes smaller, the distribution converges to the distribution of case 2. IMC has provided 10 pairs; the range of ĉ of these 10 pairs is (−0.01, 0.1). The absolute initial value z0 is less than 1.5 for 9 of the 10 pairs; for one pair z0 is 106. So we are interested in the power of the three tests for small values of c and z0, but we will also look at large values of z0.
We start with generating paths with c = 0 and z0 = 0. In all following tables T = 500, σ = 1 and the number of generated paths is 1,000. Table 5.9 shows the number of rejections for the different tests and different values of ρ. For ρ = 1 we have simulated paths under the null hypothesis of case 1 and case 2; the number of rejections is in line with what we expected. The case 3 test does not perform very well: with ρ = 1 it rejects the null hypothesis ρ = 1 in 632 out of 1,000 cases at the 10% level. For ρ just under 1, the case 1 test performs better than the case 2 test.
Table 5.9: Number of rejections, c = 0, z0 = 0.

              Case 1                Case 2                Case 3
  ρ       1%     5%    10%      1%     5%    10%      1%     5%    10%
 0.9    1000   1000   1000    1000   1000   1000    1000   1000   1000
 0.95    986   1000   1000     746    966    997    1000   1000   1000
 0.975   479    910    982     135    427    663     835    991   1000
 0.99     73    310    528      22    116    205     338    771    918
 0.995    41    144    274      21     77    148     231    617    783
 1        10     48     96      12     54    105     171    456    632
 1.01      0      0      0       0      0      0       1      4     12
Table 5.10 shows the number of rejections for generated paths with c = 0 and z0 = 100. For ρ = 1 we simulated under the null hypothesis of case 1 and 2; the number of rejections for case 1 is small. This was expected because of figure 5.3. Again, the case 3 test does not perform very well when ρ = 1. The case 1 and 2 tests do perform well: with ρ slightly less than 1 they reject the null of a unit root.
Table 5.10: Number of rejections, c = 0, z0 = 100.

              Case 1                Case 2                Case 3
  ρ       1%     5%    10%      1%     5%    10%      1%     5%    10%
 0.99   1000   1000   1000    1000   1000   1000    1000   1000   1000
 0.995  1000   1000   1000     406    683    805     874    974    995
 1         8     26     47      10     52    106     157    461    631
 1.01      0      0      0       0      0      0       0      0      0
Table 5.11 shows the number of rejections for c = 0.1 and z0 = 0. For ρ = 1 we have simulated under the null hypothesis of case 3, but the case 3 test still rejects too many times. Because the null is fulfilled, we expect the number of rejections to be around 10, 50 and 100 for the 1%, 5% and 10% levels respectively. In figure 5.8 we already saw that the case 3 test depends on the value of c: when c = 0.1 the distribution of the case 3 test statistic is shifted to the left compared to its asymptotic distribution. We see that with this setting the case 2 test performs more or less the same as with c = 0 and z0 = 0, except when ρ = 1, in which case it rejects less often. The null is not satisfied for the case 2 test, so this is not a bad outcome: the fewer rejections for ρ = 1 the better. The case 1 test performs worse than the case 2 test, and also worse than in the setting c = 0 and z0 = 0.
Table 5.11: Number of rejections, c = 0.1, z0 = 0.

              Case 1                Case 2                Case 3
  ρ       1%     5%    10%      1%     5%    10%      1%     5%    10%
 0.9    1000   1000   1000    1000   1000   1000    1000   1000   1000
 0.95    758    968    997     848    995   1000     999   1000   1000
 0.975   111    434    675     157    476    729     831    996   1000
 0.99      6     37     87      39    142    262     376    765    917
 0.995     0      7     20      18     90    158     257    599    768
 1         0      2      3       5     18     37      73    217    329
 1.01      0      0      1       0      0      0       2      8      9
Table 5.12 shows the number of rejections for c = 0.1 and z0 = 100. It is remarkable how well the case 1 test performs: it rejects almost every time when ρ is slightly below 1 and does not reject when ρ ≥ 1, even though the null hypothesis is not satisfied. In section 5.2 it was explained that this test basically fits a line through the origin, and because the scatterplot starts around (100, 100) it estimates ρ very accurately, which makes the standard error relatively small. With this setting, we know there is an intercept of 0.1, but this is so small compared to the starting point of 100 that the test does not overestimate ρ too much. So when we generate paths for ρ < 1, ρ̂ − 1 is negative and, divided by the small standard error, the test statistic is a large negative value, so the null is rejected. When generating paths with ρ ≥ 1, ρ̂ is always slightly above 1, so the test statistic is a large positive value and the null is not rejected.
Table 5.12: Number of rejections, c = 0.1, z0 = 100.

              Case 1                Case 2                Case 3
  ρ       1%     5%    10%      1%     5%    10%      1%     5%    10%
 0.975  1000   1000   1000    1000   1000   1000    1000   1000   1000
 0.99   1000   1000   1000     996    999   1000    1000   1000   1000
 0.995   997   1000   1000     189    451    586     689    913    961
 1         0      0      0       8     21     45      73    234    345
 1.01      0      0      0       0      0      0       0      0      0
For illustration purposes, table 5.13 shows the number of rejections for c = 1 and z0 = 0. The value of c is now much larger than the values of ĉ for the 10 pairs. We see that the case 1 test has lost all its power, the case 2 test performs well, and the case 3 test is finally performing as it should when ρ = 1 and is very powerful.
Table 5.13: Number of rejections, c = 1, z0 = 0.

              Case 1                Case 2                Case 3
  ρ       1%     5%    10%      1%     5%    10%      1%     5%    10%
 0.9       0      0      0    1000   1000   1000    1000   1000   1000
 0.95      0      0      0    1000   1000   1000    1000   1000   1000
 0.975     0      0      0     998   1000   1000    1000   1000   1000
 0.99      0      0      0    1000   1000   1000    1000   1000   1000
 0.995     0      0      0     997   1000   1000    1000   1000   1000
 1         0      0      0       1      7     12      13     59    119
 1.01      0      0      0       0      0      0       0      0      0
This section clearly indicates that the Dickey-Fuller case 3 test is not the one we should use when testing pairs for cointegration. Unfortunately it does not clearly distinguish between case 1 and case 2. Case 1 performs better for c = 0, z0 = 0 and c = 0, z0 = 100 and c = 0.1, z0 = 100, but case 2 performs better for c = 0.1, z0 = 0, which is the setting most often seen in the 10 pairs. In the remainder of this report we will focus on the case 2 test, because of Hamilton's view given in section 4.3 and because this section does not clearly indicate to do otherwise. Another possible reason to use case 2 instead of case 1 could be that the first step of the Engle-Granger method, which is a linear regression to estimate α, influences the power of the two tests. This will be considered in chapter 6.
5.6
Augmented Dickey-Fuller test
So far we discussed the properties of the estimated coefficients for a first-order autoregression when there is a unit root. In this section we discuss the distribution of the estimated coefficients for a p-th order autoregression.
Recall that the Augmented Dickey-Fuller test tests
$$H_0 : z_t \sim I(1) \quad \text{against} \quad H_1 : z_t \sim I(0) \,, \tag{5.24}$$
when zt is assumed to follow an AR(p) model
$$z_t = c + \phi_1 z_{t-1} + \cdots + \phi_p z_{t-p} + u_t \,, \tag{5.25}$$
where ut ∼ i.i.d.(0, σ²).
This model can be written as
$$z_t = c + \rho z_{t-1} + \beta_1 \Delta z_{t-1} + \cdots + \beta_{p-1} \Delta z_{t-p+1} + u_t \,, \tag{5.26}$$
with
$$\rho = \phi_1 + \phi_2 + \cdots + \phi_p \,, \qquad \beta_i = -(\phi_{i+1} + \cdots + \phi_p) \,, \quad \text{for } i = 1, \ldots, p-1 \,.$$
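The mapping from (φ1, . . . , φp) to (ρ, β1, . . . , βp−1) is easy to verify numerically; a small sketch (the helper name `ar_to_adf` is ours):

```python
import numpy as np

def ar_to_adf(phi):
    """Map AR(p) coefficients (phi_1, ..., phi_p) to (rho, beta_1, ..., beta_{p-1})
    as in (5.26): rho = phi_1 + ... + phi_p, beta_i = -(phi_{i+1} + ... + phi_p)."""
    phi = np.asarray(phi, dtype=float)
    rho = phi.sum()
    beta = np.array([-phi[i + 1:].sum() for i in range(len(phi) - 1)])
    return rho, beta

# AR(2) example used later in this section: z_t = 0.9 z_{t-1} + 0.1 z_{t-2} + u_t
rho, beta = ar_to_adf([0.9, 0.1])
print(rho, beta)  # rho = 1.0 (a unit root), beta_1 = -0.1
```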
The null hypothesis is that the autoregressive polynomial
$$1 - \phi_1 x - \phi_2 x^2 - \cdots - \phi_p x^p = 0$$
has exactly one unit root and all other roots lie outside the unit circle. The single unit root gives us
$$1 - \phi_1 - \phi_2 - \cdots - \phi_p = 0 \,,$$
i.e., ρ = 1. This implies
$$1 - \phi_1 x - \cdots - \phi_p x^p = (1 - \beta_1 x - \cdots - \beta_{p-1} x^{p-1})(1 - x) \,. \tag{5.27}$$
Of the p values of x that make the left side of (5.27) zero, one is x = 1 and all other roots are assumed to be outside the unit circle. The same must be true for the right side as well, meaning all roots of
$$1 - \beta_1 x - \cdots - \beta_{p-1} x^{p-1} = 0$$
lie outside the unit circle. So, (5.24) is equivalent to
$$H_0 : \rho = 1 \quad \text{against} \quad H_1 : \rho < 1 \,.$$
We are interested in the properties of test statistic $S = (\hat{\rho}-1)/\hat{\sigma}_{\hat{\rho}}$ in the three cases:
Case 1: The true process of zt is (5.26) with c = 0 and ρ = 1,
the model estimated is (5.26) except for c.
Case 2: The true process of zt is (5.26) with c = 0 and ρ = 1,
the model estimated is (5.26).
Case 3: The true process of zt is (5.26) with c 6= 0 and ρ = 1,
the model estimated is (5.26).
We can derive the asymptotic properties in a similar manner as in the preceding sections. To keep this section from being too tedious, we only derive the properties for case 2. We state the outcomes for case 1 and case 3 at the end of this section; the derivations can be found in Hamilton [6].
Before deriving the properties for Augmented Dickey-Fuller case 2, we first state a proposition.

Proposition 2:
Let $v_t = \sum_{j=0}^\infty \theta_j u_{t-j}$, where $\sum_{j=0}^\infty j \cdot |\theta_j| < \infty$ and $\{u_t\}$ is an i.i.d. sequence with mean zero, variance $\sigma^2$, and finite fourth moment. Define
$$\gamma_j = E(v_t v_{t-j}) = \sigma^2 \sum_{s=0}^\infty \theta_s \theta_{s+j} \,, \quad \text{for } j = 0, 1, \ldots \,, \tag{5.28}$$
$$\lambda = \sigma \sum_{j=0}^\infty \theta_j \,, \tag{5.29}$$
$$z_t = v_1 + v_2 + \cdots + v_t \,, \quad \text{for } t = 1, 2, \ldots, T \,, \tag{5.30}$$
with $z_0 = 0$. Then

(i) $T^{-1} \sum_{t=1}^T v_t v_{t-j} \xrightarrow{P} \gamma_j$ for $j = 0, 1, \ldots$

(ii) $T^{-1} \sum_{t=1}^T z_{t-1} v_{t-j} \xrightarrow{D} \frac{1}{2}(\lambda^2 W(1)^2 - \gamma_0)$ for $j = 0$, and $T^{-1} \sum_{t=1}^T z_{t-1} v_{t-j} \xrightarrow{D} \frac{1}{2}(\lambda^2 W(1)^2 - \gamma_0) + \gamma_0 + \cdots + \gamma_{j-1}$ for $j = 1, 2, \ldots$

(iii) $T^{-3/2} \sum_{t=1}^T z_{t-1} \xrightarrow{D} \lambda \int_0^1 W(r)\,dr$.

(iv) $T^{-2} \sum_{t=1}^T z_{t-1}^2 \xrightarrow{D} \lambda^2 \int_0^1 W(r)^2\,dr$.

(v) $T^{-1/2} \sum_{t=1}^T v_t \xrightarrow{D} \lambda W(1)$.

(vi) $T^{-1} \sum_{t=1}^T z_{t-1} u_t \xrightarrow{D} \frac{1}{2} \sigma \lambda (W(1)^2 - 1)$.

The proof of this proposition can also be found in [6].
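Definition (5.28) and statement (i) of the proposition can be illustrated numerically for an MA(1) process $v_t = u_t + \theta u_{t-1}$, for which $\gamma_0 = \sigma^2(1+\theta^2)$ and $\gamma_1 = \sigma^2\theta$. A sketch, with arbitrarily chosen θ and sample size:

```python
import numpy as np

# Numerical check of (5.28) / proposition 2(i) for an MA(1):
# v_t = u_t + theta u_{t-1}, so theta_0 = 1, theta_1 = theta and
# gamma_0 = sigma^2 (1 + theta^2), gamma_1 = sigma^2 theta.
theta, sigma, T = 0.5, 1.0, 200_000      # arbitrarily chosen values
rng = np.random.default_rng(3)
u = sigma * rng.standard_normal(T + 1)
v = u[1:] + theta * u[:-1]

gamma0_hat = np.mean(v * v)              # T^{-1} sum v_t v_t
gamma1_hat = np.mean(v[1:] * v[:-1])     # T^{-1} sum v_t v_{t-1}
print(gamma0_hat, sigma**2 * (1 + theta**2))   # both approximately 1.25
print(gamma1_hat, sigma**2 * theta)            # both approximately 0.5
```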
Asymptotic distribution ADF case 2
We assume that the sample is of size T + p, (z−p+1, z−p+2, . . . , zT), and the model is
$$z_t = c + \rho z_{t-1} + \beta_1 \Delta z_{t-1} + \cdots + \beta_{p-1} \Delta z_{t-p+1} + u_t = x_t' \beta + u_t \,,$$
where $\beta = (\beta_1, \beta_2, \ldots, \beta_{p-1}, c, \rho)$ and $x_t = (\Delta z_{t-1}, \Delta z_{t-2}, \ldots, \Delta z_{t-p+1}, 1, z_{t-1})$.
Under the null hypothesis of exactly one unit root and the assumption that zt follows the above AR(p) model with c = 0 and ρ = 1, we show that zt behaves like the variable zt in proposition 2. Because zt is integrated of order one and vt = ∆zt, vt is stationary and follows an AR(p − 1) model:
$$\Delta z_t = \beta_1 \Delta z_{t-1} + \cdots + \beta_{p-1} \Delta z_{t-p+1} + u_t \;\Leftrightarrow\; v_t = \beta_1 v_{t-1} + \cdots + \beta_{p-1} v_{t-p+1} + u_t \,.$$
The autoregressive polynomial of vt is
$$\Phi(x) = 1 - \beta_1 x - \cdots - \beta_{p-1} x^{p-1} \,,$$
and all roots of Φ(x) = 0 are outside the unit circle, because vt is stationary and we assume it is causal, like all other autoregressive models in this report. Then vt has an MA(∞) representation
$$v_t = \sum_{j=0}^\infty \theta_j u_{t-j} \,,$$
whose polynomial is
$$\Theta(x) = 1 + \theta_1 x + \theta_2 x^2 + \cdots \,,$$
and we have
$$\Theta(x) = \frac{1}{\Phi(x)} \,.$$
All p − 1 roots of Φ(x), which is a finite number of roots, are outside the unit circle, so there exists an ε > 0 such that the modulus of every root is larger than 1 + ε; hence Φ(x) ≠ 0 for |x| < 1 + ε. Within the radius of convergence 1 + ε, the analytic function Θ(x) is differentiable:
$$\Theta'(x) = \sum_{j=1}^\infty j \theta_j x^{j-1} \,.$$
And because this series is absolutely convergent within its radius of convergence, in particular at the point 1, we have
$$\sum_{j=1}^\infty j \cdot |\theta_j| < \infty \,.$$
This shows we can use proposition 2 without making any further assumptions.
The deviation of the OLS estimate β̂ from the true value β is given by
$$\hat{\beta} - \beta = \left[ \sum_{t=1}^T x_t x_t' \right]^{-1} \left[ \sum_{t=1}^T x_t u_t \right] . \tag{5.31}$$
With $v_t = z_t - z_{t-1}$, the terms in (5.31) are
$$\sum_{t=1}^T x_t x_t' = \begin{bmatrix}
\sum v_{t-1}^2 & \cdots & \sum v_{t-1} v_{t-p+1} & \sum v_{t-1} & \sum v_{t-1} z_{t-1} \\
\sum v_{t-2} v_{t-1} & \cdots & \sum v_{t-2} v_{t-p+1} & \sum v_{t-2} & \sum v_{t-2} z_{t-1} \\
\vdots & & \vdots & \vdots & \vdots \\
\sum v_{t-p+1} v_{t-1} & \cdots & \sum v_{t-p+1}^2 & \sum v_{t-p+1} & \sum v_{t-p+1} z_{t-1} \\
\sum v_{t-1} & \cdots & \sum v_{t-p+1} & T & \sum z_{t-1} \\
\sum z_{t-1} v_{t-1} & \cdots & \sum z_{t-1} v_{t-p+1} & \sum z_{t-1} & \sum z_{t-1}^2
\end{bmatrix} ,$$
$$\sum_{t=1}^T x_t u_t = \begin{bmatrix} \sum v_{t-1} u_t \\ \vdots \\ \sum v_{t-p+1} u_t \\ \sum u_t \\ \sum z_{t-1} u_t \end{bmatrix} .$$
Like in the derivation of DF case 2 we need a scaling matrix; in this section we use the following (p + 1) × (p + 1) scaling matrix:
$$\mathbf{Y} = \begin{bmatrix}
\sqrt{T} & 0 & \cdots & 0 & 0 \\
0 & \sqrt{T} & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & \sqrt{T} & 0 \\
0 & 0 & \cdots & 0 & T
\end{bmatrix} .$$
Multiplying (5.31) by the scaling matrix Y and using (5.7) we get
$$\mathbf{Y}(\hat{\beta} - \beta) = \left\{ \mathbf{Y}^{-1} \left[ \sum_{t=1}^T x_t x_t' \right] \mathbf{Y}^{-1} \right\}^{-1} \left\{ \mathbf{Y}^{-1} \left[ \sum_{t=1}^T x_t u_t \right] \right\} . \tag{5.32}$$
Consider the matrix $\mathbf{Y}^{-1} \sum x_t x_t' \mathbf{Y}^{-1}$. Elements in the upper left (p × p) block of $\sum x_t x_t'$ are divided by T, the first p elements of the (p + 1)th row or (p + 1)th column are divided by $T^{3/2}$, and the element at the lower right corner is divided by $T^2$. Moreover,
$$\begin{aligned}
T^{-1} \sum v_{t-i} v_{t-j} &\xrightarrow{P} \gamma_{|i-j|} && \text{from proposition 2(i),} \\
T^{-1} \sum v_{t-j} &\xrightarrow{P} E(v_{t-j}) = 0 && \text{from the law of large numbers,} \\
T^{-3/2} \sum z_{t-1} v_{t-j} &\xrightarrow{P} 0 && \text{from proposition 2(ii),} \\
T^{-3/2} \sum z_{t-1} &\xrightarrow{D} \lambda \int W(r)\,dr && \text{from proposition 2(iii),} \\
T^{-2} \sum z_{t-1}^2 &\xrightarrow{D} \lambda^2 \int W(r)^2\,dr && \text{from proposition 2(iv),}
\end{aligned}$$
where
$$\gamma_j = E(\Delta z_t \Delta z_{t-j}) \,, \qquad \lambda = \sigma/(1 - \beta_1 - \cdots - \beta_{p-1}) \,, \qquad \sigma^2 = E(u_t^2) \,,$$
and the integral sign denotes integration over r from 0 to 1. Thus,
$$\mathbf{Y}^{-1} \left[ \sum_{t=1}^T x_t x_t' \right] \mathbf{Y}^{-1} \xrightarrow{D} \begin{bmatrix}
\gamma_0 & \cdots & \gamma_{p-2} & 0 & 0 \\
\vdots & & \vdots & \vdots & \vdots \\
\gamma_{p-2} & \cdots & \gamma_0 & 0 & 0 \\
0 & \cdots & 0 & 1 & \lambda \int W(r)\,dr \\
0 & \cdots & 0 & \lambda \int W(r)\,dr & \lambda^2 \int W(r)^2\,dr
\end{bmatrix} = \begin{bmatrix} \mathbf{V} & 0 \\ 0 & \mathbf{Q} \end{bmatrix} ,$$
with
$$\mathbf{V} = \begin{bmatrix}
\gamma_0 & \gamma_1 & \cdots & \gamma_{p-2} \\
\gamma_1 & \gamma_0 & \cdots & \gamma_{p-3} \\
\vdots & \vdots & & \vdots \\
\gamma_{p-2} & \gamma_{p-3} & \cdots & \gamma_0
\end{bmatrix} , \qquad
\mathbf{Q} = \begin{bmatrix} 1 & \lambda \int W(r)\,dr \\ \lambda \int W(r)\,dr & \lambda^2 \int W(r)^2\,dr \end{bmatrix} . \tag{5.33}$$
Next, consider the second term on the right side of (5.32):
$$\mathbf{Y}^{-1} \left[ \sum_{t=1}^T x_t u_t \right] = \begin{bmatrix}
T^{-1/2} \sum v_{t-1} u_t \\
\vdots \\
T^{-1/2} \sum v_{t-p+1} u_t \\
T^{-1/2} \sum u_t \\
T^{-1} \sum z_{t-1} u_t
\end{bmatrix} . \tag{5.34}$$
The first p − 1 elements of this vector satisfy the central limit theorem. This is because these elements are $\sqrt{T}$ times the sample mean of a martingale difference sequence whose covariance matrix is $\sigma^2 \mathbf{V}$, but this is not discussed further. The result is
$$\begin{bmatrix} T^{-1/2} \sum v_{t-1} u_t \\ \vdots \\ T^{-1/2} \sum v_{t-p+1} u_t \end{bmatrix} \xrightarrow{D} h_1 \sim N(0, \sigma^2 \mathbf{V}) \,.$$
The distribution of the last two elements in (5.34) can be obtained from statements (v) and (vi) of proposition 2:
$$\begin{bmatrix} T^{-1/2} \sum u_t \\ T^{-1} \sum z_{t-1} u_t \end{bmatrix} \xrightarrow{D} h_2 \sim \begin{bmatrix} \sigma W(1) \\ \frac{1}{2} \sigma \lambda (W(1)^2 - 1) \end{bmatrix} . \tag{5.35}$$
This gives that the deviation of the OLS estimate from its true value is
$$\mathbf{Y}(\hat{\beta} - \beta) \xrightarrow{D} \begin{bmatrix} \mathbf{V} & 0 \\ 0 & \mathbf{Q} \end{bmatrix}^{-1} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix} = \begin{bmatrix} \mathbf{V}^{-1} h_1 \\ \mathbf{Q}^{-1} h_2 \end{bmatrix} . \tag{5.36}$$
The last two elements of β are c and ρ, which are the constant term and the coefficient on the I(1) regressor zt−1. From (5.33), (5.35) and (5.36), their limiting distribution is given by
$$\begin{bmatrix} T^{1/2} & 0 \\ 0 & T \end{bmatrix} \begin{bmatrix} \hat{c} \\ \hat{\rho} - 1 \end{bmatrix} \xrightarrow{D} \begin{bmatrix} \sigma & 0 \\ 0 & \sigma/\lambda \end{bmatrix} \begin{bmatrix} 1 & \int W(r)\,dr \\ \int W(r)\,dr & \int W(r)^2\,dr \end{bmatrix}^{-1} \begin{bmatrix} W(1) \\ \frac{1}{2}(W(1)^2 - 1) \end{bmatrix} . \tag{5.37}$$
The t test statistic S of the null hypothesis that ρ = 1 is
$$S = \frac{\hat{\rho} - 1}{\hat{\sigma}_{\hat{\rho}}} = \frac{\hat{\rho} - 1}{\left\{ r_T^2 \, e' \left( \sum x_t x_t' \right)^{-1} e \right\}^{1/2}} \,,$$
where e denotes a (p + 1)-vector with unity in the last position and zeros elsewhere. Multiplying the numerator and the denominator by T results in
$$S = \frac{T(\hat{\rho} - 1)}{\left\{ r_T^2 \, e' \mathbf{Y} \left( \sum x_t x_t' \right)^{-1} \mathbf{Y} e \right\}^{1/2}} \,.$$
But
$$e' \mathbf{Y} \left( \sum x_t x_t' \right)^{-1} \mathbf{Y} e = e' \left\{ \mathbf{Y}^{-1} \left( \sum x_t x_t' \right) \mathbf{Y}^{-1} \right\}^{-1} e \xrightarrow{D} e' \begin{bmatrix} \mathbf{V}^{-1} & 0 \\ 0 & \mathbf{Q}^{-1} \end{bmatrix} e = \frac{1}{\lambda^2 \left\{ \int W(r)^2\,dr - \left( \int W(r)\,dr \right)^2 \right\}} \,. \tag{5.38}$$
By (5.37) we have
$$T(\hat{\rho} - 1) \xrightarrow{D} (\sigma/\lambda) \, \frac{\frac{1}{2}(W(1)^2 - 1) - W(1) \int W(r)\,dr}{\int W(r)^2\,dr - \left( \int W(r)\,dr \right)^2} \,. \tag{5.39}$$
Using (5.38) and (5.39) together with $r_T^2 \xrightarrow{P} \sigma^2$, we finally get
$$S \xrightarrow{D} \frac{\frac{1}{2}(W(1)^2 - 1) - W(1) \int W(r)\,dr}{\left( \int W(r)^2\,dr - \left[ \int W(r)\,dr \right]^2 \right)^{1/2}} \,, \tag{5.40}$$
which is exactly the same as the asymptotic distribution of the Dickey-Fuller case 2 test statistic. So the critical values are the same as in table 5.4 in section 5.3, without making any corrections for the fact that lagged values of ∆zt are included in the regression. This is also true for the other cases: the Augmented Dickey-Fuller case 1 test statistic has the same asymptotic distribution as Dickey-Fuller case 1, and ADF case 3 the same as DF case 3.
Like in the preceding sections, we can simulate the density of the test statistic for finite sample sizes; we show the results for the case 2 test when p = 2. We simulate for different values of σ when T = 500 and β1 = −0.1, and naturally ρ = 1, c = 0. We took this value for β1 because it is seen a few times in the 10 pairs IMC provided. The estimated densities of 5,000 simulated test statistics for σ² = 1, 5, 10 are shown in figure 5.11. Also the asymptotic density we found for the case 2 test, figure 5.4, is plotted with a dashed line. The different graphs coincide nicely. With this setting, the 'original' AR model with lagged terms instead of differenced terms is:
$$z_t = 0.9 z_{t-1} + 0.1 z_{t-2} + u_t \,.$$
The autoregressive polynomial
$$1 - 0.9x - 0.1x^2 = 0$$
has roots 1 and −10, so the assumption of exactly one unit root is fulfilled.
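The simulation for p = 2 can be sketched as follows: the regression zt = c + ρzt−1 + β1∆zt−1 + ut is estimated by OLS and S = (ρ̂ − 1)/σ̂ρ̂ is collected over many simulated paths (the helper name and the number of replications are ours, not the thesis' setup):

```python
import numpy as np

def adf_case2_stat(z):
    """ADF case 2 t statistic for p = 2: regress z_t on
    [1, z_{t-1}, dz_{t-1}] and test rho = 1."""
    dz = np.diff(z)
    y = z[2:]
    X = np.column_stack([np.ones(len(y)), z[1:-1], dz[:-1]])
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    rT2 = resid @ resid / (len(y) - X.shape[1])
    se_rho = np.sqrt(rT2 * np.linalg.inv(X.T @ X)[1, 1])
    return (beta[1] - 1.0) / se_rho

# Simulate under the null: z_t = 0.9 z_{t-1} + 0.1 z_{t-2} + u_t.
rng = np.random.default_rng(7)
stats = []
for _ in range(1000):
    z = np.zeros(502)
    u = rng.standard_normal(502)
    for t in range(2, 502):
        z[t] = 0.9 * z[t - 1] + 0.1 * z[t - 2] + u[t]
    stats.append(adf_case2_stat(z))

# Compare with the DF case 2 critical values (-3.44, -2.87, -2.57 at T = 500).
print(np.quantile(stats, [0.01, 0.05, 0.10]))
```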
Figure 5.11: Estimated density of ADF case 2 for different σ², T = 500 and β1 = −0.1.
To see what influence β1 has, we also vary its value while keeping σ² fixed at 1. The results for β1 = −0.9, −0.5, 0, 0.5, 0.9, 1 are shown in figure 5.12. The awkward graph corresponds with β1 = 1; the 'original' AR model is then
$$z_t = 2 z_{t-1} - z_{t-2} + u_t \,.$$
The autoregressive polynomial
$$1 - 2x + x^2 = 0$$
has a double root at 1, so there are two unit roots. That is probably why the graph for β1 = 1 does not look like the other ones. For the other values of β1 the assumption of exactly one unit root is fulfilled. The values of β1 for the 10 pairs, when an AR(2) model is fit on the spread process, are in a range of (−0.25, 0.1).
Figure 5.12: Estimated density of ADF case 2 for different β1, T = 500 and σ² = 1.
We also show the results for two higher order models. First p = 3: figure 5.13 shows the estimated densities for three different settings of β1 and β2. We take β2 equal to −0.1 and β1 equal to −0.2, −0.1 and 0.1 successively; these are also values seen with the 10 pairs. For these values the autoregressive polynomial has exactly one unit root and the other roots are outside the unit circle, so the null hypothesis is satisfied. The graph of figure 5.4 is also displayed; again they coincide nicely.
Figure 5.13: Estimated density of ADF case 2 for different β1 and β2, T = 500 and σ² = 1.
Lastly, we consider p = 5. Figure 5.14 shows the estimated densities for three parameter settings:

Setting 1 : β1 = −0.3   β2 = −0.2   β3 = −0.1   β4 = −0.05
Setting 2 : β1 = −0.1   β2 = −0.1   β3 = −0.1   β4 = −0.1
Setting 3 : β1 = 0.1    β2 = −0.1   β3 = 0.1    β4 = 0.05

These three settings also represent most of the 10 pairs; that the null hypothesis is satisfied was checked with Maple. We see that the densities show more dispersion and do not coincide with the asymptotic density as nicely as for the lower order models above.
Figure 5.14: Estimated density of ADF case 2 for different β1 , T = 500 and
σ 2 = 1.
In the next section we look again at all these models to see if the power of
the Augmented Dickey-Fuller case 2 test is influenced by the value of p.
5.7 Power of the Augmented Dickey-Fuller case 2 test
In this section we briefly look at how much influence the value of p, the order of the autoregressive model, has on the power of the Augmented Dickey-Fuller case 2 test. As in section 5.5 we generate 1,000 paths for different values of ρ and see how many times the test rejects the null hypothesis of a unit root.
We start with p = 2, so the paths are generated according to the model

zt = ρzt−1 + β1 ∆zt−1 + ut ,

where we take ut i.i.d. standard Gaussian random variables, sample size T = 500 and z0 = 0. Table 5.14 shows the number of rejections for several values of ρ and β1 . We use values of β1 which are seen with the 10 pairs IMC provided. When ρ = 1 the null hypothesis is satisfied; the other roots of the autoregressive polynomial lie outside the unit circle, namely -4, -10 and 10 respectively for β1 = −0.25, −0.1, 0.1. We see that under the null hypothesis the test behaves as expected. The power is quite similar to that of the Dickey-Fuller case 2 test in table 5.9. The power is better for the positive value of β1 .
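The power simulation described above can be sketched in a few lines; the following Python/NumPy fragment is an illustration under the stated model, not the original implementation (the 5% critical value −2.86 is the standard asymptotic ADF case 2 value, an assumption since the thesis uses its own simulated critical values):

```python
import numpy as np

def simulate_ar2(rho, beta1, T=500, rng=None):
    # z_t = rho*z_{t-1} + beta1*(z_{t-1} - z_{t-2}) + u_t, with z_0 = 0
    rng = np.random.default_rng() if rng is None else rng
    z = np.zeros(T + 1)
    u = rng.standard_normal(T + 1)
    for t in range(1, T + 1):
        dz_prev = z[t - 1] - z[t - 2] if t >= 2 else 0.0
        z[t] = rho * z[t - 1] + beta1 * dz_prev + u[t]
    return z

def adf_case2_stat(z):
    # t-statistic for beta = 1 in: z_t = c + beta*z_{t-1} + b1*dz_{t-1} + u_t
    dz = np.diff(z)
    y = z[2:]
    X = np.column_stack([np.ones(len(y)), z[1:-1], dz[:-1]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return (coef[1] - 1.0) / se

def power(rho, beta1, n_paths=1000, crit=-2.86, seed=0):
    # fraction of simulated paths on which the test rejects at the 5% level
    rng = np.random.default_rng(seed)
    stats = [adf_case2_stat(simulate_ar2(rho, beta1, rng=rng)) for _ in range(n_paths)]
    return float(np.mean([s < crit for s in stats]))
```

For ρ well below 1 this rejects on nearly every path, matching the first rows of table 5.14; under ρ = 1 the rejection rate stays near the nominal level.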
Table 5.14: Number of rejections, p = 2.

        β1 = −0.25          β1 = −0.1           β1 = 0.1
ρ       1%    5%    10%     1%    5%    10%     1%    5%    10%
0.9     986   1000  1000    998   1000  1000    1000  1000  1000
0.95    447   845   949     582   919   976     792   980   997
0.975   68    297   491     104   395   589     179   502   708
0.99    18    75    162     13    89    199     28    117   232
0.995   15    70    138     13    66    137     10    74    144
1       9     49    97      7     45    88      9     42    89
1.01    0     1     4       0     1     2       0     1     2
Table 5.15 shows the number of rejections for the AR(3) model:

zt = ρzt−1 + β1 ∆zt−1 + β2 ∆zt−2 + ut .

We use the same values of β1 and β2 as in the previous section: β2 = −0.1 and β1 = −0.2, −0.1, 0.1. For these values of β1 and β2 and when ρ = 1, the null hypothesis of exactly one unit root is satisfied. The table does not indicate that the power when p = 3 is much less than the power of the test when p = 2.
Lastly, table 5.16 shows the number of rejections for the AR(5) model:

zt = ρzt−1 + β1 ∆zt−1 + β2 ∆zt−2 + β3 ∆zt−3 + β4 ∆zt−4 + ut ,

where we used the three parameter settings from the previous section. This table indicates that the power of the test with p = 5 is less than the power of the test for smaller values of p. Especially the first parameter setting shows that the power of the test is lower for p = 5.
Table 5.15: Number of rejections, p = 3 and β2 = −0.1.

        β1 = −0.2           β1 = −0.1           β1 = 0.1
ρ       1%    5%    10%     1%    5%    10%     1%    5%    10%
0.9     978   998   1000    986   1000  1000    1000  1000  1000
0.95    406   765   900     487   851   944     684   941   984
0.975   63    283   482     86    287   473     129   399   615
0.99    17    83    163     22    89    177     27    113   219
0.995   13    74    134     13    71    136     19    67    128
1       12    51    114     13    62    104     14    58    104
1.01    0     0     2       0     2     3       1     5     6
Table 5.16: Number of rejections, p = 5.

        setting 1           setting 2           setting 3
ρ       1%    5%    10%     1%    5%    10%     1%    5%    10%
0.9     782   975   999     924   995   1000    1000  1000  1000
0.95    198   527   717     307   708   874     731   904   992
0.975   37    176   313     69    253   418     186   531   743
0.99    15    83    152     15    97    179     21    114   228
0.995   14    52    115     15    52    124     14    83    163
1       12    48    99      9     50    101     12    47    97
1.01    0     1     10      0     4     9       0     1     1
Chapter 6
Engle-Granger method
In the previous chapter we derived and simulated the properties of the
Dickey-Fuller test. In this chapter we would like to find the properties of
the Engle-Granger method, which we use for testing stock price data for
cointegration. As explained in chapter 4 the Engle-Granger method consists
of two steps, a linear regression followed by the Dickey-Fuller test on the
residuals of this regression. The main question is, are the critical values
of the Engle-Granger method the same as those for the Dickey-Fuller test.
In the first section the critical values of Engle-Granger method are found
by simulating price processes xt and yt with random walks, then all the assumptions of the method are satisfied. The second section also simulates
the critical values but now the model from section 4.2 is used for simulating
the processes xt and yt . Then not all assumptions are completely satisfied,
because xt and yt are not strictly integrated of order one. The third section
finds the critical values with bootstrapping from real data. In the last section we simulate cointegrated price processes xt and yt with the alternative
method from section 4.5 and find out whether the Engle-Granger method
recognizes them as cointegrated. The main focus of this chapter is on the
case 2 test, but we will also compare the power of this test with the case 1
test.
6.1 Engle-Granger simulation with random walks
The Engle-Granger method assumes we have two price processes {xt , yt }, t = 0, . . . , T , each individually integrated of order one, I(1). Then xt and yt are cointegrated if there exists a linear combination that is stationary:
xt , yt are cointegrated ⇐⇒ ∃α, α0 such that yt − αxt − α0 = εt ∼ I(0) .
As described in chapter 4, we prefer to set α0 = 0 because of the cash-neutral aspect. So the Engle-Granger method boils down to:

Estimate α with OLS: α̂ = Σt xt yt / Σt x2t , where both sums run over t = 0, . . . , T .
Calculate the spread et = yt − α̂xt .
Test et for stationarity with the ADF case 2 test.
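The two steps just listed can be written out compactly; the following is an illustrative Python/NumPy sketch for the simplest spread model, p = 1 (for larger p the Dickey-Fuller regression would gain lagged-difference terms):

```python
import numpy as np

def engle_granger_case2(x, y):
    # Step 1: cash-neutral OLS (no intercept): alpha_hat = sum(x_t y_t) / sum(x_t^2)
    x, y = np.asarray(x, float), np.asarray(y, float)
    alpha_hat = (x @ y) / (x @ x)
    e = y - alpha_hat * x                      # spread process e_t
    # Step 2: DF case 2 regression on the spread: e_t = c + beta*e_{t-1} + eta_t
    X = np.column_stack([np.ones(len(e) - 1), e[:-1]])
    coef, *_ = np.linalg.lstsq(X, e[1:], rcond=None)
    resid = e[1:] - X @ coef
    s2 = resid @ resid / (len(resid) - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    S = (coef[1] - 1.0) / se                   # test statistic (beta_hat - 1) / se
    return alpha_hat, S
```

A strongly negative S (relative to the simulated critical values) leads to rejecting the null of no cointegration.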
In order to simulate pairs of data that are certain to be cointegrated and pairs that are certain not to be, we like to simulate xt and generate a yt belonging to xt such that the spread process is an AR(p) process. This is because the Dickey-Fuller test assumes that the input series, in our case the spread process, is an AR(p) process. The process xt has to be integrated of order one; the simplest such model for xt is a random walk:
xt = xt−1 + ut ,
(6.1)
with ut i.i.d N (0, σx2 ) variables and x0 an initial value. Then the difference
xt − xt−1 is white noise, so xt ∼ I(1). Now we have xt , we like to generate yt
such that yt − αxt is AR(p) for some α and some p.
In this section we look at a few different settings, but only for p = 1 and find
out whether the distribution and power of the Engle-Granger test statistic
differs from the earlier derived distribution and power of the Dickey-Fuller
case 2 test statistic. First, this is done under the null hypothesis of the DF
case 2 test, this means that there is no constant in the spread process but
the constant is estimated. Second, when there is a small constant present
in the spread process. Last, for p = 1, we generate yt with α0 but do not
regress on a constant to find out whether this is still cointegrated according
to the Engle-Granger method.
110
AR(1) under the null hypothesis of DF case 2
We want the spread process to be an AR(1) process:
yt − αxt = εt = β0 + βεt−1 + ηt ,
where for {ηt } we take i.i.d. N (0, ση2 ) variables. Then we can generate yt as:

yt = αxt + β0 + β (yt−1 − αxt−1 ) + ηt ,    for t = 1, . . . , T,    (6.2)
y0 = αx0 .    (6.3)
Under the null hypothesis of the Dickey-Fuller case 2 test there is a unit root and no constant in the spread process: β = 1 and β0 = 0. The processes xt and yt are cointegrated if we take β < 1. Figure 6.1 shows a sample path for x and y when β = 1, and figure 6.2 one for β = 0.5; in both graphs α = 0.8, β0 = 0, x0 = 25, σx2 = ση2 = 1 and T = 500.
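The generation scheme (6.1)-(6.3) can be sketched as follows; a minimal Python illustration whose parameter defaults follow the values used for the figures (β = 1 gives a non-cointegrated pair, β < 1 a cointegrated one):

```python
import numpy as np

def simulate_pair(T=500, alpha=0.8, beta=0.5, beta0=0.0, x0=25.0,
                  sigma_x=1.0, sigma_eta=1.0, seed=0):
    # x: random walk (6.1); y: AR(1)-spread scheme (6.2) with y_0 = alpha*x_0 (6.3)
    rng = np.random.default_rng(seed)
    x = x0 + np.cumsum(np.concatenate([[0.0], rng.normal(0, sigma_x, T)]))
    y = np.empty(T + 1)
    y[0] = alpha * x[0]
    for t in range(1, T + 1):
        eta = rng.normal(0, sigma_eta)
        y[t] = alpha * x[t] + beta0 + beta * (y[t - 1] - alpha * x[t - 1]) + eta
    return x, y
```

By construction the spread yt − αxt then follows exactly the AR(1) recursion above.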
Figure 6.1: Pair x, y not cointegrated.
Figure 6.2: Pair x, y cointegrated.
To see whether the critical values of Engle-Granger are more or less the same as for Dickey-Fuller case 2, i.e. to see if estimating α has an effect on the critical values, we simulate a lot of paths xt and yt under the null
hypothesis and calculate the test statistic S. The procedure is:

Simulate xt .
Generate yt according to (6.2).
Calculate α̂.
Calculate the spread et = yt − α̂xt .
DF-test on the spread:
  Estimate β with OLS.
  Calculate the OLS standard error for β: σ̂β̂ .
  Calculate the test statistic S = (β̂ − 1)/σ̂β̂ .
Repeat all this 5,000 times.
Then we estimate the density of the simulated test statistics, again with a
Gaussian kernel estimator. This we can compare to the density we found for
the Dickey-Fuller case 2 test in chapter 5.
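The kernel density estimation step can be sketched as follows; the thesis does not state which bandwidth was used, so Silverman's rule of thumb is assumed here for illustration:

```python
import numpy as np

def gaussian_kde(samples, grid, bandwidth=None):
    # Gaussian kernel density estimate of the simulated test statistics
    s = np.asarray(samples, float)
    if bandwidth is None:
        # Silverman's rule of thumb (an assumption, not from the thesis)
        bandwidth = 1.06 * s.std(ddof=1) * len(s) ** (-1 / 5)
    u = (grid[:, None] - s[None, :]) / bandwidth
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(s) * bandwidth * np.sqrt(2 * np.pi))
```

Evaluating this on a fine grid over the 5,000 simulated statistics gives curves like those in figures 6.3 through 6.5.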
Figure 6.3 shows the estimated densities for different values of α for T = 500, x0 = 25, σx2 = ση2 = 1 and β0 = 0. The figure also shows the Dickey-Fuller case 2 density from figure 5.4. Figures 6.4 and 6.5 show estimated densities for different values of σx2 and ση2 respectively, for the same parameters as above and α = 1. When the null hypothesis is completely satisfied, that is β = 1 and β0 = 0, these densities look a lot like the Dickey-Fuller case 2 density. So it looks like the step preceding the DF test, namely estimating α, does not really affect the critical values. To see whether the power of the test is affected by this preceding step, table 6.1 shows the number of rejections for different values of β and α. It is clear from the table that the power of the Engle-Granger method does not depend on the value of α. This table should be compared with the case 2 columns of table 5.9, because there is no constant and the initial value of the spread process is 0. We see
Figure 6.3: AR(1) Estimated density for EG test statistic, α = 0.1, 0.5, 1.
Figure 6.4: AR(1) Estimated density of EG test statistic, σx2 = 0.1, 0.5, 1, 5.
Figure 6.5: AR(1) Estimated density of EG test statistic, ση2 = 0.1, 0.5, 1, 5.
that there is practically no difference between these columns and table 6.1,
which indicates that the power of the Engle-Granger method is as good as
that of the Dickey-Fuller test. The estimation of α does not have a negative
influence on the power of the test, which is a nice property.
Table 6.1: Number of rejections, AR(1) and β0 = 0.

        α = 1               α = 0.5             α = 0.1
β       1%    5%    10%     1%    5%    10%     1%    5%    10%
0.9     1000  1000  1000    1000  1000  1000    1000  1000  1000
0.95    722   975   1000    724   964   993     717   961   993
0.975   169   485   686     164   466   683     145   469   691
0.99    29    133   248     38    123   242     36    143   254
0.995   14    67    149     17    75    161     18    87    163
1       14    52    105     8     51    109     10    52    115
1.01    0     0     3       1     1     2       1     4     6
AR(1) with constant in spread

When testing for cointegration with the Engle-Granger method, we use the Dickey-Fuller case 2 test, which assumes there is no constant in the spread process but does estimate one. It is interesting to see what happens to the properties of the Engle-Granger method if we do include a constant, β0 , in the spread process. From tables 5.11 and 5.12 we already saw that the power of the Dickey-Fuller case 2 test was not really affected by a small constant when there was no unit root. But the number of rejections when there was a unit root was a bit small. To be more precise, for increasing values of the constant the Dickey-Fuller case 2 test statistic converges to the case 3 statistic, standard Gaussian, as explained in section 5.4. But with the Engle-Granger method we perform a preceding step, estimating α; maybe this first step has a restraining influence on the shift. Figure 6.6 shows the estimated density of the Engle-Granger test statistic when there is a small constant, β0 = 0.1, in the spread process. The dashed line is the density when β0 = 0. For both graphs the paths were generated with T = 500, α = 1, σx = 1, ση = 1 and β = 1. There is a shift to the right, as we could expect from the properties of the Dickey-Fuller test. However, the dotted line is the density when β0 = 1000, so there is indeed a restraining influence on the shift: the Engle-Granger test statistic does not converge to the DF case 3 statistic for large constants. Although this is nice, for small values of β0 we still have the same shift as for the DF case 2 statistic, so the first step in the Engle-Granger method does not have a big influence. This can also be observed in table 6.2, which shows the number of rejections for different values of β when β0 = 0.1. The power of the Engle-Granger test is rather close to the
power of the Dickey-Fuller case 2 test with a small constant, as seen in table
5.11. So it looks like the Engle-Granger test statistic has the same properties
as the Dickey-Fuller case 2 test statistic, for small constants.
Figure 6.6: Estimated density for EG test statistic, β0 = 0.1.
Table 6.2: Number of rejections of the null hypothesis, β0 = 0.1.

β       1%    5%    10%
0.9     1000  1000  1000
0.95    733   969   993
0.975   144   444   650
0.99    33    120   240
0.995   25    86    170
1       6     31    62
1.01    0     0     0
Neglecting a constant present in the cointegrating relation

As explained before, we prefer the cointegrating relation not to have a constant, α0 . However, if there is a constant we neglect it; we only fit yt on xt and not on a constant. In chapter 4 we saw that if there is a relatively large positive α0 , neglecting it results in an overestimation of α, which in turn results in a downward trend in the spread process. The other way around, neglecting a negative value of α0 results in an upward trend in the spread process. Two stock price
processes with a trend in their spread process do not form a good pair for our trading strategy. But for a small value of α0 there is not a big trend and the price processes form a good pair, as seen in figure 4.18. It is interesting to see how the Engle-Granger method performs if there is a small α0 but it is neglected. We can generate cointegrated data where the cointegrating relation has a constant. We generate yt with:

yt = αxt + α0 + β0 + β (yt−1 − αxt−1 ) + ηt ,    for t = 1, . . . , T.

From this equation we can see that with this generating scheme including α0 is redundant: it is the same as including a larger value of β0 . We have already seen what happens for larger values of β0 in figure 6.6. But at last, we have now found a reason to use Dickey-Fuller case 2 instead of case 1! The power of case 1 is practically zero when there is a constant, see table 5.13. Table 6.3 shows this is also true when we perform the preceding step of estimating α. The table shows the number of rejections when we use case 1 in the Engle-Granger method and the 'normal' Engle-Granger method which uses the case 2 test. When we use the DF case 1 test in the Engle-Granger method instead of the case 2 test, the power is almost zero. The paths were generated with x0 = 25, T = 500, α = 1, σx = 1, ση = 1 and β0 = 0. For the value of α0 used to make table 6.3 and β < 1, we do see xt and yt as a good pair, so we would like the Engle-Granger method to see them as cointegrated.
Table 6.3: Number of rejections, α0 = 1.

        Case 1            Case 2
β       1%   5%   10%     1%    5%    10%
0.9     6    51   126     578   835   918
0.95    0    0    3       196   432   577
0.975   0    0    0       139   352   492
0.99    0    0    0       77    256   415
0.995   0    0    0       56    157   264
1       0    0    0       4     14    28
1.01    0    0    0       0     0     0
When performing the Engle-Granger test on real data we do not know if
there is a small constant in the cointegrating relation, so from now on we
only look at the DF case 2 test. Because we use the DF case 2 test within
the Engle-Granger method, this method makes the following assumptions:
Processes xt , yt are integrated of order one.
Spread process yt − αxt is an AR(p) process.
There is no constant in the spread process.
So far, we have seen that when all assumptions of the Engle-Granger method are fulfilled, the Engle-Granger test statistic has the same distribution and power properties as the DF case 2 test statistic. In other words, the first step of estimating α does not have an influence. We have seen that when there is a constant in the spread process, so that not all assumptions are fulfilled, the distribution makes a limited shift to the right. By limited we mean that the Engle-Granger statistic does not converge to the DF case 3 statistic, like the DF case 2 statistic does when there is an increasing constant. Last, we have seen that when there is a constant in the cointegrating relation, it is better to use the DF case 2 test within the Engle-Granger method instead of the DF case 1 test. In the next section we examine what happens when the price processes xt and yt are not strictly integrated of order 1.
6.2 Engle-Granger simulation with stock price model
In section 4.2 we derived a stock price model which is commonly used for the valuation of options. This model is more realistic than the random walk from the preceding section. Although with this model the assumptions of the Engle-Granger method are not completely satisfied, it is interesting to find out whether the method performs the same.
The approach for simulating price processes xt and yt is the same as in the
preceding section, only the paths for xt are simulated with the stock price
model instead of random walks:
xt = xt−1 + µ δt xt−1 + σ √δt ut xt−1 ,    (6.4)
where ut are i.i.d. N (0, 1). Then xt is not exactly integrated of order 1: there is an upward drift µ, so the expectation of the differences is not constant. We look at small values of µ and a finite sample size T = 500, so xt is almost integrated of order 1. By simulating a lot of paths for xt and corresponding yt we are going to see whether this affects the Engle-Granger method.
We again simulate yt such that the spread process is AR(p) and, to fulfill the remaining assumption of the method, we do not include a constant β0 in the spread process. For p = 1 the results of the simulations are the same as in figures 6.3 through 6.5 and table 6.1, which is why they are not displayed. It looks like the Engle-Granger method is not sensitive to xt not being exactly integrated of order 1. In this section we consider the situation when the spread process is an AR(2) process.
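A path from the discretised model (6.4) can be generated as follows; an illustrative Python sketch whose defaults (µ = 0.05, σ = 0.20, δt = 1/260) follow the values used in this section:

```python
import numpy as np

def simulate_stock_path(x0=25.0, mu=0.05, sigma=0.20, T=500, dt=1 / 260, seed=0):
    # x_t = x_{t-1} + mu*dt*x_{t-1} + sigma*sqrt(dt)*u_t*x_{t-1}, u_t iid N(0,1)
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(T)
    x = np.empty(T + 1)
    x[0] = x0
    for t in range(1, T + 1):
        x[t] = x[t - 1] * (1.0 + mu * dt + sigma * np.sqrt(dt) * u[t - 1])
    return x
```

With daily steps (δt = 1/260) the drift term µ δt is small, which is why such a path is "almost" integrated of order 1 over T = 500 observations.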
First we need to simulate xt . Typical values of the drift parameter µ are
between 0.01 and 0.1, and volatility σ between 0.05 and 0.5 when we measure
time in years. We like to simulate daily stock prices, so we take δt = 1/260
because there are roughly 260 trading days a year. We want to generate yt
such that the spread process yt − αxt is AR(2), for some α:
yt − αxt = εt = βεt−1 + β1 ∆εt−1 + ηt ,
(6.5)
for ηt we take i.i.d. N (0, ση2 ) variables. Then we can generate yt as:
yt = αxt + β (yt−1 − αxt−1 ) + β1 ∆εt−1 + ηt ,    t = 2, . . . , T,    (6.6)
y0 = αx0 , y1 = αx1 ,
where we use within each step:
∆εt−1 = (yt−1 − αxt−1 ) − (yt−2 − αxt−2 ) .
We take ση2 = 0.1, because if we take the variance of η equal to 1 then yt is much more jagged than xt , and we are trying to model the price processes more realistically. Then xt and yt are cointegrated if β < 1. Again we estimate the density of the Engle-Granger test statistic by simulating for different values of α, in the same way as in the previous section. The results are shown in figure 6.7 for T = 500, x0 = 25, µ = 0.05, σ = 0.20, β1 = −0.1 and of course β = 1. The density of the Engle-Granger test statistic is again comparable with the density of the Dickey-Fuller case 2 statistic.
Figure 6.7: AR(2) Estimated density for EG test statistic, α = 0.1, 0.5, 1.
Table 6.4 shows the number of rejections for three different values of β1 . Compared to table 5.14, which shows the corresponding power of the Dickey-Fuller case 2 test, the power of the Engle-Granger method has not declined. It seems that the Engle-Granger method performs the same for data that is not exactly integrated of order one as for data that is.
Table 6.4: Number of rejections, AR(2) and β0 = 0.

        β1 = −0.25          β1 = −0.1           β1 = 0.1
β       1%    5%    10%     1%    5%    10%     1%    5%    10%
0.9     987   1000  1000    999   1000  1000    1000  1000  1000
0.95    477   834   945     611   925   986     825   986   998
0.975   98    337   505     139   404   617     197   569   756
0.99    15    98    176     24    126   234     30    150   262
0.995   12    66    143     12    65    144     21    97    163
1       8     61    111     11    53    101     13    50    104
1.01    1     2     5       0     2     3       1     4     4
6.3 Engle-Granger with bootstrapping from real data
So far we have simulated paths xt and yt from scratch to find the critical
values of the Engle-Granger method. In this section we build paths xt and
yt by bootstrapping from real data. The data are the ten pairs of stocks that
IMC provided. First we describe the bootstrap procedure and then we look
at some results of the ten pairs.
Bootstrap procedure
Assume we have a pair that consists of two stock price processes xt and yt ,
for t = 0, . . . , T , which are integrated of order one. Let us assume further
that there exists an α such that yt − αxt follows an AR(p) process:
yt − αxt = εt = β0 + βεt−1 + β1 ∆εt−1 + · · · + βp−1 ∆εt−p+1 + ηt .    (6.7)
The null hypothesis of no cointegration against the alternative that there is
cointegration between xt and yt can be formulated as
H0 : β = 1 against H1 : β < 1 .
The first step in the bootstrap procedure is to estimate α with OLS, which
results in α̂. Then we can calculate the spread process:
et = yt − α̂xt ,    t = 0, . . . , T;

this resembles the true spread process εt which is assumed to follow an AR(p) process.
In the preceding sections we knew the value of p, but now we do not, since
we are working with real data. The second step is to estimate p with the
information criteria described in chapter 3, which results in p̂.
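For concreteness, this order-selection step can be sketched in code. The following is a minimal NumPy version that uses the AIC as its criterion; the function names and this particular AIC form are my own choices for illustration, not necessarily the exact criteria of chapter 3.

```python
import numpy as np

def ar_resid_var(e, p):
    """Residual variance of an OLS fit of an AR(p) model with intercept."""
    Y = e[p:]
    X = np.column_stack(
        [np.ones(len(Y))] + [e[p - k: len(e) - k] for k in range(1, p + 1)]
    )
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return float(np.var(Y - X @ b))

def select_order(e, K=10):
    """Return the order p in 1..K with the lowest AIC."""
    n = len(e)
    aic = [np.log(ar_resid_var(e, p)) + 2 * (p + 1) / n
           for p in range(1, K + 1)]
    return int(np.argmin(aic)) + 1
```

In practice one would apply `select_order` to the estimated spread et to obtain p̂.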
The third step is to estimate the coefficients of the AR(p̂) model with linear
regression, which results in β̂, β̂0 , β̂1 , . . . , β̂p̂−1 . Then we can calculate the
residuals:
nt = et − β̂0 − β̂et−1 − β̂1 ∆et−1 − · · · − β̂p̂−1 ∆et−p̂+1 ,    t = p̂, . . . , T,

which resembles the true residuals ηt , which are assumed to be white noise.
The fourth step is to calculate the test statistic for the real data
S = (β̂ − 1) / σ̂β̂ ,
where σ̂β̂ is the standard error of β̂.
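In code, the regression and the statistic S can be computed as follows — a minimal NumPy sketch, where the function name and implementation choices are mine, not the thesis's actual implementation:

```python
import numpy as np

def eg_statistic(e, p):
    """Test statistic S = (beta_hat - 1) / se(beta_hat) from the regression
    e_t = b0 + beta*e_{t-1} + b1*De_{t-1} + ... + b_{p-1}*De_{t-p+1} + eta_t."""
    de = np.diff(e)
    Y = e[p:]
    cols = [np.ones(len(Y)), e[p - 1:-1]]                       # intercept, e_{t-1}
    cols += [de[p - 1 - k: len(de) - k] for k in range(1, p)]   # lagged differences
    X = np.column_stack(cols)
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ b
    s2 = resid @ resid / (len(Y) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])             # se of beta_hat
    return (b[1] - 1.0) / se
```

A strongly mean-reverting spread yields a large negative S, while a near-random-walk spread yields an S close to zero, matching the one-sided rejection region of the test.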
Now we are ready to build a new path yt∗ that belongs to the original xt .
This is done in the following way:
yt∗ = α̂xt + ε∗t ,
t = p̂, . . . , T,
where ε∗t is built under the null hypothesis, that is β = 1 and β0 = 0:
ε∗t = ε∗t−1 + β̂1 ∆ε∗t−1 + · · · + β̂p̂−1 ∆ε∗t−p̂+1 + ηt∗ ,

where ηt∗ is drawn uniformly at random (with replacement) from the residuals
nt . We initialize the new path by:

ε∗i = yi − α̂xi ,    i = 0, . . . , p̂ − 1.
We treat the new pair {xt , yt∗ } the same way as the original pair {xt , yt }.
That is, we calculate α̂∗ and the spread process e∗t = yt∗ − α̂∗ xt , which should
follow an AR(p̂) process. Then we estimate the coefficients of this AR(p̂)
process and calculate the test statistic:

S ∗ = (β̂ ∗ − 1) / σ̂β̂ ∗ .
By building many new paths yt∗ and calculating the corresponding test
statistics S ∗ , we can estimate the density of these bootstrapped test statistics. Then we can see whether the test statistic of the real pair is exceptional. The
estimated density should also give an indication of the critical values of the
Engle-Granger method.
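The steps of the bootstrap procedure can be sketched compactly. Below is a minimal NumPy sketch, not the thesis's actual implementation; the helper names are made up, α is estimated without a constant, and ηt∗ is resampled by drawing uniformly with replacement from the estimated residuals.

```python
import numpy as np

def ols_alpha(x, y):
    """First step: estimate alpha by regressing y on x (no constant assumed)."""
    return float(x @ y / (x @ x))

def fit_spread_ar(e, p):
    """Third step: OLS fit of
    e_t = b0 + beta*e_{t-1} + b1*De_{t-1} + ... + b_{p-1}*De_{t-p+1} + eta_t.
    Returns the coefficient vector and the residuals eta_t, t = p,...,T."""
    de = np.diff(e)
    Y = e[p:]
    cols = [np.ones(len(Y)), e[p - 1:-1]]
    cols += [de[p - 1 - k: len(de) - k] for k in range(1, p)]
    X = np.column_stack(cols)
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return b, Y - X @ b

def bootstrap_path(x, y, p, rng):
    """Build one new path y* under H0 (beta = 1, b0 = 0)."""
    alpha = ols_alpha(x, y)
    e = y - alpha * x                     # spread process e_t
    b, eta = fit_spread_ar(e, p)
    lag_coefs = b[2:]                     # b1, ..., b_{p-1}
    eps = list(e[:p])                     # initialize with the first p spreads
    for t in range(p, len(x)):
        deltas = [eps[t - k] - eps[t - k - 1] for k in range(1, p)]
        eps.append(eps[-1] + float(np.dot(lag_coefs, deltas)) + rng.choice(eta))
    return alpha * x + np.array(eps)
```

Repeating `bootstrap_path` many times and computing the test statistic of each replicate gives the bootstrapped density described above.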
Results
The ten provided pairs are named pair I, pair II, ..., pair X. We start with a
pair for which all three information criteria indicate that the spread process
is AR(1): pair II. By spread process we mean the residuals from the first
regression, et . The two stocks used are the same stock, but listed on different
exchanges. The spread process is shown in figure 6.8. This is not necessarily
the spread we trade: in chapter 2 we discussed the adjustment parameter κ,
which can result in a different spread. For pair II we will find κ = 0, so
the spreads for this pair look the same. In pairs trading, this is as good as it
gets: we have a large number of trades, we never hold a position for a long
time, and the risk of the two stocks walking away from each other is minimal
because they are in fact the same stock.
Figure 6.8: Spread process pair II.
The spread series looks stationary, and according to the Engle-Granger method
the two stocks are cointegrated: the test statistic is -17.5, well beyond the
1% critical value of -3.44, so the null hypothesis of no cointegration is rejected. Applying the bootstrap procedure to this data set, we
get figure 6.9. The dashed line is the density of the Dickey-Fuller case 2
test statistic. This figure gives no indication that the density of the
Engle-Granger test statistic differs from that of the Dickey-Fuller case 2 statistic.
Figure 6.9: Estimated density for EG test statistic by bootstrapping from pair II.
Let us consider a pair for which all information criteria say that the spread
process is AR(2): pair VII. The spread process is shown in figure 6.10. It
does not look as good as figure 6.8, but this is still a good pair. According to
the Engle-Granger method the stocks in this pair are cointegrated; the test
statistic is -4.65. The bootstrap procedure results in figure 6.11. The estimated density coincides with the density of the Dickey-Fuller case 2 statistic.
Figure 6.10: Spread process pair VII.
Figure 6.11: Estimated density for EG test statistic by bootstrapping from pair VII.
Let us consider a pair for which all information criteria indicate that the
spread process is AR(3): pair VI. The spread process is shown in figure 6.12.
This looks a lot less interesting than the previous figure: initially the spread
is below zero for a long time, and at the end the spread is above zero for a
long time. This shows that trading the spread would have resulted in only a
few trades, and we would have held the same position for a long time. But, as
stated, this is not necessarily the spread we trade; in the next chapter we will
see the spread we would have really traded. According to the Engle-Granger
method the stocks in this pair are not cointegrated; the test statistic is -2.23,
so the null hypothesis of no cointegration is not even rejected at the 10%
level. The bootstrap procedure results in figure 6.13. The estimated density
is a bit bumpy, but still coincides with the density of the Dickey-Fuller case
2 statistic. Even when the real data is not cointegrated according to the
Engle-Granger method, the bootstrap procedure finds nearly the same density as that of the Dickey-Fuller test statistic.
So far we have seen pairs for which all information criteria find the same
small value of p. IMC also provided a pair for which the information criteria
find p to be very large: pair V. As described in chapter 3, we fit an AR(k)
model, for k = 1, . . . , K, on the data and see for which k the criteria have
the lowest values. For this pair, even if we set K = 100, the criteria have the lowest
value for p = K. This indicates that the spread process does not follow an
AR(p) model. The spread process is shown in figure 6.14. It is obvious that
Figure 6.12: Spread process pair VI.
Figure 6.13: Estimated density for EG test statistic by bootstrapping from pair VI.
this 'pair' is not suitable for pairs trading. The Engle-Granger method does
not reject the null hypothesis of no cointegration: the test statistic is
-1.04 when p = 10 and 0.63 when p = 100. To apply the bootstrap procedure, we
set p = 10. The result is shown in figure 6.15, which coincides surprisingly
well with the density of the Dickey-Fuller case 2 test statistic.
We examine these and the remaining pairs further in chapter 7. In this
chapter, we have seen no reason to assume that the test statistic of the
Engle-Granger method has a different distribution than the Dickey-Fuller case 2 test
statistic. The power of the two tests is also comparable. To find out whether the
Engle-Granger method is also 'robust', we apply the method to generated
data which do not fulfill the null hypothesis.
Figure 6.14: Spread process pair V.
Figure 6.15: Estimated density for EG test statistic by bootstrapping from pair V.
6.4 Engle-Granger simulation with alternative method
In section 4.5 we found a method for generating cointegrated data which do
not satisfy the assumptions of the Engle-Granger method. The generated
data is integrated of order one, but the spread process is not likely to follow
an AR(p) process. It is interesting to find out if the Engle-Granger method
is robust enough to see this data as cointegrated.
We will generate data such that the difference process zt follows an MA(2)
model:
    [ xt − xt−1 ]
    [ yt − yt−1 ]  =  zt  =  Θ2 wt−2 + Θ1 wt−1 + Θ0 wt ,

where wt is i.i.d. N2 (0, Σ) and Θ0 = I. Then xt and yt are cointegrated if
the matrix (Θ2 + Θ1 + Θ0 ) has eigenvalue zero; the corresponding eigenvector
is the cointegrating relation, which we normalize to [−α 1]′ . For example,
the matrix

    [  4   2 ]
    [ −2  −1 ]

has eigenvalue zero with eigenvector [−1/2 1]′ . So one possibility to generate cointegrated xt and yt is

    Θ2 = [  2   1 ]     Θ1 = [ 1  1 ]     Θ0 = [ 1  0 ]
         [ −2  −4 ],         [ 0  2 ],         [ 0  1 ].
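This construction can be reproduced numerically. The sketch below (Python/NumPy; the path length T, the scale c, and the seed are made-up choices for illustration) generates one pair of paths from the MA(2) model and verifies the zero-eigenvalue condition on (Θ2 + Θ1 + Θ0).

```python
import numpy as np

rng = np.random.default_rng(0)
T, c = 500, 1.0   # path length and innovation scale (illustrative choices)

Theta2 = np.array([[2.0, 1.0], [-2.0, -4.0]])
Theta1 = np.array([[1.0, 1.0], [0.0, 2.0]])
Theta0 = np.eye(2)

# cointegration condition: Theta2 + Theta1 + Theta0 is singular
# (eigenvalue zero), with eigenvector [-1/2, 1]'
total = Theta2 + Theta1 + Theta0
assert abs(np.linalg.det(total)) < 1e-9
assert np.allclose(total @ np.array([-0.5, 1.0]), 0.0)

# innovations w_t i.i.d. N2(0, c*I)
w = rng.normal(scale=np.sqrt(c), size=(T + 2, 2))

# differences z_t = Theta2 w_{t-2} + Theta1 w_{t-1} + Theta0 w_t
z = w[:-2] @ Theta2.T + w[1:-1] @ Theta1.T + w[2:] @ Theta0.T

# integrate the differences to obtain the (cointegrated) paths x_t and y_t
paths = np.cumsum(z, axis=0)
x, y = paths[:, 0], paths[:, 1]
```

Each simulated pair {xt , yt } produced this way is integrated of order one by construction, while its spread process is not of autoregressive form.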
There are no restrictions on the covariance matrix Σ of the innovations wt ,
except that it is a covariance matrix, so it must be symmetric and positive
semi-definite. We start with a diagonal matrix, so the innovations are independent. For Σ = cI, table 6.5 shows the number of rejections of the Engle-Granger test for 1,000
different paths xt and yt . Although the spread process in this section is not of
autoregressive form, the Engle-Granger method fits an AR(p) on the spread
process. The value of p is again estimated with the information criteria from
chapter 3; the maximum value of p, K, was set equal to 10. The table also
shows the average of the estimated values of p. Unfortunately, the Engle-Granger method does not perform very well: it sees on average 12% of the
generated paths as cointegrated at a 10% level. The average of the estimated
p values is high, which means that it is difficult to fit a good AR(p) model
on the spread process of the data; this in turn is not strange, because the
spread process does not follow an AR(p) model.
Table 6.5: Number of rejections, Σ = cI.

  c     1%   5%   10%   p̄
  2     47   89   147   9.9
  1     25   77   125   9.6
  0.5   24   61   116   9.1
  0.1   11   49   102   7.6
Consider the situation where the innovations are correlated; we take Σ of the
form

    A = [ 1  ρ ]
        [ ρ  1 ].

Table 6.6 shows the number of rejections of the Engle-Granger test for different values of ρ. Even for ρ = 1 the Engle-Granger method does not perform
well.
Table 6.6: Number of rejections, Σ = A.

  ρ     1%   5%   10%   p̄
  1     34   86   140   9.8
  0.5   31   67   134   9.7
To see what happens, figure 6.16 shows the spread process of a realization
of xt and yt . This does not look stationary; there seems to be a trend in the
spread process. This could mean that with this setting there is a constant
in the cointegrating relation, α0 . Figure 6.17 shows the spread process if we
regress the same realization of yt on the same xt and a constant.
It is clear that there is a constant, α0 , in the cointegrating relation. Neglecting the constant results in a spread process which is not stationary.
That is why the Engle-Granger method does not reject the null hypothesis
of no cointegration. Table 6.7 shows the number of rejections when we do
not neglect α0 and take Σ = cI. The maximum value of p is set to 5 in
the remainder of this section, to reduce computation time. It is clear that
the Engle-Granger method performs very well: almost every path is seen as
cointegrated (which is true).
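The effect of neglecting α0 can be illustrated with a small simulation (Python/NumPy; the values of α0 , α, the drift, and the seed are made up for illustration, not taken from the thesis):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500

# x is a random walk with drift; y is cointegrated with x, but the
# cointegrating relation contains a constant alpha0
x = np.cumsum(rng.normal(0.2, 1.0, T))
alpha0, alpha = 50.0, 1.0
y = alpha0 + alpha * x + rng.normal(0.0, 1.0, T)

# regression WITHOUT a constant: the slope has to absorb alpha0,
# so the residual spread keeps a trend-like component
a_no = float(x @ y / (x @ x))
e_no = y - a_no * x

# regression WITH a constant: the residual spread is stationary
X = np.column_stack([np.ones(T), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e_with = y - X @ b

print(e_no.std(), e_with.std())   # the first is much larger
```

The residuals from the fit without a constant inherit the drift of xt , which is exactly why the Engle-Granger method fails to reject the null when α0 is neglected.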
Figure 6.16: Realization spread setting 1, ρ = 0.5.
Figure 6.17: Realization spread setting 1, with α0 .
Table 6.7: Number of rejections, with α0 and Σ = cI.

  c     1%    5%    10%    p̄
  2     1000  1000  1000   4.7
  1     1000  1000  1000   4.5
  0.5   999   1000  1000   4.3
  0.1   1000  1000  1000   3.5
So far we have generated cointegrated data, but not in the way we want
it to be cointegrated, which is data with a small or no constant α0 . We look
at a different setting of parameters:

    Θ2 = [  1  −1 ]     Θ1 = [ −1  2 ]     Θ0 = [ 1  0 ]
         [ −1   0 ],         [  2  0 ],         [ 0  1 ].

The matrix (Θ2 + Θ1 + Θ0 ) has eigenvalue zero with eigenvector [−1 1]′ .
Figure 6.18 shows a realization of the spread process, when yt is regressed only on xt and not on a constant. In other words, we neglect a possible
α0 .
Figure 6.18: Realization spread setting 2, ρ = 0.5.
With this setting, neglecting α0 is no problem; the spread process seems to
be stationary. We want the Engle-Granger method to see the corresponding xt
and yt as cointegrated. Tables 6.8 and 6.9 show the number of rejections for
Σ = cI and Σ = A respectively. The power of the Engle-Granger test is
good; almost every time the null is rejected.
The Engle-Granger method performs very well: even when the spread process
does not follow an AR(p) model, the test behaves exactly how we want it to. If
there is a large constant α0 in the cointegrating relation, it does not reject
the null hypothesis of no cointegration. Although the data is cointegrated, it
is not cointegrated in the way we want, that is, with a small or no α0 . If there
is a small or no constant in the cointegrating relation, the test has rejected
the null hypothesis almost every time.
Table 6.8: Number of rejections, setting 2, Σ = cI.

  c     1%    5%    10%    p̄
  2     946   980   992    4.83
  1     974   988   997    4.75
  0.5   981   992   995    4.60
  0.1   995   998   999    3.95

Table 6.9: Number of rejections, setting 2, Σ = A.

  ρ     1%    5%    10%    p̄
  1     991   996   999    4.7
  0.5   970   991   996    4.6
Chapter 7
Results
In this chapter the results for the ten pairs IMC provided are discussed. To
be clear, IMC provided two years of historical closing prices for each stock of
the ten pairs. According to IMC, among these ten are some very good pairs,
which means they make high profits; some are losing money and some are
mediocre. In the first section we apply the trading strategy to the historical
data to see which pairs would have been profitable, and we put the pairs in order
of profitability. We would like to see whether the stocks in a profitable pair are
cointegrated, and whether the stocks in a pair that loses money are not; in
other words, whether profitability and cointegration coincide. We apply two different
cointegration tests, the Engle-Granger and the Johansen method, but first we
examine in the second section whether the assumption of the price processes
being integrated of order 1 is fulfilled. In the third and fourth sections the results
for respectively the Engle-Granger and the Johansen method are stated, and
the pairs are put in order of the levels of rejection of the cointegration tests.
7.1 Results trading strategy
The 10 pairs are named pair I, pair II, ..., pair X. In chapter 2 the trading
strategy was explained. For each pair, we use the first half of the observations
to determine the parameters of the strategy, and we apply the strategy to
the second half. In order to compare the results, we trade the same amount
of money with each pair. With each trade we buy one stock for the amount
of € 10,000 and sell the other for roughly the same amount. The sell trade
is not exactly € 10,000 because of the positive or negative 'investment' of
threshold Γ, as explained in section 2.3. The results/profits are shown in
table 7.1. The traded spread processes of the 10 pairs are shown in figure 7.1;
these are the spreads with the adjustment ratio, if present. The upper left
corner is the spread for pair I, the upper right corner for pair II, and so on. To be
clear, the spread of the second half of the observations is displayed, and this is the
spread which is traded. The dashed lines are the corresponding thresholds Γ.
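The mechanics of trading such a spread can be sketched in a few lines of code. This is a simplified illustration, not the strategy of chapter 2 itself: it enters a position when the spread crosses ±Γ and closes it when the spread crosses zero again, it ignores the adjustment parameter κ and the € 10,000 position sizing, and the profit is expressed in spread units.

```python
def trade_spread(spread, gamma):
    """Simplified threshold strategy on a spread series: go short the
    spread when it rises above +gamma, long when it falls below -gamma,
    and close the position when the spread crosses zero again."""
    position = 0          # +1 long the spread, -1 short, 0 flat
    entry = 0.0
    trades, profit = 0, 0.0
    for s in spread:
        if position == 0:
            if s >= gamma:
                position, entry = -1, s     # spread too high: short it
            elif s <= -gamma:
                position, entry = +1, s     # spread too low: buy it
        elif position == -1 and s <= 0:     # reverted: close the short
            profit, trades, position = profit + (entry - s), trades + 1, 0
        elif position == +1 and s >= 0:     # reverted: close the long
            profit, trades, position = profit + (s - entry), trades + 1, 0
    return trades, profit
```

For an oscillating toy spread, `trade_spread([0, 1.2, 0.5, -0.1, -1.3, 0.2], gamma=1.0)` opens a short at 1.2 and closes it at −0.1, then opens a long at −1.3 and closes it at 0.2: two round trips.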
Table 7.1: Results trading strategy.

        parameters          result
pair    Γ        κ     # trades   profit
I        2.33    5         3        1129
II       0.02    0        25        5536
III      0.16    2         4         506
IV       0.77    5         7        1344
V       19.68    8         0           0
VI       0.48    1         4         495
VII      0.13    1        11        2293
VIII     0.30    2        10        2091
IX       0.12    2         4         141
X        0.30    2        12        2304
Even the highest profit may look a bit small, but recall that we do not have
to invest a lot of money. On the other hand, to lose the same amount as the
highest profit, the two stocks would have to walk 50% away from each other in the
wrong direction, which has little chance of occurring. Profits above € 1,000
are considered good enough to trade; profits below € 1,000 are considered not
worthwhile. But profit is not the only criterion; the number of trades is
also important. Obviously, the more trades, the higher the profit. But this is
not the only reason: in chapter 2 it was explained that traders do not want to hold
a position for a long time, because that involves risk, and the number of trades
is an indication of this. According to IMC, pair IV is still a good one. We
get exactly the same selection of good and bad pairs as IMC if we set the
minimal number of trades equal to 7. IMC had already decided which of the 10
pairs is good and which is not, based on trading experience, before
providing the data. A pair is considered good enough to trade if the profit is
above € 1,000 and the number of trades is no less than 7; otherwise the pair
is considered not worthwhile.

[Figure 7.1: Traded spreads of the 10 pairs. The dashed lines are the thresholds Γ.]

The ordering of the 10 pairs, based purely on the results from the trading
strategy, where the first five are considered good or good enough and the
remaining five are not, is:
1. pair II
2. pair X
3. pair VII
4. pair VIII
5. pair IV
6. pair I
7. pair III
8. pair VI
9. pair IX
10. pair V
We briefly discuss the spreads from figure 7.1. The spread for pair I rarely
hits its threshold Γ, although the adjustment parameter κ is large. The
spread for pair II looks good, but it could have been better had we used κ = 1:
after t = 425 the spread stays below +Γ for a relatively long time, and with κ = 1 we
would have made a profit of € 6,721 in 36 trades. The spreads for pairs III, VI
and IX rarely hit their thresholds Γ; the adjustment parameter κ is
small, but increasing it does not have a positive effect. For pair III, increasing
κ to 5 results in a loss of € 2,108. For pair VI the profit gets smaller, while
the number of trades increases. The spread for pair IV shows the reason
why we use an adjustment parameter: without it, this pair would have traded
only twice, with a total profit of € 385. For pair V the threshold Γ is not displayed,
because it is 19.68. Lowering the threshold results in a loss when we keep
κ = 8; only when we also reduce κ to 1 or zero do we get a small profit. The
spreads for pairs VII, VIII, and X look good: they hit their thresholds Γ
regularly and produce a nice profit. Changing the parameters slightly does
not affect the number of trades, and affects the profits only slightly.
7.2 Results testing price process I(1)
Both cointegration tests require that the stock price processes xt and yt are
integrated of order one. In section 4.2 it was derived that it is reasonable to
assume stock price processes fulfill this requirement, but in this section we
will perform a unit root test on the stocks of the 10 pairs to see whether the requirement
is fulfilled. The unit root test we use is again the (Augmented) Dickey-Fuller
case 2 test, and we perform the test twice. The first test is:
H0 : xt ∼ I(1) against H1 : xt ∼ I(0) .
The outcome should be not to reject H0 . The second test is:
H0 : xt ∼ I(2) against H1 : xt ∼ I(1) ,
which is equivalent to:
H0 : ∆xt ∼ I(1) against H1 : ∆xt ∼ I(0) .
The outcome of this second test should be to reject H0 , which makes it likely
that the price processes are I(1).
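This two-stage check can be sketched with a hand-rolled Dickey-Fuller case 2 regression: regress ∆xt on a constant and xt−1 and take the t-statistic of the xt−1 coefficient. This is a minimal sketch on simulated data, using the plain (non-augmented) test with no lag selection, unlike the augmented version with estimated lag order p used in the thesis.

```python
import numpy as np

def df_case2_stat(x):
    """t-statistic of gamma in the Dickey-Fuller case 2 regression
    dx_t = c + gamma * x_{t-1} + e_t  (constant, no trend)."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    X = np.column_stack([np.ones(len(dx)), x[:-1]])
    beta, *_ = np.linalg.lstsq(X, dx, rcond=None)
    resid = dx - X @ beta
    s2 = resid @ resid / (len(dx) - 2)        # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)         # OLS covariance matrix
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(520))    # simulated I(1) "price"

stat1 = df_case2_stat(walk)            # test 1: typically does NOT reject
stat2 = df_case2_stat(np.diff(walk))   # test 2: strongly rejects (stat << -3.44)
```

Comparing `stat1` and `stat2` against the critical values below mirrors the two tests: for an I(1) series the first statistic usually stays above −2.87, while the statistic on the differenced series is far below −3.44.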
The Dickey-Fuller test fits an AR(p) model to the stock price process; we
estimate p with the information criteria from chapter 3 and set the maximum
value of p, which is K, equal to 10. Table 7.2 shows the outcomes of
both tests, where we used the following critical values

  1%     5%    10%
-3.44  -2.87  -2.57
since we have roughly 520 observations, T = 520. The stocks in a pair are
denoted by x and y. The test statistic of the first test is stated along with
whether the null hypothesis is rejected; the outcome 'not rejected' is denoted
by the symbol ¬, otherwise the level is stated. The average value of the
estimated p is also stated, and the results of the second test are stated in the
same way.
The table shows that it is likely that all stocks from the 10 pairs are integrated
of order one.
Table 7.2: Results I(1).

                Test 1                      Test 2
stock      statistic  outcome  p̄      statistic  outcome  p̄
I -x       -1.7       ¬        4      -11        1%       4
I -y       -2.1       ¬        4      -11        1%       4
II -x      -1.4       ¬        1      -12        1%       3
II -y      -1.3       ¬        2      -25        1%       1
III -x     -2.1       ¬        1      -22        1%       1
III -y     -2.2       ¬        1      -26        1%       1
IV -x      -1.4       ¬        1      -24        1%       1
IV -y      -1.5       ¬        1      -22        1%       1
V -x       -1.4       ¬        8      -8         1%       8
V -y       -0.3       ¬        10     -8         1%       10
VI -x      -0.6       ¬        4      -12        1%       4
VI -y      -0.6       ¬        4      -12        1%       4
VII -x     -1.1       ¬        1      -25        1%       1
VII -y     -0.9       ¬        1      -25        1%       1
VIII -x    -1.4       ¬        1      -23        1%       1
VIII -y    -1.5       ¬        1      -23        1%       1
IX -x      -1.4       ¬        1      -25        1%       1
IX -y      -0.9       ¬        1      -25        1%       1
X -x       -0.5       ¬        2      -23        1%       1
X -y       -0.7       ¬        1      -16        1%       2

7.3 Results Engle-Granger cointegration test
In this section we perform the Engle-Granger test on the 10 pairs. We have
found no reason to assume the Engle-Granger test statistic has a different
distribution than the Dickey-Fuller case 2 test statistic, so we will use the
same critical values:
  1%     5%    10%
-3.44  -2.87  -2.57
We perform the cointegration test on the whole data set, so we have 520
observations per stock. Recall that the profits were determined on the second
half of the observations. As stated in section 4.3, the Engle-Granger method is
not symmetric: the results can be different for regressing xt on yt and the
other way around. That is why we perform the Engle-Granger test twice.
The results are stated in table 7.3.
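The two steps of the Engle-Granger method, and its asymmetry, can be sketched as follows. The first-stage regression here has no constant, the residual test is the plain (non-augmented) Dickey-Fuller case 2 statistic, and the cointegrated pair is simulated, so this is an illustration of the procedure rather than a reproduction of the thesis's implementation.

```python
import numpy as np

def engle_granger(y, x):
    """Step 1: OLS regression y_t = alpha * x_t + u_t (no constant).
    Step 2: Dickey-Fuller case 2 t-statistic on the residuals u_t.
    Returns (alpha_hat, test_statistic)."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    alpha = (x @ y) / (x @ x)           # least-squares slope, no intercept
    u = y - alpha * x                   # residuals = candidate spread
    du = np.diff(u)
    X = np.column_stack([np.ones(len(du)), u[:-1]])
    beta, *_ = np.linalg.lstsq(X, du, rcond=None)
    resid = du - X @ beta
    s2 = resid @ resid / (len(du) - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return alpha, beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)
x = 100 + np.cumsum(rng.standard_normal(520))   # I(1) "price"
y = 2.0 * x + rng.standard_normal(520)          # cointegrated, alpha = 2

a1, t1 = engle_granger(y, x)    # regress y on x
a2, t2 = engle_granger(x, y)    # regress x on y: roughly a1 ≈ 1/a2
```

Running both directions illustrates the asymmetry discussed in section 4.3: the two statistics and slopes are not identical, but for a strongly cointegrated pair they are close and a1 ≈ 1/a2.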
Table 7.3: Results Engle-Granger test.

pair   statistic 1  outcome  α̂1     p̄     statistic 2  outcome  α̂2     p̄
I      -1.57        ¬        1.53   6     -1.56        ¬        0.65   6
II     -17.54       1%       0.997  1     -17.56       1%       1.003  1
III    -2.21        ¬        0.72   3     -2.21        ¬        1.37   3
IV     -2.67        10%      0.52   1     -2.61        10%      1.91   2
V      -1.04        ¬        5.03   10    -1.08        ¬        0.20   7
VI     -2.23        ¬        1.08   3     -2.24        ¬        0.93   3
VII    -4.65        1%       1.60   2     -4.64        1%       0.63   2
VIII   -2.92        5%       0.53   1     -2.91        5%       1.87   1
IX     -0.63        ¬        2.36   2     -0.90        ¬        0.42   1
X      -3.48        1%       0.72   1     -3.03        5%       1.39   4
We see that there is only one pair, pair X, where the outcomes of the two tests
differ. The estimated cointegrating relations are for all pairs practically
the same:

α̂1 ≈ 1/α̂2 .

So the disadvantage of the Engle-Granger method of not being symmetric
does not seem to be very harmful when testing pairs for cointegration. The
pairs are put in order of the test statistic; the idea is that the lower the test
statistic, the lower the level of rejection, which is more evidence for being
cointegrated. For example, the Engle-Granger method rejects the null hypothesis
for pair II even at the 0.1% level, while pair VIII is only rejected at
5%. So there is more evidence that pair II is cointegrated than pair VIII,
which is why we prefer pair II.
The ordering of the 10 pairs based on the Engle-Granger method is
1. pair II
2. pair VII
3. pair X
4. pair VIII
5. pair IV
6. pair VI
7. pair III
8. pair I
9. pair V
10. pair IX
where there is evidence for cointegration for the first five pairs and no
evidence for the remaining five. This ordering is not exactly the same as the
ordering found with the trading strategy, but they coincide on what is good
and what is not. The five pairs that are considered worthwhile to trade are
cointegrated, and the five pairs that are not worthwhile to trade are not
cointegrated according to the Engle-Granger method. The first entry in both
orderings is the same; this is the pair that consists of two equal stocks
listed on different exchanges. In the first half of the ordering, the good ones,
only places 2 and 3 are switched; the others are in the same places. The
second halves of the two orderings differ a lot.
7.4 Results Johansen cointegration test
In this section we perform the Johansen test on the 10 pairs. As discussed
in section 4.4, we perform three tests:
Test 1: H0 : 0 relations against H1 : 2 relations.
Test 2: H0 : 0 relations against H1 : 1 relation.
Test 3: H0 : 1 relation against H1 : 2 relations.
The critical values for each test are in table 7.4; these are for a sample size
of T = 400. Although the data of the 10 pairs IMC provided consist of 520
observations, these critical values will be used when testing the 10 pairs for
cointegration.
Table 7.4: Critical values for Johansen test.

Test    1%      5%      10%
1       16.31   12.53   10.47
2       15.69   11.44    9.52
3        6.51    3.84    2.86
One issue that was not addressed in section 4.4 is how to find p. The
Johansen method assumes that the vector process yt = (xt , yt ) follows a
VAR(p) model. S-PLUS, the program used for all simulations and calculations
in this report, has a built-in function called 'ar' which fits a VAR
model using the Yule-Walker equations. The function determines the order of
the VAR with the Akaike information criterion, and it is used for
estimating p. We set the maximum value of p equal to 10 and the minimum
value equal to 2, because the first step of the Johansen method is to fit a
VAR(p − 1) on the differences ∆yt . The results of the Johansen test are in
table 7.5.
Table 7.5: Results Johansen test.

       Test 1             Test 2             Test 3            Parameters
pair   stat.    outcome   stat.    outcome   stat.   outcome   p̂    α̂
I      6.80     ¬         6.12     ¬         0.68    ¬         2    1.49
II     154.6    1%        153.8    1%        0.82    ¬         2    0.997
III    7.37     ¬         6.56     ¬         0.81    ¬         2    0.71
IV     10.89    10%       10.43    10%       0.46    ¬         2    0.52
V      2.86     ¬         2.79     ¬         0.07    ¬         4    5.44
VI     6.47     ¬         5.91     ¬         0.56    ¬         3    1.08
VII    24.25    1%        23.10    1%        1.15    ¬         2    1.59
VIII   13.84    5%        13.16    5%        0.02    ¬         2    0.52
IX     2.78     ¬         2.68     ¬         0.11    ¬         2    2.55
X      14.98    5%        10.11    5%        1.98    ¬         2    0.72
The Johansen method is symmetric: there is no difference between setting yt =
(xt , yt ) and yt = (yt , xt ); the test statistics and the estimated cointegration
relations are exactly the same. We consider the stocks of a pair cointegrated
if the null hypotheses of the first and the second test are rejected
and the null hypothesis of the third test is not rejected.
The Johansen method finds the same pairs cointegrated as the Engle-Granger
method: pairs II, IV, VII, VIII and X. The levels at which the null hypothesis
of no cointegration is rejected are also the same. Only for pair X do the results
differ a bit, but this is because the Engle-Granger method had two different
outcomes: the first test rejected at 1% and the second test at 5%, while the
Johansen method rejected pair X at 5%. There are no real differences for the
cointegrated pairs; the estimated cointegrating relations are also practically
the same. The biggest difference is for pair VIII, where the Engle-Granger
method estimates α equal to 0.5338 and the Johansen method 0.5231. The
two methods differ more for pairs that are not cointegrated, where the differences
between the estimates of α are larger. But according to these methods those
pairs are not cointegrated, so there does not exist an α such that yt − αxt is
stationary.
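A minimal version of the Johansen trace statistics from section 4.4 can be sketched with the eigenvalue problem for the product-moment matrices. This sketch assumes a VAR(1) with no deterministic terms, so it skips the VAR(p − 1) fit on the differences and the constant; it only illustrates the shape of the computation on simulated data, not the full procedure used for table 7.5.

```python
import numpy as np

def trace_stats(Y):
    """Johansen trace statistics for an n-dim VAR(1) with no
    deterministic terms: eigenvalues lam_i of S11^{-1} S10 S00^{-1} S01,
    trace(r) = -T * sum_{i>r} ln(1 - lam_i)."""
    Y = np.asarray(Y, float)
    R0 = np.diff(Y, axis=0)    # Delta y_t (nothing to partial out here)
    R1 = Y[:-1]                # y_{t-1}
    T = len(R0)
    S00 = R0.T @ R0 / T
    S01 = R0.T @ R1 / T
    S11 = R1.T @ R1 / T
    M = np.linalg.solve(S11, S01.T) @ np.linalg.solve(S00, S01)
    lam = np.sort(np.linalg.eigvals(M).real)[::-1]   # squared canonical corrs
    return [-T * np.log(1 - lam[r:]).sum() for r in range(len(lam))]

rng = np.random.default_rng(2)
x = np.cumsum(rng.standard_normal(520))
y = 2.0 * x + rng.standard_normal(520)    # one cointegrating relation
tr0, tr1 = trace_stats(np.column_stack([x, y]))
# tr0 tests H0: 0 relations; tr1 tests H0: 1 relation
```

For a pair with exactly one cointegrating relation, `tr0` is large (compare pair II's 154.6) while `tr1` stays small, matching the decision rule used above: reject at r = 0, do not reject at r = 1.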
The ordering of the 10 pairs based on the Johansen method is
1. pair II
2. pair VII
3. pair VIII
4. pair X
5. pair IV
6. pair III
7. pair I
8. pair VI
9. pair V
10. pair IX
where there is evidence for cointegration for the first five pairs and no
evidence for the remaining five. This ordering differs slightly from the
Engle-Granger ordering. But most important is that the two methods coincide on
which pairs are cointegrated and which are not. And this in turn coincides
with the results from the trading strategy.
Chapter 8
Conclusion
The goal of this project was to apply statistical techniques to find relationships between stocks. The closing prices of these stocks, dating back two
years, are the only data that have been used in this analysis.
From trading experience, IMC is able to make a distinction between good
and bad pairs based on profits. In chapter 2 we derived a trading strategy
that resembles the strategy used by IMC. From this strategy, we derived the
important characteristics of a good pair. We saw that we like the price processes to be tied together such that their spread oscillates around zero and
does not walk away.
In this report we tried to identify pairs with cointegration. If two stocks
in a pair are cointegrated, a certain linear combination of the two is stationary. This implies that this linear combination, which can be seen as the
spread, is mean-reverting. This is in line with the characteristics of a good
pair.
In chapter 4 we introduced two methods for testing for cointegration: the
Engle-Granger and the Johansen method. We have looked at the Engle-Granger
method in detail. This method makes use of a unit root test, the
Dickey-Fuller test. Because there is a lot of ambiguity in the literature about
which Dickey-Fuller test and which critical values should be used, we discussed
the different cases in chapter 5. The asymptotic distributions of the
test statistics were derived, and the critical values for finite sample sizes were
found with simulation.
In chapter 6 we examined the properties of the Engle-Granger method, which
consists of a linear regression followed by a Dickey-Fuller test on the residuals
of this regression. The main questions were which Dickey-Fuller case
to use and whether the critical values of the Engle-Granger method are the
same as those of this Dickey-Fuller test. We saw that case 2 was the most
appropriate one for the way we want to test for cointegration, that is, without
a constant in the cointegrating relation. There was no indication, based
on simulations, that the critical values of the Engle-Granger test differ
from those of the Dickey-Fuller case 2 test. Also, the power of the two tests
was found to be similar when the assumptions of the method were fulfilled. The
Engle-Granger test appeared to perform well even when some assumptions
were not fulfilled. The Engle-Granger test assumes that the residuals follow
an autoregressive model; when we generated cointegrated data with residuals
that are not likely to be autoregressive, the method still rejected the null
hypothesis of no cointegration often.
IMC has provided a selection of ten pairs that differ in quality. In
chapter 7 we applied the trading strategy from chapter 2 to the historical
closing prices. Based on profitability and the number of trades, we found a
distinction between good and bad pairs which coincides with the distinction
made by IMC. In this chapter we also tested the ten pairs for cointegration,
using both the Engle-Granger and the Johansen method. The two
methods coincide on which pairs are cointegrated and which are not, and
the estimated cointegrating relations are almost the same. All the good pairs
according to the trading strategy are seen as cointegrated by both
tests. Furthermore, all bad pairs are seen as not cointegrated by
both tests.
Based on the results of this project, we may conclude that cointegration is
an appropriate concept to identify pairs suitable for IMC’s trading strategy.
Chapter 9
Alternatives &
recommendations
In this chapter we briefly discuss some alternative trading strategies in the
first section and give some recommendations for further research in the second
section.
9.1 Alternative trading strategies
In this report we focused on pairs trading with two stocks in a pair. Two
stocks being cointegrated translates easily into the trading strategy from
chapter 2: we take the spread process to be the linear combination of the two
stocks corresponding to the cointegrating vector,
yt − αxt .
If we took r̄ from chapter 2 to be the least squares estimate instead of
the average ratio, the spread process of chapter 2 would be exactly the same
as the spread process found with the Engle-Granger method, provided we
use the strategy without the adjustment parameter κ, i.e., with κ = 0.
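As a small illustration of forming the spread with a least-squares ratio, the following is a hypothetical numpy sketch with toy data, not code from this report:

```python
import numpy as np

def spread_from_ols(y, x):
    """Estimate the ratio alpha by least squares without a constant and
    return the spread process y_t - alpha * x_t together with alpha."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    alpha = np.dot(x, y) / np.dot(x, x)  # least-squares slope through the origin
    return y - alpha * x, alpha

# Toy data: y is exactly twice x, so alpha = 2 and the spread vanishes.
x = np.array([10.0, 11.0, 12.0, 13.0])
y = 2.0 * x
spread, alpha = spread_from_ols(y, x)
```

With real prices the spread would of course not vanish; the trading strategy then acts on its excursions away from zero.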
In section 4.3 it was stated that we neglect a possible constant α0 in the
cointegrating relation. In this section we look at a trading strategy that
does not neglect this constant. We also look at what can happen if the
logarithms of the stock prices are cointegrated.
Trading strategy with constant
Consider two stock price processes, xt and yt , which satisfy
yt − αxt − α0 = εt , (9.1)
where εt is some stationary process; in other words, the two stocks are
cointegrated with a constant in their relation. We could trade the pair y, x in the
ratio 1 : α and give up the cash-neutral property, but another possibility
is to determine the trading instances with (9.1) and trade a quantity of x
such that the whole trade is cash neutral. More precisely, with (9.1) we can
determine whether xt is over- or underpriced compared to yt at time t, but
instead of trading this relation directly, we trade one stock of y and yt /xt stocks
of x whenever the mispricing at time t exceeds Γ.
Let us consider an example: let x and y be a pair with relation (9.1), where
α = 2 and α0 = 20, so that the spread εt looks like figure 9.1. The corresponding
processes xt and yt are shown in figure 9.2.
Figure 9.1: Spread εt (t = 0, . . . , 500; values roughly between −1 and 1).
For illustration purposes we take an artificial example with 500 observations,
using the first half to determine the parameters of the strategy and
the second half to see whether the strategy works. Fitting the first half
of the observations of y on the first half of the observations of x and a constant
results in
α̂ = 1.98 and α̂0 = 20.29.
The threshold Γ is determined in the same way as in chapter 2, but now
the spread process consists of the residuals from this fit; for this example
Γ = 0.91. We apply the new strategy to the second half of the observations.

Figure 9.2: Price processes xt and yt (t = 0, . . . , 500; prices roughly between 20 and 80).

The trades are shown in table 9.1. With the first trade we put
on a position for the first time, so we make no profit yet. The second trade
consists of two parts: we flatten the position from the first trade, which yields
a profit, and we put on a new position. We always trade one stock of y and
yt /xt stocks of x, so each trade is exactly cash neutral. The actual
traded spread is not shown, because it is essentially the same as the right half
of figure 9.1. Figure 9.3 shows the spread if we do not include a constant,
i.e., if we neglect α0 .
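A minimal sketch of this strategy follows; it is an assumed illustration, not the code behind the report, and the data and threshold below are placeholders chosen so that relation (9.1) holds exactly with α = 2 and α0 = 20:

```python
import numpy as np

def fit_with_constant(y, x):
    """Least-squares fit of y_t on x_t and a constant: y_t ~ alpha0 + alpha * x_t."""
    A = np.column_stack([np.ones_like(x), x])
    alpha0, alpha = np.linalg.lstsq(A, y, rcond=None)[0]
    return alpha0, alpha

def trade_instances(y, x, alpha0, alpha, gamma):
    """Determine trades from the spread s_t = y_t - alpha*x_t - alpha0 of (9.1).
    When s_t crosses +/-gamma we trade one stock of y against y_t/x_t stocks
    of x, so every trade is exactly cash neutral."""
    spread = y - alpha * x - alpha0
    trades, side = [], 0          # side: +1 = short y / long x, -1 = the reverse
    for t, s in enumerate(spread):
        if s > gamma and side != 1:
            trades.append((t, -1.0, y[t] / x[t]))   # (time, qty of y, qty of x)
            side = 1
        elif s < -gamma and side != -1:
            trades.append((t, 1.0, -y[t] / x[t]))
            side = -1
    return trades

# Placeholder data satisfying (9.1) with alpha = 2, alpha0 = 20.
x = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
eps = np.array([0.0, 2.0, 0.0, -2.0, 0.0])
y = 20.0 + 2.0 * x + eps
a0, a = fit_with_constant(y, x)   # close to, but not exactly, (20, 2) since eps != 0
trades = trade_instances(y, x, 20.0, 2.0, 1.0)
```

On this toy series the strategy trades twice, once on each side of the band, with x-quantities yt /xt as in table 9.1.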
Table 9.1: Trading instances.

trade    t      st     position (y,x)   price yt   price xt   profit
  1    251     0.91      (−1,+2.82)       72.08      25.59
  2    291    −0.93      (+1,−2.80)       66.87      23.90      0.44
  3    331     0.93      (−1,+2.85)       70.20      24.63      1.27
  4    370    −0.91      (+1,−2.73)       71.03      25.98      2.99
  5    409     0.95      (−1,+2.74)       77.65      28.37      0.07
  6    452    −0.94      (+1,−2.66)       77.05      29.02      2.37
  7    487     0.93      (−1,+2.71)       79.86      29.49      1.55
                                                total profit    8.69
Although the profit per trade is not at least 2Γ, as it was for the trading
strategy from chapter 2 with constant ratio, trading this pair this way is still
quite profitable, especially since the trading strategy from chapter 2 would
not make any money here, even with a large adjustment parameter κ.
Figure 9.3: Spread when neglecting α0 (t = 0, . . . , 500; values roughly between −4 and 4).
Although this strategy can be applied for any α0 , we still do not want
α0 to be large, because of the market-neutral property of pairs trading. If the
overall market is up 50%, so that x increases by 50%, then we expect y to
increase by 50% as well. With an α0 that is large compared to the stock prices,
this does not hold. In fact it does not hold for any α0 ≠ 0, but the
effect is small when α0 is small. The value of α0 used in the example is actually
too large: it is equal to the first observation of x. Which values of α0 can
be used with this strategy should be examined further.
Trading strategy for the logarithms
Assume we have two stock price processes whose logarithms are cointegrated:
log yt − β log xt = εt ,
where εt is some stationary process. Then the relation between xt and yt
becomes
yt = xt^β e^εt . (9.2)
If β = 1, we can apply a trading strategy to the ratio process yt /xt instead
of to a spread process. An example is shown in figure 9.4, where
we simulated xt according to the model in section 4.2 and generated yt such
that εt follows a stationary AR(1) model. A trading strategy could be to sell
one stock of y and buy one stock of x when the ratio is above 1 + Γ, and the
other way around when the ratio is below 1 − Γ. Alternatively we could trade
exactly cash neutrally, trading one stock of y against yt /xt stocks of x.
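A simulation in this spirit can be sketched as follows; this is a hypothetical setup in which the geometric random walk for xt and the AR(1) parameters are illustrative stand-ins, not the exact model of section 4.2:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Stand-in price path for x_t: a geometric random walk.
x = 25.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, n)))

# Stationary AR(1) process eps_t, and y_t = x_t * exp(eps_t), i.e.
# relation (9.2) with beta = 1, so log y_t - log x_t = eps_t is stationary.
phi, sigma = 0.9, 0.02
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = phi * eps[t - 1] + rng.normal(0.0, sigma)
y = x * np.exp(eps)

ratio = y / x   # equals exp(eps_t): fluctuates around 1

# Trade when the ratio leaves the band [1 - Gamma, 1 + Gamma]:
# -1 = sell y / buy x, +1 = buy y / sell x, 0 = no signal.
Gamma = 0.05
signals = np.where(ratio > 1 + Gamma, -1, np.where(ratio < 1 - Gamma, 1, 0))
```

Because β = 1, the ratio yt /xt is itself the (exponentiated) stationary spread, so the usual threshold logic carries over directly.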
Figure 9.4: Ratio process yt /xt (t = 0, . . . , 500; values roughly between 0.8 and 1.2).
When β ≠ 1, things are not that simple anymore. We may assume that
β > 1 whenever β ≠ 1, because if it is not, we simply swap the roles of xt
and yt . We then get the same problem as with a large α0 : the relation is not
market neutral. If x increases by 50%, y increases by more than 50% according
to relation (9.2). If this happens while we have a long position in x and a
short position in y, the profit in x cannot compensate the loss in y, so we lose
money. Perhaps this can be prevented for values of β close to 1, by adjusting
the ratio in which we trade x and y, but this should be examined further.
In this report we have only discussed trading strategies that trade one line:
we put on a position when the spread reaches ±Γ and wait until the spread
reaches Γ in the other direction. But it is very interesting to trade more lines.
For example, if the spread reaches +Γ1 , we put on a short position in y and a
long position in x; if the spread increases further and reaches Γ2 , we enlarge
our short position in y and our long position in x. A trading strategy could
be to trade the same amounts at each threshold, with the thresholds equally
spaced, Γ2 = 2Γ1 . Figure 9.5 illustrates this idea. To make this more clear,
table 9.2 shows the trading instances for this strategy with two lines when we
trade x and y in the ratio 1:1.
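One way to make this precise in code is the sketch below. The exit rules are an assumption on my part (a unit is added when the spread crosses a line moving away from zero, removed when it crosses the next line back in, and the position is flattened when the spread crosses zero), chosen because they reproduce the instances of table 9.2 with Γ1 = 2:

```python
def two_line_position(s, pos, gamma1):
    """One step of the 2-line strategy with equally spaced thresholds
    (gamma2 = 2 * gamma1).  pos is the signed number of units on, where
    +k means short k stocks of y and long k stocks of x."""
    gamma2 = 2.0 * gamma1
    if pos > 0 and s <= 0:   # spread crossed zero: flatten entirely
        pos = 0
    if pos < 0 and s >= 0:
        pos = 0
    if s > gamma2:           # beyond the outer line: full two units
        return 2
    if s < -gamma2:
        return -2
    if s > gamma1:           # between the lines: hold at least one unit
        return max(pos, 1)
    if s < -gamma1:
        return min(pos, -1)
    if s > 0:                # inside the first line: at most one unit
        return min(pos, 1)
    if s < 0:
        return max(pos, -1)
    return pos

def run_two_line(spread, gamma1):
    """Collect the trade instances (t, s_t, new position) as in table 9.2."""
    trades, pos = [], 0
    for t, s in enumerate(spread):
        new = two_line_position(s, pos, gamma1)
        if new != pos:
            trades.append((t, s, new))
            pos = new
    return trades

# Spread values taken from table 9.2, with an assumed threshold gamma1 = 2.
spread = [0.0, 2.12, 4.22, 1.94, -0.01, -2.11, -4.20, -1.89, 0.13]
trades = run_two_line(spread, 2.0)
```

Run on the spread values of table 9.2, this yields the position sequence (−1,+1), (−2,+2), (−1,+1), flat, (+1,−1), (+2,−2), (+1,−1), flat, matching the first eight trades in the table.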
Figure 9.5: A 2 line strategy (t = 0, . . . , 300; spread roughly between −4 and 4).
This strategy can easily be extended to more lines, and even to three or
more stocks in a pair. How to choose the number of lines, the thresholds and
the corresponding amounts of stocks is very interesting to examine further.
9.2
Recommendations for further research
It would be nice to develop the alternative method from section 4.5 further,
so that we have a new method for testing for cointegration. In order to
do so, we need an accurate algorithm for estimating the parameters of
the MA(q) model.
In this report we used closing prices. It would be interesting to apply the
trading strategies and cointegration tests to intra-day data, because we trade
during the day. This is especially interesting for a trading strategy with a
large number of lines whose thresholds are close together.
We could also cut the cointegration test into several pieces. Suppose we have
datasets containing four years of closing prices; then we could perform three
tests on two years of data each, with an overlap of one year. More clearly, the
first test is on the first and second year, the second test is on the second and
third year, and the third test is on the third and fourth year. Then we can see
whether the stocks are cointegrated on each time interval and whether the
cointegrating relation changes. This could be very helpful for determining a
good adjustment parameter κ.

Table 9.2: Trading instances.

trade     t      st     position (y,x)
  1      26     2.12       (−1,+1)
  2      56     4.22       (−2,+2)
  3      97     1.94       (−1,+1)
  4     152    −0.01        flat
  5     158    −2.11       (+1,−1)
  6     199    −4.20       (+2,−2)
  7     206    −1.89       (+1,−1)
  8     221     0.13        flat
  9     284     2.18       (−1,+1)
 10     289     4.06       (−2,+2)
 11     297     1.92       (−1,+1)
 12     306    −0.13        flat
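The splitting into overlapping test windows can be sketched generically. The window lengths below (252 trading days per year) and the pluggable `test` callable are illustrative assumptions:

```python
def overlapping_windows(n_obs, window, overlap):
    """Index ranges [start, stop) of length `window`, where consecutive
    windows share `overlap` observations."""
    step = window - overlap
    return [(start, start + window)
            for start in range(0, n_obs - window + 1, step)]

def rolling_coint(y, x, window, overlap, test):
    """Apply a cointegration test (any callable (y, x) -> result, e.g. a
    wrapper around the Engle-Granger procedure) to each window."""
    return [test(y[a:b], x[a:b])
            for a, b in overlapping_windows(len(y), window, overlap)]

# Four years of daily closes, two-year windows, one year of overlap
# gives exactly the three tests described above.
year = 252
windows = overlapping_windows(4 * year, 2 * year, year)
```

Comparing the estimated cointegrating vectors across the three windows would show directly how stable the relation is over time.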
There exist several representations for cointegrated processes; one is the
VAR representation we saw briefly with the Johansen method in section 4.4.
It would be interesting to see whether one of these representations could be
used to build a monitoring system: a set of confidence intervals to check that
the spread behaves according to the model, with certain actions attached
when the intervals are exceeded. For example, if the first confidence interval
is exceeded we stop enlarging our positions, if the second interval is exceeded
we revert part of our positions at a loss, and if the third interval is exceeded
we close out our entire positions and stop treating the stocks as a pair.
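Such a monitoring system could be sketched as follows; the interval widths (2, 3 and 4 spread standard deviations) and the action descriptions are illustrative assumptions, not values derived from any particular representation:

```python
def monitor(spread_value, sigma):
    """Map the current spread to a monitoring action using three nested
    confidence intervals (the widths here are illustrative choices)."""
    z = abs(spread_value) / sigma
    if z > 4.0:
        return "close out all positions and stop treating the stocks as a pair"
    if z > 3.0:
        return "revert part of the positions, accepting a loss"
    if z > 2.0:
        return "stop enlarging the positions"
    return "spread consistent with the model"
```

In practice sigma would come from the fitted model for the spread, and the thresholds could be calibrated from the chosen representation's forecast intervals.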
Bibliography
[1] C. ALEXANDER. Market Models. John Wiley & Sons, 2001.
[2] P.J. BROCKWELL and R.A. DAVIS. Introduction to time series and
forecasting. Springer-Verlag, 2002.
[3] P.J. BROCKWELL and R.A. DAVIS. Time Series: Theory and Methods. Springer-Verlag, 1987.
[4] D.A. DICKEY and W.A. FULLER. Distribution of the estimators for
autoregressive time series with a unit root. Journal of the American
Statistical Association, 74:427–431, 1979.
[5] R.F. ENGLE and C.W.J. GRANGER. Co-integration and error correction: Representation, estimation, and testing. Econometrica, 55(2):251–
276, 1987.
[6] J.D. HAMILTON. Time Series Analysis. Princeton University Press,
1994.
[7] D.J. HIGHAM. An introduction to financial option valuation. Cambridge University Press, 2004.
[8] S. JOHANSEN. Statistical analysis of cointegration vectors. Journal of
Economic Dynamics and Control, 12:231–254, 1988.
[9] S. JOHANSEN. Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica, 59:1551–
1580, 1991.
[10] S. JOHANSEN and K. JUSELIUS. Maximum likelihood estimation and
inference of cointegration - with application to the demand for money.
Oxford Bulletin of Economics and Statistics, 52:208, 1990.
[11] M. OSTERWALD-LENUM. A note with quantiles of the asymptotic
distribution of the maximum likelihood cointegration rank test statistics.
Oxford Bulletin of Economics and Statistics, 54:462, 1992.
[12] P.C.B. PHILLIPS and S.N. DURLAUF. Multiple time series regression
with integrated processes. Review of Economic Studies, 53:473–495,
1986.
[13] P.C.B. PHILLIPS and S. OULIARIS. Asymptotic properties of residual
based tests for cointegration. Econometrica, 58(1):165–193, 1990.
[14] J.H. STOCK and M.W. WATSON. Testing for common trends. Journal
of the American Statistical Association, 83(404):1097–1107, 1988.
[15] G. VIDYAMURTHY. Pairs Trading. John Wiley & Sons, 2004.