
Pairs trading

2005, Quantitative Finance

AI-generated Abstract

Pairs trading is a trading strategy that exploits mispricings between correlated financial instruments by initiating positions in pairs of stocks. This approach facilitates statistical arbitrage by going long on one stock and short on another, thus profiting from the convergence of their price movements over time. The technique, which gained popularity in the 1980s thanks to automated trading systems developed by financial experts, offers significant advantages for managing trading risk and improving trade execution efficiency.

Pairs trading
Gesina Gorter
December 12, 2006

Contents
1 Introduction
  1.1 IMC
  1.2 Pairs trading
  1.3 Graduation project
  1.4 Outline
2 Trading strategy
  2.1 Introductory example
  2.2 Data
  2.3 Properties of pairs trading
  2.4 Trading strategy
  2.5 Conclusion
3 Time series basics
4 Cointegration
  4.1 Introducing cointegration
  4.2 Stock price model
  4.3 Engle-Granger method
  4.4 Johansen method
  4.5 Alternative method
5 Dickey-Fuller tests
  5.1 Notions/facts from probability theory
  5.2 Dickey-Fuller case 1 test
  5.3 Dickey-Fuller case 2 test
  5.4 Dickey-Fuller case 3 test
  5.5 Power of the Dickey-Fuller tests
  5.6 Augmented Dickey-Fuller test
  5.7 Power of the Augmented Dickey-Fuller case 2 test
6 Engle-Granger method
  6.1 Engle-Granger simulation with random walks
  6.2 Engle-Granger simulation with stock price model
  6.3 Engle-Granger with bootstrapping from real data
  6.4 Engle-Granger simulation with alternative method
7 Results
  7.1 Results trading strategy
  7.2 Results testing price process I(1)
  7.3 Results Engle-Granger cointegration test
  7.4 Results Johansen cointegration test
8 Conclusion
9 Alternatives & recommendations
  9.1 Alternative trading strategies
  9.2 Recommendations for further research
Bibliography

Chapter 1 Introduction

1.1 IMC

IMC, International Marketmakers Combination, was founded in 1989. IMC is a diversified financial company that started as a market maker on the Amsterdam Options Exchange. Apart from its core business activity, trading, it is also active in asset management, brokerage, product development and derivatives consultancy. IMC Trading is IMC's largest operational unit and has been the core of the company for the past 17 years. IMC Trading trades solely for its own account and benefit. IMC is active in the major markets in Europe and the US and has offices in Amsterdam, Zug (Switzerland), Sydney and Chicago. By trading a large number of different securities in different markets, the company is able to keep its trading risk to a minimum.

The dealing room in Amsterdam is divided into two main sections: Marketmaking and Cash. Marketmaking's main focus is on option trading: a market maker for a certain option quotes both bid and offer prices on the option and makes a profit from the bid-ask spread. The Cash or Equity desk is dedicated to the worldwide arbitrage of diverse financial instruments.
Arbitrage is a trading strategy that takes advantage of two or more securities being mispriced relative to each other. Pairs trading is one of the many trading strategies of the Cash desk.

1.2 Pairs trading

History
Pairs trading or statistical arbitrage was first developed and put into practice by Nunzio Tartaglia, while working for Morgan Stanley in the 1980s. Tartaglia formed a group of mathematicians, physicists and computer scientists to develop automated trading systems to detect and make use of mispricings in financial markets. One of the computer scientists on Tartaglia's team was the famous David Shaw. Pairs trading was one of the most profitable strategies developed by this team. As members of the team gradually spread to other firms, so did the knowledge of pairs trading. Vidyamurthy [15] presents a very insightful introduction to pairs trading.

Motivation
The general 'rule of thumb' in trading is to sell overvalued securities and buy undervalued ones. It is only possible to determine that a security is overvalued or undervalued if the true value of the security is known, and the true value can be very difficult to determine. Pairs trading is about relative pricing, so the true value of the security is not important. Relative pricing is based on the idea that securities with similar characteristics should be priced more or less the same. When the prices of two similar securities differ, one security is overpriced with respect to its 'true value', or the other one is underpriced, or both.

Pure arbitrage is making risk-less use of mispricing, which is why one could call it a deterministic moneymaking machine. The purest form of arbitrage is profitably buying and selling the exact same security on different exchanges. For example, one could buy a share in Royal Dutch on the Amsterdam exchange at € 25.75 and sell the same share on the Frankfurt exchange at € 26.00.
Because shares in Royal Dutch are interchangeable, such a trade would result in a flat position and thus risk-less money. Although pairs trading is called an arbitrage strategy, it is not risk-free at all. The key to success in pairs trading lies in the identification of pairs and an efficient trading algorithm.

Pairs trading is an arbitrage strategy that takes advantage of a mispricing between two securities. It involves putting on positions when there is a certain magnitude of mispricing, buying the lower-priced security and selling the higher-priced one. Hence, the portfolio consists of a long position in one security and a short position in the other. The expectation is that the mispricing will correct itself, and when this happens the positions are closed out by the opposite trades. The higher the magnitude of mispricing when positions are put on, the higher the profit potential.

Example
Determining whether two securities form a pair is not trivial, but some securities are obvious pairs. For example, one fundamentally obvious pair is Royal Dutch and Totalfina, both being European oil-producing companies. One can easily argue that the value of both companies is greatly determined by the oil price and hence that movements of the two securities should be closely related to each other. In this example, let's assume that historically one share of Totalfina is worth 8 times one share of Royal Dutch. Assume that at time $t_0$ it is possible to trade Royal Dutch at € 26.00 and Totalfina at € 215.00. Because 8 times € 26 is € 208, we feel that Totalfina is overpriced, or Royal Dutch is underpriced, or both. So we sell one share in Totalfina and buy 8 shares in Royal Dutch, with the expectation that Totalfina becomes cheaper or Royal Dutch becomes more expensive or both. Assume that at $t_1$ the prices are € 26.00 and € 208.00; we will then have made a profit of € 215 minus € 208, which is € 7. We would have made the same profit if at $t_1$ the prices were € 26.875 (215 divided by 8) and € 215.00 respectively.
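The arithmetic of this example can be checked with a short Python sketch. The function name `pair_profit` and its argument names are illustrative assumptions and do not come from the original text; the prices and the ratio of 8 are those of the example.

```python
def pair_profit(ratio, entry_cheap, entry_rich, exit_cheap, exit_rich):
    """Profit of selling one 'rich' share and buying `ratio` 'cheap' shares,
    then closing both legs at the exit prices (hypothetical helper)."""
    long_leg = ratio * (exit_cheap - entry_cheap)  # gain on the shares bought
    short_leg = entry_rich - exit_rich             # gain on the share sold short
    return long_leg + short_leg


# Scenario 1: Totalfina falls to 208.00, Royal Dutch stays at 26.00.
scenario_1 = pair_profit(8, 26.00, 215.00, 26.00, 208.00)
# Scenario 2: Royal Dutch rises to 26.875, Totalfina stays at 215.00.
scenario_2 = pair_profit(8, 26.00, 215.00, 26.875, 215.00)
print(scenario_1, scenario_2)  # 7.0 7.0
```

In both scenarios the relative mispricing of € 7 disappears, so the profit is the same regardless of which of the two prices moved.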
In conclusion, this strategy does not say anything about the true value of the stocks but only about relative prices. In this example a predetermined ratio of 8 was used, based on historical data. How to use historical data to determine this ratio will be discussed in paragraph 2.4.

1.3 Graduation project

The goal of this project is to apply statistical techniques to find relationships between stocks in all markets that IMC is active in, based solely on the history of the prices of the stocks. The closing prices of these stocks, dating back two years, are the only data that will be used in this analysis. The goal is to find pairs of stocks whose movements are close to each other. IMC is already trading a lot of pairs which were found by fundamental analysis and by applying their trading strategy to historical data (backtesting). No statistical analysis was made. From trading experience, IMC is able to make a distinction between good and bad pairs based on profits. IMC has provided a selection of ten pairs that differ in quality.

The main focus of this project will be modeling the relationships between stocks, such that we can identify a good pair based on statistical analysis instead of fundamental analysis or backtesting. The resulting relationships will be put in order of the strength of co-movement and profitability. Although one could study pairs trading between all sorts of financial instruments, such as options, bonds and warrants, this project focuses on trading pairs that consist of two stocks.

1.4 Outline

In the next chapter a trading strategy for pairs is derived; it illustrates how money is made and what properties a good pair has. In chapter 3 some basics of time series analysis are briefly stated, which we will need for the concept of cointegration. Chapter 4 discusses cointegration and two methods for testing it, the Engle-Granger and the Johansen method. Also in this chapter a start is made with an alternative method.
The Engle-Granger method makes use of a unit root test named Dickey-Fuller; the properties of this unit root test are derived in chapter 5. The properties of the Engle-Granger method are found by simulation in chapter 6. IMC has provided 10 pairs for investigation. The results of the trading strategy and cointegration tests are stated in chapter 7, where the pairs are also put in order of profitability and cointegration. After the conclusions in chapter 8, some suggestions for alternative trading strategies are made in chapter 9. In that chapter we also give some recommendations for further research.

Chapter 2 Trading strategy

IMC first started to identify pairs of stocks based on fundamental analysis, which means they investigated similarities between companies in products, policies, dependencies on market circumstances, etcetera. When a pair is identified, the question remains how to make money. In this chapter, a trading strategy is explained that is quite similar to the strategy used by IMC. It is not exactly the same strategy, because IMC does not want to give away a ready-to-go-and-make-money trading strategy, but also because essential parts of their strategy, like the selection of parameters, are based on 'gut-feeling' and are in the hands of the trader. That makes it very difficult, at the least, to write down a general model of their trading strategy.

2.1 Introductory example

Assume we have two stocks X and Y that form a pair based on fundamental analysis. Also available are the closing prices of these stocks dating back 2 years, which form time series $\{x_t\}_{t=0}^{T}$ and $\{y_t\}_{t=0}^{T}$ as shown in figure 2.1. In one year there are approximately 260 trading days, so two years of closing prices form a dataset of approximately 520 observations for each stock.
Figure 2.1: Time series $x_t$ and $y_t$.

The first half of the observations is used to determine certain parameters of the trading strategy. The second half is used to backtest the trading strategy based on these parameters, i.e., to test whether the strategy makes money on this pair. The average ratio of Y and X over the first 260 observations,

\[ \bar{r} = \frac{1}{260} \sum_{t=0}^{259} \frac{y_t}{x_t}, \]

is 1.36 in this example, which means that one share of Y is worth approximately 1.36 shares of X during this time period. Although the average ratio is probably not the best estimator, we will use it in the trading strategy to calculate a quantity called the spread for each value of t:

\[ s_t = y_t - \bar{r} x_t. \]

If the price processes of X and Y were perfectly correlated, that is, if X and Y change in the same direction and in the same proportion (for every $t > 0$, $y_t = \alpha x_t$ for some $\alpha > 0$, so the correlation coefficient is +1), the spread would be zero for all t and we could not make any money, because neither X nor Y would ever be over- or underpriced. However, perfect correlation is hard to find in real life. Indeed, in this example the stocks are not perfectly correlated, as we can see in figure 2.2.
Figure 2.2: Spread $s_t$.

As mentioned before, we like to buy cheap and sell expensive. If the spread is below zero, stock Y is cheap relative to stock X. The other way around, if the spread is above zero, stock Y is expensive relative to stock X (another way to put it is that X is cheap in comparison with Y). So basically the trading strategy is to buy stock Y and sell stock X at the ratio 1:1.36 if the spread is a certain amount below zero, which we call threshold Γ. When the spread comes back to zero, the position is flattened, which means we sell Y and buy X in the same ratio so there is no position left. In that case, we have made a profit of Γ. An important requirement is that we can sell shares we do not own, also called short selling.

In summary, we put on a portfolio, containing one long position and one short, if the spread is Γ or more away from zero. We flatten the portfolio when the spread comes back to zero. Just like the average ratio, Γ is determined by the first half of observations. In this example we determined a Γ of 0.40. The way Γ has been calculated will be discussed in paragraph 2.4.
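The spread calculation and the entry and exit rule described above can be sketched in Python as follows. The function names, the toy spread series and the convention of not opening a new position on the same day a position is flattened are assumptions made for illustration; this is a minimal sketch, not IMC's actual strategy.

```python
def average_ratio(y, x):
    """Average of y_t / x_t over the given (historical) observations."""
    return sum(yt / xt for yt, xt in zip(y, x)) / len(y)


def spread(y, x, r):
    """Spread s_t = y_t - r * x_t for every t."""
    return [yt - r * xt for yt, xt in zip(y, x)]


def strategy_one(s, gamma):
    """Open when |s_t| >= gamma, flatten when the spread is back at or across zero.

    Returns (trades, profit). Each trade is (t, position) with position
    -1 = short Y / long X, +1 = long Y / short X, 0 = flat.
    """
    trades, profit = [], 0.0
    position, entry = 0, 0.0
    for t, st in enumerate(s):
        if position == 0:
            if st >= gamma:        # Y expensive relative to X: sell Y, buy X
                position, entry = -1, st
                trades.append((t, position))
            elif st <= -gamma:     # Y cheap relative to X: buy Y, sell X
                position, entry = 1, st
                trades.append((t, position))
        elif position * st >= 0:   # spread has returned to (or crossed) zero
            profit += position * (st - entry)
            position = 0
            trades.append((t, 0))
    return trades, profit


# Toy spread (made-up numbers) with threshold 0.40.
toy_spread = [0.0, 0.5, 0.1, -0.1, -0.5, 0.05]
toy_trades, toy_profit = strategy_one(toy_spread, 0.40)
print(toy_trades)  # [(1, -1), (3, 0), (4, 1), (5, 0)]
```

Because the sketch trades on the observed values rather than at exactly Γ and 0, each round trip earns at least Γ, in line with the remark that follows about closing prices.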
After determination of the parameters, the trading strategy is applied to the second half of observations in the dataset. This results in making a profit of Γ 13 times. In other words, the spread moves away from 0 by at least Γ and back to 0 13 times. Note that this involves 26 trading instances, since putting on and flattening a position requires two. Figure 2.3 and table 2.1 show all 26 trading instances. The profit made here is at least 13Γ rather than exactly 13Γ: we use closing prices instead of intra-day data, so we do not trade at exactly −Γ, 0 and Γ, as we can see in table 2.1.
Figure 2.3: Spread $s_t$ and Γ.

Table 2.1: Trading instances strategy I.

trade   t     s_t     position (Y, X)   price Y   price X   profit
1       268    0.69   (-1, +1.36)       31.49     22.63
2       282   -0.07   flat              31.37     23.11     0.76
3       284   -0.47   (+1, -1.36)       30.54     22.79
4       289    0.01   flat              31.43     23.10     0.48
5       293    0.55   (-1, +1.36)       32.05     23.15
6       300   -0.16   flat              32.81     24.23     0.71
7       302   -1.05   (+1, -1.36)       33.57     25.44
8       310    0.17   flat              33.56     24.54     1.22
9       311    0.45   (-1, +1.36)       33.58     24.34
10      420   -0.30   flat              40.33     29.85     0.75
11      423   -1.15   (+1, -1.36)       40.79     30.82
12      428    0.08   flat              43.15     31.65     1.23
13      429    0.65   (-1, +1.36)       43.43     31.44
14      432   -0.19   flat              42.60     31.45     0.84
15      434   -0.47   (+1, -1.36)       42.16     31.33
16      435    0.04   flat              42.61     31.28     0.51
17      437    0.82   (-1, +1.36)       42.79     30.84
18      440   -0.25   flat              44.01     32.52     1.07
19      444   -1.33   (+1, -1.36)       46.53     35.17
20      445    0.12   flat              46.17     33.84     1.45
21      446    1.24   (-1, +1.36)       46.32     33.13
22      449   -0.17   flat              45.89     33.85     1.41
23      450   -0.63   (+1, -1.36)       45.46     33.87
24      467    0.05   flat              46.19     33.91     0.68
25      468    0.48   (-1, +1.36)       47.16     34.31
26      519   -0.21   flat              44.95     33.19     0.69
total profit                                                11.80

Rather than closing the position at 0, one could also choose to reverse the position when the spread reaches Γ in the other direction. Assume we have sold 1 Y and bought 1.36 X because the spread was larger than Γ. We could now wait until the spread reaches −Γ and then buy 2 times Y and sell 2 times 1.36 X. As a result, we are left with a portfolio of long 1 Y and short 1.36 X. This results in one initial trade and 12 trades reversing the position. Note that the profit of reversing the position is 2Γ, so the total profit is at least 12 times 2Γ. These trades are shown in table 2.2.

Table 2.2: Trading instances strategy II.

trade   t     s_t     position (Y, X)   price Y   price X   profit
1       268    0.69   (-1, +1.36)       31.49     22.63
2       284   -0.47   (+1, -1.36)       30.54     22.79     1.16
3       293    0.55   (-1, +1.36)       32.05     23.15     1.02
4       302   -1.05   (+1, -1.36)       33.57     25.44     1.60
5       311    0.45   (-1, +1.36)       33.58     24.34     1.50
6       423   -1.15   (+1, -1.36)       40.79     30.82     1.60
7       429    0.65   (-1, +1.36)       43.43     31.44     1.80
8       434   -0.47   (+1, -1.36)       42.16     31.33     1.12
9       437    0.82   (-1, +1.36)       42.79     30.84     1.29
10      444   -1.33   (+1, -1.36)       46.53     35.17     2.15
11      446    1.24   (-1, +1.36)       46.32     33.13     2.57
12      450   -0.63   (+1, -1.36)       45.46     33.87     1.87
13      468    0.48   (-1, +1.36)       47.16     34.31     1.11
total profit                                                18.79

This change of strategy reduces the number of trading instances on average by a factor of 2. In doing so, we reduce trading costs. More importantly, if the spread moves back and forth around 0, strategy II will be more profitable. For example, with the first trade the spread has moved above Γ, so we sell 1 Y and buy 1.36 X. When trading according to strategy I, we flatten our position at 0, have zero position while the spread moves from 0 to −Γ, and do not profit from this movement. When trading according to strategy II, we are still short Y and long X while the spread moves to −Γ (that is, while X becomes more expensive relative to Y). This is shown in figures 2.4 and 2.5.
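The reversal rule of strategy II can be sketched in Python as follows. The function names are assumptions made for illustration. The input list holds only the spread values at the 13 trade times listed in table 2.2, not the full spread series, so the sketch reproduces the bookkeeping of the table rather than a complete backtest.

```python
def strategy_two(s, gamma):
    """Reverse the position each time the spread reaches gamma on the other side.

    Returns a list of (index, position) with position -1 = short Y / long X
    and +1 = long Y / short X. The position is never flattened.
    """
    trades, position = [], 0
    for i, st in enumerate(s):
        if st >= gamma and position != -1:
            position = -1
            trades.append((i, position))
        elif st <= -gamma and position != 1:
            position = 1
            trades.append((i, position))
    return trades


def realized_profit(s, trades):
    """Sum of spread moves captured between consecutive trading instances."""
    profit = 0.0
    for (i0, pos), (i1, _) in zip(trades, trades[1:]):
        profit += pos * (s[i1] - s[i0])
    return profit


# Spread values at the trade times of table 2.2 (column s_t).
table_spread = [0.69, -0.47, 0.55, -1.05, 0.45, -1.15, 0.65,
                -0.47, 0.82, -1.33, 1.24, -0.63, 0.48]
gamma = 0.40
table_trades = strategy_two(table_spread, gamma)
table_profit = realized_profit(table_spread, table_trades)
lower_bound = (len(table_trades) - 1) * 2 * gamma  # 12 reversals of at least 2 * gamma
print(len(table_trades), round(table_profit, 2), round(lower_bound, 2))
```

With these inputs the sketch finds 13 trading instances and a realized profit that matches the total of table 2.2, which lies above the lower bound of 12 times 2Γ.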
Figure 2.4: Trading strategy I.
Figure 2.5: Trading strategy II.

Unfortunately, strategy II also has a drawback: it can miss profits. If a pair has a tendency to move between 0 and +Γ or between 0 and −Γ, we might not be reversing our position at all, whereas strategy I will take on and flatten a position time and again and make money. This is shown in figures 2.6 and 2.7.

Figure 2.6: Trading strategy I.
Figure 2.7: Trading strategy II.

In this report we will use a modified version of strategy II.

2.2 Data

The price data which IMC uses is provided by Bloomberg. Bloomberg is a leading global provider of data, news and analytic tools. Bloomberg provides real-time and archived financial and market data, pricing, trading, news and communications tools in a single, integrated package to corporations, news organizations, financial and legal professionals and individuals around the world. Historical closing prices of stocks are easily extracted from Bloomberg to Excel. One issue has to be considered, namely dividend. Companies normally pay out dividend to their shareholders once or twice a year; some companies pay out dividend four times a year. The amount of dividend is subtracted from the stock price on the day the dividend is paid out, which is called going ex-dividend. This usually results in a jump in the price process, as in figure 2.8.
Figure 2.8: Dividend.

It is unlikely that different companies go ex-dividend on the same day. So the closing prices of stocks have to be corrected for dividend to make a good comparison with other stocks. In this report we will assume that the dividend is re-invested in the stock. So the correction is not simply adding the dividend to the closing price; the added amount grows in proportion to the stock price.

Example
Consider the following ex-dividend dates and amounts of a certain stock.

date         amount
04/28/2003   1.20
04/30/2004   1.40
04/29/2005   1.70

Suppose we want to use data of this stock starting from 03/01/2004. So we extract from Bloomberg the closing prices from this date forward; actually we start at 03/02/2004 because the first of March was a Sunday. From 03/02/2004 until 04/29/2004 we use exactly these prices; the first ex-dividend date is not used. On 04/30/2004 the stock goes ex-dividend for the amount of 1.40. We calculate what fraction this is of the stock price on that day, add it to 1 to obtain a factor, and from this date forward we multiply the closing prices from Bloomberg by this factor until the next ex-dividend date. Then we calculate the fraction of the new dividend amount and add it to the previous factor. This is shown in table 2.3.

2.3 Properties of pairs trading

Pairs trading is almost cash neutral: we do not have to invest a lot of money. We use the proceeds of short selling one stock to purchase the other stock. This usually does not exactly sum to zero; to be precise, it sums to ±Γ, a small positive or negative amount compared to the stock prices. Another aspect that makes pairs trading not entirely cash neutral is short selling. Short selling is selling something we do not have.
The exchange on which we trade will want to be sure that we will not go bankrupt. We need to put money aside, called margin, to assure the exchange that there are no risks involved with short selling. Normally, this margin is a percentage of the value of the short sale, typically between 5% and 50%, depending on the creditworthiness of the short seller. IMC's costs for short selling are relatively low, so pairs trading is almost cash neutral.

Table 2.3: Calculation of closing prices corrected for dividend.

date         Bloomberg   dividend   factor                    our prices
03/02/2004   44.00                  1                         44.00
03/03/2004   43.37                  1                         43.37
...          ...         ...        ...                       ...
04/29/2004   43.85                  1                         43.85
04/30/2004   43.04       1.40       1 + 1.40/43.04 = 1.03     1.03 * 43.04 = 44.33
05/01/2004   42.90                  1.03                      1.03 * 42.90 = 44.19
...          ...         ...        ...                       ...
04/28/2005   51.44                  1.03                      1.03 * 51.44 = 52.98
04/29/2005   50.11       1.70       1.03 + 1.70/50.11 = 1.07  1.07 * 50.11 = 53.62
04/30/2005   50.64                  1.07                      1.07 * 50.64 = 54.18
...          ...         ...        ...                       ...

Pairs trading is also market neutral: if the overall market goes up 10%, it has no consequences for the strategy and profits of pairs trading. The 10% loss in the short stock is compensated by a 10% gain in the long stock, and the other way around if the overall market goes down. We do not have a preference for up or down movements; we only look at relative pricing.

How to make money with pairs trading was explained in the example in paragraph 2.1. The amount of money made by trading a pair is a measure of the quality of a pair. Obviously, more money is better! We make profits if the spread oscillates around zero, often hitting Γ and −Γ.

An important issue for the traders is that the spread should not be away from zero for a long time. Traders are humans and they tend to get a bit nervous if they have a big position for a long time. There is a chance that the spread will never return to zero, and in that case it costs money to flatten the position.

Example
Consider figure 2.9, which shows the spread of a pair X, Y.
We put on a position the first time the spread hits −Γ, because there Y is cheap relative to X in our opinion. We reverse our position at +Γ and again at −Γ, making a profit of at least 4Γ. Then we would like the spread to go to +Γ, but the spread moves further and further away from zero and we do not know if it will ever come back. At this time, our portfolio is worth less than when we put it on: the value of the long position in Y has decreased because Y is getting cheaper (relative to X), and the value of the short position in X has decreased because X is more expensive now (relative to Y). So, if we want to flatten our portfolio, we have to sell Y for less than we bought it and/or buy X for more money than we sold it.

Figure 2.9: Spread $s_t$ walks away.

In conclusion, a good pair has a spread that is rapidly mean-reverting, and the price processes of the stocks in the pair are tied together: they cannot get far away from each other.

2.4 Trading strategy

In this section we describe how the parameters in the introductory example (section 2.1) are determined. Then a few adjustments are made to strategy II, to get the final trading strategy that resembles the strategy from IMC.
Finally, we give the assumptions made for applying this strategy.

Parameters

Assume we have two datasets of closing prices of two different stocks X and Y for a certain period, roughly two years, which are corrected for dividend:

$$x_t \text{ and } y_t, \quad \text{for } t = 0, \dots, T.$$

The first half, $t = 0, \dots, \lfloor T/2 \rfloor$, is considered as history and is used to determine the parameters ratio $\bar r$ and threshold $\Gamma$. The second half, $t = \lfloor T/2 \rfloor + 1, \dots, T$, is considered as the future and is used to determine the profit or loss that would be made trading the pair $\{X, Y\}$ with these parameters. The ratio $\bar r$ is the average ratio of Y and X over the first half of observations:

$$\bar r = \frac{1}{\lfloor T/2 \rfloor + 1} \sum_{t=0}^{\lfloor T/2 \rfloor} \frac{y_t}{x_t}.$$

The threshold $\Gamma$ is determined quite easily: we try a few values on the 'history' and take the one that gives the best profit on the 'history'. We calculate the maximum of the absolute spread over the first half of observations, denoted $m$:

$$m = \max_{t = 0, \dots, \lfloor T/2 \rfloor} |y_t - \bar r x_t|.$$

The values of $\Gamma$ that we are going to try are percentages of $m$. Table 2.4 shows the percentages and the outcome for the introductory example of paragraph 2.1, where $m = 2.01$. Because of rounding to two digits it looks as if several values of $\Gamma$ give the same largest profit, but $\Gamma = 0.40$ gives the largest profit. The profit is calculated by multiplying the number of trades minus one by two times $\Gamma$, except when no trades were made, in which case the profit is zero. It is the minimal profit if you always trade one spread, in this example one Y and 1.36 X. The first trading instance, denoted $t_1$, is the moment we put on a position for the first time; we do not make a profit yet:

$$t_1 = \min\{t : |s_t| \ge \Gamma\}.$$

The succeeding trading moments are:

$$\text{if } s_{t_n} \ge \Gamma: \quad t_{n+1} = \min\{t : t > t_n,\ s_t \le -\Gamma\},$$
$$\text{if } s_{t_n} \le -\Gamma: \quad t_{n+1} = \min\{t : t > t_n,\ s_t \ge \Gamma\}.$$

Table 2.4: Profits with different $\Gamma$.

percentage   Γ      trades   profit
5            0.10   15       2.81
10           0.20   9        3.21
15           0.30   9        4.82
20           0.40   7        4.82
25           0.50   3        2.01
30           0.60   3        2.41
35           0.70   3        2.81
40           0.80   3        3.21
45           0.90   3        3.61
50           1.00   3        4.02
55           1.10   3        4.42
60           1.20   3        4.82
65           1.30   2        2.61
70           1.40   2        2.81
75           1.51   2        3.01
80           1.61   2        3.21
85           1.71   1        0
90           1.81   0        0

To determine $\Gamma$ we simply take the one that has the largest profit based on the history, but in practice we do not take $\Gamma$ larger than $0.5m$. This profit is a gross profit; no transaction costs are accounted for. We neglected the transaction costs because it turned out they hardly had any influence on the value of $\Gamma$. This is because IMC does not trade one spread, which in this example was 1 Y and 1.36 X, but a large number of Y and X, for example 1,000 Y and 1,360 X. The costs that IMC incurs consist of two parts: a fixed amount $a$ plus an amount $b$ times the number of traded stocks. The costs of trading 1,000 Y and 1,360 X would be $2a + 2{,}360b$. We always trade the same amount, no matter the value of $\Gamma$, so the costs per trade are exactly the same for all $\Gamma$. So the more trades, the more costs, but the costs are really small compared to the profit. When the profits for the different thresholds are not too close to each other, the $\Gamma$ chosen on net profits is the same as the $\Gamma$ chosen when neglecting the costs. Unfortunately, of all the pairs considered in this report, the pair from table 2.4 is the only one where accounting for transaction costs would have made a difference. There are three thresholds, 0.30, 0.40 and 1.20, which result in almost the same profits. Therefore accounting for the transaction costs would have resulted in the threshold with the lowest number of trades, $\Gamma = 1.20$. In the remainder of this report we neglect transaction costs.

Modified trading strategy

There are pairs of stocks that work quite well for a certain time, but then the spread walks away from zero and starts to oscillate around a level different from zero. We can see an example in figure 2.10.
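Before turning to that modification, the threshold search behind table 2.4 can be sketched in Python. This is only an illustration under stated assumptions: the function names `trade_times` and `choose_threshold`, the 5% grid copied from the table, the 0.5m cap exposed as `max_fraction`, and the tie-breaking rule (the first maximum wins) are choices of this sketch and not IMC's implementation. The profit is the minimal gross profit rule from the text, (trades − 1) · 2Γ.

```python
import numpy as np

def trade_times(spread, gamma):
    """Indices at which a position is opened or reversed (alternating hits of +/-gamma)."""
    times, last_side = [], 0      # last_side: +1 after a hit of +gamma, -1 after -gamma
    for t, s in enumerate(spread):
        side = 1 if s >= gamma else (-1 if s <= -gamma else 0)
        if side != 0 and side != last_side:
            times.append(t)
            last_side = side
    return times

def choose_threshold(x, y, percentages=range(5, 95, 5), max_fraction=0.5):
    """Estimate r_bar, m and the best threshold on the history half t = 0..floor(T/2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    half = (len(x) - 1) // 2 + 1            # number of history observations
    xh, yh = x[:half], y[:half]
    r_bar = np.mean(yh / xh)                # average ratio of Y and X
    spread = yh - r_bar * xh
    m = np.max(np.abs(spread))
    rows = []
    for pct in percentages:
        gamma = pct / 100 * m
        n = len(trade_times(spread, gamma))
        profit = max(n - 1, 0) * 2 * gamma  # minimal gross profit, zero if no trades
        rows.append((pct, gamma, n, profit))
    allowed = [r for r in rows if r[0] / 100 <= max_fraction]
    best = max(allowed, key=lambda r: r[3])  # first maximum wins on ties (assumption)
    return r_bar, m, best, rows

# Toy data (made up): the history spread alternates between +1 and -1.
x = [2.0] * 8
y = [3.0, 1.0, 3.0, 1.0, 2.0, 2.0, 2.0, 2.0]
r_bar, m, best, rows = choose_threshold(x, y)
print(r_bar, m, best)
```

Because the rule (trades − 1) · 2Γ only depends on the trade count and the threshold, different thresholds can tie exactly under it, so the tie-breaking choice matters more in this sketch than it would with profits computed from actual prices.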
If we do not do anything, we are probably going to hold a position for a long time, which is not desirable, as explained in paragraph 2.3. The figure shows that the relation between the stocks in the pair has changed: the ratio $\bar r$, determined by the past, is no longer appropriate. It would be a waste to lose money on this kind of pair by closing the position, or to exclude such pairs from trading. A better way is to replace the average ratio $\bar r$ with some kind of moving average ratio.

Figure 2.10: Spread oscillates around a new level.

Assume we have a dataset of closing prices; the first half is used in exactly the same way as described before. So we have the average ratio $\bar r$ and threshold $\Gamma$. The backtest on the second half of the data set is slightly different, because we use a moving average ratio $\tilde r_t$ instead of $\bar r$ to calculate the spread. The moving average ratio we use is:

$$\tilde r_t = (1 - \kappa)\, \tilde r_{t-1} + \kappa\, r_t, \quad t = \lfloor T/2 \rfloor + 1, \dots, T,$$

with $\tilde r_{\lfloor T/2 \rfloor} = \bar r$ and where $r_t$ is the actual ratio:

$$r_t = \frac{y_t}{x_t}, \quad t = 0, \dots, T.$$

The parameter $\kappa$ is a percentage between 0 and 10% and is determined very simply from the first half of the data set. We count how many trades were made in the first half and use table 2.5 to find $\kappa$.

Table 2.5: Determining $\kappa$.

# trades   κ (%)
>15        0
10-15      1
8, 9       2
7          3
6          4
5          5
4          6
3          7
2          8
1          9
0          10

If there were a lot of trades in the first half of observations, we do not expect to need a moving average ratio, which is what the table reflects. The use of a moving average ratio, and this way of determining $\kappa$, has some disadvantages, which will be discussed later on. So the first half of the data set determines three parameters: average ratio $\bar r$, threshold $\Gamma$ and adjustment parameter $\kappa$. In the second half of the data set, the new spread is calculated as:

$$\tilde s_{\kappa, t} = y_t - \tilde r_t x_t.$$

Trading the pair goes in the same way as described before; the difference is that the position in X is no longer equal to $\bar r$ but to $\tilde r_t$. The following example will make this clearer.

We take the pair from figure 2.10, for which 520 closing prices of the two stocks are available. The first half of observations gives us three parameters:

$$\bar r = 1.86, \quad \Gamma = 0.77, \quad \kappa = 5\%.$$
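The recursion for the moving average ratio, the lookup in table 2.5 and the way a reversal is booked can be sketched in Python. This is a hedged illustration, not IMC's code: the names `kappa_from_trades` and `backtest_modified` are hypothetical, the inputs are assumed to hold only the second-half prices, and the profit of each reversal is computed on the position that is flattened, which is the rule used in the worked example that follows.

```python
def kappa_from_trades(n_trades):
    """Table 2.5 as a lookup: number of first-half trades -> kappa in percent."""
    if n_trades > 15:
        return 0
    if n_trades >= 10:
        return 1
    if n_trades >= 8:
        return 2
    return 10 - n_trades          # 7 -> 3, 6 -> 4, ..., 0 -> 10

def backtest_modified(x, y, r_bar, gamma, kappa_pct):
    """Trade the second half with the moving average ratio.

    Returns (t, spread, ratio, profit) per trade; profit is None for the
    opening trade because nothing is flattened yet.
    """
    k = kappa_pct / 100.0
    r_tilde = r_bar               # seed: ratio at the end of the history half
    side = 0                      # +1 long Y / short X, -1 short Y / long X, 0 flat
    entry = None                  # (price Y, price X, ratio) of the open position
    trades = []
    for t, (xt, yt) in enumerate(zip(x, y)):
        r_tilde = (1 - k) * r_tilde + k * (yt / xt)
        s = yt - r_tilde * xt
        new_side = 1 if s <= -gamma else (-1 if s >= gamma else 0)
        if new_side == 0 or new_side == side:
            continue
        profit = None
        if side != 0:
            ey, ex, er = entry
            # P&L of the flattened position: 1 Y against er X, signed by side
            profit = side * ((yt - ey) - er * (xt - ex))
        trades.append((t, s, r_tilde, profit))
        side, entry = new_side, (yt, xt, r_tilde)
    return trades

# Made-up prices; with kappa = 0 the ratio stays at r_bar (original strategy).
print(backtest_modified([10.0, 10.0, 10.0], [18.5, 20.2, 21.5],
                        r_bar=2.0, gamma=1.0, kappa_pct=0))
```

Setting `kappa_pct=0` reproduces the unmodified strategy, so the same function can be used to compare both variants on one data set.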
First we look at what the strategy without the modification does on the second half of observations; table 2.6 shows the trading instances. Two trades are made with a total profit of €1.88. The strategy with the modification works better: 7 trades with a total profit of €5.21. Table 2.7 shows all trading instances. The table also shows that the position in stock X is no longer constant in absolute value. For example, with trade number 1 we put on a position of +1 Y and −1.85 X, because $\tilde r_t$ at this time is 1.85. With the second trade we flatten this position and put on a position the other way around, but now $\tilde r_t$ is 1.81, so in total we sell 2 shares of stock Y and buy 1.85 + 1.81 = 3.66 shares of stock X. The profit of these two trades is calculated with the position that is flattened, i.e., (51.81 − 48.70) + 1.85 × (26.80 − 28.06) = 0.77.

Table 2.6: Trading instances strategy II.

trade   t     s_t     position (Y,X)   price Y   price X   profit
1       263   -1.10   (+1,-1.86)       48.70     26.80
2       285   0.78    (-1,+1.86)       52.33     27.74     1.88
total profit                                               1.88

Table 2.7 also shows that not all profits per trade are larger than $\Gamma$; one trade gave a relatively large loss. This happens because the ratio when the position was put on differs a lot from the ratio when this position is reversed. The ratios differ a lot because the actual ratio $r_t$ is moving a lot. We can see all the ratios in figure 2.11. The solid line is the actual ratio $r_t$, the dashed line is the moving average ratio $\tilde r_t$ and the straight dotted line is the average ratio $\bar r$.

Table 2.7: Trading instances modified strategy.

trade   t     s̃_κ,t   position (Y,X)   price Y   price X   r̃_t    profit
1       263   -0.99   (+1,-1.85)       48.70     26.80     1.85
2       281   1.07    (-1,+1.81)       51.81     28.06     1.81   0.77
3       358   -0.82   (+1,-1.97)       51.52     26.56     1.97   -2.43
4       392   0.93    (-1,+1.96)       56.38     28.23     1.96   1.57
5       407   -0.94   (+1,-1.98)       55.45     28.52     1.98   1.52
6       459   0.97    (-1,+1.98)       55.27     27.47     1.98   1.92
7       476   -1.31   (+1,-1.99)       57.20     29.38     1.99   1.86
total profit                                                      5.21

Figure 2.11: Ratios $r_t$, $\tilde r_t$ and $\bar r$.

Figures 2.12 and 2.13 show the spread calculated with the average ratio $\bar r$ and with the moving average ratio $\tilde r_t$ with $\kappa = 5\%$, respectively.

Figure 2.12: Spread $s_t$.
Figure 2.13: Spread $\tilde s_{\kappa, t}$, $\kappa = 5\%$.

From figure 2.10 it is clear that the average ratio $\bar r$ does not fit anymore; around $t = 300$ the relation between the stocks in the pair changes. Replacing the fixed average ratio $\bar r$ by a moving average ratio $\tilde r_t$ resolves this. As we saw in the example, we can lose money if the moving average ratio used to calculate the spread differs a lot between trades. If there is some fundamental change, such a trade will happen once or twice, and the loss that is made will be compensated by good trades from that moment on.

The advantage of the modified trading strategy is that when the relation between the stocks in a pair changes in some fundamental way, as in the example above where the spread starts oscillating around a new level, we are still able to trade the pair with a profit instead of making a loss by closing the position and excluding the pair from trading. When there is no such fundamental change but we use the modified strategy, with $\kappa > 0$, it is possible that we throw away money with each trade. This happens if the moving average ratio differs a lot between each two succeeding trades. We consider an example; suppose we have 520 observations. The first half is used to determine the three parameters $\bar r$, $\Gamma$ and $\kappa$:

$$\bar r = 1.00, \quad \Gamma = 0.62, \quad \kappa = 7\%.$$

Figures 2.14 and 2.15 show the spread for the second half of observations calculated with the average ratio $\bar r$, which is the same as $\kappa = 0$, and with $\kappa = 7\%$, respectively.

Figure 2.14: Spread $\tilde s_{\kappa, t}$, $\kappa = 0$.
Figure 2.15: Spread $\tilde s_{\kappa, t}$, $\kappa = 7\%$.

Trading the spread with $\kappa = 0$ results in four trades with a total profit of €5.69. However, trading the spread with $\kappa = 7\%$ results in five trades with a total loss of €4.03; table 2.8 shows the corresponding trading instances. In this example there is a loss with every trade if we use $\kappa = 7\%$, but we make a substantial profit when we use $\kappa = 0$. This is a bit of an extreme example, but what is often seen is that when there is no fundamental change between the stocks in the pair, the profit is less with the modified strategy ($\kappa > 0$) than with the original strategy ($\kappa = 0$). This is a big disadvantage of the modified strategy: it is at least very difficult to determine whether the relation between the stocks is fundamentally changing.
In spite of this disadvantage we use the modified strategy, because we do not want to exclude pairs like the one in figure 2.10; we are willing to give up some profit on pairs that do not change much.

Table 2.8: Trading instances modified strategy.

trade   t     s̃_κ,t   position (Y,X)   price Y   price X   r̃_t    profit
1       270   0.71    (-1,+1.01)       10.72     9.94      1.01
2       328   -0.63   (+1,-1.18)       10.80     9.72      1.18   -0.30
3       378   0.72    (-1,+0.92)       9.93      10.50     0.92   -1.25
4       449   -0.87   (+1,-1.18)       11.59     10.55     1.18   -1.21
5       487   0.63    (-1,+0.90)       9.54      9.88      0.90   -1.27
total profit                                                      -4.03

Assumptions

We apply the trading strategy to historical closing data to see whether trading a pair of two stocks would have been profitable. This assumes that we could have traded at the closing price and that there was no bid-ask spread. It also assumes we could have traded any amount we wanted, including fractions. If it is decided to start trading a specific pair, it is going to be traded intra-day, so it would probably be better to apply the trading strategy to intra-day data, but that kind of data is difficult to get and difficult to handle. In real-life trading the numbers of stocks have to be integers. The assumption that we are allowed to trade fractions is not that bad, because trading a pair involves large quantities, so we can round the number of stocks to an integer without completely messing up the ratios.

2.5 Conclusion

In this chapter we have derived a trading strategy that resembles the strategy IMC uses. It is no longer necessary to do a fundamental analysis to find out whether a pair of two stocks is profitable to trade as a pair. We can apply the trading strategy to historical data and see whether we would have made a profit if we had actually traded the pair. In this way IMC identified a lot of pairs. We would like to see whether we can identify pairs in a more statistical setting, again using historical data of two stocks, not to estimate profits but to see whether the two time series exhibit behavior that could make them a good pair.
We will examine the concept of cointegration, but first we need some time series basics.

Chapter 3
Time series basics

This chapter briefly discusses some basics of time series which we will need later. More information can be found in [2] and [3].

White noise
A basic stochastic time series $\{z_t\}$ is independent white noise if $z_t$ is an independent and identically distributed (i.i.d.) variable with mean 0 and variance $\sigma^2$ for all $t$, notation $z_t \sim \text{i.i.d.}(0, \sigma^2)$. A special case is Gaussian white noise, where each $z_t$ is independent and has a normal distribution $N(0, \sigma^2)$.

Stationarity
A time series $\{z_t\}$ is covariance-stationary or weakly stationary if neither the expectation nor the autocovariances depend on time $t$:

$$E(z_t) = \mu, \qquad E(z_t - \mu)(z_{t-j} - \mu) = \gamma_j,$$

for all $t$ and $j$. Notice that if a process is covariance-stationary, the variance of $z_t$ is constant and the covariance between $z_t$ and $z_{t-j}$ depends only on the lag $j$. For example, a white noise process is covariance-stationary. In the remainder of this report, covariance-stationary is shortened to stationary. A stationary process exhibits mean-reverting behavior: the process tends to remain near, or tends to return over time to, the mean value.

MA(q)
A q-th order moving average process, denoted MA(q), is characterized by:

$$z_t = \mu + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \cdots + \theta_q u_{t-q}, \qquad (3.1)$$

where $\{u_t\}$ is white noise ($\sim \text{i.i.d.}(0, \sigma^2)$), and $\mu$ and $(\theta_1, \theta_2, \dots, \theta_q)$ are constants. The expectation, variance and autocovariances of $z_t$ are given by:

$$E(z_t) = \mu,$$
$$\gamma_0 = (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2)\,\sigma^2,$$
$$\gamma_j = \begin{cases} (\theta_j + \theta_{j+1}\theta_1 + \theta_{j+2}\theta_2 + \cdots + \theta_q \theta_{q-j})\,\sigma^2 & \text{if } j = 1, \dots, q, \\ 0 & \text{if } j > q. \end{cases}$$

So an MA(q) process is stationary.

AR(1)
A first-order autoregressive process, denoted AR(1), satisfies the following difference equation:

$$z_t = c + \phi z_{t-1} + u_t, \qquad (3.2)$$

where $\{u_t\}$ is independent white noise ($\sim \text{i.i.d.}(0, \sigma^2)$). If $|\phi| \ge 1$, the consequences of the $u$'s for $z$ accumulate rather than die out over time.
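To make this last remark concrete: substituting (3.2) into itself shows that a shock $u_{t-j}$ enters $z_t$ with weight $\phi^j$. The short Python sketch below feeds a single unit shock through the recursion with $c = 0$ and compares $|\phi| < 1$ with $\phi = 1$ and $\phi > 1$. The function name `shock_response` and the chosen values of $\phi$ are illustrative assumptions.

```python
def shock_response(phi, horizon):
    """Path of z after one unit shock at time 0, using z_t = phi * z_{t-1} + u_t with c = 0."""
    z, path = 0.0, []
    for t in range(horizon + 1):
        u = 1.0 if t == 0 else 0.0   # a single shock, no later innovations
        z = phi * z + u
        path.append(z)
    return path

for phi in (0.5, 1.0, 1.1):
    print(phi, shock_response(phi, 5))
```

With $\phi = 0.5$ the effect of the shock halves every period, with $\phi = 1$ it never decays, so successive shocks pile up, and with $\phi = 1.1$ it is amplified.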
Perhaps it is not surprising that when $|\phi| \ge 1$ there does not exist a causal stationary process for $z_t$ with finite variance that satisfies (3.2). If $|\phi| > 1$, the process $z_t$ can be written in terms of innovations in the future instead of innovations in the past; that is what is meant by 'there does not exist a causal stationary process'. If $\phi = 1$ and $c = 0$ the process is called a random walk. When $|\phi| < 1$, the AR(1) model defines a stationary process and has an MA(∞) representation:

$$z_t = c/(1 - \phi) + u_t + \phi u_{t-1} + \phi^2 u_{t-2} + \phi^3 u_{t-3} + \cdots.$$

The expectation, variance and autocovariances of $z_t$ are given by:

$$\mu = c/(1 - \phi), \qquad \gamma_0 = \sigma^2/(1 - \phi^2), \qquad \gamma_j = \sigma^2 \phi^j/(1 - \phi^2), \quad \text{for } j = 1, 2, \dots$$

AR(p)
A p-th order autoregressive process, denoted AR(p), satisfies:

$$z_t = c + \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_p z_{t-p} + u_t. \qquad (3.3)$$

Suppose that the roots of

$$1 - \phi_1 x - \phi_2 x^2 - \cdots - \phi_p x^p = 0 \qquad (3.4)$$

all lie outside the unit circle in the complex plane. This is the generalization of the stationarity condition $|\phi| < 1$ for the AR(1) model. Then the expectation, variance and autocovariances of $z_t$ are given by:

$$\mu = c/(1 - \phi_1 - \phi_2 - \cdots - \phi_p),$$
$$\gamma_0 = \phi_1 \gamma_1 + \phi_2 \gamma_2 + \cdots + \phi_p \gamma_p + \sigma^2,$$
$$\gamma_j = \phi_1 \gamma_{j-1} + \phi_2 \gamma_{j-2} + \cdots + \phi_p \gamma_{j-p}, \quad \text{for } j = 1, 2, \dots$$

If equation (3.4) has a root that is on the unit circle, we call that a unit root and the process that generates $z_t$ a unit root process.

Information Criteria
In chapter 4 we want to fit an AR(p) model to a given dataset, with $p$ unknown. An information criterion is designed to maximize the model fit while minimizing the number of parameters, in our case minimizing $p$. The criterion assigns a value to each model depending on the model fit and the number of parameters in the model. The better the model fit is, the smaller the value will be. The more parameters are used, the larger the value will be. The model with the smallest value is most suitable for the data according to that criterion.
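Returning briefly to the root condition on (3.4): it is easy to check numerically. The following is a minimal sketch, assuming NumPy's `np.roots`, which expects polynomial coefficients ordered from the highest power down. The helper names `ar_roots` and `classify` and the tolerance used to decide whether a root lies on the unit circle are choices of this sketch.

```python
import numpy as np

def ar_roots(phis):
    """Roots of 1 - phi_1 x - ... - phi_p x^p for phis = [phi_1, ..., phi_p]."""
    coeffs = [-c for c in reversed(phis)] + [1.0]   # highest power first
    return np.roots(coeffs)

def classify(phis, tol=1e-8):
    """Label an AR(p) model by the location of the roots of (3.4)."""
    moduli = np.abs(ar_roots(phis))
    if np.all(moduli > 1 + tol):
        return "stationary"
    if np.any(np.abs(moduli - 1) <= tol):
        return "unit root"
    return "root inside the unit circle"

print(classify([0.5]))          # AR(1) with phi = 0.5, root at x = 2
print(classify([1.0]))          # random walk, root at x = 1
print(classify([1.5, -0.5]))    # AR(2) with roots x = 1 and x = 2
```

The last example has exactly one unit root and one root outside the unit circle, which is the situation assumed under the null hypothesis of the unit root tests discussed later in this chapter.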
There are several information criteria; they differ in the penalty they give to each extra parameter and therefore have different properties. The Akaike information criterion (AIC) formula is:

$$\mathrm{AIC}(k) = -2 \log L + 2k, \qquad (3.5)$$

where $k$ is the number of parameters and $L$ is the likelihood function. The likelihood function assumes that the innovations $u_t$ are $N(0, \sigma^2)$. The log likelihood for an AR(k) model is given by:

$$\log L = -\frac{T}{2}\log(2\pi) - \frac{T}{2}\log(\sigma^2) + \frac{1}{2}\log|V_k^{-1}| - \frac{1}{2\sigma^2}(z_k - \mu_k)' V_k^{-1} (z_k - \mu_k) - \sum_{t=k+1}^{T} \frac{(z_t - c - \phi_1 z_{t-1} - \cdots - \phi_k z_{t-k})^2}{2\sigma^2},$$

where $\sigma^2 V_k$ denotes the covariance matrix of $(z_1, z_2, \dots, z_k)$:

$$\sigma^2 V_k = \begin{pmatrix} E(z_1 - \mu)^2 & E(z_1 - \mu)(z_2 - \mu) & \cdots & E(z_1 - \mu)(z_k - \mu) \\ E(z_2 - \mu)(z_1 - \mu) & E(z_2 - \mu)^2 & \cdots & E(z_2 - \mu)(z_k - \mu) \\ \vdots & \vdots & \cdots & \vdots \\ E(z_k - \mu)(z_1 - \mu) & E(z_k - \mu)(z_2 - \mu) & \cdots & E(z_k - \mu)^2 \end{pmatrix},$$

$\mu_k$ denotes a $(k \times 1)$ vector with each element given by

$$\mu = c/(1 - \phi_1 - \phi_2 - \cdots - \phi_k),$$

$z_k$ denotes the first $k$ observations in the sample, $(z_1, z_2, \dots, z_k)$, and $T$ denotes the sample size. The first term in (3.5) measures the model fit, the second term gives a penalty to each parameter. The Akaike information criterion is calculated for each model AR(k), with $k = 1, 2, \dots, K$. The $k$ with the smallest value AIC(k) is the estimate for the model order.

Two other information criteria are the Schwarz-Bayesian and the Hannan-Quinn information criteria. The Schwarz-Bayesian information criterion (BIC) formula is:

$$\mathrm{BIC}(k) = -2 \log L + k \log(T),$$

where $T$ denotes the number of observations in the data set. The Hannan-Quinn information criterion (HIC) formula is:

$$\mathrm{HIC}(k) = -2 \log L + 2k \log(\log(T)).$$

First difference operator
The first difference operator $\Delta$ is defined by:

$$\Delta z_t = z_t - z_{t-1}.$$

I(d)
A time series is integrated of order $d$, written as $y_t \sim I(d)$, if the series is non-stationary but becomes stationary after differencing a minimum of $d$ times. An already weakly stationary process is denoted as I(0).
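Returning to the information criteria defined above, the order selection can be sketched in Python. Note the simplification, which is an assumption of this sketch and not the exact likelihood given in the text: the AR(k) models are fitted by OLS and the Gaussian log likelihood is evaluated conditionally on the first $K$ observations, so the terms involving $V_k$ are dropped and every model is compared on the same sample. The function names, the simulated AR(2) coefficients and the seed are illustrative.

```python
import numpy as np

def conditional_loglik(z, k, K):
    """Gaussian log likelihood of an OLS-fitted AR(k), conditioning on the first K observations."""
    z = np.asarray(z, float)
    target = z[K:]
    lags = [z[K - j:len(z) - j] for j in range(1, k + 1)]   # z_{t-1}, ..., z_{t-k}
    X = np.column_stack([np.ones(len(target))] + lags)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    n = len(target)
    sigma2 = resid @ resid / n                # ML estimate of the innovation variance
    return -0.5 * n * (np.log(2 * np.pi) + np.log(sigma2) + 1.0), n

def select_order(z, K):
    """AIC, BIC and HIC for AR(1), ..., AR(K) and the order each criterion selects."""
    table = {}
    for k in range(1, K + 1):
        ll, n = conditional_loglik(z, k, K)
        table[k] = {
            "AIC": -2 * ll + 2 * k,
            "BIC": -2 * ll + k * np.log(n),
            "HIC": -2 * ll + 2 * k * np.log(np.log(n)),
        }
    best = {name: min(table, key=lambda k: table[k][name])
            for name in ("AIC", "BIC", "HIC")}
    return best, table

# Simulated AR(2) data (made-up coefficients) to exercise the functions.
rng = np.random.default_rng(0)
u = rng.standard_normal(2000)
z = np.zeros(2000)
for t in range(2, 2000):
    z[t] = 0.5 * z[t - 1] - 0.6 * z[t - 2] + u[t]
best, table = select_order(z, K=6)
print(best)
```

Since $\log(T)$ exceeds 2 for any realistic sample size, BIC penalizes extra lags more heavily than AIC and so never selects a larger order than AIC on the same data.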
If a time series generated by an AR(p) process is integrated of order $d$, then its autoregressive polynomial (equation (3.4)) has $d$ roots on the unit circle.

Unit root test
Statistical tests of the null hypothesis that a time series is non-stationary against the alternative that it is stationary are called unit root tests. In this report we consider the Dickey-Fuller test (DF) and the Augmented Dickey-Fuller test (ADF).

Dickey-Fuller test
The Dickey-Fuller test tests whether a time series is stationary or not when the series is assumed to follow an AR(1) model. It is named after the statisticians D.A. Dickey and W.A. Fuller, who developed the test in [4]. The assumption of the DF test is that the time series $z_t$ follows an AR(1) model:

$$z_t = c + \rho z_{t-1} + u_t, \qquad (3.6)$$

with $\rho \ge 0$. If $\rho = 1$, the series $z_t$ is non-stationary. If $\rho < 1$, the series $z_t$ is stationary. The null hypothesis is that $z_t$ is non-stationary, more specifically that $z_t$ is integrated of order 1, against the alternative that $z_t$ is stationary:

$$H_0: z_t \sim I(1) \quad \text{against} \quad H_1: z_t \sim I(0),$$

which can be restated in terms of the parameters:

$$H_0: \rho = 1 \quad \text{against} \quad H_1: \rho < 1,$$

under the assumption that $z_t$ follows an AR(1) model. The test statistic $S$ of the DF test is the t ratio:

$$S = \frac{\hat\rho - 1}{\hat\sigma_{\hat\rho}},$$

where $\hat\rho$ denotes the OLS estimate of $\rho$ and $\hat\sigma_{\hat\rho}$ denotes the standard error of the estimated coefficient. The t ratio is commonly used to test whether the coefficient $\rho$ is equal to $\rho_0$ when the time series is stationary, i.e. $\rho < 1$. Then the test statistic

$$\frac{\hat\rho - \rho_0}{\hat\sigma_{\hat\rho}}$$

has a t-distribution. But we do not assume that the time series is stationary, because the null hypothesis is that $\rho = 1$. So the test statistic $S$ does not need to have a t-distribution. We need to distinguish several cases to derive the distribution of the DF test statistic.

Case 1: The true process of $z_t$ is a random walk, i.e. $z_t = z_{t-1} + u_t$, and we estimate the model $z_t = \rho z_{t-1} + u_t$. Notice that we only estimate $\rho$ and not a constant $c$.
Case 2: The true process of $z_t$ is again a random walk and we estimate the model $z_t = c + \rho z_{t-1} + u_t$. Notice that now we do estimate a constant, but it is not present in the true process.

Case 3: The true process of $z_t$ is a random walk, but now with drift, i.e. $z_t = c + z_{t-1} + u_t$, where the true value of $c$ is not zero. We estimate the model $z_t = c + \rho z_{t-1} + u_t$.

Although the differences between the three cases seem small, the effect on the asymptotic distributions of the test statistic is large, as we will see in chapter 5.

Augmented Dickey-Fuller test
The Augmented Dickey-Fuller test tests whether a time series is stationary or not when the time series follows an AR(p) model. One of the assumptions of the Augmented Dickey-Fuller test is that the time series $z_t$ follows an AR(p) model:

$$z_t = c + \phi_1 z_{t-1} + \cdots + \phi_p z_{t-p} + u_t. \qquad (3.7)$$

As with the regular Dickey-Fuller test, we test:

$$H_0: z_t \sim I(1) \quad \text{against} \quad H_1: z_t \sim I(0).$$

The null hypothesis is that the autoregressive polynomial

$$1 - \phi_1 x - \phi_2 x^2 - \cdots - \phi_p x^p = 0$$

has exactly one unit root and all other roots are outside the unit circle. Then the unit root cannot be a complex number, because the autoregressive polynomial has real coefficients, and if $x = a + bi$ is a unit root then so is its complex conjugate $\bar x = a - bi$. This would contradict the null hypothesis that there is exactly one unit root. Two possibilities remain: the unit root is −1 or 1. The first possibility gives an alternating series, which is not realistic for modeling the spread (this becomes clearer in the chapter on cointegration). Thus the single unit root should be equal to 1, which gives us

$$1 - \phi_1 - \phi_2 - \cdots - \phi_p = 0. \qquad (3.8)$$

The AR(p) model (3.7) can be written as:

$$z_t = c + \rho z_{t-1} + \beta_1 \Delta z_{t-1} + \cdots + \beta_{p-1} \Delta z_{t-p+1} + u_t, \qquad (3.9)$$

with

$$\rho = \phi_1 + \cdots + \phi_p, \qquad \beta_i = -(\phi_{i+1} + \cdots + \phi_p), \quad \text{for } i = 1, \dots, p - 1.$$
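To see (3.9) at work, take $p = 2$: $\phi_1 z_{t-1} + \phi_2 z_{t-2} = (\phi_1 + \phi_2) z_{t-1} - \phi_2 \Delta z_{t-1}$, so $\rho = \phi_1 + \phi_2$ and $\beta_1 = -\phi_2$. The Python sketch below simulates a stationary AR(2) series, runs OLS on the regressors of (3.9) and forms the t ratio $(\hat\rho - 1)/\hat\sigma_{\hat\rho}$ defined for the Dickey-Fuller test. It uses NumPy only; the function name `adf_regression`, the simulated coefficients and the seed are assumptions of this illustration, and no critical values are supplied because the statistic does not follow a t-distribution under the null hypothesis.

```python
import numpy as np

def adf_regression(z, p):
    """OLS on z_t = c + rho z_{t-1} + beta_1 dz_{t-1} + ... + beta_{p-1} dz_{t-p+1} + u_t.

    Returns the coefficient vector (c, rho, beta_1, ..., beta_{p-1}) and the
    t ratio (rho_hat - 1) / se(rho_hat).
    """
    z = np.asarray(z, float)
    dz = np.diff(z)                           # dz[i] = z[i+1] - z[i]
    target = z[p:]
    cols = [np.ones(len(target)), z[p - 1:-1]]
    for i in range(1, p):
        cols.append(dz[p - 1 - i:len(dz) - i])   # lagged difference dz_{t-i}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    s2 = resid @ resid / (len(target) - X.shape[1])   # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta, (beta[1] - 1.0) / np.sqrt(cov[1, 1])

# Stationary AR(2) with phi_1 = 0.5, phi_2 = 0.3, so rho = 0.8 and beta_1 = -0.3.
rng = np.random.default_rng(0)
u = rng.standard_normal(5000)
z = np.zeros(5000)
for t in range(2, 5000):
    z[t] = 0.5 * z[t - 1] + 0.3 * z[t - 2] + u[t]
beta, S = adf_regression(z, p=2)
print(beta, S)
```

With `p=1` the loop adds no lagged differences and the function reduces to the regression of the regular Dickey-Fuller test with a constant.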
The advantage of writing (3.7) in the equivalent form (3.9) is that under the null hypothesis only one of the regressors, namely $z_{t-1}$, is I(1), whereas all of the other regressors ($\Delta z_{t-1}, \Delta z_{t-2}, \dots, \Delta z_{t-p+1}$) are stationary. Notice that (3.8) implies that the coefficient $\rho$ is equal to 1. This leads to the same hypotheses as with the regular Dickey-Fuller test:

$$H_0: \rho = 1 \quad \text{against} \quad H_1: \rho < 1,$$

and the same test statistic:

$$S = \frac{\hat\rho - 1}{\hat\sigma_{\hat\rho}}.$$

To derive the distribution of the ADF test statistic we need to distinguish the same three cases as above, but now in the appropriate AR(p) form. As we will see in chapter 5, the distributions are the same as the DF distributions, without any corrections for the fact that lagged values of $\Delta z$ are included in the regression.

One last note: if the null hypothesis that $z_t$ is non-stationary cannot be rejected, it does not necessarily mean that $z_t$ is generated by an I(1) process. It may be non-stationary because it is generated by an I(2) process or by an integrated process of an even higher order. The next step could be to repeat the procedure, but this time using $\Delta z_t$ instead of $z_t$. That is, to test

$$H_0: \Delta z_t \sim I(1) \quad \text{against} \quad H_1: \Delta z_t \sim I(0),$$

which is equivalent to

$$H_0: z_t \sim I(2) \quad \text{against} \quad H_1: z_t \sim I(1),$$

and so on.

Chapter 4
Cointegration

Empirical research in financial economics is largely based on time series. Ever since Trygve Haavelmo's work it has been standard to view economic and financial time series as realizations of stochastic processes. This approach allows the model builder to use statistical inference in constructing and testing equations that characterize relationships between economic and financial variables. The 2003 Nobel Prize in economics rewarded two contributions, the ARCH model and cointegration, from Robert Engle and Clive Granger respectively. This chapter discusses the concept of cointegration and two methods for testing for cointegration, the Engle-Granger and the Johansen method.
Other methods are described in, for example, [13] and [14]. In the last section of this chapter a start is made with an alternative method. In this report the alternative method is used for generating cointegrated data but not for testing for cointegration, although this is possible.

4.1 Introducing cointegration

An $(n \times 1)$ vector time series $\mathbf{y}_t$ is said to be cointegrated if each of the series taken individually is I(1), integrated of order one, while some linear combination of the series $a'\mathbf{y}_t$ is stationary for some nonzero $(n \times 1)$ vector $a$, named the cointegrating vector.

Cointegration means that although many developments can cause permanent changes in the individual elements of $\mathbf{y}_t$, there is some long-run equilibrium relation tying the individual components together, represented by the linear combination $a'\mathbf{y}_t$. A simple example of a cointegrated vector process with $n = 2$, taken from [1], is:

$$x_t = w_t + \epsilon_{x,t},$$
$$y_t = w_t + \epsilon_{y,t},$$
$$w_t = w_{t-1} + \epsilon_t,$$

where the error processes $\epsilon_{x,t}$, $\epsilon_{y,t}$ and $\epsilon_t$ are independent white noise processes. The series $w_t$ is a random walk, so $x_t$ and $y_t$ are I(1) processes, though the linear combination $y_t - x_t$ is stationary. This means $\mathbf{y}_t = (x_t, y_t)$ is cointegrated with $a = (-1, 1)$.

Figure 4.1 shows a realization of this example of a cointegrated process, where the error processes are standard Gaussian white noise. Note that $x_t$ and $y_t$ can wander arbitrarily far from the starting value, but $x_t$ and $y_t$ themselves are 'tied together' in the long run. The figure also shows the corresponding spread $y_t - x_t$ of the realization.
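A realization of this simple example can be generated with a few lines of code. The sketch below is an illustration under stated assumptions, not the code used for the report: the function names, the sample size of 2000 and the seed are hypothetical choices, and all three error processes are taken as standard Gaussian white noise, as in the figure.

```python
import random


def simulate_cointegrated(n, seed=0):
    """Simulate x_t = w_t + eps_x, y_t = w_t + eps_y, with w_t a random walk."""
    rng = random.Random(seed)
    w = 0.0
    xs, ys = [], []
    for _ in range(n):
        w += rng.gauss(0.0, 1.0)            # w_t = w_{t-1} + eps_t
        xs.append(w + rng.gauss(0.0, 1.0))  # x_t = w_t + eps_{x,t}
        ys.append(w + rng.gauss(0.0, 1.0))  # y_t = w_t + eps_{y,t}
    return xs, ys


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


xs, ys = simulate_cointegrated(2000)
# Spread with cointegrating vector a = (-1, 1): the random walk w_t cancels
spread = [y - x for x, y in zip(xs, ys)]
print(sample_variance(xs), sample_variance(spread))
```

The spread equals $\epsilon_{y,t} - \epsilon_{x,t}$, so its variance stays near 2 regardless of how far $x_t$ and $y_t$ wander, whereas the sample variance of $x_t$ itself grows with the length of the path.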
Figure 4.1: Realization of cointegrated process and spread of realization.

Correlation

Correlation is used in the analysis of co-movements in asset prices but also in the analysis of co-movements in returns. Correlation measures the strength and direction of linear relationships between variables. If $x_t$ denotes the price process of a stock, the returns $h_t$ are defined by

$$h_t = \frac{x_t - x_{t-1}}{x_{t-1}}.$$

Since $\log(1 + \epsilon) \approx \epsilon$ as $\epsilon \to 0$, we can approximate this by:

$$\frac{x_t - x_{t-1}}{x_{t-1}} = \frac{x_t}{x_{t-1}} - 1 \approx \log\left(\frac{x_t}{x_{t-1}}\right).$$

Correlation can refer to co-movement in the stock returns and in the stock prices themselves, whereas cointegration refers to co-movements in the stock prices themselves or in the logarithms of the stock prices. Cointegration and correlation are related, but they are different concepts.
High correlation does not imply cointegration, and neither does cointegration imply high correlation. In fact, cointegrated series can have correlations that are quite low at times. For example, a large and diversified portfolio of stocks which are also in an equity index, where the weights in the portfolio are determined by their weights in the index, should be cointegrated with the index itself. Although the portfolio should move in line with the index in the long term, there will be periods when stocks in the index that are not in the portfolio have exceptional price movements. As a result, the empirical correlations between the portfolio and the index may be rather low for a time.

The simple example at the beginning of this section shows the same, that is, cointegration does not imply high correlation. For illustration purposes it is convenient to look at the differences $\Delta x_t$ and $\Delta y_t$ instead of the returns or $x_t$ and $y_t$ themselves, because in this example $x_t$ and $y_t$ do not have constant variances. The variance of $\Delta x_t$ is

$$\mathrm{Var}(\Delta x_t) = \mathrm{Var}(x_t - x_{t-1}) = \mathrm{Var}(\epsilon_t + \epsilon_{x,t} - \epsilon_{x,t-1}) = \sigma^2 + 2\sigma_x^2,$$

where $\sigma^2$, $\sigma_x^2$ and $\sigma_y^2$ denote the variances of $\epsilon_t$, $\epsilon_{x,t}$ and $\epsilon_{y,t}$ respectively. In the same way, the variance of $\Delta y_t$ is given by

$$\mathrm{Var}(\Delta y_t) = \sigma^2 + 2\sigma_y^2.$$

The covariance of $\Delta x_t$ and $\Delta y_t$ is

$$\mathrm{Cov}(\Delta x_t, \Delta y_t) = E(\Delta x_t \Delta y_t) - E(\Delta x_t)E(\Delta y_t) = E(\epsilon_t^2) - 0 = \sigma^2.$$

The correlation between the difference processes is

$$\mathrm{Corr}(\Delta x_t, \Delta y_t) = \frac{\mathrm{Cov}(\Delta x_t, \Delta y_t)}{\sqrt{\mathrm{Var}(\Delta x_t)\mathrm{Var}(\Delta y_t)}} = \frac{\sigma^2}{\sqrt{(\sigma^2 + 2\sigma_x^2)(\sigma^2 + 2\sigma_y^2)}}.$$

The correlation between $\Delta x_t$ and $\Delta y_t$ is less than 1, and when the variances of $\epsilon_{x,t}$ and/or $\epsilon_{y,t}$ are much larger than the variance of $\epsilon_t$ the correlation will be low while $x_t$ and $y_t$ are cointegrated.

The converse also holds true: there may be high correlation between the stock prices and/or the returns without the stock prices being cointegrated. Figure 4.2 shows two stock price processes which are highly correlated, with a correlation of 0.9957. The correlation between the returns is even equal to 1.
But the price processes are clearly not cointegrated: they are not tied together, instead they diverge more and more as time goes on. So, correlation does not tell us enough about the long-term relationship between two stocks: they may or may not be moving together over long periods of time, i.e. they may or may not be cointegrated.

From a trading point of view, the 'pair' in figure 4.2 is not a good one. Figures 4.3 and 4.4 show the spread calculated with the average ratio $\bar{r}$ and with a 10% moving average ratio $\tilde{r}_t$ respectively. In figure 4.3 it is clear that this 'pair' is not a good one, because the spread is not oscillating around zero. Figure 4.4 looks better, but actually we are losing money with nearly every trade, because the ratios when positions were put on differ a lot from the ratios when the positions were reversed. The ratios differ a lot because the actual ratio $r_t$ is moving a lot, which is due to the divergence between the stock prices. So, correlation is not a good way to identify pairs.

Figure 4.2: Highly correlated stock prices.
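The closed-form correlation of the differences in the simple cointegrated example of this section, $\sigma^2 / \sqrt{(\sigma^2 + 2\sigma_x^2)(\sigma^2 + 2\sigma_y^2)}$, can be compared with a simulated value. The sketch below is only an illustration: the function names, the seed, the sample size and the standard deviations $\sigma = 1$, $\sigma_x = \sigma_y = 3$ are assumptions chosen so that the idiosyncratic noise dominates the common random walk.

```python
import math
import random


def theoretical_corr(sigma, sigma_x, sigma_y):
    """Corr(dx_t, dy_t) for x_t = w_t + eps_x, y_t = w_t + eps_y."""
    return sigma ** 2 / math.sqrt(
        (sigma ** 2 + 2 * sigma_x ** 2) * (sigma ** 2 + 2 * sigma_y ** 2)
    )


def sample_corr(a, b):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def simulate_differences(n, sigma, sigma_x, sigma_y, seed=1):
    """Return the difference series dx_t and dy_t of the example."""
    rng = random.Random(seed)
    ex_prev = rng.gauss(0.0, sigma_x)
    ey_prev = rng.gauss(0.0, sigma_y)
    dx, dy = [], []
    for _ in range(n):
        e = rng.gauss(0.0, sigma)      # increment of the common random walk
        ex = rng.gauss(0.0, sigma_x)
        ey = rng.gauss(0.0, sigma_y)
        dx.append(e + ex - ex_prev)    # dx_t = eps_t + eps_{x,t} - eps_{x,t-1}
        dy.append(e + ey - ey_prev)    # dy_t = eps_t + eps_{y,t} - eps_{y,t-1}
        ex_prev, ey_prev = ex, ey
    return dx, dy


theory = theoretical_corr(1.0, 3.0, 3.0)   # equals 1/19
dx, dy = simulate_differences(20000, 1.0, 3.0, 3.0)
estimate = sample_corr(dx, dy)
print(theory, estimate)
```

With these values the theoretical correlation is $1/19$, so the differences are almost uncorrelated even though $x_t$ and $y_t$ are cointegrated by construction.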
Figure 4.3: Spread $s_t$, $\bar{r} = 0.76$.

Figure 4.4: Spread $\tilde{s}_{\alpha,t}$, $\alpha = 10\%$.

A better way to identify pairs is with cointegration, because we would like the stock prices in a pair to be tied together. If two stocks in a pair are cointegrated, a certain linear combination of the two is stationary. This implies that the spread, defined with the cointegrating vector $a$ instead of the average ratio $\bar{r}$ or moving average ratio $\tilde{r}_t$, is mean-reverting. Section 2.3 explained that this property is an important one.

4.2 Stock price model

In the preceding section cointegration was introduced; the question remains how to test for cointegration.
The test should be preceded by examining whether each component of $\mathbf{y}_t$ is I(1), because that is a requirement in the definition of cointegration. Several books and articles have been written about modeling stock prices. In this section we derive a commonly used model which can be found in, among others, [7]. This model is famous for its use in option valuation. We will use this model to show that the logarithms of stock prices are integrated of order one and to show that it is more or less justified to assume that stock prices themselves are integrated of order one.

In figure 4.5 the daily closing prices of Royal Dutch Shell are plotted. The figure shows the jagged behavior that is common to stock prices.

Figure 4.5: Daily Royal Dutch Shell stock prices.

We first examine the returns of the Royal Dutch Shell stock. Figure 4.6 shows the estimated density of the daily returns with the $N(0, 1)$ density superimposed, figure 4.7 the empirical distribution function and figure 4.8 the normal QQ-plot. The daily returns were normalized to

$$\hat{h}_t = \frac{h_t - \hat{\mu}}{\hat{\sigma}},$$

where $\hat{\mu}$ and $\hat{\sigma}^2$ are the sample mean and sample variance, so that the normalized returns have sample mean zero and sample variance one.
These figures suggest that the marginal distribution of the daily returns of the Royal Dutch Shell stock is approximately Gaussian. The QQ-plot indicates that the match is least accurate at the extremes of the range: the returns have fatter tails than the normal distribution. Figure 4.9 shows the sample autocorrelation function of the daily returns. The bounds $\pm 1.96\,T^{-1/2}$ are displayed by the dashed lines; here $T = 520$. The figure strongly suggests that the returns are uncorrelated. Although uncorrelatedness does not imply independence, for modeling $x_t$ we take the returns to be normally distributed i.i.d. samples, because the autocorrelation function of a sample from an i.i.d. noise sequence looks similar to figure 4.9.

Figure 4.6: Estimated density.

Figure 4.7: Empirical distribution function.
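The diagnostics described above (normalizing the returns and comparing the sample autocorrelations with the bounds $\pm 1.96\,T^{-1/2}$) can be sketched as follows. This is an illustration only: the Royal Dutch Shell data are not reproduced here, so the returns are replaced by simulated i.i.d. Gaussian stand-in data, and the function names, the seed and the stand-in mean and standard deviation are assumptions. The normalization divides by the sample standard deviation.

```python
import math
import random


def normalize(values):
    """Subtract the sample mean and divide by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]


def sample_acf(values, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    acf = []
    for k in range(max_lag + 1):
        num = sum((values[t] - mean) * (values[t - k] - mean) for t in range(k, n))
        acf.append(num / denom)
    return acf


T = 520                                   # sample size quoted in the text
rng = random.Random(42)
# Stand-in returns (assumption): i.i.d. Gaussian, not the actual stock data
returns = [rng.gauss(0.0005, 0.012) for _ in range(T)]

h = normalize(returns)
acf = sample_acf(returns, 25)
bound = 1.96 / math.sqrt(T)               # half-width of the dashed band
outside = sum(1 for r in acf[1:] if abs(r) > bound)
print(bound, outside)
```

For an i.i.d. sequence roughly 5% of the sample autocorrelations at nonzero lags are expected to fall outside the band, so a handful of small exceedances among 25 lags is not evidence against uncorrelated returns.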
Figure 4.8: Normal QQ-plot.

Figure 4.9: Autocorrelation function.

Given the stock price $x(0) = x_0$ at time $t = 0$, we would like to come up with a process that describes the stock price $x(t)$ for all times $0 \le t \le T$. As a starting point for the model we note that the value of a risk-free investment $D$, like putting money in a savings account, changes over a small time interval $\delta$ as

$$D(t + \delta) = D(t) + \mu\delta D(t),$$

where $\mu$ is the interest rate. The efficient market hypothesis says that the current stock price reflects all the information known to investors, so any change in the price is due to new information. We may build this into our model by adding a random fluctuation to the interest rate equation.
Let $t_i = i\delta$; the discrete-time model becomes:

$$x(t_i) = x(t_{i-1}) + \mu\delta\, x(t_{i-1}) + \sigma\sqrt{\delta}\, u_i\, x(t_{i-1}),$$ (4.1)

where the parameter $\mu > 0$ represents an annual upward drift of the stock prices. The parameter $\sigma \ge 0$ is a constant that determines the strength of the random fluctuations and is called the volatility. The random fluctuations $u_1, u_2, \ldots$ are i.i.d. $N(0, 1)$. Notice that the returns $[x(t_i) - x(t_{i-1})]/x(t_{i-1})$ indeed form a normal i.i.d. sequence.

We consider the time interval $[0, t]$ with $t = L\delta$. Assume we know $x(0) = x_0$; the discrete model (4.1) gives us expressions for $x(\delta), x(2\delta), \ldots, x(t)$. To derive a continuous model for the stock price, we let $\delta \to 0$ to get a limiting expression for $x(t)$. The discrete model says that over each time step $\delta$ the stock price gets multiplied by a factor $1 + \mu\delta + \sigma\sqrt{\delta}\,u_i$, hence

$$x(t) = x_0 \prod_{i=1}^{L} \left(1 + \mu\delta + \sigma\sqrt{\delta}\,u_i\right).$$

Dividing by $x_0$ and taking logarithms gives

$$\log\left(\frac{x(t)}{x_0}\right) = \sum_{i=1}^{L} \log\left(1 + \mu\delta + \sigma\sqrt{\delta}\,u_i\right).$$

We are interested in the limit $\delta \to 0$, so we exploit the approximation $\log(1 + \epsilon) \approx \epsilon - \epsilon^2/2 + \cdots$ for small $\epsilon$:

$$\log\left(\frac{x(t)}{x_0}\right) \approx \sum_{i=1}^{L} \left(\mu\delta + \sigma\sqrt{\delta}\,u_i - \frac{1}{2}\sigma^2\delta u_i^2\right).$$

This is justifiable because $E(u_i^2)$ is finite. We have ignored terms that involve powers of $\delta$ of order $3/2$ or higher. The expectation and the variance of each term are:

$$E\left(\mu\delta + \sigma\sqrt{\delta}\,u_i - \frac{1}{2}\sigma^2\delta u_i^2\right) = \mu\delta - \frac{1}{2}\sigma^2\delta,$$

$$\mathrm{Var}\left(\mu\delta + \sigma\sqrt{\delta}\,u_i - \frac{1}{2}\sigma^2\delta u_i^2\right) = \sigma^2\delta + \text{higher powers of } \delta.$$

The Central Limit Theorem, which can be found in section 5.1, suggests that $\log(x(t)/x_0)$ behaves like a normal random variable:

$$\log\left(\frac{x(t)}{x_0}\right) \sim N\left(\left(\mu - \frac{1}{2}\sigma^2\right)t,\ \sigma^2 t\right).$$

The limiting continuous-time expression for the stock price at fixed time $t$ becomes:

$$x(t) = x_0\, e^{(\mu - \frac{1}{2}\sigma^2)t + \sigma\sqrt{t}\,W},$$

where $W \sim N(0, 1)$. For non-overlapping time intervals, the normal random variables that describe the changes will be independent. We can describe the evolution of the stock over any sequence of time points $0 = t_0 < t_1 < t_2 < \cdots < t_m$ by

$$x(t_i) = x(t_{i-1})\, e^{(\mu - \frac{1}{2}\sigma^2)(t_i - t_{i-1}) + \sigma\sqrt{t_i - t_{i-1}}\,W_i}.$$
(4.2)

This model guarantees that the stock price is always positive if $x_0 > 0$. Model (4.2) is used a lot and is often referred to as geometric Brownian motion.

We would like to model the daily closing prices, so we assume that the time intervals $t_i - t_{i-1}$ are equally spaced. That is, we set the time between Friday evening and Monday evening equal to the time between Thursday evening and Friday evening. We can write (4.2) as

$$x_t = x_{t-1}\, e^{(\mu - \frac{1}{2}\sigma^2)\delta + \sigma\sqrt{\delta}\,u_t},$$ (4.3)

with $\delta$ equal to $1/260$, because there are approximately 260 trading days in a year. This is basically the same as the discrete model (4.1). From this model it follows that $\log x_t$ is integrated of order one, because

$$\log x_t = \log x_{t-1} + \left(\mu - \frac{1}{2}\sigma^2\right)\delta + \sigma\sqrt{\delta}\,u_t,$$

hence

$$\log x_t - \log x_{t-1} = \left(\mu - \frac{1}{2}\sigma^2\right)\delta + \sigma\sqrt{\delta}\,u_t = \text{constant} + \text{Gaussian white noise}.$$

This difference process of $\log x_t$ is I(0), and because the process $\log x_t$ itself is not, it follows that $\log x_t$ is I(1). This is one of the reasons why cointegration tests are also applied to the logarithms of stock prices. Unfortunately, translating cointegration between the logarithms of two stock prices into a trading strategy is less intuitively clear than translating cointegration between the two stock prices themselves.

When there is cointegration between the stock prices, trading the pair is straightforward. Let $\mathbf{y}_t = (x_t, y_t)$ be two stock price processes which are cointegrated with cointegrating vector $a$. We 'normalize' this vector to $(-\alpha, 1)$, so $y_t - \alpha x_t$ is a stationary process with mean zero, which means that $y_t$ is approximately $\alpha x_t$. It could be that there is a constant in the cointegrating relation; then $y_t - \alpha x_t$ does not have mean zero. This will be discussed in the next section; for now we assume that the mean is zero. We treat $y_t - \alpha x_t$ as our spread process described in chapter 2, so we trade the pair $(x, y)$ in the constant ratio $\alpha : 1$.
This is exactly the same as the trading strategy of chapter 2, except that we use the least squares estimator instead of the average ratio to calculate the spread.

If the logarithms of the stock prices $x_t$ and $y_t$ are cointegrated with cointegrating vector $b$, we normalize this to $(-\beta, 1)$; then $\log y_t - \beta \log x_t$ is a stationary process. So $\log y_t$ is approximately $\beta \log x_t$. We cannot trade logarithms of stocks, so we would like to know the relation between $x_t$ and $y_t$. Let $\varepsilon_t$ denote the residual process:

$$\log y_t - \beta \log x_t = \varepsilon_t.$$

The relation between $x_t$ and $y_t$ becomes:

$$y_t = x_t^{\beta}\, e^{\varepsilon_t}.$$

It is not clear how we can trade this relation, at least not with the strategy from chapter 2. This is the reason why we want to test for cointegration on the stock prices and not on their logarithms. In order to do that we need $x_t$ and $y_t$ to be integrated of order one. In chapter 9 we will make an attempt to come up with a trading strategy for the case of cointegration between the logarithms of the stock prices.

Model (4.3) does not imply that $x_t$ is I(1); this is more easily seen in (4.1). The difference is

$$x_t - x_{t-1} = \mu\delta x_{t-1} + \sigma\sqrt{\delta}\,u_t\, x_{t-1}.$$

This does not have a constant expectation, so according to the derived stock price model the difference process is not I(0). Fortunately, we look at the stock prices $\{x_t\}_{t=0}^{T}$ for fixed $T$, $\mu$ is a small number between 0.01 and 0.1 and typical values of $\sigma$ are between 0.05 and 0.5, so it is not likely that $x_{t-1}$ becomes very large or very small. That is why the differences divided by the mean value of $x_{t-1}$, denoted $\bar{x}_{t-1}$, look a lot like the returns:

$$\frac{x_t - x_{t-1}}{x_{t-1}} \approx \frac{x_t - x_{t-1}}{\bar{x}_{t-1}}.$$

The returns are I(0), which indicates that the difference process $\Delta x_t$ is also more or less I(0) and the stock price process $x_t$ more or less I(1). We consider a realization of model (4.3) with $\mu = 0.03$, $\sigma = 0.18$ and $x_0 = 20$, shown in figure 4.10. The differences of this realization are shown in figure 4.11, and they look fairly stationary.
This indicates that realizations of model (4.3) behave as if they are I(1), while strictly under the model they are not.

Figure 4.10: Realization of model (4.3), $\mu = 0.03$, $\sigma = 0.18$.
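A realization of model (4.3) with the parameters quoted above can be generated as follows. This is a minimal sketch, not the code behind the figures: the function name `simulate_prices`, the seed and the path length of 520 daily steps are assumptions, while $\mu = 0.03$, $\sigma = 0.18$, $x_0 = 20$ and $\delta = 1/260$ are the values from the text.

```python
import math
import random


def simulate_prices(x0, mu, sigma, n_steps, delta=1.0 / 260.0, seed=0):
    """Simulate x_t = x_{t-1} * exp((mu - sigma^2 / 2) * delta + sigma * sqrt(delta) * u_t)."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * delta
    scale = sigma * math.sqrt(delta)
    path = [x0]
    for _ in range(n_steps):
        path.append(path[-1] * math.exp(drift + scale * rng.gauss(0.0, 1.0)))
    return path


path = simulate_prices(x0=20.0, mu=0.03, sigma=0.18, n_steps=520)

# Differences of the log prices: constant plus Gaussian white noise under (4.3)
log_diffs = [math.log(b / a) for a, b in zip(path, path[1:])]
mean_ld = sum(log_diffs) / len(log_diffs)
std_ld = math.sqrt(sum((d - mean_ld) ** 2 for d in log_diffs) / (len(log_diffs) - 1))

# Differences of the prices themselves, as plotted in the text
price_diffs = [b - a for a, b in zip(path, path[1:])]
print(std_ld, 0.18 * math.sqrt(1.0 / 260.0))
```

The sample standard deviation of the log differences should be close to $\sigma\sqrt{\delta}$, and every simulated price is positive because each step multiplies the previous price by an exponential.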
Figure 4.11: Differences of realization of model (4.3).

Another way to show that it is justifiable to assume that stock prices are integrated of order one is to examine the differences instead of the returns. At the beginning of this section we examined the returns of Royal Dutch Shell; let us do the same for the differences $\Delta x_t = x_t - x_{t-1}$. Figure 4.12 shows the estimated density of the daily differences with the $N(0, 1)$ density superimposed, figure 4.13 the empirical distribution function and figure 4.14 the normal QQ-plot. The daily differences were normalized to

$$\widehat{\Delta x}_t = \frac{\Delta x_t - \hat{\mu}}{\hat{\sigma}},$$

where $\hat{\mu}$ and $\hat{\sigma}^2$ are the sample mean and sample variance of the differences. Figure 4.15 shows the sample autocorrelation function of the differences. These figures look much the same as the figures for the returns, which suggests that it is justifiable to treat the differences of a stock price process as normally distributed i.i.d. samples. This implies that the differences are I(0) and the stock prices I(1).

Figure 4.12: Estimated density.
Figure 4.13: Empirical distribution function.

Figure 4.14: Normal QQ-plot.

Figure 4.15: Autocorrelation function.

So far we have discussed why it is likely that stock price processes are integrated of order one, but we can also perform unit root tests on the data we want to test for cointegration. The unit root test we use in this report is the (Augmented) Dickey-Fuller test, introduced in chapter 3. The first test is: $H_0: x_t \sim I(1)$ against $H_1: x_t \sim I(0)$.
The outcome should be not to reject $H_0$. The second test is:
\[
H_0 : x_t \sim I(2) \quad \text{against} \quad H_1 : x_t \sim I(1),
\]
which is equivalent to:
\[
H_0 : \Delta x_t \sim I(1) \quad \text{against} \quad H_1 : \Delta x_t \sim I(0).
\]
The outcome of this second test should be to reject $H_0$, which makes it likely that the price processes are I(1). Which case of the DF test should be used is discussed in the next section; the critical values of these tests are derived in chapter 5, and the results of these tests for the data used in this report are stated in chapter 7.

4.3 Engle-Granger method

The question remains how to test for cointegration. There are several methods for doing so. R.F. Engle and C.W.J. Granger were the first to develop the key concepts of cointegration, which can be found in [5]. They received the Nobel Prize in economics in 2003 for their work on cointegration and ARCH models.

The approach to testing for cointegration is to test the null hypothesis that there is no cointegration among the elements of an $(n \times 1)$ vector $\mathbf{y}_t$. Rejection of the null hypothesis is then taken as evidence of cointegration. The Engle-Granger test is a two-step process, which should be preceded by examining whether each component of $\mathbf{y}_t$ is I(1), as discussed in the previous section. Let us assume that this condition is fulfilled.

A vector process $\mathbf{y}_t$ is cointegrated if there exists a linear combination of its components $\mathbf{a}'\mathbf{y}_t$ that is stationary. The first step in the Engle-Granger test is to estimate $\mathbf{a}$; this is done with an OLS (Ordinary Least Squares) regression. The second step is to test whether the residuals of the regression are stationary using a Dickey-Fuller test. If the residuals are stationary, the linear combination $\mathbf{a}'\mathbf{y}_t$ is stationary, which means that $\mathbf{y}_t$ is cointegrated with cointegrating vector $\mathbf{a}$.

From a pairs trading point of view we have two stock price processes, $\mathbf{y}_t = (x_t, y_t)$.
We would like $x_t$ and $y_t$ to be cointegrated such that the spread $\varepsilon_t = y_t - \alpha x_t$ oscillates around zero; again we have 'normalized' the cointegrating vector $\mathbf{a}$ to $(-\alpha, 1)$. A stationary process has constant expectation, but this expectation is not necessarily equal to zero. In order to get a stationary process with mean zero, we can include a constant in the cointegration relation such that the spread becomes:
\[
\varepsilon_t = y_t - \alpha x_t - \alpha_0 .
\]
For example, consider the pair $(x_t, y_t)$ which is generated with the relation:
\[
y_t = 2x_t + 20 + \varepsilon_t ,
\]
so $\alpha = 2$ and $\alpha_0 = 20$. Figure 4.16 shows $x_t$ and $y_t$.

[Figure 4.16: Paths $x_t$ and $y_t$ for $\alpha_0 = 20$.]

Normally we do not know the exact values of $\alpha$ and $\alpha_0$, so we have to estimate them. According to the Engle-Granger method we do this with OLS. We have two possibilities: regression with and without intercept.
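The two regressions can be sketched in a few lines of code. This is only an illustration under stated assumptions: the data are simulated here with a Gaussian random walk for $x_t$ and i.i.d. Gaussian noise for $\varepsilon_t$ (the starting price, step size, noise level and seed are hypothetical choices, not the data behind figure 4.16), and the function names are ours.

```python
import random


def simulate_pair(n=500, alpha=2.0, alpha0=20.0, seed=1):
    """Simulate x_t as a random walk and y_t = alpha * x_t + alpha0 + eps_t.

    The start value 50, the step size 0.5 and the noise level 0.3 are
    hypothetical choices made only for this illustration.
    """
    rng = random.Random(seed)
    x = 50.0
    xs, ys = [], []
    for _ in range(n):
        x += rng.gauss(0.0, 0.5)      # I(1) price path
        eps = rng.gauss(0.0, 0.3)     # stationary spread noise (assumed i.i.d.)
        xs.append(x)
        ys.append(alpha * x + alpha0 + eps)
    return xs, ys


def ols_with_intercept(xs, ys):
    """OLS of y on x and a constant; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ols_without_intercept(xs, ys):
    """OLS of y on x through the origin; returns the slope only."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


xs, ys = simulate_pair()
slope_with, intercept = ols_with_intercept(xs, ys)
slope_without = ols_without_intercept(xs, ys)
print("with intercept:   ", slope_with, intercept)
print("without intercept:", slope_without)
```

With a positive intercept and positive prices, the regression through the origin has to absorb the constant into the slope, so its estimate of $\alpha$ is pushed above the true value.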
In the first situation, regression with intercept, we get
\[
\hat\alpha_{\alpha_0} = 2.01, \qquad \hat\alpha_0 = 19.67 .
\]
In the second, regression without intercept:
\[
\hat\alpha_{\neg\alpha_0} = 2.41 .
\]

[Figure 4.17: Spread with and without $\alpha_0$, $\alpha_0 = 20$.]

Figure 4.17 shows the corresponding spread processes. The left is $y_t - \hat\alpha_{\alpha_0} x_t - \hat\alpha_0$, the right is $y_t - \hat\alpha_{\neg\alpha_0} x_t$.

The left figure of 4.17 looks a lot better, but there is a disadvantage. Section 2.3, about the properties of pairs trading, described that pairs trading is more or less cash neutral. The trading strategy is cash neutral up to $\Gamma$ if we neglect costs for short selling; in other words, each trade costs or provides us with $\Gamma$. The cash neutral property is a property we would like to keep. If we trade the spread from the left figure of 4.17, it is not cash neutral anymore. Assume that the predetermined threshold $\Gamma$ is equal to 1. The first time the spread is above 1, the value of $x$ is €43.73 and the value of $y$ is €108.59, so the spread at this time equals $108.59 - 2.01 \cdot 43.73 - 19.67 = 1.02$. Then we sell $y$, which provides us with €108.59, and buy 2.01 $x$, which costs us $2.01 \cdot 43.73 = 87.90$. So we are left with a positive difference in money of €20.69. The first time the spread is below $-1$, the value of $x$ is €52.66 and $y$ is €124.51, so the spread at this time equals $124.51 - 2.01 \cdot 52.66 - 19.67 = -1.01$. Then we buy $y$, which costs us €124.51, and sell 2.01 $x$, providing us with $2.01 \cdot 52.66 = 105.85$. So this trade costs us €18.66. This way of trading is not cash neutral: each trade costs or provides us with approximately $\alpha_0$.
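The arithmetic of these two trades can be checked with a short sketch. The prices and estimates are the ones quoted above; the function names are ours and only serve this illustration. The point is that the cash flow at entry equals the intercept plus the spread, so it is roughly $\alpha_0 \pm \Gamma$ instead of $\pm\Gamma$.

```python
ALPHA_HAT = 2.01    # estimated slope from the regression with intercept
ALPHA0_HAT = 19.67  # estimated intercept from the same regression


def spread(x, y):
    """Spread y - alpha*x - alpha0 for the regression with intercept."""
    return y - ALPHA_HAT * x - ALPHA0_HAT


def entry_cash_flow(x, y, short_spread):
    """Cash received (+) or paid (-) when opening a position.

    Shorting the spread means selling 1 y and buying ALPHA_HAT x;
    going long the spread is the reverse trade.
    """
    sign = 1.0 if short_spread else -1.0
    return sign * (y - ALPHA_HAT * x)


# First crossing of +1: sell y, buy 2.01 x.
short_entry = entry_cash_flow(43.73, 108.59, short_spread=True)
# First crossing of -1: buy y, sell 2.01 x.
long_entry = entry_cash_flow(52.66, 124.51, short_spread=False)

print(round(spread(43.73, 108.59), 2), round(short_entry, 2))
print(round(spread(52.66, 124.51), 2), round(long_entry, 2))
```

Both cash flows are close to 20 in absolute value, which is the size of the intercept and far larger than the threshold of 1.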
A possibility to resolve this is to neglect $\alpha_0$, so that we trade the spread from the right figure of 4.17. In this example that is probably not worthwhile, because the spread has a clear downward trend. Let us consider the two different spreads for $\alpha_0 = 1$:
\[
\varepsilon_t = y_t - 2x_t - 1,
\]
shown in figure 4.18.

[Figure 4.18: Spread with and without $\alpha_0$, $\alpha_0 = 1$.]

Now the spread where $\alpha_0$ is neglected looks almost as good as the spread with $\alpha_0$. In conclusion, in order to keep the cash neutral property and the trading strategy from chapter 2, $\alpha_0$ should be close to zero, such that neglecting it still gives a stationary spread process. So when testing real stock price processes $x_t$ and $y_t$ for cointegration, we only estimate $\alpha$ and test the residual process $y_t - \hat\alpha x_t$ for stationarity. A suggestion for an alternative trading strategy, which is able to trade a pair when $\alpha_0$ cannot be neglected, is stated in chapter 9.

It is not standard to do OLS regression on non-stationary data. OLS regression applied to non-stationary data is quite likely to produce spurious results. There is only one circumstance in which the OLS estimation gives a consistent estimate of the cointegrating vector, and that is when there is a cointegrating relation. Note that if $\varepsilon_t = y_t - \alpha x_t$ is stationary, then
\[
\frac{1}{T}\sum_{t=1}^{T} \varepsilon_t^2 = \frac{1}{T}\sum_{t=1}^{T} (y_t - \alpha x_t)^2 \xrightarrow{P} E(\varepsilon_t^2). \tag{4.4}
\]
By contrast, if $(-\alpha, 1)$ is not a cointegrating vector between $x$ and $y$, then $y_t - \alpha x_t$ is I(1) and, from proposition 2 in section 5.6,
\[
\frac{1}{T^2}\sum_{t=1}^{T} (y_t - \alpha x_t)^2 \xrightarrow{D} \lambda^2 \int_0^1 W(r)^2 \, dr,
\]
where $W(r)$ is standard Brownian motion, which will be defined in chapter 5, and $\lambda$ is a parameter determined by the autocovariances of $\Delta\varepsilon_t$. Hence, if $(-\alpha, 1)$ is not a cointegrating vector, the statistic in (4.4) would diverge to $+\infty$. This suggests that we can obtain a consistent estimate of a cointegrating vector by choosing $\alpha$ so as to minimize (4.4). It turns out that the OLS estimator for $\alpha$, also when $\alpha_0$ is included in the regression, converges at rate $T$. This is analyzed by Phillips and Durlauf in [12].

Now that we have a method for estimating the cointegrating vector, the second step in the Engle-Granger method is examining the residuals with a Dickey-Fuller test. Chapter 3 described that there are several cases, so the question remains which case to use when testing for cointegration. Most literature about cointegration does not state which case is used and why, but from the critical values used it can be seen that case 2 is used most often. One discussion, found in Hamilton [6], is the following:

Which case is the 'correct' case to use to test the null hypothesis of a unit root? The answer depends on why we are interested in testing for a unit root. If the analyst has a specific null hypothesis about the process that generated the data, then obviously this would guide the choice of test. In the absence of such guidance, one general principle would be to fit a specification that is a plausible description of the data under both the null and the alternative. This principle would suggest using case 4 for a series with an obvious trend and the case 2 for series without a significant trend. For example, the nominal interest rate series used in the examples in this section.
There is no economic theory to suggest that nominal interest rates should exhibit a deterministic time trend, and so a natural null hypothesis is that the true process is a random walk without trend. In terms of framing a plausible alternative, it is difficult to maintain that these data could have been generated by $i_t = \rho i_{t-1} + u_t$ with $|\rho|$ significantly less than 1. If these data were to be described by a stationary process, surely the process would have a positive mean. This argues for including a constant term in the estimated regression, even though under the null hypothesis the true process does not contain a constant term. Thus, case 2 is a sensible approach for these data.

We do not have a specific null hypothesis, so according to this quote we should use case 2, because there is no trend in spread processes. In the next chapter we investigate the power of the three different tests, case 1 through case 3; maybe we can find another reason to use Dickey-Fuller case 2.

So far we have looked at cointegration between two stocks because of pairs trading. However, pairs trading with three or more stocks in a 'pair' is also very interesting. Cointegration is defined for an $(n \times 1)$ vector $\mathbf{y}_t$ and the trading strategy is easily extended to three or more stocks. For example, consider a 'pair' of three stocks $\mathbf{y}_t = (x_t, y_t, z_t)$ that are cointegrated with cointegrating vector $(-\alpha_1, -\alpha_2, 1)$. Then we calculate the spread as
\[
s_t = z_t - \alpha_1 x_t - \alpha_2 y_t ,
\]
which we trade in the same way as before. When the spread reaches $\Gamma$ we sell 1 $z$ and buy $\alpha_1$ times $x$ and $\alpha_2$ times $y$. When the spread goes below $-\Gamma$ we reverse our position and lock in a profit of at least $2\Gamma$. The threshold $\Gamma$ is determined in the same way as in chapter 2: we just try a few on historical data and take the best one.

If the number of stocks in a pair is greater than two, $n > 2$, the Engle-Granger method has a disadvantage. We estimate the cointegrating vector with OLS regression: if $\mathbf{y}_t = (y_{1t}, y_{2t}, \ldots, y_{nt})$, we regress $y_{1t}$ on $(y_{2t}, y_{3t}, \ldots, y_{nt})$. So the first element of the cointegrating vector is set to unity. This normalization is not harmless if the first variable $y_{1t}$ does not appear in the cointegrating relation at all, in other words, if its coefficient is equal to zero but is set to one.

A second disadvantage, which also exists when $n = 2$, is that the method is not symmetric. Suppose $n = 2$ and we regress $y_{1t}$ on $y_{2t}$:
\[
y_{1t} = \alpha y_{2t} + u_t .
\]
We might equally well have normalized the coefficient of $y_{2t}$, so the regression would be
\[
y_{2t} = \beta y_{1t} + v_t .
\]
Then the OLS estimate $\hat\beta$ is not simply the inverse of $\hat\alpha$, meaning that these two regressions will give different estimates of the cointegrating vector. Thus, choosing which variable to call $y_1$ and which to call $y_2$ might end up making a difference for the evidence one finds for cointegration.

For these reasons we discuss the Johansen method in the next section. First, a summary is given of testing for cointegration with the Engle-Granger method:

• Given is an $(n \times 1)$ vector $\mathbf{y}_t = (y_{1t}, y_{2t}, \ldots, y_{nt})$.

• Examine or assume that each individual variable $y_{it}$ is I(1).

• Then $\mathbf{y}_t$ is cointegrated if $\mathbf{a}'\mathbf{y}_t$ is I(0) for some nonzero vector $\mathbf{a}$.

• Regress $y_{1t}$ on $(y_{2t}, \ldots, y_{nt})$. A constant may be included, but with our trading strategy we do not want to include this constant. This regression gives the estimate $\hat{\mathbf{a}}$.

• Then the residuals of this regression, which form our spread process, are given by
\[
e_t = y_{1t} - \hat a_2 y_{2t} - \cdots - \hat a_n y_{nt},
\]
which resembles the real error process:
\[
\varepsilon_t = y_{1t} - a_2 y_{2t} - \cdots - a_n y_{nt}.
\]

• The Dickey-Fuller test assumes that $\varepsilon_t$ follows an AR($p$) model, with a unit root and with or without a constant term. If we use case 2, which is suggested by the above quote, we assume that the true model does not have a constant but we include a constant in the estimated model. So the Dickey-Fuller case 2 test assumes that the true model of $\varepsilon_t$ is
\[
\varepsilon_t = \rho\varepsilon_{t-1} + \beta_1 \Delta\varepsilon_{t-1} + \cdots + \beta_{p-1}\Delta\varepsilon_{t-p+1} + \eta_t ,
\]
with $\rho = 1$.
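• To estimate $p$ we fit an AR($k$) model with OLS on $e_t$ for $k = 1, \ldots, K$. The value of $k$ with the smallest information criterion AIC($k$), BIC($k$) and HIC($k$) is the estimate $\hat p$ of the model order. If the information criteria give different values, we take the rounded mean.

• We use the AR($\hat p$) fit
\[
e_t = \hat c + \hat\rho e_{t-1} + \hat\beta_1 \Delta e_{t-1} + \cdots + \hat\beta_{\hat p - 1}\Delta e_{t-\hat p+1} + n_t
\]
to calculate the Dickey-Fuller test statistic
\[
\frac{\hat\rho - 1}{\hat\sigma_{\hat\rho}},
\]
where $\hat\rho$ is the OLS estimate of $\rho$ and $\hat\sigma_{\hat\rho}$ is the standard error of the estimated coefficient.

• Compare the outcome with the critical values of the Dickey-Fuller test.

The critical values of the Dickey-Fuller test will be derived and simulated in the next chapter. Engle-Granger is a two-step method: first we do an OLS regression and then a Dickey-Fuller test. In chapter 6 we will examine whether the first step influences the critical values, in other words, whether the critical values for Engle-Granger are really the same as for Dickey-Fuller.

4.4 Johansen method

The Johansen method, also known as 'full-information maximum likelihood', was developed by Søren Johansen in [8] and [9]. This method allows us to test for the number of cointegrating relations. An $(n \times 1)$ vector $\mathbf{y}_t$ has $h$ cointegrating relations if there exist $h$ linearly independent vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_h$ such that $\mathbf{a}_i'\mathbf{y}_t$ is stationary. If such vectors exist, their values are not unique, since any linear combination of $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_h$ is also a cointegrating vector. With the Engle-Granger method this was resolved by setting the first element of the cointegrating vector equal to one. As mentioned before, this has some disadvantages.

In this section the Johansen method is summarized; no proof or argumentation is given. These can be found in [8] and [9].

Let $\mathbf{y}_t$ be an $(n \times 1)$ vector. The Johansen method assumes that $\mathbf{y}_t$ follows a VAR($p$) model
\[
\mathbf{y}_t = \mathbf{c} + \Phi_1 \mathbf{y}_{t-1} + \cdots + \Phi_p \mathbf{y}_{t-p} + \boldsymbol{\varepsilon}_t , \tag{4.5}
\]
where $\mathbf{c}$ is an $(n \times 1)$ vector and $\Phi_i$ is an $(n \times n)$ matrix. Model (4.5) can be written as
\[
\mathbf{y}_t = \mathbf{c} + \boldsymbol{\rho}\,\mathbf{y}_{t-1} + \boldsymbol{\beta}_1 \Delta\mathbf{y}_{t-1} + \cdots + \boldsymbol{\beta}_{p-1}\Delta\mathbf{y}_{t-p+1} + \boldsymbol{\varepsilon}_t , \tag{4.6}
\]
where
\[
\boldsymbol{\rho} = \Phi_1 + \Phi_2 + \cdots + \Phi_p , \qquad
\boldsymbol{\beta}_i = -(\Phi_{i+1} + \Phi_{i+2} + \cdots + \Phi_p), \quad i = 1, 2, \ldots, p-1 .
\]
Subtracting $\mathbf{y}_{t-1}$ from both sides of (4.6) results in
\[
\Delta\mathbf{y}_t = \mathbf{c} + \boldsymbol{\beta}_0 \mathbf{y}_{t-1} + \boldsymbol{\beta}_1 \Delta\mathbf{y}_{t-1} + \cdots + \boldsymbol{\beta}_{p-1}\Delta\mathbf{y}_{t-p+1} + \boldsymbol{\varepsilon}_t , \tag{4.7}
\]
where $\boldsymbol{\beta}_0 = \boldsymbol{\rho} - I$, with
\[
E(\boldsymbol{\varepsilon}_t) = 0, \qquad
E(\boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_\tau') =
\begin{cases}
\Omega & \text{for } t = \tau, \\
0 & \text{otherwise.}
\end{cases}
\]
Johansen showed that under the null hypothesis of $h$ cointegrating relations, only $h$ separate linear combinations of $\mathbf{y}_t$ appear in (4.7). This implies that $\boldsymbol{\beta}_0$ can be written in the form
\[
\boldsymbol{\beta}_0 = -BA' , \tag{4.8}
\]
for $B$ an $(n \times h)$ matrix and $A'$ an $(h \times n)$ matrix.

If we consider a sample of $T + p$ observations, denoted $(\mathbf{y}_{-p+1}, \mathbf{y}_{-p+2}, \ldots, \mathbf{y}_T)$, and if the errors $\boldsymbol{\varepsilon}_t$ are Gaussian, the log likelihood of $(\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_T)$ conditional on $(\mathbf{y}_{-p+1}, \mathbf{y}_{-p+2}, \ldots, \mathbf{y}_0)$ is given by
\[
\begin{aligned}
L(\Omega, \mathbf{c}, \boldsymbol{\beta}_0, \boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_{p-1}) = {} & -\frac{Tn}{2}\log(2\pi) - \frac{T}{2}\log|\Omega| \\
& - \frac{1}{2}\sum_{t=1}^{T}\Big[ (\Delta\mathbf{y}_t - \mathbf{c} - \boldsymbol{\beta}_0\mathbf{y}_{t-1} - \boldsymbol{\beta}_1\Delta\mathbf{y}_{t-1} - \cdots - \boldsymbol{\beta}_{p-1}\Delta\mathbf{y}_{t-p+1})' \\
& \qquad \times\, \Omega^{-1} (\Delta\mathbf{y}_t - \mathbf{c} - \boldsymbol{\beta}_0\mathbf{y}_{t-1} - \boldsymbol{\beta}_1\Delta\mathbf{y}_{t-1} - \cdots - \boldsymbol{\beta}_{p-1}\Delta\mathbf{y}_{t-p+1}) \Big].
\end{aligned}
\tag{4.9}
\]
The goal is to choose $(\Omega, \mathbf{c}, \boldsymbol{\beta}_0, \boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_{p-1})$ so as to maximize (4.9) subject to the constraint that $\boldsymbol{\beta}_0$ can be written in the form of (4.8). The Johansen method calculates the maximum likelihood estimates of $(\Omega, \mathbf{c}, \boldsymbol{\beta}_0, \boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_{p-1})$.

The first step of the Johansen method is to estimate a VAR($p-1$) for $\Delta\mathbf{y}_t$. That is, regress $\Delta y_{it}$ on a constant and all elements of the vectors $\Delta\mathbf{y}_{t-1}, \ldots, \Delta\mathbf{y}_{t-p+1}$ with OLS. Collect the $i = 1, 2, \ldots, n$ regressions in vector form:
\[
\Delta\mathbf{y}_t = \hat{\boldsymbol{\pi}}_0 + \hat\Pi_1 \Delta\mathbf{y}_{t-1} + \cdots + \hat\Pi_{p-1}\Delta\mathbf{y}_{t-p+1} + \hat{\mathbf{u}}_t . \tag{4.10}
\]
We also estimate a second regression, in which we regress $\mathbf{y}_{t-1}$ on a constant and $\Delta\mathbf{y}_{t-1}, \ldots, \Delta\mathbf{y}_{t-p+1}$:
\[
\mathbf{y}_{t-1} = \hat{\boldsymbol{\theta}} + \hat\chi_1 \Delta\mathbf{y}_{t-1} + \cdots + \hat\chi_{p-1}\Delta\mathbf{y}_{t-p+1} + \hat{\mathbf{v}}_t . \tag{4.11}
\]
The second step is to calculate the sample covariance matrices of the OLS residuals $\hat{\mathbf{u}}_t$ and $\hat{\mathbf{v}}_t$:
\[
\hat\Sigma_{vv} = \frac{1}{T}\sum_{t=1}^{T}\hat{\mathbf{v}}_t\hat{\mathbf{v}}_t' , \qquad
\hat\Sigma_{uu} = \frac{1}{T}\sum_{t=1}^{T}\hat{\mathbf{u}}_t\hat{\mathbf{u}}_t' , \qquad
\hat\Sigma_{uv} = \frac{1}{T}\sum_{t=1}^{T}\hat{\mathbf{u}}_t\hat{\mathbf{v}}_t' , \qquad
\hat\Sigma_{vu} = \hat\Sigma_{uv}' .
\]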
From these, find the eigenvalues of the matrix
\[
\hat\Sigma_{vv}^{-1}\hat\Sigma_{vu}\hat\Sigma_{uu}^{-1}\hat\Sigma_{uv} , \tag{4.12}
\]
with the eigenvalues ordered $\hat\lambda_1 > \hat\lambda_2 > \cdots > \hat\lambda_n$. The maximum value attained by the log likelihood function subject to the constraint that there are $h$ cointegrating relations is given by
\[
L_0^* = -\frac{Tn}{2}\log(2\pi) - \frac{Tn}{2} - \frac{T}{2}\log|\hat\Sigma_{uu}| - \frac{T}{2}\sum_{i=1}^{h}\log(1 - \hat\lambda_i). \tag{4.13}
\]
The third step is to calculate the maximum likelihood estimates of the parameters. Let $\hat{\mathbf{a}}_1, \ldots, \hat{\mathbf{a}}_h$ denote the $(n \times 1)$ eigenvectors of (4.12) associated with the $h$ largest eigenvalues. These provide a basis for the space of cointegrating relations. That is, the maximum likelihood estimate is that any cointegrating vector can be written in the form
\[
\mathbf{a} = b_1\hat{\mathbf{a}}_1 + b_2\hat{\mathbf{a}}_2 + \cdots + b_h\hat{\mathbf{a}}_h ,
\]
for some choice of scalars $b_1, \ldots, b_h$. Johansen suggests normalizing the vectors $\hat{\mathbf{a}}_i$ such that $\hat{\mathbf{a}}_i'\hat\Sigma_{vv}\hat{\mathbf{a}}_i = 1$. Collect the first $h$ normalized vectors in an $(n \times h)$ matrix $\hat A$:
\[
\hat A = \begin{bmatrix} \hat{\mathbf{a}}_1 & \hat{\mathbf{a}}_2 & \cdots & \hat{\mathbf{a}}_h \end{bmatrix}.
\]
Then the maximum likelihood estimate of $\boldsymbol{\beta}_0$ is given by
\[
\hat{\boldsymbol{\beta}}_0 = \hat\Sigma_{uv}\hat A\hat A' .
\]
The maximum likelihood estimate of $\mathbf{c}$ is
\[
\hat{\mathbf{c}} = \hat{\boldsymbol{\pi}}_0 - \hat{\boldsymbol{\beta}}_0\hat{\boldsymbol{\theta}} .
\]
Now we are ready for hypothesis testing. Under the null hypothesis that there are exactly $h$ cointegrating relations, the largest value that can be achieved for the log likelihood function was given by (4.13). Consider the alternative hypothesis that there are $n$ cointegrating relations. This means that every linear combination of $\mathbf{y}_t$ is stationary, in which case $\mathbf{y}_{t-1}$ would appear in (4.7) without constraints and no restrictions are imposed on $\boldsymbol{\beta}_0$. The value of the log likelihood function in the absence of constraints is given by
\[
L_1^* = -\frac{Tn}{2}\log(2\pi) - \frac{Tn}{2} - \frac{T}{2}\log|\hat\Sigma_{uu}| - \frac{T}{2}\sum_{i=1}^{n}\log(1 - \hat\lambda_i). \tag{4.14}
\]
A likelihood ratio test of
\[
H_0 : h \text{ relations} \quad \text{against} \quad H_1 : n \text{ relations}
\]
can be based on
\[
2(L_1^* - L_0^*) = -T\sum_{i=h+1}^{n}\log(1 - \hat\lambda_i). \tag{4.15}
\]
Another approach would be to test the null hypothesis of $h$ cointegrating relations against $h + 1$ cointegrating relations.
A likelihood ratio test of
\[
H_0 : h \text{ relations} \quad \text{against} \quad H_1 : h + 1 \text{ relations}
\]
can be based on
\[
2(L_1^* - L_0^*) = -T\log(1 - \hat\lambda_{h+1}). \tag{4.16}
\]
As with the Dickey-Fuller test, we need to distinguish several cases. There are also three cases for the Johansen method, but they are different from the Dickey-Fuller cases:

Case 1: The true value of the constant $\mathbf{c}$ in (4.7) is zero, meaning that there is no intercept in any of the cointegrating relations and no deterministic time trend in any of the elements of $\mathbf{y}_t$. There is no constant term included in the regressions (4.10) and (4.11).

Case 2: The true value of the constant $\mathbf{c}$ in (4.7) is such that there are no deterministic time trends in any of the elements of $\mathbf{y}_t$. There are no restrictions on the constant term in the estimation of the regressions (4.10) and (4.11).

Case 3: The true value of the constant $\mathbf{c}$ in (4.7) is such that one or more elements of $\mathbf{y}_t$ exhibit a deterministic time trend. There are no restrictions on the constant term in the estimation of the regressions (4.10) and (4.11).

For both tests, which can be based on (4.15) and (4.16), the critical values for the three different cases can be found in [10] and [11]. Unfortunately, the critical values are for a sample size of $T = 400$. Although the data of the ten pairs IMC provided consist of 520 observations, these critical values will be used when testing the ten pairs for cointegration. I assume that the critical values are not that different for a sample size of 520. For case 1 this is very likely, because Johansen showed that the asymptotic distribution of test statistic (4.15) is the same as that of the trace of the matrix
\[
Q = \left[\int_0^1 \mathbf{W}(r)\,d\mathbf{W}(r)'\right]' \left[\int_0^1 \mathbf{W}(r)\mathbf{W}(r)'\,dr\right]^{-1} \left[\int_0^1 \mathbf{W}(r)\,d\mathbf{W}(r)'\right],
\]
where $\mathbf{W}(r)$ is $g$-dimensional standard Brownian motion, with $g = n - h$.
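As a small numerical illustration of the two likelihood ratio statistics (4.15) and (4.16), the sketch below evaluates both from a list of eigenvalues. The eigenvalues and the sample size are hypothetical numbers chosen only to show the arithmetic, and the function names are ours; no estimation step is performed here.

```python
import math


def trace_statistic(eigenvalues, h, T):
    """Statistic (4.15): H0 of h relations against H1 of n relations."""
    ordered = sorted(eigenvalues, reverse=True)
    return -T * sum(math.log(1.0 - lam) for lam in ordered[h:])


def max_eigenvalue_statistic(eigenvalues, h, T):
    """Statistic (4.16): H0 of h relations against H1 of h + 1 relations."""
    ordered = sorted(eigenvalues, reverse=True)
    return -T * math.log(1.0 - ordered[h])


# Hypothetical eigenvalues for n = 2 stocks and T = 500 observations.
eigs = [0.05, 0.01]
T = 500

for h in range(len(eigs)):
    print(h, trace_statistic(eigs, h, T), max_eigenvalue_statistic(eigs, h, T))
```

For $h = n - 1$ the sum in (4.15) contains a single term, so the two statistics coincide.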
And fortunately, case 1 is the case we will use, because we do not want an intercept in the cointegrating relations, as was explained in the previous section, and we assume there is no deterministic time trend in the price processes. The Johansen case 1 test can be compared with the Dickey-Fuller case 1 test: there is no constant and we do not estimate one. There is not really a Johansen case that can be compared to Dickey-Fuller case 2, because with the Johansen case 2 test the constant $\mathbf{c}$ is not necessarily equal to zero. The critical values for case 1 and $T = 400$ for the test statistics (4.15) and (4.16) are shown in tables 4.1 and 4.2 respectively.

Table 4.1: Critical values for test statistic (4.15), case 1.

g      1%      5%     10%
1     6.51    3.84    2.86
2    16.31   12.53   10.47
3    29.75   24.31   21.63
4    45.58   39.89   36.58
5    66.52   59.46   55.44

Table 4.2: Critical values for test statistic (4.16), case 1.

g      1%      5%     10%
1     6.51    3.84    2.86
2    15.69   11.44    9.52
3    22.99   17.89   15.59
4    28.82   23.80   21.58
5    35.17   30.04   27.62

Note that if $g = 1$, then $n = h + 1$. In this case the two tests are identical. For this reason the first rows of the tables are the same.

With two stocks in a pair, we can do several hypothesis tests:

1) $H_0$: 0 relations against $H_1$: 2 relations,
2) $H_0$: 0 relations against $H_1$: 1 relation,
3) $H_0$: 1 relation against $H_1$: 2 relations.

For the first test, we use the second row of table 4.1. We basically test the null of no cointegration between the two stocks against the stocks themselves being stationary. Although the alternative hypothesis does not imply 'real' cointegration, because every linear combination of $\mathbf{y}_t$ is stationary when $\mathbf{y}_t$ is already stationary, rejection of the null is taken as evidence of cointegration. For the second test, we use the second row of table 4.2. We test the null of no cointegration between the two stocks against the alternative of a single cointegrating relation. For the third test, we use the first row of either table.
We test the null of one cointegrating relation against the stock prices being stationary already. Basically we test whether the relation is a 'real' cointegrating relation. If the third null hypothesis is rejected, the test indicates that there are two cointegrating relations, which means that the stock prices themselves are stationary. As we saw in section 4.2, we do not think that stock prices are stationary, but if they are, we can trade them as a pair like any other pair. We could even trade each stock as a spread process, that is, apply the trading strategy to the price process instead of the spread process. But this would not be cash and market neutral anymore, and is seen as far more risky. So with two stocks in a pair, we would like there to be one or two cointegrating relations, but we expect there to be only one. In chapter 7 the results of the different tests for the 10 pairs are given. They are compared with the results from the Engle-Granger method.

The previous section stated that the Johansen method has an advantage over the Engle-Granger method when there are more than two stocks in a pair, $n > 2$. With Johansen we do not force the first element of the cointegrating relation to be unity during estimation; we only normalize the estimated cointegrating relation afterwards, either such that the first element is unity or, as Johansen proposed, such that $\hat{\mathbf{a}}_i'\hat\Sigma_{vv}\hat{\mathbf{a}}_i = 1$. With three stocks in a pair, we would like there to be one, two or three cointegrating relations, but we expect that there are no more than two. With our pairs trading strategy it does not matter how many relations there are as long as the stocks are cointegrated, because we only trade one relation. This relation will be the eigenvector corresponding to the largest eigenvalue of the matrix in (4.12) because, according to Hamilton [6], this results in the most stationary spread process.

4.5 Alternative method

In this section a start is made with an alternative method.
Assume, as in the Engle-Granger and Johansen methods, that the price processes $x_t$ and $y_t$ are integrated of order one: $x_t, y_t \sim I(1)$. Denote by $\mathbf{z}_t$ the vector of the differences of these price processes,
\[
\mathbf{z}_t = \begin{pmatrix} x_t - x_{t-1} \\ y_t - y_{t-1} \end{pmatrix}.
\]
Then each component of $\mathbf{z}_t$ is I(0), i.e. stationary. Notice that
\[
\begin{pmatrix} x_t - x_0 \\ y_t - y_0 \end{pmatrix} = \sum_{i=1}^{t}\mathbf{z}_i .
\]
Two price processes are cointegrated if a linear combination of them is stationary, i.e. has constant mean, constant variance and autocovariances that do not depend on $t$.

In this section we would like to find out whether $\mathbf{z}_t$ can be represented as a VAR($p$) or as a vector MA($q$) process. Engle and Granger showed that a cointegrated system can never be represented by a finite-order vector autoregression in the differenced data $\Delta\mathbf{y}_t = \mathbf{z}_t$. The outline of the deduction is that if $\mathbf{z}_t$ is causal, i.e. $\mathbf{z}_t$ can be written as a linear combination of past innovations, and $(x_t, y_t)$ are cointegrated, then $\mathbf{z}_t$ is non-invertible. This implies that if $(x_t, y_t)$ are cointegrated, $\mathbf{z}_t$ cannot be represented by a VAR($p$).

If we assume $\mathbf{z}_t$ to be a vector MA($q$) process, we can find restrictions on the parameters of the model which ensure that a stationary linear combination of $(x_t, y_t)$ exists. Let us examine this for $q = 2$:
\[
\mathbf{z}_t = \Theta_2\mathbf{w}_{t-2} + \Theta_1\mathbf{w}_{t-1} + \Theta_0\mathbf{w}_t ,
\]
where $\mathbf{w}_t$ is i.i.d. $N_2(0, \Sigma)$ and $\Theta_0 = I$. Notice that an MA($q$) process is always stationary. Then
\[
\sum_{i=1}^{t}\mathbf{z}_i = \Theta_2\mathbf{w}_{-1} + (\Theta_2 + \Theta_1)\mathbf{w}_0 + (\Theta_2 + \Theta_1 + \Theta_0)\sum_{i=1}^{t-2}\mathbf{w}_i + (\Theta_1 + \Theta_0)\mathbf{w}_{t-1} + \Theta_0\mathbf{w}_t .
\]
If $\mathbf{v}$ is a cointegrating vector, i.e. a vector such that $\mathbf{v}\sum_{i=1}^{t}\mathbf{z}_i$ is stationary, then every multiple of $\mathbf{v}$ is also a cointegrating vector. We can apply a normalization so that we can write $\mathbf{v} = [-\alpha \;\; 1]$. For $t > 2$,
\[
\begin{aligned}
(y_t - \alpha x_t) - (y_0 - \alpha x_0) &= [-\alpha \;\; 1]\sum_{i=1}^{t}\mathbf{z}_i \\
&= [-\alpha \;\; 1]\Theta_2\mathbf{w}_{-1} + [-\alpha \;\; 1](\Theta_2 + \Theta_1)\mathbf{w}_0 && \text{(begin)} \\
&\quad + [-\alpha \;\; 1](\Theta_2 + \Theta_1 + \Theta_0)\sum_{i=1}^{t-2}\mathbf{w}_i && \text{(middle)} \\
&\quad + [-\alpha \;\; 1](\Theta_1 + \Theta_0)\mathbf{w}_{t-1} + [-\alpha \;\; 1]\Theta_0\mathbf{w}_t && \text{(end)}
\end{aligned}
\tag{4.17}
\]
The mean of (4.17) is constant for every $\Theta_1$, $\Theta_2$ and $\alpha$. The variance, however, is not.
The number of terms in (begin) and (end) is the same for every $t$, so only the variance of the (middle) part of (4.17) depends on $t$. To resolve this, $\Theta_2$, $\Theta_1$ and $\alpha$ have to satisfy:
\[
[-\alpha \;\; 1](\Theta_2 + \Theta_1 + \Theta_0) = 0 .
\]
The matrix $(\Theta_2 + \Theta_1 + \Theta_0)$ must have an eigenvalue zero with left eigenvector $[-\alpha \;\; 1]$. Then (4.17) is a stationary process. The same argument goes for $q > 2$. So if the difference process $\mathbf{z}_t$ is assumed to be an MA($q$), then for $(x_t, y_t)$ to be cointegrated the parameters have to satisfy:
\[
\text{the matrix } (\Theta_q + \Theta_{q-1} + \cdots + \Theta_0) \text{ has eigenvalue } 0. \tag{4.18}
\]
The corresponding eigenvector is the cointegrating relation.

Now we have a method to generate cointegrated data that is unlikely to satisfy the assumptions of the Engle-Granger method as well as the Johansen method. Engle-Granger assumes that $y_t - \alpha x_t$ is an AR($p$) process and Johansen assumes that the vector $(x_t, y_t)$ is a VAR($p$). In section 6.4 we will see whether the Engle-Granger method is robust enough to identify data generated in the way described here as cointegrated.

It should be possible to construct a new method for testing for cointegration. With real data we can obviously determine the difference process $\mathbf{z}_t$. It is, however, pretty difficult to estimate the parameters of the MA($q$) with only 500 observations, especially when $q$ becomes large. But if we could, then we could base a hypothesis test on the estimated eigenvalue closest to zero of the estimated matrices. We do not proceed with this in this report.

Chapter 5

Dickey-Fuller tests

In the literature that describes Dickey-Fuller tests there are a lot of differences in the critical values. Some sources do not state clearly which true model is used, so the null hypothesis is not clear. Sometimes it seems that different models are used at the same time, and sometimes there is the exact same model but the critical values are just different.
That is why this chapter discusses the asymptotic distributions of the (Augmented) Dickey-Fuller test statistic to find the critical values for this test. In other words, this chapter discusses the asymptotic distributions of OLS estimated coefficients of unit root processes. They differ from those for stationary processes. The asymptotic distributions can be described in terms of functionals of Brownian motion. In the first section some notions and facts from probability theory, used to establish these distributions, are stated. In the next three sections the asymptotic distributions of the estimated coefficients for a first-order autoregression when the true process is a random walk are derived, i.e. the asymptotic distribution of the DF test statistic for case 1 to case 3. These distributions turn out to depend on whether a constant is included in the estimated regression. In section 5.5 the power of the three different cases is investigated. In section 5.6 the properties of the estimated coefficients for a $p$th-order autoregression are derived, i.e. the distributions of the ADF test statistics. The book of Hamilton [6] is used for the derivation of the asymptotic distributions, because it clearly distinguishes the different models.

5.1 Notions/facts from probability theory

First we need some definitions and theorems. For the following three definitions we assume that $\{X_T\}$ is a sequence of random variables and $X$ is a random variable, all defined on the same probability space $(\Omega, \mathcal{F}, P)$.

Convergence almost surely: The sequence of random variables $\{X_T\}_{T=1}^{\infty}$ converges almost surely towards the random variable $X$ if

$$ P(\{\omega \in \Omega : \lim_{T \to \infty} X_T(\omega) = X(\omega)\}) = 1. $$

Notation: $X_T \to X$ a.s.

Convergence in probability: The sequence of random variables $\{X_T\}_{T=1}^{\infty}$ converges in probability towards the random variable $X$ if

$$ \forall \varepsilon > 0: \quad \lim_{T \to \infty} P(\{\omega \in \Omega : |X_T(\omega) - X(\omega)| > \varepsilon\}) = 0. $$

Notation: $X_T \xrightarrow{P} X$.
Convergence in distribution: The sequence of random variables $\{X_T\}_{T=1}^{\infty}$ converges in distribution towards the random variable $X$ if for all bounded continuous functions $g$ it holds that

$$ E\, g(X_T) \to E\, g(X). $$

Notation: $X_T \xrightarrow{D} X$.

Central limit theorem: Let $X_1, X_2, \ldots$ be a sequence of i.i.d. variables such that $E X_1^2 < \infty$. Define $E X_1 = \mu$ and $\mathrm{var}(X_1) = \sigma^2$. Then

$$ \sqrt{T}\,(\bar{X}_T - \mu) \xrightarrow{D} N(0, \sigma^2) \quad \text{for } T \to \infty, $$

where $\bar{X}_T = \frac{1}{T} \sum_{t=1}^{T} X_t$.

Law of large numbers: Let $X_1, X_2, \ldots$ be a sequence of i.i.d. variables such that $E|X_t| < \infty$. Then

$$ \frac{1}{T} \sum_{t=1}^{T} X_t \to E X_1 \quad \text{a.s. for } T \to \infty. $$

Continuous mapping theorem (random vectors): Let $X_1, X_2, \ldots$ be a sequence of random $(n \times 1)$ vectors with $X_T \xrightarrow{D} X$ and let $g : \mathbb{R}^n \to \mathbb{R}^m$ be a continuous function. Then

$$ g(X_T) \xrightarrow{D} g(X). $$

A similar result holds for sequences of random functions.

Continuous mapping theorem (random functions): Let $\{S_T(\cdot)\}_{T=1}^{\infty}$ and $S(\cdot)$ be random functions such that $S_T(\cdot) \xrightarrow{D} S(\cdot)$, and let $g$ be a continuous functional. Then

$$ g(S_T(\cdot)) \xrightarrow{D} g(S(\cdot)). $$

Definition Brownian motion: Standard Brownian motion $W(\cdot)$ is a continuous-time stochastic process, associating each time point $t \in [0, 1]$ with the scalar $W(t)$ such that:

(i) $W(0) = 0$,
(ii) for any time points $0 \le t_1 \le t_2 \le \ldots \le t_k \le 1$, the increments $[W(t_2) - W(t_1)], [W(t_3) - W(t_2)], \ldots, [W(t_k) - W(t_{k-1})]$ are independent multivariate Gaussian with $[W(s) - W(t)] \sim N(0, s - t)$ for $s > t$,
(iii) $W(t)$ is continuous in $t$ with probability 1.

Although $W(t)$ is continuous in $t$, it cannot be differentiated using standard calculus: the direction of change at $t$ is likely to be completely different from that at $t + \delta$, no matter how small we make $\delta$.

Next we derive what is known as the functional central limit theorem. Let $u_t$ be i.i.d. variables with mean zero and finite variance $\sigma^2$.
Given a sample size $T$, we can construct a variable $X_T(r)$ from the sample mean of the first $r$th fraction of observations, $r \in [0, 1]$, defined by

$$ X_T(r) = \frac{1}{T} \sum_{t=1}^{\lfloor Tr \rfloor} u_t, $$

where $\lfloor Tr \rfloor$ denotes the largest integer that is less than or equal to $T$ times $r$. For any given realization, $X_T(r)$ is a step function in $r$, with

$$
X_T(r) =
\begin{cases}
0 & \text{for } 0 \le r < 1/T, \\
u_1/T & \text{for } 1/T \le r < 2/T, \\
(u_1 + u_2)/T & \text{for } 2/T \le r < 3/T, \\
\quad\vdots \\
(u_1 + \cdots + u_T)/T & \text{for } r = 1.
\end{cases}
$$

Then

$$ \sqrt{T}\, X_T(r) = \frac{1}{\sqrt{T}} \sum_{t=1}^{\lfloor Tr \rfloor} u_t = \frac{\sqrt{\lfloor Tr \rfloor}}{\sqrt{T}} \cdot \frac{1}{\sqrt{\lfloor Tr \rfloor}} \sum_{t=1}^{\lfloor Tr \rfloor} u_t. $$

By the central limit theorem

$$ \frac{1}{\sqrt{\lfloor Tr \rfloor}} \sum_{t=1}^{\lfloor Tr \rfloor} u_t \xrightarrow{D} N(0, \sigma^2), $$

while $\sqrt{\lfloor Tr \rfloor}/\sqrt{T} \to \sqrt{r}$. Hence the asymptotic distribution of $\sqrt{T}\, X_T(r)$ is that of $\sqrt{r}$ times a $N(0, \sigma^2)$ random variable, or

$$ \sqrt{T}\,[X_T(r)/\sigma] \xrightarrow{D} N(0, r). $$

Consider the behavior of a sample mean based on observations $\lfloor Tr_1 \rfloor$ through $\lfloor Tr_2 \rfloor$ for $r_2 > r_1$. We can conclude that this too is asymptotically normal:

$$ \sqrt{T}\,[X_T(r_2) - X_T(r_1)]/\sigma \xrightarrow{D} N(0, r_2 - r_1). $$

More generally, the sequence of stochastic functions $\{\sqrt{T}\, X_T(\cdot)/\sigma\}_{T=1}^{\infty}$ has an asymptotic probability law that is described by standard Brownian motion $W(\cdot)$:

$$ \sqrt{T}\, X_T(\cdot)/\sigma \xrightarrow{D} W(\cdot). \tag{5.1} $$

There is a difference between the expressions $X_T(\cdot)$ and $X_T(r)$: the first denotes a random function, while the second denotes the value that function assumes at time $r$, which is a random variable. Result (5.1) is known as the functional central limit theorem. The derivation here assumed that $u_t$ is i.i.d.

Proposition 1: Suppose that $z_t$ follows a random walk without drift,

$$ z_t = z_{t-1} + u_t, $$

where $z_0 = 0$ and $u_t$ is i.i.d. with mean zero and finite variance $\sigma^2$. Then

(i) $T^{-1/2} \sum_{t=1}^{T} u_t \xrightarrow{D} \sigma W(1)$,
(ii) $T^{-3/2} \sum_{t=1}^{T} z_{t-1} \xrightarrow{D} \sigma \int_0^1 W(r)\,dr$,
(iii) $T^{-2} \sum_{t=1}^{T} z_{t-1}^2 \xrightarrow{D} \sigma^2 \int_0^1 W(r)^2\,dr$,
(iv) $T^{-1} \sum_{t=1}^{T} z_{t-1} u_t \xrightarrow{D} \sigma^2 (W(1)^2 - 1)/2$.

Proof of proposition 1: (i) follows from the central limit theorem.
$W(1)$ denotes a random variable with a $N(0, 1)$ distribution, so $\sigma W(1)$ denotes a random variable with a $N(0, \sigma^2)$ distribution.

(ii): Note that $X_T(r)$ can be written as

$$
X_T(r) =
\begin{cases}
0 & \text{for } 0 \le r < 1/T, \\
z_1/T & \text{for } 1/T \le r < 2/T, \\
z_2/T & \text{for } 2/T \le r < 3/T, \\
\quad\vdots \\
z_T/T & \text{for } r = 1.
\end{cases}
$$

The area under this step function is the sum of $T$ rectangles, each with width $1/T$:

$$ \int_0^1 X_T(r)\,dr = z_1/T^2 + \cdots + z_{T-1}/T^2. $$

Multiplying both sides by $\sqrt{T}$:

$$ \int_0^1 \sqrt{T}\, X_T(r)\,dr = T^{-3/2} \sum_{t=1}^{T} z_{t-1}. $$

Statement (ii) follows by the functional central limit theorem and the continuous mapping theorem.

(iii): Define $S_T(r)$ as

$$ S_T(r) = T\,[X_T(r)]^2. $$

This can be written as

$$
S_T(r) =
\begin{cases}
0 & \text{for } 0 \le r < 1/T, \\
z_1^2/T & \text{for } 1/T \le r < 2/T, \\
z_2^2/T & \text{for } 2/T \le r < 3/T, \\
\quad\vdots \\
z_T^2/T & \text{for } r = 1.
\end{cases}
$$

It follows that

$$ \int_0^1 S_T(r)\,dr = z_1^2/T^2 + \cdots + z_{T-1}^2/T^2. $$

By the continuous mapping theorem:

$$ S_T(\cdot) = \left[\sqrt{T}\, X_T(\cdot)\right]^2 \xrightarrow{D} \sigma^2\,[W(\cdot)]^2. $$

Applying this theorem again:

$$ \int_0^1 S_T(r)\,dr \xrightarrow{D} \sigma^2 \int_0^1 W(r)^2\,dr, $$

which gives statement (iii).

(iv): Note that for a random walk

$$ z_t^2 = (z_{t-1} + u_t)^2 = z_{t-1}^2 + 2 z_{t-1} u_t + u_t^2. $$

Summing over $t = 1, 2, \ldots, T$ results in

$$ \sum_{t=1}^{T} z_{t-1} u_t = \tfrac{1}{2}(z_T^2 - z_0^2) - \tfrac{1}{2} \sum_{t=1}^{T} u_t^2. $$

Recalling that $z_0 = 0$ and dividing by $T$ gives

$$ T^{-1} \sum_{t=1}^{T} z_{t-1} u_t = \frac{z_T^2}{2T} - \frac{1}{2T} \sum_{t=1}^{T} u_t^2 = \frac{S_T(1)}{2} - \frac{1}{2T} \sum_{t=1}^{T} u_t^2. $$

But $S_T(1) \xrightarrow{D} \sigma^2 W(1)^2$ and by the law of large numbers $T^{-1} \sum_{t=1}^{T} u_t^2 \xrightarrow{P} \sigma^2$, which proves (iv).

Now we are ready to derive some asymptotic properties of OLS estimators of AR(1) processes when there is a unit root.

5.2 Dickey-Fuller case 1 test

Consider an AR(1) process

$$ z_t = \rho z_{t-1} + u_t, \quad \text{for } t = 1, \ldots, T, \tag{5.2} $$

with $\rho \ge 0$ and where $u_t$ is i.i.d. with mean zero and finite variance $\sigma^2$. The OLS estimate of $\rho$ is given by

$$ \hat{\rho} = \frac{\sum_{t=1}^{T} z_{t-1} z_t}{\sum_{t=1}^{T} z_{t-1}^2}. $$

The $t$ statistic $S$, used for testing the null hypothesis that $\rho$ is equal to some particular value $\rho_0$, is given by

$$ S = \frac{\hat{\rho} - \rho_0}{\hat{\sigma}_{\hat{\rho}}}, $$

where $\hat{\sigma}_{\hat{\rho}}$ is the standard error of the OLS estimate of $\rho$:

$$ \hat{\sigma}_{\hat{\rho}} = \left( r_T^2 \Big/ \sum_{t=1}^{T} z_{t-1}^2 \right)^{1/2} \quad \text{with} \quad r_T^2 = \frac{1}{T-1} \sum_{t=1}^{T} (z_t - \hat{\rho} z_{t-1})^2. $$

When (5.2) is stationary, i.e. $\rho < 1$, $S$ has a limiting Gaussian distribution:

$$ S \xrightarrow{D} N(0, 1). $$

But the Dickey-Fuller test has the null hypothesis $\rho = 1$, so we want to know the limiting distribution of $S$ when $\rho = 1$. Then we can write $S$ as

$$ S = \frac{\hat{\rho} - 1}{\hat{\sigma}_{\hat{\rho}}} = \frac{\hat{\rho} - 1}{\left( r_T^2 \big/ \sum_{t=1}^{T} z_{t-1}^2 \right)^{1/2}}. \tag{5.3} $$

The numerator of (5.3) can be written as

$$ \hat{\rho} - 1 = \frac{\sum_{t=1}^{T} z_{t-1} u_t}{\sum_{t=1}^{T} z_{t-1}^2}. \tag{5.4} $$

Substituting this in (5.3):

$$ S = \frac{\sum_{t=1}^{T} z_{t-1} u_t}{\left( \sum_{t=1}^{T} z_{t-1}^2 \right)^{1/2} (r_T^2)^{1/2}} = \frac{T^{-1} \sum_{t=1}^{T} z_{t-1} u_t}{\left( T^{-2} \sum_{t=1}^{T} z_{t-1}^2 \right)^{1/2} (r_T^2)^{1/2}}. $$

Apart from the initial term $z_0$, which does not affect the asymptotic distributions (it can, however, affect the finite-sample distributions, as we will see later on), the variable $z_t$ is the same as in proposition 1. So it follows from proposition 1 (iii) and (iv), together with $r_T^2 \xrightarrow{P} \sigma^2$, that as $T \to \infty$:

$$ S \xrightarrow{D} \frac{\sigma^2 (W(1)^2 - 1)/2}{\left( \sigma^2 \int_0^1 W(r)^2\,dr \right)^{1/2} (\sigma^2)^{1/2}} = \frac{\tfrac{1}{2}(W(1)^2 - 1)}{\left( \int_0^1 W(r)^2\,dr \right)^{1/2}}. \tag{5.5} $$

In conclusion, when the true model is a random walk without a constant term ($\rho = 1$, $c = 0$) and we only estimate $\rho$ and not a constant, i.e. a regression without intercept, the $t$ statistic $S$ has limiting distribution (5.5). This test statistic is referred to as the Dickey-Fuller case 1 test statistic. Note that $W(1)$ has a $N(0, 1)$ distribution, meaning that $W(1)^2$ has a $\chi^2(1)$ distribution.

We can approximate this asymptotic distribution and the corresponding critical values by simulating many paths $W$ on the interval $[0, 1]$:

• Divide the interval $[0, 1]$ into $n$ equal pieces.
• Take $u_1, u_2, \ldots, u_n$ i.i.d. from a $N(0, 1/n)$ distribution.
• Set $W(0) = 0$.
• Build the path $W$ by $W(\tfrac{i}{n}) = W(\tfrac{i-1}{n}) + u_i$ for $i = 1, 2, \ldots, n$.
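The path construction above, combined with a Riemann-sum approximation of the integral in (5.5), can be sketched as follows. This is only an illustrative sketch, not the code used for the report: the function name `simulate_df_case1_asymptotic`, the seed and the choice of a left Riemann sum are assumptions made here.

```python
import numpy as np


def simulate_df_case1_asymptotic(n_paths=5000, n_steps=500, seed=0):
    """Draw approximate samples from the limiting distribution (5.5).

    Hypothetical helper: the name, defaults and seed are illustrative only.
    """
    rng = np.random.default_rng(seed)
    # u_1, ..., u_n i.i.d. N(0, 1/n); each row holds the increments of one path.
    u = rng.normal(0.0, np.sqrt(1.0 / n_steps), size=(n_paths, n_steps))
    # W(i/n) = W((i-1)/n) + u_i with W(0) = 0.
    w = np.cumsum(u, axis=1)
    # Values W(0), W(1/n), ..., W((n-1)/n) used by the left Riemann sum.
    w_left = np.hstack([np.zeros((n_paths, 1)), w[:, :-1]])
    w_one = w[:, -1]  # W(1)
    # Left Riemann sum approximating the integral of W(r)^2 over [0, 1].
    int_w2 = np.sum(w_left**2, axis=1) / n_steps
    return 0.5 * (w_one**2 - 1.0) / np.sqrt(int_w2)


samples = simulate_df_case1_asymptotic()
# Empirical 1%, 5% and 10% quantiles serve as approximate critical values.
critical_values = np.quantile(samples, [0.01, 0.05, 0.10])
print(critical_values)
```

The quantiles printed at the end play the role of the simulated critical values; they vary slightly with the seed and with the number of paths.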
For each path the fraction on the right-hand side of (5.5) can be calculated, approximating the integrals with Riemann sums. Then the density of $S$ can be estimated by applying a Gaussian kernel estimator to all these values. Figure 5.1 shows the estimated density for 5,000 paths and $n = 500$.

Figure 5.1: Asymptotic density of DF case 1 test statistic.

We can approximate the 1%, 5% and 10% critical values by calculating the corresponding quantiles of all the calculated fractions. Table 5.1 shows the critical values according to this simulation and the values according to Hamilton [6].

Table 5.1: Critical values for DF case 1.

              1%      5%      10%
Hamilton     -2.58   -1.95   -1.62
simulation   -2.56   -1.95   -1.60

These critical values belong to the asymptotic distribution (5.5), which describes the distribution of the DF case 1 test statistic as the sample size $T$ goes to infinity.

We approximate the critical values for finite sample sizes $T$ by simulating in a different way:

• Take $u_1, \ldots, u_T$ from a $N(0, \sigma^2)$ distribution.
• Set $z_0 = 0$.
• Build the path $z_t$: $z_t = z_{t-1} + u_t$, $t = 1, \ldots, T$.
• Calculate $\hat{\rho}$.
• Calculate $\hat{\sigma}_{\hat{\rho}}$.
• Calculate the test statistic $(\hat{\rho} - 1)/\hat{\sigma}_{\hat{\rho}}$.
• Repeat the preceding steps 5,000 times.

We can approximate the 1%, 5% and 10% critical values by calculating the corresponding quantiles of the simulated test statistics.
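A minimal sketch of the finite-sample steps listed above is given below. The helper names `df_case1_statistic` and `simulate_df_case1`, the seed and the default arguments are assumptions introduced for illustration; the formulas for $\hat{\rho}$, $r_T^2$ and $\hat{\sigma}_{\hat{\rho}}$ are the ones from section 5.2.

```python
import numpy as np


def df_case1_statistic(z):
    """t statistic (rho_hat - 1) / se for the regression z_t = rho * z_{t-1} + u_t.

    `z` holds z_0, z_1, ..., z_T. Hypothetical helper name.
    """
    z_lag, z_cur = z[:-1], z[1:]
    T = z_cur.size
    rho_hat = np.sum(z_lag * z_cur) / np.sum(z_lag**2)
    resid = z_cur - rho_hat * z_lag
    r2 = np.sum(resid**2) / (T - 1)          # r_T^2 from section 5.2
    se = np.sqrt(r2 / np.sum(z_lag**2))      # standard error of rho_hat
    return (rho_hat - 1.0) / se


def simulate_df_case1(T=500, sigma2=1.0, z0=0.0, n_rep=5000, seed=0):
    """Simulate the DF case 1 statistic under a driftless Gaussian random walk."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_rep)
    for k in range(n_rep):
        u = rng.normal(0.0, np.sqrt(sigma2), size=T)
        z = np.concatenate([[z0], z0 + np.cumsum(u)])  # z_t = z_{t-1} + u_t
        stats[k] = df_case1_statistic(z)
    return stats


stats = simulate_df_case1(T=100)
print(np.quantile(stats, [0.01, 0.05, 0.10]))
```

Passing a non-zero `z0` gives a way to explore how the starting value changes the finite-sample distribution of the no-intercept statistic.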
For finite $T$, the critical values are exact only under the assumption of Gaussian innovations. As $T$ becomes large, these values also describe the asymptotic distribution for non-Gaussian innovations.

Table 5.2 shows the critical values according to this simulation for different values of $T$ and $\sigma^2$. Table 5.3 shows the critical values according to Hamilton [6]. The critical values should be independent of $\sigma$. Table 5.2 shows roughly the same values for different $\sigma^2$, but as $\sigma^2$ becomes large there is more dispersion. Figure 5.2 shows the estimated density of the simulated test statistics for different values of $\sigma^2$ and $T = 500$; the graph of figure 5.1 is also displayed.

Table 5.2: Simulated critical values for DF case 1.

         σ² = 1                   σ² = 5                   σ² = 10
T        1%      5%      10%      1%      5%      10%      1%      5%      10%
100     -2.61   -1.98   -1.63    -2.62   -1.98   -1.63    -2.55   -1.94   -1.61
250     -2.61   -1.96   -1.62    -2.56   -1.94   -1.59    -2.54   -1.93   -1.59
500     -2.56   -1.95   -1.61    -2.59   -1.95   -1.62    -2.59   -2.00   -1.63

Table 5.3: Hamilton's critical values DF case 1.

T        1%      5%      10%
100     -2.60   -1.95   -1.61
250     -2.58   -1.95   -1.62
500     -2.58   -1.95   -1.62
Figure 5.2: Estimated density of DF case 1 for different $\sigma^2$ and $T = 500$.

The initial term $z_0$ does not affect the asymptotic distribution. Unfortunately it does affect the distribution when the sample size is finite. With Dickey-Fuller case 1, we basically fit a line that goes through the origin. If the initial term is large, the slope of this line, $\hat{\rho}$, is closer to one than when the initial term is small. The standard error of $\hat{\rho}$, $\hat{\sigma}_{\hat{\rho}}$, is a lot smaller for a large initial value than for a small initial value. That is why the test statistic for a large initial value is likely to be larger than the test statistic for a small initial value. The estimated densities for initial values $z_0 = 0, 1, 10, 50, 100, 500$ are shown in figure 5.3. The solid lines correspond to $z_0 = 0, 1, 10$, the dashed lines correspond to $z_0 = 50, 100, 500$. We see a shift to the right as the initial value increases. The density found by simulating Brownian motion is not displayed; it lies among the three solid lines.
Figure 5.3: Estimated density of DF case 1 for different $z_0$ and $T = 500$.

5.3 Dickey-Fuller case 2 test

In this section we consider the AR(1) process with a constant:

$$ z_t = c + \rho z_{t-1} + u_t, \quad \text{for } t = 1, \ldots, T, $$

where $u_t$ is i.i.d. with mean zero and finite variance $\sigma^2$. We are interested in the properties of the test statistic $S = (\hat{\rho} - 1)/\hat{\sigma}_{\hat{\rho}}$ under the null hypothesis that $c = 0$ and $\rho = 1$. The OLS estimates are given by

$$ \begin{bmatrix} \hat{c} \\ \hat{\rho} \end{bmatrix} = \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum z_t \\ \sum z_{t-1} z_t \end{bmatrix}, $$

where $\sum$ denotes summation over $t = 1, \ldots, T$. The deviation of the estimates from the true values is

$$ \begin{bmatrix} \hat{c} \\ \hat{\rho} - 1 \end{bmatrix} = \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum u_t \\ \sum z_{t-1} u_t \end{bmatrix}. \tag{5.6} $$

The estimates $\hat{c}$ and $\hat{\rho}$ have different rates of convergence, so a scaling matrix $Y$ is helpful in describing their limiting distributions. Note that if

$$ v = A^{-1} w, $$

then

$$ Yv = YA^{-1}w = YA^{-1}YY^{-1}w = (Y^{-1}AY^{-1})^{-1}\, Y^{-1} w. \tag{5.7} $$

Here we use the scaling matrix

$$ Y = \begin{bmatrix} T^{1/2} & 0 \\ 0 & T \end{bmatrix}. $$
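With (5.7), equation (5.6) results in

$$ \begin{bmatrix} T^{1/2}\hat{c} \\ T(\hat{\rho} - 1) \end{bmatrix} = \begin{bmatrix} 1 & T^{-3/2} \sum z_{t-1} \\ T^{-3/2} \sum z_{t-1} & T^{-2} \sum z_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} T^{-1/2} \sum u_t \\ T^{-1} \sum z_{t-1} u_t \end{bmatrix}. \tag{5.8} $$

From proposition 1 in section 5.1 it follows that the matrix in the first term on the right-hand side of (5.8) converges to

$$ \begin{bmatrix} 1 & T^{-3/2} \sum z_{t-1} \\ T^{-3/2} \sum z_{t-1} & T^{-2} \sum z_{t-1}^2 \end{bmatrix} \xrightarrow{D} \begin{bmatrix} 1 & \sigma \int W(r)\,dr \\ \sigma \int W(r)\,dr & \sigma^2 \int W(r)^2\,dr \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix} \begin{bmatrix} 1 & \int W(r)\,dr \\ \int W(r)\,dr & \int W(r)^2\,dr \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix}, \tag{5.9} $$

where the integral sign denotes integration over $r$ from 0 to 1. The second term on the right-hand side of (5.8) converges to

$$ \begin{bmatrix} T^{-1/2} \sum u_t \\ T^{-1} \sum z_{t-1} u_t \end{bmatrix} \xrightarrow{D} \begin{bmatrix} \sigma W(1) \\ \sigma^2 (W(1)^2 - 1)/2 \end{bmatrix} = \sigma \begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix} \begin{bmatrix} W(1) \\ (W(1)^2 - 1)/2 \end{bmatrix}. \tag{5.10} $$

Substituting (5.9) and (5.10) into (5.8) establishes

$$ \begin{bmatrix} T^{1/2}\hat{c} \\ T(\hat{\rho} - 1) \end{bmatrix} \xrightarrow{D} \begin{bmatrix} \sigma & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \int W(r)\,dr \\ \int W(r)\,dr & \int W(r)^2\,dr \end{bmatrix}^{-1} \begin{bmatrix} W(1) \\ (W(1)^2 - 1)/2 \end{bmatrix}. \tag{5.11} $$

The second element of the vector in (5.11) states that

$$ T(\hat{\rho} - 1) \xrightarrow{D} \frac{\tfrac{1}{2}(W(1)^2 - 1) - W(1) \int W(r)\,dr}{\int W(r)^2\,dr - \left[ \int W(r)\,dr \right]^2}. \tag{5.12} $$

We want to know the properties of the $t$ statistic $S$:

$$ S = \frac{\hat{\rho} - 1}{\hat{\sigma}_{\hat{\rho}}}, $$

where

$$ \hat{\sigma}_{\hat{\rho}}^2 = r_T^2 \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \qquad r_T^2 = \frac{1}{T-2} \sum_{t=1}^{T} (z_t - \hat{c} - \hat{\rho} z_{t-1})^2. \tag{5.13} $$

If we multiply both sides of (5.13) by $T^2$, the result can be written as

$$ T^2 \hat{\sigma}_{\hat{\rho}}^2 = r_T^2 \begin{bmatrix} 0 & 1 \end{bmatrix} Y \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} Y \begin{bmatrix} 0 \\ 1 \end{bmatrix}. $$

From (5.8) and (5.9) it follows that

$$ Y \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} Y \xrightarrow{D} \begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix}^{-1} \begin{bmatrix} 1 & \int W(r)\,dr \\ \int W(r)\,dr & \int W(r)^2\,dr \end{bmatrix}^{-1} \begin{bmatrix} 1 & 0 \\ 0 & \sigma \end{bmatrix}^{-1}. \tag{5.14} $$

From equation (5.14) and $r_T^2 \xrightarrow{P} \sigma^2$ it follows that

$$ T^2 \hat{\sigma}_{\hat{\rho}}^2 \xrightarrow{D} \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \int W(r)\,dr \\ \int W(r)\,dr & \int W(r)^2\,dr \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \frac{1}{\int W(r)^2\,dr - \left[ \int W(r)\,dr \right]^2}. $$

Finally, the asymptotic distribution of the test statistic $S$ is

$$ S = \frac{T(\hat{\rho} - 1)}{[T^2 \hat{\sigma}_{\hat{\rho}}^2]^{1/2}} \xrightarrow{D} \frac{(W(1)^2 - 1)/2 - W(1) \int W(r)\,dr}{\left( \int W(r)^2\,dr - \left[ \int W(r)\,dr \right]^2 \right)^{1/2}}. \tag{5.15} $$

In conclusion, when the true model is a random walk without a constant term ($\rho = 1$, $c = 0$) but we do estimate a constant $c$ as well as $\rho$, the $t$ test statistic $S$ has the asymptotic distribution described by (5.15).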
This test statistic is referred to as the Dickey-Fuller case 2 test statistic. We can find this asymptotic distribution and the corresponding critical values by simulating many paths $W$ in the same way as in the preceding section. The results are shown in figure 5.4 and table 5.4. Figure 5.4 shows that the distribution of the DF case 2 statistic is shifted further to the left than that of the DF case 1 statistic.

Figure 5.4: Asymptotic density of DF case 2 test statistic.

Table 5.4: Critical values for DF case 2.

              1%      5%      10%
Hamilton     -3.43   -2.86   -2.57
simulation   -3.43   -2.85   -2.59

These critical values belong to the asymptotic distribution (5.15), which describes the distribution of the DF case 2 test statistic as the sample size $T$ goes to infinity.

We find the critical values for finite sample sizes $T$ by simulating paths $z_t$ as in the preceding section. The only difference is the way we calculate $\hat{\rho}$ and its standard error, because a constant is now estimated as well. Again we simulate for different values of $\sigma^2$. The results are shown in table 5.5; table 5.6 shows the critical values for DF case 2 according to Hamilton [6]. Figure 5.5 shows the estimated density of the simulated test statistics for different values of $\sigma^2$ and $T = 500$.
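The finite-sample simulation for case 2 can be sketched as below. The names `df_case2_statistic` and `simulate_df_case2`, the seed and the default arguments are assumptions made for illustration; the regression includes an intercept and the standard error uses $r_T^2$ with divisor $T - 2$, as in (5.13).

```python
import numpy as np


def df_case2_statistic(z):
    """t statistic (rho_hat - 1) / se for z_t = c + rho * z_{t-1} + u_t.

    `z` holds z_0, z_1, ..., z_T. Hypothetical helper name.
    """
    z_lag, z_cur = z[:-1], z[1:]
    T = z_cur.size
    X = np.column_stack([np.ones(T), z_lag])       # regressors: constant, z_{t-1}
    beta, *_ = np.linalg.lstsq(X, z_cur, rcond=None)
    resid = z_cur - X @ beta
    r2 = resid @ resid / (T - 2)                   # r_T^2 as in (5.13)
    var_rho = r2 * np.linalg.inv(X.T @ X)[1, 1]    # squared standard error of rho_hat
    return (beta[1] - 1.0) / np.sqrt(var_rho)


def simulate_df_case2(T=500, sigma2=1.0, z0=0.0, n_rep=5000, seed=0):
    """Simulate the DF case 2 statistic when the true model is a driftless random walk."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_rep)
    for k in range(n_rep):
        u = rng.normal(0.0, np.sqrt(sigma2), size=T)
        z = np.concatenate([[z0], z0 + np.cumsum(u)])
        stats[k] = df_case2_statistic(z)
    return stats


stats = simulate_df_case2(T=100, n_rep=2000)
print(np.quantile(stats, [0.01, 0.05, 0.10]))
```

Because the regression contains an intercept, adding a constant to the whole path leaves $\hat{\rho}$ and its standard error unchanged, so calling `simulate_df_case2` with the same seed and different `z0` returns the same statistics up to rounding error.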
Table 5.5: Simulated critical values for DF case 2.

         σ² = 1                   σ² = 5                   σ² = 10
T        1%      5%      10%      1%      5%      10%      1%      5%      10%
100     -3.54   -2.85   -2.54    -3.50   -2.88   -2.56    -3.54   -2.89   -2.58
250     -3.40   -2.84   -2.53    -3.46   -2.87   -2.56    -3.46   -2.86   -2.56
500     -3.42   -2.86   -2.58    -3.40   -2.86   -2.55    -3.49   -2.87   -2.58

Table 5.6: Hamilton's critical values DF case 2.

T        1%      5%      10%
100     -3.51   -2.89   -2.58
250     -3.46   -2.88   -2.57
500     -3.44   -2.87   -2.57

Figure 5.5: Estimated density of DF case 2 for different $\sigma^2$ and $T = 500$.

As in the preceding section, the initial term $z_0$ does not affect the asymptotic distribution. Fortunately, with case 2 it does not affect the distribution when the sample size is finite either. With case 2 we estimate a constant even though it is not present in the true model, so we basically fit a line which does not have to go through the origin.
Then the slope of the line, $\hat{\rho}$, is not pulled closer to one when the initial value is large, as it is in case 1. That is why the test statistic for a large initial value is likely to be the same as the test statistic for a small initial value. The estimated densities for initial values $z_0 = 0, 1, 10, 50, 100, 500$ are shown in figure 5.6.

Figure 5.6: Estimated density of DF case 2 for different $z_0$ and $T = 500$.

5.4 Dickey-Fuller case 3 test

In this section we consider again the AR(1) process with a constant

$$ z_t = c + \rho z_{t-1} + u_t, \quad \text{for } t = 1, \ldots, T, \tag{5.16} $$

where $u_t$ is i.i.d. with mean zero and finite variance $\sigma^2$. We are interested in the properties of the test statistic $S = (\hat{\rho} - 1)/\hat{\sigma}_{\hat{\rho}}$ under the null hypothesis that $\rho = 1$ and $c \neq 0$.
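The deviation of the estimates from the true values is

$$ \begin{bmatrix} \hat{c} - c \\ \hat{\rho} - 1 \end{bmatrix} = \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum u_t \\ \sum z_{t-1} u_t \end{bmatrix}, \tag{5.17} $$

where $\sum$ denotes summation over $t = 1, \ldots, T$. We examine the four different sums on the right-hand side of (5.17) separately. First notice that, under the null hypothesis $\rho = 1$, (5.16) can be written as

$$ z_t = z_0 + ct + (u_1 + u_2 + \ldots + u_t) = z_0 + ct + v_t, $$

where

$$ v_t = u_1 + \ldots + u_t, \quad \text{for } t = 1, \ldots, T, \text{ with } v_0 = 0. $$

Consider the behavior of the sum

$$ \sum_{t=1}^{T} z_{t-1} = \sum_{t=1}^{T} [z_0 + c(t-1) + v_{t-1}]. \tag{5.18} $$

The first term in (5.18) is $T z_0$, so divided by $T$ it is a fixed value. The second term is equal to

$$ \sum_{t=1}^{T} c(t-1) = (T-1)Tc/2. $$

In order to converge, this term has to be divided by $T^2$:

$$ \frac{1}{T^2} \sum_{t=1}^{T} c(t-1) \to c/2. $$

The third term in (5.18) converges when divided by $T^{3/2}$, according to proposition 1 (ii):

$$ T^{-3/2} \sum_{t=1}^{T} v_{t-1} \xrightarrow{D} \sigma \int_0^1 W(r)\,dr. $$

The order in probability of the three terms in (5.18) is:

$$ \sum_{t=1}^{T} z_{t-1} = \underbrace{\sum_{t=1}^{T} z_0}_{O(T)} + \underbrace{\sum_{t=1}^{T} c(t-1)}_{O(T^2)} + \underbrace{\sum_{t=1}^{T} v_{t-1}}_{O_p(T^{3/2})}. $$

The time trend $c(t-1)$ asymptotically dominates the other two components:

$$ \frac{1}{T^2} \sum_{t=1}^{T} z_{t-1} \xrightarrow{P} c/2. $$

In the same way, we have

$$
\begin{aligned}
\sum_{t=1}^{T} z_{t-1}^2 &= \sum_{t=1}^{T} [z_0 + c(t-1) + v_{t-1}]^2 \\
&= \underbrace{\sum_{t=1}^{T} z_0^2}_{O(T)} + \underbrace{\sum_{t=1}^{T} c^2 (t-1)^2}_{O(T^3)} + \underbrace{\sum_{t=1}^{T} v_{t-1}^2}_{O_p(T^2)} \\
&\quad + \underbrace{\sum_{t=1}^{T} 2 z_0 c(t-1)}_{O(T^2)} + \underbrace{\sum_{t=1}^{T} 2 z_0 v_{t-1}}_{O_p(T^{3/2})} + \underbrace{\sum_{t=1}^{T} 2c(t-1) v_{t-1}}_{O_p(T^{5/2})},
\end{aligned}
$$

where the order of the last term follows from

$$ T^{-5/2} \sum_{t=1}^{T} t\, v_{t-1} = T^{-3/2} \sum_{t=1}^{T} (t/T)\, v_{t-1} \xrightarrow{D} \sigma \int_0^1 r W(r)\,dr. $$

The time trend $c^2 (t-1)^2$ is the only term that does not vanish asymptotically if we divide by $T^3$:

$$ \frac{1}{T^3} \sum_{t=1}^{T} z_{t-1}^2 \xrightarrow{P} c^2/3. $$

From the central limit theorem it follows that $\sum_{t=1}^{T} u_t$ is of order $O_p(T^{1/2})$. And finally

$$ \sum_{t=1}^{T} z_{t-1} u_t = \sum_{t=1}^{T} [z_0 + c(t-1) + v_{t-1}]\, u_t = \underbrace{z_0 \sum_{t=1}^{T} u_t}_{O_p(T^{1/2})} + \underbrace{\sum_{t=1}^{T} c(t-1) u_t}_{O_p(T^{3/2})} + \underbrace{\sum_{t=1}^{T} v_{t-1} u_t}_{O_p(T)}, $$

from which

$$ T^{-3/2} \sum_{t=1}^{T} z_{t-1} u_t = T^{-3/2} \sum_{t=1}^{T} c(t-1) u_t + o_p(1). $$

As a result, the deviations of the OLS estimates from their true values satisfy

$$ \begin{bmatrix} \hat{c} - c \\ \hat{\rho} - 1 \end{bmatrix} = \begin{bmatrix} O_p(T) & O_p(T^2) \\ O_p(T^2) & O_p(T^3) \end{bmatrix}^{-1} \begin{bmatrix} O_p(T^{1/2}) \\ O_p(T^{3/2}) \end{bmatrix}. $$

In this case the scaling matrix is

$$ Y = \begin{bmatrix} T^{1/2} & 0 \\ 0 & T^{3/2} \end{bmatrix}. $$

Using (5.7) we get

$$ \begin{bmatrix} T^{1/2}(\hat{c} - c) \\ T^{3/2}(\hat{\rho} - 1) \end{bmatrix} = \begin{bmatrix} 1 & T^{-2} \sum z_{t-1} \\ T^{-2} \sum z_{t-1} & T^{-3} \sum z_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} T^{-1/2} \sum u_t \\ T^{-3/2} \sum z_{t-1} u_t \end{bmatrix}, \tag{5.19} $$

where the matrix in the first term on the right-hand side converges to

$$ \begin{bmatrix} 1 & T^{-2} \sum z_{t-1} \\ T^{-2} \sum z_{t-1} & T^{-3} \sum z_{t-1}^2 \end{bmatrix} \xrightarrow{P} \begin{bmatrix} 1 & c/2 \\ c/2 & c^2/3 \end{bmatrix} = A. $$

The second term of (5.19) satisfies

$$ \begin{bmatrix} T^{-1/2} \sum u_t \\ T^{-3/2} \sum z_{t-1} u_t \end{bmatrix} = \begin{bmatrix} T^{-1/2} \sum u_t \\ T^{-3/2} \sum c(t-1) u_t \end{bmatrix} + o_p(1). \tag{5.20} $$

Therefore

$$ \begin{bmatrix} T^{-1/2} \sum u_t \\ T^{-3/2} \sum z_{t-1} u_t \end{bmatrix} \xrightarrow{D} N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \sigma^2 \begin{bmatrix} 1 & c/2 \\ c/2 & c^2/3 \end{bmatrix} \right) = N(0, \sigma^2 A). \tag{5.21} $$

It follows from (5.19)-(5.21) that

$$ \begin{bmatrix} T^{1/2}(\hat{c} - c) \\ T^{3/2}(\hat{\rho} - 1) \end{bmatrix} \xrightarrow{D} N(0, A^{-1} \sigma^2 A A^{-1}) = N(0, \sigma^2 A^{-1}). $$

We want to know the properties of the $t$ statistic $S$:

$$ S = \frac{\hat{\rho} - 1}{\hat{\sigma}_{\hat{\rho}}}, \tag{5.22} $$

where

$$ \hat{\sigma}_{\hat{\rho}}^2 = r_T^2 \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \quad \text{with} \quad r_T^2 = \frac{1}{T-2} \sum_{t=1}^{T} (z_t - \hat{c} - \hat{\rho} z_{t-1})^2. $$

Test statistic (5.22) can be written as

$$ S = \frac{T^{3/2}(\hat{\rho} - 1)}{T^{3/2} \hat{\sigma}_{\hat{\rho}}}. $$

The denominator is

$$
\begin{aligned}
T^{3/2} \hat{\sigma}_{\hat{\rho}} &= \left( r_T^2 \begin{bmatrix} 0 & T^{3/2} \end{bmatrix} \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ T^{3/2} \end{bmatrix} \right)^{1/2} \\
&= \left( r_T^2 \begin{bmatrix} 0 & 1 \end{bmatrix} Y \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} Y \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right)^{1/2}.
\end{aligned}
\tag{5.23}
$$

We have already shown that

$$ Y \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix}^{-1} Y = \left( Y^{-1} \begin{bmatrix} T & \sum z_{t-1} \\ \sum z_{t-1} & \sum z_{t-1}^2 \end{bmatrix} Y^{-1} \right)^{-1} = \begin{bmatrix} 1 & T^{-2} \sum z_{t-1} \\ T^{-2} \sum z_{t-1} & T^{-3} \sum z_{t-1}^2 \end{bmatrix}^{-1}, $$

where the matrix inside the last inverse converges in probability towards $A$, so the inverse converges towards $A^{-1}$. Because $r_T^2 \xrightarrow{P} \sigma^2$ and $\begin{bmatrix} 0 & 1 \end{bmatrix} A^{-1} \begin{bmatrix} 0 & 1 \end{bmatrix}' = 12/c^2$, the denominator converges:

$$ T^{3/2} \hat{\sigma}_{\hat{\rho}} \xrightarrow{P} \sigma \sqrt{12}/|c| = 2\sqrt{3}\,\sigma/|c|. $$

This is exactly the standard deviation of the limiting $N(0, 12\sigma^2/c^2)$ distribution of the numerator $T^{3/2}(\hat{\rho} - 1)$. Thus, the test statistic $S$ is asymptotically Gaussian. The regressor $z_{t-1}$ is asymptotically dominated by the time trend $c(t-1)$. In large samples, it is as if the explanatory variable $z_{t-1}$ were replaced by the time trend $c(t-1)$.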
That is why the asymptotic properties of $\hat{c}$ and $\hat{\rho}$ are the same as those for the deterministic time trend regression. Therefore, for finite $T$ the test statistic $S$ has approximately a $t$ distribution.

In conclusion, when the true model is a random walk with a constant term ($\rho = 1$, $c \neq 0$) and we estimate both $\rho$ and $c$, the $t$ test statistic $S$ has an asymptotic distribution equal to the standard Gaussian distribution:

$$ S \xrightarrow{D} N(0, 1). $$

This test statistic is referred to as the Dickey-Fuller case 3 test statistic. The critical values for $T \to \infty$ are given in table 5.7.

Table 5.7: Critical values for DF case 3.

              1%      5%      10%
N(0, 1)      -2.33   -1.64   -1.28

For finite $T$ the Dickey-Fuller case 3 test statistic is approximately $t$ distributed, but the degrees of freedom are large so it is almost standard normal. We can also find the critical values for finite $T$ by simulating paths $z_t$ as in the preceding sections. Again we simulate for different values of $\sigma^2$. The results are shown in table 5.8. Figure 5.7 shows the estimated density of the simulated test statistics for different values of $\sigma^2$ with $T = 500$ and $c = 2.5$; the standard normal density is also displayed.

Table 5.8: Simulated critical values for DF case 3.

         σ² = 1                   σ² = 5                   σ² = 10
T        1%      5%      10%      1%      5%      10%      1%      5%      10%
100     -2.36   -1.76   -1.37    -2.51   -1.82   -1.46    -2.48   -1.84   -1.49
250     -2.37   -1.68   -1.33    -2.41   -1.74   -1.35    -2.40   -1.79   -1.40
500     -2.38   -1.75   -1.33    -2.41   -1.69   -1.35    -2.45   -1.76   -1.38
Figure 5.7: Estimated density of DF case 3 for different $\sigma^2$ and $T = 500$.

To see whether the value of the constant $c$ has an impact on the finite-sample distribution of the Dickey-Fuller case 3 statistic, we simulate paths $z_t$ for $c = 0.1, 0.5, 1, 2.5, 10$. The results are shown in figure 5.8; the standard normal density is also displayed. The leftmost graph corresponds to $c = 0.1$.
Figure 5.8: Estimated density of DF case 3 for different $c$ and $T = 500$.

It looks like a small value of $c$ causes a shift to the left. In figure 5.9 the estimated distribution for $c = 0.01, 0.05, 0.1$ is plotted with solid lines. The dashed line is the distribution of case 1 and the dotted line is the distribution of case 2. For $c = 0.01$ the estimated density is almost the same as the density found for case 2. This makes sense because the steps taken in the case 2 and case 3 tests are exactly the same; the only difference is that the true model for case 2 has no constant and the true model for case 3 has one. So for a decreasing constant the case 3 test statistic converges to the case 2 statistic.
The other way around is also valid: for an increasing constant, in absolute value, the case 2 test statistic converges to the case 3 statistic because the tests are the same but now there is a constant in the true model.
Figure 5.9: Estimated density of DF case 3 for small $c$ and $T = 500$.

To be consistent, we also simulate for several different initial values $z_0$. Figure 5.10 shows the results for $z_0 = 0, 1, 10, 50, 100, 500$ with $T = 500$ and $c = 2.5$. The figure suggests that the initial value $z_0$ does not affect the density of the test statistic for finite sample sizes.

Figure 5.10: Estimated density of DF case 3 for different $z_0$.

There also exists a case 4 of the Dickey-Fuller test, which includes a deterministic time trend in the true model. We are not interested in spread processes with deterministic trends, so we do not discuss this case.
5.5 Power of the Dickey-Fuller tests

In this section we investigate how 'powerful' the different cases of the Dickey-Fuller test are. We generate paths $z_t$ which do not have a unit root, i.e. $\rho < 1$ in $z_t = c + \rho z_{t-1} + u_t$, and check whether the different tests recognize them as stationary, in other words whether the outcome of the test is to reject the null hypothesis that there is a unit root, $\rho = 1$. For different values of $c$ and $z_0$ we generate 1,000 paths $z_t$, $t = 0, \ldots, T$, where $T = 500$, and count the number of rejections. The results are presented in tables.

First we summarize the previous sections.

Case 1: The true model of case 1 is $z_t = z_{t-1} + u_t$, where $u_t$ is i.i.d. with mean zero and finite variance $\sigma^2$. We estimate the model $z_t = \rho z_{t-1} + u_t$. The critical values for the test statistic $S = (\hat{\rho} - 1)/\hat{\sigma}_{\hat{\rho}}$ when $T = 500$ are

  1%      5%      10%
-2.58   -1.95   -1.62

Case 2: The true model of case 2 is $z_t = z_{t-1} + u_t$, where $u_t$ is i.i.d. with mean zero and finite variance $\sigma^2$. We estimate the model $z_t = c + \rho z_{t-1} + u_t$. The critical values for the test statistic $S = (\hat{\rho} - 1)/\hat{\sigma}_{\hat{\rho}}$ when $T = 500$ are

  1%      5%      10%
-3.44   -2.87   -2.57

Case 3: The true model of case 3 is $z_t = c + z_{t-1} + u_t$, where $u_t$ is i.i.d. with mean zero and finite variance $\sigma^2$, and $c \neq 0$. We estimate the model $z_t = c + \rho z_{t-1} + u_t$. The critical values for the test statistic $S = (\hat{\rho} - 1)/\hat{\sigma}_{\hat{\rho}}$ when $T = 500$ are

  1%      5%      10%
-2.33   -1.64   -1.28

We have seen that the initial value $z_0$ does affect the finite sample distribution of Dickey-Fuller case 1 but does not affect case 2 and case 3. The value of $c$ does affect the distribution of case 3: as $c$ becomes smaller the distribution converges to the distribution of case 2.

IMC has provided 10 pairs; the range of $\hat{c}$ for these 10 pairs is $(-0.01, 0.1)$. The absolute initial value $|z_0|$ is less than 1.5 for 9 of the 10 pairs. For one pair $z_0$ is 106. So we are interested in the power of the three tests for small values of $c$ and $z_0$, but we will also look at large values of $z_0$.
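The counting experiment described above could be organized along the following lines. This is a hedged sketch under stated assumptions: Gaussian errors with $\sigma = 1$, only the 5% level, and helper names (`simulate_ar1`, `t_stats`, `count_rejections`) that are illustrative rather than taken from the report. The t statistics are computed in closed form for all paths at once; cases 2 and 3 share the same statistic and differ only in the critical value.

```python
import numpy as np

# 5% critical values for T = 500, as summarized in the text above.
CRIT_5PCT = {"case 1": -1.95, "case 2": -2.87, "case 3": -1.64}

def simulate_ar1(rho, c, z0, n_paths, T, rng):
    """Paths of z_t = c + rho * z_{t-1} + u_t with standard Gaussian u_t."""
    z = np.empty((n_paths, T + 1))
    z[:, 0] = z0
    u = rng.standard_normal((n_paths, T))
    for t in range(1, T + 1):
        z[:, t] = c + rho * z[:, t - 1] + u[:, t - 1]
    return z

def t_stats(z, with_constant):
    """Row-wise t statistic (rho_hat - 1) / se(rho_hat) for the AR(1) regression."""
    x, y = z[:, :-1], z[:, 1:]
    n_params = 1
    if with_constant:
        # Demeaning both sides is equivalent to including an intercept.
        x = x - x.mean(axis=1, keepdims=True)
        y = y - y.mean(axis=1, keepdims=True)
        n_params = 2
    sxx = (x * x).sum(axis=1)
    rho_hat = (x * y).sum(axis=1) / sxx
    resid = y - rho_hat[:, None] * x
    s2 = (resid ** 2).sum(axis=1) / (x.shape[1] - n_params)
    return (rho_hat - 1.0) / np.sqrt(s2 / sxx)

def count_rejections(rho, c=0.0, z0=0.0, n_paths=1000, T=500, seed=0):
    """Number of paths for which each test rejects rho = 1 at the 5% level."""
    rng = np.random.default_rng(seed)
    z = simulate_ar1(rho, c, z0, n_paths, T, rng)
    s_no_const = t_stats(z, with_constant=False)
    s_const = t_stats(z, with_constant=True)
    return {
        "case 1": int((s_no_const < CRIT_5PCT["case 1"]).sum()),
        "case 2": int((s_const < CRIT_5PCT["case 2"]).sum()),
        "case 3": int((s_const < CRIT_5PCT["case 3"]).sum()),
    }

rejections = count_rejections(rho=0.9, n_paths=200)
```

Calling `count_rejections` over a grid of $\rho$, $c$ and $z_0$ values gives counts comparable in spirit to the 5% columns of the tables in this section, up to simulation noise.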
We start with generating paths with $c = 0$ and $z_0 = 0$. In all following tables $T = 500$, $\sigma = 1$ and the number of generated paths is 1,000. Table 5.9 shows the number of rejections for the different tests and different values of $\rho$. For $\rho = 1$ we have simulated paths under the null hypothesis of case 1 and case 2, and the numbers of rejections are in line with what we expected. The case 3 test does not perform very well: with $\rho = 1$ it rejects the null hypothesis that $\rho = 1$ for 632 out of 1,000 paths at the 10% level. For $\rho$ just under 1, the case 1 test performs better than the case 2 test.

Table 5.9: Number of rejections, c = 0, z0 = 0.

             Case 1                Case 2                Case 3
ρ         1%     5%    10%      1%     5%    10%      1%     5%    10%
0.9      1000   1000   1000    1000   1000   1000    1000   1000   1000
0.95      986   1000   1000     746    966    997    1000   1000   1000
0.975     479    910    982     135    427    663     835    991   1000
0.99       73    310    528      22    116    205     338    771    918
0.995      41    144    274      21     77    148     231    617    783
1          10     48     96      12     54    105     171    456    632
1.01        0      0      0       0      0      0       1      4     12

Table 5.10 shows the number of rejections for generated paths with $c = 0$ and $z_0 = 100$. For $\rho = 1$ we simulated under the null hypothesis of case 1 and 2, and the number of rejections for case 1 is small. This was expected because of figure 5.3. Again, the case 3 test does not perform very well when $\rho = 1$. The case 1 and 2 tests do perform well: with $\rho$ slightly less than 1 they reject the null of a unit root.

Table 5.10: Number of rejections, c = 0, z0 = 100.

             Case 1                Case 2                Case 3
ρ         1%     5%    10%      1%     5%    10%      1%     5%    10%
0.99     1000   1000   1000    1000   1000   1000    1000   1000   1000
0.995    1000   1000   1000     406    683    805     874    974    995
1           8     26     47      10     52    106     157    461    631
1.01        0      0      0       0      0      0       0      0      0

Table 5.11 shows the number of rejections for $c = 0.1$ and $z_0 = 0$. For $\rho = 1$ we have simulated under the null hypothesis of case 3, but the case 3 test still rejects too many times. Because the null is fulfilled we expect the number of rejections to be around 10, 50 and 100 for the 1%, 5% and 10% levels respectively.
In figure 5.8 we already saw that the case 3 test depends on the value of $c$: when $c = 0.1$ the distribution of the case 3 test statistic is shifted to the left compared to its asymptotic distribution. We see that with this setting the case 2 test performs more or less the same as with $c = 0$ and $z_0 = 0$, except when $\rho = 1$, in which case it rejects less often. The null is not satisfied for the case 2 test, so this is not a bad outcome: the fewer rejections for $\rho = 1$ the better. The case 1 test performs worse than the case 2 test, and also worse than it did in the setting $c = 0$ and $z_0 = 0$.

Table 5.11: Number of rejections, c = 0.1, z0 = 0.

             Case 1                Case 2                Case 3
ρ         1%     5%    10%      1%     5%    10%      1%     5%    10%
0.9      1000   1000   1000    1000   1000   1000    1000   1000   1000
0.95      758    968    997     848    995   1000     999   1000   1000
0.975     111    434    675     157    476    729     831    996   1000
0.99        6     37     87      39    142    262     376    765    917
0.995       0      7     20      18     90    158     257    599    768
1           0      2      3       5     18     37      73    217    329
1.01        0      0      1       0      0      0       2      8      9

Table 5.12 shows the number of rejections for $c = 0.1$ and $z_0 = 100$. It is remarkable how well the case 1 test performs: it rejects almost every time when $\rho$ is slightly below 1 and does not reject when $\rho \geq 1$, even though the null hypothesis is not satisfied. Section 5.2 explained that this test basically fits a line through the origin, and because the scatterplot starts around (100, 100) it estimates $\rho$ very accurately, which makes the standard error relatively small. With this setting, we know there is an intercept of 0.1, but this is so small compared to the starting point of 100 that the test does not overestimate $\rho$ too much. So when we generate paths with $\rho < 1$, $\hat{\rho} - 1$ is negative and, divided by the small standard error, the test statistic is a large negative value, so the null is rejected. When generating paths with $\rho \geq 1$, $\hat{\rho}$ is always slightly above 1, so the test statistic is a large positive value and the null is not rejected.
Table 5.12: Number of rejections, c = 0.1, z0 = 100.

             Case 1                Case 2                Case 3
ρ         1%     5%    10%      1%     5%    10%      1%     5%    10%
0.975    1000   1000   1000    1000   1000   1000    1000   1000   1000
0.99     1000   1000   1000     996    999   1000    1000   1000   1000
0.995     997   1000   1000     189    451    586     689    913    961
1           0      0      0       8     21     45      73    234    345
1.01        0      0      0       0      0      0       0      0      0

For illustration purposes, table 5.13 shows the number of rejections for $c = 1$ and $z_0 = 0$. The value of $c$ is now much larger than the values of $\hat{c}$ for the 10 pairs. We see that the case 1 test has lost all its power, the case 2 test performs well, and the case 3 test is finally performing as it should when $\rho = 1$ and is very powerful.

Table 5.13: Number of rejections, c = 1, z0 = 0.

             Case 1                Case 2                Case 3
ρ         1%     5%    10%      1%     5%    10%      1%     5%    10%
0.9         0      0      0    1000   1000   1000    1000   1000   1000
0.95        0      0      0    1000   1000   1000    1000   1000   1000
0.975       0      0      0     998   1000   1000    1000   1000   1000
0.99        0      0      0    1000   1000   1000    1000   1000   1000
0.995       0      0      0     997   1000   1000    1000   1000   1000
1           0      0      0       1      7     12      13     59    119
1.01        0      0      0       0      0      0       0      0      0

This section clearly indicates that the Dickey-Fuller case 3 test is not the one we should use when testing pairs for cointegration. Unfortunately it does not clearly distinguish between case 1 and case 2. Case 1 performs better for $c = 0$, $z_0 = 0$, for $c = 0$, $z_0 = 100$ and for $c = 0.1$, $z_0 = 100$, but case 2 performs better for $c = 0.1$, $z_0 = 0$, which is the situation most often seen in the 10 pairs. In the remainder of this report we will focus on the case 2 test because of Hamilton's view given in section 4.3 and because this section does not clearly indicate to do otherwise. Another possible reason to use case 2 instead of case 1 could be that the first step of the Engle-Granger method, which is a linear regression to estimate $\alpha$, influences the power of the two tests. This will be considered in chapter 6.

5.6 Augmented Dickey-Fuller test

So far we discussed the properties of the estimated coefficients for a first-order autoregression when there is a unit root.
In this section we discuss the distribution of the estimated coefficients for a $p$-th order autoregression. Recall that the Augmented Dickey-Fuller test tests

\[ H_0 : z_t \sim I(1) \quad \text{against} \quad H_1 : z_t \sim I(0), \tag{5.24} \]

when $z_t$ is assumed to follow an AR($p$) model

\[ z_t = c + \phi_1 z_{t-1} + \cdots + \phi_p z_{t-p} + u_t, \tag{5.25} \]

where $u_t \sim$ i.i.d.$(0, \sigma^2)$. This model can be written as

\[ z_t = c + \rho z_{t-1} + \beta_1 \Delta z_{t-1} + \cdots + \beta_{p-1} \Delta z_{t-p+1} + u_t, \tag{5.26} \]

with

\[ \rho = \phi_1 + \phi_2 + \cdots + \phi_p, \qquad \beta_i = -(\phi_{i+1} + \cdots + \phi_p), \quad i = 1, \ldots, p-1. \]

The null hypothesis is that the autoregressive polynomial equation

\[ 1 - \phi_1 x - \phi_2 x^2 - \cdots - \phi_p x^p = 0 \]

has exactly one unit root and all other roots lie outside the unit circle. The single unit root gives us

\[ 1 - \phi_1 - \phi_2 - \cdots - \phi_p = 0, \]

i.e., $\rho = 1$. This implies

\[ 1 - \phi_1 x - \cdots - \phi_p x^p = (1 - \beta_1 x - \cdots - \beta_{p-1} x^{p-1})(1 - x). \tag{5.27} \]

Of the $p$ values of $x$ that make the left side of (5.27) zero, one is $x = 1$ and all other roots are assumed to be outside the unit circle. The same must be true for the right side as well, meaning all roots of

\[ 1 - \beta_1 x - \cdots - \beta_{p-1} x^{p-1} = 0 \]

lie outside the unit circle. So, (5.24) is equivalent to

\[ H_0 : \rho = 1 \quad \text{against} \quad H_1 : \rho < 1. \]

We are interested in the properties of the test statistic $S = (\hat{\rho} - 1)/\hat{\sigma}_{\hat{\rho}}$ in the three cases:

Case 1: The true process of $z_t$ is (5.26) with $c = 0$ and $\rho = 1$; the model estimated is (5.26) without the constant $c$.

Case 2: The true process of $z_t$ is (5.26) with $c = 0$ and $\rho = 1$; the model estimated is (5.26).

Case 3: The true process of $z_t$ is (5.26) with $c \neq 0$ and $\rho = 1$; the model estimated is (5.26).

We can derive the asymptotic properties in a similar manner as in the preceding sections. To keep this section from being too tedious, we only derive the properties for case 2. We state the outcomes for case 1 and case 3 at the end of this section; the derivations can be found in Hamilton [6]. Before deriving the properties for Augmented Dickey-Fuller case 2, we first state a proposition.
Proposition 2: Let $v_t = \sum_{j=0}^{\infty} \theta_j u_{t-j}$, where $\sum_{j=0}^{\infty} j \cdot |\theta_j| < \infty$ and $\{u_t\}$ is an i.i.d. sequence with mean zero, variance $\sigma^2$, and finite fourth moment. Define

\[ \gamma_j = E(v_t v_{t-j}) = \sigma^2 \sum_{s=0}^{\infty} \theta_s \theta_{s+j}, \quad \text{for } j = 0, 1, \ldots, \tag{5.28} \]

\[ \lambda = \sigma \sum_{j=0}^{\infty} \theta_j, \tag{5.29} \]

\[ z_t = v_1 + v_2 + \cdots + v_t, \quad \text{for } t = 1, 2, \ldots, T, \tag{5.30} \]

with $z_0 = 0$. Then

(i) $T^{-1} \sum_{t=1}^{T} v_t v_{t-j} \xrightarrow{P} \gamma_j$ for $j = 0, 1, \ldots$

(ii) $T^{-1} \sum_{t=1}^{T} z_{t-1} v_{t-j} \xrightarrow{D} \begin{cases} \frac{1}{2}\left(\lambda^2 W(1)^2 - \gamma_0\right) & \text{for } j = 0, \\ \frac{1}{2}\left(\lambda^2 W(1)^2 - \gamma_0\right) + \gamma_0 + \cdots + \gamma_{j-1} & \text{for } j = 1, 2, \ldots \end{cases}$

(iii) $T^{-3/2} \sum_{t=1}^{T} z_{t-1} \xrightarrow{D} \lambda \int_0^1 W(r)\, dr$.

(iv) $T^{-2} \sum_{t=1}^{T} z_{t-1}^2 \xrightarrow{D} \lambda^2 \int_0^1 W(r)^2\, dr$.

(v) $T^{-1/2} \sum_{t=1}^{T} v_t \xrightarrow{D} \lambda W(1)$.

(vi) $T^{-1} \sum_{t=1}^{T} z_{t-1} u_t \xrightarrow{D} \frac{1}{2} \sigma \lambda \left(W(1)^2 - 1\right)$.

The proof of this proposition can also be found in [6].

Asymptotic distribution ADF case 2

We assume that the sample is of size $T + p$, $(z_{-p+1}, z_{-p+2}, \ldots, z_T)$, and the model is

\[ z_t = c + \rho z_{t-1} + \beta_1 \Delta z_{t-1} + \cdots + \beta_{p-1} \Delta z_{t-p+1} + u_t = x_t' \beta + u_t, \]

where $\beta = (\beta_1, \beta_2, \ldots, \beta_{p-1}, c, \rho)'$ and $x_t = (\Delta z_{t-1}, \Delta z_{t-2}, \ldots, \Delta z_{t-p+1}, 1, z_{t-1})'$.

Under the null hypothesis of exactly one unit root and the assumption that $z_t$ follows the above AR($p$) model with $c = 0$ and $\rho = 1$, we show that $z_t$ behaves like the variable $z_t$ in proposition 2. Because $z_t$ is integrated of order one and $v_t = \Delta z_t$, $v_t$ is stationary and follows an AR($p-1$) model:

\[ \Delta z_t = \beta_1 \Delta z_{t-1} + \cdots + \beta_{p-1} \Delta z_{t-p+1} + u_t \iff v_t = \beta_1 v_{t-1} + \cdots + \beta_{p-1} v_{t-p+1} + u_t. \]

The autoregressive polynomial of $v_t$ is

\[ \Phi(x) = 1 - \beta_1 x - \cdots - \beta_{p-1} x^{p-1}, \]

and all roots of $\Phi(x) = 0$ are outside the unit circle because $v_t$ is stationary and we assume it is causal, like all other autoregressive models in this report. Then $v_t$ has an MA($\infty$) representation

\[ v_t = \sum_{j=0}^{\infty} \theta_j u_{t-j}, \]

with associated power series $\Theta(x) = 1 + \theta_1 x + \theta_2 x^2 + \cdots$, and we have

\[ \Theta(x) = \frac{1}{\Phi(x)}. \]

All $p - 1$ roots of $\Phi(x)$, which is a finite number of roots, are outside the unit circle, so there exists an $\varepsilon > 0$ such that the modulus of every root is larger than $1 + \varepsilon$, and hence $\Phi(x) \neq 0$ for $|x| < 1 + \varepsilon$. Within the radius of convergence $1 + \varepsilon$, the analytic function $\Theta(x)$ is differentiable:

\[ \Theta'(x) = \sum_{j=1}^{\infty} j \theta_j x^{j-1}. \]

Because this series is absolutely convergent within its radius of convergence, in particular at the point $x = 1$, we have

\[ \sum_{j=1}^{\infty} j \cdot |\theta_j| < \infty. \]

This shows we can use proposition 2 without making any further assumptions.

The deviation of the OLS estimate $\hat{\beta}$ from the true value $\beta$ is given by

\[ \hat{\beta} - \beta = \left[ \sum_{t=1}^{T} x_t x_t' \right]^{-1} \left[ \sum_{t=1}^{T} x_t u_t \right]. \tag{5.31} \]

With $v_t = z_t - z_{t-1}$, the terms in (5.31) are

\[
\sum_{t=1}^{T} x_t x_t' =
\begin{bmatrix}
\sum v_{t-1}^2 & \cdots & \sum v_{t-1} v_{t-p+1} & \sum v_{t-1} & \sum v_{t-1} z_{t-1} \\
\sum v_{t-2} v_{t-1} & \cdots & \sum v_{t-2} v_{t-p+1} & \sum v_{t-2} & \sum v_{t-2} z_{t-1} \\
\vdots & \cdots & \vdots & \vdots & \vdots \\
\sum v_{t-p+1} v_{t-1} & \cdots & \sum v_{t-p+1}^2 & \sum v_{t-p+1} & \sum v_{t-p+1} z_{t-1} \\
\sum v_{t-1} & \cdots & \sum v_{t-p+1} & T & \sum z_{t-1} \\
\sum z_{t-1} v_{t-1} & \cdots & \sum z_{t-1} v_{t-p+1} & \sum z_{t-1} & \sum z_{t-1}^2
\end{bmatrix},
\qquad
\sum_{t=1}^{T} x_t u_t =
\begin{bmatrix}
\sum v_{t-1} u_t \\
\vdots \\
\sum v_{t-p+1} u_t \\
\sum u_t \\
\sum z_{t-1} u_t
\end{bmatrix}.
\]

As in the derivation of DF case 2 we need a scaling matrix. In this section we use the following $(p+1) \times (p+1)$ scaling matrix:

\[
Y =
\begin{bmatrix}
\sqrt{T} & 0 & \cdots & 0 & 0 \\
0 & \sqrt{T} & \cdots & 0 & 0 \\
\vdots & \vdots & \cdots & \vdots & \vdots \\
0 & 0 & \cdots & \sqrt{T} & 0 \\
0 & 0 & \cdots & 0 & T
\end{bmatrix}.
\]

Multiplying (5.31) by the scaling matrix $Y$ and using (5.7) we get

\[
Y(\hat{\beta} - \beta) = \left\{ Y^{-1} \left[ \sum_{t=1}^{T} x_t x_t' \right] Y^{-1} \right\}^{-1} \left\{ Y^{-1} \left[ \sum_{t=1}^{T} x_t u_t \right] \right\}. \tag{5.32}
\]

Consider the matrix $Y^{-1} \left[\sum x_t x_t'\right] Y^{-1}$. Elements in the upper left $(p \times p)$ block of $\sum x_t x_t'$ are divided by $T$, the first $p$ elements of the $(p+1)$th row or $(p+1)$th column are divided by $T^{3/2}$, and the element at the lower right corner is divided by $T^2$.
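Moreover,

\[
\begin{aligned}
T^{-1} \sum v_{t-i} v_{t-j} &\xrightarrow{P} \gamma_{|i-j|} && \text{from proposition 2(i)}, \\
T^{-1} \sum v_{t-j} &\xrightarrow{P} E(v_{t-j}) = 0 && \text{from the law of large numbers}, \\
T^{-3/2} \sum z_{t-1} v_{t-j} &\xrightarrow{P} 0 && \text{from proposition 2(ii)}, \\
T^{-3/2} \sum z_{t-1} &\xrightarrow{D} \lambda \int W(r)\, dr && \text{from proposition 2(iii)}, \\
T^{-2} \sum z_{t-1}^2 &\xrightarrow{D} \lambda^2 \int W(r)^2\, dr && \text{from proposition 2(iv)},
\end{aligned}
\]

where

\[ \gamma_j = E(\Delta z_t \Delta z_{t-j}), \qquad \lambda = \sigma / (1 - \beta_1 - \cdots - \beta_{p-1}), \qquad \sigma^2 = E(u_t^2), \]

and the integral sign denotes integration over $r$ from 0 to 1. Thus,

\[
Y^{-1} \left[ \sum_{t=1}^{T} x_t x_t' \right] Y^{-1} \xrightarrow{D}
\begin{bmatrix}
\gamma_0 & \cdots & \gamma_{p-2} & 0 & 0 \\
\vdots & \cdots & \vdots & \vdots & \vdots \\
\gamma_{p-2} & \cdots & \gamma_0 & 0 & 0 \\
0 & \cdots & 0 & 1 & \lambda \int W(r)\, dr \\
0 & \cdots & 0 & \lambda \int W(r)\, dr & \lambda^2 \int W(r)^2\, dr
\end{bmatrix}
=
\begin{bmatrix} V & 0 \\ 0 & Q \end{bmatrix},
\]

with

\[
V =
\begin{bmatrix}
\gamma_0 & \gamma_1 & \cdots & \gamma_{p-2} \\
\gamma_1 & \gamma_0 & \cdots & \gamma_{p-3} \\
\vdots & \vdots & \cdots & \vdots \\
\gamma_{p-2} & \gamma_{p-3} & \cdots & \gamma_0
\end{bmatrix},
\qquad
Q =
\begin{bmatrix}
1 & \lambda \int W(r)\, dr \\
\lambda \int W(r)\, dr & \lambda^2 \int W(r)^2\, dr
\end{bmatrix}.
\tag{5.33}
\]

Next, consider the second term on the right side of (5.32):

\[
Y^{-1} \left[ \sum_{t=1}^{T} x_t u_t \right] =
\begin{bmatrix}
T^{-1/2} \sum v_{t-1} u_t \\
\vdots \\
T^{-1/2} \sum v_{t-p+1} u_t \\
T^{-1/2} \sum u_t \\
T^{-1} \sum z_{t-1} u_t
\end{bmatrix}.
\tag{5.34}
\]

The first $p - 1$ elements of this vector satisfy the central limit theorem. This is because these elements are $\sqrt{T}$ times the sample mean of a martingale difference sequence whose covariance matrix is $\sigma^2 V$, but this is not discussed further. The result is

\[
\begin{bmatrix}
T^{-1/2} \sum v_{t-1} u_t \\
\vdots \\
T^{-1/2} \sum v_{t-p+1} u_t
\end{bmatrix}
\xrightarrow{D} h_1 \sim N(0, \sigma^2 V).
\]

The distribution of the last two elements in (5.34) can be obtained from statements (v) and (vi) of proposition 2:

\[
\begin{bmatrix}
T^{-1/2} \sum u_t \\
T^{-1} \sum z_{t-1} u_t
\end{bmatrix}
\xrightarrow{D} h_2 \sim
\begin{bmatrix}
\sigma W(1) \\
\frac{1}{2} \sigma \lambda \left( W(1)^2 - 1 \right)
\end{bmatrix}.
\tag{5.35}
\]

This gives that the deviation of the OLS estimate from its true value is

\[
Y(\hat{\beta} - \beta) \xrightarrow{D}
\begin{bmatrix} V & 0 \\ 0 & Q \end{bmatrix}^{-1}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
=
\begin{bmatrix} V^{-1} h_1 \\ Q^{-1} h_2 \end{bmatrix}.
\tag{5.36}
\]

The last two elements of $\beta$ are $c$ and $\rho$, which are the constant term and the coefficient on the I(1) regressor, $z_{t-1}$.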
From (5.33), (5.35) and (5.36), their limiting distribution is given by

\[
\begin{bmatrix} T^{1/2} & 0 \\ 0 & T \end{bmatrix}
\begin{bmatrix} \hat{c} \\ \hat{\rho} - 1 \end{bmatrix}
\xrightarrow{D}
\begin{bmatrix} \sigma & 0 \\ 0 & \sigma/\lambda \end{bmatrix}
\begin{bmatrix} 1 & \int W(r)\, dr \\ \int W(r)\, dr & \int W(r)^2\, dr \end{bmatrix}^{-1}
\begin{bmatrix} W(1) \\ \frac{1}{2}\left( W(1)^2 - 1 \right) \end{bmatrix}.
\tag{5.37}
\]

The t test statistic $S$ of the null hypothesis that $\rho = 1$ is

\[
S = \frac{\hat{\rho} - 1}{\hat{\sigma}_{\hat{\rho}}} = \frac{\hat{\rho} - 1}{\left\{ r_T^2\, e' \left( \sum x_t x_t' \right)^{-1} e \right\}^{1/2}},
\]

where $e$ denotes a $(p+1)$ vector with unity in the last position and zeros elsewhere. Multiplying the numerator and the denominator by $T$ results in

\[
S = \frac{T(\hat{\rho} - 1)}{\left\{ r_T^2\, e' Y \left( \sum x_t x_t' \right)^{-1} Y e \right\}^{1/2}}.
\]

But

\[
e' Y \left( \sum x_t x_t' \right)^{-1} Y e
= e' \left\{ Y^{-1} \left( \sum x_t x_t' \right) Y^{-1} \right\}^{-1} e
\xrightarrow{D}
e' \begin{bmatrix} V^{-1} & 0 \\ 0 & Q^{-1} \end{bmatrix} e
= \frac{1}{\lambda^2 \left\{ \int W(r)^2\, dr - \left( \int W(r)\, dr \right)^2 \right\}}.
\tag{5.38}
\]

By (5.37) we have

\[
T(\hat{\rho} - 1) \xrightarrow{D} (\sigma/\lambda)\,
\frac{\frac{1}{2}\left( W(1)^2 - 1 \right) - W(1) \int W(r)\, dr}{\int W(r)^2\, dr - \left( \int W(r)\, dr \right)^2}.
\tag{5.39}
\]

Using (5.38) and (5.39) together with $r_T^2 \xrightarrow{P} \sigma^2$, we finally get

\[
S \xrightarrow{D}
\frac{\frac{1}{2}\left( W(1)^2 - 1 \right) - W(1) \int W(r)\, dr}{\left( \int W(r)^2\, dr - \left[ \int W(r)\, dr \right]^2 \right)^{1/2}},
\tag{5.40}
\]

which is exactly the same as the asymptotic distribution of the Dickey-Fuller case 2 test statistic. So the critical values are the same as in table 5.4 in section 5.3, without making any corrections for the fact that lagged values of $\Delta z_t$ are included in the regression. This is also true for the other cases: the Augmented Dickey-Fuller case 1 test statistic has the same asymptotic distribution as Dickey-Fuller case 1, and ADF case 3 the same as DF case 3.

As in the preceding sections we can simulate the density of the test statistic for finite sample sizes. We show the results for the case 2 test when $p = 2$. We simulate for different values of $\sigma$ when $T = 500$ and $\beta_1 = -0.1$, and naturally $\rho = 1$, $c = 0$. We took this value for $\beta_1$ because it is seen a few times in the 10 pairs IMC provided. The estimated densities of 5,000 simulated test statistics for $\sigma^2 = 1, 5, 10$ are shown in figure 5.11. The asymptotic density we found for the case 2 test, figure 5.4, is also plotted, with a dashed line. The different graphs coincide nicely.
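As a small worked illustration of the quantities that enter proposition 2, consider the simplest augmented model, $p = 2$, so that $v_t = \Delta z_t$ follows an AR(1) with coefficient $\beta_1$, $|\beta_1| < 1$. The numbers below assume $\beta_1 = -0.1$ and $\sigma = 1$, matching the simulation setting; this is only a sketch for orientation and not part of the derivation itself.

```latex
\begin{align*}
v_t &= \beta_1 v_{t-1} + u_t
  \;\Longrightarrow\; \theta_j = \beta_1^{\,j}, \\
\sum_{j=0}^{\infty} j\,|\theta_j| &= \frac{|\beta_1|}{(1-|\beta_1|)^2}
  = \frac{0.1}{0.81} \approx 0.123 < \infty, \\
\lambda &= \sigma \sum_{j=0}^{\infty} \theta_j = \frac{\sigma}{1-\beta_1}
  = \frac{1}{1.1} \approx 0.909, \\
\gamma_0 &= \sigma^2 \sum_{s=0}^{\infty} \theta_s^2 = \frac{\sigma^2}{1-\beta_1^2}
  = \frac{1}{0.99} \approx 1.010 .
\end{align*}
```

The summability condition of the proposition holds, and although $\lambda$ and $\gamma_0$ depend on $\beta_1$ and $\sigma$, neither appears in the limit (5.40), in line with the observation that the critical values need no correction.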
With this setting ($\beta_1 = -0.1$, $\rho = 1$, $c = 0$), the 'original' AR model with lagged terms instead of differenced terms is:

\[ z_t = 0.9 z_{t-1} + 0.1 z_{t-2} + u_t. \]

The autoregressive polynomial equation

\[ 1 - 0.9x - 0.1x^2 = 0 \]

has roots 1 and $-10$, so the assumption of exactly one unit root is fulfilled.

Figure 5.11: Estimated density of ADF case 2 for different $\sigma^2$, $T = 500$ and $\beta_1 = -0.1$.

To see what influence $\beta_1$ has, we also vary its value while keeping $\sigma^2$ fixed at 1. The results for $\beta_1 = -0.9, -0.5, 0, 0.5, 0.9, 1$ are shown in figure 5.12. The awkward graph corresponds to $\beta_1 = 1$, for which the 'original' AR model is:

\[ z_t = 2 z_{t-1} - z_{t-2} + u_t. \]

The autoregressive polynomial equation

\[ 1 - 2x + x^2 = 0 \]

has a double root at 1, so there are two unit roots. That is probably why the graph for $\beta_1 = 1$ does not look like the other ones. For the other values of $\beta_1$ the assumption of exactly one unit root is fulfilled. The values of $\beta_1$ for the 10 pairs, when an AR(2) model is fit on the spread process, are in the range $(-0.25, 0.1)$.
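The root checks above can be reproduced numerically. The sketch below maps the coefficients $(\rho, \beta_1, \ldots, \beta_{p-1})$ of the differenced form back to the 'original' AR coefficients $\phi_1, \ldots, \phi_p$ and computes the roots of the autoregressive polynomial with `numpy.roots`. The helper names `adf_to_ar` and `ar_polynomial_roots` are illustrative assumptions, not names from the report.

```python
import numpy as np

def adf_to_ar(rho, beta):
    """AR coefficients phi_1..phi_p of z_t = rho z_{t-1} + sum_i beta_i dz_{t-i} + u_t."""
    beta = np.asarray(beta, dtype=float)
    phi = np.zeros(len(beta) + 1)
    phi[0] = rho
    phi[:-1] += beta   # each beta_i multiplies +z_{t-i}
    phi[1:] -= beta    # and -z_{t-i-1}
    return phi

def ar_polynomial_roots(phi):
    """Roots of 1 - phi_1 x - ... - phi_p x^p."""
    # numpy.roots expects coefficients ordered from the highest power down.
    coeffs = np.concatenate((-np.asarray(phi, dtype=float)[::-1], [1.0]))
    return np.roots(coeffs)

phi = adf_to_ar(1.0, [-0.1])                      # z_t = 0.9 z_{t-1} + 0.1 z_{t-2} + u_t
roots = ar_polynomial_roots(phi)                  # one root at 1, one at -10

phi_double = adf_to_ar(1.0, [1.0])                # z_t = 2 z_{t-1} - z_{t-2} + u_t
roots_double = ar_polynomial_roots(phi_double)    # double root at 1 (up to rounding)
```

For higher orders the same two helpers apply unchanged; only the length of `beta` grows.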
Figure 5.12: Estimated density of ADF case 2 for different $\beta_1$, $T = 500$ and $\sigma^2 = 1$.

We also show the results for two higher order models. First $p = 3$: figure 5.13 shows the estimated densities for three different settings of $\beta_1$ and $\beta_2$. We take $\beta_2$ equal to $-0.1$ and $\beta_1$ equal to $-0.2$, $-0.1$ and $0.1$ successively; these are also values seen with the 10 pairs. For these values the autoregressive polynomial has exactly one unit root and the other roots are outside the unit circle, so the null hypothesis is satisfied. The graph of figure 5.4 is also displayed; again they coincide nicely.
Figure 5.13: Estimated density of ADF case 2 for different $\beta_1$ and $\beta_2$, $T = 500$ and $\sigma^2 = 1$.

Lastly, we consider $p = 5$. Figure 5.14 shows the estimated densities for three parameter settings:

              β1      β2      β3      β4
Setting 1:   -0.3    -0.2    -0.1    -0.05
Setting 2:   -0.1    -0.1    -0.1    -0.1
Setting 3:    0.1    -0.1     0.1     0.05

These three settings also represent most of the 10 pairs; that the null hypothesis is satisfied was checked with Maple. We see that the densities show more dispersion and do not coincide with the asymptotic density as nicely as for the lower order models above.
Figure 5.14: Estimated density of ADF case 2 for different $\beta_1$, $T = 500$ and $\sigma^2 = 1$.

In the next section we look again at all these models to see if the power of the Augmented Dickey-Fuller case 2 test is influenced by the value of $p$.

5.7 Power of the Augmented Dickey-Fuller case 2 test

In this section we briefly look at how much influence the value of $p$, the order of the autoregressive model, has on the power of the Augmented Dickey-Fuller case 2 test. As in section 5.5 we generate 1,000 paths for different values of $\rho$ and see how many times the test rejects the null hypothesis of a unit root. We start with $p = 2$, so the paths are generated according to the model

\[ z_t = \rho z_{t-1} + \beta_1 \Delta z_{t-1} + u_t, \]

where for $u_t$ we take i.i.d. standard Gaussian random variables, sample size $T = 500$ and $z_0 = 0$. Table 5.14 shows the number of rejections for several values of $\rho$ and $\beta_1$. We use values of $\beta_1$ which are seen with the 10 pairs IMC provided. When $\rho = 1$ the null hypothesis is satisfied: the other roots of the autoregressive polynomial lie outside the unit circle, being $-4$, $-10$ and $10$ respectively for $\beta_1 = -0.25, -0.1, 0.1$. We see that under the null hypothesis the test behaves as expected. The power is quite similar to that of the Dickey-Fuller case 2 test in table 5.9. The power is better for the positive value of $\beta_1$.
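The experiment for $p = 2$ could be sketched as below. The code is a hedged illustration rather than the report's own implementation: the function names are hypothetical, the pre-sample values are simply set to zero (an assumption), and only the 5% level is counted, using the case 2 critical value $-2.87$ quoted in section 5.5.

```python
import numpy as np

def simulate_path(rho, beta, T, rng):
    """z_t = rho z_{t-1} + sum_i beta_i dz_{t-i} + u_t, standard Gaussian u_t."""
    beta = np.asarray(beta, dtype=float)
    k = len(beta)                        # number of lagged differences, p - 1
    lags = np.arange(k)
    z = np.zeros(T + k + 1)              # assumption: pre-sample values and z_0 equal 0
    u = rng.standard_normal(T + k + 1)
    for t in range(k + 1, T + k + 1):
        dz_lags = z[t - 1 - lags] - z[t - 2 - lags]   # dz_{t-1}, ..., dz_{t-k}
        z[t] = rho * z[t - 1] + beta @ dz_lags + u[t]
    return z

def adf_case2_stat(z, k):
    """t statistic for rho = 1, regressing z_t on k lagged differences, a constant and z_{t-1}."""
    dz = np.diff(z)                      # dz[j] = z[j+1] - z[j]
    rows = np.arange(k + 1, len(z))      # time indices used as observations
    lagged_diffs = [dz[rows - i - 1] for i in range(1, k + 1)]
    X = np.column_stack(lagged_diffs + [np.ones(len(rows)), z[rows - 1]])
    y = z[rows]
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return (coef[-1] - 1.0) / np.sqrt(cov[-1, -1])

def count_rejections(rho, beta, n_paths=1000, T=500, crit=-2.87, seed=0):
    """Number of simulated paths for which the ADF case 2 test rejects at the 5% level."""
    rng = np.random.default_rng(seed)
    k = len(beta)
    stats = np.array([adf_case2_stat(simulate_path(rho, beta, T, rng), k)
                      for _ in range(n_paths)])
    return int((stats < crit).sum())

n_rejected = count_rejections(rho=0.9, beta=[-0.1], n_paths=200)
```

Passing a longer `beta` (for instance four values for $p = 5$) reuses the same functions, which makes it easy to compare the power across orders.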
105 Table 5.14: Number of rejections, β1 = −0.25 β1 = −0.1 ρ 1% 5% 10% 1% 5% 10% 0.9 986 1000 1000 998 1000 1000 0.95 447 845 949 582 919 976 0.975 68 297 491 104 395 589 0.99 18 75 162 13 89 199 0.995 15 70 138 13 66 137 1 9 49 97 7 45 88 1.01 0 1 4 0 1 2 p = 2. β1 = 0.1 1% 5% 10% 1000 1000 1000 792 980 997 179 502 708 28 117 232 10 74 144 9 42 89 0 1 2 Table 5.15 shows the number of rejections for the AR(3) model: zt = ρzt−1 + β1 ∆zt−1 + β2 ∆zt−2 + ut . We use the same values of β1 and β2 as in the previous section: β2 = −0.1 and β1 = −0.25 − 0.1, 0.1. For these values of β1 and β2 and when ρ = 1 the null hypothesis of exactly one unit root is satisfied. The table does not indicate that the power when p = 3 is much less than the power of the test when p = 2. Lastly, table 5.16 shows the number of rejections for the AR(5) model: zt = ρzt−1 + β1 ∆zt−1 + β2 ∆zt−2 + β3 ∆zt−3 + β4 ∆zt−4 + ut , where we used the three parameter settings from the previous section. This table indicates that the power of the test with p = 5 is less than the power of the test for smaller values of p. Specially the first setting of parameters shows that the power of the test is less for p = 5. 106 Table 5.15: Number β1 = −0.2 ρ 1% 5% 10% 0.9 978 998 1000 0.95 406 765 900 0.975 63 283 482 0.99 17 83 163 0.995 13 74 134 1 12 51 114 1.01 0 0 2 of rejections, p = 3 β1 = −0.1 1% 5% 10% 986 1000 1000 487 851 944 86 287 473 22 89 177 13 71 136 13 62 104 0 2 3 and β2 = −0.1. β1 = 0.1 1% 5% 10% 1000 1000 1000 684 941 984 129 399 615 27 113 219 19 67 128 14 58 104 1 5 6 Table 5.16: Number of rejections, p = 5. 
          setting 1            setting 2            setting 3
ρ         1%    5%    10%      1%    5%    10%      1%    5%    10%
0.9       782   975   999      924   995   1000     1000  1000  1000
0.95      198   527   717      307   708   874      731   904   992
0.975     37    176   313      69    253   418      186   531   743
0.99      15    83    152      15    97    179      21    114   228
0.995     14    52    115      15    52    124      14    83    163
1         12    48    99       9     50    101      12    47    97
1.01      0     1     10       0     4     9        0     1     1

Chapter 6 Engle-Granger method

In the previous chapter we derived and simulated the properties of the Dickey-Fuller test. In this chapter we would like to find the properties of the Engle-Granger method, which we use for testing stock price data for cointegration. As explained in chapter 4 the Engle-Granger method consists of two steps: a linear regression followed by the Dickey-Fuller test on the residuals of this regression. The main question is whether the critical values of the Engle-Granger method are the same as those for the Dickey-Fuller test.

In the first section the critical values of the Engle-Granger method are found by simulating price processes $x_t$ and $y_t$ with random walks, so that all the assumptions of the method are satisfied. The second section also simulates the critical values, but now the model from section 4.2 is used for simulating the processes $x_t$ and $y_t$. Then not all assumptions are completely satisfied, because $x_t$ and $y_t$ are not strictly integrated of order one. The third section finds the critical values with bootstrapping from real data. In the last section we simulate cointegrated price processes $x_t$ and $y_t$ with the alternative method from section 4.5 and find out whether the Engle-Granger method recognizes them as cointegrated. The main focus of this chapter is on the case 2 test, but we will also compare the power of this test with the case 1 test.

6.1 Engle-Granger simulation with random walks

The Engle-Granger method assumes we have two price processes $\{x_t, y_t\}_{t=0}^{T}$, each of which is integrated of order one, I(1).
Then $x_t$ and $y_t$ are cointegrated if there exists a linear combination that is stationary:

$$x_t, y_t \text{ are cointegrated} \iff \exists\, \alpha, \alpha_0 \text{ such that } y_t - \alpha x_t - \alpha_0 = \varepsilon_t \sim I(0).$$

As described in chapter 4, we prefer to set $\alpha_0 = 0$ because of the cash-neutral aspect. So the Engle-Granger method boils down to:

• Estimate $\alpha$ with OLS: $\hat\alpha = \sum_{t=0}^{T} x_t y_t \big/ \sum_{t=0}^{T} x_t^2$.
• Calculate the spread $e_t = y_t - \hat\alpha x_t$.
• Test $e_t$ for stationarity with the ADF case 2 test.

In order to simulate pairs that are certain to be cointegrated and pairs that are certain not to be, we simulate $x_t$ and generate a corresponding $y_t$ such that the spread process is an AR($p$) process. This is because the Dickey-Fuller test assumes that the input series, in our case the spread process, is an AR($p$) process. The process $x_t$ has to be integrated of order one, and the simplest such model for $x_t$ is a random walk:

$$x_t = x_{t-1} + u_t, \tag{6.1}$$

with $u_t$ i.i.d. $N(0, \sigma_x^2)$ variables and $x_0$ an initial value. Then the difference $x_t - x_{t-1}$ is white noise, so $x_t \sim I(1)$. Now that we have $x_t$, we would like to generate $y_t$ such that $y_t - \alpha x_t$ is AR($p$) for some $\alpha$ and some $p$. In this section we look at a few different settings, but only for $p = 1$, and find out whether the distribution and power of the Engle-Granger test statistic differ from the earlier derived distribution and power of the Dickey-Fuller case 2 test statistic. First, this is done under the null hypothesis of the DF case 2 test, which means that there is no constant in the spread process but the constant is estimated. Second, we look at the case where a small constant is present in the spread process. Last, for $p = 1$, we generate $y_t$ with $\alpha_0$ but do not regress on a constant, to find out whether the pair is still cointegrated according to the Engle-Granger method.

AR(1) under the null hypothesis of DF case 2

We want the spread process to be an AR(1) process:

$$y_t - \alpha x_t = \varepsilon_t = \beta_0 + \beta \varepsilon_{t-1} + \eta_t,$$

where the $\eta_t$ are i.i.d. $N(0, \sigma_\eta^2)$ variables. Then we can generate $y_t$ as follows:

$$y_t = \alpha x_t + \beta_0 + \beta\,(y_{t-1} - \alpha x_{t-1}) + \eta_t, \qquad y_0 = \alpha x_0,$$
for $t = 1, \dots, T$; the recursion is equation (6.2) and the initial value is equation (6.3).

Under the null hypothesis of the Dickey-Fuller case 2 test there is a unit root and no constant in the spread process: $\beta = 1$ and $\beta_0 = 0$. The processes $x_t$ and $y_t$ are cointegrated if we take $\beta < 1$. Figure 6.1 shows a sample path for $x$ and $y$ when $\beta = 1$ and figure 6.2 when $\beta = 0.5$; both graphs use $\alpha = 0.8$, $\beta_0 = 0$, $x_0 = 25$, $\sigma_x^2 = \sigma_\eta^2 = 1$ and $T = 500$.

Figure 6.1: Pair x, y not cointegrated.
Figure 6.2: Pair x, y cointegrated.

To see whether the critical values of Engle-Granger are more or less the same as for Dickey-Fuller case 2, i.e. to see if estimating $\alpha$ has an effect on the critical values, we simulate many paths $x_t$ and $y_t$ under the null hypothesis and calculate the test statistic $S$. The procedure is:

• Simulate $x_t$.
• Generate $y_t$ according to (6.2).
• Calculate $\hat\alpha$.
• Calculate the spread $e_t = y_t - \hat\alpha x_t$.
• DF-test on the spread: estimate $\beta$ with OLS, calculate the OLS standard error $\hat\sigma_{\hat\beta}$ for $\hat\beta$, and calculate the test statistic $S = (\hat\beta - 1)/\hat\sigma_{\hat\beta}$.
• Repeat all this 5,000 times.

Then we estimate the density of the simulated test statistics, again with a Gaussian kernel estimator. This we can compare to the density we found for the Dickey-Fuller case 2 test in chapter 5. Figure 6.3 shows the estimated densities for different values of $\alpha$ for $T = 500$, $x_0 = 25$, $\sigma_x^2 = \sigma_\eta^2 = 1$ and $\beta_0 = 0$. The figure also shows the Dickey-Fuller case 2 density from figure 5.4. Figures 6.4 and 6.5 show estimated densities for different values of $\sigma_x^2$ and $\sigma_\eta^2$ respectively, for the same parameters as above and $\alpha = 1$. When the null hypothesis is completely satisfied, that is $\beta = 1$ and $\beta_0 = 0$, these densities look a lot like the Dickey-Fuller case 2 density. So it looks like the step preceding the DF test, namely estimating $\alpha$, does not really affect the critical values.

To see whether the power of the test is affected by the preceding step, table 6.1 shows the number of rejections for different values of $\beta$ and $\alpha$. It is clear from the table that the power of the Engle-Granger method does not depend on the value of $\alpha$. This table should be compared with the case 2 columns of table 5.9, because there is no constant and the initial value of the spread process is 0. We see
We see 112 0.4 0.2 0.0 ......................... ................................... ....... ........... ..... ......... . ....... ...... ....... ........ .... . ...... ...... . ..... ......... . ........ ...... . ....... ......... . ......... ....... ..... . ...... ........ . ........ ............ ........ . ....... ......... . ............. ......... . ......... ........ . ......... ........ ........ . ...... ....... . . ........... ...... . ......... .... . . ......... .. . ............. . . . ............... ......... . . . . ................... ........ . . ............. . . . . . ................ .......... . . . . . . . ...................... ............. . . . . . . . . . . . . . ................................................................................................................................... . . . . . . . . . . . ....... ............................................................ −6 −4 −2 0 2 Figure 6.3: AR(1) Estimated density.................for EG test statistic, α = 0.1, 0.5, 1. ......... 0.4 0.2 0.0 . .. . .... .......................................... ... .......... ............... ........... ... ...... ... ............. .. .......... . ............ ... ........... ............ .. ......... . ......... .. ........ . ............ .. ......... . .............. .. ........ ............. . .. ......... ............ . .......... ............ . ............ ............ . ............ ........... . ...... ..... ............ . ...... ....... ............ . ...... ........ ............ .................... . .............. ............ ... . ............... ........... . .... ............. .......... . .... ............ ......... . .... ......... . . ............... ................ . . . ...................... .......... . . . ...... ............. . . ................. . ........................... . . . . . . ............................. .................. . . . . . . . . . . . . . . . . . 
that there is practically no difference between these columns and table 6.1, which indicates that the power of the Engle-Granger method is as good as that of the Dickey-Fuller test. The estimation of $\alpha$ does not have a negative influence on the power of the test, which is a nice property.

Figure 6.4: AR(1) estimated density of EG test statistic, $\sigma_x^2 = 0.1, 0.5, 1, 5$.

Figure 6.5: AR(1) estimated density of EG test statistic, $\sigma_\eta^2 = 0.1, 0.5, 1, 5$.
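The simulation procedure of this section (simulate $x_t$, generate $y_t$ from (6.2), estimate $\alpha$, compute the spread and the DF case 2 statistic) can be illustrated with the following sketch. It is a minimal example under assumptions: the function names are hypothetical, and `sigma_x` and `sigma_eta` are standard deviations, whereas the text specifies the variances $\sigma_x^2$ and $\sigma_\eta^2$.

```python
import numpy as np

def simulate_pair(alpha, beta, beta0=0.0, T=500, x0=25.0,
                  sigma_x=1.0, sigma_eta=1.0, rng=None):
    """Random walk x_t as in (6.1) and y_t generated by the AR(1) spread recursion (6.2)."""
    rng = np.random.default_rng() if rng is None else rng
    steps = rng.normal(0.0, sigma_x, T)
    x = x0 + np.concatenate(([0.0], np.cumsum(steps)))
    eta = rng.normal(0.0, sigma_eta, T + 1)
    y = np.empty(T + 1)
    y[0] = alpha * x[0]                                   # initial value (6.3)
    for t in range(1, T + 1):
        y[t] = (alpha * x[t] + beta0
                + beta * (y[t - 1] - alpha * x[t - 1]) + eta[t])
    return x, y

def engle_granger_stat(x, y):
    """Return (alpha_hat, S): OLS through the origin, then DF case 2 on the spread."""
    alpha_hat = (x @ y) / (x @ x)
    e = y - alpha_hat * x
    # DF case 2 regression: e_t on a constant and e_{t-1}
    X = np.column_stack([np.ones(len(e) - 1), e[:-1]])
    coef, *_ = np.linalg.lstsq(X, e[1:], rcond=None)
    resid = e[1:] - X @ coef
    s2 = resid @ resid / (len(resid) - 2)
    se_beta = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return alpha_hat, (coef[1] - 1.0) / se_beta

def simulate_null_stats(alpha=1.0, n_paths=5000, seed=0):
    """Test statistics under the null hypothesis (beta = 1, beta0 = 0)."""
    rng = np.random.default_rng(seed)
    return np.array([engle_granger_stat(*simulate_pair(alpha, 1.0, rng=rng))[1]
                     for _ in range(n_paths)])
```

The array returned by `simulate_null_stats` could then be passed to any kernel density estimator, or its empirical 1%, 5% and 10% quantiles could be read off with `np.quantile` as simulated critical values.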
Table 6.1: Number of rejections, AR(1) and β0 = 0.

          α = 1                α = 0.5              α = 0.1
β         1%    5%    10%      1%    5%    10%      1%    5%    10%
0.9       1000  1000  1000     1000  1000  1000     1000  1000  1000
0.95      722   975   1000     724   964   993      717   961   993
0.975     169   485   686      164   466   683      145   469   691
0.99      29    133   248      38    123   242      36    143   254
0.995     14    67    149      17    75    161      18    87    163
1         14    52    105      8     51    109      10    52    115
1.01      0     0     3        1     1     2        1     4     6

AR(1) with constant in spread

When testing for cointegration with the Engle-Granger method, we use the Dickey-Fuller case 2 test, which assumes there is no constant in the spread process but does estimate one. It is interesting to see what happens to the properties of the Engle-Granger method if we do include a constant, $\beta_0$, in the spread process. From tables 5.11 and 5.12 we already saw that the power of the Dickey-Fuller case 2 test was not really affected by a small constant when there was no unit root. But the number of rejections when there was a unit root was a bit small. To be more precise, for increasing values of the constant the Dickey-Fuller case 2 test statistic converges to the case 3 statistic, which is standard Gaussian, as explained in section 5.4. With the Engle-Granger method, however, we do a preceding step: we estimate $\alpha$. Maybe this first step has a restraining influence on the shift.

Figure 6.6 shows the estimated density of the Engle-Granger test statistic when there is a small constant, $\beta_0 = 0.1$, in the spread process. The dashed line is the density when $\beta_0 = 0$. For both graphs the paths were generated with $T = 500$, $\alpha = 1$, $\sigma_x = 1$, $\sigma_\eta = 1$ and $\beta = 1$. There is a shift to the right, as we could expect from the properties of the Dickey-Fuller test. However, the dotted line is the density when $\beta_0 = 1000$, so there is a restraining influence on the shift. The Engle-Granger test statistic does not converge to the DF case 3 statistic for large constants.
Although this is nice, for small values of $\beta_0$ we still have the same shift as for the DF case 2 statistic, so the first step in the Engle-Granger method does not have a big influence. This can also be observed in table 6.2, which shows the number of rejections for different values of $\beta$ when $\beta_0 = 0.1$. The power of the Engle-Granger test is rather close to the power of the Dickey-Fuller case 2 test with a small constant, as seen in table 5.11. So it looks like the Engle-Granger test statistic has the same properties as the Dickey-Fuller case 2 test statistic, for small constants.

Figure 6.6: Estimated density for EG test statistic, β0 = 0.1.

Table 6.2: Number of rejections of the null hypothesis, β0 = 0.1.

β         1%    5%    10%
0.9       1000  1000  1000
0.95      733   969   993
0.975     144   444   650
0.99      33    120   240
0.995     25    86    170
1         6     31    62
1.01      0     0     0

Neglecting present constant in the cointegrating relation

As explained before, we like the cointegrating relation not to have a constant, $\alpha_0$.
However, if there is a constant we neglect it; we only fit $y_t$ on $x_t$ and not on a constant. In chapter 4 we saw that if there is a relatively large positive $\alpha_0$, neglecting it results in an overestimation of $\alpha$, which in turn results in a down trend in the spread process. The other way around, neglecting a negative value of $\alpha_0$ results in an up trend in the spread process. Two stock price processes with a trend in their spread process do not form a good pair for our trading strategy. But for a small value of $\alpha_0$ there is not a big trend and the price processes form a good pair, as seen in figure 4.18. It is interesting to see how the Engle-Granger method performs if there is a small $\alpha_0$ but it is neglected.

We can generate cointegrated data where the cointegrating relation has a constant. We generate $y_t$ with:

$$y_t = \alpha x_t + \alpha_0 + \beta_0 + \beta\,(y_{t-1} - \alpha x_{t-1}) + \eta_t, \qquad t = 1, \dots, T.$$

From this equation we can see that with this generating scheme including $\alpha_0$ adds little: it is the same as including a larger value of $\beta_0$. We have already seen what happens for larger values of $\beta_0$ in figure 6.6. But at last we have found a reason to use Dickey-Fuller case 2 instead of case 1! The power of case 1 is practically zero when there is a constant, see table 5.13. Table 6.3 shows this is also true when we perform the preceding step of estimating $\alpha$. The table shows the number of rejections when we use case 1 in the Engle-Granger method and for the 'normal' Engle-Granger method, which uses the case 2 test. When we use the DF case 1 test in the Engle-Granger method instead of the case 2 test, the power is almost zero. The paths were generated with $x_0 = 25$, $T = 500$, $\alpha = 1$, $\sigma_x = 1$, $\sigma_\eta = 1$ and $\beta_0 = 0$. For the value of $\alpha_0$ used to make table 6.3 and $\beta < 1$, we do see $x_t$ and $y_t$ as a good pair, so we would like the Engle-Granger method to see them as cointegrated.

Table 6.3: Number of rejections, α0 = 1.
          Case 1               Case 2
β         1%    5%    10%      1%    5%    10%
0.9       6     51    126      578   835   918
0.95      0     0     3        196   432   577
0.975     0     0     0        139   352   492
0.99      0     0     0        77    256   415
0.995     0     0     0        56    157   264
1         0     0     0        4     14    28
1.01      0     0     0        0     0     0

When performing the Engle-Granger test on real data we do not know if there is a small constant in the cointegrating relation, so from now on we only look at the DF case 2 test. Because we use the DF case 2 test within the Engle-Granger method, this method makes the following assumptions:

• Processes $x_t$, $y_t$ are integrated of order one.
• Spread process $y_t - \alpha x_t$ is an AR($p$) process.
• There is no constant in the spread process.

So far, we have seen that when all assumptions of the Engle-Granger method are fulfilled, the Engle-Granger test statistic has the same distribution and power properties as the DF case 2 test statistic. In other words, the first step of estimating $\alpha$ does not have an influence. We have seen that when there is a constant in the spread process, so that not all assumptions are fulfilled, the distribution makes a limited shift to the right. By limited we mean that the Engle-Granger statistic does not converge to the DF case 3 statistic, as the DF case 2 statistic does when the constant increases. Last, we have seen that when there is a constant in the cointegrating relation it is better to use the DF case 2 test within the Engle-Granger method instead of the DF case 1 test. In the next section we examine what happens when the price processes $x_t$ and $y_t$ are not strictly integrated of order 1.

6.2 Engle-Granger simulation with stock price model

In section 4.2 we derived a stock price model which is commonly used for the valuation of options. This model is more realistic than the random walk from the preceding section. Although with this model the assumptions of the Engle-Granger method are not completely satisfied, it is interesting to find out if the method performs the same.
The approach for simulating price processes $x_t$ and $y_t$ is the same as in the preceding section, only the paths for $x_t$ are simulated with the stock price model instead of random walks:

$$x_t = x_{t-1} + \mu\,\delta t\,x_{t-1} + \sigma\sqrt{\delta t}\,u_t\,x_{t-1}, \tag{6.4}$$

where the $u_t$ are i.i.d. $N(0, 1)$. Then $x_t$ is not exactly integrated of order 1: there is an upward drift $\mu$, so the expectation of the differences is not constant. We look at small values of $\mu$ and at a finite sample size $T = 500$, so $x_t$ is almost integrated of order 1. By simulating many paths for $x_t$ and corresponding $y_t$ we are going to see if this affects the Engle-Granger method. We again simulate $y_t$ such that the spread process is AR($p$) and, to fulfill the remaining assumption of the method, we do not include a constant $\beta_0$ in the spread process. For $p = 1$ the results of the simulations are the same as in figures 6.3 through 6.5 and table 6.1, which is why they are not displayed. It looks like the Engle-Granger method is not sensitive to $x_t$ not being exactly integrated of order 1. In this section we consider the situation when the spread process is an AR(2) process.

First we need to simulate $x_t$. Typical values of the drift parameter $\mu$ are between 0.01 and 0.1, and of the volatility $\sigma$ between 0.05 and 0.5, when we measure time in years. We would like to simulate daily stock prices, so we take $\delta t = 1/260$ because there are roughly 260 trading days a year. We want to generate $y_t$ such that the spread process $y_t - \alpha x_t$ is AR(2), for some $\alpha$:

$$y_t - \alpha x_t = \varepsilon_t = \beta \varepsilon_{t-1} + \beta_1 \Delta\varepsilon_{t-1} + \eta_t, \tag{6.5}$$

where the $\eta_t$ are i.i.d. $N(0, \sigma_\eta^2)$ variables. Then we can generate $y_t$ as follows:

$$y_t = \alpha x_t + \beta\,(y_{t-1} - \alpha x_{t-1}) + \beta_1 \Delta\varepsilon_{t-1} + \eta_t, \quad t = 2, \dots, T, \qquad y_0 = \alpha x_0, \quad y_1 = \alpha x_1, \tag{6.6}$$

where we use within each step:

$$\Delta\varepsilon_{t-1} = (y_{t-1} - \alpha x_{t-1}) - (y_{t-2} - \alpha x_{t-2}).$$

We take $\sigma_\eta^2 = 0.1$, because if we take the variance of $\eta$ equal to 1 then $y_t$ is much more jagged than $x_t$, and we are trying to model the price processes more realistically. Then $x_t$ and $y_t$ are cointegrated if $\beta < 1$.
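As an illustration of equations (6.4) and (6.6), the following sketch simulates a price path with the stock price model and then builds a second path whose spread is AR(2). The function names and default arguments are hypothetical choices for this example; the defaults mirror parameter values mentioned in this section.

```python
import numpy as np

def simulate_stock(T=500, x0=25.0, mu=0.05, sigma=0.20, dt=1 / 260, rng=None):
    """Discretised stock price model (6.4):
    x_t = x_{t-1} + mu*dt*x_{t-1} + sigma*sqrt(dt)*u_t*x_{t-1}."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(T)
    x = np.empty(T + 1)
    x[0] = x0
    for t in range(1, T + 1):
        x[t] = x[t - 1] * (1.0 + mu * dt + sigma * np.sqrt(dt) * u[t - 1])
    return x

def generate_y_ar2(x, alpha, beta, beta1, var_eta=0.1, rng=None):
    """Generate y_t such that y_t - alpha*x_t follows the AR(2) recursion (6.6)."""
    rng = np.random.default_rng() if rng is None else rng
    T = len(x) - 1
    eta = rng.normal(0.0, np.sqrt(var_eta), T + 1)
    y = np.empty(T + 1)
    y[0] = alpha * x[0]                      # the spread starts at zero
    y[1] = alpha * x[1]
    for t in range(2, T + 1):
        eps_1 = y[t - 1] - alpha * x[t - 1]  # epsilon_{t-1}
        eps_2 = y[t - 2] - alpha * x[t - 2]  # epsilon_{t-2}
        y[t] = alpha * x[t] + beta * eps_1 + beta1 * (eps_1 - eps_2) + eta[t]
    return y
```

With `beta = 1` the generated pair satisfies the null hypothesis of no cointegration; any `beta < 1` gives a cointegrated pair, for example `generate_y_ar2(simulate_stock(), alpha=0.5, beta=0.95, beta1=-0.1)`.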
Again we estimate the density of the Engle-Granger test statistic by simulating for different values of $\alpha$, in the same way as in the preceding section. The results are shown in figure 6.7 for $T = 500$, $x_0 = 25$, $\mu = 0.05$, $\sigma = 0.20$, $\beta_1 = -0.1$ and of course $\beta = 1$. The density of the Engle-Granger test statistic is again comparable with the density of the Dickey-Fuller case 2 statistic.

Figure 6.7: AR(2) estimated density for EG test statistic, α = 0.1, 0.5, 1.

Table 6.4 shows the number of rejections for three different values of $\beta_1$.
Compared to table 5.14, which shows the corresponding power of the Dickey-Fuller case 2 test, the power of the Engle-Granger method has not declined. It seems that the Engle-Granger method performs the same for data that is not exactly integrated of order one as for data that is.

Table 6.4: Number of rejections, AR(2) and β0 = 0.

          β1 = −0.25           β1 = −0.1            β1 = 0.1
β         1%    5%    10%      1%    5%    10%      1%    5%    10%
0.9       987   1000  1000     999   1000  1000     1000  1000  1000
0.95      477   834   945      611   925   986      825   986   998
0.975     98    337   505      139   404   617      197   569   756
0.99      15    98    176      24    126   234      30    150   262
0.995     12    66    143      12    65    144      21    97    163
1         8     61    111      11    53    101      13    50    104
1.01      1     2     5        0     2     3        1     4     4

6.3 Engle-Granger with bootstrapping from real data

So far we have simulated paths $x_t$ and $y_t$ from scratch to find the critical values of the Engle-Granger method. In this section we build paths $x_t$ and $y_t$ by bootstrapping from real data. The data are the ten pairs of stocks that IMC provided. First we describe the bootstrap procedure and then we look at some results for the ten pairs.

Bootstrap procedure

Assume we have a pair that consists of two stock price processes $x_t$ and $y_t$, for $t = 0, \dots, T$, which are integrated of order one. Let us assume further that there exists an $\alpha$ such that $y_t - \alpha x_t$ follows an AR($p$) process:

$$y_t - \alpha x_t = \varepsilon_t = \beta_0 + \beta \varepsilon_{t-1} + \beta_1 \Delta\varepsilon_{t-1} + \dots + \beta_{p-1} \Delta\varepsilon_{t-p+1} + \eta_t. \tag{6.7}$$

The null hypothesis of no cointegration against the alternative that there is cointegration between $x_t$ and $y_t$ can be formulated as

$$H_0: \beta = 1 \quad \text{against} \quad H_1: \beta < 1.$$

The first step in the bootstrap procedure is to estimate $\alpha$ with OLS, which results in $\hat\alpha$. Then we can calculate the spread process:

$$e_t = y_t - \hat\alpha x_t, \qquad t = 0, \dots, T,$$

which resembles the true spread process $\varepsilon_t$ that is assumed to follow an AR($p$) process. In the preceding sections we knew the value of $p$, but now we do not, since we are working with real data. The second step is to estimate $p$ with the information criteria described in chapter 3, which results in $\hat p$.
The third step is to estimate the coefficients of the AR($\hat p$) model with linear regression, which results in $\hat\beta, \hat\beta_0, \hat\beta_1, \dots, \hat\beta_{\hat p - 1}$. Then we can calculate the residuals:

$$n_t = e_t - \hat\beta_0 - \hat\beta e_{t-1} - \hat\beta_1 \Delta e_{t-1} - \dots - \hat\beta_{\hat p - 1} \Delta e_{t-\hat p+1}, \qquad t = \hat p, \dots, T,$$

which resemble the true residuals $\eta_t$ that are assumed to be white noise. The fourth step is to calculate the test statistic for the real data,

$$S = \frac{\hat\beta - 1}{\hat\sigma_{\hat\beta}},$$

where $\hat\sigma_{\hat\beta}$ is the standard error of $\hat\beta$.

Now we are ready to build a new path $y_t^*$ that belongs to the original $x_t$. This is done in the following way:

$$y_t^* = \hat\alpha x_t + \varepsilon_t^*, \qquad t = \hat p, \dots, T,$$

where $\varepsilon_t^*$ is built under the null hypothesis, that is $\beta = 1$ and $\beta_0 = 0$:

$$\varepsilon_t^* = \varepsilon_{t-1}^* + \hat\beta_1 \Delta\varepsilon_{t-1}^* + \dots + \hat\beta_{\hat p - 1} \Delta\varepsilon_{t-\hat p+1}^* + \eta_t^*,$$

where each $\eta_t^*$ is drawn uniformly at random from the residuals $n_t$. We initialize the new path by:

$$\varepsilon_i^* = y_i - \hat\alpha x_i, \qquad i = 0, \dots, \hat p - 1.$$

We treat the new pair $\{x_t, y_t^*\}$ in the same way as the original pair $\{x_t, y_t\}$. That is, we calculate $\hat\alpha^*$ and the spread process $e_t^* = y_t^* - \hat\alpha^* x_t$, which should follow an AR($\hat p$) process. Then we estimate the coefficients of this AR($\hat p$) process and calculate the test statistic $S^*$:

$$S^* = \frac{\hat\beta^* - 1}{\hat\sigma_{\hat\beta^*}}.$$

By building many new paths $y_t^*$ and calculating the corresponding test statistic $S^*$, we can estimate the density of these bootstrapped test statistics. Then we can see if the test statistic of the real pair is exceptional. The estimated density should also give an indication of the critical values of the Engle-Granger method.

Results

The ten provided pairs are named pair I, pair II, ..., pair X. We start with a pair for which all three information criteria indicate that the spread process is AR(1), pair II. By spread process we mean the residuals from the first
This is not necessarily the spread we trade, in chapter 2 we discussed the adjustment parameter κ which can result in a different spread. With pair II we will find κ = 0, so the spreads for this pair look the same. In pair trading, this is as good as it gets: we have a large number of trades, we never have a position for a long time and the risk of the two stocks walking away from each other is minimal because they are in fact the same. 0.1 0.0 −0.1 . .. ... ... . ... . ... .... ....... .. . .. .. .... .... ...... .. ... .. . ..... .. ... ..... ......... ... ......... ............. .. .... ...... .. .. .... .. . . . . . . . . . . . . . . . . . . . . . . . . . . . .... .. .. . .. .. . . . . . . . ... . ...... ...... .. . ... . . ..... . ... ........ ... ..... ......... ... ..... ......... . ............... ..... ............................ ... ... ..... ... . ...... .. .. ... ..... .. .. ....... . .. ... .. ... .. ..... . ... ... ... .......... ............................. ....... ....... ................................... .................................... ......... .. .... ..................... ............ ................. .... .. ................ ............... ... ........ ........ ....... ........... .... .... ....... .......... .......... ............... ..... ........ ........ ........ ....... .................. ... ............... ...... .................. ....................................... ............ ........................................................................ ............................... ............ .... .............................. ............ ............. ............... .......... ..................... .... ........................ ............. .... ..... ................ .......... ..... ......... ........ ..... . ................. ...... .............................. ............. ............ ... ........................... ... .................................... ........................ 
Figure 6.8: Spread process pair II.

The spread series looks stationary, and according to the Engle-Granger method the two stocks are cointegrated. The test statistic is -17.5, well below the 1% critical value of -3.44, so the null hypothesis of no cointegration is rejected. Applying the bootstrap procedure to this data set gives figure 6.9. The dashed line is the density of the Dickey-Fuller case 2 test statistic. The figure gives no indication that the density of the Engle-Granger test statistic differs from that of the Dickey-Fuller case 2 statistic.

Figure 6.9: Estimated density for EG test statistic by bootstrapping from pair II.

Let us consider a pair for which all information criteria say that the spread process is AR(2), pair VII. The spread process is shown in figure 6.10. It does not look as good as figure 6.8, but this is still a good pair.
According to the Engle-Granger method the stocks in this pair are cointegrated; the test statistic is -4.65. The bootstrap procedure results in figure 6.11. The estimated density coincides with the density of the Dickey-Fuller case 2 statistic.

Figure 6.10: Spread process pair VII.

Figure 6.11: Estimated density for EG test statistic by bootstrapping from pair VII.

Let us consider a pair for which all information criteria indicate that the spread process is AR(3), pair VI. The spread process is shown in figure 6.12. This looks a lot less interesting than the previous figure: initially the spread is below zero for a long time, and at the end it is above zero for a long time. Trading this spread would have resulted in only a few trades, and we would have held the same position for a long time. However, this is not necessarily the spread we actually trade; in the next chapter we will see the spread we would really have traded. According to the Engle-Granger method the stocks in this pair are not cointegrated: the test statistic is 2.23, and the null hypothesis of no cointegration is not even rejected at the 10% level. The bootstrap procedure results in figure 6.13. The estimated density is a bit bumpy but still coincides with the density of the Dickey-Fuller case 2 statistic. Even when the real data are not cointegrated according to the Engle-Granger method, the bootstrap procedure finds nearly the same density as that of the Dickey-Fuller test statistic.

So far we have seen pairs for which all information criteria find the same small value of p.
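The order selection referred to here can be sketched as follows: fit an AR(k) model for k = 1, ..., K and keep the k for which a criterion is lowest. This is only a minimal illustration under assumptions: the function names `fit_ar_rss` and `select_ar_order`, the common estimation sample, and the textbook Gaussian AIC and BIC formulas are illustrative choices and need not match the criteria defined in chapter 3.

```python
import numpy as np

def fit_ar_rss(s, k, K):
    """OLS fit of an AR(k) model with intercept on the common sample t = K, ..., n-1.

    Returns the residual sum of squares and the number of observations used.
    (Hypothetical helper, not taken from the thesis.)
    """
    s = np.asarray(s, dtype=float)
    y = s[K:]
    # Column j holds the series lagged by j periods, aligned with y.
    cols = [np.ones(len(y))] + [s[K - j:len(s) - j] for j in range(1, k + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), len(y)

def select_ar_order(s, K=10):
    """Return the order k in 1..K that minimises AIC and BIC respectively."""
    scores = {"aic": [], "bic": []}
    for k in range(1, K + 1):
        rss, m = fit_ar_rss(s, k, K)
        sigma2 = rss / m
        n_params = k + 1  # k lag coefficients plus the intercept
        scores["aic"].append(m * np.log(sigma2) + 2 * n_params)
        scores["bic"].append(m * np.log(sigma2) + np.log(m) * n_params)
    return {name: int(np.argmin(vals)) + 1 for name, vals in scores.items()}

# Usage on a simulated AR(2) series (simulated data, not one of the IMC pairs).
rng = np.random.default_rng(0)
e = rng.standard_normal(600)
s = np.zeros(600)
for t in range(2, 600):
    s[t] = 0.5 * s[t - 1] + 0.3 * s[t - 2] + e[t]
orders = select_ar_order(s, K=10)
print(orders)
```

Using one common sample for all k keeps the criterion values comparable across orders.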
IMC also provided a pair for which the information criteria find p to be very large, pair V. As described in chapter 3, we fit an AR(k) model, for k = 1, ..., K, to the data and see for which k the criteria have the lowest values. For this pair, even if we set K = 100 the criteria have the lowest value for p = K. This indicates that the spread process does not follow an AR(p) model. The spread process is shown in figure 6.14. It is obvious that this 'pair' is not suitable for pairs trading. The Engle-Granger method does not reject the null hypothesis of no cointegration: the test statistic is -1.04 when p = 10 and 0.63 when p = 100. To apply the bootstrap procedure, we set p = 10. The result is shown in figure 6.15, which coincides surprisingly well with the density of the Dickey-Fuller case 2 test statistic. We examine these and the remaining pairs further in chapter 7.

Figure 6.12: Spread process pair VI.

Figure 6.13: Estimated density for EG test statistic by bootstrapping from pair VI.

In this chapter, we have seen no reason to assume that the test statistic of the Engle-Granger method has a different distribution than the Dickey-Fuller case 2 test statistic. The power of the two tests is also comparable. To find out if the Engle-Granger method is also 'robust', we apply the method to generated data which do not fulfill the null hypothesis.

Figure 6.14: Spread process pair V.

Figure 6.15: Estimated density for EG test statistic by bootstrapping from pair V.

6.4 Engle-Granger simulation with alternative method

In section 4.5 we found a method for generating cointegrated data which do not satisfy the assumptions of the Engle-Granger method. The generated data are integrated of order one, but the spread process is not likely to follow an AR(p) process. It is interesting to find out if the Engle-Granger method is robust enough to see these data as cointegrated.
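As a reminder of what is being put to the test in this section, the two steps of the Engle-Granger procedure can be sketched as follows: regress y_t on x_t, then apply a Dickey-Fuller type regression with a constant to the resulting spread. This is a hedged sketch, not the code used for the tables: the helper names `ols` and `engle_granger_stat`, the fixed lag order p, and the exact form of the test regression are assumptions. The only critical value used is the 1% value of -3.44 quoted earlier in this chapter.

```python
import numpy as np

def ols(X, y):
    """Least-squares fit; returns coefficients, residuals and standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, resid, se

def engle_granger_stat(x, y, p=1, with_constant=False):
    """Two-step Engle-Granger statistic (illustrative helper, assumed form).

    Returns the t-statistic of the lagged spread level and the coefficients of
    the cointegrating regression: [alpha], or [alpha_0, alpha] with a constant.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 1: cointegrating regression of y on x (optionally with a constant).
    X = np.column_stack([np.ones_like(x), x]) if with_constant else x[:, None]
    beta, spread, _ = ols(X, y)
    # Step 2: regress the differenced spread on a constant, the lagged spread
    # level and p - 1 lagged differences (an AR(p) model for the spread).
    ds = np.diff(spread)
    m = p - 1
    dep = ds[m:]
    cols = [np.ones(len(dep)), spread[m:-1]]
    cols += [ds[m - j:len(ds) - j] for j in range(1, m + 1)]
    coef, _, se = ols(np.column_stack(cols), dep)
    return coef[1] / se[1], beta

# Usage on simulated data: x is a random walk and y = 0.5 x + stationary noise.
rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(500))
y = 0.5 * x + rng.standard_normal(500)
stat, beta = engle_granger_stat(x, y, p=2)
print(stat, beta)
print("reject at 1%:", stat < -3.44)  # -3.44: 1% critical value quoted in the text
```

The `with_constant` switch corresponds to including a constant in the cointegrating regression.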
We will generate data such that the difference process z_t follows an MA(2) model:

\[
z_t = \begin{pmatrix} x_t - x_{t-1} \\ y_t - y_{t-1} \end{pmatrix}
    = \Theta_2 w_{t-2} + \Theta_1 w_{t-1} + \Theta_0 w_t ,
\]

where w_t is i.i.d. N_2(0, Σ) and Θ_0 = I. Then x_t and y_t are cointegrated if the matrix (Θ_2 + Θ_1 + Θ_0) has eigenvalue zero, and the corresponding eigenvector is the cointegrating relation, which we normalize to [−α 1]′. For example, the matrix

\[
\begin{pmatrix} 4 & 2 \\ -2 & -1 \end{pmatrix}
\]

has eigenvalue zero with eigenvector [−1/2 1]′. So one possibility to generate cointegrated x_t and y_t is

\[
\Theta_2 = \begin{pmatrix} 2 & 1 \\ -2 & -4 \end{pmatrix}, \qquad
\Theta_1 = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}, \qquad
\Theta_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
\]

There are no restrictions on the covariance matrix Σ of the innovations w_t, except that it is a covariance matrix, so it must be symmetric and positive semidefinite. We start with a diagonal matrix, so the innovations are independent. For Σ = cI, table 6.5 shows the number of rejections of the Engle-Granger test for 1,000 different paths x_t and y_t. Although the spread process in this section is not of autoregressive form, the Engle-Granger method fits an AR(p) model to the spread process. The value of p is again estimated with the information criteria from chapter 3; the maximum value of p, K, was set equal to 10. The table also shows the average of the estimated values of p, denoted p̄. Unfortunately, the Engle-Granger method does not perform very well. On average it sees only 12% of the generated paths as cointegrated at the 10% level. The average of the estimated p values is high, which means that it is difficult to fit a good AR(p) model to the spread process of the data. This is not strange, because the spread process does not follow an AR(p) model.

Table 6.5: Number of rejections, Σ = cI.

c      1%    5%    10%    p̄
2      47    89    147    9.9
1      25    77    125    9.6
0.5    24    61    116    9.1
0.1    11    49    102    7.6

Now consider the situation in which the innovations are correlated. We take Σ of the form

\[
A = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.
\]

Table 6.6 shows the number of rejections of the Engle-Granger test for different values of ρ. Even for ρ = 1 the Engle-Granger method does not perform well.

Table 6.6: Number of rejections, Σ = A.

ρ      1%    5%    10%    p̄
1      34    86    140    9.8
0.5    31    67    134    9.7

To see what happens, figure 6.16 shows the spread process of a realization of x_t and y_t. This does not look stationary: there seems to be a trend in the spread process. This could mean that with this setting there is a constant, α_0, in the cointegrating relation. Figure 6.17 shows the spread process if we regress the same realization of y_t on the same x_t and a constant. It is clear that there is a constant α_0 in the cointegrating relation. Neglecting the constant results in a spread process which is not stationary. That is why the Engle-Granger method does not reject the null hypothesis of no cointegration. Table 6.7 shows the number of rejections when we do not neglect α_0 and take Σ = cI. The maximum value of p is set to 5 in the remainder of this section, to reduce computation time. It is clear that the Engle-Granger method now performs very well: almost every path is seen as cointegrated (which is true).
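The data-generating scheme of this section can be sketched as follows, using the first parameter setting (Θ_2, Θ_1, Θ_0 as given for this setting). The sketch first checks the eigenvalue condition numerically and then cumulates the MA(2) differences into price paths. The function name `generate_paths` and the starting levels `start` are assumptions: the starting values of x_t and y_t are not stated in this section.

```python
import numpy as np

# First parameter setting of this section.
theta2 = np.array([[2.0, 1.0], [-2.0, -4.0]])
theta1 = np.array([[1.0, 1.0], [0.0, 2.0]])
theta0 = np.eye(2)

# Eigenvalue condition: the sum must map the eigenvector [-1/2, 1]' to zero.
total = theta2 + theta1 + theta0
v = np.array([-0.5, 1.0])
print(total)      # [[ 4.  2.] [-2. -1.]]
print(total @ v)  # [0. 0.]

def generate_paths(n, sigma, rng, start=(0.0, 0.0)):
    """Generate x_t, y_t whose differences follow the MA(2) model above.

    `start` (initial levels) is a hypothetical parameter, not taken from the text.
    """
    # n + 2 innovations, so that w_{t-2} exists for the first difference.
    w = rng.multivariate_normal(np.zeros(2), sigma, size=n + 2)
    # Row t of z is Theta_0 w_t + Theta_1 w_{t-1} + Theta_2 w_{t-2}.
    z = w[2:] @ theta0.T + w[1:-1] @ theta1.T + w[:-2] @ theta2.T
    levels = np.asarray(start, dtype=float) + np.cumsum(z, axis=0)
    return levels[:, 0], levels[:, 1]

# Usage: one path with independent innovations, Sigma = c I with c = 1.
rng = np.random.default_rng(0)
x, y = generate_paths(500, 1.0 * np.eye(2), rng)
print(x[:3], y[:3])
```

Repeating the last step 1,000 times and applying the test to each path gives rejection counts of the kind reported in the tables; the counts themselves depend on the random draws and on the test implementation.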
Figure 6.16: Realization spread setting 1, ρ = 0.5.

Figure 6.17: Realization spread setting 1, with α_0.

Table 6.7: Number of rejections, with α_0 and Σ = cI.

c      1%      5%      10%     p̄
2      1000    1000    1000    4.7
1      1000    1000    1000    4.5
0.5    999     1000    1000    4.3
0.1    1000    1000    1000    3.5

So far, we have generated cointegrated data, but not in the way we want it to be cointegrated, which is data with a small or no constant α_0. We look at a different setting of parameters:

\[
\Theta_2 = \begin{pmatrix} 1 & -1 \\ -1 & 0 \end{pmatrix}, \qquad
\Theta_1 = \begin{pmatrix} -1 & 2 \\ 2 & 0 \end{pmatrix}, \qquad
\Theta_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
\]

The matrix (Θ_2 + Θ_1 + Θ_0) has eigenvalue zero with eigenvector [−1 1]′. Figure 6.18 shows a realization of the spread process when y_t is regressed only on x_t and not on a constant. In other words, we neglect a possible α_0.
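Reading the parameter matrices of this second setting as Θ_2 = [1 −1; −1 0], Θ_1 = [−1 2; 2 0] and Θ_0 = I (an assumption about how the display is to be read), the eigenvalue claim can be verified by direct summation:

```latex
\[
\Theta_2 + \Theta_1 + \Theta_0
= \begin{pmatrix} 1 & -1 \\ -1 & 0 \end{pmatrix}
+ \begin{pmatrix} -1 & 2 \\ 2 & 0 \end{pmatrix}
+ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},
\qquad
\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} -1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]
```

Under this reading, the normalization [−α 1]′ gives α = 1, so the spread is simply y_t − x_t.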
Figure 6.18: Realization spread setting 2, ρ = 0.5.

Neglecting α_0 with this setting is no problem; the spread process seems to be stationary. We want the Engle-Granger method to see the corresponding x_t and y_t as cointegrated. Tables 6.8 and 6.9 show the number of rejections for Σ = cI and Σ = A respectively. The power of the Engle-Granger test is good: the null is rejected almost every time.

The Engle-Granger method performs very well. Even when the spread process does not follow an AR(p) model, the test behaves exactly how we want it to. If there is a large constant α_0 in the cointegrating relation, it does not reject the null hypothesis of no cointegration. Although the data are cointegrated, they are not cointegrated the way we want, that is, with a small or no α_0. If there is a small or no constant in the cointegrating relation, the test rejects the null hypothesis almost every time.

Table 6.8: Number of rejections, setting 2, Σ = cI.

c      1%     5%     10%    p̄
2      946    980    992    4.83
1      974    988    997    4.75
0.5    981    992    995    4.60
0.1    995    998    999    3.95

Table 6.9: Number of rejections, setting 2, Σ = A.

ρ      1%     5%     10%    p̄
1      991    996    999    4.7
0.5    970    991    996    4.6

Chapter 7

Results

In this chapter the results for the ten pairs IMC provided are discussed. To be clear, IMC provided 2 years of historical closing prices for each stock of the ten pairs. According to IMC, among these ten are some very good pairs, which means they make high profits. Some are losing money and some are mediocre. In the first section we apply the trading strategy to the historical data to see which pairs would have been profitable, and put the pairs in order of profitability.
We would like to see whether the stocks in a profitable pair are cointegrated, and whether the stocks in a pair that loses money are not cointegrated; in other words, whether profitability and cointegration coincide. We apply two different cointegration tests, the Engle-Granger and the Johansen method, but first we examine in the second section whether the assumption that the price processes are integrated of order 1 is fulfilled. In the third and fourth sections the results for the Engle-Granger and the Johansen method, respectively, are stated, and the pairs are put in order of the levels of rejection of the cointegration tests.

7.1 Results trading strategy

The 10 pairs are named pair I, pair II, ..., pair X. In chapter 2 the trading strategy was explained. For each pair, we need the first half of the observations to determine the parameters of the strategy, and we apply the strategy to the second half. In order to compare the results, we trade the same amount of money with each pair. With each trade we buy one stock for the amount of € 10,000 and sell the other for roughly the same amount. The sell trade is not exactly € 10,000 because of the positive or negative 'investment' of threshold Γ, as explained in section 2.3. The profits are shown in table 7.1. The traded spread processes of the 10 pairs are shown in figure 7.1; these are the spreads with the adjustment ratio if present. The upper left corner is the spread for pair I, the upper right corner for pair II, and so on. To be clear, the spread of the second half of the observations is displayed, and this is the spread which is traded. The dashed lines are the corresponding thresholds Γ.

Table 7.1: Results trading strategy.

        parameters         result
pair    Γ        κ         # trades    profit
I       2.33     5         3           1129
II      0.02     0         25          5536
III     0.16     2         4           506
IV      0.77     5         7           1344
V       19.68    8         0           0
VI      0.48     1         4           495
VII     0.13     1         11          2293
VIII    0.30     2         10          2091
IX      0.12     2         4           141
X       0.30     2         12          2304

Even the highest profit may look a bit small, but recall that we do not have to invest a lot of money.
On the other hand, to lose the same amount as the highest profit, the two stocks have to walk 50% away from each other in the wrong direction, which has little chance of occurring. Profits above € 1,000 are considered good enough to trade; profits below € 1,000 are considered not to be worthwhile. But profit is not the only criterion; the number of trades is also important. Obviously, the more trades the higher the profit. But this is not the only reason: in chapter 2 it was explained that traders do not want to hold a position for a long time because that involves risk, and the number of trades is an indication of this. According to IMC, pair IV is still a good one. We get exactly the same selection of good and bad pairs as IMC if we set the minimal number of trades equal to 7. IMC had already decided, based on trading experience, which of the 10 pairs are good and which are not before providing the data. A pair is considered good enough to trade if the profit is
.......................... ....... ....... ....... ....... ....... ....... ....... .. .. ... .. . ... .. ........ ...... .... .... ...... .. ...... .. ...... ......... ...... ........ ........... ........ ... ....... ... ....... ... ... ..... .... .. . .. . .. ...... . . . . . . ............................. ..... . . . . . . ........ ............. ........ .......... ................................... ....... ....... .................... ....... ....... .......... .. ... ......... ........ . .......... . . . . . . . ....... . . ....... ....... ................ ............ ........... ....... ....... ....... ....... ........... .............. ....... ......................... .. ......... .......... ... ... ................ .... ... ......... ... .............. ... .... ..................... . ...... .... ........... ... 300 1 0 −1 10 0 −10 1 0 −1 0.5 0.0 −0.5 300 400 400 400 500 400 500 ...... . . .. ............................. ....... ....... ....... ....... ....... ....... ........ ........................ ....... ....... .. . ....... ..... ...... .. ... . ...................... ........ ....... .. ... ... . .. ...... ............ ...... ......... . ....... . . . . . . . . . . . . . . ....... ....... ...................................... ....... ....... .......... ........ .............. .............. ...................................... .. ... .... .............. . ... ......... ... .. ..... .. . . ..... ........ ................. .. ..... ..... .... .... .... .... .. ............. ...... ...... .. .. .. ...... ..... .. ... .... ... .. .................... ......................... .............. ..... .. 300 400 −0.1 300 2 0 −2 1 0 −1 500 . ...... . ... .. ... .......... . ........... .... .. .. .. ........................ .......... ....... .... ... ... ...... . . . . . . . . . . . . . . . . ...... . .... . . ..... .... . .. . . .................. ... ... .... ... .......... ................ ..... ..... .. ... .. ........... ............. ....... 
................................................................... ....... ....... ................................. .................. ................................... . .. .. ........... ... ......... ......... .. . . .... ... .................. . ... ... .......... ...... ...... . .. .. . ... . ...... ...... ....... ............ ............. ....... .......... ................................. ........................................... ......... ....... ................ ......... .. .... ...... .. . ... .. ..... . .. ...... .... ..... . . . . . . . . . . . ....... . . . . . . . . . . . . . . ............ ........... .. ........ .. . . ...... . . . . . .... ... .. ..... . . . . . . .... ... ... ..... ............ .. ..... 300 0.0 500 . .. ...... ... .... .. ...... ..... . ......... ... ............. ......... . . .... .............. ....... ... ....... . . . . . .. ... ... ... . .... .. .. .. ......... ................. ... ..... .. ... .... .. ............... ... ... .. ... ... .... ... .. ...... ...... ........ ........................... ... .. .... .... .. ..... ................. . ................... ...... .... ......... ............................. . . . . ... . ........ .. . . ....... .. ....... ... ... ... ... . ..... ............ .............. ........ .. ..... ...................... .......... .... .. ... .. .. .. .. .... .. ..... ...................... ......... ...... ...... ......... .... .. . .. .. ... . .. ........... .......... ..... .............. ........ ........ ........ ..... ... ... ..... .. .. ... ... . . . . . . ...... .. ... ... .... .... . . ... .. ........... ........ .. ....... ... ..... .. ... .. ................. . ... ... ............ ...... ........ ....... ... ....... ... ............ ...... ... .......... .... .. .. .. ... ................ .. ...... ..... .. ..... .. ... .. . .............. ... ..... .......... ... ........ .... ................ ... ..... .......... ... .......... ... ........... ......... ........ . . . 
. ...... ............. ...... . ...... ...... ........ ...... ...... ... .... ... ... .. .. 300 0.1 . ... ... .... .. ... . ....... .. ..... . ... . ..... .. ...... .. ..... ............... ........................ .............. .. ............ ..................... ....... . . .............. ................ ............................... .................................. .......... ....... ... .......... ....... ....... ....... .............. .. .. ............. ........................... ......................... ............. ... .................................... .... .......... . . ....................... . .. ........ . ... ...... ........................ ...... .............. ............. ........ ....................... ......... .. ............................ ........ ... ................. ... ... . ............... . . . ................ ............ ....... ........................................... ............................ ....................... .............................................................. .................................................................................................................................. .. ......... ..... ..... ... .. .. ......... ............................ ...... ....... ...................................... ..................................................... .. ........ ....... ...... ... .... ...... .. ........... ............................. .... .... ... ... .... . ...... . .. ..... ....... .......... ...... ........................ ... .. ... ...... .... ...... . .. .. ... . ..... ..... ........ .... ... . ... .. 0 −1 2 0 −2 −4 500 300 400 500 400 500 ... ..... .. ..... .... .... .... .... ....................... ........ .. ........ .. ...... .. ... .. .............. ... ............ .. .... ........... ... ........ .... ... ...... ... ........ . ..... . . . . . . . . ... ..... ... ...... ........ ....... .............. .... ....... .. ............ ............ .. ......... 
.... ........ ... . . ..... .. .. ........ ............... ....... ....................... ....................................... ....... ....... .................................... ....... .............. .... .......... ......... .............. .. ...... ... ...... .. .. .... .. ....... .. ... ...... .. ........ .. ....... . .. .. .. .... ..... ... ..... ..... ... .. .................... .... ... .... .... . ... ..... ..... ..... ... .. ...... ..... ... . ......................... ...... ... .. . . . . . . . .......... ....... ......... . ...... ....... ....... ....... ....... ....... ....... ....... ....... ........... .. . .. ... ...... .... ..... .. .. .. ..... . ..... .. . . ...... ... . .... ........ ......... ............. . ........ .... ............... . . . ......... .. ........ . ..... ......... ... .. ... .............. ..... ........ ... . .... .. ... ... ................ .. ...... . .. ...... ... ... ........ ... .... . . . ...... ............. ............... ... . . . ... ...... .. .. .... .. . . .... . . .................................. ........................... ....... .................................. .... ....... ....... ........... .............. ....... .......... ......... . . . .. . .. . . ....... .............. ............. ....... ....... ....... ........ ................. ....... ................ .......... ....... ........................ .... .. .. . ..... ... . . . . . ...... ... . . . . . . .. .. . .... . ...... ... .............. ... ... ... ....... ....... ...... ... ...... ... .... . ... ..... . ...... ... .. .... ..... . ....... .. ... ... .. .. .............. .. .. .. . ... ... .. . ..... . . . ... . ... .... ...... ...... ............. ........... ........ ........ ... ... .. 300 400 500 300 400 500 Figure 7.1: Traded spreads. 135 500 . . .. ............ .................. ........... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... .. .. . .... ....... ..... ...... .............. ..... ... 
. ..... .. ... .......... .... ... ..... . ........ ..... ... ........... .. . .. ... ...... . .. ... .................... ..... .............. ........ . . . . .. ... . .. . ... ....... ... ...... .......... ........... .......... ....................... ......................... ..... ............. .. .... ....... ........ ......... .................... ........................ ..... ............ . .... . . . ....... ........................................ ........... ......... .................. ....... ....... ............................... ................... ....... .. .......................................... . .......... .... .... . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......... .. .................. .. ....... ............. . ................ .... ..... ... ..... ... .............. ................... .. ...... ...... .. .. ... ........ ...... ...... ....... .. .................... ......... .... ... .. . . ... ... ... ... .......... .. .... ... ... . ......... .. ............. ... .. .............. .. ....... .. .... ... ....... ................. ... ....... ... ...... .................. .. .......... ..... .. ............... . ....... ..... ............ ......... .... ... ........ ... . .... .. ... 300 1 400 ... ................ ............. ......... .... . .... .. ...... ....... ... ... .. ..... ... ... ... .. ........... .... ....... .... .... . .. . ... . .. . . . ....... .......... ........................................... ............... ............... ............................. ........... ......... ..................... ....... ......... .. ... ........ ... .................... ...... .... ... ...... .......... . ...... ...... ...... .... ..... ... ...... ..... ...... ......... .. .. ........... ...... .. .. .. .... ....... ... ... . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... . . .. ... .... ............ .... ..... .............. .... .. .... .... ........ ... ....... . ... .. .......... ... . ...... 
............ . . ..... ....... ..... .......... .... ... . . . . . . . . . . . . . . .......... ......... ....... ....... ....... .......... ....... ....... ....... ....... ....... .............. ....................... ............. .. .. . ... ..... .. .... .... ..... ... .. .. . ... ... ... ... ......... . ........ . . . above ¿ 1,000 and the number of trades is no less than 7, otherwise the pair is considered not to be worthwhile. The ordering of the 10 pairs based purely on the results from the trading strategy, where the first five are considered good or good enough, the remaining five are not, is: 1 2 3 4 5 6 7 8 9 10 pair pair pair pair pair pair pair pair pair pair II X VII VIII IV I III VI IX V We briefly discuss the spreads from figure 7.1. The spread for pair I is rarely hitting its threshold Γ although the adjustment parameter κ is large. The spread for pair II looks good, but it could be better if we had used κ = 1. After t = 425 the spread is a relatively long time below +Γ, with κ = 1 we would have made a profit of ¿ 6721 in 36 trades. The spreads for pair III, VI and IX are rarely hitting their tresholds Γ, the adjustment parameter κ is small but increasing it does not have a positive effect. For pair III increasing κ to 5, results in a loss of ¿ 2,108. For pair VI the profit gets smaller, while the number of trades increases. The spread for pair IV shows the reason why we use an adjustment parameter, without it this pair would have traded twice with a total profit of ¿ 385. For pair V the threshold Γ is not displayed, because it is 19.68. Lowering the threshold results in a loss when we keep κ = 8, only when we also reduce κ to 1 or zero we get a small profit. The spreads from pair VII, VIII, and IX look good, they hit their thresholds Γ regularly and produce a nice profit. Changing the parameters slightly does not effect the number of trades, it affects the profits slightly. 
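The selection rule of this section can be reproduced directly from table 7.1. The following is a minimal Python sketch; the names `results`, `is_good`, `MIN_PROFIT` and `MIN_TRADES` are ours and only serve the illustration. Sorting by profit within the good and the bad group is our assumption; it happens to reproduce the ordering given above.

```python
# Table 7.1: pair -> (number of trades, profit in euro).
results = {
    "I": (3, 1129), "II": (25, 5536), "III": (4, 506), "IV": (7, 1344),
    "V": (0, 0), "VI": (4, 495), "VII": (11, 2293), "VIII": (10, 2091),
    "IX": (4, 141), "X": (12, 2304),
}

MIN_PROFIT = 1000  # profit must be above this amount
MIN_TRADES = 7     # number of trades must be at least this


def is_good(pair):
    """Apply the rule: profit above 1,000 and no fewer than 7 trades."""
    trades, profit = results[pair]
    return profit > MIN_PROFIT and trades >= MIN_TRADES


# Good pairs first, then by decreasing profit within each group (assumed).
ordering = sorted(results, key=lambda p: (not is_good(p), -results[p][1]))
good_pairs = [p for p in ordering if is_good(p)]

print(ordering)
print(good_pairs)
```

Note that pair I passes the profit criterion but fails on the number of trades, which is why it ends up among the pairs that are not considered worthwhile.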
7.2 Results testing price process I(1)

Both cointegration tests require that the stock price processes x_t and y_t are integrated of order one. Section 4.2 derived that it is reasonable to assume that stock price processes fulfill this requirement, but in this section we perform a unit root test on the stocks of the 10 pairs to check whether it is fulfilled. The unit root test we use is again the (Augmented) Dickey-Fuller case 2 test, which we perform twice. The first test is:

    H0: x_t ∼ I(1) against H1: x_t ∼ I(0).

The outcome should be not to reject H0. The second test is:

    H0: x_t ∼ I(2) against H1: x_t ∼ I(1),

which is equivalent to:

    H0: ∆x_t ∼ I(1) against H1: ∆x_t ∼ I(0).

The outcome of this second test should be to reject H0, which makes it likely that the price processes are I(1). The Dickey-Fuller test fits an AR(p) model to the stock price process; we estimate p with the information criteria from chapter 3 and set the maximum value of p, denoted K, equal to 10. Table 7.2 shows the outcomes of both tests, where we used the following critical values

    1%      5%      10%
    -3.44   -2.87   -2.57

since we have roughly 520 observations, T = 520. The stocks in a pair are denoted with x and y. The test statistic of the first test is stated along with whether the null hypothesis is rejected. The outcome 'not rejected' is denoted with the symbol ¬; otherwise the level of rejection is stated. The average value of the estimated p is also stated, and the results of the second test are stated in the same way. The table shows that it is likely that all stocks from the 10 pairs are integrated of order one.

Table 7.2: Results I(1).
           Test 1                        Test 2
stock      statistic  outcome  p̄        statistic  outcome  p̄
I-x        -1.7       ¬         4        -11        1%        4
I-y        -2.1       ¬         4        -11        1%        4
II-x       -1.4       ¬         1        -12        1%        3
II-y       -1.3       ¬         2        -25        1%        1
III-x      -2.1       ¬         1        -22        1%        1
III-y      -2.2       ¬         1        -26        1%        1
IV-x       -1.4       ¬         1        -24        1%        1
IV-y       -1.5       ¬         1        -22        1%        1
V-x        -1.4       ¬         8         -8        1%        8
V-y        -0.3       ¬        10         -8        1%       10
VI-x       -0.6       ¬         4        -12        1%        4
VI-y       -0.6       ¬         4        -12        1%        4
VII-x      -1.1       ¬         1        -25        1%        1
VII-y      -0.9       ¬         1        -25        1%        1
VIII-x     -1.4       ¬         1        -23        1%        1
VIII-y     -1.5       ¬         1        -23        1%        1
IX-x       -1.4       ¬         1        -25        1%        1
IX-y       -0.9       ¬         1        -25        1%        1
X-x        -0.5       ¬         2        -23        1%        1
X-y        -0.7       ¬         1        -16        1%        2

7.3 Results Engle-Granger cointegration test

In this section we perform the Engle-Granger test on the 10 pairs. We have found no reason to assume that the Engle-Granger test statistic has a different distribution than the Dickey-Fuller case 2 test statistic, so we use the same critical values:

    1%      5%      10%
    -3.44   -2.87   -2.57

We perform the cointegration test on the whole data set, so we have 520 observations per stock. Recall that the profits were determined for the second half of the observations. As stated in section 4.3, the Engle-Granger method is not symmetric: the results can differ between regressing x_t on y_t and the other way around. That is why we perform the Engle-Granger test twice. The results are stated in table 7.3.

Table 7.3: Results Engle-Granger test.

pair   statistic 1  outcome  α̂_1    p̄     statistic 2  outcome  α̂_2    p̄
I       -1.57       ¬        1.53    6      -1.56       ¬        0.65    6
II     -17.54       1%       0.997   1     -17.56       1%       1.003   1
III     -2.21       ¬        0.72    3      -2.21       ¬        1.37    3
IV      -2.67       10%      0.52    1      -2.61       10%      1.91    2
V       -1.04       ¬        5.03   10      -1.08       ¬        0.20    7
VI      -2.23       ¬        1.08    3      -2.24       ¬        0.93    3
VII     -4.65       1%       1.60    2      -4.64       1%       0.63    2
VIII    -2.92       5%       0.53    1      -2.91       5%       1.87    1
IX      -0.63       ¬        2.36    2      -0.90       ¬        0.42    1
X       -3.48       1%       0.72    1      -3.03       5%       1.39    4

We see that there is only one pair for which the outcomes of the two tests differ, pair X. The estimated cointegrating relations are practically the same for all pairs: α̂_1 ≈ 1/α̂_2.
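To illustrate why the two regressions give nearly reciprocal estimates, the following Python sketch runs a simplified Engle-Granger procedure in both directions. It is not the S-PLUS code behind table 7.3: the data are artificial (a random walk x_t and y_t = 2 x_t plus white noise, our assumption), the regression has no constant, and the residuals are tested with a plain Dickey-Fuller case 2 regression without lagged differences instead of the augmented version with estimated p. All function names are ours.

```python
import random


def ols_no_constant(y, x):
    """Least squares slope of y on x without an intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def df_case2_statistic(z):
    """t-statistic of (rho - 1) in the regression z_t = c + rho * z_{t-1} + u_t."""
    prev, cur = z[:-1], z[1:]
    n = len(prev)
    mean_prev, mean_cur = sum(prev) / n, sum(cur) / n
    sxx = sum((a - mean_prev) ** 2 for a in prev)
    sxy = sum((a - mean_prev) * (b - mean_cur) for a, b in zip(prev, cur))
    rho = sxy / sxx
    c = mean_cur - rho * mean_prev
    rss = sum((b - c - rho * a) ** 2 for a, b in zip(prev, cur))
    std_err = (rss / (n - 2) / sxx) ** 0.5
    return (rho - 1.0) / std_err


def engle_granger(y, x):
    """Regress y on x, then apply the Dickey-Fuller test to the residuals."""
    alpha = ols_no_constant(y, x)
    residuals = [b - alpha * a for a, b in zip(x, y)]
    return alpha, df_case2_statistic(residuals)


# Artificial cointegrated pair (assumed data, not one of the 10 pairs).
random.seed(1)
x = [50.0]
for _ in range(519):
    x.append(x[-1] + random.gauss(0.0, 1.0))
y = [2.0 * a + random.gauss(0.0, 1.0) for a in x]

alpha_1, stat_1 = engle_granger(y, x)  # regress y on x
alpha_2, stat_2 = engle_granger(x, y)  # regress x on y

print(round(alpha_1, 3), round(alpha_2, 3), round(alpha_1 * alpha_2, 4))
print(stat_1 < -3.44, stat_2 < -3.44)  # compare with the 1% critical value
```

When the residuals are small compared to the price levels, as in this artificial example, the product of the two slopes is close to one.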
The disadvantage of the Engle-Granger method, its lack of symmetry, therefore does not seem to be very harmful when testing pairs for cointegration. The pairs are put in order of the test statistic: the lower the test statistic, the lower the level of rejection, which is more evidence of cointegration. For example, the Engle-Granger method rejects the null hypothesis for pair II even at the 0.1% level, while for pair VIII it is only rejected at 5%. So there is more evidence that pair II is cointegrated than pair VIII, which is why we prefer pair II.

The ordering of the 10 pairs based on the Engle-Granger method is

 1   pair II
 2   pair VII
 3   pair X
 4   pair VIII
 5   pair IV
 6   pair VI
 7   pair III
 8   pair I
 9   pair V
10   pair IX

where there is evidence of cointegration for the first five pairs and no evidence for the remaining five. This ordering is not exactly the same as the ordering found with the trading strategy, but the two coincide on what is good and what is not. The five pairs that are considered worthwhile trading are cointegrated, and the five pairs that are not worthwhile trading are not cointegrated according to the Engle-Granger method. The first pair is the same in both orderings; this is the pair that consists of two equal stocks listed on different exchanges. In the first half of the ordering, the good pairs, only places 2 and 3 are switched; the others are at the same places. The second halves of the two orderings differ a lot.

7.4 Results Johansen cointegration test

In this section we perform the Johansen test on the 10 pairs. As discussed in section 4.4, we perform three tests:

    Test 1   H0: 0 relations against H1: 2 relations.
    Test 2   H0: 0 relations against H1: 1 relation.
    Test 3   H0: 1 relation against H1: 2 relations.

The critical values for each test are in table 7.4; these are for a sample size of T = 400.
Although the data of the 10 pairs IMC provided consist of 520 observations, these critical values will be used when testing the 10 pairs for cointegration.

Table 7.4: Critical values for Johansen test.

Test    1%       5%       10%
1       16.31    12.53    10.47
2       15.69    11.44     9.52
3        6.51     3.84     2.86

One issue that was not addressed in section 4.4 is how to find p. The Johansen method assumes that the vector process y_t = (x_t, y_t) follows a VAR(p) model. S-PLUS, the program used for all simulations and calculations in this report, has a built-in function called 'ar' which fits a VAR model using the Yule-Walker equations. The function determines the order of the VAR with the Akaike information criterion. This function is used for estimating p. We set the maximum value of p equal to 10 and the minimum value equal to 2, because the first step of the Johansen method is to fit a VAR(p − 1) on the differences ∆y_t. The results of the Johansen test are in table 7.5.

Table 7.5: Results Johansen test.

        Test 1              Test 2              Test 3              Parameters
pair    stat. 1  outcome    stat. 2  outcome    stat. 3  outcome    p̂    α̂
I         6.80   ¬            6.12   ¬          0.68     ¬          2    1.49
II      154.6    1%         153.8    1%         0.82     ¬          2    0.997
III       7.37   ¬            6.56   ¬          0.81     ¬          2    0.71
IV       10.89   10%         10.43   10%        0.46     ¬          2    0.52
V         2.86   ¬            2.79   ¬          0.07     ¬          4    5.44
VI        6.47   ¬            5.91   ¬          0.56     ¬          3    1.08
VII      24.25   1%          23.10   1%         1.15     ¬          2    1.59
VIII     13.84   5%          13.16   5%         0.02     ¬          2    0.52
IX        2.78   ¬            2.68   ¬          0.11     ¬          2    2.55
X        14.98   5%          10.11   5%         1.98     ¬          2    0.72

The Johansen method is symmetric: it makes no difference whether we set y_t = (x_t, y_t) or y_t = (y_t, x_t). The test statistics and the estimated cointegrating relations are exactly the same. We consider the stocks of a pair to be cointegrated if the null hypotheses of the first and the second test are rejected and the null hypothesis of the third test is not rejected. The Johansen method finds the same pairs cointegrated as the Engle-Granger method: pairs II, IV, VII, VIII and X. The levels at which the null hypothesis of no cointegration is rejected are the same.
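The decision rule (reject in tests 1 and 2, do not reject in test 3) can be checked against tables 7.4 and 7.5 with a short Python sketch. The dictionary and function names are ours, and we assume for this illustration that 'rejected' means rejected at the 10% level or better.

```python
# 10% critical values from table 7.4 (sample size T = 400), one per test.
CRITICAL_10 = {1: 10.47, 2: 9.52, 3: 2.86}

# Test statistics from table 7.5: pair -> (stat. 1, stat. 2, stat. 3).
statistics = {
    "I": (6.80, 6.12, 0.68),
    "II": (154.6, 153.8, 0.82),
    "III": (7.37, 6.56, 0.81),
    "IV": (10.89, 10.43, 0.46),
    "V": (2.86, 2.79, 0.07),
    "VI": (6.47, 5.91, 0.56),
    "VII": (24.25, 23.10, 1.15),
    "VIII": (13.84, 13.16, 0.02),
    "IX": (2.78, 2.68, 0.11),
    "X": (14.98, 10.11, 1.98),
}


def is_cointegrated(stats, critical=CRITICAL_10):
    """Tests 1 and 2 must reject H0; test 3 must not reject H0."""
    stat_1, stat_2, stat_3 = stats
    return (stat_1 > critical[1]
            and stat_2 > critical[2]
            and stat_3 <= critical[3])


cointegrated = [pair for pair, stats in statistics.items()
                if is_cointegrated(stats)]
print(cointegrated)
```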
Only for pair X do the results differ a bit, because the Engle-Granger method had two different outcomes: the first test rejected at 1% and the second test at 5%. The Johansen method rejects pair X at 5%. There are no real differences for the cointegrated pairs; the estimated cointegrating relations are also practically the same. The biggest difference is for pair VIII, where the Engle-Granger method estimates α equal to 0.5338 and the Johansen method 0.5231. The two methods differ more for pairs that are not cointegrated, where the differences between the estimates of α are larger. But according to these methods those pairs are not cointegrated, so there does not exist an α such that y_t − αx_t is stationary.

The ordering of the 10 pairs based on the Johansen method is

 1   pair II
 2   pair VII
 3   pair VIII
 4   pair X
 5   pair IV
 6   pair III
 7   pair I
 8   pair VI
 9   pair V
10   pair IX

where there is evidence of cointegration for the first five pairs and no evidence for the remaining five. This ordering differs slightly from the Engle-Granger ordering. But most important is that the two methods coincide on which pairs are cointegrated and which are not. And this in turn coincides with the results from the trading strategy.

Chapter 8

Conclusion

The goal of this project was to apply statistical techniques to find relationships between stocks. The closing prices of these stocks, dating back two years, are the only data that have been used in this analysis.

From trading experience, IMC is able to make a distinction between good and bad pairs based on profits. In chapter 2 we derived a trading strategy that resembles the strategy used by IMC. From this strategy, we derived the important characteristics of a good pair. We saw that we like the price processes to be tied together such that their spread oscillates around zero and does not walk away.

In this report we tried to identify pairs with cointegration.
If two stocks in a pair are cointegrated, a certain linear combination of the two is stationary. This implies that this linear combination, which can be seen as the spread, is mean-reverting. This is in line with the characteristics of a good pair.

In chapter 4 we introduced two methods for testing for cointegration, the Engle-Granger and the Johansen method. We have looked at the Engle-Granger method in detail. This method makes use of a unit root test, the Dickey-Fuller test. Because there is a lot of ambiguity in the literature about which Dickey-Fuller test and which critical values should be used, we discussed the different cases in chapter 5. The asymptotic distributions of the test statistics were derived, and the critical values for finite sample sizes were found with simulation.

In chapter 6 we examined the properties of the Engle-Granger method, which consists of a linear regression followed by the Dickey-Fuller test on the residuals of this regression. The main question was which Dickey-Fuller case to use and whether the critical values of the Engle-Granger method are the same as those for this Dickey-Fuller test. We saw that case 2 was the most appropriate one for the way we want to test for cointegration, that is, without a constant in the cointegrating relation. There was no indication, based on simulations, that the critical values of the Engle-Granger test differ from those of the Dickey-Fuller case 2 test. The power of the two tests was also found to be similar when the assumptions of the method were fulfilled. The Engle-Granger test appeared to perform well, even when some assumptions were not fulfilled. The Engle-Granger test assumes that the residuals follow an autoregressive model. When we generated cointegrated data with residuals that are not likely to be autoregressive, the method still often rejected the null hypothesis of no cointegration.

IMC has provided a selection of ten pairs that are different in quality.
In chapter 7 we applied the trading strategy from chapter 2 to the historical closing prices. Based on profitability and the number of trades, we find a distinction between good and bad pairs which coincides with the distinction made by IMC. In that chapter we also tested the ten pairs for cointegration, using both the Engle-Granger and the Johansen method. The two methods coincide on which pairs are cointegrated and which are not. The estimated cointegrating relations are also almost the same. All the good pairs according to the trading strategy are seen as cointegrated by both tests. Furthermore, all bad pairs are seen as not cointegrated by both tests.

Based on the results of this project, we may conclude that cointegration is an appropriate concept to identify pairs suitable for IMC's trading strategy.

Chapter 9

Alternatives & recommendations

In this chapter we briefly discuss some alternative trading strategies in the first section and give some recommendations for further research in the second section.

9.1 Alternative trading strategies

In this report we focused on pair trading with two stocks in a pair. Cointegration of two stocks is easily translated into the trading strategy from chapter 2: we take the spread process to be the linear combination of the two stocks corresponding to the cointegrating vector, y_t − αx_t. If we took r̄ from chapter 2 to be the least squares estimate instead of the average ratio, the spread process of chapter 2 would be exactly the same as the spread process found with the Engle-Granger method. This holds if we use the strategy without adjustment parameter κ, i.e., κ = 0.

In section 4.3 it was stated that we neglect a possible constant in the cointegrating relation, α_0. In this section we look at a trading strategy that does not neglect the constant. We also look at what can happen if we have cointegration between the logarithms of the stock prices.
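The remark about r̄ can be made concrete with a small Python sketch. The prices below are made up, the function names are ours, and we assume that the average ratio of chapter 2 is the mean of y_t/x_t. The point is only that the least squares ratio without a constant, Σ x_t y_t / Σ x_t², generally differs from this average, and that the Engle-Granger spread is y_t − α̂x_t with the least squares value.

```python
def average_ratio(y, x):
    """Mean of the price ratios y_t / x_t (assumed definition of r-bar)."""
    return sum(b / a for a, b in zip(x, y)) / len(x)


def least_squares_ratio(y, x):
    """Slope of the regression of y on x without a constant."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def spread(y, x, ratio):
    """Spread process y_t - ratio * x_t."""
    return [b - ratio * a for a, b in zip(x, y)]


# Made-up prices, for illustration only.
x = [10.0, 11.0, 12.0, 11.5, 10.5]
y = [20.5, 21.8, 24.3, 22.9, 21.2]

r_bar = average_ratio(y, x)
alpha_hat = least_squares_ratio(y, x)

print(round(r_bar, 4), round(alpha_hat, 4))
print([round(s, 3) for s in spread(y, x, alpha_hat)])
```

By construction the least squares spread is orthogonal to x_t, which is not true in general for the spread based on the average ratio.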
Trading strategy with constant

Consider two stock price processes, x_t and y_t, which have the relation

    y_t − α x_t − α_0 = ε_t,    (9.1)

where ε_t is some stationary process. In other words, the two stocks are cointegrated with a constant in their relation. We could trade the pair y, x with ratio 1 : α and give up the cash neutral property, but another possibility is to determine the trading instances with (9.1) and trade a quantity of x such that the whole trade is cash neutral. More precisely, with (9.1) we can determine whether x_t is over- or underpriced compared to y_t at time t, but we do not trade this relation: we trade one stock of y and y_t/x_t stocks of x if there is a mispricing larger than Γ at time t.

Let us consider an example: let x and y be a pair with relation (9.1), where α = 2 and α_0 = 20, such that the spread ε_t looks like figure 9.1. The corresponding processes for x_t and y_t are shown in figure 9.2.
Figure 9.1: Spread ε_t.

For illustration purposes we took an artificial example. We have 500 observations, where we use the first half to determine the parameters of the strategy and the second half to see if the strategy works. We fit the first half of the observations of y on the first half of the observations of x and a constant, which results in:

    α̂ = 1.98 and α̂_0 = 20.29.

The threshold Γ is determined in the same way as in chapter 2, but now the spread process consists of the residuals from this fit. For this example it turned out that Γ = 0.91. We apply the new strategy to the second half of the observations.

Figure 9.2: Price processes x_t and y_t.

The trades are shown in table 9.1. With the first trade we put on a position for the first time, so we have made no profit yet. The second trade consists of two parts: we flatten the position from the first trade, which results in a profit, and we put on a new position. We always trade one stock of y and y_t/x_t stocks of x, so the trade is exactly cash neutral. The actual traded spread is not shown because it is basically the same as the right half of figure 9.1. Figure 9.3 shows the spread if we do not include a constant, i.e., if we neglect α_0.

Table 9.1: Trading instances.

trade   t      s_t     position (y,x)   price y_t   price x_t   profit
1       251     0.91   (-1,+2.82)       72.08       25.59
2       291    -0.93   (+1,-2.80)       66.87       23.90       0.44
3       331     0.93   (-1,+2.85)       70.20       24.63       1.27
4       370    -0.91   (+1,-2.73)       71.03       25.98       2.99
5       409     0.95   (-1,+2.74)       77.65       28.37       0.07
6       452    -0.94   (+1,-2.66)       77.05       29.02       2.37
7       487     0.93   (-1,+2.71)       79.86       29.49       1.55
total profit                                                    8.69

Although the profit for each trade is not at least 2Γ, as it was for the trading strategy from chapter 2 with constant ratio, it is still quite profitable to trade this pair this way, especially because the trading strategy from chapter 2 would not make any money, even if we used a large adjustment parameter κ.
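The positions and profits in table 9.1 can be recomputed from the listed prices. In the Python sketch below (function and variable names are ours) each trade closes the previous position and opens a new cash neutral one of one stock of y against y_t/x_t stocks of x. Because the table shows rounded prices, the recomputed profits agree with the table only up to a few cents per trade.

```python
# Table 9.1: (t, spread s_t, price y_t, price x_t) at each trading instance.
trades = [
    (251, 0.91, 72.08, 25.59),
    (291, -0.93, 66.87, 23.90),
    (331, 0.93, 70.20, 24.63),
    (370, -0.91, 71.03, 25.98),
    (409, 0.95, 77.65, 28.37),
    (452, -0.94, 77.05, 29.02),
    (487, 0.93, 79.86, 29.49),
]


def open_position(s, y, x):
    """Cash neutral position as (units of y, units of x).

    A positive spread means y is overpriced relative to x: sell y, buy x.
    """
    side = -1.0 if s > 0 else 1.0
    return side, -side * y / x


profits = []
position = None
for t, s, y, x in trades:
    if position is not None:
        qty_y, qty_x, y_in, x_in = position
        # Profit from flattening the previous position at current prices.
        profits.append(qty_y * (y - y_in) + qty_x * (x - x_in))
    qty_y, qty_x = open_position(s, y, x)
    position = (qty_y, qty_x, y, x)

print([round(p, 2) for p in profits])
print(round(sum(profits), 2))
```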
Figure 9.3: Spread when neglecting α_0.

Although this strategy can be applied for every α_0, we still do not want α_0 to be large because of the market neutral property of pair trading. If the overall market is up 50%, so that x increases by 50%, then we expect y to increase by 50% as well. With an α_0 that is large compared to the stock prices, this does not hold. Actually it does not hold for any α_0 ≠ 0, but the effect is small when α_0 is small. The value of α_0 used in the example is actually too large: it is equal to the first observation of x. Which values of α_0 can be used with this strategy should be examined further.

Trading strategy for the logarithms

Assume we have two stock price processes whose logarithms are cointegrated:

    log y_t − β log x_t = ε_t,

where ε_t is some stationary process. Then the relation between x_t and y_t becomes

    y_t = x_t^β e^{ε_t}.    (9.2)

If β = 1, we can apply a trading strategy to the ratio process y_t/x_t instead of applying it to a spread process. An example is shown in figure 9.4, where we simulated x_t according to the model in section 4.2 and generated y_t such that ε_t follows a stationary AR(1) model. A trading strategy could be to sell one stock of y and buy one stock of x when the ratio is above 1 + Γ, and the other way around if the ratio is below 1 − Γ. Or we could trade truly cash neutral, so that we trade one stock of y and y_t/x_t stocks of x.
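For β = 1 the trading rule on the ratio process can be written down in a few lines. The Python sketch below is only an illustration: the function name, the threshold value and the simulated series (a plain random walk for x_t rather than the model of section 4.2, and an AR(1) process with coefficient 0.5 for ε_t) are our assumptions.

```python
import math
import random


def ratio_signal(y, x, gamma):
    """Return -1 (sell y, buy x), +1 (buy y, sell x) or 0 (no trade)."""
    ratio = y / x
    if ratio > 1.0 + gamma:
        return -1
    if ratio < 1.0 - gamma:
        return 1
    return 0


# Simulate log y_t - log x_t = eps_t with eps_t a stationary AR(1) process.
random.seed(0)
x, y, eps = [], [], 0.0
price = 50.0
for _ in range(500):
    price += random.gauss(0.0, 0.5)
    eps = 0.5 * eps + random.gauss(0.0, 0.05)
    x.append(price)
    y.append(price * math.exp(eps))  # relation (9.2) with beta = 1

signals = [ratio_signal(b, a, gamma=0.08) for a, b in zip(x, y)]
print(signals.count(-1), signals.count(1), signals.count(0))
```

For the cash neutral variant, the quantity of x traded against one stock of y would be y_t/x_t instead of one.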
Figure 9.4: Ratio process $y_t/x_t$.

When $\beta \neq 1$, it is not that simple anymore. Without loss of generality we may assume that $\beta > 1$ whenever $\beta \neq 1$, because otherwise we interchange the roles of $x_t$ and $y_t$. Then we get the same problem as with a large $\alpha_0$: the relation is not market neutral. If x increases by 50%, y increases by more than 50% according to relation (9.2). If this happens while we have a long position in x and a short position in y, the profit in x cannot compensate for the loss in y, so we lose money. Maybe this can be prevented for values of $\beta$ close to 1 by adjusting the ratio in which we trade x and y, but this should be examined further.

In this report we have only discussed trading strategies that trade one line: we put on a position when the spread reaches $\pm\Gamma$ and wait until the spread reaches $\Gamma$ in the other direction. But it is very interesting to trade more lines. For example, if the spread reaches $+\Gamma_1$, we put on a short position in y and a long position in x. If the spread increases further and reaches $\Gamma_2$, we enlarge our short position in y and our long position in x. A trading strategy could be to trade the same amounts at each threshold, with equally spaced thresholds, $\Gamma_2 = 2\Gamma_1$. Figure 9.5 illustrates this idea. To make this more clear, table 9.2 shows the trading instances for this strategy with two lines when we trade x and y in the ratio 1:1.
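The multi-line rule can be sketched as a small state machine. The sketch below is an illustration, not the implementation used for table 9.2: the function names are hypothetical, the thresholds $\Gamma_1 = 2$ and $\Gamma_2 = 4$ are read off from the spread values in table 9.2 rather than stated in the text, and the unwinding rule (a position put on at $\Gamma_k$ is taken off again when the spread falls back to $\Gamma_{k-1}$, with $\Gamma_0 = 0$) is an assumption inferred from the same table.

```python
def update_level(level, s, thresholds):
    """Return the new number of lines on, given the current spread s.

    level > 0 means short y / long x (spread high), level < 0 the reverse.
    Assumption: the position put on at thresholds[k] is unwound when the
    spread falls back to thresholds[k - 1] (or to 0 for the first line).
    """
    n = len(thresholds)
    exits = [0.0] + list(thresholds[:-1])
    # Unwind lines whose exit level has been reached.
    while level > 0 and s <= exits[level - 1]:
        level -= 1
    while level < 0 and s >= -exits[-level - 1]:
        level += 1
    # Put on additional lines when the next threshold has been reached.
    while 0 <= level < n and s >= thresholds[level]:
        level += 1
    while -n < level <= 0 and s <= -thresholds[-level]:
        level -= 1
    return level


def multi_line_positions(spreads, thresholds):
    """Positions (in y, in x) when trading x and y in the ratio 1:1."""
    level = 0
    positions = []
    for s in spreads:
        level = update_level(level, s, thresholds)
        positions.append((-level, level))
    return positions


if __name__ == "__main__":
    # Spread values at the trading instances of table 9.2.
    spreads = [2.12, 4.22, 1.94, -0.01, -2.11, -4.20,
               -1.89, 0.13, 2.18, 4.06, 1.92, -0.13]
    for s, pos in zip(spreads, multi_line_positions(spreads, (2.0, 4.0))):
        print(f"{s:6.2f}  {pos}")
```

Passing a longer tuple of thresholds gives a strategy with more lines without changing the code, which is one way to experiment with the number of lines and their spacing.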
Figure 9.5: A 2 line strategy.

Table 9.2: Trading instances.

trade    t      s_t      position (y,x)
1        26      2.12    (-1,+1)
2        56      4.22    (-2,+2)
3        97      1.94    (-1,+1)
4        152    -0.01    flat
5        158    -2.11    (+1,-1)
6        199    -4.20    (+2,-2)
7        206    -1.89    (+1,-1)
8        221     0.13    flat
9        284     2.18    (-1,+1)
10       289     4.06    (-2,+2)
11       297     1.92    (-1,+1)
12       306    -0.13    flat

This strategy can easily be extended to more lines; it can even be extended to three or more stocks in a pair. How to choose the number of lines, the thresholds and the corresponding amounts of stock is very interesting to examine further.

9.2 Recommendations for further research

It would be nice to develop the alternative method from section 4.5 further, so that we have a new method for testing for cointegration. To do so, we need an accurate algorithm for estimating the parameters of the MA(q) model.

In this report we used closing prices. It is interesting to apply the trading strategies and cointegration tests to intra-day data, because we trade during the day. This is especially interesting if we have a trading strategy with a large number of lines and thresholds close to each other.

We could cut the cointegration test into several pieces. Suppose we have datasets containing four years of closing prices; then we could perform three tests, each on two years of data, with an overlap of one year. More precisely, the first test is on the first and second year, the second test is on the second and third year, and the third test is on the third and fourth year. Then we can see whether the stocks are cointegrated on each time interval and whether the cointegrating relation changes. This could be very helpful in determining a good adjustment parameter $\kappa$.

There exist several representations for cointegrated processes; one is the VAR representation we saw briefly with the Johansen method in section 4.4. It would be interesting to see whether one of these representations can be used to build a monitoring system: a set of confidence intervals to check whether the spread behaves according to the model, with certain actions attached when the intervals are exceeded. For example, if the first confidence interval is exceeded we stop enlarging our positions, if the second interval is exceeded we unwind part of our positions at a loss, and if the third interval is exceeded we close out our entire position and stop treating the stocks as a pair.
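A minimal sketch of such a monitoring rule is given below. How the confidence intervals should be derived from a representation of the cointegrated process is exactly the open question, so the sketch simply assumes three symmetric bands around the model mean of the spread, at hypothetical multiples of its standard deviation. The names `monitoring_action` and `ACTIONS` and the default multipliers are illustrative choices, not results from this report.

```python
# Actions ordered by the number of intervals that have been exceeded.
ACTIONS = (
    "trade_normally",   # spread inside all intervals
    "stop_enlarging",   # first interval exceeded
    "unwind_part",      # second interval exceeded
    "close_pair",       # third interval exceeded
)


def monitoring_action(spread, mean, std, multipliers=(2.0, 3.0, 4.0)):
    """Map the current spread to one of the monitoring actions.

    Assumption: interval k is mean +/- multipliers[k] * std, with the
    multipliers increasing; the multipliers themselves are hypothetical.
    """
    deviation = abs(spread - mean)
    exceeded = sum(deviation > k * std for k in multipliers)
    return ACTIONS[exceeded]


if __name__ == "__main__":
    for s in (1.0, 2.5, -3.5, 4.5):
        print(s, monitoring_action(s, mean=0.0, std=1.0))
```

In practice the bands would have to lie outside the trading thresholds, otherwise the monitoring rule would stop the strategy before it has put on its positions.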