Data Science and Finance

Mayank Kale
Published in OverPotential
3 min read · Jul 10, 2023

1 Problem Statement

• Let’s say you have a pair of securities X and Y that have some underlying economic link, for example two companies that manufacture the same product, like Pepsi and Coca-Cola. You expect the ratio or difference in their prices (also called the spread) to remain roughly constant over time. However, from time to time, the spread may diverge because of temporary supply/demand changes, large buy/sell orders for one security, a reaction to important news about one of the companies, etc. In this scenario, one stock moves up while the other moves down relative to each other. If you expect this divergence to revert to normal with time, you can make a pairs trade.

• When there is a temporary divergence, the pairs trade is to sell the outperforming stock (the stock that moved up) and buy the underperforming stock (the stock that moved down). You are betting that the spread between the two stocks will eventually converge, either by the outperforming stock moving back down, the underperforming stock moving back up, or both. Your trade makes money in all of these scenarios. If both stocks move up or down together without changing the spread between them, you neither make nor lose money.

• Hence, pairs trading is a market-neutral trading strategy: its profit depends on the spread converging rather than on the direction of the market, so traders can profit in virtually any market conditions, whether uptrend, downtrend, or sideways movement.
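The arithmetic behind these scenarios can be sketched in a few lines of Python. This is only an illustration: the prices, the `pairs_trade_pnl` helper, and the choice to size both legs at the same dollar amount are assumptions made for the example, not part of the original write-up.

```python
def pairs_trade_pnl(short_entry, short_exit, long_entry, long_exit, notional=1000.0):
    """P&L of a pairs trade with equal dollar amounts on each leg.

    `notional` is the hypothetical dollar size of each leg (an assumption).
    """
    short_return = (short_exit - short_entry) / short_entry
    long_return = (long_exit - long_entry) / long_entry
    # The long leg gains when its stock rises; the short leg gains when its stock falls.
    return notional * (long_return - short_return)


# Hypothetical divergence: stock A ran up to 110 (we sell it),
# stock B dropped to 90 (we buy it).
scenarios = {
    "A falls back": pairs_trade_pnl(110, 100, 90, 90),
    "B recovers": pairs_trade_pnl(110, 110, 90, 100),
    "both converge": pairs_trade_pnl(110, 105, 90, 95),
    "both rise 10%": pairs_trade_pnl(110, 121, 90, 99),
}

for name, pnl in scenarios.items():
    print(f"{name}: {pnl:+.2f}")
```

The first three scenarios are the convergence cases and each shows a gain. In the last one both stocks rise by the same percentage, the price ratio is unchanged, and the two legs cancel.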

2 Dataset

We look for securities that display cointegration, using Yahoo Finance data in Python. Let’s work with a basket of US large-cap tech stocks in the S&P 500. These stocks operate in a similar segment and could have cointegrated prices. We scan through the list of securities and test for cointegration between all pairs. The scan returns a cointegration test score matrix, a p-value matrix, and any pairs for which the p-value was less than 0.05.

3 Methodology

Because two cointegrated time series (such as X and Y above) drift towards and apart from each other, there will be times when the spread is high and times when the spread is low. We make a pairs trade by buying one security and selling another. This way, if both securities go down together or go up together, we neither make nor lose money — we are market neutral.

Going back to X and Y above that follow Y = αX + e, such that the ratio Y/X moves around its mean value α, we make money when the ratio reverts to the mean. To do this, we watch for when X and Y are far apart, i.e. when the ratio Y/X is too far above or below α:

Going Long the Ratio: This is when the ratio Y/X is smaller than usual and we expect it to increase. In the above example, we place a bet on this by buying Y and selling X.

Going Short the Ratio: This is when the ratio Y/X is larger than usual and we expect it to decrease. In the above example, we place a bet on this by selling Y and buying X.
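To make the relationship concrete, here is a small simulation of Y = αX + e together with the long/short rule just described. The value of α, the noise levels, and the `ratio_trade` helper are all assumptions chosen for illustration. The rule here reacts to any deviation from the mean, whereas in practice one would wait for a large deviation.

```python
import random

rng = random.Random(0)
alpha = 1.5       # hypothetical long-run ratio Y/X
x_price = 100.0   # hypothetical starting price of X
ratios = []

for _ in range(500):
    x_price += rng.gauss(0, 0.5)                   # X follows a random walk
    y_price = alpha * x_price + rng.gauss(0, 1)    # Y = alpha * X + e
    ratios.append(y_price / x_price)

mean_ratio = sum(ratios) / len(ratios)


def ratio_trade(ratio, mean):
    """Map the ratio's position relative to its mean to the trade described in the text."""
    if ratio < mean:
        return "long ratio: buy Y, sell X"
    if ratio > mean:
        return "short ratio: sell Y, buy X"
    return "no trade"


print(f"mean ratio: {mean_ratio:.4f} (alpha = {alpha})")
print(ratio_trade(min(ratios), mean_ratio))
print(ratio_trade(max(ratios), mean_ratio))
```

Because e is small relative to the price of X, the simulated ratio stays close to α, which is the behaviour the trade relies on.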

Note that we always have a “hedged position”: a short position makes money if the security sold loses value, and a long position will make money if a security gains value, so we’re immune to overall market movement. We only make or lose money if securities X and Y move relative to each other.

What could relevant features be? We want to predict the direction of the ratio’s move. We’ve seen that our two securities are cointegrated, so the ratio tends to move around and revert to its mean. Our features should therefore measure the mean of the ratio and the divergence of the current value from that mean, so that we can generate a trading signal.

The following features were used:

60-day Moving Average of Ratio: Measure of the rolling mean

5-day Moving Average of Ratio: Measure of the current (short-term) value of the mean

60-day Standard Deviation of Ratio: Measure of the rolling volatility

z-score: (5d MA - 60d MA) / 60d SD
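These four features can be computed with pandas rolling windows, as in the sketch below. The function names, the synthetic ratio series, and in particular the `entry_z=1.0` threshold used to turn the z-score into a signal are assumptions for illustration; the text does not specify a threshold.

```python
import numpy as np
import pandas as pd


def ratio_features(ratio, short_window=5, long_window=60):
    """Compute the moving averages, rolling standard deviation, and z-score of the ratio."""
    ma_short = ratio.rolling(short_window).mean()
    ma_long = ratio.rolling(long_window).mean()
    sd_long = ratio.rolling(long_window).std()
    zscore = (ma_short - ma_long) / sd_long
    return pd.DataFrame({
        "ma_short": ma_short,
        "ma_long": ma_long,
        "sd_long": sd_long,
        "zscore": zscore,
    })


def signal_from_zscore(zscore, entry_z=1.0):
    """+1 = go long the ratio, -1 = go short the ratio, 0 = no position.

    `entry_z` is a hypothetical threshold, not a value taken from the text.
    """
    values = np.where(zscore > entry_z, -1, np.where(zscore < -entry_z, 1, 0))
    return pd.Series(values, index=zscore.index)


# Synthetic ratio series hovering around 1.5 (assumption: not real market data).
rng = np.random.default_rng(1)
ratio = pd.Series(1.5 + rng.normal(0, 0.02, 250))

features = ratio_features(ratio)
signals = signal_from_zscore(features["zscore"])
print(features.tail())
print(signals.value_counts())
```

The first 59 rows of the z-score are NaN because the 60-day window is not yet full, and the signal is 0 on those rows. A high z-score means the ratio is above its longer-run mean, which corresponds to going short the ratio, and a low z-score to going long.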

