Sector-Based Pairs Trading with Python

Published in The Financial Journal

8 min read · Dec 13, 2022

A new, profitable approach to an under-rated trading strategy.

For this strategy, all you need to get started is a Python IDE (preferably Spyder) and access to the QuantGlobal API. You can quickly authenticate yourself with the API here.

To understand how this strategy works, we first need to get through some background theoretical knowledge. First, let’s start with what pairs trading even is.

Background

Pairs trading is a market-neutral trading strategy that allows the trader to profit from the divergence in returns of correlated assets. The most familiar example of this is the Coca-Cola & Pepsi pairs trade. Both Coca-Cola & Pepsi operate in the same industry and are exposed to similar risks/upsides, so generally, the performance of these 2 stocks tends to be similar.

When the performance of these 2 stocks diverges by a significant amount (e.g. Coca-Cola stock goes up 5%, Pepsi only goes up by 1%), the trader buys the underperforming shares (in the above example, Pepsi, since its % return is lower than that of Coca-Cola) and goes short the overperforming shares (in this case, Coca-Cola, since its % return is higher).

A profit is realized when the spread between the two stocks decreases. It isn’t necessary for both legs to converge from opposite directions. Often, profit is made when one leg of the trade “catches up” and closes the spread. In that scenario, you would lose money on one leg, but you would make more than you lost through the leg that “catches up”.
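To make the "catch up" scenario concrete, here is a minimal sketch of the arithmetic for a pair with equal capital on each leg. The post-entry returns (Pepsi +3%, Coca-Cola +0.5%) are hypothetical numbers chosen for illustration, and `pair_pnl` is a helper name we made up for this example:

```python
def pair_pnl(long_return, short_return):
    """Net return of a pair with equal capital on each leg.

    The long leg earns its return; the short leg earns the
    negative of its return.
    """
    return long_return - short_return


# Hypothetical move after entry: Pepsi (long) rallies 3%,
# Coca-Cola (short) keeps rising by 0.5%.
long_leg = 0.03      # gain on the long Pepsi leg
short_leg = -0.005   # loss on the short Coca-Cola leg
net = pair_pnl(long_return=0.03, short_return=0.005)

print(f"Long leg: {long_leg:.2%}, short leg: {short_leg:.2%}, net: {net:.2%}")
```

Here the short leg loses money, but the long leg "catches up" by more than that loss, so the position as a whole is profitable.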

Those are the nuts and bolts of how pairs trading works. However, this strategy has been used and studied for decades, so while opportunities in stocks like Coke and Pepsi exist, they aren’t significant enough to be worth the effort. There are a few other areas of the market, though, where this strategy can be extraordinarily effective.

Sector-Based Pairs Trading

Pairs trading works because the stocks involved often have valid economic reasons for being correlated (e.g., similar industry, different securities for the same underlying company, similar macroeconomic exposure, etc.). As you expand the scope of stocks looked at, you will notice that while the correlations remain strong, they become imperfect — but this is where the largest opportunities lie.

Let’s look at this from a sector-based perspective. We first ask the question, how correlated are the top 10 stocks of the Technology sector (by market cap)? To answer that, we pulled about a year of data and ran a correlation matrix for those stocks and this was the output:

Values represent correlations. In the given data set (11/21–12/22), AAPL had a correlation to MSFT of ~87%.

As demonstrated, the top stocks of the given sector are strongly, positively correlated. Does this relationship work for other sectors? Let’s try the same thing, but for the Financial sector:

While the correlations aren’t as high as the Technology sector, it is clear that there are still strong, positive correlations between the 10 largest components.
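For readers who want to build a similar matrix themselves, here is a rough sketch using pandas. It runs on synthetic data (a shared "sector" factor plus noise) with placeholder tickers, and it assumes the correlation is taken on daily returns, which may differ from how the matrices above were produced:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
tickers = ["AAA", "BBB", "CCC", "DDD"]  # placeholder symbols, not real stocks

# Synthetic daily returns: one shared sector factor plus stock-specific noise
n_days = 250
sector_factor = rng.normal(0, 0.015, n_days)
returns = pd.DataFrame(
    {t: sector_factor + rng.normal(0, 0.007, n_days) for t in tickers}
)

# Pairwise Pearson correlations between the return series
corr_matrix = returns.corr()
print(corr_matrix.round(2))
```

With real data, you would replace the synthetic `returns` frame with one column of daily returns per ticker.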

Implementation

Since the top 10 stocks of a given sector tend to be strongly correlated, we can use that as the basis for a pairs trading strategy. This strategy will take the top 10 stocks of a sector, then split the basket in 2. Considering the high correlation, the performance of these 2 baskets should be very similar, so when the performance of the baskets diverges (the “spread”), we can buy the underperforming basket and short the overperforming basket.

Thankfully, the QuantGlobal API handles the bulk of the calculations and we can get our universe with just a few lines of code.

For this example, we’ll use the Financials sector:

# First, import the necessary packages
import pandas as pd
import QuantGlobal as qg

# This variable will store the underlying basket data which includes
# the ticker symbols, prices, and performance.

underlying_data = qg.download(key='authenticated_user@email.com',
                              strategy='pt_extended',
                              underlying='financials',
                              from_date='2022-11-29',
                              end_date='2022-11-30')


# This variable will store the spread index values to act as a trade signal.

index_data = qg.download(key='authenticated_user@email.com',
                         strategy='pt',
                         underlying='financials',
                         from_date='2022-11-29',
                         end_date='2022-11-30')

The underlying_data variable will return a dataframe structured as follows:

The index_data variable will return a dataframe structured as follows:

Trading

Now that the data is stored, we can focus on building the actual strategy that will be traded.

We will put on a pairs trade when the index spread increases above our designated threshold. Just to clarify, the spread represents the absolute difference between the performance of the two equal-weighted baskets. To better understand this, consider the below scenario:

Basket A: BRK/B, V, JPM, MA, BAC
Basket B: WFC, MS, SCHW, GS, HSBC
Both baskets are equal-weighted
If Basket A returns 2%, but Basket B returns 1%, the index spread will be 1 (abs(Basket A performance - Basket B performance)).
If we buy Basket B and short Basket A, we make money when the index spread drops to a number lower than 1 (or whatever the number was when we entered the trade).
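The scenario above can be sketched in a few lines of pandas. The per-stock returns below are hypothetical values picked so that the baskets average 2% and 1%. The API already computes the spread for us, so this is only meant to show the idea:

```python
import pandas as pd

# Hypothetical intraday returns in percent (illustrative values only)
returns_pct = pd.Series({
    "BRK/B": 2.5, "V": 1.5, "JPM": 2.0, "MA": 2.0, "BAC": 2.0,
    "WFC": 1.0, "MS": 0.5, "SCHW": 1.5, "GS": 1.0, "HSBC": 1.0,
})

basket_a = ["BRK/B", "V", "JPM", "MA", "BAC"]
basket_b = ["WFC", "MS", "SCHW", "GS", "HSBC"]

# Equal weighting means each basket's performance is a simple average
perf_a = returns_pct[basket_a].mean()
perf_b = returns_pct[basket_b].mean()
spread = abs(perf_a - perf_b)

# Buy the underperforming basket, short the overperforming one
if perf_a < perf_b:
    long_basket, short_basket = basket_a, basket_b
else:
    long_basket, short_basket = basket_b, basket_a

print(f"Basket A: {perf_a:.2f}%, Basket B: {perf_b:.2f}%, spread: {spread:.2f}")
```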

It might be a bit tricky to navigate buying/selling 10 different stocks at once, so for this example, we’ll trade using just 2 stocks out of the 10. The two stocks will be the highest performing share and the lowest performing share. This pair will have the largest effect on the spread as they are the “outliers”, and as such, they offer the largest profit opportunity.

Every broker’s API looks different, but here is the general code flow of how we would put on this trade:

# When the index spread crosses to/above this level, we want to put on a trade

opening_threshold = 0.75

# When the spread crosses to/below this level, we close the trade

closing_threshold = 0.25

# We check the most recent value of the spread to see if it is at the threshold

if index_data['Spread'].iloc[-1] >= opening_threshold:

    # If the index is indeed above/at our threshold, we get the symbols we need.
    # First, keep only the rows with the most recent timestamp.

    most_recent_underlying_data = underlying_data[underlying_data.index == underlying_data.index[-1]]

    # Then get the tickers with the smallest/largest cumulative return.

    returns = most_recent_underlying_data['Cumulative Returns']
    lowest_performer = most_recent_underlying_data['Ticker'][returns == returns.min()].iloc[0]
    highest_performer = most_recent_underlying_data['Ticker'][returns == returns.max()].iloc[0]

    # Now, we submit the pairs trade order

    # "broker_api" is a placeholder for the api used to submit your orders
    # Any broker capable of buying/selling stocks is compatible with this strategy.

    long_order = broker_api.buy(lowest_performer)
    short_order = broker_api.short(highest_performer)

else:

    # If the most recent value isn't above our threshold, do nothing.
    pass

Using the logic above, the first condition would be triggered at around 11:21 on that day:

The output for the index_data variable at 11:21

The output for the underlying_data variable at 11:21

Considering that the index is above the threshold, the script would buy the weakest performing share (V) and sell-short the strongest performing share (MS).

Now that the position is on, the program needs to wait for the index to duck back down below/to the closing threshold. Again, each broker’s API works differently, so here’s just a boilerplate version of how that would flow:

# When the spread crosses to/below this level, we close the trade

closing_threshold = 0.25

# We check the most recent value of the spread to see if it is at the threshold

if index_data['Spread'].iloc[-1] <= closing_threshold:

    # If the index is indeed below/at our threshold, we close the open positions.
    # "lowest_performer" and "highest_performer" are the tickers from the entry.

    close_long_position = broker_api.sell(lowest_performer)
    close_short_position = broker_api.buy(highest_performer)

else:

    # If the most recent value isn't below/at our threshold, do nothing.
    pass

The closing condition would be triggered at 3:07 on that day:

The output for the index_data variable at 3:07

The output for the underlying_data variable at 3:07

As demonstrated, this trade yielded a profit of ~1.72%. You can intuitively calculate PnL by tracking the Cumulative Returns column. In this example, we bought V at an index level of 98.88, then sold it at 99.27. Since the index starts each day at 100 and tracks returns, this 0.39-point gain means roughly a 0.39% profit on the V leg. Then, we were short MS at 101.04 and bought it back at 99.71, a 1.33-point gain or roughly a 1.33% profit. So when tallied, the net profit on the position was about 1.72% (0.39% + 1.33%).
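The tally above can be reproduced with a small helper. `leg_pnl_points` is a name we made up for this sketch, and it measures PnL in index points, where the index starts each day at 100:

```python
def leg_pnl_points(entry, exit_level, side):
    """PnL of one leg in index points (the index starts each day at 100)."""
    if side == "long":
        return exit_level - entry
    if side == "short":
        return entry - exit_level
    raise ValueError("side must be 'long' or 'short'")


v_leg = leg_pnl_points(98.88, 99.27, "long")      # bought V, sold higher
ms_leg = leg_pnl_points(101.04, 99.71, "short")   # shorted MS, covered lower
total = v_leg + ms_leg

print(f"V: {v_leg:.2f}, MS: {ms_leg:.2f}, total: {total:.2f}")
```

Note that these are points relative to the day's starting value of 100. Measured against each leg's own entry level, the percentages differ slightly (about 0.39% for V and about 1.32% for MS).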

Final Thoughts

While this was just one implementation of the strategy, there is a great deal of flexibility in how it can be further optimized. For example, while trading all 10 stocks in the baskets may be difficult to mentally manage, it can significantly reduce the overall volatility of the strategy. Trading the full baskets also allows for maximal scalability. Another possible optimization is to keep taking just the 2 outlier stocks, but instead of only those in the financials sector, how about 2 from every sector?

I recommend backtesting some of these options and seeing what kind of combinations yield the most profit. The API provides unlimited access to live and historical data of these strategies, so the barrier to get started is low. Feel free to share what you find in the comment section. If you’d like to get in touch to discuss implementation/feedback, visit me at The Quant’s Playbook! :)
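As a starting point for a backtest, here is a minimal sketch of the open/close threshold logic applied to a whole spread series instead of just the latest value. The spread values are made up, and `threshold_signals` is a hypothetical helper, not part of the QuantGlobal API:

```python
import pandas as pd


def threshold_signals(spread, open_level=0.75, close_level=0.25):
    """Return (entry_label, exit_label) pairs for a spread series.

    A trade opens when the spread is at/above open_level and closes
    when it later falls to/below close_level. One position at a time.
    """
    trades = []
    entry = None
    for label, value in spread.items():
        if entry is None and value >= open_level:
            entry = label
        elif entry is not None and value <= close_level:
            trades.append((entry, label))
            entry = None
    return trades


# Made-up spread values for illustration
spread = pd.Series([0.10, 0.40, 0.80, 0.60, 0.30, 0.20, 0.50, 0.90, 0.10])
trades = threshold_signals(spread)
print(trades)
```

On this toy series, the function finds two round trips. With real data, you would pass in the `Spread` column and then look up each leg's index level at the entry and exit labels to compute PnL.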

Data Source: Quantitative Global

Happy trading!