Testing the Validity of the Arbitrage Pricing Theory

Abdullah Karasan
MagniData

--

Introduction

Data science has a wide range of applications in finance. Recently, the tools of machine learning and data science have been gaining increasing attention, mainly because of their ability to address current financial problems. Data science can tackle the following issues via newly designed approaches:

  • Automatization in Risk Management
  • Predictive Analysis
  • Algorithmic Trading
  • Asset Pricing
  • Fraud Detection
  • Tailored Customer Solutions

In this post, I aim to check the validity of the Arbitrage Pricing Theory (APT). Before moving forward, I would like to discuss the risk-return relationship.

After Markowitz's work, the risk-return relationship became clearer in that he basically formulated risk as standard deviation. That is to say, a stock with a higher standard deviation should offer a higher return in exchange for the risk assumed by investors.

Based on this, efficient portfolios with different combinations of expected return and standard deviation (or expected volatility) can be found. Each point lying on the efficient frontier represents an optimal combination of stocks that maximizes the expected return for a given level of risk.
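The idea can be illustrated with a minimal two-asset sketch. The volatilities and correlation below are made-up illustrative numbers, not part of this study's data:

```python
import numpy as np

# Illustrative (made-up) annualized volatilities and correlation for two assets
sigma1, sigma2, rho = 0.30, 0.20, 0.25

def portfolio_vol(w):
    """Volatility of a portfolio with weight w in asset 1 and (1 - w) in asset 2."""
    var = (w**2) * sigma1**2 + ((1 - w)**2) * sigma2**2 \
        + 2 * w * (1 - w) * rho * sigma1 * sigma2
    return np.sqrt(var)

# Closed-form minimum-variance weight for the two-asset case
cov = rho * sigma1 * sigma2
w_min = (sigma2**2 - cov) / (sigma1**2 + sigma2**2 - 2 * cov)
```

With these numbers the minimum-variance portfolio puts 25% in the riskier asset and has a lower volatility than either asset alone, which is exactly the diversification effect that traces out the frontier.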

The efficient-frontier graph shows that if an investor chooses the portfolio with the highest return for the lowest volatility, she will be on the positively sloped boundary of the feasible set. From this point on, APT comes into play.

Theoretical Background of the APT

APT is used to identify the drivers of stock returns and is well acknowledged in finance circles. It is mainly based on the assumption that asset returns can be estimated through the asset itself and a number of common risk factors. The Arbitrage Pricing Theory was proposed by Stephen Ross (1976) and predicts the relationship between the return of a portfolio and the return of a single asset through several independent macroeconomic variables. It was developed mainly in response to deficiencies of the celebrated Capital Asset Pricing Model (CAPM).

The APT considers the systematic factors that affect the average values of long-term returns. This method does not ignore individual assets, but rather focuses on assets in portfolios. These factors play a key role in estimating portfolio returns. The ultimate goal in this model is to improve portfolio performance by better understanding portfolio creation and evaluation.

The Arbitrage Pricing Model can be defined mathematically as follows:
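A standard statement of the model (the notation here is mine, following common textbook treatments) expresses the return of asset $i$ as a linear function of $k$ common factors:

```latex
R_i = E(R_i) + \beta_{i1} F_1 + \beta_{i2} F_2 + \dots + \beta_{ik} F_k + \varepsilon_i
```

where $F_j$ is the surprise in factor $j$, $\beta_{ij}$ is the sensitivity of asset $i$ to that factor, and $\varepsilon_i$ is the idiosyncratic shock. In the absence of arbitrage, expected returns then satisfy $E(R_i) = R_f + \beta_{i1}\lambda_1 + \dots + \beta_{ik}\lambda_k$, where $R_f$ is the risk-free rate and $\lambda_j$ is the risk premium of factor $j$.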

The Arbitrage Pricing Theory is based on the positive relationship between performance and risk, one of the basic assumptions of modern financial theory. In addition, asset returns depend on several factors, ranging from sector-related to market-related ones. These factors include macroeconomic variables such as GDP, interest rates, inflation, and unemployment. In APT, the return on an asset is the sum of the risk-free interest rate and the premia for exposure to the aforementioned factors.

Following Stephen Ross and his colleagues, some variables that can be employed in APT analysis have been identified. These are:
* Rate of inflation
* Growth rate in industrial production
* Spread between long-term and short-term interest rates
* Spread between high-grade and low-grade bonds
* Rate of interest
* Rate of change in oil prices

In the application part, the `pandas`, `matplotlib`, `seaborn`, `yfinance`, `fredapi`, and `quandl` libraries are used; `yfinance`, `fredapi`, and `quandl` collect the data directly from their respective databases.

Application

Here, in this post, the rate of inflation, the rate of change in oil prices, and GDP growth are used to detect the relationship between excess stock returns and the listed macroeconomic variables. I apply APT analysis to leading IT firms, namely Apple, Microsoft, and Intel Corporation. So, in a nutshell, this study estimates the macroeconomic determinants of the excess stock returns of IT companies. Because GDP data are released quarterly, the study is restricted to quarterly data. The period covered is 2010/03–2018/12.

Let’s begin the empirical part. First step, as always, is to import the necessary Python libraries.

import pandas as pd
import matplotlib.pyplot as plt
import datetime
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

At this point, I start gathering the data to be used in the analysis. The first option is to visit the related database website, download the data, store it on your computer, and import it here. After that, some cleaning steps would still be needed before the data could be used.

Instead, I access Yahoo Finance directly to collect the data. To do that, the required libraries are `pandas_datareader` and `yfinance`.

from pandas_datareader import data as pdr
import yfinance as yf
yf.pdr_override()

`yfinance` allows us to collect multiple stocks at once. From 2010/03 to 2018/12, I collect the stock prices of Apple, Microsoft, and Intel Corporation.

stocks = ['AAPL','MSFT','INTC']
start = datetime.datetime(2010,1,1)
end = datetime.datetime(2019,1,9)
stock_prices = yf.download(stocks, start=start, end=end, interval='3mo')
stock_prices.head()

As you can see, adjusted close, close, high, low, and open prices can all be used in modeling, but I prefer the close price for reasons that are out of the scope of this study. As I use quarterly data, `NaN` values appear and need to be handled; the following code fulfills this task.

stock_prices = stock_prices['Close']
stock_prices.dropna(inplace=True)
stock_prices.head(10)

Now, let's examine the main characteristics of the stock prices of Apple, Intel, and Microsoft. The `mean` tells us that the stock price of Apple is the highest among the three companies and, not surprisingly, it is also the most volatile, indicating that Apple may be the riskiest stock. Additionally, the difference between the minimum and maximum values implies slightly higher skewness compared to Intel and Microsoft.

stock_prices.describe()

Visualization helps us better understand the characteristics of the data. In this respect, the line plots of the three stocks are given below.

fig = plt.figure()
ax1 = fig.add_axes([0.1,0.1,0.8,0.8])
ax1.plot(stock_prices.AAPL, label='Apple')
ax1.plot(stock_prices.INTC, label='Intel')
ax1.plot(stock_prices.MSFT, label='Microsoft')
ax1.set_xlabel(“Date”)
ax1.set_ylabel(“Stock Price”)
ax1.set_title(“Stock Prices”)
plt.legend()
plt.show()

All three stocks show a similar upward trend, albeit at different paces. In 2010, the stock prices of the three companies were close to each other; from that year on, Apple diverged from the other two. 2018 was a record-setting year for stocks, but one investors would rather forget, in that most IT companies reached their peaks and then plunged. According to CNN Business, it was the worst year for stocks since 2008 and only the second year in the past decade that the Dow and S&P 500 fell.

As APT requires returns as input, it is time to calculate the returns of these three stocks. Of the different techniques available, I prefer `.pct_change()` due to its simplicity.

returns = stock_prices.pct_change()
returns.dropna(inplace=True)
returns.head()

k = 1
for i in range(len(returns.columns)):
    plt.subplot(3, 1, k)
    plt.hist(returns[returns.columns[i]])
    plt.title("{}".format(returns.columns[i]))
    k += 1
plt.tight_layout()
plt.show()

The histograms give clues about the distribution of the stock returns and the existence of outliers, if any. Without going into further detail, by eyeballing, Apple looks closest to the normal distribution.

Correlation Analysis

Correlation analysis is of considerable importance before running the analysis, in particular when you are modeling a multivariate case. The heatmap below, used to detect the correlation among variables, exhibits that the highest correlation, 0.58, is between Microsoft and Intel, a considerable level. However, the correlation coefficient of 0.27 between Apple and Intel implies a low level of correlation.

heat_corr=returns.corr()
sns.heatmap(heat_corr, annot=True)
plt.title(“Correlation Matrix”)
plt.show()

Collecting the Macroeconomic Variables

At this point, I start collecting the macroeconomic variables to be used in APT: quarterly GDP growth, quarterly inflation growth, and quarterly oil price growth. I use two main databases:
* FRED
* QUANDL

I again access FRED, the economic database maintained by the Federal Reserve Bank of St. Louis, directly via the Python API for FRED. The first step is to install the API package. Here is the code:

!pip install fredapi

As a reminder, you must have an API key to download data from FRED.

from fredapi import Fred
fred = Fred(api_key='insert your API key here')

Let's make a quick search with the "risk free" keyword to find the best-fit risk-free rate.

fred.search('risk free')

As a proxy for the risk-free rate, the 3-Month Treasury Constant Maturity Rate is used. This rate is widely used in financial analysis as a proxy for the risk-free rate because government-issued paper is theoretically considered riskless.

So, the task here is to find our data in a large data pool. In order to locate the 3-Month Treasury Constant Maturity Rate, I use `fred.search`, which makes it possible to list all datasets in which the `risk free` keyword appears.

If you look up the 3-Month Treasury Constant Maturity Rate, you will see that its series ID is GS3M, which is required to call the data.

risk_free = fred.get_series('GS3M')
risk_free = risk_free['2010-01-01':'2018-12-01']
rf = risk_free/100

After these preliminary steps, I list the first five rows of the risk-free rate data.

rf.head(5)

For the sake of simplicity, to convert the monthly interest rate to a quarterly rate, I simply `resample` the data, take the mean, and multiply by 3.

rf = rf.resample('Q').mean()*3
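The monthly-to-quarterly conversion can be sanity-checked on a toy series (the numbers below are made up for illustration):

```python
import pandas as pd

# Three monthly observations forming one quarter
monthly = pd.Series([0.01, 0.02, 0.03],
                    index=pd.date_range('2010-01-31', periods=3, freq='M'))

# Quarterly rate, following the same approximation as in the post:
# the mean of the monthly rates times 3
quarterly = monthly.resample('Q').mean() * 3
print(quarterly)
```

The three monthly values average to 0.02, so the single quarterly value comes out as 0.06. (Newer pandas versions prefer the `'ME'`/`'QE'` frequency aliases, but `'M'`/`'Q'` match the version used in this post.)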

As the visualization below shows, the 3-Month Treasury rate was rather stable during roughly 2010–2015. From then on, the interest rate started to rise sharply and reached 2.5% towards the end of 2018.

plt.plot(risk_free)
plt.xlabel('Date')
plt.ylabel('%')
plt.title('3-Month Treasury Constant Maturity Rate')

So, we have the risk free rate. Let’s move on and gather GDP growth, inflation growth, and oil return data.

GDP

GDP data can be collected in the same way, simply by calling `get_series()`.

gdp = fred.get_series('GDP')
gdp = gdp['2010-01-01':'2019-01-01']
gdp.tail()

After collecting GDP data, it is time to calculate the GDP growth.

gdp_growth=gdp.pct_change().dropna()

The line plot exhibits that the GDP growth rate oscillates around 0.01, indicating that GDP growth in the USA was rather stable and never dropped below zero during the investigation period.

plt.plot(gdp_growth)
plt.ylabel('GDP Growth Rate')
plt.xlabel('Date')
plt.title('GDP Growth Rate, 2010–2019')

Inflation

fred.search('potential inflation')
inf = fred.get_series('CPIEALL')
inf = inf['2009-12-01':'2018-12-01']

As this series is an index with 1982=100, it stood at 232.7 in 2009 and 273.8 in 2018, meaning that inflation rose at a moderate pace.

print(inf.head())
print(inf.tail())
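Using the two index values quoted above, the implied average annual inflation over the nine years can be backed out as a quick side calculation (not part of the regression data):

```python
# Index values quoted in the text (1982 = 100)
index_2009 = 232.7
index_2018 = 273.8
years = 9

# Geometric average annual inflation implied by the change in the index
avg_annual_inflation = (index_2018 / index_2009) ** (1 / years) - 1
print(f"{avg_annual_inflation:.2%}")  # roughly 1.8% per year
```

This confirms the "moderate pace" reading: prices rose by about 1.8% per year on average over the period.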

I have monthly inflation data here, but I need quarterly data, so a quick conversion makes it happen.

inf_quarterly = inf.resample('Q').mean()
inf_growth=inf_quarterly.pct_change().dropna()
inf_growth.head()

Visualization of the inflation growth rate confirms my initial observation of slowly rising inflation. According to the plot below, aside from the 2014–2016 period, the growth rate of inflation remained steadily on the positive side.

plt.plot(inf_growth)
plt.ylabel('Inflation Growth Rate')
plt.xlabel('Date')
plt.title('Inflation Growth Rate, 2010–2019')

Oil Price

!pip install quandl
import quandl
oil = quandl.get("ODA/POILBRE_USD", authtoken="insert your API key here", start_date="2009-12-01", end_date="2019-01-01")

Likewise, a conversion from monthly to quarterly is needed here.

oil_quarterly = oil.resample('Q').mean()

The oil price shows considerable instability. For instance, it peaked at 118 USD in 2012, then plunged to 34 USD in 2016 amid the glut in oil supply. As can be seen, the oil price was wildly volatile throughout the study period.

I have quarterly oil data but, as with the other variables, I work with growth rates. So, `pct_change` is applied here as well.

oil_return=oil_quarterly.pct_change().dropna()
oil_return.head()

The high volatility in the oil price can also be observed by looking at oil returns during 2010–2018. To me, the most striking observation is that the oil price dropped by nearly 30% in mid-2015 and increased by nearly 30% in mid-2016.

plt.plot(oil_return)
plt.ylabel('Oil Return')
plt.xlabel('Date')
plt.title('Oil Return, 2010–2019')

Now, in order to merge all the data to be used in the APT analysis, I need to do some data cleaning. The first step is to adjust the indexes and column names; then I put all the data in a single dataframe, which makes it easier to process.

returns=returns.reset_index()
returns.drop('Date',axis=1,inplace=True)

I do some manipulation on the risk-free data before merging.

rf = pd.DataFrame(rf)
rf = rf.reset_index()
rf.drop('index', axis=1, inplace=True)
rf.columns = ['rf']
returns['rf'] = rf['rf']

Now, I have a single dataframe including the stock returns and the risk-free rate. Using the code below, I calculate the excess return of each stock, to be used as the dependent variable.

stocks = ["AAPL","INTC","MSFT"]
for i in stocks:
    returns["excess_return_"+str(i)] = returns[i] - returns.rf

Let’s check what it looks like.

returns.head(5)

The same procedure applies to the macroeconomic variables as well.

gdp_growth=gdp_growth.reset_index()
inf_growth=inf_growth.reset_index()
oil_return=oil_return.reset_index()
gdp_growth.drop('index',axis=1,inplace=True)
inf_growth.drop('index',axis=1,inplace=True)
oil_return.drop('Date',axis=1,inplace=True)
gdp_growth.columns=["gdp_growth"]
inf_growth.columns=["inf_growth"]
oil_return.columns=["oil_return"]

Apple, Intel, and Microsoft are examined separately. Therefore, I need three different dataframes, labeled "data1", "data2", and "data3". The macroeconomic variables are the same in all of them; the only difference is the stock. That is to say, "data1" includes the excess return of Apple, "data2" that of Intel, and "data3" that of Microsoft.

data1 = pd.concat([gdp_growth, inf_growth, oil_return, returns['excess_return_AAPL']], axis=1)
data2 = pd.concat([gdp_growth, inf_growth, oil_return, returns['excess_return_INTC']], axis=1)
data3 = pd.concat([gdp_growth, inf_growth, oil_return, returns['excess_return_MSFT']], axis=1)

As seen below, data1 includes the three macroeconomic variables and the excess return of Apple.

data1.head()

For the sake of simplicity, let's rename the excess return column in each dataframe.

data1['excess_return'] = returns.excess_return_AAPL
data2['excess_return'] = returns.excess_return_INTC
data3['excess_return'] = returns.excess_return_MSFT

Finally, I am ready to empirically test the validity of APT using some main US macroeconomic variables. To do that, I employ the `statsmodels` library.

import statsmodels.formula.api as smf

models = []
for i in (data1, data2, data3):
    formula = "excess_return ~ gdp_growth + inf_growth + oil_return"
    models.append(smf.ols(formula, data=i).fit())

The first model output shows the association between the excess return of Apple and the pre-defined macroeconomic variables. Accordingly, the estimated coefficients of the model are not statistically significant, indicating that these macroeconomic variables do not account for the excess return of Apple. More specifically, judging by the p-values, the estimated coefficients of GDP growth, inflation growth, and oil return are not statistically significant at conventional levels.

models[0].summary()

This model presents the nexus between the excess return of Intel and the same macroeconomic variables. Unlike the Apple case, this analysis confirms a statistically significant relationship between the excess return of Intel and US GDP growth at the 10% level (p-value of 0.093). To interpret, a one-unit increase in GDP growth boosts the excess return of Intel by nearly 8.86, so there is a positive relationship between GDP growth and the excess return of Intel. However, no relation is detected for the rest of the variables.

models[1].summary()

The regression result for Microsoft is given below; it is similar to the Apple case in terms of statistical significance. Generally speaking, the p-values are greater than 0.1, indicating that the estimated coefficients are not statistically significant. Thus, as with Apple, the macroeconomic variables cannot account for the excess return of Microsoft.

models[2].summary()


Last Words

The results of the models suggest that APT cannot be validated using US macroeconomic variables, except for GDP growth in the Intel case. As a final note, the variance in the dependent variable explained by the independent variables is quite low. This implies that the model has very weak explanatory power, which needs to be improved. Some possible modifications are:

  • Using additional variables that might be more related to excess stock returns
  • Using a different model
  • Testing in a different time period or country
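As an illustration of the first suggestion, an extra factor can simply be appended to the regression formula. The sketch below uses synthetic data, and the factor name `mkt_excess` (standing in for, say, the market excess return) is my own placeholder, just to show the mechanics:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 36  # roughly the number of quarters in the study period

# Synthetic factors; 'mkt_excess' is a hypothetical additional factor
df = pd.DataFrame({
    'gdp_growth': rng.normal(0.01, 0.005, n),
    'inf_growth': rng.normal(0.005, 0.002, n),
    'oil_return': rng.normal(0.0, 0.1, n),
    'mkt_excess': rng.normal(0.02, 0.05, n),
})
# Construct a fake excess return that loads on the new factor
df['excess_return'] = 1.2 * df['mkt_excess'] + rng.normal(0, 0.01, n)

# Same estimation as above, with the extra regressor appended to the formula
model = smf.ols('excess_return ~ gdp_growth + inf_growth + oil_return + mkt_excess',
                data=df).fit()
print(model.rsquared)
```

On real data, of course, the improvement in explanatory power would depend on how strongly the chosen factor actually relates to the excess returns.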
