Pearson’s Correlation, Linear Regression, And Why ‘Beta’ Grossly Underestimates Portfolio Sensitivity To Market Returns
This is the first part of a two-part post illustrating the practical importance of accounting for both nonlinearities and temporal dependencies when assessing portfolio risk, neither of which the widely adopted (Pearson’s) correlation coefficient captures.
In this post we provide a basic introduction to Pearson’s correlation, its relation to linear regression and portfolio beta, and its limitations as far as measuring dependence between assets is concerned.
In Part II, we provide empirical evidence that the i.i.d. Gaussian assumption for asset returns does not hold for U.S. stocks and futures, and we present an alternative to Pearson’s correlation, namely the information-adjusted correlation, which measures the association between time series (not random variables), while fully capturing nonlinearities and, more importantly, temporal structures. We then use information-adjusted correlation to construct an information-theoretic alternative to the (CAPM’s) beta of a portfolio, which we call information-adjusted beta.
What Is Correlation?
Broadly speaking, correlation (or co-relation) is any measure of association or dependence between two phenomena, each characterized by a number.
In practice, phenomena of interest are typically too complex to be characterized by precise invariant deterministic laws such as the laws of physics. Instead, we settle for expressing our lack of a granular understanding as randomness, and we turn to the branch of mathematics that is probability theory to study ensemble properties of our phenomena. One such ensemble property is the association between two random phenomena.
Throughout the rest of this post, we assume that every random phenomenon of interest manifests itself through a single number, which we represent by its associated random variable, and by the correlation between two random variables we mean the correlation between the associated random phenomena.
As you would expect, numerous ways of measuring the correlation between two random variables have been proposed throughout the years.
Pearson’s correlation coefficient, by far the most popular measure of correlation, is a number between -1 and 1 that reflects the propensity for two random phenomena to have a linear association. That is, Pearson’s correlation measures the extent to which, if we were to plot observations from one random variable against those from the other in a scatter plot, the plot would look like a straight line.
The closer Pearson’s correlation is to -1 or 1 (resp. 0), the stronger (resp. weaker) the evidence of linear association between the two random variables. A positive Pearson’s correlation indicates that when one random variable takes a large (resp. small) value, so does the other. A negative Pearson’s correlation indicates that when one random variable takes a large (resp. small) value, the other takes a small (resp. large) value.
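As an illustration, here is a minimal sketch (using numpy; all data below is simulated purely for illustration) of how Pearson’s correlation behaves on an exactly linear, an exactly decreasing, and an unrelated pair of samples:

```python
import numpy as np

def pearson(x, y):
    """Sample Pearson correlation between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)

print(pearson(x, 2.0 * x + 1.0))            # exactly linear, increasing: 1
print(pearson(x, -3.0 * x))                 # exactly linear, decreasing: -1
print(pearson(x, rng.normal(size=10_000)))  # independent draws: near 0
```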
Why Is Pearson’s Correlation So Popular?
The popularity of Pearson’s correlation as a measure of dependence between two random variables can be attributed to the fact that, in the special case of Gaussian distributions, it is a fully general measure of dependence.
Indeed, when the two random variables are jointly Gaussian, Pearson’s correlation fully captures the extent to which there is any association whatsoever (linear or nonlinear) between them. Moreover, Gaussian distributions play a pivotal role in probability theory and its applications.
To mathematicians, they are the easiest family of probability distributions to work with, they have been studied extensively, they often arise unexpectedly in seemingly unrelated probability problems, and results about them abound in the literature.
To statisticians, Gaussians are a blessing for several reasons: they are analytically tractable; errors incurred while estimating properties of unknown distributions usually behave like Gaussians, even when the estimated distributions themselves aren’t, thanks to a family of results referred to as Central Limit Theorems; and they belong to the exponential family of distributions, which itself plays a key role in Bayesian statistics.
To machine learning researchers, the Gaussian distribution arises naturally as the solution to some important optimization problems over probability distributions. One such problem is the maximum-entropy problem, which aims at finding, among all probability distributions that are consistent with observed empirical evidence, the one that is the most agnostic about everything else.
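For reference, when the empirical evidence pins down only the mean and the variance, the maximum-entropy problem can be stated as follows (a standard result, restated here for completeness):

```latex
\begin{aligned}
\max_{p}\;& -\int p(x)\,\log p(x)\,dx \\
\text{s.t. }& \int p(x)\,dx = 1,\qquad
\int x\,p(x)\,dx = \mu,\qquad
\int (x-\mu)^2\,p(x)\,dx = \sigma^2,
\end{aligned}
```

and its solution is the Gaussian density with mean μ and variance σ². In this sense, the Gaussian is the least committal distribution consistent with first- and second-moment evidence.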
Pearson’s Correlation And Linear Regression
Pearson’s correlation and linear regression can be viewed as two sides of the same coin.
In the case of two scalar random variables x and y that have been standardized (zero mean, unit variance), the Pearson’s correlation coefficient ρ between y and x can be interpreted as the slope of the best linear fit between y and x:

y ≈ ρ · x.
As it turns out, even in the case of a linear fit between a random scalar y and a random vector X, both standardized,

y ≈ βᵀX,

the vector β providing the best linear fit is fully determined by the Pearson’s correlation coefficients between y and the coordinates of vector X, and the Pearson’s correlation coefficients between the coordinates of X.
When the coordinates of the input vector X are decorrelated (have pairwise Pearson’s correlation 0), the best linear fit as per the equation above is obtained by setting the i-th coordinate of β to the Pearson’s correlation between y and the i-th coordinate of X.
More generally, the best linear fit is fully determined by Pearson’s correlation coefficients and reads

β = Σ_XX⁻¹ Σ_Xy,

where each Σ is the matrix of Pearson’s correlations between the (coordinates of the) left index and the (coordinates of the) right index.
Noting that all coordinates of β are 0 if and only if all input coordinates of X are decorrelated with y, it follows that linear regression can be thought of as a multivariate generalization of Pearson’s correlation.
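The multivariate claim above can be checked numerically. Below is a small sketch (numpy, simulated data) verifying that ordinary least squares on standardized data recovers exactly the coefficients implied by the Pearson correlation matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50_000, 3

# Correlated inputs and a linear target plus noise.
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)

# Standardize everything to zero mean, unit variance.
Xs = (X - X.mean(0)) / X.std(0)
ys = (y - y.mean()) / y.std()

# beta from Pearson correlations alone: Sigma_XX^{-1} Sigma_Xy.
Sigma_XX = (Xs.T @ Xs) / n   # correlations between coordinates of X
Sigma_Xy = (Xs.T @ ys) / n   # correlations between coordinates of X and y
beta_corr = np.linalg.solve(Sigma_XX, Sigma_Xy)

# Ordinary least squares on the same standardized data agrees.
beta_ols, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
print(np.allclose(beta_corr, beta_ols, atol=1e-6))  # True
```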
Linear Regression In Finance
Linear regression is widespread in finance. From the CAPM, to the APT, to Fama-French factor models, to premium commercial factor models, nearly all factor-based risk models used in finance rely on linear regression together with the assumption that asset returns are i.i.d. across time.
Predictive models commonly used by Quants also heavily rely on linear regression, either in its original formulation (e.g. OLS, autoregression and vector autoregression, etc.), or regularized (e.g. Ridge regression, LASSO, etc.) to deal with dependent input variables and/or perform variable selection.
Linear regression also finds applications in causality detection (e.g. Granger causality).
Pearson’s Correlation And The (CAPM’s) Beta Of A Portfolio
The Capital Asset Pricing Model (CAPM), to which we owe the notions of alpha, beta, and market-neutrality, postulates that the excess return of a portfolio over the risk-free rate can be decomposed into a random market component whose magnitude is driven by a coefficient beta, a deterministic idiosyncratic average excess return term alpha, and a mean-zero idiosyncratic residual term.
By definition, a portfolio is considered to be market-neutral when its beta is 0. It can be shown that, under the CAPM,

β = ρ(r_p, r_m) · σ_p / σ_m,

where r_p (resp. r_m) denotes portfolio (resp. market) excess returns, meaning that a portfolio is market-neutral if and only if its returns are decorrelated with market returns.
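To make the relationship concrete, here is a toy sketch (numpy, simulated returns; all numbers are made up) showing that the regression beta of a portfolio on the market equals Pearson’s correlation scaled by the ratio of volatilities:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
r_m = rng.normal(0.0005, 0.01, size=n)                 # simulated market excess returns
r_p = 0.8 * r_m + rng.normal(0.0002, 0.005, size=n)    # portfolio with true beta 0.8

# beta as the slope of the regression of r_p on r_m.
cov = np.cov(r_p, r_m)
beta = cov[0, 1] / cov[1, 1]

# The same beta through Pearson's correlation and the volatility ratio.
rho = np.corrcoef(r_p, r_m)[0, 1]
beta_via_rho = rho * r_p.std(ddof=1) / r_m.std(ddof=1)

print(beta, beta_via_rho)  # both near the true 0.8; beta is 0 iff rho is 0
```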
Limitations of Pearson’s Correlation in Finance
The i.i.d. Gaussian assumption under which Pearson’s correlation captures any type of association between two random variables presents severe limitations in financial applications.
Decorrelation does not imply independence; so market-neutral portfolios aren’t always market-neutral!
As (should be) taught in any Probability 101 course, decorrelation does not always imply independence! In particular, if the returns of a market-neutral portfolio are not jointly Gaussian with market returns, then that market-neutral portfolio might in fact depend on the market! This is an obvious statement that is too often overlooked by practitioners.
The worst part is that, in such a case, the residual coupling between a so-called market-neutral portfolio and the market can only manifest itself in the tails (higher moments of the distribution): returns of a market-neutral portfolio are free of first-order associations with market returns, but are not necessarily free of higher-order/nonlinear associations.
A market-neutral portfolio might be well hedged in normal market conditions, but it could be significantly exposed to extreme moves, a.k.a. marketwide Black-Swans.
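Here is a toy illustration (numpy, simulated data; the “portfolio” is a hypothetical short-convexity book, not a real strategy): a book whose P&L is minus the squared market return has essentially zero Pearson correlation with the market, hence a beta near 0, yet it loses money precisely when the market makes a large move in either direction:

```python
import numpy as np

rng = np.random.default_rng(3)
r_m = 0.01 * rng.normal(size=100_000)   # simulated daily market returns

# A toy "market-neutral" book that is short convexity: it loses on large
# moves in either direction, like a short-straddle position.
r_p = -(r_m ** 2) + rng.normal(0.0, 1e-5, size=r_m.size)

print(np.corrcoef(r_p, r_m)[0, 1])       # near 0: the book looks market-neutral
print(np.corrcoef(r_p, r_m ** 2)[0, 1])  # near -1: strongly coupled to big moves
```

Any risk model built on Pearson’s correlation alone would report the first number and miss the second entirely.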
A great fund manager always assuming linearity is like the French chef Paul Bocuse asking his grocery suppliers to run his restaurant!
A related flaw of the use of Pearson’s correlation in finance is the assumption that linearity captures the essence of any association between two random variables. The reality, though, is that any regression model can be regarded as either linear or nonlinear, depending on which representation of the input phenomena is used as reference.
Let us consider the regression model y = f(x) + ε,
where f is a nonlinear function. Clearly, this model posits a nonlinear association between y and x. However, the same model can also be regarded as positing a linear association between y and f(x).
Here, x and f(x) can be viewed as two different ways of observing the same random phenomenon. The process of observing the underlying random phenomenon would typically be undertaken by a data vendor, or implied by arbitrary conventions (e.g. exchange contract specifications). Always assuming linearity holds in the representation provided to you by your data vendor or exchange is like assuming that the data vendor or exchange did (most of) your job for you. If they were good at your job, they would be running your fund.
Let’s take a concrete toy example. Suppose we are interested in studying the relationship between the value of an Australian dollar (AUD) in U.S. dollars and the value of a New Zealand dollar (NZD) in U.S. dollars, and suppose we are determined to use linear regression. Should we be regressing the rates themselves? The logarithms of the rates? Their daily returns? Some other transformation of the rates? Each choice results in a completely different model, and not asking these questions is as good as letting data vendors and exchanges build our models for us.
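A quick simulated sketch (numpy; the rates below are random walks, not real AUD/NZD data) shows how much the choice of transformation matters: the same pair of series has a well-defined correlation in returns space, while the correlation of the raw levels is dominated by whatever trends the two random walks happened to follow:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000

# Simulated daily log-returns of two FX rates with true correlation 0.6.
corr = np.array([[1.0, 0.6], [0.6, 1.0]])
z = rng.normal(size=(n, 2)) @ np.linalg.cholesky(corr).T
aud_ret, nzd_ret = 0.005 * z[:, 0], 0.005 * z[:, 1]

# Rate levels: exponentiated cumulative sums of the returns (random walks).
aud = 0.70 * np.exp(np.cumsum(aud_ret))
nzd = 0.65 * np.exp(np.cumsum(nzd_ret))

print(np.corrcoef(aud_ret, nzd_ret)[0, 1])  # close to the designed 0.6
print(np.corrcoef(aud, nzd)[0, 1])          # an artifact of the trends, not stable
```

Re-running with a different seed leaves the first number essentially unchanged while the second can move anywhere in [-1, 1]: the two regressions are genuinely different models.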
In order to use linear models, one should additionally seek to understand, or better yet learn from the data, what transformation would make the linearity assumption useful.
At this point, it is worth stressing that the t-test, beloved by economists, and other tests of statistical significance cannot be used to prove the validity of the linear assumption: they test for the existence of (non-null) first-order effects; they do not test for the absence of (non-null) higher-order effects.
In summary, we should all be like the late French chef Paul Bocuse: worry a whole lot about the quality of the ingredients we use, but worry even more about how to put them together to make the perfect meal (or predictive/risk model).
Always assuming returns are i.i.d. is like swearing that markets are fully efficient, and that you can generate alpha nonetheless!
The very essence of the pursuit of alpha is the premise that, one way or the other, the past is related to the future, that the future can be somewhat anticipated, or, in mathematical terms, that markets, as a stochastic system, exhibit memory.
It is therefore contradictory to both seek alpha and assume market dynamics are memoryless. If markets were memoryless, how would it be possible to learn systematic strategies from data (i.e. make money as a Quant fund)? Yet, a very large number of risk models are based on factor models that assume time series of returns are memoryless. This assumption is not just inconsistent with empirical evidence; it can also hide a lot of tail risk, as we will see in Part II.
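As a sketch of what the i.i.d. assumption throws away, the following (numpy, simulated data) contrasts the lag-1 autocorrelation of i.i.d. returns with that of an AR(1) series; a model that assumes returns are i.i.d. across time would treat both series identically:

```python
import numpy as np

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(x[:-1] @ x[1:] / (x @ x))

rng = np.random.default_rng(5)
n = 20_000

# i.i.d. returns: no memory.
iid = rng.normal(size=n)

# AR(1) returns: today's value partly remembers yesterday's.
phi = 0.3
eps = rng.normal(size=n)
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = phi * ar[t - 1] + eps[t]

print(lag1_autocorr(iid))  # near 0
print(lag1_autocorr(ar))   # near 0.3: temporal structure an i.i.d. model ignores
```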
To find out about an alternative to Pearson’s correlation for measuring risk in your portfolio while accounting for nonlinearities and temporal dependence, and a similar alternative to CAPM’s beta, read Part II of this post, and our Yellow Paper for a more technical discussion.