Information-Theoretic Alternatives To Pearson’s Correlation And Portfolio ‘Beta’

This is the second part of a two-part post illustrating the practical importance of accounting for both nonlinearities and temporal dependencies when assessing portfolio risk, something the widely adopted (Pearson's) correlation coefficient fails to do.

In Part I we provided a basic introduction to Pearson’s correlation, its relation to linear regression and portfolio beta, and its limitations as far as measuring dependence between assets is concerned.

In this post, we provide empirical evidence that the i.i.d. Gaussian assumption for asset returns does not hold for U.S. stocks and futures, and we present an alternative to Pearson’s correlation, namely the information-adjusted correlation, which measures the association between time series (not random variables), while fully capturing nonlinearities and, more importantly, temporal structures. We then use information-adjusted correlation to construct an information-theoretic alternative to the (CAPM’s) beta of a portfolio relative to the market, which we call information-adjusted beta.

Measuring Time Series Association With Information Theory

Note: All logarithms in this section are base 2.

Entropy As A Measure Of Uncertainty

The amount of information contained in a complex system modeled as a random variable is typically defined as the amount of uncertainty in that random variable.

Measuring the amount of uncertainty in a random variable is a problem that is as old as information theory itself. The canonical solution to this problem is the notion of information entropy introduced by Claude Shannon, the father of information theory, in his seminal paper A Mathematical Theory Of Communication, in which he focused on discrete random phenomena (i.e. those taking a countable number of values).

The entropy of a probability distribution with density function p with respect to a base measure dμ. The entropy of a random variable is that of its probability distribution.
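
For reference, a sketch of the standard definition matching the caption above, with p a density with respect to the base measure dμ (in Shannon's discrete setting, dμ is the counting measure and the integral reduces to a sum):

```latex
H(p) \;=\; -\int p(x)\,\log_2 p(x)\, d\mu(x)
```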

The notion of information entropy introduced by Shannon for discrete random variables was later generalized to any random variable.

An important related measure is the so-called conditional entropy. Intuitively, the conditional entropy of a random variable y given x is the amount of information/uncertainty that remains about random variable y given random variable x.

Conditional entropy of y given x

More specifically, it is the difference between the amount of uncertainty (or entropy) there is in y and x collectively, and the amount of uncertainty there is in x.
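
In symbols, writing H(x, y) for the joint entropy of x and y:

```latex
H(y \mid x) \;=\; H(x, y) \;-\; H(x)
```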

Venn diagram illustrating the link between entropies, joint entropy, and conditional entropy

As is illustrated in the Venn diagram above, the amount of information contained in y and x collectively is rarely the sum of the amount of information contained in y and that contained in x, as there could be information redundancy between y and x.
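
A sketch of this subadditivity property; the gap between the two sides is precisely the information redundant between y and x:

```latex
H(x, y) \;\le\; H(x) + H(y)
```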

One of the beauties of using entropy as a measure of information is the fact that the conditional entropy of y given x is never greater than the entropy of y, and the two are equal if and only if y and x are independent (i.e. there is no association between the two whatsoever, linear or nonlinear).

The notion of conditional entropy fully captures independence between two random variables
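
In symbols:

```latex
H(y \mid x) \;\le\; H(y), \qquad \text{with equality if and only if } y \text{ and } x \text{ are independent}
```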

Unlike Pearson’s correlation, conditional entropy captures both linear and nonlinear associations between random variables.

A related measure of association is the mutual information between y and x, defined as the difference between the entropy of y and the conditional entropy of y given x, which, as the name suggests, reflects the amount of information shared between y and x.

As it turns out, this quantity coincides with a formal and popular statistical measure of how far off we would be if we were to assume that y and x are independent, namely the so-called Kullback-Leibler divergence.

Relation between the mutual information between two random variables, their conditional entropies, and the KL-divergence between their joint distribution and the product of its marginals.
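
A sketch of the relations described in the caption above, with I(y; x) denoting the mutual information and p_{x,y}, p_x, p_y the joint and marginal distributions:

```latex
I(y;\, x) \;=\; H(y) - H(y \mid x) \;=\; H(x) - H(x \mid y)
\;=\; D_{\mathrm{KL}}\!\left( p_{x,y} \,\middle\|\, p_x \otimes p_y \right)
```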

In short, even if we were to assume that asset returns are i.i.d., we can borrow from information theory to construct a measure of association that, unlike Pearson's correlation, fully captures both linear and nonlinear associations.

Entropy Rate As A Measure Of Information In Time Series

The notion of time plays too central a role in economics and financial markets to believe that order doesn’t matter, and that the same random phenomenon keeps repeating itself. Simply put, assuming returns are i.i.d., more often than not, is wrong. The natural probabilistic abstraction to model financial markets is the notion of a stochastic process or a time series, not the notion of a random variable.

A time series is basically a time-stamped collection of random variables.

Fortunately, the notions of entropy and conditional entropy are extended to time series by the notions of the entropy rate of a time series

and the conditional entropy rate of a time series given another.

Their interpretations are very similar. The entropy rate measures the amount of information produced per unit of time by a time series. The conditional entropy rate measures the amount of new information produced by a time series per unit of time, that is not already contained in another time series.
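
A sketch of the usual definitions, assuming the limits exist:

```latex
H(\{x\}) \;=\; \lim_{T \to \infty} \frac{1}{T}\, H\!\left(x_1, \dots, x_T\right),
\qquad
H(\{y\} \mid \{x\}) \;=\; \lim_{T \to \infty} \frac{1}{T}\, H\!\left(y_1, \dots, y_T \mid x_1, \dots, x_T\right)
```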

Similarly to the random variable case, the difference between the entropy rate of a time series and its conditional entropy rate given another time series reflects the amount of information shared between the two time series per unit of time.

The amount of information shared between time series {x} and {y} per unit of time is equal to the rate of KL-divergence between the joint process {x, y} and the product of the coordinate processes {x} and {y} (i.e. the joint process stripped of any association between the coordinate processes {x} and {y}). It is also equal to their rate of 'mutual information'.
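
In symbols, a sketch of the mutual information rate described above:

```latex
I(\{y\};\, \{x\}) \;=\; H(\{y\}) - H(\{y\} \mid \{x\})
\;=\; \lim_{T \to \infty} \frac{1}{T}\, D_{\mathrm{KL}}\!\left( p_{x_{1:T},\, y_{1:T}} \,\middle\|\, p_{x_{1:T}} \otimes p_{y_{1:T}} \right)
```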

Crucially, the notion of conditional entropy rate goes well beyond linear associations of samples corresponding to the same time, and captures any association between the two time series, linear or nonlinear, and across time.

The notion of conditional entropy rate fully captures independence between two time series
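
As in the random variable case:

```latex
H(\{y\} \mid \{x\}) \;\le\; H(\{y\}), \qquad \text{with equality if and only if the processes } \{y\} \text{ and } \{x\} \text{ are independent}
```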

Pairwise Incremental Diversification As A Measure Of Dependence Between Assets

In our Yellow Paper, we define the amount of diversification one asset adds to another as the mutual information timescale (the inverse of the rate of mutual information) of their time series of returns:
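
A sketch of the definition, writing ID for incremental diversification and I({y}; {x}) for the mutual information rate of the two returns processes:

```latex
\mathrm{ID}\!\left(\{y\};\, \{x\}\right) \;=\; \frac{1}{I(\{y\};\, \{x\})}
\qquad \text{(units of time per bit of shared information)}
```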

Intuitively, this quantity can be interpreted as the amount of time it would take on average to see 1 bit of shared information between the two assets (or equivalently their time series of returns). The less related two assets are, the longer it would take to observe 1 bit of mutual information between their returns time series. Similarly, the more related two assets are, the less time it would take to see 1 bit of mutual information between the two.

Incremental diversification is always nonnegative, ranging from 0 (when one time series of returns can be fully determined from the other) to +∞ (when the two time series of returns are independent).

From Incremental Diversification To Information-Adjusted Correlation

The alert reader has certainly noticed that we haven't made any specific distribution assumption to ensure that our notion of incremental diversification fully captures any form of association between two time series of returns, linear or nonlinear, at the same time or across time. Moreover, it is possible to estimate incremental diversification from empirical evidence without placing any arbitrary distribution assumption (see our Yellow Paper for more details).

Now, here’s the deal. We know that in the case of i.i.d. Gaussians, Pearson’s correlation is sufficient to characterize any form of association, linear or otherwise. This begs the question: what is the functional relationship between incremental diversification and Pearson’s correlation in the case of i.i.d. Gaussians? It turns out that the answer is available in closed form:

Relationship between incremental diversification and Pearson’s correlation when time series of returns are assumed jointly Gaussian and memoryless.
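
For i.i.d. jointly Gaussian returns sampled once per period with Pearson's correlation ρ, the mutual information per period is the classical -½ log₂(1 − ρ²) bits, so a sketch of the closed-form relationship reads (conventions may differ slightly from the Yellow Paper's):

```latex
\mathrm{ID} \;=\; \frac{1}{I} \;=\; \frac{-2}{\log_2\!\left(1 - \rho^2\right)}
```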

We can also ask the reverse question. Given that we know how to accurately estimate incremental diversification, what Pearson’s correlation coefficient would the estimated incremental diversification value correspond to under the i.i.d. Gaussian assumption? The answer to this question — obtained by inverting the equation above — is what we refer to as information-adjusted correlation.

The information-adjusted correlation between two assets with time series of returns {y} and {x} is the Pearson’s correlation coefficient that, under the (possibly incorrect) i.i.d. Gaussian assumption, would reproduce the incremental diversification estimated from the data.
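
Inverting the Gaussian relationship sketched above, and taking the sign from the empirical Pearson's correlation, gives a sketch of the information-adjusted correlation as a function of the estimated incremental diversification:

```latex
\mathrm{ACorr} \;=\; \pm\sqrt{\,1 - 2^{-2/\widehat{\mathrm{ID}}}\,}
```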

We can then independently estimate Pearson’s correlation and compare it to information-adjusted correlation. If the Gaussian i.i.d. assumption is valid, then the two values should be close to each other!

A Simple And Practical Black-Swan Test

It all sounds nice, but where are the black swans, you might ask! Well, if there is any practical takeaway from this post, this is it:

  • Read our Yellow Paper to figure out how to estimate ACorr from data.
  • Case 1: ACorr ≈ Corr: If you observe that information-adjusted correlation is (approximately) equal to Pearson’s correlation, then the data are consistent with the i.i.d. Gaussian assumption, and you can trust your favorite linear i.i.d. factor model.
  • Case 2: |ACorr| < |Corr|: Sorry, but there is a bug in your code! This is mathematically impossible.
  • Case 3: |ACorr| >> |Corr|: Red flag! There is a whole lot of risk in your portfolio that neither Pearson’s correlation nor your favorite linear i.i.d. factor model is accounting for, and that will come back to bite you hard in big market moves. Any portfolio of yours in these assets that you think is market-neutral is probably not market-neutral at all! (See the sketch after this list.)
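
A minimal sketch of this three-way check in Python, assuming the information-adjusted correlation `acorr` has already been estimated separately (e.g. with the Yellow Paper's maximum-entropy estimator); the tolerance `tol` is a hypothetical threshold for "approximately equal":

```python
import numpy as np

def black_swan_test(y_returns, x_returns, acorr, tol=0.05):
    """Compare Pearson's correlation with an externally estimated
    information-adjusted correlation and report which case applies."""
    corr = np.corrcoef(y_returns, x_returns)[0, 1]  # plain Pearson's correlation
    if abs(acorr) + tol < abs(corr):
        return "Case 2: |ACorr| < |Corr| -- mathematically impossible, look for a bug."
    if abs(acorr) > abs(corr) + tol:
        return "Case 3: |ACorr| >> |Corr| -- risk hidden from linear i.i.d. factor models."
    return "Case 1: ACorr ~= Corr -- consistent with the i.i.d. Gaussian assumption."
```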

Does It All Matter? You Bet It Does!

As previously discussed, if we plot information-adjusted correlation against Pearson’s correlation for some pairs of assets, any significant deviation from the line y=x is a strong indication that the i.i.d. Gaussian assumption for asset returns doesn’t hold.

Well, let’s do just that. Let’s consider as our universe of assets the constituents of the S&P 100 and 60 of the most liquid U.S. futures (front-month contracts, continuously adjusted using the backward-ratio method). For each pair of assets in the universe, we compute both the Pearson’s correlation between their daily returns and the information-adjusted correlation between their daily returns, plot one against the other in a scatter plot, and get the following chart.

Relation between Pearson’s correlation and information-adjusted correlation for S&P 100 stocks and 60 of the most liquid U.S. futures. Information-adjusted correlations are estimated using the maximum-entropy approach described in our Yellow Paper.
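
For concreteness, a sketch of how such a scatter plot could be produced; `estimate_acorr` is a hypothetical stand-in for an information-adjusted correlation estimator such as the one described in the Yellow Paper:

```python
import itertools
import numpy as np
import matplotlib.pyplot as plt

def correlation_scatter(returns, estimate_acorr):
    """`returns` maps each asset name to a 1D array of its daily returns."""
    corrs, acorrs = [], []
    for a, b in itertools.combinations(sorted(returns), 2):
        corrs.append(np.corrcoef(returns[a], returns[b])[0, 1])
        acorrs.append(estimate_acorr(returns[a], returns[b]))
    plt.scatter(corrs, acorrs, s=5)
    plt.plot([-1, 1], [-1, 1], linestyle="--", color="black")  # the y = x reference line
    plt.xlabel("Pearson's correlation")
    plt.ylabel("Information-adjusted correlation")
    plt.show()
```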

Let’s analyze the chart.

Observation 1: We see that the closer Pearson’s correlation is to 1 (resp. -1), the closer information-adjusted correlation is to 1 (resp. -1). This makes intuitive sense. Pearson’s correlation captures linear associations between returns corresponding to the same time. Strong evidence of this specific form of association does imply strong evidence of association between the underlying time series of returns, which is what information-adjusted correlation captures.

Observation 2: However, we see that information-adjusted correlation does not go to 0 as Pearson’s correlation goes to 0. Intuitively, the lack of evidence of linear association between daily returns corresponding to the same time (i.e. weak Pearson’s correlation) does not in general imply evidence of the lack of association between the two underlying time series of daily returns. This is true in the special case of jointly Gaussian white noises, but certainly not in general. In general, there could be other forms of association (e.g. nonlinear associations, temporal dependencies, etc.) that would be captured by information-adjusted correlation but not by Pearson’s correlation.

The fact that the scatter plot above deviates significantly from the y=x line is sufficient empirical evidence that the i.i.d. Gaussian assumption does not hold for daily returns of U.S. stocks and futures!

Main Observation: You see those pairs with 0 Pearson’s correlation on the vertical axis? None of them have 0 information-adjusted correlation! A Pearson’s correlation of 0 between liquid exchange-traded U.S. assets can hide up to a 0.3 ‘real’ correlation, which can only arise through nonlinearities (i.e. fat tails) or temporal dependencies (i.e. butterfly effects), both of which can be a source of black-swan events.

Basically, linear i.i.d. factor models do not accurately capture risk in liquid U.S. stocks and futures!

Information-Adjusted Portfolio Beta

As discussed in Part I, the (CAPM’s) beta of a portfolio can be obtained as

Beta of a portfolio relative to the market

A simple generalization of this measure to capture both nonlinear and temporal dependencies between the portfolio’s returns and those of the market is obtained by replacing Pearson’s correlation with information-adjusted correlation. We call the resulting measure information-adjusted portfolio beta.

Information-adjusted beta of a portfolio relative to the market
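
A sketch of both quantities, with σ_p and σ_m the volatilities of the portfolio's and the market's returns, and {r_p}, {r_m} their returns time series (the superscript IA is our shorthand here, not necessarily the Yellow Paper's notation):

```latex
\beta \;=\; \mathrm{Corr}\!\left(r_p,\, r_m\right)\, \frac{\sigma_p}{\sigma_m},
\qquad
\beta^{\mathrm{IA}} \;=\; \mathrm{ACorr}\!\left(\{r_p\};\, \{r_m\}\right)\, \frac{\sigma_p}{\sigma_m}
```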

A direct consequence of the discussion above is that a portfolio with 0 information-adjusted beta has a returns time series that is independent from that of the market, and is therefore truly independent from the market, truly market-neutral.

Final Words

In our Yellow Paper we introduce an information-theoretic alternative to Pearson’s correlation, namely the information-adjusted correlation, that fully captures nonlinearities and temporal dependencies between time series of returns in a model-free fashion.

We use the information-adjusted correlation to construct an alternative to the CAPM’s beta of a portfolio, namely the information-adjusted beta, that captures any association between a portfolio and the market (linear and nonlinear, at the same time, or across time).

We illustrate that the i.i.d. Gaussian assumption for asset returns is inconsistent with empirical evidence in U.S. stocks and futures, which attests to the practical importance of the information-adjusted alternatives proposed.

Crucially, we illustrate that Pearson’s correlation, CAPM’s beta, and other i.i.d. linear factor models can hide a significant amount of financial risk that will reveal itself as black-swan events.

Yves-Laurent Kom Samo, PhD