Covariance, correlation and beta: defined and disentangled

Ben Steiner · Published in decomplexify · Jan 30, 2021 · 9 min read

Explaining measures of association between variables

Covariance, correlation and beta are all measures that quantify relationships between variables. Predicting these successfully is at the core of several aspects of investing, including both portfolio construction and determining which factors impact asset prices.

This article will present a simple example to visually define these measures and distinguish between them.

In order to explain measures of association between two variables, we start with three statistics that describe just a single variable (mean, variance and standard deviation). We then demonstrate how measures of association for two variables (covariance and correlation) build on them. Finally, we describe the data via a model (as opposed to describing the data directly) to explain beta in the context of asset pricing models.

By the end of this article, you will understand how each measure of association is calculated, how they are related to each other and the differences between them.

Mean (“the center”)

Assume we have two variables that have a broadly positive relationship. In other words: they generally both go up (or down) at the same time. These two variables might be ice cream sales and temperature; or percentage return of two equity sectors; or GDP growth and equity returns. For our illustrative example, we will call them simply X and Y.

Six observations of X and Y are plotted below together with their respective means (Xbar in green and Ybar in pink). Each mean is simply the sum of all values divided by the number of observations (n).

In practice, we would clearly always use far more than 6 observations to estimate anything, but this is a simple visual example!
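As a minimal sketch, here is how the two means could be computed in NumPy. The six x and y values are made-up numbers purely for illustration, not the actual points from the chart above:

```python
import numpy as np

# Six made-up observations of X and Y (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.5, 1.8, 3.2, 3.9, 4.6, 6.1])

n = len(x)
x_bar = x.sum() / n  # Xbar: sum of all values divided by n (same as np.mean(x))
y_bar = y.sum() / n  # Ybar (same as np.mean(y))
print(x_bar, y_bar)  # 3.5  3.5166...
```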

Variance and Standard Deviation (“dispersion”)

Where mean defines the center of the data, variance measures the dispersion of the data, specifically the dispersion around the mean. A low variance represents data that is very tight around the mean; a high variance represents data that is very spread out around the mean.

An estimate of the variance of X based on a sample of data is calculated by finding each observation's difference from the mean (illustrated in green below), squaring those differences and averaging them.

In the same way, the variance of Y is calculated by finding each observation's difference from its mean (illustrated in pink below), then squaring and averaging.

Why n-1 and not n in the denominator? We use n-1 because we are using a sample to estimate the variance of a population. The sample is a small dataset compared to the larger population or ‘ground truth’; for example, when estimating the unknown future variance of a financial asset from known historical data. The formal reason relates to the degrees of freedom within the sample. As a rule of thumb, however, if dividing by n versus n-1 actually makes a big difference, then we have a bigger problem… not enough data, or large uncertainty in what we are attempting to estimate.
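NumPy makes the choice of denominator explicit via the ddof argument; a quick sketch using the made-up x values from above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # made-up sample from earlier

var_pop    = np.var(x, ddof=0)  # divides by n   (population formula)
var_sample = np.var(x, ddof=1)  # divides by n-1 (sample formula)
print(var_pop, var_sample)      # 2.9167 vs 3.5: noticeable with only 6 points
```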

Back to the example: because these differences from the mean are squared, the units of variance are the square of the units of the underlying data. If the data is a percentage, then variance is measured in units of percentage squared. If the data is GDP, then variance is in units of GDP squared. If the data is a number of ice-creams, then variance is in ice-creams squared.

Percentage squared, GDP squared and ice-cream squared are hard (impossible?) quantities to interpret. By ‘undoing’ the square with a square-root, we arrive at Standard Deviation (SD). Standard Deviation is now in the same units as the original data and easier to interpret (at least for a human).
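Again as a sketch, the square root of the sample variance is exactly what np.std computes (with the matching ddof):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # made-up sample from earlier

sd = np.sqrt(np.var(x, ddof=1))           # 'undo' the square
print(np.isclose(sd, np.std(x, ddof=1)))  # True: SD is in the same units as the data
```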

Covariance (joint variation of two variables)

Where variance measures the dispersion of one variable around its mean, covariance measures the joint variability of two variables around their respective means. Covariance of X with itself is simply variance (as described above). Covariance of X with Y is illustrated below.

Covariance is the joint variability of two variables

If X is above its mean and Y is above its mean, then the contribution to covariance of that observation is positive. Likewise, if they are both below their respective means, then the contribution to covariance is also positive (a negative times a negative is a positive). Examples from these upper-right and lower-left quadrants are illustrated below.

Positive contributions to covariance

Conversely, if one of X or Y is above its mean but the other is below, then the contribution of that observation will be negative. Examples from these upper-left and lower-right quadrants are illustrated below.

Negative contributions to covariance

The sum of all the positive and negative contributions from each data point creates either a positive or negative covariance overall. In our example, the magnitudes of the contributions in the upper-right & lower-left quadrants (positive, yellow in the image below) are greater than those in the upper-left & lower-right (negative, red in the image below), so the covariance is positive overall.

The relative sizes of the positive or negative contributions determine the covariance

Covariance is a measure of the joint variability of X and Y and the extent to which they are linearly related to each other. The units of covariance are the product of the units of X with the units of Y. But, in a similar way that the units of variance are hard to interpret, the units of covariance are also hard to interpret. For example, they could be percentage squared; or GDP multiplied by equity returns; or ice-creams multiplied by temperature. In each case these are hard (impossible?) to make sense of.
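A minimal sketch of the calculation, still using the made-up x and y values: sum the products of each pair of deviations from the means, divide by n-1, and compare against NumPy's built-in (np.cov returns a 2x2 matrix whose diagonal holds the two variances and whose off-diagonal holds the covariance):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # made-up sample
y = np.array([1.5, 1.8, 3.2, 3.9, 4.6, 6.1])

# Each term is positive when x and y sit on the same side of their means,
# negative when they sit on opposite sides
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

print(np.isclose(cov_xy, np.cov(x, y)[0, 1]))  # True
```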

Unlike variance, where we could take the square root, the square root of covariance does not help. Instead, we need another way to convert covariance into something that is easier to interpret (at least by a human brain).

Correlation (standardized joint variation)

The covariance between two variables is a function of the dispersion of each variable and their joint variability. Covariance (above) is expressed in units of the product of both variables and can take an unbounded positive or negative value. Correlation converts it to a normalized value between -1 and 1 by dividing covariance by the product of the standard deviations of both variables. This bounded measure, devoid of units and on a standard scale of -1 to 1, is much easier for a human to comprehend.

An additional feature of correlation that makes it useful: if one of our variables is multiplied by a positive constant, covariance will change but correlation will not.
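A sketch of both properties, reusing the made-up data: correlation is covariance divided by the product of the standard deviations, and rescaling a variable moves covariance but leaves correlation untouched:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # made-up sample
y = np.array([1.5, 1.8, 3.2, 3.9, 4.6, 6.1])

corr = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(np.isclose(corr, np.corrcoef(x, y)[0, 1]))  # True

# Rescaling y by 100 scales the covariance by 100...
print(np.cov(x, 100 * y)[0, 1] / np.cov(x, y)[0, 1])   # ~100: covariance scales with the data
# ...but the correlation is unchanged
print(np.isclose(np.corrcoef(x, 100 * y)[0, 1], corr))  # True
```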

The relationship between Covariance and Correlation and their differences

From data to models

Everything we have discussed so far (mean, variance, standard deviation, covariance, correlation) are statistics calculated on a sample of data. That sample of data can be used to build a model to approximate the data. We might do this for two reasons: to explain the past (an explanatory model) or to estimate the future (a predictive model). We will now look at some properties of a simple model.

Beta coefficients

Ordinary Least Squares (OLS), or linear regression, estimates the parameters (“betas”) in a regression model by minimizing the sum of the squared residuals. This method draws a line through the data points that minimizes the sum of the squared differences between estimated values (from the model) and observed values (from the data).

Any model makes some simplifying assumptions. In this case, we assume the form of the equation that relates Y to X. Since this is an assumption, Y is only estimated and so denoted ‘y-hat’. Our assumption is that Y can be represented as β0 + β1X + ε, where β1 represents the sensitivity of Y to X, β0 is the intercept and ε is the residual (or error) of our model.

The parameters of this model (β0 and β1) are selected by minimizing the sum of the squared residuals; equivalently, this minimizes the Root Mean Square Error (RMSE), illustrated below and described in more detail in this article here <insert link here>
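For a single regressor, OLS has a closed-form solution that ties beta straight back to covariance: β1 = cov(X, Y) / var(X). A minimal sketch with the same made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # made-up sample
y = np.array([1.5, 1.8, 3.2, 3.9, 4.6, 6.1])

beta1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # slope: sensitivity of Y to X
beta0 = y.mean() - beta1 * x.mean()             # intercept

y_hat = beta0 + beta1 * x                       # model's estimate of Y
rmse  = np.sqrt(np.mean((y - y_hat) ** 2))      # root mean square error
print(beta1, beta0, rmse)
```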

CAPM

The CAPM (Capital Asset Pricing Model) was an early example of this type of model in finance. Y was the asset’s excess return (its return less the risk-free rate), X was the market’s excess return, and β1 was the sensitivity of the asset to the market. Hence the adoption of the term ‘beta’ to describe an asset’s sensitivity to the market. It is worth highlighting that the part of the asset return not explained by the market can then only be either β0 (colloquially referred to as ‘alpha’) or ε, the error. It was common practice to ‘assume away’ the errors and conclude that any return not explained by the market is inevitably alpha.
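A sketch of a CAPM-style beta estimate on simulated data (the return series below are randomly generated with a true beta of 1.2, purely to show the mechanics):

```python
import numpy as np

rng = np.random.default_rng(1)
market_excess = rng.normal(0.0003, 0.01, size=250)  # simulated daily market excess returns
asset_excess  = 1.2 * market_excess + rng.normal(0, 0.005, size=250)

# Beta of the asset to the market: cov / var, exactly as in the OLS slope above
beta  = np.cov(asset_excess, market_excess)[0, 1] / np.var(market_excess, ddof=1)
alpha = asset_excess.mean() - beta * market_excess.mean()  # β0, the 'alpha'
print(beta, alpha)  # beta close to 1.2; alpha close to 0 for this simulated asset
```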

Since the 1960s the single factor CAPM has evolved into multi-factor models where each asset now has a β for a larger number of factors (not just the market return). Commonly used factors now include size, value, momentum, low volatility, quality and many others in the ‘factor zoo’.
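As a sketch, the multi-factor version simply stacks the factor returns into a design matrix and solves for all the betas at once; the three factor series below are randomly generated stand-ins for, say, market, size and value:

```python
import numpy as np

rng = np.random.default_rng(2)
n_obs = 250
factors = rng.normal(size=(n_obs, 3))  # made-up market, size and value factor returns
asset = 0.9 * factors[:, 0] + 0.3 * factors[:, 1] + rng.normal(0, 0.1, size=n_obs)

# Prepend a column of ones so the first coefficient is the intercept (β0, 'alpha')
X = np.column_stack([np.ones(n_obs), factors])
betas, *_ = np.linalg.lstsq(X, asset, rcond=None)
print(betas)  # roughly [0.0, 0.9, 0.3, 0.0]: intercept, then one beta per factor
```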

However, as with any model, an appreciation of model assumptions (and specifically the consequences of those assumptions) is always valuable. Any misestimation of the βs, “the betas”, combined with an assumption that any residual return can only be β0, “alpha”, is a potentially dangerous combination. This is of relevance when searching for alpha, both in back-tested simulations and in performance attribution of portfolios. In these situations, if the residuals are ‘assumed away’, what should correctly be classified as residual or error is erroneously inferred to be alpha.

Summary

In order to explain measures of association (covariance, correlation and beta) we started with two measures of dispersion for a single variable: variance and standard deviation. Standard deviation converts variance back into the same units as the original data to make it easier to interpret.

We then discussed covariance as the extent to which the joint variability of two variables follows a linear relationship. Correlation converts covariance into a measure of association between two variables on a standard scale (-1 to 1). Remember: covariance is a combination of both correlation and the standard deviations of the two variables; useful to a computer but harder to interpret with the mark 1 human eyeball.

Finally, data can be simplified into a linear model. Beta is a coefficient of that simplified model (not of the data itself) and measures the sensitivity of one variable to another within the model.

The table below summarizes the various measures by several categories: do they describe the dispersion of one variable or the joint variability of two? Do they describe raw data or a model? What range of values can each take?
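Measure               Variables   Describes   Possible values
Variance              one         data        0 to +infinity
Standard deviation    one         data        0 to +infinity
Covariance            two         data        -infinity to +infinity
Correlation           two         data        -1 to +1
Beta                  two         model       -infinity to +infinity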

Final comments and a question to think about:

  1. Correlation is a function of covariance & standard deviations. Standard deviations & covariance are functions of the mean. If we are attempting to predict future correlation (as opposed to describing past correlation), how important is an accurate prediction of the mean in the context of asset returns?
  2. Correlation measures the extent to which variables are linearly related to each other. A correlation of zero means ‘no linear relation’, which is not the same as ‘no relationship at all’.

A future article will address these questions.

Related Articles: Evaluating goodness of model fit

RMSE is one of a number of ways to evaluate a model; this article explains several others: <insert link here>

Related Articles: Challenges of estimating beta and assuming the residual can only be alpha

“Is your alpha pure and your beta fit for purpose”: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3526346
