Alex Botsula
5 min read · Feb 1, 2023

Experiments with denoising/shrinkage of a covariance matrix

I ran some experiments on how the construction of the covariance matrix affects portfolio stability. Three ways of estimating the covariance matrix are compared:

  • Sample covariance: the most basic approach, estimating the covariance matrix directly from historical return data
  • Shrinkage of the covariance matrix: the sample covariance is blended with a scaled identity matrix using a fixed coefficient, which scales down all off-diagonal elements and pulls the variances toward their average
  • Ledoit-Wolf shrinkage: the same kind of shrinkage, with the coefficient chosen analytically to minimize the mean squared error between the estimated and the true covariance matrix

Details of the implementation of the covariance matrix estimation methods can be found in [1].
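For reference, the fixed-coefficient shrinkage documented in [1] is a convex combination of the sample covariance and a scaled identity matrix. The sketch below reimplements that formula in plain NumPy on made-up data; the helper name `shrink_toward_identity` and the toy inputs are illustrative assumptions, not part of scikit-learn or of the experiment that follows.

```python
import numpy as np

def shrink_toward_identity(cov, alpha):
    """Hypothetical helper: (1 - alpha) * cov + alpha * mean_variance * I."""
    n = cov.shape[0]
    mean_var = np.trace(cov) / n  # average of the diagonal (the variances)
    return (1.0 - alpha) * cov + alpha * mean_var * np.eye(n)

# Toy data: 250 observations of 4 made-up return series
rng = np.random.default_rng(0)
returns = rng.normal(scale=0.01, size=(250, 4))
cov = np.cov(returns, rowvar=False)

shrunk = shrink_toward_identity(cov, alpha=0.05)

# Off-diagonal entries are scaled by (1 - alpha) ...
off_diag = ~np.eye(4, dtype=bool)
print(np.allclose(shrunk[off_diag], 0.95 * cov[off_diag]))
# ... while the total variance (the trace) is left unchanged
print(np.isclose(np.trace(shrunk), np.trace(cov)))
```

With this form, a coefficient of 0 returns the sample covariance and a coefficient of 1 returns a diagonal matrix with equal variances.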

We further assess the implication of shrinkage on portfolio optimization through the simulation of efficient frontiers with noisy asset returns.

Getting raw assets’ return data

The historical returns come from the Quantiacs Q18 NASDAQ-100 Stock Long-Short contest data. For setup details, refer to https://quantiacs.com/documentation/en/user_guide/local_development.html

import numpy as np
import qnt.data as qndata
from sklearn.covariance import shrunk_covariance, empirical_covariance, ledoit_wolf
import matplotlib.pyplot as plt
data = qndata.stocks.load_ndx_data(min_date='2021-06-01')
data_close = data.sel(field='close').to_pandas().dropna(how='any', axis=1)
data_return = data_close.pct_change(1).iloc[1:,:]
fetched chunk 1/1 0s
Data loaded 0s
data_close.plot(figsize=(15,10), legend=False);
[Figure: closing prices of the loaded NASDAQ-100 stocks]

Function to construct efficient frontier

For further experiments, we will need a function that constructs an efficient frontier based on classical minimum variance optimization. I could not find a suitable package for this task, so this section presents my own implementation of the efficient frontier construction functions.

Min variance portfolio

(Yes, I’ve run it through ChatGPT to generate docstrings.)

def min_var_portf(
    mu,
    cov_mat,
    r
):
    """
    Calculate the minimum variance portfolio weights for a required return.

    Parameters
    ----------
    mu : numpy.ndarray
        1D array with the expected returns of the assets.
    cov_mat : numpy.ndarray
        2D array with the covariance matrix of the assets.
    r : float
        The required return for the portfolio.

    Returns
    -------
    numpy.ndarray
        1D array with the weights of the assets in the minimum variance portfolio.
        The weights sum to 1 and may be negative (short positions are allowed).

    """
    n_assets = cov_mat.shape[1]

    # First-order conditions of min w' cov w  s.t.  mu' w = r,  sum(w) = 1,
    # written as one linear system in (w, lambda_1, lambda_2)
    A = np.zeros((n_assets + 2, n_assets + 2))
    A[0:n_assets, 0:n_assets] = 2 * cov_mat
    A[n_assets, 0:n_assets] = mu
    A[n_assets+1, 0:n_assets] = 1
    A[0:n_assets, n_assets] = -mu
    A[0:n_assets, n_assets+1] = -1

    b = np.zeros(n_assets + 2)
    b[n_assets] = r
    b[n_assets + 1] = 1

    # Solve the system directly instead of forming the inverse explicitly;
    # the last two entries are the Lagrange multipliers and are dropped
    wgt = np.linalg.solve(A, b)[:-2]

    return wgt
# Test

n_assets = data_return.shape[1]
mu = np.random.normal(size=n_assets, loc=0, scale=.02)
cov_mat = empirical_covariance(data_return) * 252

wgt = min_var_portf(mu, cov_mat, 0.1)

var = np.transpose(wgt) @ cov_mat @ wgt
var
0.011656238500254209
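A quick sanity check on this kind of solver is that the returned weights satisfy both constraints. The sketch below does that on synthetic inputs; `solve_min_var` is a hypothetical stand-alone copy of the same linear system so the snippet runs without the contest data, and the toy `mu_toy` and `cov_toy` are made up.

```python
import numpy as np

def solve_min_var(mu, cov, target):
    """Hypothetical stand-alone version of the linear system used above."""
    n = len(mu)
    kkt = np.zeros((n + 2, n + 2))
    kkt[:n, :n] = 2 * cov      # gradient of the variance term
    kkt[:n, n] = -mu           # multiplier of the return constraint
    kkt[:n, n + 1] = -1        # multiplier of the budget constraint
    kkt[n, :n] = mu            # mu' w = target
    kkt[n + 1, :n] = 1         # sum(w) = 1
    rhs = np.zeros(n + 2)
    rhs[n] = target
    rhs[n + 1] = 1
    return np.linalg.solve(kkt, rhs)[:n]

# Made-up inputs: 5 assets, 200 observations
rng = np.random.default_rng(1)
n = 5
mu_toy = rng.normal(scale=0.02, size=n)
cov_toy = np.cov(rng.normal(size=(200, n)), rowvar=False)

w = solve_min_var(mu_toy, cov_toy, target=0.1)

print(np.isclose(w.sum(), 1.0))      # budget constraint holds
print(np.isclose(mu_toy @ w, 0.1))   # required return is met
```

Because short positions are allowed, the weights can be large in absolute value when the required return is far from the individual expected returns.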

Efficient frontier

def efficient_frontier(
    mu,
    cov_mat,
    r_min,
    r_max,
    r_step
):
    """
    Calculates the efficient frontier, a graphical representation of the trade-off between risk and return for a given portfolio.

    Parameters
    ----------
    mu : numpy array
        The expected returns of the assets in the portfolio.
    cov_mat : numpy array
        The covariance matrix of the assets in the portfolio.
    r_min : float
        The minimum return of the portfolio.
    r_max : float
        The maximum return of the portfolio.
    r_step : float
        The step size between the minimum and maximum returns.

    Returns
    -------
    tuple
        A tuple containing two arrays:
        - The first array is a 1D numpy array of returns along the efficient frontier.
        - The second array is a 1D numpy array of the standard deviations (square roots of the variances) of portfolios with returns along the efficient frontier.
    """
    r_vec = np.arange(r_min, r_max+1e-5, r_step)

    # Solve the optimization only at the two end points of the return grid
    wgt_min = min_var_portf(mu, cov_mat, r_min).reshape(-1,1)
    wgt_max = min_var_portf(mu, cov_mat, r_max).reshape(-1,1)

    # Every frontier portfolio in between is a linear combination of the two
    beta = (r_vec - r_min) / (r_max - r_min)
    beta = beta.reshape(-1, 1)
    wgt_ef = wgt_min @ np.transpose(1-beta) + wgt_max @ np.transpose(beta)

    # Portfolio standard deviation for each column of weights
    std_ef = np.array([np.sqrt(np.transpose(wgt) @ cov_mat @ wgt) for wgt in wgt_ef.T])

    return r_vec, std_ef
# Test

r_test, var_test = efficient_frontier(mu, cov_mat, 0.01, 0.2, 0.001)
plt.plot(var_test, r_test)
[Figure: efficient frontier for the test inputs, with portfolio risk on the x-axis and required return on the y-axis]
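The frontier function solves the optimization only at the two end points and interpolates the weights in between. This works because the solution of the linear system depends linearly on the required return. The sketch below checks that property on synthetic inputs; `solve_min_var` is a hypothetical stand-alone copy of the solver, and the toy data are made up.

```python
import numpy as np

def solve_min_var(mu, cov, target):
    """Hypothetical stand-alone minimum variance solver for a target return."""
    n = len(mu)
    kkt = np.zeros((n + 2, n + 2))
    kkt[:n, :n] = 2 * cov
    kkt[:n, n] = -mu
    kkt[:n, n + 1] = -1
    kkt[n, :n] = mu
    kkt[n + 1, :n] = 1
    rhs = np.zeros(n + 2)
    rhs[n] = target
    rhs[n + 1] = 1
    return np.linalg.solve(kkt, rhs)[:n]

# Made-up inputs: 6 assets, 300 observations
rng = np.random.default_rng(2)
n = 6
mu_toy = rng.normal(scale=0.02, size=n)
cov_toy = np.cov(rng.normal(size=(300, n)), rowvar=False)

r_lo, r_hi, r_mid = 0.01, 0.20, 0.08
w_lo = solve_min_var(mu_toy, cov_toy, r_lo)
w_hi = solve_min_var(mu_toy, cov_toy, r_hi)

# Interpolate between the two end-point portfolios ...
beta = (r_mid - r_lo) / (r_hi - r_lo)
w_interp = (1 - beta) * w_lo + beta * w_hi

# ... and compare with solving directly for the intermediate return
w_direct = solve_min_var(mu_toy, cov_toy, r_mid)
print(np.allclose(w_interp, w_direct))
```

The shortcut replaces one linear solve per grid point with two solves in total, which matters when the frontier is recomputed many times.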

Experimenting with the stability of portfolio construction

To visualize how noise in the inputs propagates through each version of the covariance matrix, we take the approach suggested in one of the exercises in [2]:

Compute one hundred efficient frontiers by drawing one hundred alternative vectors of expected returns with a Normal-distributed noise with zero mean and 1% standard deviation.

Let’s begin with constructing the three versions of covariance matrices.

cov_mat_sample = empirical_covariance(data_return) * 252
cov_mat_shrunk = shrunk_covariance(cov_mat_sample, shrinkage=.05)
cov_mat_lw = ledoit_wolf(data_return)[0] * 252
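The second value returned by `ledoit_wolf` is the shrinkage coefficient it selected, which can be compared directly with a hand-picked one. A minimal sketch on made-up data (the array `toy_returns` is an assumption for illustration, not the contest data):

```python
import numpy as np
from sklearn.covariance import ledoit_wolf

# Made-up returns: 120 observations of 30 assets
rng = np.random.default_rng(3)
toy_returns = rng.normal(scale=0.01, size=(120, 30))

# ledoit_wolf returns the shrunk covariance and the coefficient it chose
lw_cov, lw_shrinkage = ledoit_wolf(toy_returns)

print(lw_cov.shape)
print(f"Ledoit-Wolf shrinkage coefficient: {lw_shrinkage:.3f}")
```

Printing the coefficient for the real data would show how it compares with the fixed value of 0.05 used for the manually shrunk matrix.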

A common proxy for how strongly noise in the covariance matrix is amplified during optimization is its condition number, which for a symmetric positive-definite matrix is the ratio between its largest and smallest eigenvalues. We compute it for the three covariance matrices constructed.

print('Sample covariance\t{:.2E}'.format(np.linalg.cond(cov_mat_sample)))
print('Shrunk covariance\t{:.2E}'.format(np.linalg.cond(cov_mat_shrunk)))
print('LW covariance\t\t{:.2E}'.format(np.linalg.cond(cov_mat_lw)))
Sample covariance 1.46E+04
Shrunk covariance 8.22E+02
LW covariance 1.98E+03

With a relatively aggressive shrinkage, the shrunk version of the covariance matrix achieved the largest reduction in the condition number, with Ledoit-Wolf implying more gentle shrinkage and hence moderate reduction in the condition number.
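To make the link between the condition number and the eigenvalues concrete, the sketch below checks on synthetic data that `np.linalg.cond` matches the ratio of extreme eigenvalues for a symmetric positive-definite matrix, and that shrinking toward a scaled identity lowers it. The toy matrix and the coefficient of 0.05 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
# Few observations relative to the number of assets gives a poorly
# conditioned sample covariance (60 observations, 40 made-up assets)
x = rng.normal(size=(60, 40))
cov_toy = np.cov(x, rowvar=False)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
eigvals = np.linalg.eigvalsh(cov_toy)
cond_from_eigs = eigvals[-1] / eigvals[0]
print(np.isclose(cond_from_eigs, np.linalg.cond(cov_toy)))

# Shrink toward the scaled identity with a fixed coefficient
alpha = 0.05
n = cov_toy.shape[0]
cov_shrunk = (1 - alpha) * cov_toy + alpha * (np.trace(cov_toy) / n) * np.eye(n)

# Every eigenvalue moves toward the average variance, so the ratio drops
print(np.linalg.cond(cov_shrunk) < np.linalg.cond(cov_toy))
```

Shrinkage maps each eigenvalue to a weighted average of itself and the mean variance, which raises the smallest eigenvalues and lowers the largest ones.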

Compare reference efficient frontiers

Continuing to the main part of the experiment, we sample one hundred efficient frontiers based on all three versions of the covariance matrix, with the results presented below.

# Comparison of EF curves
r_lb = 0.01   # lowest required return on the frontier
r_ub = 0.2    # highest required return on the frontier
r_step = 0.001

n_runs = 100

for i in range(n_runs):
    # Perturb the expected returns with zero-mean Normal noise (1% std)
    mu_noise = mu + np.random.normal(loc=0, scale=0.01, size=n_assets)

    r_smpl, var_smpl = efficient_frontier(mu_noise, cov_mat_sample, r_lb, r_ub, r_step)
    r_shrunk, var_shrunk = efficient_frontier(mu_noise, cov_mat_shrunk, r_lb, r_ub, r_step)
    r_lw, var_lw = efficient_frontier(mu_noise, cov_mat_lw, r_lb, r_ub, r_step)

    # Label only the first curve of each kind so the legend has three entries
    plt.plot(var_smpl, r_smpl, color='lightblue', linewidth=0.5,
             label='Sample Covariance' if i == 0 else '')
    plt.plot(var_shrunk, r_shrunk, color='lightgreen', linewidth=0.5,
             label='Shrunk Covariance' if i == 0 else '')
    plt.plot(var_lw, r_lw, color='lightgrey', linewidth=0.5,
             label='LW Covariance' if i == 0 else '')

plt.legend()
plt.show()
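[Figure: one hundred efficient frontiers for each of the sample, shrunk and Ledoit-Wolf covariance matrices]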

My take on the resulting graphs is as follows:

  • The “raw” sample covariance implies the best portfolio returns for a given level of risk, driven by greater apparent diversification opportunities. However, the dispersion of the outcomes can be extreme even for relatively moderate noise in the asset returns;
  • Shrinkage is a powerful technique for tackling noise, but it is difficult to choose a shrinkage coefficient that balances noise reduction against the bias it introduces. In our results, the efficient frontiers deviate quite significantly from the source for the shrinkage coefficient selected;
  • Ledoit-Wolf provides a sensible out-of-the-box shrinkage level, with a significant reduction in the level of noise and a moderate deviation from the source efficient frontier.

References

[1] https://scikit-learn.org/stable/modules/covariance.html#shrunk-covariance
[2] Marcos M. López de Prado, “Machine Learning for Asset Managers”
