Experiments with denoising/shrinkage of a covariance matrix
I run some experiments on the impact of covariance matrix construction on portfolio stability. In particular, the following ways of generating a covariance matrix are compared:
- Sample covariance — the most basic approach: the covariance matrix is estimated directly from historical return data
- Shrinkage of covariance matrix — shrinkage with a fixed coefficient applied to the sample covariance matrix, pulling it toward a scaled identity
- Ledoit-Wolf shrinkage — the shrinkage coefficient is tuned automatically to minimize the mean squared error between the estimated and the true covariance matrix
Details of the implementation of the covariance matrix estimation methods can be found in [1].
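For reference, the three estimators map directly onto scikit-learn calls; a minimal sketch on synthetic data (the shapes and the fixed shrinkage value here are illustrative):

```python
import numpy as np
from sklearn.covariance import empirical_covariance, shrunk_covariance, ledoit_wolf

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))  # 120 observations of 10 synthetic assets

cov_sample = empirical_covariance(X)                       # plain sample covariance
cov_shrunk = shrunk_covariance(cov_sample, shrinkage=0.1)  # fixed shrinkage toward a scaled identity
cov_lw, alpha_lw = ledoit_wolf(X)                          # shrinkage intensity chosen automatically

# Convex shrinkage toward (trace/p) * I preserves the total variance (trace)
print(np.trace(cov_sample), np.trace(cov_shrunk), alpha_lw)
```

Note that the fixed-coefficient shrinkage leaves the trace unchanged: it only redistributes variance between the diagonal and off-diagonal structure.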
We further assess the implication of shrinkage on portfolio optimization through the simulation of efficient frontiers with noisy asset returns.
Getting raw assets’ return data
We use historical returns from the Quantiacs Q18 NASDAQ-100 Stock Long-Short contest; see https://quantiacs.com/documentation/en/user_guide/local_development.html
import numpy as np
import qnt.data as qndata
from sklearn.covariance import shrunk_covariance, empirical_covariance, ledoit_wolf
import matplotlib.pyplot as plt

data = qndata.stocks.load_ndx_data(min_date='2021-06-01')
data_close = data.sel(field='close').to_pandas().dropna(how='any', axis=1)
data_return = data_close.pct_change(1).iloc[1:,:]

data_close.plot(figsize=(15,10), legend=False);
Function to construct efficient frontier
For the experiments below we need a function that constructs an efficient frontier via the classical minimum-variance optimization. I could not find a suitable package for this task, so this section presents an implementation of the efficient frontier construction functions.
Min variance portfolio
(Yes, I’ve run it through ChatGPT to generate docstrings.)
def min_var_portf(mu, cov_mat, r):
    """
    Calculate the minimum variance portfolio weights.

    Parameters
    ----------
    mu : numpy.ndarray
        1D array with the expected returns of the assets.
    cov_mat : numpy.ndarray
        2D array with the covariance matrix of the assets.
    r : float
        The required return for the portfolio.

    Returns
    -------
    numpy.ndarray
        1D array with the weights of the assets in the minimum variance portfolio.
    """
    n_assets = cov_mat.shape[1]
    A = np.zeros(shape=np.add(cov_mat.shape, 2))
    A[0:n_assets, 0:n_assets] = 2 * cov_mat
    A[n_assets, 0:n_assets] = mu
    A[n_assets+1, 0:n_assets] = 1
    A[0:n_assets, n_assets] = -mu
    A[0:n_assets, n_assets+1] = -1
    b = np.zeros(n_assets + 2)
    b[n_assets] = r
    b[n_assets + 1] = 1
    A_inv = np.linalg.inv(A)
    wgt = (A_inv @ b)[:-2]
    return wgt

# Test
n_assets = data_return.shape[1]
mu = np.random.normal(size=n_assets, loc=0, scale=.02)
cov_mat = empirical_covariance(data_return) * 252
wgt = min_var_portf(mu, cov_mat, 0.1)
var = np.transpose(wgt) @ cov_mat @ wgt
var

0.011656238500254209
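The linear system above encodes the first-order (Lagrangian/KKT) conditions of minimizing w'Σw subject to mu'w = r and 1'w = 1, so a useful sanity check is that the returned weights satisfy both constraints exactly. A self-contained sketch on synthetic inputs (the helper name and the data are illustrative; `np.linalg.solve` is used instead of forming the explicit inverse):

```python
import numpy as np

def min_var_wgt(mu, cov, r):
    # Compact re-implementation of the KKT system:
    # min w' cov w  subject to  mu'w = r  and  1'w = 1
    n = len(mu)
    A = np.zeros((n + 2, n + 2))
    A[:n, :n] = 2 * cov
    A[n, :n] = mu
    A[n + 1, :n] = 1
    A[:n, n] = -mu
    A[:n, n + 1] = -1
    b = np.zeros(n + 2)
    b[n] = r
    b[n + 1] = 1
    # solve() is numerically preferable to multiplying by an explicit inverse
    return np.linalg.solve(A, b)[:-2]

rng = np.random.default_rng(1)
B = rng.normal(size=(60, 5))
cov = B.T @ B / 60 + 0.1 * np.eye(5)  # well-conditioned positive-definite covariance
mu = rng.normal(loc=0.05, scale=0.02, size=5)

w = min_var_wgt(mu, cov, r=0.06)
print(mu @ w, w.sum())  # both constraints should hold: mu'w = 0.06, sum(w) = 1
```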
Efficient frontier
def efficient_frontier(mu, cov_mat, r_min, r_max, r_step):
    """
    Calculates the efficient frontier, a graphical representation of the trade-off between risk and return for a given portfolio.

    Parameters
    ----------
    mu : numpy array
        The expected returns of the assets in the portfolio.
    cov_mat : numpy array
        The covariance matrix of the assets in the portfolio.
    r_min : float
        The minimum return of the portfolio.
    r_max : float
        The maximum return of the portfolio.
    r_step : float
        The step size between the minimum and maximum returns.

    Returns
    -------
    tuple
        A tuple containing two arrays:
        - The first array is a 1D numpy array of returns along the efficient frontier.
        - The second array is a 1D array of the volatilities (standard deviations) of portfolios with returns along the efficient frontier.
    """
    r_vec = np.arange(r_min, r_max+1e-5, r_step)
    wgt_min = min_var_portf(mu, cov_mat, r_min).reshape(-1,1)
    wgt_max = min_var_portf(mu, cov_mat, r_max).reshape(-1,1)
    beta = (r_vec - r_min) / (r_max - r_min)
    beta = beta.reshape(-1, 1)
    # Two-fund separation: frontier portfolios are combinations of two frontier portfolios
    wgt_ef = wgt_min @ np.transpose(1-beta) + wgt_max @ np.transpose(beta)
    var_ef = [np.sqrt(np.transpose(wgt) @ cov_mat @ wgt) for wgt in wgt_ef.T]
    return (r_vec, var_ef)

# Test
r_test, var_test = efficient_frontier(mu, cov_mat, 0.01, 0.2, 0.001)
plt.plot(var_test, r_test);
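The linear interpolation between the two endpoint portfolios is justified by two-fund separation: for this equality-constrained problem the frontier weights are affine in the target return, so any convex combination of two frontier portfolios is itself a frontier portfolio. A self-contained sketch on synthetic data (with a compact, illustrative re-implementation of the minimum-variance solver):

```python
import numpy as np

def min_var_wgt(mu, cov, r):
    # Compact re-implementation of the minimum-variance KKT solver (illustrative)
    n = len(mu)
    A = np.zeros((n + 2, n + 2))
    A[:n, :n] = 2 * cov
    A[n, :n] = mu
    A[n + 1, :n] = 1
    A[:n, n] = -mu
    A[:n, n + 1] = -1
    b = np.zeros(n + 2)
    b[n] = r
    b[n + 1] = 1
    return np.linalg.solve(A, b)[:-2]

rng = np.random.default_rng(2)
B = rng.normal(size=(80, 6))
cov = B.T @ B / 80 + 0.05 * np.eye(6)
mu = rng.normal(loc=0.08, scale=0.03, size=6)

w_lo = min_var_wgt(mu, cov, 0.02)
w_hi = min_var_wgt(mu, cov, 0.20)
beta = 0.4
w_mix = (1 - beta) * w_lo + beta * w_hi                           # interpolated portfolio
w_direct = min_var_wgt(mu, cov, (1 - beta) * 0.02 + beta * 0.20)  # solved directly
print(np.max(np.abs(w_mix - w_direct)))  # should be ~0: the two coincide
```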
Experimenting with the stability of portfolio construction
To visualize the impact of the noise in the covariance matrix, we take the approach suggested in one of the exercises in [2]:
Compute one hundred efficient frontiers by drawing one hundred alternative vectors of expected returns, each perturbed with normally distributed noise with zero mean and 1% standard deviation.
Let’s begin with constructing the three versions of covariance matrices.
cov_mat_sample = empirical_covariance(data_return) * 252
cov_mat_shrunk = shrunk_covariance(cov_mat_sample, shrinkage=.05)
cov_mat_lw = ledoit_wolf(data_return)[0] * 252
The level of noise in a covariance matrix can be gauged by its condition number, the ratio of its largest to its smallest eigenvalue. We compute it for the three covariance matrices constructed above.
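For a symmetric positive-definite matrix, `np.linalg.cond` (with its default 2-norm) coincides with this eigenvalue ratio; a quick sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(50, 8))
cov = B.T @ B / 50  # symmetric positive-definite sample covariance

# For a symmetric PD matrix the 2-norm condition number equals lambda_max / lambda_min
eigvals = np.linalg.eigvalsh(cov)
cond_from_eigs = eigvals.max() / eigvals.min()
print(cond_from_eigs, np.linalg.cond(cov))  # the two should agree
```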
print('Sample covariance\t{:.2E}'.format(np.linalg.cond(cov_mat_sample)))
print('Shrunk covariance\t{:.2E}'.format(np.linalg.cond(cov_mat_shrunk)))
print('LW covariance\t\t{:.2E}'.format(np.linalg.cond(cov_mat_lw)))

Sample covariance 1.46E+04
Shrunk covariance 8.22E+02
LW covariance 1.98E+03
With its relatively aggressive shrinkage, the shrunk version of the covariance matrix achieves the largest reduction in the condition number; Ledoit-Wolf implies gentler shrinkage and hence a more moderate reduction.
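This behaviour is easy to reproduce: shrinking toward a scaled identity pulls every eigenvalue toward the mean eigenvalue, so the condition number falls monotonically as the shrinkage coefficient grows. A sketch on synthetic data:

```python
import numpy as np
from sklearn.covariance import shrunk_covariance

rng = np.random.default_rng(4)
B = rng.normal(size=(40, 20))  # few observations per asset -> ill-conditioned estimate
cov = B.T @ B / 40

alphas = np.linspace(0.0, 0.9, 10)
conds = [np.linalg.cond(shrunk_covariance(cov, shrinkage=a)) for a in alphas]
print(conds[0], conds[-1])  # condition number falls as the shrinkage coefficient grows
```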
Compare reference efficient frontiers
Continuing to the main part of the experiment, we sample one hundred efficient frontiers based on each of the three covariance matrices, with the results presented below.
# Comparison of EF curves
r_min = 0.01
r_max = 0.2
r_step = 0.001
n_runs = 100
for i in range(n_runs):
    mu_noise = mu + np.random.normal(loc=0, scale=0.01, size=n_assets)
    r_smpl, var_smpl = efficient_frontier(mu_noise, cov_mat_sample, r_min, r_max, r_step)
    r_shrunk, var_shrunk = efficient_frontier(mu_noise, cov_mat_shrunk, r_min, r_max, r_step)
    r_lw, var_lw = efficient_frontier(mu_noise, cov_mat_lw, r_min, r_max, r_step)
    plt.plot(var_smpl, r_smpl, color='lightblue', linewidth=0.5,
             label='Sample Covariance' if i == 0 else '')
    plt.plot(var_shrunk, r_shrunk, color='lightgreen', linewidth=0.5,
             label='Shrunk Covariance' if i == 0 else '')
    plt.plot(var_lw, r_lw, color='lightgrey', linewidth=0.5,
             label='LW Covariance' if i == 0 else '')
plt.legend()
plt.show()
My take on the resulting graphs is as follows:
- “Raw” sample covariance implies the best portfolio returns, driven by greater diversification opportunities; however, the variance of the outcomes can be extreme even for relatively moderate noise in the asset returns;
- Shrinkage is a powerful technique for tackling noise, but it is difficult to choose a shrinkage coefficient that balances noise reduction against the bias it introduces. In our results, the deviation of the efficient frontiers from the reference is quite significant for the selected shrinkage coefficient;
- Ledoit-Wolf provides optimal out-of-the-box shrinkage, with a significant reduction in noise and only moderate deviation from the reference efficient frontier.
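As a possible extension (not part of the experiment above), the visual comparison can be made quantitative by measuring how much the minimum-variance weights move under the same 1% noise in expected returns. A self-contained sketch on synthetic data, with a compact, illustrative re-implementation of the solver:

```python
import numpy as np
from sklearn.covariance import empirical_covariance, shrunk_covariance, ledoit_wolf

def min_var_wgt(mu, cov, r):
    # Compact re-implementation of the minimum-variance KKT solver (illustrative)
    n = len(mu)
    A = np.zeros((n + 2, n + 2))
    A[:n, :n] = 2 * cov
    A[n, :n] = mu
    A[n + 1, :n] = 1
    A[:n, n] = -mu
    A[:n, n + 1] = -1
    b = np.zeros(n + 2)
    b[n] = r
    b[n + 1] = 1
    return np.linalg.solve(A, b)[:-2]

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 30)) * 0.01  # synthetic daily returns
mu = rng.normal(loc=0.1, scale=0.02, size=30)

covs = {
    'sample': empirical_covariance(X) * 252,
    'shrunk': shrunk_covariance(empirical_covariance(X) * 252, shrinkage=0.05),
    'lw': ledoit_wolf(X)[0] * 252,
}

# Average per-asset std of the optimal weights across 100 noisy draws of mu
spread = {}
for name, cov in covs.items():
    W = np.stack([min_var_wgt(mu + rng.normal(0, 0.01, 30), cov, 0.1)
                  for _ in range(100)])
    spread[name] = W.std(axis=0).mean()
print(spread)  # smaller values indicate more stable portfolio weights
```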
References
[1] https://scikit-learn.org/stable/modules/covariance.html#shrunk-covariance
[2] Marcos M. López de Prado, “Machine Learning for Asset Managers”