The Stats Dinosaur — Optimal Portfolio — How many years of data is required?

A quick summary on why it is not advisable to always construct a portfolio using the Harry Markowitz portfolio theory. HMP theory involves the calculation of var-covar matrix.


Consider a case, when you have 50 stocks in your portfolio. Essentially, you have to estimate 1275 elements of the var-covar matrix from the past data. Estimates would be accurate only when you have much more than 1275 elements or, in other words, a minimum of 25 years of data for the 50 stocks.

However, 25 years of data alone is not sufficient, we need at least 50 years of data. Why?

If m > n, i.e. less no. of years compared to the number of stocks, the matrix is non-invertible. Hence, we need at least m years of data of each stock to begin with, else the matrix is non-invertible.
As you can see, we need at least 50 years of data to begin with. Even then, sometimes, two stock returns may be highly correlated (may be, they are operating in the same industry). Is there a way to get a nonsingular var-covar matrix with fewer years of data?

Statisticians around the world have come up with a method called shrinkage estimators — estimators which are biased, but whose S.E. is much smaller compared to the usual estimators. This concept of shrinkage estimators is very useful for portfolio theory. By using shrinkage estimators in estimating the var-covar matrix, our calculations of efficient portfolio will be consistent, albeit biased(to a small extent) — which is acceptable to most investors.

Theory of shrinkage estimators

Example of how shrinkage estimators helps in better estimation.

Mean Squared Error (MSE) is the difference between the estimator and what is estimated. Only in the case of an unbiased estimator, the MSE is equal to the variance.

Copyrights with Prof. Mark H. Hansen of UCLA
See how the MSE for shrinkage estimators is better than traditonal OLS estimators

The whole game of modern portfolio theory revolves around which biased estimators to use in place of the traditional unbiased estimators.