Extreme Portfolio Optimisation Weights And How To Fix Them

Applications of Monte Carlo Methods

The Rational Investor Allocation

Markowitz Portfolio Optimisation seeks a set of weights for the N assets in a portfolio such that the portfolio’s risk-adjusted return (aka the Sharpe Ratio) is maximised.

This is the portfolio the so-called “rational investor” would choose, as clearly no one would want a portfolio that did not generate the best risk-adjusted return, right?

Let us introduce a little bit of mathematics to accompany this. Capital “Sigma” is the covariance matrix of asset returns, with entries measuring the strength with which one asset’s returns tend to move with or against another’s in the portfolio. Mu is a vector containing the mean returns of each of the N assets over the period observed. W is the vector of weights to be optimised; these must sum to 1 (or 100% of the portfolio) and must not be negative, so as not to admit short positions.

Markowitz Portfolio Optimisation Maximise the Sharpe Ratio
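
The formula that belongs with this caption is not reproduced in the text. A standard way to write the problem described above, assuming a zero risk-free rate (an assumption, since only the mean returns are mentioned), is:

```latex
\max_{w} \; \frac{w^{\top}\mu}{\sqrt{w^{\top}\Sigma\, w}}
\quad \text{subject to} \quad
\sum_{i=1}^{N} w_i = 1, \qquad w_i \ge 0 \;\; \text{for all } i .
```

The numerator is the portfolio’s mean return and the denominator is its standard deviation, so the ratio is the Sharpe Ratio of the weighted portfolio.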

To solve this problem we can pose it as a quadratic programme and let a numerical optimiser search for the optimal solution. To do this we must switch notation so that the problem is in quadratic form. This is given below (in case you want to actually implement this yourself). As a technical note, you need to normalise the proxy weights (Theta) to obtain the actual weights “w”.

Quadratic Form Required by Optimiser
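
The formula for this caption is also missing from the text. One common quadratic form of the Sharpe problem, which matches the description of proxy weights that are normalised afterwards, is sketched here. It assumes a zero risk-free rate and at least one asset with a positive mean return; it may differ in detail from the form originally shown.

```latex
\min_{\theta} \; \theta^{\top}\Sigma\,\theta
\quad \text{subject to} \quad
\mu^{\top}\theta = 1, \qquad \theta_i \ge 0 \;\; \text{for all } i,
\qquad \text{then} \qquad
w = \frac{\theta}{\sum_{i=1}^{N}\theta_i} .
```

The Sharpe Ratio does not change when all weights are scaled by the same positive constant, so fixing the mean at 1 and minimising the variance finds the same direction as maximising the ratio. Normalising Theta then restores the requirement that the weights sum to 1.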

Running the optimiser on an actual portfolio, we see this doesn’t always give a very nice result. In fact, with the particular portfolio chosen (selected based on fundamental analysis), the optimiser allocates 60% of the portfolio to just 2 stocks and assigns zero weight to 70% of the stocks in our portfolio!

A rather terrible result produced from Markowitz Portfolio Optimisation

I think it’s pretty clear that the rational investor would be very uncomfortable with this portfolio, and it hardly seems ‘risk adjusted’ when you consider the concentration of risk in 2 stocks! That’s not to say that you can’t get weights that look right, but this certainly isn’t always the case, as this example demonstrates.
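
For anyone who wants to experiment with this kind of optimisation, here is a minimal sketch of a long-only maximum Sharpe optimiser. It is not the code used for the portfolio above. It assumes a zero risk-free rate, uses SciPy’s general-purpose SLSQP solver rather than a dedicated quadratic programming library, and runs on synthetic returns; the function name `max_sharpe_weights` is made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def max_sharpe_weights(returns):
    """Long-only maximum-Sharpe weights via proxy weights theta.

    Assumes a zero risk-free rate and at least one asset whose
    sample mean return is positive.
    """
    mu = returns.mean(axis=0)
    sigma = np.cov(returns, rowvar=False)
    n = len(mu)

    # Feasible starting point: all proxy weight on the asset with the best mean.
    best = int(np.argmax(mu))
    theta0 = np.zeros(n)
    theta0[best] = 1.0 / mu[best]

    # Minimise theta' Sigma theta subject to mu' theta = 1 and theta >= 0.
    result = minimize(
        lambda t: t @ sigma @ t,
        theta0,
        jac=lambda t: 2.0 * sigma @ t,
        method="SLSQP",
        bounds=[(0.0, None)] * n,
        constraints=[{"type": "eq",
                      "fun": lambda t: mu @ t - 1.0,
                      "jac": lambda t: mu}],
    )
    if not result.success:
        raise RuntimeError(result.message)

    # Normalise the proxy weights so the actual weights sum to 1.
    theta = np.clip(result.x, 0.0, None)
    return theta / theta.sum()


# Synthetic returns for 5 assets, for illustration only.
rng = np.random.default_rng(0)
true_mu = np.array([0.010, 0.012, 0.008, 0.015, 0.006])
true_vol = np.array([0.04, 0.06, 0.03, 0.08, 0.05])
returns = true_mu + true_vol * rng.standard_normal((500, 5))

weights = max_sharpe_weights(returns)
print(weights.round(3))
```

Swapping the synthetic `returns` array for a (periods x assets) array of real returns is all that is needed to try it on an actual portfolio.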

Why are we getting extreme weights?

Put simply, what gets measured gets done. The optimiser seeks to maximise the Sharpe Ratio, and this is exactly what it does. There are in fact 3 problems here:

1) Our prior beliefs about a stock’s future performance are absent from the optimisation process

Fundamental analysis of the stocks in the portfolio leaves us, as the analysts, with varying degrees of confidence in each stock’s investment thesis, and thus with different expectations for risk and potential return in each equity we select. Unfortunately the optimiser doesn’t know about this, as your research is not encoded in the mean vector and covariance matrix of historical returns.

2) The optimised weights are random variables but no confidence interval is estimated.

The optimiser returns a single set of weights which minimises the objective function. However, we must realise that the optimiser has been fed inputs which are in fact random variables (sample means and covariances of returns). There is hence uncertainty in the outputted weights, which are also random variables. It is essential to know what level of confidence can be assigned to the weights obtained: low confidence (a wide interval) means we don’t know the optimal weight with any degree of accuracy and thus should not blindly use the output of the optimiser directly in our portfolio.

3) Nicer weights which are close to optimal are discarded by the optimiser.

The optimiser searches for the maximum possible risk-adjusted return, which sometimes creates undesirable results, as in our previous example. However, there may be slightly less optimal weights which have more desirable characteristics, and we may wish to consider these instead. The optimiser does not give us such alternative candidate weights.

The solution

I will solve the above problems without meddling with the optimiser equations or procedures. Sidestepping the optimiser has the advantage of not introducing additional complexity which would likely create more unforeseen side effects. Instead we will do the following:

Part 1: Better Weights

  1. Calculate the mean and variance of the portfolio generated by the optimiser.
  2. Sample 20,000 randomly generated portfolio weights (with a clever choice of sampling distribution called a “Dirichlet” distribution, to ensure efficient exploration of the weight space).
  3. Estimate the mean and variance of each of these portfolios and calculate the distance between each portfolio and the optimiser’s portfolio.
  4. From the random portfolios, pick the weights with the smallest distance from the optimiser’s portfolio.
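
The four steps above can be sketched in a few lines of NumPy. This is an illustration rather than the code behind the results: the function names are made up, the returns are synthetic, the optimiser output is a hypothetical concentrated portfolio, and the distance is assumed to be Euclidean distance in (mean, standard deviation) space, since the exact distance measure is not specified here.

```python
import numpy as np


def portfolio_stats(weights, mu, sigma):
    """Mean and standard deviation for one weight vector or a batch (one per row)."""
    weights = np.atleast_2d(weights)
    means = weights @ mu
    variances = np.einsum("ij,jk,ik->i", weights, sigma, weights)
    return means, np.sqrt(variances)


def closest_random_portfolio(returns, target_weights, n_samples=20_000,
                             alpha=None, seed=0):
    rng = np.random.default_rng(seed)
    mu = returns.mean(axis=0)
    sigma = np.cov(returns, rowvar=False)
    n_assets = len(mu)
    alpha = np.ones(n_assets) if alpha is None else np.asarray(alpha, dtype=float)

    # Step 1: mean and risk of the optimiser's portfolio.
    target_mean, target_std = portfolio_stats(target_weights, mu, sigma)

    # Step 2: random candidate portfolios drawn from a Dirichlet distribution.
    candidates = rng.dirichlet(alpha, size=n_samples)

    # Step 3: mean and risk of every candidate, and its distance to the target
    # (Euclidean distance is an assumption made for this sketch).
    means, stds = portfolio_stats(candidates, mu, sigma)
    distances = np.hypot(means - target_mean, stds - target_std)

    # Step 4: keep the candidate closest to the optimiser's portfolio.
    best = int(np.argmin(distances))
    return candidates[best], distances[best]


# Synthetic returns and a hypothetical, concentrated optimiser output.
rng = np.random.default_rng(1)
returns = 0.01 + 0.05 * rng.standard_normal((250, 5))
optimiser_weights = np.array([0.6, 0.4, 0.0, 0.0, 0.0])

weights, distance = closest_random_portfolio(returns, optimiser_weights)
print(weights.round(3), float(distance))
```

Passing a different `alpha` vector is where prior beliefs about individual stocks would enter the search.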

Part 2: Confidence Intervals Through Bootstrapping

  1. Re-sample the returns data with replacement and run the above procedure a few thousand times to obtain many candidate “best optimal” and “best random” portfolios. This creates a distribution for each weight and thus quantifies the uncertainty in both the optimisation and the sampling method.
  2. Plot the distribution of weights for each equity and compute the 95% credible interval.
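
A minimal sketch of the bootstrap loop is shown here. To keep it self-contained it uses a simple inverse-variance weighting rule as a stand-in for the weight-producing procedure; that stand-in, the function names and the synthetic data are all assumptions for illustration. Resampling whole rows keeps each period’s returns together across assets, but it does assume the periods can be treated as independent.

```python
import numpy as np


def bootstrap_weights(returns, weight_fn, n_boot=1000, seed=0):
    """Re-sample periods with replacement and recompute the weights each time."""
    rng = np.random.default_rng(seed)
    n_obs = returns.shape[0]
    draws = []
    for _ in range(n_boot):
        rows = rng.integers(0, n_obs, size=n_obs)  # row indices, with replacement
        draws.append(weight_fn(returns[rows]))
    return np.array(draws)  # shape: (n_boot, n_assets)


def percentile_interval(draws, level=0.95):
    """Per-asset interval taken from the percentiles of the bootstrap draws."""
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(draws, [tail, 100.0 - tail], axis=0)
    return lower, upper


def inverse_variance_weights(returns):
    """Stand-in weighting rule so the sketch runs on its own (not the method above)."""
    inv_var = 1.0 / returns.var(axis=0, ddof=1)
    return inv_var / inv_var.sum()


# Synthetic returns for 4 assets with different volatilities.
rng = np.random.default_rng(2)
returns = 0.01 + np.array([0.03, 0.05, 0.04, 0.08]) * rng.standard_normal((300, 4))

draws = bootstrap_weights(returns, inverse_variance_weights, n_boot=200)
lower, upper = percentile_interval(draws)
for i, (lo, hi) in enumerate(zip(lower, upper)):
    print(f"asset {i}: 95% interval [{lo:.3f}, {hi:.3f}]")
```

Any function that maps a returns array to a weight vector can be passed as `weight_fn`, so the same loop works for both the optimiser and the random-portfolio search. An interval that includes zero says the data cannot distinguish that weight from zero.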

Some Technical Details

At this point you’re probably wondering why generating random portfolios is useful! Well, the trick is in how we generate random portfolios. Specifically, it is more accurate to say that we sample random portfolios from a specific distribution with desirable characteristics. In this case we use what is called a Dirichlet Distribution. This distribution has some nice properties:

  1. Sampling units are vector valued: It outputs a vector of values of any length we want
  2. [0,1] Range: Each value in the vector ranges between 0 and 1, so it can be viewed as a set of portfolio weights.
  3. Unit Sum: The values in the vector sum to 1, so the portfolio weights will sum to 1.
  4. Prior Information Can be Incorporated: The distribution has n entries in the vector and n parameters, with the i’th parameter controlling how much of the density of the multivariate distribution is concentrated closer to 1 or 0 for the i’th weight. This allows us to incorporate our prior beliefs about how much weight we would typically like to place on each stock in our portfolio relative to the others.

This chart visualises a Dirichlet pdf for various values of the parameter vector alpha. This example has only 3 weights or entries in the vector (denoted x1, x2, x3); the darker the region, the more likely values of the 3 weights are to be sampled from this region. For example, in the final plot the weights (x1, x2, x3) are most likely to be around (0.2, 0.2, 0.6). Read the graph as a triangular lattice.

We care about this as the random samples drawn from a Dirichlet Distribution are going to explore the high dimensional weight space by expending most effort on the areas of the space that we want explored. In our case, since I have no particular prior preference for one equity over another, I will go for concentration parameters which are all equal to 1, thus treating all the weights alike (each has the same expected value of 1/N). This will be far more efficient than sampling from a multivariate normal or an unconstrained uniform distribution, which would wastefully explore weighting schemes which are undesirable or invalid (e.g. extreme weights, weights > 1, or weights that do not sum to 1).
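
The properties listed above are easy to check numerically. The short sketch below uses NumPy’s `Generator.dirichlet`; the number of assets and the tilted alpha vector are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_assets = 10

# All concentration parameters equal to 1: no stock is favoured over another.
flat = rng.dirichlet(np.ones(n_assets), size=20_000)
print(flat.min() >= 0.0, flat.max() <= 1.0)   # every weight lies in [0, 1]
print(np.allclose(flat.sum(axis=1), 1.0))     # every sampled portfolio sums to 1
print(flat.mean(axis=0).round(3))             # average weight per asset, near 1/N

# A larger parameter for the first asset tilts the samples towards it;
# the expected weight of asset i is alpha_i / sum(alpha).
tilted_alpha = np.array([5.0] + [1.0] * (n_assets - 1))
tilted = rng.dirichlet(tilted_alpha, size=20_000)
print(tilted.mean(axis=0).round(3))
```

Raising one entry of alpha is how a stronger prior belief in a particular stock would be expressed.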

In the second phase I re-sampled the returns data and repeated the procedure many times. This tries to adjust for instabilities in the optimisation process by synthetically creating new return series with a similar distribution to the observations, and it provides a way to quantify the uncertainty in the candidate weights that I produce. This process is commonly known in the statistics world as “bootstrapping”.

The Results

In the figure below we plot each of the bootstrapped portfolios for both simple Markowitz Optimisation (light red) and our Dirichlet procedure (light blue).

We notice immediately that the distribution of the Dirichlet portfolios is much more tightly clustered than that of the Markowitz portfolios, showing that the risk-adjusted returns of the portfolios generated under my proposed scheme are more stable under re-sampling of the return series. Additionally, we see in the two right-hand plots that under my Dirichlet model we have significant and positive weights for all stocks (indicated by the dark blue 95% credible interval), whereas under Markowitz the weights for all except 1 stock (VWDRY) are not different from zero at the 95% level, rendering the Markowitz optimal portfolio useless.

It’s also interesting to see our portfolio’s weight distribution as a density plot; this may be helpful in building some intuition for the above plots.


In this article we show that Markowitz portfolio optimisation can produce extreme or undesirable weighting schemes. Rather than change the optimiser, I apply a Monte Carlo method based on ideas proposed by Bartlmae (2009). In particular, I employ a Dirichlet distribution to improve the efficiency of exploration and to facilitate the incorporation of prior beliefs into the optimisation process. In conclusion, I find that the weights produced by this process are more desirable for practical use and also qualitatively more robust to resampling of the returns data.

You’ve been reading Quantamental, the UCD Investors & Entrepreneurs Society Data Science and Financial Research Blog.


Bartlmae, Kai. “Portfolio construction: Using bootstrapping and portfolio weight resampling for construction of diversified portfolios.” 2009 International Conference on Business Intelligence and Financial Engineering. IEEE, 2009.

If you want to try this out yourself, I’ve put the code below!

UCD Statistics & ACM, Learning Data Science, Winning Team @ Citadel Dublin Data Open. www.hugodolan.com/linkedin | Mailing List: http://eepurl.com/gkV7ov