An Optimal Trade

The Kelly Criterion in Practice

Nicholas Teague
From the Diaries of John Henry
17 min read · Feb 24, 2018


Central Park, New York

After a recent visit to a seminar addressing themes in the sphere of risk management, I was left with the realization of a certain hole in my knowledge around a valuable tool for analyzing portfolio allocations, one which I've tried to fill with a literature review over the last week or two. This framework, the Kelly Criterion for investing, is, I expect, commonly known amongst professional and institutional investors but not as widely used in retail channels. This post will attempt a practical overview of what the tool can teach us about managing our own portfolios. It is primarily intended for my own benefit (the writing process helps me organize my thoughts), but if any interested readers can gain actionable insights from the work, all the better. Since this is a realm that benefits from precise language, I'll attempt to maintain that style throughout. (I may also throw in a few images and music videos for color.)

First, here's a quick sketch of what the Kelly Criterion is and what it's trying to accomplish. When allocating capital in probabilistic games, such as placing casino bets or investing in financial securities, the criterion derives an optimal bet size and allocation to maximize the expected growth of the portfolio over a given time period. Note that this derivation is only an approximation, because the underlying assumptions the model is based on (asset return properties like mean, variance, and correlations, which collectively allow us to describe an 'edge') are themselves approximations subject to estimation error. As we'll see later in this post, the risks associated with estimation errors can be somewhat mitigated by a reduction in bet size while sacrificing only a portion of expected earnings, an approach known as 'partial Kelly'. It's worth noting that even an optimal allocation is subject to a range of performance outcomes. However, one of the big benefits of a Kelly criterion allocation is that by design it keeps our funds out of scenarios that could expose us to an exit barrier (ruin), thus maintaining at least a theoretical ergodicity.

In the process of researching this post, while I did find a few publications freely available online, the most comprehensive and illustrative turned out to be the first one I clicked on, an overview by the noted investor (and gambler) Edward Thorp titled "The Kelly Criterion in Blackjack, Sports Betting, and the Stock Market." In fact this paper was so illustrative of the concept that I'll spend most of this post walking through the discussions presented by Thorp. Of course if you're smart you'll probably ask "Well, what do I need this post for? I'll just read Thorp's paper instead." My response would be yes, that's probably the smart approach, especially if you're looking for a deeper understanding. However, if you're just looking for a briefer introduction and overview, I would caution that the paper itself is fairly dense, and the author has a certain propensity for introducing variables and equation parameters in a somewhat haphazardly scattered fashion, such that the extensive equations are in some cases indecipherable on a first pass without a concerted effort to track down the meaning of each parameter. This post will mostly gloss over those equations and instead attempt to replace the ideal precision of hard equations with, I hope, a narrative style more accessible to the layperson, what I'll dub the Roger Penrose approach (inside joke). There will be slightly more precision in a few illustrative financial derivations using the Wolfram Language later in the post, and once I'm through covering the paper I'll address a few more points picked up in the literature review. Again, the end goal is for the reader to develop an appreciation for the considerations of use and practical implications of the Kelly strategy.

The paper in question starts by quickly glancing over some of the background on the origination and early use of the Kelly strategy (originally known as the "geometric mean criterion"); I'll refer any aspiring historians to the resources cited in Thorp's paper.

Once through introductions and pleasantries, Thorp immediately turns to (and spends the bulk of the paper on) a series of illustrative examples of placing bets under uncertainty, each of progressively greater complexity: first the simplest application of a weighted coin toss, then casino games, then sports betting, and finally Wall Street, the biggest game of them all. In each of these examples new properties of the Kelly strategy are demonstrated. It's a really neat and effective narrative structure and suits the subject matter perfectly.

Van Morrison — Cleaning Windows

Our first example, a coin toss game, should be an easy one to grasp. A player with a finite cash holding bets some fraction of that purse on the outcome of each toss in a series, using a weighted coin that lands heads more often than tails; we'll use here a coin with win/lose probabilities of 60/40. We'll also assume for simplicity that the payout on a winning wager is equal to the bet size. Even from such a simple game there are a few things we can learn about making decisions under uncertainty that may not be immediately intuitive. A less sophisticated bettor might reason that in order to maximize his expected return he should bet his entire purse on each toss; after all, the expected return is 20% of the bet size. It turns out there is an error in this reasoning for repeated games, because losing any intermediate toss sends our purse to zero along with our ability to participate in subsequent rounds. The presence of this ruin barrier (from which, once crossed, we cannot return) can be described in mathematical terms as non-ergodicity. Of course we can avoid the possibility of ruin by betting the minimum allowed bet size, but as bets get smaller so does our expected value of winnings. This implies that there is some middle bet size between these two extremes at which we avoid the ruin barrier, ensuring we participate in all rounds of betting, while also maximizing expected growth; this bet size is what the Kelly criterion teaches us. It turns out that in order to avoid the ruin barrier over a series of games we'll need to vary our bet size as a function of our remaining purse value, so the Kelly criterion doesn't give us a fixed bet value to repeat on each toss but instead tells us to bet a percentage of the remaining purse. This also implies that as we win tosses our absolute optimal bet values go up!
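
To make the ruin argument concrete, here's a minimal Wolfram Language sketch (my own illustration, not taken from Thorp's paper) that simulates a purse over repeated tosses of the 60/40 coin. Betting the whole purse gets wiped out by the first loss, while a 20% fraction typically compounds:

(* simulate a purse over n tosses of a 60/40 coin for a given betting fraction f *)
simulatePurse[f_, n_: 100] :=
  Fold[#1*(1 + f*#2) &, 1., RandomChoice[{0.6, 0.4} -> {1, -1}, n]]

(* betting the entire purse on every toss: a single loss sends the purse to zero *)
simulatePurse[1.0]

(* betting 20% of the remaining purse on every toss: five sample outcomes *)
Table[simulatePurse[0.2], {5}]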

Joseph’s Brethren Discover Money in Their Grain Sacks — French ca. 1530

I leave formulas and derivations to Thorp, but if you walk through the steps it turns out that this optimal bet fraction can be derived purely from what is known about the coin weightings and the associated win/lose ratio, and for our example the optimal betting fraction is equal to the "edge" we have in the bet, simply 60% - 40% = 20%. Now this is an application-specific solution to the weighted coin problem, but it demonstrates that a derivation can be performed for an arbitrary problem setup. For simple problems like this one, solutions can be derived algebraically, but as we approach more complex setups I believe solutions utilizing numerical methods are required.
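
As a quick numerical check of that 20% figure (a sketch under the same even-money 60/40 assumptions, not code from the paper), maximizing the expected log growth per toss recovers the same fraction:

(* expected log growth per toss for win probability p and betting fraction f *)
g[f_, p_] := p*Log[1 + f] + (1 - p)*Log[1 - f]

(* maximizing over f for the 60/40 coin returns f* = 0.2, matching the edge p - q *)
NMaximize[{g[f, 0.6], 0 < f < 1}, f]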

Note that as a special case of this coin toss problem, if we are only allowed to bet tails (with an unfavorable 40% chance of winning), our expected value after each round of play only goes down, and the possibility of ruin exists independent of bet size over a long enough time horizon. For cases like these, as one might find in a casino for instance, I believe the highest expected value play is a single maximum bet (the amount you're comfortable losing) with no repetition. Unless of course you can claw back an edge by counting cards in blackjack, or have Claude Shannon helping you design a roulette-predicting wearable computer.

Grateful Dead — Cumberland Blues

To simplify discussions I’m going to go ahead and introduce a chart depicting elements of a Kelly criterion evaluation. First a few variable definitions:

  • G(f) (the y axis) is the expected value of the growth rate coefficient which measures the exponential rate of increase in purse size per round of betting.
  • f (the x axis) is the betting ratio, the percentage of current purse size applied to each bet.
  • fc is the value of the critical betting ratio f above which G(f) turns negative.
  • f* is the optimal betting ratio f per the Kelly criterion which maximizes G(f).

And here the function G(f) is presented graphically to illustrate:
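
Since the chart itself is an image, here's a rough Wolfram Language sketch (my own, for the 60/40 even-money coin) that reproduces the same general shape, with the peak at f* and the zero crossing at fc:

(* expected log growth versus betting fraction for the 60/40 even-money coin *)
Plot[0.6*Log[1 + f] + 0.4*Log[1 - f], {f, 0, 0.6},
 AxesLabel -> {"f", "G(f)"}]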

Now, an important point about the Kelly criterion: even though it tells us the theoretical optimal betting ratio to maximize expected growth in funds, even if our assumptions are spot on the actual growth achieved will in practice fall along some probability distribution, meaning it is entirely possible that our purse at some time step could be well below the expected level (including the potential for losses of capital). If our assumptions are realistic, the chances of approaching expected values improve as we extend our playing time, and the paper even demonstrates ways to calculate a certainty band envelope for different time horizons. However, for those with some aversion to too much volatility in outcome, Thorp suggests the partial Kelly strategy (as mentioned earlier in this post). The partial Kelly solution is very simple: instead of the optimal f* betting ratio, we reduce our bet by some fraction (thus shifting our f somewhere left of f*), which in practice acts as a tradeoff between expected growth and volatility of outcome, and leads to "barbell" allocations split between safe assets and riskier ones. Using too large an f* and over-betting carries much more severe penalties than using too small an f* and under-betting. An added benefit of the partial Kelly strategy is that as we try to apply these concepts in the real world, our estimation error will often overstate our edge (for reasons such as reversion to the mean), and so the actual critical betting ratio fc may well sit further left along the f axis than we expect. In games with insufficient information it is even possible that the actual fc falls left of where we estimate f*, meaning our estimated f* would actually produce a negative expected growth rate, and almost certain ruin! Thus the partial Kelly strategy also acts as a safety factor to mitigate estimation errors.
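
To put some rough numbers on that tradeoff for the 60/40 coin (back-of-envelope figures of my own, not from the paper): betting half Kelly keeps roughly three quarters of the expected growth rate at half the bet size, and the critical ratio fc sits just under 0.4:

(* expected log growth per toss for the 60/40 even-money coin *)
g[f_] := 0.6*Log[1 + f] + 0.4*Log[1 - f]

(* growth at full Kelly versus half Kelly: roughly 0.020 versus 0.015 per toss *)
{g[0.2], g[0.1]}

(* critical betting ratio fc where expected growth turns negative, roughly 0.39 *)
FindRoot[g[f] == 0, {f, 0.3}]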

Marble Statue of a Lion, Greek, ca. 400–390 BC

Our simplest example of a weighted coin toss game can be extended to demonstrate several concepts that will prove useful as we approach applications in investing. We can introduce betting odds that make the distribution of winnings uneven relative to bet size. We may find opportunities to split our bets among multiple coins of different weightings. We may even find that the outcomes of these different coin tosses are not fully independent. We can derive solutions to each of these variations, such as finding a set of betting ratios f = [f1, f2, … , fi], one for each of the weighted coins, which collectively constitute a Kelly strategy, or incorporating correlations or covariance between outcomes in the derivation of that same set f.
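
As a sketch of what one of those multi-coin derivations might look like (a hypothetical pair of 60/40 and 55/45 coins bet simultaneously and independently, my own example rather than one from the paper), we can maximize the expected log growth over both fractions at once:

(* expected log growth for simultaneous independent bets on two biased coins,
   with win probabilities 0.60 and 0.55 and betting fractions f1 and f2 *)
growth[f1_, f2_] := Sum[
  If[x1 == 1, 0.60, 0.40]*If[x2 == 1, 0.55, 0.45]*
   Log[1 + f1*x1 + f2*x2],
  {x1, {-1, 1}}, {x2, {-1, 1}}]

(* jointly optimal betting fractions, constrained so the purse can never go to zero *)
NMaximize[{growth[f1, f2], 0 < f1, 0 < f2, f1 + f2 < 1}, {f1, f2}]

Bet separately, the individual Kelly fractions would be 0.20 and 0.10; bet simultaneously, the jointly optimal pair comes out slightly different because losses can now coincide.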

Grateful Dead — Keep Your Day Job

Having established a lot of groundwork in our simpler examples, we'll now turn our attention to a somewhat bigger game: Wall Street investments. There are some fundamental differences that come into play, including the use of continuous instead of discrete probability distributions (i.e. stock prices fluctuate continuously, unlike the discrete bets of a coin toss). We can allow ourselves betting ratios f that fall outside the range 0 < f < 1, where f < 0 represents selling a security short and f > 1 buying on margin. For purposes of analysis we'll make some simplifications that may deviate from practice, such as infinitely divisible bet sizes, negligible transaction costs, and a tax-sheltered account. We'll assume availability of a "risk-free" asset, which in practice is approximated by US Treasuries. We'll assume that asset prices fluctuate continuously without sudden jumps (which, in this age of high frequency trading fueled flash crashes, may be a suspect assumption even for those operating on the longer time scales of a retail channel). There will be some interest premium over the risk-free rate associated with margin purchases, and as this premium increases the criterion's appetite for margin purchases will of course decline. If we do incorporate a tax rate, we'll find that states with higher taxes lead to higher margin recommendations. In practice there will be limits on the degree of margin trading available, limits which may be exceeded by Kelly recommendations. Because stock prices are continuously fluctuating, we'll assume that we are in parallel scaling our bet sizes up and down as market conditions update. We'll also assume that we are continuously rebalancing portfolios as asset prices drift out of sync, which in practice would likely only take place periodically. We'll assume that for securities considered for our portfolio we have the ability to forecast estimates of future return profiles including mean, variance, and correlation; this particular assumption is surely the most suspect of all! Anyway, I've tried to be thorough here with the assumptions and simplifications of the following example, though there may be some that I've overlooked. Clearly a lot of knobs and dials to keep track of.
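
Under these continuous-time assumptions, the single-asset case has a compact answer derived in Thorp's paper: the optimal fraction is the excess return divided by the variance, f* = (m - r)/s^2. A quick sketch with made-up numbers (any resemblance to a real security is coincidental):

(* continuous-time Kelly fraction for a single risky asset: f* = (m - r)/s^2 *)
kellyFraction[m_, r_, s_] := (m - r)/s^2

(* hypothetical asset: 10% expected return, 2% risk-free rate, 20% volatility *)
kellyFraction[0.10, 0.02, 0.20]

The result, 2.0, is an example of an f > 1 allocation, i.e. a position levered two to one on margin.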

True story

Now that we've laid out our basis of evaluation, let's take a look at an example presented by Thorp for allocations among a limited selection of securities, which I found highly illustrative and which even hinted at some fascinating implications for margin-constrained portfolios that I did not see discussed elsewhere in my (admittedly limited) literature review; more on that to follow. Thorp's example presumes that an investor has already decided, as a prior, to limit his allocations to three options: Berkshire Hathaway (BRK), a fictional biotech company (BTIM), and an S&P 500 index fund (SP500), along with a fourth option of a "risk-free" asset of US Treasuries (T-bills), which when shorted in a portfolio serves as a proxy for margin debt. The example also presents return profiles for these securities including mean, standard deviation, and correlations.

Source: The Kelly Criterion in Blackjack, Sports Betting, and the Stock Market by Edward Thorp

I found this example particularly interesting because it gave an opportunity to validate a comparable Wolfram Language demonstration implementation of Kelly for portfolio allocations that is freely available through the Wolfram Cloud: [Link]. All that’s needed to access and interact with this model and its source code is a (free!) Wolfram Cloud account.

Now Thorp’s model demonstrated optimal portfolio weightings for three scenarios: one without borrowing, one with a 50% cap on margin, and finally one with unrestricted borrowing.

Source: The Kelly Criterion in Blackjack, Sports Betting, and the Stock Market by Edward Thorp

The Wolfram model appears to be based on the scenario of unrestricted borrowing (although my output didn't perfectly match the paper's results, it came pretty close). Near as I can tell, if one wanted to update this Wolfram model to reproduce the other two margin-constrained scenarios, you would need to introduce the margin constraint q from equation 8.3 of Thorp's paper (if anyone wants to take a crack at this, let me know, as I'd be interested to see your results).
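
For what it's worth, my understanding is that in the unrestricted borrowing case the allocation reduces to f* = C^-1 (m - r), where C is the covariance matrix of returns and m - r the vector of excess returns. Here's a sketch of the mechanics with hypothetical inputs (these are not Thorp's figures, so don't expect the paper's numbers to come out):

(* hypothetical annualized inputs for three securities (not the figures from Thorp's table) *)
means = {0.15, 0.20, 0.10};                (* expected returns *)
sds   = {0.25, 0.45, 0.15};                (* standard deviations *)
corr  = {{1.0, 0.1, 0.6},
         {0.1, 1.0, 0.2},
         {0.6, 0.2, 1.0}};                 (* correlation matrix *)
riskFree = 0.03;

(* build the covariance matrix from standard deviations and correlations *)
cov = DiagonalMatrix[sds] . corr . DiagonalMatrix[sds];

(* unconstrained Kelly allocation: solve C.f = (m - r) for f *)
kellyWeights = LinearSolve[cov, means - riskFree]

(* whatever remains, 1 - Total[kellyWeights], sits in T-bills, or is borrowed if negative *)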

Source: The Kelly Criterion in Blackjack, Sports Betting, and the Stock Market by Edward Thorp

The reason this particular point about allocation scenarios for constrained margin borrowing is so interesting to me (as I have hinted previously in this post) is that it appears to describe an approach superior to applying the partial Kelly strategy. According to the partial Kelly strategy, if we wanted to de-risk, trading some expected return of our portfolio for a reduction in variability, all we would do is apply a fractional multiplier to the allocations of the unconstrained borrowing scenario. What the paper's table with the three allocation scenarios tells us is something different. Instead of just scaling down the unconstrained scenario's holdings of the three securities BRK, BTIM, and SP500, it completely changes the ratios, in fact dropping the SP500 fund from the portfolio altogether. Note this is the same SP500 fund that was estimated to have the lowest standard deviation of the three. So I'm interpreting this model to say that when margin borrowing is constrained or unavailable, an optimal holding avoids the diversified index fund in favor of riskier securities with higher expected returns coupled with much higher variance. No wonder people still want some margin of safety with a partial Kelly multiplier; optimal Kelly can be quite aggressive!

Central Park, New York

So we've demonstrated that the linked Wolfram Language implementation of Kelly works for duplicating the inputs from the example given in Thorp's paper; now what if we want to evaluate our own security allocations based on present market conditions? I don't have a Bloomberg terminal so I'm not sure what is out there for the professionals, but for the retail crowd I know of at least one free(!) option for sourcing the security return profiles this model calls for (i.e. mean, variance, and correlation of returns), and that free option is coincidentally the same powerful tool that provided us with a Kelly implementation in the first place: the Wolfram Language. Here are a few demonstration code samples showing how rudimentary versions of these values can be derived from historical data freely available in the Wolfram Cloud. Of course we should never forget that past performance is no guarantee of future returns; funds that have outperformed will likely revert to the mean, and historical data can never predict industry shifts and disruptions. But at a minimum, if we want to experiment with the Kelly model, these derivations could prove useful. The following code snippets are written in the Wolfram Language and are based on an arbitrary 2017 calendar year of daily historical returns; note that in Thorp's example he uses a 63-month evaluation period with monthly data instead of daily.

(* Standard deviation of annual returns for AAPL stock, annualized from daily returns *)
StandardDeviation[(FinancialData["AAPL", "Close", {DateObject[{2017, 1, 2}], DateObject[{2017, 12, 29}]}, "Value"] - FinancialData["AAPL", "Close", {DateObject[{2016, 12, 30}], DateObject[{2017, 12, 28}]}, "Value"])/FinancialData["AAPL", "Close", {DateObject[{2016, 12, 30}], DateObject[{2017, 12, 28}]}, "Value"]]*Sqrt[251]

(* Mean annual return for AAPL stock, from year-over-year price changes *)
Mean[(FinancialData["AAPL", "Close", {DateObject[{2017, 1, 2}], DateObject[{2017, 12, 29}]}, "Value"] - FinancialData["AAPL", "Close", {DateObject[{2016, 1, 1}], DateObject[{2016, 12, 29}]}, "Value"])/FinancialData["AAPL", "Close", {DateObject[{2016, 1, 1}], DateObject[{2016, 12, 29}]}, "Value"]]

(* Correlation of daily returns between AAPL stock and the S&P 500 index ETF VOO *)
Correlation[(FinancialData["AAPL", "Close", {DateObject[{2017, 1, 2}], DateObject[{2017, 12, 29}]}, "Value"] - FinancialData["AAPL", "Close", {DateObject[{2016, 12, 30}], DateObject[{2017, 12, 28}]}, "Value"])/FinancialData["AAPL", "Close", {DateObject[{2016, 12, 30}], DateObject[{2017, 12, 28}]}, "Value"], (FinancialData["NYSE:VOO", "Close", {DateObject[{2017, 1, 2}], DateObject[{2017, 12, 29}]}, "Value"] - FinancialData["NYSE:VOO", "Close", {DateObject[{2016, 12, 30}], DateObject[{2017, 12, 28}]}, "Value"])/FinancialData["NYSE:VOO", "Close", {DateObject[{2016, 12, 30}], DateObject[{2017, 12, 28}]}, "Value"]]

Note that in the derivation of annual standard deviation we multiply by Sqrt[251]. The 251 represents the number of trading days in 2017, and it turns out this number isn't constant for a given range of calendar days, since weekend days fall differently, leap years occur, and so on (most years have 252). One suggestion I would offer to the managers at Wolfram Research, if you are looking for feature requests that could make your product more user friendly for the investment crowd: consider allowing us to interact with date ranges using units of (NYSE) trading days instead of calendar days (I couldn't find anything equivalent in the natural language box interface). Cheers.
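
In the meantime, one workaround (a sketch relying only on the same FinancialData call used above) is to count the trading days the query actually returns for the year, rather than hardcoding 251:

(* count the trading days FinancialData returns for 2017 instead of hardcoding 251 *)
tradingDays = Length[FinancialData["AAPL", "Close",
    {DateObject[{2017, 1, 2}], DateObject[{2017, 12, 29}]}, "Value"]];

(* use the count as the annualization factor for daily return volatility *)
Sqrt[tradingDays]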

Bob Dylan — Maggie’s Farm

Anytime you are forecasting the future you are going to have uncertainty and estimation error; in fact, even your estimate of estimation error will have estimation error, and so on recursively. In most cases you're better off focusing on your exposure to benefit and harm from ranges of volatility in a variable rather than trying to predict any specific value (for more on this check out Nassim Taleb's book Antifragile). But in the case of the stock market it's possible that we may be trying to forecast a standard deviation value that doesn't even exist!

Lion felling a bull, Greek ca. 525–500 BC

I believe it was Benoit Mandelbrot, in his book The Fractal Geometry of Nature, who first popularized the problem of infinite variance in economic returns, based on an evaluation of cotton prices of all things. I've seen some more recent work attempting to fit stock market returns to the Pareto-Levy distribution (a power law distribution), and in the process deriving a power law exponent parameter (alpha) of 2.85. This value is significant for our discussion because power laws have the peculiar property that as the alpha parameter gets smaller, the influence of the tail grows to the point that some of our statistical measures of the distribution lose tractability. Thus if we believe that stock market returns really do follow a power law with alpha = 2.85, we can only draw the conclusion that a standard deviation simply does not exist! (Some of the literature on this point is a little confusing, especially as physicists may use different notations than mathematicians, etc.)
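
A quick way to see the issue (a sketch of my own, treating 2.85 as the exponent of the density itself; if the 2.85 instead describes the tail of the survival function, the threshold for a finite variance is alpha > 2 rather than alpha > 3, and the conclusion would flip):

(* second moment of a pure power-law density proportional to x^-2.85 on [1, Infinity) *)
Integrate[x^2*x^(-2.85), {x, 1, Infinity}]
(* Integrate reports that the integral does not converge, i.e. the second moment is infinite *)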

Source: David Feldman’s Santa Fe Institute Complexity Explorer MOOC on Fractals and Scaling

In a betting game with imperfect information, every new toss of the coin tells you just a little bit more about any potential weightings and distribution of payoffs. While a single throw won’t tell you much, over time as the data accumulates you learn more about a system’s behavior. What does a short term fall in the price of a security tell you? Value investors would say the stock is now a better bargain. Momentum investors might say something different. For a perfectly informed agent, the value story seems the only prudent one. For one acting on incomplete information, I think you need to consider the possibility that a material move, especially one counter to broader market trends, is a signal that there is something missing from your model, and the further removed an asset is from fundamentals and future cash flows (cryptocurrencies anyone?) the more weight should be put on this consideration — at least that’s how I look at it. Some may tell you that the stock market is a memoryless system simply following Brownian motion, that what has happened in the past has no bearing on the present, but I have seen others characterize the market as fractal:

“Another aspect of the real world tackled by fractal finance is that markets keep the memory of past moves, particularly of volatile days, and act according to such memory. Volatility breeds volatility; it comes in clusters and lumps.”

- Mandelbrot and Taleb

Perhaps the power of a tool like the Kelly criterion lies not in the precise derivation of allocations between specific funds, but in giving us hints at improved approaches to acting and reacting under conditions of imperfect information, conditions that will always be present any time we are trying to peer into the future. This GPS cannot reliably tell you exactly where to drive your car, but at least it gives you a general idea of a more prudent direction to follow given a constrained view of the road ahead.

Paul Simon — Rewrite

For further reading please check out my Table of Contents, Book Recommendations, and Music Recommendations.

Books that were referenced here or otherwise inspired this post:

Antifragile — Nassim Taleb

A Man for All Markets — Edward Thorp

An Elementary Introduction to the Wolfram Language — Stephen Wolfram

(As an Amazon Associate I earn from qualifying purchases.)

Albums that were referenced here or otherwise inspired this post:

Workingman’s Dead — Grateful Dead

Bringing It All Back Home — Bob Dylan

(As an Amazon Associate I earn from qualifying purchases.)

For more from Edward Thorp there is a recently published autobiography that I'm sure is a worthwhile read and is certainly on my list: A Man for All Markets

If you enjoyed or got some value from this post, feel free to say hello. I'm on Twitter at @_NicT_ and have been blogging on Medium for a while now on a range of subjects, from computer science to poetry and a lot in between.
