Biffure #1: Hearthstone, Verhulst & exponential functions

Cédric Bellet
Published in Biffures
7 min read · Apr 5, 2021

“Biffures” is French for crossing-outs; these short articles are light explorations, less involved discussions of topics that don’t form a particular sequence. If you are interested in more structured content, check out the Bitwise or the ML series.

What is the probability of winning 12 games in a Hearthstone arena or duels run? Is the limit of 12 wins anchored in some interesting math?

There are several ways to think about this (more on that in Biffure #2, certainly), and one of them involves making assumptions about the distribution of deck and player strength. Naturally, the normal distribution came to mind, though I am not sure why, apart from normal distributions sounding familiar and credible based on recollections from university. But is it actually fair to assume that strength follows a normal distribution? Is there a mathematical reason why this would work? That is unclear from reading different sources.

It turns out that even human height, the classic empirical illustration of a normal distribution, does not really follow a normal distribution, though it is well approximated by one. So if the goal is to approximate distribution shapes rather than to match a mathematical reality, surely alternatives to the normal distribution should be considered? The first Google result, an excerpt from a 1968 The American Statistician article, suggests that the logistic distribution is “the most well-known alternative to the normal distribution”.

Logistic function

So, what is a logistic distribution? According to Wikipedia, it is a distribution whose cumulative distribution function is the logistic function.

And what is a logistic function? Though popularised in the 20th century, the logistic function was first named and discussed in the first half of the 19th century by Verhulst, a Belgian mathematician, who was attempting to model population growth in Belgium, France and England. Verhulst proposed the logistic function as a better alternative to Malthus’s 1798 exponential model (known then as a logarithmic model), which in his view only fit countries such as the United States with exceptionally large resources relative to population. By contrast, Verhulst’s model was supposed to take into account the population’s perception of limited resources, “depuis le moment où la difficulté de trouver des bonnes terres se fait sentir” (“from the moment when the difficulty of finding good land begins to be felt”).

Drawing of the logistic function in “Recherches mathématiques sur la loi d’accroissement de la population”, 1844 (retrieved in 2021 at this URL); Malthus’s logarithmic function grows to infinity whereas Verhulst’s logistic function is bounded by a terminal, asymptotic population number.

Unlike Malthus’s “logarithmic” (i.e. exponential) curve, Verhulst’s logistic curve models population growth as levelling off at an asymptotic maximum population determined by a country’s resources, which fit the empirical data that Verhulst could gather relatively well. Based on this, Verhulst determined a maximum population of 6M for Belgium and 40M for France. These were insightful forecasts, even though they did not prove right in the long run as technology drastically changed the limit of our resources.

Verhulst predicts max number of “âmes” (souls) for France and Belgium based on his logistic model

In mathematical terms, Verhulst defines the logistic function via a set of two equations, in which p stands for population, t for time, and all other symbols can be thought of as constants:

Simplified, p can be expressed directly as a function of t, with parameter a giving the amplitude of the curve and m its steepness.

Logistic curve as per Verhulst’s formula, visualised in Desmos (link)
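
To get a feel for the roles of a and m, here is a minimal sketch in Python. It assumes the form p(t) = a / (1 + 10^(-m·t)), which is only an illustrative reading of "a for amplitude, m for steepness" with a power of 10 and may differ from Verhulst's exact expression. The function name and the parameter values are hypothetical.

```python
def logistic_base10(t: float, a: float = 1.0, m: float = 1.0) -> float:
    """Logistic curve written with a power of 10.

    ASSUMED form (not taken from Verhulst's text): p(t) = a / (1 + 10**(-m*t)).
    a is the asymptotic maximum (amplitude), m controls the steepness.
    """
    return a / (1.0 + 10.0 ** (-m * t))


if __name__ == "__main__":
    # Arbitrary illustrative values: amplitude 40, steepness 0.5.
    for t in (-4, -2, 0, 2, 4):
        print(t, round(logistic_base10(t, a=40.0, m=0.5), 3))
```

With this form the curve passes through a/2 at t = 0 and flattens towards a as t grows, which is the bounded behaviour that distinguishes it from an exponential.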

Logistic distribution

The logistic distribution is such that its cumulative distribution function is described by the logistic function:

Left: logistic density functions; right: logistic functions, which are the cumulative distribution functions of those densities
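
In the usual modern parametrisation, this relationship can be written as follows. The location μ and scale s are standard symbols for the logistic distribution, but they are not used elsewhere in this article; the density f is simply the derivative of the logistic function F.

```latex
F(x;\mu,s) = \frac{1}{1 + e^{-(x-\mu)/s}},
\qquad
f(x;\mu,s) = \frac{\mathrm{d}F}{\mathrm{d}x}
           = \frac{e^{-(x-\mu)/s}}{s\left(1 + e^{-(x-\mu)/s}\right)^{2}}
```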

How did we go from logistic function to logistic distribution? The origin lies, according to J.S. Cramer in “The Origins of Logistic Regression”, 2002 (retrieved in 2021 at this address), in the field of bio-assay, where the normal distribution was commonly used to understand deviations between stimulus and human response. Cramer relates that Joseph Berkson, an American statistician, challenged that use of the normal distribution from the 1940s until the 1980s and proposed the logistic distribution as a replacement. His arguments are beyond my ability to summarise but are detailed by Cramer (Berkson opposed maximum likelihood in favour of minimum chi-squared, which somehow related better to the logistic approach). In practice, the logistic function and the normal cumulative distribution function look similar, and so do the distributions:

Normal cumulative distribution function (CDF) vs. logistic CDF (i.e. the logistic function), source: Enrique Pinzon, The Stata Blog, 2016
Probability density functions of a logistic (blue) and normal (orange) distribution tuned to match one another closely, source: John D. Cook’s blog, 2010; the logistic distribution has slightly heavier tails
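
How close are the two CDFs? The sketch below measures the largest vertical gap between a standard logistic CDF and a zero-mean normal CDF on a grid, using only the standard library. The helper names are hypothetical, and the two values of sigma are assumptions chosen for illustration: one matches the variance of the logistic distribution (sigma = π/√3), the other is the commonly quoted scaling constant 1.702.

```python
import math


def logistic_cdf(x: float, s: float = 1.0) -> float:
    """CDF of a logistic distribution centred at 0 with scale s."""
    return 1.0 / (1.0 + math.exp(-x / s))


def normal_cdf(x: float, sigma: float = 1.0) -> float:
    """CDF of a normal distribution centred at 0 with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def max_cdf_gap(sigma: float, lo: float = -10.0, hi: float = 10.0,
                steps: int = 2001) -> float:
    """Largest |logistic CDF - normal CDF| over an evenly spaced grid."""
    gap = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        gap = max(gap, abs(logistic_cdf(x) - normal_cdf(x, sigma)))
    return gap


if __name__ == "__main__":
    print("variance-matched:", max_cdf_gap(math.pi / math.sqrt(3.0)))
    print("sigma = 1.702:   ", max_cdf_gap(1.702))
```

Under these assumptions the gap stays at a few hundredths at most, which is consistent with the curves being hard to tell apart by eye.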

In Berkson’s time, it was the “formidable power of [its] analytical properties” (through a simpler CDF) that gave the logistic model an advantage and allowed it to soar later on. Now that analytical convenience matters less and computers can find numerical solutions, it is less clear that either option really is superior. In fact, Enrique Pinzon argued on the Stata blog in 2016 that choosing one or the other boils down to “a matter of habit or preference.”

In conclusion, Verhulst proposed in the 19th century an innovative function to describe population growth, which in the unrelated field of bio-assay was found a century later to be a good cumulative distribution function for a probability distribution named after it; this mattered decades ago, but does not really matter today. As far as modelling the distribution of player strength in Hearthstone is concerned, picking the normal or the logistic distribution should therefore be of little significance, and certainly of less concern than judging whether player strength follows a symmetrical, bell-shaped distribution in the first place.

Exponentials

On a separate note, I was surprised to see that Verhulst uses “10^mt” in his logistic function, that is to say a power of 10, instead of the power of Euler’s number e that we see in all modern versions.

Left: modern presentation of the logistic function (Wikipedia); right: Verhulst’s logistic function

In fact, Verhulst also uses the power of 10 to describe a geometric progression, when common sense would have us use a power of r, the ratio of the geometric progression:

Common sense would instead have us write p = k * r^t for a geometric progression

It would take a historian to understand the preference for powers of 10 over powers of Euler’s number e or of the geometric ratio r, but we can at least prove that the notations can be freely substituted for one another.

Noting that for any positive real r:

and that specifically for r=10:

We can use the first equation to show that:

That is to say, for any positive real r, there exists a real l such that:

This is true for any positive real r and therefore true for r = e:
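
Written out in full, the chain of identities can be sketched as follows. It assumes r > 0 so that ln r is defined, and the notation is a reconstruction for illustration, with l playing the role of the constant in the exponent.

```latex
r = e^{\ln r}, \qquad 10 = e^{\ln 10}
\\[4pt]
r^{t} = e^{t \ln r}
      = \left(e^{\ln 10}\right)^{t \,\ln r / \ln 10}
      = 10^{\,t \,\ln r / \ln 10}
\\[4pt]
r^{t} = 10^{\,l t} \quad \text{with} \quad l = \frac{\ln r}{\ln 10} = \log_{10} r
\\[4pt]
e^{t} = 10^{\,t / \ln 10} \quad \text{(the case } r = e\text{)}
```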

This proves that Verhulst’s usage of powers of 10, though different in presentation, is identical in substance to our more familiar use of powers of e or r, thanks to Verhulst’s incorporation of the “undetermined constants” l, m in his exponents. We could actually say that Verhulst’s usage of powers of 10 is equivalent to using the exponential function everywhere, in base 10, if we interpret the powers of 10 as a variant of the natural exponential function:

Top: definition of the “natural” exponential function; bottom: “base n” exponential function

Wikipedia notes that “since any exponential function can be written in terms of the natural exponential, it is computationally and conceptually convenient to reduce the study of exponential functions to this particular one”. It seems Verhulst, for some reason, chose instead to express every exponential function in terms of the base 10 exponential: an odd choice, perhaps a justified one, and in any case a valid one.
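
The interchangeability of bases is easy to check numerically. The short Python sketch below uses hypothetical helper names and assumes r > 0. It computes the constant l = log10(r) and confirms that r^t and 10^(l·t) agree, including for r = e.

```python
import math


def base10_constant(r: float) -> float:
    """Return l such that r**t == 10**(l*t) for every real t (requires r > 0)."""
    if r <= 0:
        raise ValueError("r must be positive for log10(r) to be defined")
    return math.log10(r)


def power_via_base10(r: float, t: float) -> float:
    """Evaluate r**t using only a power of 10, in the spirit of Verhulst's notation."""
    return 10.0 ** (base10_constant(r) * t)


if __name__ == "__main__":
    for r, t in ((math.e, 2.5), (1.03, 10.0), (2.0, -3.0)):
        print(r, t, r ** t, power_via_base10(r, t))
```

The two columns printed for each pair agree up to floating-point rounding, so a "power of 10" model with a free constant in the exponent can represent the same family of curves as a "power of e" or "power of r" model.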
