Randomness, “Structure” and Event History

Ian Matthew Miller
Published in Roots and Branches
11 min read · Feb 3, 2016

I’m busy with the new semester, so this is going to be a fragment. Also, I haven’t had time to do more translations for my forest work, so it’s a new interpretation of some old data… Oh well…

Take a look at these two graphs. One of them is random. One is based on historical data. Can you tell which is which?

Before I give away the answer, I’ll give a bit more information. One of these is generated from random draws from an exponential distribution with mean = 1. One of these is generated from a topic model of “crime”-related documents in the Qing Veritable Records (my published article on this material [firewall] and an earlier draft [academia.edu, sigh]).

For now, I’m just going to say that the topic model is based on the co-occurrence rates of words within documents as compared to the co-occurrence rates of words between documents. This is unsupervised (i.e. without researcher inputs prior to the model), and the results are fairly consistent with the trends that human experts find. But the topic model allows me to analyze almost 300,000 documents. For a more detailed explanation of how the topic model works, please see one of the above links.

Here is another set, one random, one based on historical data:

Here the random one is based on draws from a lognormal distribution with mean = 1, the non-random one is based on the rate of “grand canal”-related documents in the Qing Veritable Records.
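The post doesn’t include the code behind the blue graphs, but series like them are easy to generate. Here is a minimal sketch using Python’s standard-library `random` module; the sample size of 300 and the lognormal σ = 1 are my assumptions for illustration, not details from the original graphs:

```python
import random

random.seed(42)  # fixed seed so the "random" series is reproducible

# Exponential with mean = 1: rate parameter lambda = 1.
exp_draws = [random.expovariate(1.0) for _ in range(300)]

# Lognormal with mean = 1: if X = exp(N(mu, sigma^2)), then
# E[X] = exp(mu + sigma^2 / 2), so set mu = -sigma^2 / 2.
sigma = 1.0
logn_draws = [random.lognormvariate(-sigma**2 / 2, sigma) for _ in range(300)]

print(sum(exp_draws) / len(exp_draws))    # hovers near 1
print(sum(logn_draws) / len(logn_draws))  # near 1, but noisier
```

Plot either series against an index and it is hard to distinguish from a per-year document-count series like the red graphs.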

If you haven’t guessed by this point, the blue ones are random, and the red ones are based on historical data. The point is, you might be able to tell a slight difference between the real-world data and the random draws — especially in the second set of graphs — but you probably have to admit that they look almost identical. I suspect — especially in the second case — that with further tweaking I could get essentially identical-looking graphs.

So what can we take away from this observation?

First, if we zoom out enough, to include enough individual events, history looks random.

Second, there are different kinds of randomness, and different kinds of historical events look like different kinds of randomness. Perhaps more properly, they look like what Benoit Mandelbrot called different “states of randomness”.

States of Randomness

Benoit Mandelbrot, the genius mathematician who essentially invented fractal geometry, described three “states of randomness” — later expanded to seven.

Mild randomness follows the normal distribution. The normal distribution describes certain real-world phenomena, like height, IQ and grades. Intuitively, it means that most observations are clustered “close to” the mean. If the average height for adult males is 5'10", we can expect that most men will be within +/- 5 inches. We’ll occasionally see a man over 7 feet or under 5 feet, but (with the exception of certain genetic or hormonal conditions) we will essentially never see an adult male over 8 feet or under 4 feet. Extreme heights are still relatively “close” to average.

Another feature of mild randomness is that random draws can be modeled by random variables that are independent and identically distributed (abbreviated i.i.d.). If you flip a coin and it comes up heads, that does not change the probability of heads on the second flip — the two trials are independent. Furthermore, the chances of heads on your first flip and your twentieth flip are the same — 50%; they are identically distributed. We may still see “runs” of heads, or “hot hands,” but these are normal features of randomness, even mild randomness. And while the chance of each flip is identical, the chance of observing a run of a given length decreases exponentially. The chance of a run of 3 heads is 1/8 — it’s something we’ll see frequently. The chance of a run of 6 heads is 1/64 — it’s something we’ll see occasionally. The chance of a run of 12 heads is less than 1 in 4,000. And so on.
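The arithmetic behind those run probabilities can be checked directly; exact fractions make the exponential decay obvious:

```python
from fractions import Fraction

# P(a given sequence of k fair-coin flips is all heads) = (1/2)^k
for k in (3, 6, 12):
    p = Fraction(1, 2) ** k
    print(f"run of {k} heads: {p} (about 1 in {p.denominator})")
```

Each additional head halves the probability: 1/8, 1/64, and finally 1/4096 for a run of twelve.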

Mild randomness is the way most people think about randomness intuitively, and it is the type of randomness that most statistics is based on.

But as Mandelbrot pointed out, there are many phenomena that are not well-represented by mild randomness.

We can get at this in two ways.

First, there are some phenomena where extreme events are not “close” to the mean. The average annual flooding of a river does not give us a sense of the largest floods the same way that average height gives us a sense of the greatest height. Where the max height is nowhere near twice the average height, the max flood could be orders of magnitude larger than the average flood. These are sometimes called heavy-tailed distributions.

Second, to model these states of randomness you should not think in terms of i.i.d. Unlike a coin flip, where the chances of the first, second and twentieth heads are independent and all 50%, with these types of phenomena the outcome of the first “draw” affects the outcome of the second “draw,” and even the outcome of the twentieth “draw.” They are neither independent nor identically distributed.

Mandelbrot originally demarcated two states of non-mild randomness — slow randomness and wild randomness. The math is hard, but intuitively it works like this:

Slow randomness means that extremes are “less close” to the average, but the average still places some limits on how extreme they get.

Wild randomness means that extremes are essentially unbounded — extreme values may be rare, but the mean value gives us essentially no prediction of how extreme values can get.

Later Mandelbrot added additional states of randomness, yielding seven: proper mild randomness (the normal distribution); borderline mild randomness (the exponential distribution); two forms of slow randomness (the lognormal distribution); pre-wild randomness; wild randomness; and extreme randomness (the Pareto distribution).
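A quick way to feel the difference between these states is to draw large samples from each family, scaled so every distribution has mean 1, and compare the extremes. This is my own illustrative sketch — the parameter choices (σ = 0.2 for the normal, σ = 1 for the lognormal, α = 1.5 for the Pareto) are mine, not Mandelbrot’s:

```python
import random

random.seed(1)
N = 100_000

# All four families scaled to mean ~1. paretovariate(a) has mean
# a / (a - 1) for a > 1, so alpha = 1.5 gives mean 3; divide by 3.
alpha = 1.5
samples = {
    "normal (mild)": [random.gauss(1, 0.2) for _ in range(N)],
    "exponential (borderline mild)": [random.expovariate(1) for _ in range(N)],
    "lognormal (slow)": [random.lognormvariate(-0.5, 1) for _ in range(N)],
    "pareto (wild)": [random.paretovariate(alpha) / 3 for _ in range(N)],
}

for name, xs in samples.items():
    print(f"{name:30s} max = {max(xs):9.2f}  ({max(xs) / (sum(xs) / N):7.1f}x the mean)")
```

The maximum normal draw sits a few tenths above the mean; the exponential maximum is roughly ten times the mean; the lognormal maximum is tens of times the mean; and the Pareto maximum can be hundreds of times the mean — extremes that the mean “predicts” less and less as we move down the list.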

Ok, but what does this all mean for the historical data?

Structure and Event

Let’s start with “crime.” The incidence of crime, or at least of crime-reporting (the first red graph), looks a lot like draws from an exponential distribution (the first blue graph). That means that “crime” looks a lot like our intuitive sense of “random” — which I argue really means something more like “independent.” The occurrence of a crime at one place and time is almost independent of the occurrence of a crime at another place and time, but not quite.

In Manslaughter, Markets, and Moral Economy, Thomas Buoye claims that murders are essentially like this — they are “random” events that are essentially unrelated to one another, although they are not quite identically distributed — some places and times have more murders than others. Buoye argues that murder rates can be used as an index of underlying social tensions — social tensions make murders more likely, but each individual murder case is essentially unrelated to each other individual case. Critics of Buoye pointed out that this fails to account for feuding — instances where one murder does make future murders more likely.

I think this tension — murders are almost independent of one another, but not quite — neatly describes the intuition of borderline mild randomness. And I think that this neatly supports Buoye’s broader argument — that the murder rate is essentially structural — it is an index of the overall conditions of society in a place and time.

For what it’s worth, the majority of topics in the Veritable Records appear to have essentially borderline mild randomness. They represent hundreds or thousands of similar events that are almost independent of one another, but not quite. Describing individual events may be interesting, but it is not particularly meaningful. It is far more useful (at least to my mind) to look for patterns in these events that point to historical structures.

There is a second set of topics in the Veritable Records that look to exhibit something more like slow randomness. The topic “grand canal” (the second red graph above) is like this, which makes sense. Flooding, the major concern of grand canal administrators, is now generally modeled with lognormal distributions (the second blue graph).

I have reason to believe that artifacts of my model may actually suppress extremes. I suspect that certain types of events may be better modeled with pre-wild or even wild randomness. In particular, rebellion, warfare and natural disaster appear to exhibit wild randomness. What should this mean to us?

Two “rebellion” topics in the Veritable Records. Note that the red one is essentially zero until the Taiping Rebellion, when it jumps as high as 40.

First, it means that these events matter more. Mathematically, the occurrence of an event of this type changes the likelihood that it will occur again. That is true even of borderline mild randomness. But with slow or wild randomness, events can increase the likelihood of future events so much that the magnitude of a short sequence of events exceeds the cumulative magnitude of all previous events of that type.

A “heads” does not make future coins more likely to land on “heads.” This is mild randomness.

A murder may make future murders more likely, but it is still unlikely that the annual murder rate exceeds all previous annual murder rates. This is borderline mild randomness.

A flood makes future flooding more likely, and makes it increasingly likely that the annual flooding exceeds previous flood totals. But it is still unlikely that the annual flooding exceeds the sum of all previous floods. This is slow randomness.

A rebellion makes future rebellions more likely. It makes it increasingly likely that the size of the rebellion exceeds the size of any previous rebellions. It even makes it possible for the size of the rebellion to exceed the total of all prior rebellions! This is wild randomness.
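This ladder can be made concrete with a small simulation: for each distribution, ask what share of the cumulative total the single largest draw accounts for. The Pareto with α = 0.5 as a stand-in for wild randomness is my choice for illustration — its mean is infinite, so one draw can rival the sum of all the others:

```python
import random

random.seed(7)

def max_share(draws):
    """Share of the cumulative total contributed by the single largest draw."""
    return max(draws) / sum(draws)

n = 1_000
exp_share = max_share([random.expovariate(1) for _ in range(n)])            # "murder"
logn_share = max_share([random.lognormvariate(-0.5, 1) for _ in range(n)])  # "flood"
wild_share = max_share([random.paretovariate(0.5) for _ in range(n)])       # "rebellion"

print(f"borderline mild: biggest event is {exp_share:.1%} of the total")
print(f"slow:            biggest event is {logn_share:.1%} of the total")
print(f"wild:            biggest event is {wild_share:.1%} of the total")
```

In the borderline mild and slow cases the largest single event remains a small fraction of the running total; in the wild case a single draw can dwarf everything that came before it — the Taiping pattern.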

Historians of China have long focused on rebellion, invasion and natural disaster, and they are right to do so. These are the types of events that matter as events and not just as indexes of underlying social structures.

Second, it means that we should understand rebellion as “random” to the same degree that we understand murder as “random.” It is a different state of randomness, but it is still random. The difference is this: a “random” murder may start a small cascade of murders; a “random” act of rebellion may start a cascade of rebellions of historically unprecedented size. But we cannot predict or explain the event that touches off the cascade in either case. In both cases, the event that starts the chain reaction is effectively random.

There are structures that make murder more likely, or that make rebellion more likely. But these structures are so prevalent that they cannot possibly explain the murder, or the rebellion. And I think that trying to explain murder or rebellion by recourse to the specifics of the event, the psychology of the perpetrators, etc. is a fool’s errand.

Let me try to explain this with an analogy.

We had to undergo emergency readiness training. Part of this training was on how to deal with depressed students for suicide-prevention purposes. Here is my (slightly sarcastic) summary of the training:

  1. 50% of students are depressed or anxious.
  2. 10% of students have had suicidal thoughts.
  3. Most depression, anxiety and suicidal thoughts are unreported.
  4. Students who attempt suicide are often those with a history of depression, anxiety and/or suicidal thoughts.

So a small but non-zero percentage of students attempt suicide. History of depression and suicidal thoughts is somewhat predictive of suicide attempts. But not all students with depression/suicidal thoughts attempt suicide, and some students who attempt suicide have no history of suicidal tendencies.

This means that at least 10% of the student body is considered high risk, and probably upwards of 50% of the student body is considered moderate risk. But only a small percentage of them will attempt suicide, and thankfully only a very small percentage of them will succeed.

This is nonetheless a serious problem. Can we make effective interventions that target half of the student population? Can we even make effective interventions that target 10% of them? I would argue that individual interventions are impossible at this level; we can only be effective by changing the structural dynamics.

The risk factors for murder, rebellion, etc. are similar to the risk factors for suicide, at least in terms of their prevalence. The Qing recognized that debts were the greatest risk factor for murder, and that conflicts over land and marriages were also major risk factors. And probably half of the population had some form of debt, and a non-trivial percentage was involved in active conflicts over debt, marriage, or land. Yet only a very small percentage of debtors committed murder.

Historians have noted that most rebellions were led by failed exam candidates, and were filled out with indebted or landless peasants. The populations of indebted peasants and failed scholars were both huge. Yet a vanishingly small proportion of them started rebellions…

Or did they?

Here is where rebellion is different. Under ordinary circumstances, very few failed scholars started rebellions. But once one failed scholar (perhaps named Hong Xiuquan) started a rebellion, other failed scholars became more likely to rebel — often by joining the rebel army of the first guy. And as the rebellion grew, it became more possible that it would grow further. This is how we get the Taiping Rebellion — a bigger conflict than the sum of all previous Qing conflicts. The chance of this rebellion breaking out from any individual case was vanishingly small, but once it did break out, the chance of it spreading grew, and continued to grow.

So I think it is borderline-useless to attribute the magnitude of the Taiping Rebellion to Hong Xiuquan or his immediate circumstances. There were hundreds of thousands of potential Hongs, most of whom decided that they were better off not rebelling, even though they were angry and poor and possibly insane. There were thousands of Hongs who did rebel, sort of, and became the leaders of bandit gangs or were quickly flushed out by Qing or gentry forces. The potential was present for each of these rebellions to grow, but the fact that some did and some did not was effectively random, or at least a convergence of events and tendencies external to the rebels.

That is not to say that we should ignore these kinds of events. They may be unpredictable and inexplicable in their specifics, but their potentiality is both predictable and subject to explanation. We should also pay particular attention to the outcomes of singular events, because they are generally the situations where we are most likely to see a shift in the structures of history. The best works on rebellion in China — Kuhn’s Rebellion and Its Enemies, Perry’s Rebels and Revolutionaries in North China, Ownby’s Brotherhoods and Secret Societies, Averill’s Revolution in the Highlands, Crossley’s The Wobbling Pivot — all understand this basic point. We should not be overly concerned with the proximate events leading up to rebellion, except for their value in adding a sense of drama and historicity. We should instead be concerned with the conditions that made rebellion possible, and the shift in social structures following rebellion and reconstruction.

I am essentially arguing for a more careful understanding of the role of “black swans” in history. I have very mixed feelings about Nassim Nicholas Taleb, perhaps worthy of another post. But Taleb’s work on black swans is very much an outgrowth of Mandelbrot’s work on states of randomness and fractals (in fact they worked together on a few projects), and Taleb’s understanding of the math is sound. As noted above, this is an understanding that historians have come to through intuition repeatedly. But I do think it is worth seeing what we can learn about both event and structure in history by thinking through randomness.

