A Game that we cannot “Game,” or Algorithms in RL

Ian Matthew Miller
Published in Roots and Branches
Jan 7, 2016 · 12 min read

There is growing concern about the problems that “Big Data” poses to free, democratic society, and rightly so.

There are all kinds of problems caused by bad data, and by the misuse of data. Biases in the data, measuring the “wrong things,” lack of granularity, lack of statistical significance, small effect sizes, lack of access to data for key stakeholders, reification of statistical categories, “juicing the numbers,” all of these can lead to incorrect conclusions and harmful interventions.

To give a few chilling examples:

Data bias: It is well known that there are huge racial disparities in “stop-and-frisk” policing. Simply put, police stop people of color with much greater frequency than they stop whites. More chillingly, these policies perpetuate racial disparities in policing by generating more data about crimes perpetrated by people of color. Commentators (generally from the right) justify the frequent stops of blacks and hispanics by citing higher rates of criminal behavior among these populations. But the data showing higher rates of criminal behavior among blacks and hispanics were themselves generated by biased police work.
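
To make that feedback loop concrete, here is a toy sketch in Python. All of the numbers are invented for illustration; this is not real policing data, just a demonstration of how allocating stops according to past arrest records reproduces an initial disparity even when the two groups behave identically.

```python
# Toy model of a data feedback loop: each year's stops are allocated in
# proportion to past recorded arrests, so an initial disparity in policing
# persists in the "data" even though underlying behavior is identical.
# All numbers are invented.
true_offense_rate = {"A": 0.05, "B": 0.05}   # identical underlying behavior
recorded_arrests = {"A": 100, "B": 300}      # initial disparity from biased stops
stops_per_year = 10_000

for year in range(1, 6):
    total = sum(recorded_arrests.values())
    shares = {g: n / total for g, n in recorded_arrests.items()}
    for group, share in shares.items():
        stops = stops_per_year * share                       # "data-driven" stops
        recorded_arrests[group] += stops * true_offense_rate[group]
    print(year, {g: round(n) for g, n in recorded_arrests.items()})
    # Group B's share of recorded arrests stays at 75% forever, despite
    # identical offense rates, because the data keeps justifying the stops.
```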

Reification of statistical categories: The Rwandan civil war was largely an ethnic conflict between Hutus and Tutsis. Yet “Hutu” and “Tutsi” were themselves ethnic categories created by colonial powers based on unproven theories of racial origin. In 1935, Belgium issued identity cards marking each individual with an ethnicity. This effectively created ethnic identities that had not formerly existed, at least not in such a discrete and fixed form. Yet these identities, rooted in census categories, were at the heart of a conflict that killed approximately 800,000 people.

“Juicing the numbers”: During the Great Leap Forward in China, local officials were rewarded for meeting ever-higher quotas for grain production. Because rewards were parceled out based on reported figures, a culture of artificially inflating grain production numbers took hold. When the central government actually collected grain in accordance with the inflated figures, entire regions were left with massive food shortages. The result: about 30 million people died of famine during three years of relatively normal grain production (i.e. not affected by major natural disasters) as the central government exported “surplus” grain to Eastern Europe.

This should make the point that numbers matter. Improper use of statistics, especially statistics based on flawed data, can cause major problems, even war and famine.

Even in more mundane circumstances, the problem with bad data comes down to the general principle that “what gets measured gets done” (generally because “what gets measured gets rewarded”).

Let’s take dieting as an example. For generations most diets have revolved around calories, not because calorie-counts are a particularly good index of successful diets, but because calories are easy to count.

But wait, should you be on a diet anyway?

For a decade or so, Body Mass Index (BMI) has been a sine qua non of preventative medicine, on the principle that people with high BMI (that is, people who are heavy for their height) are “too fat” and need to lose weight or face higher rates of morbidity and mortality than the “normal weight” population. This is despite problems with BMI itself (it is a very crude figure that groups both “muscular” and “fat” people as “overweight”), and despite growing evidence that BMI is about as worthless a figure for predicting health outcomes as…well…calorie counts are for predicting weight loss. But again, BMI is easy to measure.
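
For reference, the formula itself is trivially simple, which is part of its appeal and part of the problem. A minimal sketch (the example person is invented):

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2

# Two invented people, one muscular and one sedentary, both 1.85 m and 90 kg:
# the formula gives each a BMI of about 26.3 ("overweight"), because it knows
# nothing about body composition, only height and weight.
print(round(bmi(90, 1.85), 1))  # 26.3
```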

In short, weight management still largely centers around two figures that are easy to measure, but that have little correlation with the actual desired outcomes: calories do not work very well to predict weight loss (or weight gain), and weight (or BMI) is not a very good predictor of health. But what gets measured gets done.

Or take basketball: Until about ten years ago, the best basketball players were considered to be those who produced the most points, rebounds, assists, etc. Players who produced lots of points (etc.) were therefore the highest paid (“what gets measured gets rewarded”). The recent “metrics” revolution has turned much of that on its head, showing that some players who produce tons of points do not actually win very often, and that others who do not put up “big stats” according to traditional measures tend to win a lot. In the past few years, players have started being paid based on new statistics that are supposed to be better measures of the actual desired outcome — winning. (Note that scoring does matter to winning, so some players produce lots of points AND tend to win often. The big shift is starting to be seen in rising pay for players who don’t put up “big stats” but who nonetheless contribute to many wins, and declining pay for players who do put up “big stats” in generally losing efforts.)

This seems to be an example where transitioning to statistics that better measure the desired outcome (winning) as opposed to just reporting something easy to measure (points) has revolutionized an industry. This is a big win for Big Data — by collecting and analyzing lots of data, statisticians have been able to generate complex statistics that better predict outcomes than the older easy-to-measure statistics like points and rebounds. It is especially noteworthy that the first advances in basketball metrics came not from new statistics, but from advances in the ways of using the old easy-to-measure statistics (points, rebounds, etc). In other words, both the quality and the interpretation of data matters.

Statistics can be useful, but they are only as useful as they are good at measuring what they claim to measure. Effective use of good data can lead to useful interventions (paying the players who most contribute to winning, even if they don’t score much). Bad data tends to lead to ineffective interventions (counting calories to try to lose weight). And misuse of bad data can lead to disasters (genocide, famine).

For a long time, I thought the issues with “Big Data” were mostly…well…data issues like these. But I recently realized that Big Data brings a whole host of other potential problems. Because Big Data is not just about more data. Big Data is about so much data that humans cannot process it all. So to process all this data we rely on algorithms. And I’m starting to worry that bad algorithms are a bigger problem than bad data. If we reconsider the examples I gave above, each one is a case of bad data exacerbated by bad interpretation. It was not just inflated harvest figures that created the Great Leap Forward famine; it also took bureaucrats willing to take these data at face value, and to use them to advance flawed industrial policy.

But this becomes especially worrisome when algorithms are used to process the data, especially when the algorithms are secret. It essentially comes down to this: when people process bad data, there is still the chance that they exercise human judgement, potentially avoiding the worst outcomes. There is even the chance that they find ways to make flawed statistics useful, as in the basketball example. Nate Silver is great at this. I have my doubts about how often this actually happens, especially given how bad most people are at understanding statistics, but there is the possibility. But if you have algorithms (even good ones) processing bad data, you lose the possibility for human intervention. And if you have bad algorithms processing data (even good data)… And if you have secret, bad algorithms…

If you want to really understand the problems posed by the misuse of secret algorithms, I suggest reading or watching a preview of Cathy O’Neil’s forthcoming work on “Weapons of Math Destruction,” which is what started me thinking about this. O’Neil’s work discusses the chilling effects that algorithms are bringing to the real world, where secret, inaccessible, poorly written programs increasingly determine things like access to financial products (insurance, loans), healthcare, criminal justice (policing and sentencing), and work (promotion, salaries, hiring, firing).

I don’t have the expertise or the research to make that case with data from RL (real life). Instead, I want to talk about bad algorithms by talking about a part of the world that is already run exclusively by algorithms: video games.

I used to game a lot, then I gamed a little, and now I basically don’t game at all. There are multiple reasons for this, including having a child and a full-time job. But part of the reason I gave up games is that I didn’t like them very much any more. I came to dislike many games through instances where the “cracks” were showing — that is, cases where the game did not behave the way it “should” have behaved given the story it was trying to tell or the scenario it was presenting. These are essentially cases where the actual “rules” of the game (those coded into its software) differ markedly from the superficial “rules” of the game (those presented by its scenario).

I’ll give a few examples.

For a while, I played Europa Universalis III compulsively. I am (or was) generally a fan of “empire-building” games, and EU3 is an especially complex example of this genre. In effect, you can choose to play almost any state at any time between about 1400 and the early 1800s, and play through a relatively historically-accurate scenario. There is no “winning” the game; you get to set your own goals. But most people play by choosing a European state like Castile or England, and trying to create a colonial empire outside of Europe while fighting in a never-ending string of European wars. Not everyone’s cup of tea, but certainly my cup of coffee.

From the get-go I had a few issues with the historicity of the game, in particular the treatment of non-European states (for example, the provincial divisions of China and their resource allocations are almost nonsensical). But as long as I stuck to playing as a European state, I found the game pretty fun. For reasons not entirely clear, I am totally obsessed with the Lotharingian Inheritance. So I particularly enjoyed playing as Burgundy and trying to create an alternative history where Charles the Bold did not die in a ridiculous battle against the Swiss at Nancy, but lived to maintain his regional powerhouse in the Low Countries; and where his successors went on to colonize Canada.

I also disliked the way the game followed the teleologies of modern nations, tending to force the creation of a unified France, Germany, Italy, etc. along fixed lines, even though the modern boundaries of these states were highly contingent on a whole slew of historical accidents. This would force me to turn my growing Duchy of Burgundy into either “France” or “the Netherlands” at a certain point in the game, swamping my vision of a world where “the sun never sets on the Burgundian Empire.” Nonetheless, EU3 embedded tons of historical details and, as long as I stuck to playing a European state, allowed me to play out various semi-convincing alternative histories.

Then I came up against a bad algorithm. The game effectively does not allow junior partners in wartime alliances to formalize their territorial gains at peace negotiations. No matter what. If this sounds really obscure and uninteresting, please skip the next two paragraphs:

<obscure specifics>Note that the game defines “senior partner” as the party that started the war, and “junior partner” as any other state that joined the alliance once the war was already underway (a hidden rule that I had to figure out from the way the game worked). I can see why the game was designed this way: to prevent minor powers (the Duchy of Hainault) from jumping into alliances with major powers (France) and gaining territories that they had no role in conquering. But this algorithm leads to bad outcomes — outcomes that affect both historicity and gameplay. For example, England starts a war with France and asks me (the Duke of Burgundy) to join. England spends the entire war doing very little, perhaps fighting a few naval battles in the English Channel and occupying Picardie. In the meantime, I have fought a dozen massive battles in eastern France, captured four provinces (Limousin, Champagne, Metz, Valenciennes) and, having laid siege to Paris, brought the French to the negotiating table. The game will have France cede territory (Picardie) to England (the “senior” partner who did next to nothing), ending the war and giving me (the “junior partner” who did almost everything) nothing.

I opened a saved game and tried again. And again. I tried opening peace negotiations earlier in the war, to force France to cede me territory before they settled with England. They would unconditionally refuse, until the hidden algorithm determined a “breaking point” in the warfare, at which point — before I had a chance to act — England got Picardie and Burgundy got nothing. Again, and again, and again.</obscure specifics>
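
Stripped of the game’s complexity, the hidden rule seems to boil down to something like the sketch below. This is my guess at the logic, not the actual Europa Universalis III code, and all of the names and data structures are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class War:
    leader: str      # the "senior partner" who declared the war
    defender: str
    occupations: dict = field(default_factory=dict)  # province -> occupying state

def settle_peace(war: War) -> dict:
    """Guess at the hidden rule: only the war leader may keep conquered
    provinces; everything a junior partner captured reverts to the defender."""
    return {
        province: (occupier if occupier == war.leader else war.defender)
        for province, occupier in war.occupations.items()
    }

# England declared the war; Burgundy did most of the fighting.
war = War("England", "France",
          {"Picardie": "England", "Limousin": "Burgundy", "Champagne": "Burgundy"})
print(settle_peace(war))
# {'Picardie': 'England', 'Limousin': 'France', 'Champagne': 'France'}
```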

To summarize, by embedding a set of “rules” into the software that violated my sense of the actual “rules” of history, and the contextual “rules” of the game scenario, a bad algorithm broke the immersive experience. It broke both the historicity and the gameplay. I tried different scenarios, and again the problem would pop up. I stopped playing.

The lack of historicity with China (my area of specialty) was not enough to stop me from playing the game. The teleologies that forced Burgundy to become either “the Netherlands” or “France” were not enough to stop me from playing the game. The encoded rules that violated my sense of the “true rules” of the scenario were what made me stop playing.

It is not just Europa Universalis. Whenever I encounter a lack of consistency, fairness, or balance, that — and not issues with storytelling, historicity, etc. — is what makes me quit playing (that or my work and family life).

I should note that there are people who make a game out of finding exactly these types of faults and exploiting them. I have recently become fascinated with people trying to “break” Fallout 4. One player managed to beat the game without killing anyone, in a game that really wants you to kill people, leading to all kinds of weirdness. Another guy killed every single character in Fallout 3, the previous game in the series that — by contrast — emphasized non-violent solutions. Again causing all kinds of weirdness. Another player beat what is generally a 40+ hour game in just over an hour by exploiting places where the game “breaks.”

All of these are examples of people finding ways to reveal the cracks in these games, to reveal the actual rules of the game. Once they throw out the superficial “rules” that the game seems to play by according to the scenario it presents, and find the underlying “rules” that are coded into the games’ actual software, they can play an entirely different game. So there are ways around “bad algorithms” — to exploit them, to use them to do something new and creative. To “game” the game.

Let’s return to real life. There are bad algorithms here as well. Lots of them. Secret ones. In some cases, people have been able to reverse engineer aspects of these algorithms and exploit them. Northeastern notoriously did this with the college-ranking algorithm. This type of “gaming” the rankings seems like a bad thing. Initially, it made me angry at institutions like Northeastern. But remember the principles I cited above: “what gets measured gets done” and “what gets measured gets rewarded.” The underlying problem is not the fact that Northeastern (and, let’s be frank, most colleges) game the rankings, but the fact that the rankings are based on a bad algorithm. And that bad algorithm is running on bad data.

The thing is that finding exploits, the places where the underlying algorithms show through, takes a lot of time and a lot of repetition. The “kill everything” play-through that revealed interesting things about Fallout 3 took seven months. Finding exploits in these games also takes a lot of “lives.” The “no-kills” play-through of Fallout 4 took dozens, hundreds, even thousands of in-game “deaths” to figure out the necessary work-arounds.

Sometimes we can “put in the reps” in RL as well. We can apply to dozens or hundreds of jobs, and see what tends to get us interviews. We can learn from other people’s job application experiences. But there are cases where we don’t really get a “do-over.” When there is a bad algorithm that affects our chances of incarceration, that affects our healthcare or our pay, we don’t have the possibility of loading a saved game. We cannot live another “life” to figure out how to avoid being racially profiled and thrown in jail, or to avoid having our benefits cut.

This is how the world works now — increasingly governed by bad algorithms. Often bad, secret algorithms. Often bad, secret algorithms that use bad data — biased, reified data that doesn’t actually correlate with the desired outcomes — because it is what is available. Often bad, secret algorithms that we cannot “game” because we don’t have multiple lives and autosaves.

I have come to a new conclusion: bad data is bad; bad algorithms are worse.

And a corollary: as bad as it is that some people and institutions are hacking some of the algorithms that govern RL, it is even worse that there are many algorithms we cannot hack. Not only is real life being turned into a game, it is being turned into a game that we cannot “game.”
