Gaming the System: Playing with in-game algorithms

One distinction between casual and competitive games is that the latter have some mechanism for players to evaluate their skills against those of other players. Ordinarily, to incentivize this self-evaluation, which encourages additional practice and play, games present these comparative skills as objective scores or rankings. Formalizing subjective player skill as an objective, usually numerical classification has additional benefits: players can feel situated in a broader game community or economy when their skill is listed on a leaderboard, and players can be better matched with others of similar skill in a process called matchmaking.

A significant result of developing these ranking systems is that in-game ranks become a widely accepted, nearly commodified symbol of status. A player’s depicted rank becomes a representation and even a prediction of their subjective, immeasurable competitive performance. To an extent, the depicted rank can come to matter more than actual skill and performance: it is inseparable from a player’s in-game identity, a social status symbol in its own right.

These ranks have more social implications in some games than in others. For example, whereas in Call of Duty: Modern Warfare visible player ranking is determined largely by the number of games played, in Halo 3 visible ranks depend on the outcome of competitive games.* Both ranking systems succeed at approximating a player’s past performance, and, of course, neither is a perfect representation of something so subjective as player skill. However, in outcome-based ranking systems like that of Halo 3, ranks are regarded somewhat more highly. The seemingly arbitrary numbers that represent rank (1 through 50) are expressive signifiers of each player’s skills, dedication, and in-game status.

* Halo 3 matchmaking includes both Ranked and Social game types, both of which influence visible player rank; however, only Ranked games increase the numerical player Skill. Call of Duty: Modern Warfare also includes leaderboards for viewing one’s global performance ranking, but this ranking does not define a player’s visible rank.

When player ranking such as that in Halo 3 is determined by the outcome of competitive games against other players, as opposed to by counting games and actions played, some algorithm must be employed to determine how game outcome affects player skill. The design of this algorithm is important: if players are to care about their visible rank, the process for determining this rank must be legitimized in practice. It is also important for this algorithm to be robust: it should handle special cases in games that affect player outcome, so as to prevent cheating, feeling cheated, and gaming the system.

Introducing TrueSkill™

To this end, Microsoft Research (MSR) developed the TrueSkill™ Ranking System to best determine Halo 3 players’ rankings based on the outcome of competitive games. The impetus for its development, despite the availability of other ranking algorithms like Elo (used in chess and professional sports), was the need for a ranking system that could account for more than two players or teams, which is customary in Halo 3 games. However, this technical motivation is of little to no concern to Halo’s players; rather, what matters to players about this algorithm’s design is how it affects their ranking.

Since TrueSkill is responsible for determining player rankings in Halo 3 (hereafter called Skill) and players understandably care about their visible Skill, players gradually come to understand how TrueSkill works as they play. This understanding is highly subjective, and never complete: whereas casual players might only understand that Skill is determined by game outcome, more serious players have acknowledged that Skill reflects some additional factors. Both demographics acknowledge that Skill is determined by some hidden process, but depending on how much they value Skill as an important or useful measurement, they develop an understanding of the underlying algorithm with varying levels of scrutiny and depth.

Understanding the Algorithm

Coming to an understanding of how TrueSkill ranks and matches players is not necessarily deliberate; in fact, it is usually subconscious, or emerges as a result of shared player behaviors and practices, especially those that exploit the TrueSkill system. The in-game practice of boosting is a particularly good example of this, given its widespread acknowledgement by Halo players and its traceable underlying intuition. To arrive at this exploit, Halo 3 players realized that winning a game with a poorly ranked teammate made ranking up much faster (hence the name boosting). What was damaging about this insight, especially to the legitimacy of the TrueSkill system and its designers, was that players could repeatedly and deliberately leave matches to reduce their Skill, and then use their ruined Skill to boost their friends’ accounts, or the accounts of paying clients.

Generally, the algorithmic reasons why boosting works are not widely understood by the boosters themselves; the system is exploited simply because it can be. As a result, the design of the algorithm, as well as the failure to compensate for its flaws, affects players’ behavior without their even knowing that the algorithm is there, much less how it works.

Another instance of players developing a tacit understanding of TrueSkill is more nuanced and less harmful: by participating in a significant number of competitive matches, Halo players begin to develop an ‘intuition’ for the way that the hidden ranking system works. After all, the ranking system is critical to players’ expectations about a game’s difficulty before it begins (i.e., seeing the competitors’ Skills before the match), and about what will happen to their Skill depending on how they fare. This intuition is developed over time in an inductive fashion: players see that certain game outcomes have certain effects, both on their ranking and on the competitors they are matched with in following games.

To their ranking, the most obvious effect is that winning against players with much higher Skills is more likely to increase Skill, and vice versa. As an emergent result, when a player is matched against only players with higher Skills, they acknowledge the prospect of ranking up significantly but not ranking down—it is an important, ‘qualifying’ match. A more subtle effect is that the magnitude of the change in a player’s Skill slows down as they participate in more competitive games. All highly ranked players in Halo 3 are fully aware of this, and even if they are not sure why the slowing down occurs, they take it for granted as a fundamental component of the ranking process.
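Both intuitions can be reproduced with a small sketch of the commonly published two-player, no-draw form of the TrueSkill update. This is an illustration under stated assumptions, not Halo 3's actual implementation: the function name `update_1v1`, the value of `BETA`, and the example ratings are all chosen here for demonstration, and the sketch leaves out draws, teams, and the small amount of uncertainty the full system adds back between games.

```python
import math

BETA = 25.0 / 6.0  # assumed performance-noise scale; not a value taken from this essay


def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_1v1(winner, loser, beta=BETA):
    """Return updated (mu, sigma) pairs for the winner and loser of one game."""
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    # Total uncertainty in the comparison of the two performances.
    c = math.sqrt(2.0 * beta**2 + sigma_w**2 + sigma_l**2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)   # large when the result was a surprise
    w = v * (v + t)       # controls how much the uncertainty shrinks
    new_winner = (mu_w + sigma_w**2 / c * v,
                  sigma_w * math.sqrt(1.0 - sigma_w**2 / c**2 * w))
    new_loser = (mu_l - sigma_l**2 / c * v,
                 sigma_l * math.sqrt(1.0 - sigma_l**2 / c**2 * w))
    return new_winner, new_loser


# An underdog beating a favorite gains more than the favorite would have.
upset_gain = update_1v1((20.0, 3.0), (30.0, 3.0))[0][0] - 20.0
expected_gain = update_1v1((30.0, 3.0), (20.0, 3.0))[0][0] - 30.0

# The same win moves an uncertain player further than an established one.
uncertain_gain = update_1v1((25.0, 8.0), (25.0, 3.0))[0][0] - 25.0
established_gain = update_1v1((25.0, 2.0), (25.0, 3.0))[0][0] - 25.0

print(upset_gain, expected_gain)
print(uncertain_gain, established_gain)
```

In this sketch the size of every change is scaled by the player's own σ, and σ shrinks after each result, which is one way to account for the slowing down that experienced players take for granted.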

Exposing the Algorithm

In the wake of TrueSkill’s adoption into commercial games, MSR provided explanations of the ranking system’s underlying algorithms online. Why would they do so, when increased knowledge about the system’s hidden processes could lead players to discover additional exploits? Perhaps MSR knows that finding additional exploits is unlikely; still, there must exist some incentive for the lab to publish their work online. The purpose and audience of this documentation are suggested best by its format: there is not only a simplified overview and a detailed overview, but also an interactive tool.

The simplified overview gives a basic account of the two variables involved in calculating Skill. This is much in line with Facebook’s explanation of its retired EdgeRank algorithm: some variables are made central to the algorithm’s explanation, but the more technical details are excluded. This summary is sufficient for players who, for example, want to understand why their changes in Skill are slowing down over time: it is a result of the σ variable, the system’s uncertainty about a player’s Skill, which shrinks as the player completes more games. Taken as a whole, this document explains that there are some underlying processes responsible for determining Skill, which reassures players that TrueSkill is a calculated, well thought-out method of Skill determination. The overview legitimizes the algorithm.

What do players, as stakeholders in the ranking system’s use, care about in its implementation? For one, they want to know why they have the Skill that the system assigned them, but they also care whether the system is fair. The interactive tool works to justify and legitimize this fairness: it is a proof by interaction. Users may either create hypothetical players and games to experiment with how TrueSkill determines their ranks, or choose scenarios from a pre-defined list. A tool such as this is visibly developed out of a passionate interest in the quality and performance of the algorithm. Yet it also works in the interest of the TrueSkill system directly, demonstrating that the system is usable, functional, and legitimate.

TrueSkill’s detailed overview is the most revealing explanation, providing enough information that an interested developer could reimplement the system themselves. Its discursive explanation is little different from that of the simplified overview, but it does provide more figures, formulas, and examples that can be used to piece together how TrueSkill works. One of these descriptions provides the mathematical assumption that makes boosting possible: “the team’s skill is assumed to be the sum of the skills of the players.” The limitation of this decision, and the fact that boosting occurs in-game for this reason, is not mentioned.
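The consequence of the summing assumption can be sketched in a few lines. The snippet below is a hypothetical illustration rather than MSR's code: `team_win_probability`, the `BETA` constant, and every rating in the example are assumptions made here. It estimates how likely the system would consider a team's win when each team's skill is taken as the plain sum of its members' skills.

```python
import math

BETA = 25.0 / 6.0  # assumed performance-noise scale; not a value taken from this essay


def team_win_probability(team_a, team_b, beta=BETA):
    """Estimated chance that team_a beats team_b; each player is a (mu, sigma) pair."""
    # Team skill is modeled as the plain sum of its members' skills.
    delta_mu = sum(mu for mu, _ in team_a) - sum(mu for mu, _ in team_b)
    # Every player on both teams contributes skill uncertainty and performance noise.
    variance = sum(sigma**2 + beta**2 for _, sigma in team_a + team_b)
    # Normal CDF of delta_mu / sqrt(variance), written with erf.
    return 0.5 * (1.0 + math.erf(delta_mu / math.sqrt(2.0 * variance)))


opponents = [(20.0, 2.0), (20.0, 2.0)]

# A strong player rated honestly, teamed with the account being boosted.
honest_team = [(40.0, 2.0), (20.0, 2.0)]
# The same strong player after deliberately ruining their rating.
tanked_team = [(5.0, 2.0), (20.0, 2.0)]

p_honest = team_win_probability(honest_team, opponents)
p_tanked = team_win_probability(tanked_team, opponents)
print(p_honest, p_tanked)
```

With the tanked rating the team's summed skill falls below that of its opponents, so a win that is routine for the booster registers as an upset, and upsets are what move ratings the most.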

The detailed overview also provides information about how Skill is calculated, and made into a visible number, with a “conservative skill estimate.” A player’s skill (lowercase) is not an exact quantitative number; rather, it is a probabilistic belief about what the player’s skill might be, based on their prior game history and associated variables like σ. As skill is uncertain (the extent of which is denoted by σ), the best that TrueSkill can do is to be “conservative” and report a Skill at the low end of what is plausible: three standard deviations (3σ) below the player’s most likely skill.
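As a minimal sketch, the conservative estimate is the mean of the belief minus a multiple of its uncertainty. The function name `conservative_skill` and the example values below are assumptions for illustration; the starting values of 25 and 25/3 are commonly cited defaults rather than figures from this essay, and how Halo 3 maps the result onto its 1 through 50 scale is not covered here.

```python
def conservative_skill(mu, sigma, k=3.0):
    """Lower bound on skill: the belief's mean minus k times its uncertainty."""
    return mu - k * sigma


# A brand-new player: average belief, very high uncertainty.
newcomer = conservative_skill(25.0, 25.0 / 3.0)

# A veteran with the same mean but a long match history behind it.
veteran = conservative_skill(25.0, 1.0)

print(newcomer, veteran)
```

Two players with the same most likely skill can therefore display very different numbers: the newcomer's estimate sits near the bottom of the scale until enough games have reduced σ.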

Knowing the Algorithm

As we have identified, MSR has chosen to expose the inner workings of the TrueSkill™ Ranking System for more reasons than convenience, and their explanations are not only thorough but thoughtful, extending even to an interactive tool. The nature of this documentation offers some clues as to why it was published, but we cannot exhaustively list these reasons. One thing we do know, however, is that the online presentation of TrueSkill accompanies the understanding garnered by its players in practice, but cannot supplant it. To ‘know’ how ranking up happens by playing Halo 3 is very different from ‘knowing’ how the underlying algorithm works from a machine learning perspective, even if both kinds of understanding yield the same conclusions.

Unfortunately, the designers of algorithms are unable to describe their work in terms of the nuanced, tacit, and experiential knowledge of algorithm users. But even if they could, would they want to? If MSR were to acknowledge that player interpretations of TrueSkill affect the way that players participate in (or exploit) Halo 3, it would undermine or complicate the legitimacy of the algorithm. Rather, it is best to describe the ranking system in scientific terms: to delineate each of its variables and processes objectively, and to provide examples of use that demonstrate the system’s robustness. TrueSkill is ‘good’ for this reason: it is well thought-out, handles potential edge cases, and is even communicable to a wide audience.

MSR means no harm in developing and carefully explaining TrueSkill; after all, the worst foreseeable outcome of developing the system for use in games is players feeling cheated by its mathematical reasoning, or the system itself being exploited. But player intuition should not be forgotten. As soon as the ranking system is placed in the dynamic ecosystem of an environment, even one as rule-based as a game, its reception and use change in unanticipated ways. For now, TrueSkill is more than MSR can describe.