How does a disruptive event change emergent clusters?

--

If one has trained algorithmic models deployed, and one knows that a disruptive event, E, is coming, how will that event require the deployed models to change? Similarly, if a system’s current formulation (e.g. the parameters in a recommendation engine) has allowed emergent communities to form that depend on that system, how does an emergent community respond to fundamental changes in its environment? To study this question in a data-rich environment, this work pulls data from the Massively Multiplayer Online First Person Shooter videogame Destiny 2, examines how players interact with the game by building player-weapon graphs, and applies clustering and community detection techniques to those graphs to determine playstyle clusters. The key idea of this work is that if the player base plays the game in styles A, B, and C, then that should be reflected in the data players have generated. Using this insight, and clustering techniques to group similar players, we can ask: does a change in how the game itself functions change how people interact with the game, resulting in, for example, new playstyle clusters? Does the player base engage with the game differently after the game was changed?

Studying the impact of large-scale shocks on dependent trained models is a narrow problem: it requires not only a dataset that spans a time of great change, but also models trained both before and after the shock. Furthermore, many modern learning algorithms require many samples to extrapolate a latent pattern, so we need a lot of data on both sides of the shock. Stacking these requirements means there are not many publicly available datasets that meet the criteria. Therefore, part of the contribution of this work is the creation of one such dataset.

How did Destiny 2 change in Beyond Light?

As I note here:

On November 10th, 2020 Bungie launched an expansion for their game Destiny 2 entitled Beyond Light. In addition to new story content, the large yearly expansions often come with dramatic tuning of the game world via changes to the weapon systems, abilities, and/or new game modes. Beyond Light follows that trend by not only introducing a new element — Stasis — into Destiny (a first for the franchise since it launched in 2014), but also making drastic changes to the weapon system.

These weapon changes include removing an entire weapon subclass (a first for the game), changing damage amounts, slowing down handling animations (e.g. how long it takes to switch guns, which massively changes how one can play), and making the vast majority of the game’s prior loot unusable in high-level activities. On top of all this, Bungie introduced new abilities that change how players optimally duel one another (e.g. running at me with a shotgun is now much more dangerous for the shotgunner, since the game introduced freezing that can stop aggression easily). All of this taken together results in a game that, at least anecdotally, plays very differently than it did before November 10th.

Related Work

When machine learning models are trained on, say, MNIST data to recognize digits, they implicitly assume that every digit the model will see in the future comes from the same distribution as the digits in MNIST. That assumption is easily violated outside of laboratory conditions, and observing changes in one’s environment is commonplace in the technology industry. For example, any time Google wants to change its search engine, it needs to make sure the change does not break systems that depend on it (e.g. models, like GPT-3, trained on data gathered from the existing web). This degraded ability comes about because the i.i.d. requirement [3] that most trained models rely on has been violated. In other words, if you train on a distribution of data, you expect to also test and operate on that distribution. If a model instead sees out-of-distribution data, it should not be surprising that it fails to operate as trained.
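As a toy illustration of this failure mode (not part of the original analysis), the sketch below trains a simple classifier on scikit-learn’s digits dataset and then evaluates it on rotated copies of the same digits. The rotation is a stand-in for an environmental shift; the accuracy drop is the point:

```python
from scipy.ndimage import rotate
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.images, digits.target, random_state=0)

clf = LogisticRegression(max_iter=5000)
clf.fit(X_train.reshape(len(X_train), -1), y_train)

# in-distribution performance: high accuracy
print("clean test:", clf.score(X_test.reshape(len(X_test), -1), y_test))

# out-of-distribution: rotate every test image by 45 degrees
rotated = rotate(X_test, 45, axes=(1, 2), reshape=False)
print("shifted test:", clf.score(rotated.reshape(len(rotated), -1), y_test))
```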

This is an important question from the perspective of researchers studying, for example, radicalization in the age of the internet. If researchers have their data and analysis pipelines set up to track and capture YouTube data [2], and YouTube changes its underlying recommendation engine, months of research studying how people get indoctrinated (which is a form of community formation, albeit a dark one) can be invalidated.

Another example of research around large-scale systemic shocks comes from Sakaki, Okazaki, and Matsuo (2010). They built a system to detect earthquakes in Japan (unpredictable events by nature) by detecting and localizing (i.e. finding the epicenter of) disruptive changes on Twitter. At a broad level, their work can be abstracted as asking: what happens to our system (e.g. Twitter) in the aftermath of a disruptive event (one that changes how people use the platform), and can we algorithmically determine what type of event the disruption was (e.g. a disaster, an election, the shutdown of Twitter itself) and where it originated?

Data

The data for this project was pulled via PyDest (think Tweepy, but for Destiny 2) from public-facing Bungie servers using snowball sampling (start with an initial subject and have them point you toward other eligible people). Specifically, I scraped data from the Player vs Player (PvP) competitive arena of the game. For the pre Beyond Light period, creating a large dataset was simple: I extracted the 250 most recent competitive PvP match results — Post Game Carnage Reports (PGCRs) — for each unique player I came across (excluding duplicate matches). As the player population numbers in the millions, this sampling could have continued virtually indefinitely, but I stopped after 22,617 players. Naturally, this sampling method is biased; because of its “word-of-mouth” character, it is not a random sample of the population. It is, however, reflective of Bungie’s matchmaking (i.e. player recommendation) algorithm, which builds matches by selecting players for their respective teams one by one.
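A minimal sketch of the snowball loop is below. `fetch_recent_pgcrs` is a hypothetical stand-in for the underlying PyDest/Bungie API calls (the real calls are asynchronous and rate-limited), and the JSON key paths only approximate the shape of Bungie’s PGCR payloads:

```python
from collections import deque

def fetch_recent_pgcrs(player_id, limit=250):
    """Hypothetical wrapper around PyDest: return up to `limit` of a
    player's most recent competitive PGCRs as dicts."""
    raise NotImplementedError  # stand-in for the real API calls

def snowball(seed_player_id, max_players=22617):
    seen_players, seen_matches = {seed_player_id}, set()
    frontier = deque([seed_player_id])
    dataset = {}
    while frontier and len(seen_players) <= max_players:
        player = frontier.popleft()
        for pgcr in fetch_recent_pgcrs(player):
            match_id = pgcr["activityDetails"]["instanceId"]  # approximate key path
            if match_id in seen_matches:
                continue  # exclude duplicate matches
            seen_matches.add(match_id)
            dataset[match_id] = pgcr
            # everyone who appears in the match becomes a new lead to follow
            for entry in pgcr["entries"]:
                pid = entry["player"]["membershipId"]  # approximate key path
                if pid not in seen_players:
                    seen_players.add(pid)
                    frontier.append(pid)
    return dataset
```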

Post Game Carnage Report from a competitive Destiny 2 match. This (player-facing) report shows basic information such as the number of kills someone got while their team held a zone advantage. When this same PGCR is pulled from Bungie’s servers, it contains much more detail than shown here.

I saved this data into a large JSON/dictionary object indexed by matchIDs and playerIDs. This information can also be flattened into CSV form to facilitate many different research questions (e.g. skill prediction, matchmaking, engagement, measuring team cohesion and its effect on play). From there, I reformatted the data into the underlying graph this work goes on to analyze. See more here.
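A sketch of the flattening step, assuming the nested matchID → playerID → stats layout described above (the stat names here are illustrative):

```python
import pandas as pd

# hypothetical nested layout: {matchID: {playerID: {stat_name: value}}}
nested = {
    "match_001": {"player_a": {"kills": 21, "deaths": 9},
                  "player_b": {"kills": 14, "deaths": 12}},
}

# one CSV row per (match, player) pair
rows = [
    {"matchID": match_id, "playerID": player_id, **stats}
    for match_id, players in nested.items()
    for player_id, stats in players.items()
]
pd.DataFrame(rows).to_csv("matches_flat.csv", index=False)
```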

[That d]ifferent classes and weapon types exist in Destiny implies that there exist different ways of playing the game. Playing the game with shotguns results in vastly different play patterns than using a sniper rifle. Therefore, we’re going to try to tease [latent playstyle structure] out of the graph.

The underlying graph captures the interaction of players with different weapon types: edge weights increase according to how much a player uses a weapon type. This graph can be seen below. For a less crowded version, see here, where a reduced form of the graph directly below lives. For a full breakdown of node abbreviations, see here.

Pre Beyond Light player (pink) weapon (blue) graph. This is bipartite. Unfortunately, this layout algorithm is not great for the readability of node names.
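A minimal sketch of how such a weighted bipartite graph can be assembled with python-igraph. The `usage` counts here are made-up stand-ins for the per-match weapon tallies pulled from the PGCRs:

```python
from collections import Counter
import igraph as ig

# made-up counts: (playerID, weapon type) -> times used across collected matches
usage = Counter({
    ("player_a", "HandCannon"): 12, ("player_a", "SniperRifle"): 5,
    ("player_b", "Shotgun"): 9, ("player_b", "HandCannon"): 3,
})

players = sorted({p for p, _ in usage})
weapons = sorted({w for _, w in usage})

g = ig.Graph()
g.add_vertices(players + weapons)          # vertex names = IDs / weapon types
g.vs["type"] = [False] * len(players) + [True] * len(weapons)  # bipartite flag

edges = list(usage)                        # (player, weapon) name pairs
g.add_edges(edges)                         # igraph resolves vertices by name
g.es["weight"] = [usage[e] for e in edges]
```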

An additional graph with the same high-level (player-weapon) structure was collected from Destiny 2 after the launch of Beyond Light. Since there was limited time between launch and data collection, the snowball sampling had to collect fewer games per person and profile more people in order to build a dataset equal in size to the pre Beyond Light data. See the limitations section for more details.

Post Beyond Light player (pink) weapon (blue) usage graph.

Notably, players interact with weapons, not with other players, so the interaction graph of players and weapons is bipartite. (One could certainly also build in player-player interactions, e.g. player A killed player B, or player C was on a team with player D.) We have two types of nodes:

Weapon nodes and player nodes. A player node is the player’s unique ID; a weapon node is a weapon type that players used in the collected matches.

Statistics from the player-weapon graph above revealed that the median number of edges per player was 4 (min: 0, max: 16), and that weapon nodes such as Hand Cannons and Pulse Rifles were central in the graph. A median of 4 edges tells us that the median player uses 4 unique weapon/ability types per match. I therefore posit that someone's playstyle can be approximated by the weapons they choose to use, and that we can identify such combinations by running clustering methods on the player-weapon graph.
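Continuing the igraph sketch above, the degree statistics and a simple weighted-degree (“strength”) centrality for weapon nodes can be pulled out like this (a sketch, not the notebook’s exact centrality measure):

```python
import statistics

# degree distribution over player nodes only
player_degrees = [v.degree() for v in g.vs if not v["type"]]
print("median:", statistics.median(player_degrees),
      "min:", min(player_degrees), "max:", max(player_degrees))

# weighted degree ("strength") as a rough centrality score per weapon node
weapon_nodes = [v for v in g.vs if v["type"]]
strengths = {v["name"]: g.strength(v.index, weights="weight") for v in weapon_nodes}
for name, s in sorted(strengths.items(), key=lambda kv: -kv[1]):
    print(name, s)
```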

Community Detection/Clustering

Determining playstyle clusters therefore amounts to a clustering/community detection problem on the player-weapon graph, and I used the structure of the graph as my way of searching for the latent playstyle feature. Specifically, I use an algorithm known as `leading_eigenvector`, which is a “recursive, divisive algorithm. Each split [of the graph] is done by maximizing the modularity regarding the original network” [5]; the algorithm stops at a predefined number of clusters or when modularity, Q, is maximized. This algorithm therefore does not need the number of clusters defined beforehand and can infer it from the data (or a value can be provided if the structure is known a priori).
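Running python-igraph’s implementation on the graph sketched earlier looks like the following (a sketch; the original notebook’s exact invocation lives in the linked code [1]):

```python
# leading-eigenvector community detection; cluster count inferred from the data
clusters = g.community_leading_eigenvector(weights="weight")
print(f"{len(clusters)} clusters, modularity Q = {clusters.modularity:.3f}")

# report which weapon nodes landed in each cluster, mirroring the figures below
for i, members in enumerate(clusters):
    weapons_in = [g.vs[m]["name"] for m in members if g.vs[m]["type"]]
    print(f"cluster {i} (weapon nodes):", weapons_in)
```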

Weapon nodes clustered together. Pre Beyond Light.

Pre Beyond Light saw 8 unique clusters of the different weapon nodes. These clusters are discussed here in detail. But briefly:

It’s difficult to validate the above clustering since we’re looking at latent structure. But from an expert’s point of view, at least three of these clusters make sense … Hand Cannons (HC) and Sniper Rifles (SR) … Linear Fusion Rifles (LFR) [being alone] … Grenades [being alone].

For the first few days of Beyond Light, the clustering methods returned partitions that were not very informative: only two clusters were found in the data. Our original methods were therefore not providing the same caliber of information in the aftermath of the changes Beyond Light brought to the game. However, it is not clear whether this is a failing of the algorithm itself or simply that players were experimenting with more weapons right after the expansion launched; drastically changing the weapon system was one of Bungie’s desired changes with this expansion.

Note that there are only two clusters here. Furthermore, something odd is happening: only 16 of the 19 possible weapon classes were observed. This is likely an artifact of my snowball sampling not being a truly random sample of the player population.

Here again we see only two clusters. Day 3 also shows only two clusters, but on day 4 new clusters start emerging from the player base, and the algorithms return to a state similar to before the expansion.

This initially static result makes sense: in the first few days of a drastically changed game (after the large shocking event), people are often experimenting with their new guns and abilities and trying out new things. It stands to reason that no one has specialized in anything yet, so clusters would not be reflected in the data. So, to answer the titular question of whether players are playing the game differently, it certainly seems that way in the first few days after the change, based on how our algorithms’ outputs changed. By Day 4, however, 6 unique clusters start to emerge as the “meta” settles in the community (and people have unlocked all their new toys). Furthermore, if we take a holistic look from Beyond Light’s launch to the present (as of data collection), we see 6 clusters. Of these, the brown cluster makes sense, as Shotguns and Melee are close-range weapons, with Pulse Rifles covering the mid-range. Fusion Rifles being in a class of their own makes sense, as no other gun plays like them; similarly, Bows being alone makes sense. Finally, the large teal cluster would seem to be a general non-specialist class.

Weapon nodes clustered together. Post Beyond Light (all days).

This post analyzes how players changed how they play during this time by looking at how the centrality of weapon nodes shifts over time. Of note, the daily breakdown of centrality supports the hypothesis that on Day 3 (Nov 12th) people were starting to finish earning their new subclass, which is why SuperKills spiked. For a full analysis of the centrality, see the previous post.

Weapon node centrality over the first 4 days of Beyond Light.

In an attempt to determine whether these clusters needed additional player data to be more informative, I added descriptive player data to each node: efficiency (kill/death/assist ratio), combatRating (a skill metric), averageKillDistance, averageLifespan, and averageScorePerKill. These were selected because they all seemed like good discriminative statistics for articulating different types of players. For example, a player who uses shotguns should have a lower averageKillDistance than a player who uses sniper rifles.

However, when I reran my modularity-based clustering algorithms with this new node data added to each player, none of the results changed. In hindsight, this makes sense, since modularity-based methods only look at the edge structure of the graph. Therefore, I took that data and fed it into an unsupervised clustering algorithm called Density-Based Spatial Clustering of Applications with Noise (DBSCAN), along with a t-SNE embedding for a 2D visualization. DBSCAN does not require the user to specify the number of clusters beforehand, so it makes for a decent clustering algorithm when you lack a priori knowledge of the latent structure.
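A sketch of that second pipeline with scikit-learn, assuming the five player statistics have been flattened into a matrix with one row per player (the random data, `eps`, and `min_samples` here are illustrative, not the original values):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN
from sklearn.manifold import TSNE

# made-up stand-in: rows = players; columns = efficiency, combatRating,
# averageKillDistance, averageLifespan, averageScorePerKill
rng = np.random.default_rng(0)
stats = rng.normal(size=(1000, 5))

X = StandardScaler().fit_transform(stats)     # put all stats on the same scale
labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(X)  # -1 marks noise points

# t-SNE projection purely for 2D visualization of the clusters
embedding = TSNE(n_components=2, random_state=0).fit_transform(X)
plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=5, cmap="tab10")
plt.title("DBSCAN clusters of player statistics (t-SNE projection)")
plt.show()
```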

Ultimately, however, these player statistics alone were not enough to produce a clustering that matched what we found with our modularity-based methods. Furthermore, the structure of the pipeline meant the two analyses were wholly separate, since they dealt with different types of data. In reality, determining a playstyle would require taking both sets of data into account holistically.

Limitations:

  • I used snowball sampling [4] to collect both the pre Beyond Light data and the post Beyond Light data. Both datasets started with the same initial individual (myself), but since I am reliant on the matches and teams that Bungie’s matchmaking engine created, the two samples are not of the same people. Nor was it feasible to simply take all of the unique playerIds that show up in the pre Beyond Light dataset and query Bungie’s servers for post Beyond Light data on those same people: this data collection happened four days after Beyond Light was released, so getting a dataset of comparable size from just those players would have required each player in the pre Beyond Light dataset to play ~250 competitive PvP matches in four days. That is not reasonably possible, as each match lasts ~10 minutes and players churn from games constantly. Therefore, I had to cast a shallower net (fewer games from each player) for longer (more unique players) to get a post Beyond Light dataset of comparable size to the pre Beyond Light data.
  • A consequence of the pre and post Beyond Light graphs being different is that I am unable to use pre Beyond Light graph partitions in a meaningful way on the post Beyond Light graph. I can rerun the same algorithm on the new data and see what has changed (which is what I did), but I could not take the output of the algorithm from the pre Beyond Light days and apply that clustering to the new data. A lazy algorithm (e.g. kNN) cannot be trained, frozen, and deployed in the same manner as a non-lazy learner (e.g. a neural network) to then study how the lazy learner fails in the new environment. So, I think I made the wrong algorithm choice when doing my clustering.
  • As discussed here, it is reasonable to infer some approximation of playstyle from weapon usage, since each weapon type in Destiny has different characteristics of where it is “good”/“bad”/“viable” etc. However, I have no way of determining whether the players whose data was collected would agree with my classification of their playstyle.
  • Adding in node data did not cause a shift in how our graph-based clustering methods determined clusters. Furthermore, running a separate clustering algorithm on the selected node attribute data resulted in clusters that were not reflective of the graph-based clusters (which makes sense, as they are different methods fed different types of data). An enticing piece of further work would be to feed both sets of data into one algorithmic process that can search the Pareto front of clusterings with respect to both sets of data.
  • Finally, it is certainly feasible that the player base has not yet had enough time to solidify into meaningful new behaviors, since this data was collected immediately in the aftermath of Beyond Light’s launch and Beyond Light drastically remade the game. Anecdotally, it took me about 4–5 days to finish earning my new abilities, and even now, a month later, I still haven’t finished my power-grind up to max level, all of which changes the guns I use in the Competitive playlist (when I play it). However, this is all anecdotal, and I certainly haven’t been playing the game as much as I used to, due to schoolwork.

Thanks for making it this far.

[1] Code: https://drive.google.com/file/d/1h1ZtPRN3ikVKNX_YXrs9GaCzvYLiEcBr/view?usp=sharing. This is a zip of all experimental code; the primary notebook to look at is `weaponGraphs.ipynb`. It also includes all data files necessary to recreate this work: the necessary files are the .pkl files (tagged BL if they contain Beyond Light data), plus .csv files that are flattened versions of the .pkl files.

[2] https://journals.sagepub.com/doi/full/10.1177/0894439314555329

[3] https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables

[4] https://www.statisticshowto.com/snowball-sampling/

[5] https://igraph.org/python/doc/python-igraph.pdf page 48
