Attempting to Collect Unbiased Data About the Player Base of Overwatch (PC)

mörkenbörken
8 min read · Jun 16, 2018


Relatively little information is available about the overall player base of Overwatch. Aside from the occasional post from members of the Overwatch development team, the only sources with large amounts of data about the player base are third parties such as Overbuff and MasterOverwatch. While these sites do a good job of collecting information about the users who give them permission to access their profiles, their data is skewed by the fact that someone who uses a third-party service to track their play is probably more invested in the game than a truly random player.

So in order to find a set of players that accurately represents the player base of Overwatch, you would have to select people at random from the player base of Overwatch. I believe I have obtained a sufficiently random and unbiased sample of Overwatch players in order to determine some statistics.

How I collected the data

Blizzard gives everyone access to a limited view of a player’s career profile at playoverwatch.com/en-us/{Platform name}/{Battletag}. So by picking a ‘random’ battletag and looking up its career profile, you can obtain a fairly unbiased sample of an Overwatch player. By doing this for every battletag whose username consists of 5 or fewer letters of the English alphabet, you can build a fairly large database of players which should be mostly unbiased towards any particular type of player. Doing this generated a list of 2270791 battletags that were valid at the time.
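To give a sense of how large that search space is, here is a minimal sketch that enumerates candidate usernames of up to 5 letters. It is not the code used for this article: the helper names are made up for illustration, and it assumes names are case-insensitive and use only the letters a to z. It also says nothing about how a username is matched to a full battletag.

```python
from itertools import product
from string import ascii_lowercase


def candidate_names(max_len=5, alphabet=ascii_lowercase):
    """Yield every name of 1..max_len letters (hypothetical helper)."""
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def candidate_count(max_len=5, alphabet_size=26):
    """Count the names without generating them: 26 + 26**2 + ... + 26**max_len."""
    return sum(alphabet_size ** n for n in range(1, max_len + 1))


print(candidate_count())  # 12356630 candidate names under these assumptions
```

Because `candidate_names` is a generator, the names can be consumed one at a time instead of being held in memory all at once.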

Personally, the largest bias I can think of that is introduced to this dataset is the number of users named after a hero in Overwatch with a name that is a part of the set of usernames, such as Mercy, Genji or Ana. These users may be inherently predisposed to having more playtime on these heroes, which would affect their representation in this dataset compared to other heroes with longer names, such as Widowmaker or Reinhardt. Regardless, I don’t think this impacts the unbiasedness of the dataset too much.

The data presented here was downloaded between 10AM CET on the 26th of April (1–2 days before the end of the 9th competitive season of Overwatch) and the 10th competitive season. playoverwatch.com also doesn’t always update instantly, sometimes taking a day or more to update your profile. This does mean some of the profiles have changed since then, and that some of the data is somewhat inaccurate. But I believe (and hope) that the difference isn’t too significant. This delay mostly affects profiles that are inactive for some time, so SR decay on a number of profiles may not be properly reflected in this dataset.

OWAPI provides a method for parsing an Overwatch player’s career profile and storing the data in JSON format. However, OWAPI runs as a server which downloads the webpage from playoverwatch.com, performs some caching operations and parses the page to return the JSON. This wasn’t very efficient, as the download rate for the server wasn’t very high and the server would crash when I sent requests at the rate I wanted. So I decided to write my own solution (I wouldn’t recommend this to anyone by the way, it’s a lot of work and a number of things on playoverwatch.com don’t really work properly).

I used aria2 to download all the webpages (in segments, as passing it a list with 2 million names would cause it to consume far too much memory). I would then check whether I had successfully downloaded the webpage of a player’s career profile, and keep only the pages of players who had a skill rating, which indicated that they had completed their placement matches, as I was only interested in data from competitive mode. This solution, however, did not work too well, and my scripts for filtering the webpages crashed a number of times. All of this reduced the total number of webpages which could be downloaded and accounted for without issues to 1567222. Fortunately, none of the files containing data from players who completed their placement matches had been deleted.
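The "in segments" idea can be sketched roughly as follows. This is an illustration rather than the script used here: `segments` is a hypothetical helper, and the segment size is an arbitrary choice. Each resulting chunk could be written to its own text file, one URL per line, and handed to aria2 as an input file.

```python
from itertools import islice


def segments(items, size):
    """Yield consecutive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Example: split seven placeholder URLs into segments of three.
urls = [f"url-{n}" for n in range(7)]
for index, chunk in enumerate(segments(urls, 3)):
    print(index, chunk)
```

Only one segment is held in memory at a time, which is the point of splitting the list in the first place.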

After this I had to extract the career profile data from all of these pages. OWAPI only works for getting data directly from webpages served by playoverwatch.com, and in my case the webpages containing career profiles were stored locally. So I decided to fork OWAPI and strip it down to a library which consists of only the parsing part of OWAPI. A number of issues arose here as well, which resulted in the number of career profiles which could be accounted for being reduced to 1329189. After getting rid of all the career profiles which hadn’t completed their placement matches, I was left with 426920 profiles.

There is, however, one large issue with playoverwatch.com that I haven’t mentioned thus far. If a player finishes their placement matches and then never touches the game again, whether that be quick play or competitive, the website will show their competitive stats for the last competitive season they played. So among all the profiles I collected, a significant number may be outdated and not representative of players who completed their placement matches in season 9. One way to make sure that a profile was showing competitive mode data from season 9 was to check whether it had any playtime in quick play as Brigitte (she was not available in competitive mode until season 10). Brigitte was added to the live version of Overwatch on the 20th of March 2018, which means that everyone with any playtime on Brigitte registered on playoverwatch.com has had their profile updated since then, so their competitive stats are valid season 9 stats. However, this does mean that anyone who placed in season 9 but hadn’t played as Brigitte by the time I collected this data is excluded from the dataset.

After this I was left with 122414 profiles. While requiring playtime as Brigitte for inclusion in this dataset does alter it significantly, I don’t think it introduces significant bias to some statistics, such as the skill rating distribution. What it likely could bias are statistics such as a player’s total playtime or what kind of heroes they play.
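The two filters described above (has a skill rating, has quick play time as Brigitte) can be sketched like this. The dictionary layout and the field names `sr` and `quickplay_hours` are assumptions made for the example, not the format OWAPI or the published dataset actually uses.

```python
def has_placed(profile):
    """True if the profile shows a skill rating, i.e. placements were completed."""
    return profile.get("sr") is not None


def is_season_9(profile):
    """True if the profile has placed and has any quick play time as Brigitte."""
    brigitte_hours = profile.get("quickplay_hours", {}).get("brigitte", 0)
    return has_placed(profile) and brigitte_hours > 0


# Made-up profiles, purely for illustration.
profiles = [
    {"tag": "A", "sr": 2300, "quickplay_hours": {"brigitte": 1.5}},
    {"tag": "B", "sr": 3100, "quickplay_hours": {"mercy": 40.0}},
    {"tag": "C", "sr": None, "quickplay_hours": {"brigitte": 0.2}},
]
kept = [p["tag"] for p in profiles if is_season_9(p)]
print(kept)  # ['A']
```

Profile B shows the cost of the second filter: it has placed, but without Brigitte playtime there is no way to tell which season its stats belong to, so it is dropped.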

Statistics

Placement Rate

So of the 1329189 profiles gathered, 426920 have completed their placement matches at some point across all competitive seasons of Overwatch, and 122414 have completed their placement matches and played Brigitte on the live server before I collected their data. As a result, I can confidently say that the placement rate for season 9 of competitive mode is somewhere between 9.21% and 32.12%.
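The two bounds follow directly from the counts above. A short worked version, with variable names chosen only for this example:

```python
total_profiles = 1329189        # profiles that could be parsed
ever_placed = 426920            # placed in any competitive season
placed_with_brigitte = 122414   # placed and has quick play time as Brigitte

upper = 100 * ever_placed / total_profiles
lower = 100 * placed_with_brigitte / total_profiles
print(f"{lower:.2f}% to {upper:.2f}%")  # 9.21% to 32.12%
```

The upper bound counts everyone who has ever placed, and the lower bound counts only those confirmed to have season 9 stats, so the true season 9 rate should lie between the two.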

Rank/Skill Rating Distribution

The skill rating (SR) distribution for season 9 of Overwatch’s competitive mode from this dataset. Each bar represents the number of players with an SR between where the bar starts and where it ends. There is a new bar for every increment of 25SR. Players below 500SR have their SR displayed as 499.

The sample mean SR is 2266.29. The percentage of players in each tier is:

  • Bronze: 10.25% (12553 players)
  • Silver: 21.59% (26427 players)
  • Gold: 32.10% (39300 players)
  • Platinum: 26.55% (32495 players)
  • Diamond: 6.72% (8232 players)
  • Master: 2.13% (2612 players)
  • Grandmaster: 0.65% (795 players)
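For reference, this is roughly how SR values map to tiers and to the 25SR bars of the histogram. The tier boundaries below are the usual ones for competitive Overwatch as far as I know (the article itself doesn't list them), and the function names are made up for the example.

```python
from collections import Counter

# Lower SR bound of each tier, highest first (assumed boundaries).
TIERS = [
    (4000, "Grandmaster"),
    (3500, "Master"),
    (3000, "Diamond"),
    (2500, "Platinum"),
    (2000, "Gold"),
    (1500, "Silver"),
    (0, "Bronze"),
]


def tier_of(sr):
    """Return the tier name for a skill rating."""
    for lower_bound, name in TIERS:
        if sr >= lower_bound:
            return name
    raise ValueError(f"negative skill rating: {sr}")


def bin_start(sr, width=25):
    """Return the left edge of the histogram bar that contains `sr`."""
    return (sr // width) * width


def tier_shares(ratings):
    """Return the percentage of players in each tier, Bronze first."""
    counts = Counter(tier_of(sr) for sr in ratings)
    return {name: 100 * counts[name] / len(ratings) for _, name in reversed(TIERS)}


print(tier_of(2266), bin_start(2266))  # Gold 2250
```

A player stored as 499 falls into the bar that starts at 475, which is why the lowest bar collects everyone below 500SR.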

The distribution is bell-shaped. Due to the nature of what is being sampled and the sample size, I believe the best approximation of this would be a normal distribution with a mean of 2266.3 and a standard deviation of 610.1.
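One way to sanity-check that approximation is to ask what share of players the fitted normal distribution would put in a given SR range, and compare it with the observed tier percentages. A minimal sketch using Python's standard library, assuming the usual tier boundaries (2000 to 2499 for Gold, 4000 and up for Grandmaster), which the article doesn't state:

```python
from statistics import NormalDist

# Normal distribution with the sample mean and standard deviation given above.
fitted = NormalDist(mu=2266.3, sigma=610.1)


def share_between(low, high):
    """Percentage of the fitted distribution with low <= SR < high."""
    return 100 * (fitted.cdf(high) - fitted.cdf(low))


gold = share_between(2000, 2500)
grandmaster = 100 * (1 - fitted.cdf(4000))
print(f"Gold: {gold:.2f}%  Grandmaster: {grandmaster:.2f}%")
```

Any gap between these figures and the observed shares shows where the real distribution departs from a bell curve, for example in the tails.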

The large peak around 3000 SR shows just how many players have let their account decay. The other peaks at the lower end of each tier are presumably from players who decided to grind at the end of the season to end in a higher tier.

Relation Between Skill Rating and Winrate

The average winrate of a player at a particular skill rating in this dataset. There is a new bar for every increment of 25SR. Players below 500SR have their SR displayed as 499.

This graph shows a clear positive correlation between SR and average winrate. The average winrate of a player in each tier is:

  • Bronze: 42.52%
  • Silver: 45.64%
  • Gold: 48.02%
  • Platinum: 49.57%
  • Diamond: 50.62%
  • Master: 51.28%
  • Grandmaster: 52.39%
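Per-tier averages like these can be computed with a simple group-and-average step. The sketch below assumes each player contributes one winrate value with equal weight (not weighted by games played), which is my reading of "average winrate of a player"; the helper name and sample values are made up.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(rows):
    """Average the values of (group, value) pairs per group."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {group: mean(values) for group, values in groups.items()}


# Made-up (tier, winrate %) pairs, one per player.
sample = [("Gold", 47.0), ("Gold", 49.0), ("Diamond", 51.0)]
print(mean_by_group(sample))  # {'Gold': 48.0, 'Diamond': 51.0}
```

The same helper works for any other per-player value grouped by tier, such as level.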

Relation Between Skill Rating and Level

The average level of a player at a particular skill rating in this dataset. There is a new bar for every increment of 25SR. Players below 500SR have their SR displayed as 499.

This graph shows that players at higher skill ratings on average have higher levels (and thus more playtime) than players at lower ranks. The large variance at higher skill ratings (4000+) presumably comes from the lower number of samples in that tier. One thing that affects this data is the fact that playoverwatch.com doesn’t always accurately display a player’s level. When looking at a player’s profile, their prestige caps out at 18 (a 5-star golden border/portrait). Using the search function also shows a particular player’s level, and it isn’t restricted to the 18th prestige, but it often shows a value a few levels lower than their current level.

The average level of players at a certain tier is:

  • Bronze: 191
  • Silver: 247
  • Gold: 347
  • Platinum: 469
  • Diamond: 552
  • Master: 647
  • Grandmaster: 742

Most Played Heroes and Roles

I personally think that the following data should be viewed with a lot of skepticism. Partly because of personal bias, but mostly because Brigitte, being a support hero, is bound to affect what type of players have any playtime as her. Nonetheless I’m showing this graph because someone reading this may know better, because it’s kind of interesting to see, and because I put a good amount of effort into it so I may as well show it (yes, I know the text is tiny but I don’t know how to scale an embed on Medium, just click it to see it in full size).

The total playtime of all heroes as a percentage of the total in this dataset, overall and for every tier.

This is actually fairly close to what Jeff Kaplan said the most popular heroes were in each tier, but obviously that may have changed over time and the requirement to have Brigitte playtime may impact it too.

The amount of playtime each role gets as a percentage of the total in this dataset, overall and for each tier.

So tanks and supports seem to be gaining more playtime in higher tiers when compared to lower tiers. Though again, as I feel this cannot be stated often enough, the Brigitte playtime requirement may affect this significantly.
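The role percentages are just hero playtime summed per role and divided by the total. A small sketch with a deliberately partial hero-to-role mapping and made-up hours; the names `ROLE_OF` and `role_shares` are hypothetical:

```python
# Partial hero-to-role mapping, for illustration only.
ROLE_OF = {
    "reinhardt": "Tank",
    "mercy": "Support",
    "brigitte": "Support",
    "genji": "Offense",
}


def role_shares(playtime_by_hero):
    """Return each role's share of total playtime, as a percentage."""
    totals = {}
    for hero, hours in playtime_by_hero.items():
        role = ROLE_OF[hero]
        totals[role] = totals.get(role, 0.0) + hours
    grand_total = sum(totals.values())
    return {role: 100 * hours / grand_total for role, hours in totals.items()}


print(role_shares({"reinhardt": 30, "mercy": 50, "brigitte": 10, "genji": 10}))
# {'Tank': 30.0, 'Support': 60.0, 'Offense': 10.0}
```

Running this once per tier, on only the players in that tier, gives the per-tier breakdown.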

Closing Thoughts

I can’t really confirm the accuracy of any of this data, as I don’t have access to all the data Blizzard has access to. Personally, I think everything aside from the SR distribution and winrates is pretty likely to be biased. But as far as getting the SR distribution goes, this is probably the least biased way I can think of doing it.

All the data can be downloaded from here: https://www.dropbox.com/sh/8rad0va37jwizos/AACLzuC98KGIWLHWkZerfKZ-a?dl=0 (be warned, some of the archives contain a lot of files and take up a lot of space). The code used for generating the graphs can be found on my GitHub: https://github.com/AthulJP/OWCompetitiveStats

Special thanks to the people behind OWAPI for making OW profiles easy to parse.
