The Good News: I Have to Start Over

That title is hyperbolic, but yes, I do have to restart my project. Not through any fault of my own: in the time between when I started building my scraper (more on that soon) and now, a 'Heroes of the Storm' API has appeared. The metric data I need is currently down due to performance issues (the admin says it should be back up soon), but iterating through an API is both better and far less ethically grey, especially if I want to scale up to building something beyond personal use.

The API (found here) has a number of obvious advantages. Foremost is ease of use; that's a given. More importantly, I can build a more dynamic program, since I have access to far more information than just a win/loss percentage. The entire API response (which is sorted by replay) looks like the following:

"page": 0,
"page_count": 0,
"total": 0,
"per_page": 0,
"replays": [
"id": 0,
"filename": "string",
"size": 0,
"game_type": "string",
"game_date": "2017-10-15T00:30:35.905Z",
"game_length": 0,
"game_map": "string",
"game_version": "string",
"region": 0,
"fingerprint": "string",
"url": "string",
"players": [
"battletag": "string",
"hero": "string",
"hero_level": 0,
"team": 0,
"winner": true,
"blizz_id": 0,
"silenced": true,
"party": 0,
"talents": [
"name": "string",
"title": "string",
"description": "string",
"icon": "string",
"icon_url": {
"66x66": ""
"ability": "string",
"sort": 0,
"cooldown": 0,
"mana_cost": 0,
"level": 0
"scores": [
"level": 0,
"kills": 0,
"assists": 0,
"takedowns": 0,
"deaths": 0,
"highest_kill_streak": 0,
"hero_damage": 0,
"siege_damage": 0,
"structure_damage": 0,
"minion_damage": 0,
"creep_damage": 0,
"summon_damage": 0,
"time_cc_enemy_heroes": 0,
"healing": 0,
"self_healing": 0,
"damage_taken": 0,
"experience_contribution": 0,
"town_kills": 0,
"time_spent_dead": 0,
"merc_camp_captures": 0,
"watch_tower_captures": 0,
"meta_experience": 0
"bans": [

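Iterating through the API would look roughly like the sketch below. The endpoint path and the page query parameter are assumptions on my part, so check the HOTSAPI docs for the real ones:

```ruby
require 'net/http'
require 'json'

# Hypothetical endpoint; verify against the HOTSAPI documentation.
BASE_URL = 'https://hotsapi.net/api/v1/replays'

# Build the URI for a given page of replays.
def replay_page_uri(page)
  URI("#{BASE_URL}?page=#{page}")
end

# Fetch one page and parse the JSON body into a Ruby hash shaped
# like the schema above.
def fetch_replay_page(page)
  JSON.parse(Net::HTTP.get(replay_page_uri(page)))
end
```

From there, the `page_count` field in each response tells you how many pages are left to walk.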
Assuming I have my DB built correctly (I do), I can parse the API response by hero name (searching each entry in hash['players'] for a matching 'hero'). If that comes back positive, I check whether that player instance "won" the game, then tally the results and take the average.
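The tally step might look like this minimal sketch, assuming the replays have already been fetched and parsed into Ruby hashes (the method name is mine, not part of the API):

```ruby
# Given an array of replay hashes shaped like the API response,
# compute a hero's win rate: count the games the hero appears in and,
# of those, how many times that player's entry has "winner" => true.
def win_rate(replays, hero_name)
  played = 0
  won = 0
  replays.each do |replay|
    entry = replay['players'].find { |p| p['hero'] == hero_name }
    next unless entry
    played += 1
    won += 1 if entry['winner']
  end
  return 0.0 if played.zero?
  (won.to_f / played * 100).round(2)
end
```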

The beauty of this process is that as the draft proceeds, I can re-parse the API and update my DB's win percentages based only on games that include the characters already picked. In effect, every pick cuts down the number of games and the amount of data I need to track, since the possible character choices diminish. Furthermore, I do not have to worry about hard-coding synergies and counters: as the percentages adjust to the picks made, the most ideal pick will always be the top percentage, since a high percentage implies the character is advantageous, whether through synergy with your team or as a counter to the enemy's. The point is that I want this to be almost raw statistics, so that possibly unconsidered counters and synergies are revealed.
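That narrowing step can be sketched as a simple filter over the replay pool; the function name and hash shapes are my own assumptions based on the schema above:

```ruby
# Keep only the replays in which every hero picked so far appears,
# so later win-rate tallies are conditioned on the current draft state.
def filter_by_picks(replays, picked_heroes)
  replays.select do |replay|
    heroes_in_game = replay['players'].map { |p| p['hero'] }
    picked_heroes.all? { |h| heroes_in_game.include?(h) }
  end
end
```

Re-running the win-rate tally over this shrinking pool after each pick is what surfaces situational synergies and counters without hard-coding them.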

Another consideration: even though HOTSAPI has data for about 3.7 million replays, that may in fact not be enough, since there could still be unique combinations of team compositions. Furthermore, when compiling my win-rate averages I'd likely only want to count games from the most recent patch, since character win rates change drastically after one (especially if a new hero is released). My effective sample size is therefore probably only in the hundreds of thousands, which, ironically, is not ideal when there is such an extreme amount of variability.
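Restricting to the current patch is easy given the game_version field in the API response; the version string below is a made-up example, not the actual current patch:

```ruby
# Hypothetical current patch prefix; in practice this would come from
# config or from the newest game_version seen in the data.
CURRENT_PATCH = '2.28.0'

# Keep only replays whose game_version falls under the current patch.
def current_patch_only(replays)
  replays.select { |r| r['game_version'].to_s.start_with?(CURRENT_PATCH) }
end
```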

I am also considering options for scaling (likely using a simple Rails framework). I am mostly happy with a CLI for my own use, but this might be something other gamers would be interested in. Who knows, maybe I can sell out and make some ad revenue.

But in general, I now have a way to accomplish a number of tasks that I previously would have been unable to do, or that would have required some hard-coding.

So how did my finished scraper do?

Well, here's what my DB looks like (SQLite):

of course Medivh is still at the bottom lol

Not bad!

I'm so old that memes from a decade ago still entertain me

This was created a couple of days ago, so the numbers are no longer accurate, but otherwise it worked perfectly. From this I have all the relevant statistics. In addition, there is a 'picked' column: this was from my original design, to allow for easier sorting as the draft goes on. Other designs I made (but haven't migrated, since the project is on hold until HOTSAPI re-adds the character statistics) included a 'teams' table that would keep track of picks (referenced by hero id) and the map side. This would allow for keeping track of prior drafts.

Anyway, here’s the code for the scraper:

First, we have the definition of the scraper object:

class Scraper
  attr_accessor :page, :heroes_info

  def initialize
    @page = Nokogiri::HTML(open(HOTSLOGURL))
    get_data
  end

It's initialized with the HOTSLOGURL (not a gurl, it's a URL) string constant (defined elsewhere), which is opened, parsed, and stored as an instance variable; then the 'get_data' method is automatically called, which does the following:

  def get_data
    body = @page.css('body')
    table = body.css('div#RadGridCharacterStatistics')
    rows = table.css('tr')
    array = []
    rows.css('td').each_with_index do |cell, index|
      array << cell.text unless index == 0
    end
    array = array.each_slice(10).to_a
    @heroes_info = build_hash(array)
  end

The most interesting part is how the .css calls narrow down the data from @page: pull the body, then the table, then the individual rows. From there, we build an array of the text from each 'td' element. This seems like an extra step, but partly because of how the HTML labels all the elements, it creates a very consistent pattern of data, which is then sliced into equal chunks (hence the each_slice call). From that array, I build another instance variable, @heroes_info, by calling another instance method, 'build_hash'. This works the following way:

  def build_hash(array)
    id = 0
    array_of_hashes = []
    array.each do |hero|
      id += 1
      hero_hash = {}
      hero_hash[:id] = id
      hero_hash[:name] = hero[0]
      hero_hash[:group] = hero[6]
      hero_hash[:popularity] = convert_to_f(hero[3])
      hero_hash[:win_percent] = convert_to_f(hero[4])
      hero_hash[:pick_rate] = convert_to_i(hero[1])
      hero_hash[:ban_rate] = convert_to_i(hero[2])
      array_of_hashes << hero_hash
    end
    array_of_hashes
  end

While not the most elegant code, it does the trick. Most importantly, I had to define additional methods in another object to deal with conversions between data types (converting a string to a float, and so on).
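Those conversion helpers might look something like this; it's a hypothetical reconstruction, since HOTSLogs-style table cells come back as strings with percent signs and thousands separators:

```ruby
# Strip everything but digits and the decimal point before casting,
# e.g. "55.3 %" -> 55.3.
def convert_to_f(string)
  string.to_s.gsub(/[^\d.]/, '').to_f
end

# Strip everything but digits before casting, e.g. "12,345" -> 12345.
def convert_to_i(string)
  string.to_s.gsub(/[^\d]/, '').to_i
end
```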

This is all very automated and allows me to easily populate my table in the seed file: first by running the scraper, then by using a find_or_create_by on my Character object (populating the characters table). The only other important function I built was the 'updater,' which allows for updating metrics in my DB. It works fairly simply:

  def self.update_with_hash(hash)
    Character.all.each do |old_hero_data|
      replace_hero_data(old_hero_data, hash)
    end
  end

Sending in a new hash, it updates the old hero data in ActiveRecord with the new information. The replace_hero_data method looks like the following:

  def self.replace_hero_data(hero, hash)
    hash.each do |new_hero_data|
      if hero.name == new_hero_data[:name]
        hero.win_percent = new_hero_data[:win_percent]
        hero.pick_rate = new_hero_data[:pick_rate]
        hero.ban_rate = new_hero_data[:ban_rate]
        hero.save # persist the updated attributes
      end
    end
  end

Again, perhaps more lines than necessary, but it does the trick and is relatively easy to follow and interpret.
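For completeness, the seed step mentioned earlier (run the scraper, then find_or_create_by on Character) might look roughly like the sketch below. The method and its parameters are my own; the model is passed in rather than hard-coded, and ActiveRecord's find_or_create_by only yields the block when it is creating a new record:

```ruby
# Hypothetical seed helper: upsert each scraped hero into the
# characters table. `model` is expected to be the Character class.
def seed_characters(heroes_info, model)
  heroes_info.each do |hero|
    model.find_or_create_by(name: hero[:name]) do |c|
      c.group       = hero[:group]
      c.win_percent = hero[:win_percent]
      c.pick_rate   = hero[:pick_rate]
      c.ban_rate    = hero[:ban_rate]
    end
  end
end
```

Because find_or_create_by matches on name, re-running the seed is idempotent: existing heroes are found rather than duplicated.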

So, here I sit waiting on the API to come back…
