DeepOverwatch — combining TensorFlow.js, Overwatch, Computer Vision, and Music.

Hi, I’m Farza, and I’m the creator of a music streaming site for gamers called mood.gg, which has gotten pretty huge recently and has millions of hits and over 500,000 users. I’m just going to call it Mood for the remainder of this post :). This post is all about how I trained a convolutional neural network on my own dataset, using the recently released TensorFlow.js, to detect in real time which character a player is using in the game Overwatch, so that Mood can play music made specifically for that character. All automatically.

Special thanks to Google Brain’s awesome developers who worked on TensorFlow.js, specifically Nikhil Thorat, who gave me tons of hands-on help.

If you have any questions, don’t hesitate to drop me a question on Twitter!

Get the Code / Trained Model

The code behind the training scripts and the trained model (Python/Keras) can be found here. The code for the desktop app, with all the TensorFlow.js stuff, can be found here.

What is Mood?

Mood allows users to listen to music that relates back to the characters they play in certain games. Users can then listen to this music while they play the character in-game to actually “feel” like that character. The music we choose is based on a character’s specific theme, play style, and personality. For example, below is a character named “Reaper” from Overwatch. He is described as “a wraith-like terrorist who sets out to kill his former comrades to feed his desire for revenge”.

He’s obviously this dark, menacing figure, and that’s why if you check his playlist you’ll find it’s full of metal music, edgy rock, and satanic hip-hop. Now, when a player wants to play Reaper, they can also listen to his playlist in the background. If you don’t use Mood, you’ll have to trust me when I say that it feels really fucking cool to play this very ominous character that blows up enemies with these massive guns all while rocking out to some metal music. I found that even people who hate heavy music had a great time immersing themselves in the personality of the character.

Best part about it all? Mood supports every hero in Overwatch!

The Problem

Mood for Overwatch comes with a downloadable desktop app that allows people to control the music via keyboard commands while they are in the game. This is cool, but what sucks is that when you change characters, you need to actually minimize the game and manually change the playlist. And in Overwatch, people sometimes switch characters a lot. This kills the immersion.

Instead, wouldn’t it be cool if the desktop app could automatically detect what character the user was playing by just looking at the screen? HELL YEAH. But is this possible?

Possible Solutions

Below is an example screenshot from the game of someone playing Ana, a sniper. Every single character in Overwatch has a few visual features that distinguish them: their character portrait (bottom left corner), their weapon, which is held around the center of the screen like in any other FPS, and the character’s specific weapon art (bottom right corner). Detecting the weapon in the center seems hard, so I didn’t wanna do that. But the other two things seem doable.

I know what you’re thinking: “Can’t you just use the character portrait and compare it against all the other character portraits in the game to decide what character it is?” This was actually my first thought, and it’s something that can easily be done using template matching with OpenCV. But I quickly began to hate this idea when I realized that characters can have different portraits depending on what skin the player uses. That means I would need to constantly update my program as new skins were released. Also, Blizzard is known to randomly update hero portraits as well. That’s annoying. I like to make stuff that maintains itself, so this wasn’t a good solution.

So then I thought, “Oh, I’ll just template match with the weapon art at the bottom right.” This also ended up being extremely problematic because the art is transparent and the background behind it is constantly changing. This means template matching would give terrible results and a ton of false positives.

Plus, these solutions aren’t efficient. Template matching against many different templates is computationally expensive, and it might actually cause lag for people with low-end systems.
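To make that cost concrete, here is a minimal sketch of naive template matching by sum of squared differences, the score OpenCV's `matchTemplate` computes at every offset with the `TM_SQDIFF` method. It assumes grayscale images stored row-major in flat arrays, and `matchTemplateSSD` is a hypothetical plain-JS helper, not the OpenCV binding.

```javascript
// Naive template matching by sum of squared differences (SSD).
// Images are grayscale, row-major, flat arrays. Lower SSD = better match.
function matchTemplateSSD(image, imgW, imgH, tmpl, tW, tH) {
  let best = { x: -1, y: -1, ssd: Infinity };
  // Slide the template over every position where it fits entirely.
  for (let y = 0; y + tH <= imgH; y++) {
    for (let x = 0; x + tW <= imgW; x++) {
      let ssd = 0;
      // Compare every template pixel with the image pixel under it.
      for (let ty = 0; ty < tH; ty++) {
        for (let tx = 0; tx < tW; tx++) {
          const d = image[(y + ty) * imgW + (x + tx)] - tmpl[ty * tW + tx];
          ssd += d * d;
        }
      }
      if (ssd < best.ssd) best = { x, y, ssd };
    }
  }
  return best;
}
```

Each template costs roughly (W − w + 1) × (H − h + 1) × w × h operations, and you pay that once per template, so checking every hero times every skin grows linearly with the number of portraits you have to keep around.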

Neural Networks

I wanted to solve this problem using a convolutional neural network. You might think this is overkill. You might be right. This problem could be solved many ways but honestly neural nets are the new cool kids on the block and I was just curious to see how well they’d do in terms of accuracy and efficiency. After all, I needed this thing to work on super shitty computers and super good computers. Plus I’m pretty experienced with using convolutional neural nets and have already used them on video games to some success. But, the first thing I needed was training data.

Training Data

Definitely check out my code here if you want to know exactly how I did all this and want to replicate it yourself.

This was actually pretty easy. I just played each character for 5 minutes and pressed random buttons and moved around like crazy. I only played on one map the entire time, so in order to create a model that would generalize well to other maps, I needed to diversify the training set. That’s why I moved around so much: it creates data with a lot more variability. Neural nets trained on lots of near-identical data rarely generalize well, so I made sure to make each frame count.

I should stress that the training data was ONLY made up of actual gameplay. So, all the other screens where you are dead or in a lobby were not included.

First record the clip.
Then rename the clip to reflect the character played in the clip.
Finally, crop out just the gun from the clip and save the clip as individual images. I simply kept track of the label of each image by saving the images in folders named after the label.
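The last two steps boil down to two small operations on each frame: cut a fixed rectangle out of it, and recover the label from the folder the image is saved in. A minimal sketch, assuming frames arrive as RGBA pixel buffers (like a canvas's `ImageData.data`) and a hypothetical folder layout such as `dataset/reaper/frame_0001.png`; the helper names, crop coordinates, and folder names are all placeholders.

```javascript
// Copy a w×h rectangle with top-left corner (x, y) out of an RGBA frame
// (4 bytes per pixel, row-major, like canvas ImageData.data).
function cropRGBA(data, frameW, frameH, x, y, w, h) {
  if (x < 0 || y < 0 || x + w > frameW || y + h > frameH) {
    throw new RangeError("crop rectangle falls outside the frame");
  }
  const out = new Uint8ClampedArray(w * h * 4);
  for (let row = 0; row < h; row++) {
    const start = ((y + row) * frameW + x) * 4; // first byte of this row in the frame
    out.set(data.subarray(start, start + w * 4), row * w * 4);
  }
  return out;
}

// The label is the name of the folder the image lives in:
// "dataset/reaper/frame_0001.png" -> "reaper" (hypothetical layout).
function labelFromPath(path) {
  const parts = path.split("/");
  return parts[parts.length - 2];
}

// Map each distinct label to an integer index; sorting keeps it stable across runs.
function buildClassIndex(paths) {
  const labels = [...new Set(paths.map(labelFromPath))].sort();
  return new Map(labels.map((label, i) => [label, i]));
}
```

Sorting the folder names gives every hero the same index on every run, which matters because the model's outputs are only meaningful if training and inference agree on that order.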

TensorFlow.js

I built the desktop app using Electron.js, which is a framework that allows people to create beautiful desktop apps with HTML/CSS/JavaScript and runs on Node.js/Chromium. It is also special in that it gives developers access to lots of OS-level functionality, like the ability to call shell scripts, create files, and take screenshots. Most neural net stuff is done using Python. This means I’d have to package Python with my desktop app, and the desktop app would call my Python scripts, which would then send messages back to my desktop app to tell it the detected character. This flow isn’t too bad, but it requires that I package Python with my app. That’s kinda lame.

But I still wondered, could it be done without Python? Could it be done right in the Chromium engine?

I knew that neural networks written in pure JavaScript were possible. Andrej Karpathy, one of my favorite computer vision researchers and the current Director of AI at Tesla, created ConvNet.js in 2014. It allows you to train and test neural nets right in your browser! This was insane to me when I first saw it because I was so used to neural nets requiring expensive GPUs and lots of setup. But here they were, right in the browser. The library hadn’t been updated in four years, but I still managed to get it working and implemented the MNIST tutorial program it provided. Sadly, it caused a ton of lag on my desktop app :(.

I then actually abandoned this pure JS approach for the packaged Python approach I was trying to avoid, but pretty soon I found DeepLearn.js and hope was restored! It was similar to ConvNet.js, but it was made by an actual team (versus just one person) and provided more features to make neural nets run faster in the browser, such as GPU acceleration through WebGL. Within a week of me starting to use DeepLearn.js, the team announced that DeepLearn.js was now TensorFlow.js. This was actually amazing, and the timing was pure luck. TensorFlow.js brought with it some awesome features.

Model

According to the developers, models that run in TensorFlow.js are “1.5–2x slower than TensorFlow with Python”. Despite this, I still went forward because:

  1. Neural networks running purely in the front end are super cool.
  2. The amount of setup on the client’s end is minimal, just a script injection on a webpage via a <script> tag.
  3. The app wouldn’t need to be packaged with Python.
  4. My early experiments with MNIST were showing that TensorFlow.js caused no lag on my desktop app. LET’S. GO.

TL;DR: with TensorFlow.js, you sacrifice speed for usability. My grandmother could easily run TensorFlow.js models and that’s amazing. Setting this thing up is a breeze. Just plug in the script and go. No crazy dependencies on Python, CUDA, etc.

The most useful feature of TensorFlow.js is the ability to train models in Python via Keras, and port them over to TensorFlow.js through a simple script. That means I was able to take advantage of the full power of TensorFlow in Python and could quickly train my model using an AWS GPU cloud instance. Afterwards I could just convert the model to a TensorFlow.js model.

COOL. So thanks to the TensorFlow guys, the process to train and run a model was now super easy. Now I just needed a model! I couldn’t think of an existing model for this task, so I designed my own through lots of iteration.

Model I came up with that predicts 27 classes since there are 27 heroes in Overwatch.

As always, the hardest part of coming up with a completely new model is finding the right balance between underfitting and overfitting. My process is always to start small and build a model that gets okay results. Then I start adding layers and more parameters and run lots of experiments to see how the model reacts. And of course, it’s smart to use regularization techniques as needed. For example, my models kept overfitting as I added more layers, so I combatted this by throwing dropout in between nearly every layer. I probably didn’t need to go that crazy with it, but it worked out well!
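For anyone who hasn't used it, dropout is simple enough to write out by hand. Below is a minimal sketch of inverted dropout over a flat array of activations, the variant Keras's `Dropout` layer implements: during training each value is zeroed with probability `rate` and the survivors are scaled by 1 / (1 − rate), so nothing needs to change at inference time. The `dropout` helper and its injectable `random` parameter are illustrative only.

```javascript
// Inverted dropout on a flat array of activations.
// `random` is injectable so the behavior can be tested deterministically.
function dropout(activations, rate, training, random = Math.random) {
  if (rate < 0 || rate >= 1) throw new RangeError("rate must be in [0, 1)");
  if (!training) return activations.slice(); // identity at inference time
  const keep = 1 - rate;
  // Drop each unit with probability `rate`; scale survivors so the
  // expected value of every activation is unchanged.
  return activations.map((a) => (random() < rate ? 0 : a / keep));
}
```

Because a different random subset of units disappears on every training step, no single unit can be relied on too heavily, which is what makes it a useful brake on overfitting as the network gets deeper.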

Training was a breeze with this model. The validation loss and training loss decreased together and didn’t show signs of underfitting or overfitting. By the end, the accuracy on both the training set and the validation set was around 100%. The test set accuracy was just below 100% as well.

yay!

The final model has around 127,000 parameters and is shown above! By the end, I had spent around $200 on AWS cloud GPUs across all my different experiments.
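A quick way to sanity-check a parameter count like that is to compute it by hand. The helpers below use the standard formulas for Keras-style layers with biases, and the layer stack is a hypothetical example for a 40×85×3 input, NOT the architecture in the figure above.

```javascript
// Parameter counts for Keras-style layers, including bias terms.
const conv2dParams = (kh, kw, inCh, filters) => (kh * kw * inCh + 1) * filters;
const denseParams = (inUnits, units) => (inUnits + 1) * units;
// Spatial size after a 'valid' (unpadded) convolution or pooling window.
const validOut = (size, kernel, stride = 1) => Math.floor((size - kernel) / stride) + 1;

// Hypothetical stack (not the real model):
// conv3x3(16) -> maxpool2 -> conv3x3(32) -> maxpool2 -> flatten -> dense(27)
function countExample() {
  let h = 40, w = 85;
  const conv1 = conv2dParams(3, 3, 3, 16);
  h = validOut(h, 3); w = validOut(w, 3);       // 38×83
  h = validOut(h, 2, 2); w = validOut(w, 2, 2); // 19×41
  const conv2 = conv2dParams(3, 3, 16, 32);
  h = validOut(h, 3); w = validOut(w, 3);       // 17×39
  h = validOut(h, 2, 2); w = validOut(w, 2, 2); // 8×19
  const dense = denseParams(h * w * 32, 27);    // 4864 inputs -> 27 heroes
  return { conv1, conv2, dense, total: conv1 + conv2 + dense };
}

console.log(countExample()); // { conv1: 448, conv2: 4640, dense: 131355, total: 136443 }
```

In this made-up stack, 131,355 of the 136,443 parameters sit in the final dense layer, so shrinking the feature map that reaches `Flatten` (more pooling, or fewer filters in the last convolution) is usually the cheapest way to cut parameters. Dropout layers add no parameters at all.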

Almost Done

The last thing I did was take my trained Keras model, convert it to a TensorFlow.js model, and host it on S3 here. Now, in TensorFlow.js I can just do:

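```javascript
let model; // the loaded model, shared with startScreen
const loadModel = async () => {
  console.log("Loading Model…");
  // URL of the converted model.json (tf.loadLayersModel in TensorFlow.js 1.x+)
  model = await tf.loadModel(INSERT_S3_LINK_HERE);
  console.log("Model loaded!");
  // Classify a fresh screenshot every 500 ms
  setInterval(startScreen, 500);
};
```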
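One caveat with `setInterval(startScreen, 500)`: if `startScreen` is async and a screenshot plus prediction ever takes longer than 500 ms on a slow machine, `setInterval` will happily start the next call before the previous one has finished. A minimal sketch of an alternative that only schedules the next run after the previous one completes; `startPolling` is a hypothetical helper, and its scheduler arguments exist only so it can be tested without real timers.

```javascript
// Run `task` repeatedly, waiting `intervalMs` AFTER each run finishes, so a slow
// screenshot + prediction never overlaps with the next one.
// `schedule`/`cancel` default to real timers and are injectable for testing.
function startPolling(task, intervalMs, schedule = setTimeout, cancel = clearTimeout) {
  let timer = null;
  let stopped = false;
  const tick = async () => {
    try {
      await task();
    } catch (err) {
      console.error("polling task failed:", err); // keep polling after errors
    }
    if (!stopped) timer = schedule(tick, intervalMs);
  };
  timer = schedule(tick, intervalMs);
  // The returned function stops the loop and cancels any pending run.
  return () => {
    stopped = true;
    cancel(timer);
  };
}
```

Calling `startPolling(startScreen, 500)` would stand in for the `setInterval` line, and the returned function stops the loop, for example when the app shuts down.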

startScreen is a function that takes screenshots of the user’s screen while they are in a game. I take the screenshot, crop it so it sees just the weapon icon, and pass that to tf.js like this:

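```javascript
// tf.tidy disposes every intermediate tensor created inside the callback
// and keeps only the returned one.
const logits = tf.tidy(() => {
  const b = tf.scalar(255);
  // Cropped weapon icon -> [40, 85, 3] tensor scaled from [0, 255] to [0, 1]
  // (tf.fromPixels became tf.browser.fromPixels in TensorFlow.js 1.x)
  const img = tf.fromPixels(imgElement).toFloat().div(b);
  // Add a batch dimension: [batch, height, width, channels]
  const batched = img.reshape([1, 40, 85, 3]);
  return model.predict(batched);
});
```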
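In plain-JavaScript terms, that preprocessing drops the alpha channel (`tf.fromPixels` reads 3 channels by default), scales each value from 0 to 255 down to 0 to 1, and keeps the pixels in height × width × channel order, which is what `reshape([1, 40, 85, 3])` expects. The `toModelInput` helper below is hypothetical and only meant to show the memory layout; TensorFlow.js does this work for you.

```javascript
// Convert RGBA bytes (row-major, like canvas ImageData.data) into the flat
// float layout the model sees: RGB only, values in [0, 1], [height][width][channel].
function toModelInput(rgba, width, height) {
  if (rgba.length !== width * height * 4) throw new RangeError("size mismatch");
  const out = new Float32Array(width * height * 3);
  for (let i = 0, j = 0; i < rgba.length; i += 4, j += 3) {
    out[j] = rgba[i] / 255;         // R
    out[j + 1] = rgba[i + 1] / 255; // G
    out[j + 2] = rgba[i + 2] / 255; // B (alpha at i + 3 is skipped)
  }
  return out;
}
```

For the 85×40 weapon icon this yields 40 × 85 × 3 = 10,200 floats, exactly the number of elements in the `[1, 40, 85, 3]` batch.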

And at this point, I know what character the person is playing and can change the music that plays :).
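Turning the model's output into a hero name is an argmax over the 27 scores. A minimal sketch, assuming the scores have been copied out of the tensor (for example with `logits.dataSync()`) and a hypothetical `classNames` array in the same order as the training labels. The `softmax` helper is only needed if the final layer outputs raw logits rather than probabilities.

```javascript
// Numerically stable softmax: subtracting the max avoids overflow in Math.exp.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = Array.from(logits, (z) => Math.exp(z - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Index, name, and score of the highest-scoring class.
function topPrediction(scores, classNames) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return { hero: classNames[best], index: best, score: scores[best] };
}
```

In the app this would look something like `topPrediction(logits.dataSync(), classNames)`, followed by `logits.dispose()`, since the tensor returned from `tf.tidy` is not cleaned up automatically.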

Final Observations

So, here’s something I didn’t even think about beforehand: what happens if the desktop app is running but the user is not in a game? What if they’re just browsing the internet or chilling in a pre-game lobby? I had to be sure that the playlists would only switch if the user was in an actual game, because technically the desktop app could always be running.

I was worried about this because template matching (with small templates) often gives a lot of false positives, since it’s just looking for the pixels that look closest to the template. If I had used template matching for the desktop app, the user could have been chilling playing Call of Duty and the desktop app would randomly start playing music because something that looked like an Overwatch weapon icon popped up.

Luckily, I didn’t have to worry about this! For example, below I was just browsing the web with the desktop app on and my neural net was giving me a 7% chance I was playing Bastion.

Pretty cool! I don’t have to code for edge cases like when the user is in a lobby or playing another game. The neural net takes care of it because it has gained a better understanding of what the weapon icons actually look like :).
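Even with a well-behaved model, it can help to decide explicitly when a prediction is trustworthy enough to switch playlists. A minimal sketch of one such policy, which is an assumption and not necessarily what Mood does: ignore any frame whose top score is below `minScore` (the 7% Bastion guess would be ignored), and only switch once the same hero has won `requiredStreak` confident frames in a row. `createHeroTracker` and both default thresholds are hypothetical.

```javascript
// Hypothetical playlist-switching policy (an assumption, not Mood's actual code).
function createHeroTracker({ minScore = 0.9, requiredStreak = 3 } = {}) {
  let candidate = null; // hero seen in the current run of confident frames
  let streak = 0;       // length of that run
  let current = null;   // hero whose playlist is playing
  // Feed one prediction per screenshot; returns the hero to play (or null).
  return function update(hero, score) {
    if (score < minScore) {
      // Low confidence (lobby, browser, another game): reset the run.
      candidate = null;
      streak = 0;
      return current;
    }
    if (hero === candidate) {
      streak += 1;
    } else {
      candidate = hero;
      streak = 1;
    }
    if (streak >= requiredStreak) current = hero;
    return current;
  };
}
```

At one prediction every 500 ms, `requiredStreak: 3` adds about a second and a half of delay after a hero swap, which is a fair trade for not flipping playlists on a single bad frame.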

And with that, thank you for making it this far and taking a moment to learn about how I built this desktop app :). If you have any questions, don’t hesitate to drop me a question on Twitter! Now to celebrate the release of the desktop app!
