Training a Rainbow Six Siege Operator Detector

Michael Sugimura
7 min read · Mar 12, 2018

Motivation

I play on a team with some friends in the tactical shooter Rainbow Six Siege. We make strategies, practice them, dissect our games, and track metrics. While not the most mechanically skilled team, we make up for it with good preparation and can do fairly well against more skilled opponents because of it.

Since we place such a high premium on our analysis, the data scientist in me has been wondering: what if we could use deep learning to automate some of our stat generation? One particularly interesting metric to try to build out is the “engagement preparedness” metric that we track: how good are we at having our crosshairs pointed at the correct location when an enemy appears, which maximizes our chances of winning that engagement? If the team is communicating well, more often than not we should be ready and aiming at the correct location when an enemy appears.

I had been debating the best way to go about it, and after re-watching some of Andrew Ng’s Convolutional Neural Network lectures, I decided to try an end-to-end deep learning pipeline.

example output from operator detector

Project Pipeline

While my current work is a long way from automating our stat collection, it is a good example of how to fairly quickly build a tailored end-to-end deep learning pipeline by chaining an object detection model with a facial recognition model. The pipeline has two main parts: first, an object detection network detects Rainbow Six Siege characters (operators) in the frame, and its outputs are then fed into a second model for classification. The end goal is to be able to run the pipeline over images and/or video to detect which operator is in the frame.
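The two-stage flow can be sketched in a few lines. Here `detect_operators` and `classify_operator` are stand-in callables, not the actual models from this post, and frames are represented as plain lists of pixel rows just to keep the sketch self-contained:

```python
# Sketch of the two-stage pipeline: an object detector proposes
# operator bounding boxes, then a second model classifies each crop.
# `detect_operators` and `classify_operator` are hypothetical stand-ins.

def crop(frame, box):
    """Cut an (x1, y1, x2, y2) box out of a frame stored as rows of pixels."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]

def run_pipeline(frame, detect_operators, classify_operator):
    """Return (box, operator_name) pairs for a single frame."""
    results = []
    for box in detect_operators(frame):
        results.append((box, classify_operator(crop(frame, box))))
    return results
```

The useful property of this shape is that the detector and classifier can be swapped out independently, which is exactly what happens later in the post when the classification half gets replaced.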

Let’s Get Down to Business

A few months ago I came across a blog post by Dat Tran on how to build a raccoon detector (here). Like in that post, I went ahead and built a dataset of around 1,200 images, taking screenshots from around 9 hours of recordings of my team’s scrims. I made sure to include a fairly wide variety of images, with low light, backlighting, and varying distances to the operator in question, so the model would be exposed to the best and worst quality images I could throw at it. Using these recordings also helps ensure that my training dataset distribution matches whatever final images/video I would run the models on. The following collage shows examples of images from my in-game recordings.

In game images of operators, lots of variation which makes model training hard

Once the dataset was created I used labelImg to draw the ground-truth bounding boxes. labelImg is a very nice interface for generating the PASCAL VOC format XMLs which are required for training the models.
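The VOC XMLs that labelImg writes are simple enough to read with the standard library. A minimal parser might look like this; the tag names follow the PASCAL VOC format, while the filename and coordinates in the test are fabricated for illustration:

```python
# Minimal reader for a PASCAL VOC annotation file, the format labelImg
# produces: one <object> per labeled box, with a <bndbox> of pixel coords.
import xml.etree.ElementTree as ET

def parse_voc(xml_text):
    """Return (image_filename, [(class_name, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            int(bb.findtext("xmin")), int(bb.findtext("ymin")),
            int(bb.findtext("xmax")), int(bb.findtext("ymax")),
        ))
    return filename, boxes
```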

With the dataset created and labeled, the next step was to figure out the best way to get an object detection model up to speed. I use Keras with a TensorFlow backend for most of my work, and I was grateful to find that Ngoc Anh Huynh had already created a repository of object detection models for Keras that are quite easy to use and implement. Finding this repository saved me a significant amount of time.

The way the models are implemented, pretrained networks for a variety of architectures like YOLO, MobileNet, and InceptionV3 are made available, and the notes on how to implement the models are quite detailed, so I recommend checking out the repository. For my purposes I went with the Full YOLO object detection model, which has proved to be quite fast and easy to tune.

In trying to implement the Full YOLO model on my Rainbow Six Siege dataset, the only issue I ran into was how to best “warm up” the model on the new dataset. The default setting runs 3 epochs over the dataset as the warmup period while the model adjusts, but I ran it for longer, 10 epochs, with early stopping ending the runs once improvement stalled. This seems to be a hyperparameter that you can simply tune by experiment.
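The repository drives training from a JSON config file, and the warmup length lives there alongside the other training hyperparameters. A fragment along these lines shows the idea; the exact field names are illustrative and should be checked against the repo’s own config.json:

```json
{
  "train": {
    "warmup_epochs": 10,
    "nb_epochs": 50,
    "batch_size": 16,
    "learning_rate": 1e-4
  }
}
```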

After the warmup training, I ran the full training pipeline and the model reached between 60–70% average recall, which is about the baseline mentioned in some of the issues in the repository. The model produced at this point is very functional and I am quite happy with it. As a side note, I ran training/testing on an Nvidia 1070 with 32GB of RAM.
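For context, detection recall counts a ground-truth box as found when some predicted box overlaps it enough, with overlap usually measured by intersection-over-union (IoU). A minimal sketch of that computation, independent of any framework:

```python
# Intersection-over-union between two (x1, y1, x2, y2) boxes, the
# standard overlap criterion behind detection recall numbers.

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def recall(preds, truths, thresh=0.5):
    """Fraction of ground-truth boxes matched by at least one prediction."""
    hits = sum(1 for t in truths if any(iou(p, t) >= thresh for p in preds))
    return hits / float(len(truths))
```

The 0.5 IoU threshold is the common convention; the repository’s own evaluation may differ in detail.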

Facial Recognition… This part turned out to be fairly hard to build from scratch, because most facial recognition work focuses on frontal views of relatively unobscured faces. In Rainbow Six Siege it is common to see operators from all 360 degrees, as well as from above and below. Combined with variations in lighting and distance, and the fact that everyone is wearing helmets, masks, goggles, and body armor, this makes the problem fairly difficult.

Only cool thing here was the object detector outputs can help make fun gifs…

Per usual, the first step was to generate a relevant dataset. For this I used the object detection model to crop out operators, which I figured I could feed into another neural network for classification. This was a very noisy approach: running it over a 3-hour recording generated close to 95K images, many of which were near-duplicate crops of the same operator that basically form stop-motion videos. I went through these 95K images by hand and curated a dataset of a few thousand images across the different operators. Once the dataset was created, I used a pre-trained InceptionV3 network to extract embeddings of the headshots I had generated and fed these embeddings through a variety of different models, none of which did particularly well… so I was back to square one.
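One of the simplest models to try on top of pretrained embeddings is nearest-centroid: average each class’s embeddings and assign a new crop to the closest class mean. A toy sketch of that idea; the real embeddings were InceptionV3 feature vectors, while the short vectors and class names here are made up for illustration:

```python
# Toy nearest-centroid classifier over embedding vectors, the kind of
# simple model one might try on top of InceptionV3 features. The
# embeddings and operator names below are fabricated stand-ins.
import math

def centroid(vectors):
    """Componentwise mean of a list of equal-length vectors."""
    return [sum(dim) / float(len(vectors)) for dim in zip(*vectors)]

def nearest_class(embedding, class_embeddings):
    """class_embeddings: {name: [vector, ...]} -> name of the closest centroid."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    centroids = {name: centroid(vs) for name, vs in class_embeddings.items()}
    return min(centroids, key=lambda name: dist(embedding, centroids[name]))
```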

In this dark hour, I went back to one of the blogs I had read when I first started diving deeper into neural networks: the “Machine Learning is Fun” series by Adam Geitgey. I had always particularly enjoyed the post on facial recognition. Re-reading it, I was surprised to see an update at the end and realized I had lucked my way into another potential solution. Adam had been kind enough to build out a library called face_recognition. The API is very clean and simple to use, and I have applied it with some success, mostly combining it with the object detector to identify frontal images of certain operators.
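Under the hood, face_recognition’s matching step boils down to a Euclidean-distance threshold over 128-dimensional face encodings, with a default tolerance of 0.6. The comparison logic can be sketched in plain Python; the short vectors in the test are stand-ins for real encodings:

```python
# The comparison face_recognition performs: a known face and a candidate
# face match when the Euclidean distance between their encodings is at
# most the tolerance (0.6 by default in the library).
import math

def face_distance(known, candidate):
    """Euclidean distance between two face-encoding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(known, candidate)))

def compare_faces(known_encodings, candidate, tolerance=0.6):
    """Return one True/False match flag per known encoding."""
    return [face_distance(k, candidate) <= tolerance for k in known_encodings]
```

Lowering the tolerance makes matching stricter, which is one knob to turn when the library starts confusing similar-looking operators.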

The caveat here is that a good view of the operator’s entire face needs to be present. In the case of Ash above, the library still detects her even though she is wearing sunglasses, which is nice. However, it does less well when faces are partially obscured or blocked by hands, clothing, or other objects that make it hard to see the overall face shape. Each of the operators above presents fairly clean facial features for the API to identify. This makes perfect sense, since standard facial recognition models are not commonly built to be deployed in these more niche situations.

Takeaways and Final Thoughts

This is far from complete: to make it fully functional I will need to build out a methodology that recognizes operators across all types of lighting and angles, which is a fairly difficult recognition task. It could simply be a question of data, since I only used around 9 hours of footage for this round of training. Given the relatively small amount of data I had, several approaches could be effective: more data, better normalization, other transfer learning methods, or perhaps a Siamese architecture, which works well with small amounts of data.

Transfer learning applies very well to the object detection model for identifying operators in images and can be set up with relatively little effort thanks to the various people who have contributed to make all parts of the pipeline quite streamlined.

Thank you to the community for building out so many cool projects. It makes it nice and straightforward to essentially construct a full facial recognition pipeline.

GitHub repo here; I have just modified the previously mentioned Keras detection repo.

I have also learned that getting to apply data science to video games is a lot of fun! I will be doing more posts over the coming months.
