Using TensorFlow Object Detection to control first-person shooter games
In this article, I’ll explain how I used TensorFlow’s object detection model to play the classic FPS game Counter-Strike.
A little while ago, I came across a very interesting project in which the author uses a webcam to play the classic fighting game Mortal Kombat. He uses a combination of a convolutional neural network and a recurrent neural network to recognize kicking and punching actions in his webcam recording, and then translates the model’s predictions into the appropriate in-game actions. A very cool way to play the game, indeed!
Using this as inspiration, I created a similar controller interface that plays first-person shooter games using the predictions of a TensorFlow object detection model.
The code for this project can be found on my Github page, and is also linked below.
This controller is designed to handle the following actions in the game:
1. Aiming the gun
First, to look around in the game, I use object detection on a tennis ball. Based on where the ball is detected in the webcam frame, we set the position of the mouse, which in turn controls where our player aims in the game.
2. Moving the player
Next, to move the player forward, I use detection of my index finger. Raising the finger makes the player move forward; lowering it stops the movement.
3. Shooting the gun
The third supported action is shooting the gun. Since both hands are occupied with aiming and moving, I use an open-mouth gesture to trigger shooting.
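The three gestures above boil down to a per-frame mapping from detections to game inputs. Here is a minimal sketch of that mapping; the class IDs, score threshold, and function name are my own assumptions for illustration, not the project’s actual code:

```python
# Hypothetical class IDs and confidence threshold (assumptions, not from the project).
BALL, FINGER, MOUTH = 1, 2, 3
SCORE_THRESHOLD = 0.6

def detections_to_actions(boxes, classes, scores, screen_w, screen_h):
    """Map one frame's detections to (mouse_xy, move_forward, shoot).

    boxes:   list of (ymin, xmin, ymax, xmax) in normalized [0, 1] coords
    classes: list of class IDs, parallel to boxes
    scores:  list of confidence scores, parallel to boxes
    """
    mouse_xy, move_forward, shoot = None, False, False
    for box, cls, score in zip(boxes, classes, scores):
        if score < SCORE_THRESHOLD:
            continue
        if cls == BALL:
            # Aim: center of the ball's box, scaled to screen pixels.
            ymin, xmin, ymax, xmax = box
            mouse_xy = ((xmin + xmax) / 2 * screen_w,
                        (ymin + ymax) / 2 * screen_h)
        elif cls == FINGER:
            move_forward = True   # raised finger seen this frame
        elif cls == MOUTH:
            shoot = True          # open mouth seen this frame
    return mouse_xy, move_forward, shoot
```

The returned values could then be fed to an input-automation library such as pyautogui, e.g. `moveTo` for aiming, `keyDown('w')`/`keyUp('w')` for movement, and `click()` for shooting.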
Object Detection Model
The model used for object detection here is MobileNet combined with a Single-Shot MultiBox Detector (SSD) for object localization. It has been trained on various images of tennis balls, raised fingers, and teeth (indicating an open mouth). The lightweight model runs at a reasonable frame rate, which makes it possible to control the game in real time.
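As a rough illustration, SSD models in the TensorFlow Object Detection API typically emit batched tensors of boxes, scores, and classes for each frame, and a small post-processing step keeps only the confident detections. The shapes and threshold below follow that common convention but are assumptions about this project’s setup, not taken from its code:

```python
import numpy as np

def postprocess(raw_boxes, raw_scores, raw_classes, score_thresh=0.6):
    """Squeeze the batch dimension of typical SSD outputs and keep only
    confident detections.

    raw_boxes:   array of shape (1, N, 4), normalized (ymin, xmin, ymax, xmax)
    raw_scores:  array of shape (1, N)
    raw_classes: array of shape (1, N)
    """
    boxes = np.squeeze(raw_boxes, axis=0)    # -> (N, 4)
    scores = np.squeeze(raw_scores, axis=0)  # -> (N,)
    classes = np.squeeze(raw_classes, axis=0)
    keep = scores >= score_thresh            # boolean mask of confident hits
    return boxes[keep], classes[keep].astype(int), scores[keep]
```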
Model Performance
In terms of performance, detection of the finger and teeth is fairly reliable while playing. The main difficulty is aiming the gun precisely: the model runs at a much lower frame rate than the game, so the mouse movement is jumpy rather than smooth. Detection of the ball near the edges of the frame is also poor, which makes aiming unreliable there. This could be addressed by tweaking the model to reliably detect objects a bit farther from the webcam, leaving more room to move the tennis ball and thereby giving finer control over the aim.
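One way to tame the jumpy aim, offered here only as a sketch rather than something the original project implements, is to smooth the detected ball position with an exponential moving average before moving the mouse:

```python
class AimSmoother:
    """Exponential moving average over detected ball positions, to soften
    mouse jumps when the detector runs at a lower frame rate than the game."""

    def __init__(self, alpha=0.4):
        self.alpha = alpha   # higher alpha = more responsive, less smooth
        self._pos = None

    def update(self, x, y):
        """Blend the new detection into the running position and return it."""
        if self._pos is None:
            self._pos = (x, y)
        else:
            px, py = self._pos
            self._pos = (px + self.alpha * (x - px),
                         py + self.alpha * (y - py))
        return self._pos
```

Each frame, the smoothed position would be passed to the mouse instead of the raw detection, trading a little latency for steadier aim.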
The results of the in-game performance of this model can be found on my YouTube channel, with the video embedded below.
Conclusion
Controlling games with just a webcam and no extra hardware remains a very enticing concept, and advances in deep learning models have made it genuinely possible. For this control mechanism to replace the more conventional ways of playing, the implementation would need to be near-perfect, but I can see a polished version of this idea being a fun way to play FPS games.
Thank you for reading! If you liked this article, please follow me on Medium, GitHub, or subscribe to my YouTube channel.
Note: This is a repost of an article originally published on Towards Data Science in 2018.