How to Build a Computer Vision Game in Python

Thomas Taieb
Analytics Vidhya

--

I grew up in the (19)90s. Before the internet, smartphones, next-generation game consoles. Hell, before augmented and virtual reality. I know, I’m old… At least I got to experience the physical world. Snap! (No, we didn’t have Snapchat either…) We would just communicate an approximate location and a time for re-entry, and we were ready for takeoff.

That was fun. But you know what else was fun? A Master System II.

Brief history of gaming consoles

A long, long, looooong time ago, gaming consoles looked like this:

Games were cartridges that you had to insert into the console. Well… not all of them. One game came pre-installed (thank God they didn’t open an inquiry for abuse of dominant position). A game that captured hours of my life.

Alex Kidd in Miracle World

This game was a masterpiece. Indeed, a true miracle world. You played a small character, Alex, on a mission to defeat an illegitimate tyrant (that’s the plot). No impeachment, just tons of different levels.

What was particularly amazing in this game was the diversity of its gameplay. You were jumping around breaking bricks, collecting money, swimming underwater trying to avoid the fish and the octopus’ tentacles. Why not take a boat? Well, you could buy one. Or a helicopter, a motorbike, a flying cape… Instagram life before Instagram!

Let’s play Rock Paper Scissors…

Alex Kidd was afraid of nothing. Except rock, paper and scissors…

At the end of almost every level, you had to play a game of Rock Paper Scissors. I have never played more intense games of RPS than in Alex Kidd. But that was before.

… on live video…

Technology is amazing. While I reminisce about hours of games played on a TV that probably weighed more than me, “kids these days” are playing in virtual worlds, without cables. Let’s meet in the middle.

The program I created uses computer vision and deep learning to play a game of rock paper scissors on a live video stream (in this case from a webcam). Sounds impressive… It’s not. In fact, it is really simple!

… using OpenCV and Python

OpenCV is a very complete computer vision library that lets you do pretty much everything you need. Plenty of tutorials are available online to get you started (I can recommend some if needed — hit me up in the comments). That’s what I used to capture and process the images. For the deep learning part, I used TensorFlow and Keras.
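
To give you an idea of how little code the capture part takes, here is a minimal sketch (not the exact code from my repository) that opens the default webcam with OpenCV and shows each frame until you press ‘q’:

import cv2

cap = cv2.VideoCapture(0)  # 0 = default webcam

while True:
    ret, frame = cap.read()  # grab one frame from the stream
    if not ret:
        break
    cv2.imshow("Live stream", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # quit on 'q'
        break

cap.release()
cv2.destroyAllWindows()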

The workflow

What do you need to play this game? A hand. And that’s it. The program simply needs to recognize the gesture made by the hand, compare it against the opponent’s and output the result (win, loss or draw).

1 — Identifying the hand

You could come up with a hand-tracking algorithm, but I like simplicity. So I created a region of interest where the user has to place their hand and make the gesture. It is less flexible and less “idiot proof”, but it has the merit of simplifying the code.
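
In practice, the region of interest is just a rectangle drawn on the frame plus a slice of the pixel array. Something along these lines, with coordinates that are purely illustrative:

import cv2

# Illustrative ROI coordinates; adjust them to wherever the player should
# place their hand. OpenCV frames are indexed as frame[y, x].
ROI_TOP, ROI_BOTTOM = 50, 300
ROI_LEFT, ROI_RIGHT = 350, 600

def extract_roi(frame):
    """Crop the region of interest, then draw its outline on the frame."""
    roi = frame[ROI_TOP:ROI_BOTTOM, ROI_LEFT:ROI_RIGHT].copy()
    cv2.rectangle(frame, (ROI_LEFT, ROI_TOP), (ROI_RIGHT, ROI_BOTTOM), (0, 255, 0), 2)
    return roi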

We have narrowed down where the action is going to take place, but we still need to “capture the action” (meaning identify the hand). There are several different approaches for that. The first one I tried was a histogram of oriented gradients. If you don’t know what that is, no worries: it is not the approach I ended up with. I just wanted to brag a little.

I used a simple “background subtraction”. If you subtract the background of an image, you are left with the foreground, and that is exactly what we want here: the hand is (supposed to be) the only thing moving in the region of interest.
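
Concretely, “simple” means something like the sketch below: convert the region of interest to grayscale, blur it a little, treat the first frame as the background, and threshold the absolute difference between the background and every later frame. The threshold value of 25 is an arbitrary choice for illustration, not a magic number from my code.

import cv2

background = None  # grayscale background of the ROI, captured once at start-up

def segment_hand(roi, threshold=25):
    """Subtract the stored background from the ROI and return a binary mask
    of whatever moved into it (ideally, the hand)."""
    global background
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (7, 7), 0)  # smooth out sensor noise

    if background is None:            # first frame seen = the background
        background = gray.copy()
        return None

    diff = cv2.absdiff(background, gray)           # what changed vs. the background
    _, mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
    return mask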

2 — Recognizing the gesture

Once we have isolated the hand, we can start thinking of strategies to recognize what gesture it is making. I decided to use deep learning, because it is cool to use deep learning for everything and anything.

And it is also incredibly easy. Count with me: I recorded a few hundred images of each hand gesture (20 seconds), augmented that dataset (20 seconds) and trained the model (5 minutes). Deep learning in under 6 minutes!

NB: data augmentation consists of randomly modifying the images I recorded to generate new, slightly different ones (rotated by some angle, zoomed in or out by some percentage, etc.). You can do that directly while training the model, but I like to do it separately so that 1/ I can visualize the modifications I’ve generated, 2/ I keep track of the pictures (for reproducibility) and 3/ memory space was not an issue here anyway.
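
Here is one possible sketch of both steps: offline augmentation with Keras’s ImageDataGenerator (so the new images end up on disk where I can look at them) and a small CNN trained on the result. The folder names, parameter values and network architecture are all illustrative, not the exact ones from my repo; I assume the original pictures live in data/rock, data/paper and data/scissors.

import os
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

GESTURES = ["rock", "paper", "scissors"]

# 1/ Offline augmentation: write slightly modified copies of each picture to
#    disk so they can be inspected and reused later.
augmenter = ImageDataGenerator(
    rotation_range=15,       # rotate by up to +/- 15 degrees
    zoom_range=0.1,          # zoom in or out by up to 10%
    width_shift_range=0.1,   # shift horizontally by up to 10%
    height_shift_range=0.1,  # shift vertically by up to 10%
)

for gesture in GESTURES:
    out_dir = os.path.join("data_augmented", gesture)
    os.makedirs(out_dir, exist_ok=True)
    flow = augmenter.flow_from_directory(
        "data", classes=[gesture], target_size=(64, 64), color_mode="grayscale",
        batch_size=32, save_to_dir=out_dir, save_prefix=gesture, save_format="png",
    )
    for _ in range(20):      # 20 batches of 32 = roughly 640 extra images per gesture
        next(flow)

# 2/ A small CNN trained on the augmented pictures (pixel values rescaled to [0, 1]).
train_flow = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data_augmented", classes=GESTURES, target_size=(64, 64),
    color_mode="grayscale", batch_size=32, class_mode="categorical",
)

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(len(GESTURES), activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_flow, epochs=10)
model.save("rps_model.h5")   # reused later when classifying live frames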

3 — Output the results

Once the program has recognized what gesture the person is doing, you just need to implement the rules (paper beats rock, rock beats scissors, scissors beats paper). I hard-coded the logic because it is much simpler.
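
The whole rule set fits in a dictionary. Here is a sketch of what “hard-coded” means in practice (the gesture names are whatever labels your classifier outputs):

import random

# Each gesture beats exactly one other gesture.
BEATS = {
    "rock": "scissors",     # rock beats scissors
    "paper": "rock",        # paper beats rock
    "scissors": "paper",    # scissors beats paper
}

def judge(player, opponent):
    """Return 'win', 'loss' or 'draw' from the player's point of view."""
    if player == opponent:
        return "draw"
    return "win" if BEATS[player] == opponent else "loss"

# A simulated opponent can simply pick a random gesture each round.
opponent_move = random.choice(list(BEATS))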

Because nobody wanted to play with me, I simulated an opponent. In the end, this is what you get.

Look at how much fun I’m having!!

How does it work?

Let’s break it down:

1 — The program takes the first image coming from the stream and considers it the background (we assume that only the hand will move in the region of interest);

2 — Then takes each frame and subtracts the background (this approach also works across diverse skin tones);

3 — Takes the result of 2/ and feeds it to the deep learning model to identify the gesture (see the sketch after this list);

4 — Compares the gesture to the opponent’s according to the rules;

5 — Outputs the image and the result information (win, draw, loss).
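
Putting steps 2/ and 3/ together, the prediction call looks roughly like the sketch below. The model file name, the input size and the class order are assumptions; they have to match whatever you used at training time.

import cv2
import numpy as np
from tensorflow.keras.models import load_model

# Assumed artefacts: the model saved earlier and the class order used to train it.
model = load_model("rps_model.h5")
CLASSES = ["rock", "paper", "scissors"]   # must match the training class order

def predict_gesture(mask):
    """Step 3: resize the background-subtracted mask and ask the model what it sees."""
    x = cv2.resize(mask, (64, 64))
    x = x.astype("float32") / 255.0        # same scaling as at training time
    x = x.reshape(1, 64, 64, 1)            # a batch of one grayscale image
    probs = model.predict(x, verbose=0)[0]
    return CLASSES[int(np.argmax(probs))]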

How can it be improved?

If that game doesn’t get me a job as a developer at Blizzard, I dont know what will! But if I wanted to spend more time and really improve it, I would:

1 — Use a hand-tracking system to remove the need for a preset region of interest;

2 — Improve the deep learning model (design, but also the training data — including additional hand gestures);

3 — Improve the opponent, either by creating a reinforcement learning agent or by adding more logic like “if the player played X twice in a row, play Y” (the best strategy is to play randomly, but it is more exciting to face an opponent that seemingly has a strategy you can try to guess; a toy sketch follows this list);

4 — Improve the graphics and effects of the play screen (a countdown for each new game, celebratory confetti when winning, etc.);

5 — Other ideas you have!
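
To make improvement 3 concrete, here is a toy sketch of a “pattern-guessing” opponent: it counters whatever gesture the player has favoured over the last few rounds. Purely illustrative, and trivially beatable once you notice the pattern.

import random
from collections import Counter

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
COUNTER = {loser: winner for winner, loser in BEATS.items()}   # what beats each gesture

def choose_opponent_move(history, window=5):
    """Counter the gesture the player has favoured over the last few rounds,
    falling back to a random move when there is no history yet."""
    if not history:
        return random.choice(list(BEATS))
    favourite = Counter(history[-window:]).most_common(1)[0][0]
    return COUNTER[favourite]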

Have fun!

If you made it this far, congratulations, you defeated all the bosses (Boredominus, NotFuninator and ObviousItIs). As a reward, you can find the code on my GitHub. Please feel free to reach out if you have any questions, but the code is very simple so I’m sure you’ll be fine!

Ok, here’s your real reward: play Alex Kidd in Miracle World on PC for free.

See you on the other side!
