Mark Farragher
Apr 1 · 4 min read

In this article I’m going to show you how to build an application that can track Mario in a video of someone playing the first level of a Super Mario game.

So here’s what the final app looks like:

My app is playing the game video and placing a little targeting rectangle over the Mario character. As Mario is running through the level, the app tries to keep the rectangle over the character.

You can see that occasionally the app loses track of Mario, but gracefully recovers every time.

Let’s see how to build this app in C#. I am going to use the awesome Accord library which is perfect for creating computer vision apps.

My biggest challenge is to find Mario in this colorful game level. To us humans the level clearly shows platforms, power-ups, and evil turtles, but to a computer it’s just a confusing jumble of colored pixels.

I’m going to focus on the colors of Mario’s outfit. I can use the ColorFiltering class to remove everything but Mario from each video frame, then convert the image to grayscale and threshold it to black and white. Finally, I can use the BinaryDilation3x3 class to create a pronounced Mario blob that motion tracking algorithms can lock on to.

My full image processing pipeline looks like this:

Here’s my code for the pipeline:

The NewFrame event is called by Accord’s videoplayer control every time a new video frame is available.

The first thing I do is set up a ColorFiltering class to keep all pixels of Mario’s outfit and discard everything else. Then I convert the frame to grayscale using the BT709 filter.

Next, I use the Thresholding class. This is an image filter that converts grayscale pixels to black and white. I pick a threshold value of 1 which means that any pixel with a grayscale value between 1 and 255 is treated as white.

Binary dilation is an image filter that adds extra pixels to the boundary of a shape. It’s a cool trick to close up any ‘holes’ in a shape, or make two distinct shapes flow together as one.

I use the BinaryDilation3x3 class and apply it three times to the image, to make the pixels of Mario’s outfit flow together into a featureless blob.

My final step is to use the ApplyMask class to extract all pixels corresponding to the dilated blog from the original image. This is what you can see in the bottom-right window of my app.

So that’s the full image processing pipeline.

My next step is to build some kind of tracking code to follow Mario as he’s running around the level. My challenge is to keep tracking Mario as he jumps in and out of the frame.

I’m going to use a BlobCounter to locate Mario in the frame, and then pass the coordinates on to the Camshift motion tracking class to track Mario from one frame to the next.

I’ll occasionally lose tracking as Mario jumps completely out of the frame. When that happens, I will revert back to using the BlobCounter, search for Mario, and then re-initialize a new Camshift instance with the coordinates where I found Mario.

Sound complicated? Here’s the complete flow:

And this is what it looks like in code:

The first time this method is called, it will initialize a new Camshift motion tracker and set it to non-conservative, non-smooth RGB tracking.

Then the code checks if the Camshift motion tracker in the tracker variable is still tracking Mario.

When Camshift has lost tracking, either the TrackingObject field is null, TrackingObject.IsEmpty is true, or the TrackingObject.Rectangle dimensions are blown up. I check for all three events.

If Camshift has lost tracking, I set up a BlobCounter and tell it to look for Mario. If it finds him, I call GetObjectsRectangles().First to get his location, pass it on to the Camshift SearchWindow, and tell Camshift to process the frame.

If Camshift is still tracking Mario, I just tell it to process the frame. Unfortunately it will zoom in too much on Mario, which will cause it to lose tracking in later frames. So to make the motion capture more stable, I inflate the tracking rectangle by 30 pixels after every processing step.

And here is the code for drawing the reticle. It takes the current video frame and draws the reticle straight into the bitmap.

You can grab the complete source code from here:

So what do you think? Have I inspired you to write computer vision code of your own?

Add a comment and tell me about it!

The Machine Learning Advantage

Learn the basics of computer vision and machine learning without any complex math! Lots of C# examples and a tiny bit of Python.

Mark Farragher

Written by

I help C# developers learn Computer Vision and Machine Learning in 6 weeks without requiring complex math or Python |

The Machine Learning Advantage

Learn the basics of computer vision and machine learning without any complex math! Lots of C# examples and a tiny bit of Python.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade