Play Street Fighter with body movements using Arduino and Tensorflow.js
For the past year, I’ve been learning more about machine learning. I’ve built a few browser experiments, but lately I’ve been spending some time mixing machine learning with another passion of mine: hardware!
The following tutorial is about how I prototyped a gesture recognition system using an Arduino and Tensorflow.js.
This is only a first version and is still very experimental.
What follows covers the main steps I took to build this, with some code samples.
If you want to look at the whole code, you can check the GitHub repo, but be aware that I’m going to change quite a bit of it over the next few months.
I believe the most important thing to understand is the sequence of steps to take, rather than every line of code.
This prototype is about training a machine learning model to recognise body movements such as “punch” or “hadoken” and using them to play a web version of Street Fighter.
The end result looks like this:
To build this, we need some hardware. What I used for my prototype includes:
- Arduino MKR1000
- MPU6050 accelerometer/gyroscope
- A button
I also used a breadboard to put everything together but if you decide to solder it, you’ll probably need a protoboard instead.
In terms of tech stack, I used:
- Node.js
- Tensorflow.js
- Johnny-Five
- WebSockets
Step 1: Gathering data
If we start from scratch, all we have is our idea: playing Street Fighter with our body movements. Now we need to think about how we’re gonna make that happen…
To be able to build a classifier, we need data. This data is gonna be coming from some hardware we’re gonna be holding in our hand.
To get the data, we need to start by assembling our components.
It should look something like this:
The microcontroller I used is an Arduino MKR1000. I picked this model because I already had it at home and it has built-in Wi-Fi, which meant I wouldn’t have to be tethered to my laptop to record gestures. You could also use an Arduino Uno, but you would have to stay tethered to your laptop the whole time. That’s not ideal for this particular prototype, but it would still be enough to get started.
The second part is the accelerometer/gyroscope. I used an MPU6050, which gives you acceleration on the x, y and z axes as well as rotation rate (from the gyroscope) on the same three axes, for a total of 6 values per reading.
Finally, I used a button because I wanted to record data only while performing a gesture. This way, I could record data only while pressing the button and performing a “punch”, for example.
Now that we have assembled our hardware, we need to write the code to get this data.
The code looks something like this:
In the code sample above, we start by requiring the Node.js modules we need and setting up our board with the IP address of our Arduino and the port it communicates on. Then, when the board is ready, we set up our button and MPU6050 sensor, and create a stream so we can write our data to a file. Whenever we get data from the sensor, we store it in a variable and, while the button is held down, write it to the file declared above.
Finally, when we release the button, we close our stream, meaning we don’t write data to this particular file anymore.
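The recording script depends on the board, but the part that decides when to write can be sketched without any hardware. Below is a minimal sketch of that logic, assuming each reading arrives as an array of the 6 MPU6050 values and is written to the file as one space-separated line. The names createRecorder and openStream, the line format, the pin number and the file path are hypothetical, not taken from the repo; the Johnny-Five wiring is only shown in comments because it needs a connected board.

```javascript
// Hypothetical sketch of the "only write while the button is held" logic.
// openStream is any function returning an object with write() and end(),
// e.g. () => fs.createWriteStream("data/sample_punch_0.txt").
function createRecorder(openStream) {
  let latest = null; // most recent reading: [ax, ay, az, gx, gy, gz]
  let stream = null; // only open while the button is held down

  return {
    // Called on every sensor reading; writes it only while recording.
    onData(reading) {
      latest = reading;
      if (stream) {
        // Assumed file format: one reading per line, space-separated values.
        stream.write(latest.join(" ") + "\n");
      }
    },
    // Button pressed: open a new file for this sample
    // (closing any stream left open by a missed release).
    onPress() {
      if (stream) stream.end();
      stream = openStream();
    },
    // Button released: close the file so nothing more is written to it.
    onRelease() {
      if (stream) {
        stream.end();
        stream = null;
      }
    },
  };
}

// Wiring with Johnny-Five (needs a connected board, so shown as comments only;
// the pin number and file path are assumptions):
// const fs = require("fs");
// const five = require("johnny-five");
// const board = new five.Board({ /* connection options: IP address, port */ });
// board.on("ready", () => {
//   const button = new five.Button(2);
//   const imu = new five.IMU({ controller: "MPU6050" });
//   const recorder = createRecorder(() => fs.createWriteStream("data/sample_punch_0.txt"));
//   button.on("press", recorder.onPress);
//   button.on("release", recorder.onRelease);
//   imu.on("change", function () {
//     const a = this.accelerometer;
//     const g = this.gyro;
//     recorder.onData([a.x, a.y, a.z, g.x, g.y, g.z]);
//   });
// });
```

Keeping the write-or-skip decision in a plain object like this makes it easy to check with fake streams before plugging in the real sensor.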
This recording script writes a single gesture sample to a file. However, each gesture needs multiple samples, so you would have to modify it to record punch sample 2, punch sample 3, 4, etc.
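One way to avoid editing the script for every new sample is to pick the next free file name automatically. This is a minimal sketch, assuming all samples live in one folder and follow the sample_<gesture>_<n>.txt naming used later in this post; nextSampleFile is a hypothetical helper.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: returns the path of the first unused
// sample_<gesture>_<n>.txt file in dataDir, so new recordings
// never overwrite earlier ones.
function nextSampleFile(dataDir, gesture) {
  let n = 0;
  while (fs.existsSync(path.join(dataDir, `sample_${gesture}_${n}.txt`))) {
    n++;
  }
  return path.join(dataDir, `sample_${gesture}_${n}.txt`);
}
```

Opening fs.createWriteStream(nextSampleFile("data", "punch")) each time the button is pressed then gives one new file per recorded punch.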
Once the gesture data is recorded, we need a second step before we can use it: data processing.
Step 2: Data processing
At the moment, all we have is a folder full of files with sensor data that should look something like this:
To be able to use this, we have to read the data from these files and transform it into something Tensorflow.js can use.
1. Read data from files
I’m not going to go through the code for this, as plenty of other blog posts have covered it and I’m sure you could figure it out if you needed to.
The main goal is to go through each file in our data folder, read it line by line, and transform the data from the format above into an array of objects.
What we want is for our data to look something like this:
What we’re doing here is going from lines in a file called sample_punch_0.txt to something we can start working with.
The features array holds the data for a single gesture sample, and the label identifies which gesture it is.
We don’t want to work with strings, so if we want to train 3 different gestures, we can keep a gesture array of [‘hadoken’, ‘punch’, ‘uppercut’] and use each gesture’s index in that array as its label. In this case, a label of 1 maps to ‘punch’.
We need to do this for all of our data files, so in the end we have a big array of gesture objects, like this:
We’ve now transformed all our files into objects of labels and features.
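For reference, here is a minimal sketch of that reading step in Node.js. It assumes one reading per line with space-separated values, file names following sample_<gesture>_<n>.txt, and that all readings of a sample are flattened into a single features array; parseSample and loadDataset are hypothetical names.

```javascript
const fs = require("fs");
const path = require("path");

const gestureClasses = ["hadoken", "punch", "uppercut"];

// Turns the text of one sample file into { features, label }.
// Assumes one reading per line with space-separated numbers;
// all readings are flattened into one features array.
function parseSample(text, gesture) {
  const features = text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .flatMap((line) => line.trim().split(/\s+/).map(Number));
  // The label is the gesture's index in gestureClasses (e.g. "punch" -> 1).
  return { features, label: gestureClasses.indexOf(gesture) };
}

// Reads every sample_<gesture>_<n>.txt file in dataDir into an array of
// { features, label } objects, skipping other files and unknown gestures.
function loadDataset(dataDir) {
  const samples = [];
  for (const file of fs.readdirSync(dataDir)) {
    const match = file.match(/^sample_(.+)_(\d+)\.txt$/);
    if (!match || !gestureClasses.includes(match[1])) continue;
    const text = fs.readFileSync(path.join(dataDir, file), "utf8");
    samples.push(parseSample(text, match[1]));
  }
  return samples;
}
```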
However, this isn’t ready to be used with Tensorflow yet. We need to keep transforming our data into something the framework can use.
2. Formatting the data
At this stage, we’re going to transform our objects into 2 arrays: one for the labels and one for the features.
What we aim for is something like:
With the format above, labels and features are separated but still mapped to each other: the first entry of the labels array holds all the gestures with a label of 0 (“hadoken”, for example), and the first entry of the features array holds all the data for our hadoken gestures.
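A minimal sketch of this grouping, assuming the { features, label } objects from the previous step and integer labels starting at 0; groupByLabel is a hypothetical helper name.

```javascript
// Hypothetical helper: groups { features, label } objects by label.
// labels[i] and features[i] both describe gesture class i
// (e.g. i = 0 for "hadoken"), so the two arrays stay mapped to each other.
function groupByLabel(samples, numClasses) {
  const labels = Array.from({ length: numClasses }, () => []);
  const features = Array.from({ length: numClasses }, () => []);
  for (const { features: sampleFeatures, label } of samples) {
    labels[label].push(label);
    features[label].push(sampleFeatures);
  }
  return { labels, features };
}
```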
Now we’re really close to something Tensorflow can work with, except that the framework works with a special data structure called a tensor.
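Tensorflow.js can build a 2D tensor from a nested array, but it needs one row per sample and every row must have the same length. Since each recording lasts as long as the button is held, samples will usually differ in length. The sketch below, a hypothetical toTrainingRows helper, flattens the grouped arrays into rows, truncates or zero-pads each one to an assumed fixed size, and one-hot encodes the labels (one column per gesture). Truncating and zero-padding is just one simple choice; resampling each recording to a fixed number of readings would be another.

```javascript
// Hypothetical helper: turns grouped { labels, features } arrays (where
// features[i] holds every sample of class i) into row-aligned xs and
// one-hot ys, the rectangular nested arrays tf.tensor2d() accepts.
// numValues is an assumed fixed row length.
function toTrainingRows(grouped, numClasses, numValues) {
  const xs = [];
  const ys = [];
  grouped.features.forEach((samplesOfClass, classIndex) => {
    for (const sample of samplesOfClass) {
      const row = sample.slice(0, numValues); // truncate long samples
      while (row.length < numValues) row.push(0); // zero-pad short samples
      xs.push(row);
      // One-hot label: 1 at the class index, 0 everywhere else.
      ys.push(Array.from({ length: numClasses }, (_, i) => (i === classIndex ? 1 : 0)));
    }
  });
  return { xs, ys };
}

// With Tensorflow.js loaded as tf:
// const featuresTensor = tf.tensor2d(xs); // shape [numSamples, numValues]
// const labelsTensor = tf.tensor2d(ys);   // shape [numSamples, numClasses]
```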
3. Converting to tensors
This is where we start using specific Tensorflow.js code. Using built-in methods, we’re going to transform our arrays into tensors.
To do this, here’s a code sample:
Now we have a tensor for labels and one for features. We’re almost ready to train our model! But first, one last step is to split the data into a training set and a testing set.
Starting from the tensors in the code sample above, we need to split both the labels and features tensors into training and testing sets.
We do this because we want to use about 80% of our data to train the model and keep the remaining 20% to check its predictions on samples it hasn’t seen during training.
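A minimal sketch of an 80/20 split. Note that it swaps in a slightly different approach from the post: it shuffles and splits the plain row arrays before they are turned into tensors, instead of splitting the tensors afterwards. Shuffling matters because the rows are ordered by gesture, so taking the last 20% without shuffling would leave whole gestures out of training. trainTestSplit is a hypothetical name.

```javascript
// Hypothetical helper: shuffles rows and splits them into training and
// testing sets while keeping xs[i] paired with ys[i].
function trainTestSplit(xs, ys, trainFraction = 0.8, random = Math.random) {
  // Fisher-Yates shuffle of the row indices.
  const order = xs.map((_, i) => i);
  for (let i = order.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [order[i], order[j]] = [order[j], order[i]];
  }
  const cut = Math.round(order.length * trainFraction);
  const pick = (rows, indices) => indices.map((i) => rows[i]);
  const trainIdx = order.slice(0, cut);
  const testIdx = order.slice(cut);
  return {
    trainXs: pick(xs, trainIdx),
    trainYs: pick(ys, trainIdx),
    testXs: pick(xs, testIdx),
    testYs: pick(ys, testIdx),
  };
}
```

Passing each of the four arrays to tf.tensor2d then gives the training and testing tensors for labels and features.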
Now that we have our training and testing tensors for both labels and features, we’re ready to create our model.
Step 3: Training the model
Creating the model is a bit more experimental than the previous steps. Your model could be built in a lot of different ways, and you can play around with its parameters: the number of layers in your neural network, the number of epochs (full passes over the training data) you want to go through, etc…
There is no single right way to create the model. As you change parameters, you should see the accuracy and predictions of your model change, and you can stop tweaking once you reach a level of accuracy you’re happy with.
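Accuracy here means the share of test samples whose predicted gesture matches the true one. Tensorflow.js can report it during training if the model is compiled with metrics: ['accuracy'], but computing it yourself on the testing set is simple. A minimal sketch, assuming predictions and true labels are both arrays of class indices; accuracy is a hypothetical helper name.

```javascript
// Hypothetical helper: fraction of samples whose predicted class index
// equals the true class index.
function accuracy(predicted, actual) {
  if (predicted.length !== actual.length || actual.length === 0) {
    throw new Error("need two non-empty arrays of the same length");
  }
  const correct = predicted.filter((p, i) => p === actual[i]).length;
  return correct / actual.length;
}
```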
My current model is created this way:
The last line of this code sample saves the model as a file in your application. This way, you can use it for the last step, predicting new samples of data!
Step 4: Using the model for predictions
Now that our training process is done, our model is ready to classify new samples as “hadoken”, “punch” or “uppercut”.
With the code sample above, we get live data while holding the button down and performing one of the gestures we trained. Once we release the button, we run our predict function on this new sample, which the model has never seen before. We get back an index we can use in our gestureClasses array to get the predicted gesture.
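The index comes from taking the class with the highest score in the model’s output (an argmax). A minimal sketch of that last mapping, assuming the scores for a single sample have already been read out of the output tensor, for example with model.predict(input).dataSync() where input has shape [1, number of values]; predictedGesture is a hypothetical name.

```javascript
const gestureClasses = ["hadoken", "punch", "uppercut"];

// Hypothetical helper: argmax over one sample's scores (a plain array or
// typed array), mapped back to a gesture name via gestureClasses.
function predictedGesture(scores) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return { index: best, gesture: gestureClasses[best] };
}
```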
And we’re done! 🎉
As I said at the beginning of this tutorial, the most important thing is to understand the steps you’d need to go through if you wanted to build something similar. If you don’t understand the code entirely, it’s totally ok!
A cool thing to know is that, while you do need a way to get data, it doesn’t have to involve an Arduino. I built a version of this project using a Daydream controller and also… a mobile phone!
Most modern phones have a built-in accelerometer and gyroscope you can use to gather data for this type of experiment. The code would have to change a little because, instead of Johnny-Five, you’d need to use the Generic Sensor API, which would look something like:
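A minimal sketch of collecting the same 6 values in the browser with the Generic Sensor API’s Accelerometer and Gyroscope classes. Passing the sensor classes in as a parameter (window in a real page) is an assumption made so the logic can also run outside a browser; startMotionCapture and the 60 Hz frequency are hypothetical choices. These sensors are only available in secure (HTTPS) contexts and in browsers that support the API, and the gyroscope values stay null until its first reading arrives.

```javascript
// Hypothetical helper: in a browser, call startMotionCapture(window, onSample).
// Emits one [ax, ay, az, gx, gy, gz] sample per accelerometer reading,
// paired with the latest gyroscope values.
function startMotionCapture(sensorApi, onSample, frequency = 60) {
  const accelerometer = new sensorApi.Accelerometer({ frequency });
  const gyroscope = new sensorApi.Gyroscope({ frequency });

  accelerometer.addEventListener("reading", () => {
    onSample([
      accelerometer.x, accelerometer.y, accelerometer.z,
      gyroscope.x, gyroscope.y, gyroscope.z,
    ]);
  });

  accelerometer.start();
  gyroscope.start();

  // Returns a function that stops both sensors.
  return () => {
    accelerometer.stop();
    gyroscope.stop();
  };
}
```

Each sample has the same shape as one MPU6050 reading, so it could be recorded while a button on the page is held and processed exactly like the Arduino data.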
I might write another post later as I improve the code little by little and eventually build other experiments :)
Thanks for reading! 💚