Hi! This is a tutorial for Teachable Machine, a web-based tool that makes creating machine learning models fast, easy, and accessible to everyone.
In this tutorial, I’m going to walk you through making a machine learning model to detect snaps, claps, and whistles using audio clips. There are also two other tutorials — one making a machine learning model using images, and one using poses.
So, to get started, I’m going to go to Teachable Machine, and open up a sound project.
HOW TO TRAIN IT
To start training the machine, we first have to create different categories, or classes, to teach it with.
Right away, I can see it’s given me two boxes here on the left: “Background Noise” and “Class 2.” These are the starter classes Teachable Machine gives you when making a sound project.
You’ll always need a background noise class, so the model can detect when none of your sounds are happening. And because background noise in a forest is different from background noise in an office (or anywhere else), you should give that class audio samples from everywhere you foresee using your model.
Since I’m probably just going to use this in my office, I’m going to record 20 seconds of background noise of… nothing happening.
And since I want to detect three things — snaps, claps, and whistles — I’m going to add two new classes, and rename them all to reflect this. Of course, you can make whatever sounds you like — these are just easy to demonstrate.
In each class, I want to record a set of samples (examples the computer can learn from). So, first, I’m going to record a few seconds of me snapping.
[snapping record gif]
This particular machine learning technology learns from samples that are one second long, so I’m going to extract eight one-second samples from this recording by clicking this “extract samples” button. Each class needs at least eight one-second sound samples to train properly. And generally, the more data you give each class, the better the model will be at classifying.
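If you’re curious what “extracting samples” amounts to, here is a minimal Python sketch that cuts a recording into one-second chunks and checks whether there are enough to train. This is only an illustration, not Teachable Machine’s actual code: the function name `split_into_samples`, the 44,100 Hz sample rate, and the stand-in recording of silence are all assumptions I made up for the example.

```python
from typing import List, Sequence

MIN_SAMPLES_PER_CLASS = 8  # the minimum mentioned in this tutorial


def split_into_samples(recording: Sequence[float], sample_rate: int) -> List[Sequence[float]]:
    """Cut a recording into non-overlapping one-second chunks.

    Leftover audio shorter than one second is dropped.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    full_seconds = len(recording) // sample_rate
    return [
        recording[i * sample_rate:(i + 1) * sample_rate]
        for i in range(full_seconds)
    ]


# Hypothetical 8.5-second recording of silence at an assumed 44.1 kHz sample rate.
sample_rate = 44_100
recording = [0.0] * int(8.5 * sample_rate)

samples = split_into_samples(recording, sample_rate)
enough_to_train = len(samples) >= MIN_SAMPLES_PER_CLASS
print(len(samples), enough_to_train)
```

The half second left over at the end is thrown away here, which is one reason to record a little longer than you think you need.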
Now I’m going to do the same for the next two classes by recording some claps and whistles:
So now I have data for all of my classes, which means I’m ready to train the machine learning model:
And now, I can preview how the model I just trained is working! When I’m not making any noise, it just looks like this, which is pretty boring:
But if I try snapping, clapping, and whistling, I can see how well the model learned from my samples:
THINGS TO TRY
One of the fundamental aspects of machine learning is that its predictions can be uncertain. For example, the model I made seems to get snapping and clapping confused occasionally. If you look at the spectrograms (the blueish graphs that visualize the sound) in both of those classes, they look pretty similar. So the model probably has a harder time telling those two sounds apart when new audio comes in.
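To make that uncertainty a bit more concrete: a model like this gives each class a score, and two similar-sounding classes can end up with scores that are close together. The Python sketch below flags a prediction as a close call when the top two scores are within a small margin. The scores, the 0.2 margin, and the name `describe_prediction` are made up for illustration; they are not numbers from my model.

```python
from typing import Dict, Tuple


def describe_prediction(scores: Dict[str, float], margin: float = 0.2) -> Tuple[str, bool]:
    """Return the top-scoring class and whether the call is a close one."""
    if len(scores) < 2:
        raise ValueError("need at least two classes to compare")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (top_label, top_score), (_, runner_up_score) = ranked[0], ranked[1]
    is_uncertain = (top_score - runner_up_score) < margin
    return top_label, is_uncertain


# Hypothetical scores for two incoming sounds.
confused = {"Background Noise": 0.05, "Snap": 0.48, "Clap": 0.42, "Whistle": 0.05}
clear = {"Background Noise": 0.02, "Snap": 0.03, "Clap": 0.05, "Whistle": 0.90}

print(describe_prediction(confused))
print(describe_prediction(clear))
```

In the first case the top class is “Snap,” but “Clap” is right behind it, so the sketch marks it as uncertain. In the second case “Whistle” wins by a wide margin.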
You can try other things to see where the model gets confused.
For example, you could try switching mics, or getting closer or farther away from your mic.
You could try moving someplace with different background noise.
You could try an actual whistle.
Or you could try whistling and snapping at the same time and seeing what it predicts.
WHAT CAN YOU DO WITH THIS?
You can actually export your model to make things with it if you like!
I exported this model and made this website with it on Glitch, which you can try out if you like.
As another example, our friends at Stoj made this game that you control with sound.
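If you’re wondering how a sound can end up controlling something like a game, one common pattern is to map each class to an action and do nothing when the model hears background noise or isn’t confident. Here’s a rough Python sketch of that logic. It is only an assumption about how such a project could work, not how any particular project was built, and the action names, the 0.75 threshold, and `pick_action` are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical mapping from sound classes to game actions.
ACTIONS = {"Snap": "jump", "Clap": "duck", "Whistle": "pause"}


def pick_action(scores: Dict[str, float], threshold: float = 0.75) -> Optional[str]:
    """Return the action for the top class, or None if nothing should happen."""
    if not scores:
        return None
    label = max(scores, key=lambda name: scores[name])
    if scores[label] < threshold:
        return None  # not confident enough to act
    # "Background Noise" has no entry in ACTIONS, so it also returns None.
    return ACTIONS.get(label)


# Hypothetical scores for three moments in time.
moments = [
    {"Background Noise": 0.95, "Snap": 0.02, "Clap": 0.02, "Whistle": 0.01},
    {"Background Noise": 0.05, "Snap": 0.85, "Clap": 0.08, "Whistle": 0.02},
    {"Background Noise": 0.10, "Snap": 0.45, "Clap": 0.40, "Whistle": 0.05},
]
results = [pick_action(scores) for scores in moments]
print(results)
```

Only the confident snap triggers an action here. The quiet moment and the snap-or-clap close call are both ignored, which keeps the game from reacting to sounds the model isn’t sure about.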
You can learn more about using your models in the Teachable Machine FAQ here.
If you’re interested in peeking behind the scenes, you can open this project to see all the samples and try it out.
If you want to try making your own model, go to Teachable Machine.