Classifying Music and Speech with Machine Learning

An audio classification walkthrough with code

Code AI Blogs
CodeAI
4 min read · May 15, 2021

Introduction

The difference between music and speech is crystal clear to human ears, but how do you train a machine to make the same distinction?

My goal is to create a classifier that can differentiate between music and speech.

Like my earlier articles on Pokémon and waste classification, I’ll do this using a convolutional neural network.

I based my approach and model on this TensorFlow tutorial, which builds a speech recognition network that recognizes 10 different keywords:

Data Source

For this project, I’ll use the GTZAN music speech dataset:

It is part of the TensorFlow Datasets catalog, where it is described as 120 tracks that are each 30 seconds long, although the file count later in this walkthrough comes to 128. Under “Display Examples…” at the above link, you can listen to samples from both the music and speech classes.

Setup

First things first, I use pip to install the Pydub library, a Python library for manipulating audio.

You can read more about Pydub here:

Alternatively, you can install Pydub from the command line.

Now I’ll import all the libraries we’ll need for this project:

Then, I load the dataset from TensorFlow Datasets, set up a path to the directory where the data is stored, and store the names of the categories in a list.
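Here is a minimal sketch of the last part of this step. It assumes, hypothetically, that the extracted archive holds one subfolder per class, such as music_wav and speech_wav, so the category names can be read straight from the directory. The helper name list_categories is illustrative.

```python
from pathlib import Path


def list_categories(data_dir):
    """Return the sorted names of the class subdirectories in data_dir.

    Assumes (hypothetically) a layout like:
        data_dir/music_wav/*.wav
        data_dir/speech_wav/*.wav
    """
    data_dir = Path(data_dir)
    # Each subdirectory is treated as one class. Sorting keeps the
    # label order stable across runs and machines.
    return sorted(p.name for p in data_dir.iterdir() if p.is_dir())
```

Stray files such as a README are ignored because only directories are kept.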

Note: WAV is an audio file format.

Now, we’ll get all the filenames:

From the output, we see that we have 128 samples, with 64 for each class.
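Collecting the filenames and checking the class balance could look like the sketch below. It assumes the same hypothetical one-folder-per-class layout. collect_wav_files and count_per_class are illustrative names. Shuffling with a fixed seed is also an assumption, added so that the split that follows mixes both classes reproducibly.

```python
import random
from collections import Counter
from pathlib import Path


def collect_wav_files(data_dir, seed=42):
    """Return a shuffled list of every .wav file one folder below data_dir."""
    # Sort first so the shuffle result depends only on the seed,
    # not on the order in which the filesystem lists files.
    files = sorted(str(p) for p in Path(data_dir).glob("*/*.wav"))
    random.Random(seed).shuffle(files)
    return files


def count_per_class(files):
    """Count files per class, using each file's parent folder as its class."""
    return Counter(Path(f).parent.name for f in files)
```

On the full dataset, count_per_class(collect_wav_files(data_dir)) should report the same 64/64 balance described above, with data_dir standing in for wherever the archive was extracted.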

Finally, we’ll split the dataset into training and validation sets in a 3:1 ratio. The TensorFlow audio recognition tutorial also creates a test set, but I’ll skip that here as we’re working with a tiny dataset.
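A 3:1 split of an already-shuffled file list comes down to a single slice. This is a sketch, not the author's exact code. With 128 files, a training fraction of 0.75 gives 96 training files and 32 validation files.

```python
def split_train_val(files, train_fraction=0.75):
    """Split an already-shuffled list into training and validation parts."""
    n_train = int(len(files) * train_fraction)
    # Everything before n_train is for training; the rest is for validation.
    return files[:n_train], files[n_train:]
```

Because the split is random rather than stratified, each part may not contain exactly 48 and 16 files per class. Splitting each class separately would guarantee that balance.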

Data Preprocessing

To start, let’s create a dataset with the waveform and label for each training file.
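The TensorFlow tutorial handles this step with tf.audio.decode_wav and takes each label from the file path. The sketch below shows the same idea using only the standard library. It reads a 16-bit mono WAV file, which is how the GTZAN tracks are described (22,050 Hz, mono, 16-bit), into floats and uses the parent folder name as the label. The function names are hypothetical.

```python
import array
import sys
import wave
from pathlib import Path


def decode_wav(path):
    """Read a 16-bit PCM mono .wav file into floats in [-1.0, 1.0).

    Returns (samples, sample_rate).
    """
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    # WAV stores samples little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        pcm.byteswap()
    # Scale int16 values to floats, similar to how tf.audio.decode_wav
    # scales 16-bit audio.
    return [s / 32768.0 for s in pcm], sample_rate


def get_label(path):
    """Use the name of the file's parent folder as its label."""
    return Path(path).parent.name


def get_waveform_and_label(path):
    """Return (waveform, label) for one audio file."""
    samples, _ = decode_wav(path)
    return samples, get_label(path)
```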

Here’s what the waveforms look like:

As we’re using a convolutional neural network for this project, we need to transform the waveforms into spectrograms. A spectrogram shows how the frequency content of a signal changes over time and can be treated as a 2D image. We’ll create a function for this conversion:
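A spectrogram is usually computed with a short-time Fourier transform (STFT). The method slides a window along the waveform and takes the magnitude of the FFT of each windowed frame. The TensorFlow tutorial calls tf.signal.stft(waveform, frame_length=255, frame_step=128) after zero-padding every clip to a fixed length. The NumPy sketch below reproduces that computation by hand so that each step is visible. The parameter values are assumptions borrowed from the tutorial, not necessarily the values used for this article.

```python
import numpy as np


def get_spectrogram(waveform, num_samples, frame_length=255, frame_step=128):
    """Magnitude spectrogram of a 1-D waveform, shape (frames, freq_bins).

    The waveform is zero-padded or trimmed to num_samples (which must be
    at least frame_length) so that every clip gives the same shape.
    """
    waveform = np.asarray(waveform, dtype=np.float32)[:num_samples]
    waveform = np.pad(waveform, (0, num_samples - len(waveform)))
    # Periodic Hann window, the default window for tf.signal.stft.
    n = np.arange(frame_length)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_length)
    # FFT size: smallest power of two >= frame_length (256 for 255).
    fft_length = 1 << (frame_length - 1).bit_length()
    num_frames = 1 + (num_samples - frame_length) // frame_step
    frames = np.stack([
        waveform[i * frame_step : i * frame_step + frame_length] * window
        for i in range(num_frames)
    ])
    # rfft zero-pads each frame to fft_length and keeps the
    # fft_length // 2 + 1 non-negative frequency bins.
    return np.abs(np.fft.rfft(frames, n=fft_length, axis=-1))
```

For a 30-second clip at 22,050 Hz, num_samples=22050 * 30 gives a spectrogram of shape (5166, 129), which is 5166 frames of 129 frequency bins each.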

As an example, this is the conversion for one music sample:

Note: unfortunately, IPython’s display does not render properly within GitHub gists. If you uncomment the last two lines in this code cell, you will have the option of playing the audio sample within the cell’s output under Audio playback.

For comparison, we’ll plot both the waveform and spectrogram for this sample. In a spectrogram, the color at each point reflects the magnitude of a given frequency at a given moment in time.
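Plotting the raw magnitudes tends to show little beyond a few loud low-frequency bands, because the values span several orders of magnitude. A common fix, and the one the TensorFlow tutorial uses, is to take the logarithm before plotting. Here is a minimal sketch, with spectrogram_for_display as a hypothetical helper name:

```python
import numpy as np


def spectrogram_for_display(spectrogram):
    """Convert a (frames, bins) magnitude spectrogram into a log-scaled image.

    The result is transposed to (bins, frames) so that, when drawn with
    matplotlib, time runs along the x-axis and frequency along the y-axis.
    A tiny epsilon avoids log(0) on silent frames.
    """
    spectrogram = np.asarray(spectrogram, dtype=np.float64)
    eps = np.finfo(np.float64).eps
    return np.log(spectrogram.T + eps)
```

The result can then be drawn with matplotlib, for example with ax.pcolormesh(spectrogram_for_display(spec)).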

Now, we’ll do the same preprocessing for the rest of the training set and the validation set.
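Before training, the spectrograms and labels need to be turned into arrays that a Keras model can consume. Each spectrogram gains a trailing channel axis for Conv2D, and each label string becomes an integer index into the category list. The tutorial does the equivalent with tf.expand_dims and tf.argmax. Here is a sketch, with to_training_arrays as an illustrative name:

```python
import numpy as np


def to_training_arrays(examples, class_names):
    """Turn (spectrogram, label) pairs into model-ready arrays.

    examples: iterable of (2-D spectrogram, label string) pairs, where every
        spectrogram has the same shape.
    class_names: ordered list of labels; a label's index becomes its integer id.
    """
    spectrograms, label_ids = [], []
    for spectrogram, label in examples:
        # Conv2D expects a channels axis: (frames, bins) -> (frames, bins, 1).
        spec = np.asarray(spectrogram, dtype=np.float32)[..., np.newaxis]
        spectrograms.append(spec)
        label_ids.append(class_names.index(label))
    return np.stack(spectrograms), np.array(label_ids, dtype=np.int64)
```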

Training the Model

We’re now ready to train our classifier! If you’re using Google Colab, I recommend using the GPU hardware accelerator to speed up the process.

Let’s start by selecting the batch size and speeding up the input pipeline with cache() and prefetch():
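With tf.data, the batching, caching and prefetching can be chained onto a dataset. This sketch assumes the preprocessed features and integer labels are already NumPy arrays. The default batch size of 64 is an assumption taken from the TensorFlow tutorial, not a value confirmed for this article.

```python
import numpy as np
import tensorflow as tf


def make_dataset(features, label_ids, batch_size=64):
    """Build a batched, cached, prefetching tf.data pipeline."""
    ds = tf.data.Dataset.from_tensor_slices((features, label_ids))
    ds = ds.batch(batch_size)
    # cache() keeps the batches in memory after the first epoch, so later
    # epochs skip the upstream work. prefetch() prepares the next batch
    # while the current one is being used for training.
    return ds.cache().prefetch(tf.data.AUTOTUNE)
```

With 96 training files and a batch size of 64, this produces two batches per epoch: one of 64 examples and one of 32.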

Then we’ll build, compile, and fit the model:
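The model itself is a small convolutional network. The sketch below follows the structure of the CNN in the TensorFlow tutorial: resizing, normalization, two convolutions, pooling, dropout and dense layers, with one output per class. The exact layer sizes are assumptions and may differ from the ones used here. The sketch uses the Keras Resizing and Normalization layers, which lived under tf.keras.layers.experimental.preprocessing in TensorFlow releases older than 2.6.

```python
import numpy as np
import tensorflow as tf

layers = tf.keras.layers


def build_model(input_shape, num_labels, train_features):
    """A small CNN spectrogram classifier (layer sizes are assumptions)."""
    # Learn the mean and variance of the training spectrograms
    # (one pair per channel, since the default axis is -1).
    norm_layer = layers.Normalization()
    norm_layer.adapt(train_features)
    model = tf.keras.Sequential([
        layers.Input(shape=input_shape),
        # Downsample each spectrogram to a 32x32 image to keep the model small.
        layers.Resizing(32, 32),
        norm_layer,
        layers.Conv2D(32, 3, activation="relu"),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_labels),  # raw logits, one per class
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model
```

Training would then be a call such as model.fit(train_ds, validation_data=val_ds, epochs=num_epochs). In that call, train_ds and val_ds are the batched pipelines and num_epochs is a value you choose. fit returns a History object whose history dict holds the per-epoch loss and accuracy.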

Results

Now that we have our trained classifier, let’s plot its loss and accuracy during training:
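The History object returned by model.fit records one value per epoch for each metric. When validation data is supplied, these values are stored under keys such as loss, accuracy, val_loss and val_accuracy. Here is a minimal matplotlib sketch of the two plots, with plot_history as a hypothetical helper:

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; this line can be dropped in a notebook
import matplotlib.pyplot as plt


def plot_history(history):
    """Plot training/validation loss and accuracy from a History.history dict."""
    epochs = range(1, len(history["loss"]) + 1)
    fig, (loss_ax, acc_ax) = plt.subplots(1, 2, figsize=(12, 4))
    loss_ax.plot(epochs, history["loss"], label="training")
    loss_ax.plot(epochs, history["val_loss"], label="validation")
    loss_ax.set(title="Loss", xlabel="Epoch")
    acc_ax.plot(epochs, history["accuracy"], label="training")
    acc_ax.plot(epochs, history["val_accuracy"], label="validation")
    acc_ax.set(title="Accuracy", xlabel="Epoch")
    for ax in (loss_ax, acc_ax):
        ax.legend()
    return fig
```

Usage would be plot_history(history.history), where history is the value returned by model.fit.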

From the loss curve, we see that both training and validation loss decrease sharply before leveling off. The accuracy plot shows the mirror image, with both training and validation accuracy rising before leveling off. Remarkably, validation accuracy peaks and levels off at 100% on the 32 validation tracks!

Conclusion

We’ve successfully trained a classifier to differentiate between music and speech! But this is just a small step into the realm of audio-related machine learning. Here are some ideas for going further:

  • Test out other machine learning models
  • Experiment with data augmentation and hyperparameter tuning
  • Use mel spectrograms instead of spectrograms and compare the resulting model’s performance to the one trained here
  • Give transfer learning a try by following this guide:
  • Look into other audio classification tasks and datasets! You can find a handful from TensorFlow at the following link:
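For the mel spectrogram idea, one option is to project the existing linear-frequency spectrogram onto the mel scale with tf.signal.linear_to_mel_weight_matrix. The sketch assumes a (frames, bins) magnitude spectrogram and a 22,050 Hz sample rate. The 64 mel bins and the 20 Hz to 8,000 Hz range are illustrative choices, not tuned values.

```python
import numpy as np
import tensorflow as tf


def to_mel(spectrogram, sample_rate=22050, num_mel_bins=64,
           lower_edge_hertz=20.0, upper_edge_hertz=8000.0):
    """Project a (frames, bins) linear magnitude spectrogram onto the mel scale."""
    spectrogram = tf.convert_to_tensor(spectrogram, dtype=tf.float32)
    num_spectrogram_bins = spectrogram.shape[-1]
    # Triangular mel filters, shape (num_spectrogram_bins, num_mel_bins).
    mel_matrix = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins, num_spectrogram_bins, sample_rate,
        lower_edge_hertz, upper_edge_hertz)
    # Each output frame is a weighted sum of the linear-frequency bins.
    return tf.tensordot(spectrogram, mel_matrix, 1)
```

A common next step is to take the log of the result, for example tf.math.log(mel + 1e-6), before feeding it to the CNN.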

References

In addition to the ones linked throughout this article, I wouldn’t have been able to complete this project without the help of this awesome tutorial:

[1] Parul Pandey, Music Genre Classification with Python, Medium
