Music Genre Classification using Random Forest

Sidharth Pandita
Published in hackerdawn
5 min read · May 29, 2021
Photo by Cezar Sampaio on Unsplash

If Music is a Place — then Jazz is the City, Folk is the Wilderness, Rock is the Road, Classical is a Temple.

— Vera Nazarian

Music is an essential part of our lives, and music streaming companies like Spotify now use machine learning to create recommendations for us. Music genres play a big role in creating these recommendations. In this story, we will build a model that classifies music tracks into their respective genres. For this tutorial, we’ll use librosa, a library for music and audio analysis. We will also use the GTZAN Dataset from Kaggle to train our classifier.

Importing Libraries

import os
import pandas as pd
import numpy as np
import IPython
import librosa
import librosa.display
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.preprocessing import minmax_scale
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

Different Genres

Let’s see the entire list of genres present in the dataset.

general_path = './Data'
# Each sub-folder of genres_original is one genre
print(os.listdir(f'{general_path}/genres_original/'))

Taking a Single Audio

We’ll take a single audio file for exploration.

file = './Data/genres_original/classical/classical.00050.wav'
signal , sr = librosa.load(file , sr = 22050)

Signal Visualization

Let’s use a wave plot to visualize the audio file or signal.

plt.figure(figsize=(15,5))
# waveplot was removed in librosa 0.10; waveshow is its replacement
librosa.display.waveshow(signal, sr = sr)
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.title("Classical music signal")
plt.show()

Playing the Audio

To play the audio in the Jupyter Notebook, we’ll use the following code.

IPython.display.Audio(signal, rate=sr)

Fourier Transform

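The Fourier transform takes a signal in the time domain as input and outputs its decomposition into frequencies. Here we use librosa’s short-time Fourier transform (STFT), which applies the transform to successive windows of n_fft samples, moving hop_length samples forward each time. Let’s plot the resulting magnitudes to see the distribution of frequencies.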

n_fft = 2048
hop_length = 512
D = np.abs(librosa.stft(signal, n_fft = n_fft, hop_length = hop_length))
plt.figure(figsize = (15, 5))
plt.plot(D)
plt.title('Fourier Transform')
plt.show()
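To make "decomposition into frequencies" concrete, here is a minimal sketch that uses a synthetic tone instead of a GTZAN file. The 440 Hz sine and the one-second duration are our own assumptions for illustration; only NumPy's FFT is used, not librosa.

```python
import numpy as np

sr = 22050                               # same sample rate passed to librosa.load above
t = np.arange(sr) / sr                   # one second of sample times
tone = np.sin(2 * np.pi * 440.0 * t)     # synthetic 440 Hz sine (assumed test signal)

spectrum = np.abs(np.fft.rfft(tone))            # magnitude of each frequency bin
freqs = np.fft.rfftfreq(len(tone), d=1 / sr)    # frequency of each bin in Hz

peak_hz = freqs[np.argmax(spectrum)]     # bin with the most energy
print(f"Strongest frequency: {peak_hz:.1f} Hz")
```

The strongest bin lands on the frequency of the sine, which is the kind of information each column of the STFT matrix D holds for one short window of the track.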

Spectrogram

A spectrogram shows how loud a signal is at each frequency as time passes. We first convert the STFT magnitudes to decibels, then plot them with time on the x-axis and frequency on a log-scaled y-axis. Let’s plot a spectrogram for our audio file.

DB = librosa.amplitude_to_db(D, ref = np.max)
plt.figure(figsize = (15, 5))
librosa.display.specshow(DB, sr = sr, hop_length = hop_length, x_axis = 'time', y_axis = 'log', cmap = 'cool')
plt.colorbar()
plt.title('Spectrogram')
plt.show()
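The decibel conversion can be sketched in plain NumPy. This is our approximation of what librosa.amplitude_to_db does with ref = np.max, assuming its default floor (amin = 1e-5) and dynamic range (top_db = 80); the function name amplitude_to_db_sketch is hypothetical.

```python
import numpy as np

def amplitude_to_db_sketch(S, amin=1e-5, top_db=80.0):
    """Approximate dB conversion relative to the loudest value in S."""
    S = np.asarray(S, dtype=float)
    ref = S.max()                                   # plays the role of ref = np.max
    db = 20.0 * np.log10(np.maximum(amin, S))       # amplitude to dB, floored at amin
    db -= 20.0 * np.log10(max(amin, ref))           # make the loudest value 0 dB
    return np.maximum(db, db.max() - top_db)        # clip everything below -top_db

levels = amplitude_to_db_sketch([1.0, 0.1, 0.01, 1e-6])
print(levels)
```

Each tenfold drop in amplitude is a 20 dB drop, and very quiet values are clipped at 80 dB below the peak, which is why the colorbar of the spectrogram runs from 0 dB down to -80 dB.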

Harmonic & Percussive Components

librosa.effects.hpss performs harmonic-percussive source separation. It splits the signal into a harmonic component (sustained, pitched sounds such as held notes) and a percussive component (short, sharp transients such as drum hits). Let’s plot both of them on a graph.

harmonic, percussive = librosa.effects.hpss(signal)
plt.figure(figsize = (15, 5))
plt.plot(harmonic, color = '#FF5E33')
plt.plot(percussive, color = '#FFD433')
plt.title('Harmonic and Percussive Components')
plt.show()
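The separation relies on median filtering of the spectrogram: sustained tones form horizontal lines (steady over time), while clicks form vertical lines (spread across frequencies). The sketch below shows that idea on a toy 9×9 "spectrogram"; the grid, the kernel size of 5, and the helper median_filter_1d are our own assumptions, and librosa's real implementation adds soft masking on top of this.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter_1d(S, size, axis):
    """Median-filter a 2-D array along one axis (hypothetical helper)."""
    pad = size // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(S, pad_width, mode="reflect")
    windows = sliding_window_view(padded, size, axis=axis)  # window axis is appended last
    return np.median(windows, axis=-1)

# Toy spectrogram: rows are frequency bins, columns are time frames
S = np.zeros((9, 9))
S[4, :] = 1.0    # a sustained tone: one frequency, all frames
S[:, 4] = 1.0    # a click: all frequencies, one frame

harmonic = median_filter_1d(S, size=5, axis=1)     # smooth along time, keeps the tone
percussive = median_filter_1d(S, size=5, axis=0)   # smooth along frequency, keeps the click

print(harmonic[4], harmonic[0])
print(percussive[:, 4], percussive[:, 0])
```

Filtering along time removes the click and keeps the tone; filtering along frequency does the opposite.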

Spectral Centroid

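The spectral centroid indicates where the center of mass of the spectrum is located. Higher values correspond to brighter-sounding audio. We compute it for every frame and overlay the normalized result on the waveform.

spectral_centroids = librosa.feature.spectral_centroid(y=signal, sr=sr)[0]
plt.figure(figsize=(15, 5))
frames = range(len(spectral_centroids))
t = librosa.frames_to_time(frames, sr=sr)
# waveplot was removed in librosa 0.10; waveshow is its replacement
librosa.display.waveshow(signal, sr=sr, alpha=0.4)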
plt.plot(t, minmax_scale(spectral_centroids,axis=0), color='r')
plt.title('Spectral Centroid')
plt.show()
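The "center of mass" is a magnitude-weighted average of the frequencies. Here is a small sketch on a synthetic signal; the two sine frequencies (1000 Hz and 3000 Hz) are assumed values chosen so the answer is easy to check by hand.

```python
import numpy as np

sr = 22050
t = np.arange(sr) / sr
# Two equally loud sines (assumed test signal, not GTZAN audio)
mix = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 3000 * t)

magnitude = np.abs(np.fft.rfft(mix))
freqs = np.fft.rfftfreq(len(mix), d=1 / sr)

# Weighted mean of the bin frequencies, with magnitudes as weights
centroid_hz = np.sum(freqs * magnitude) / np.sum(magnitude)
print(f"Spectral centroid: {centroid_hz:.1f} Hz")
```

With equal energy at 1000 Hz and 3000 Hz, the centroid sits halfway between them, near 2000 Hz. librosa computes the same kind of average once per STFT frame, which is why the plot above shows a curve rather than a single number.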

Chromagram

We will create a chromagram in which the entire spectrum is projected onto 12 bins representing the 12 distinct semitones of the musical octave.

hop_length = 5000
chromagram = librosa.feature.chroma_stft(y=signal, sr=sr, hop_length=hop_length)
plt.figure(figsize=(15, 5))
librosa.display.specshow(chromagram, x_axis='time', y_axis='chroma', hop_length=hop_length, cmap='YlGnBu')
plt.title('Chromagram')
plt.show()
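To see what "projecting onto 12 bins" means, here is a rough sketch that maps a single frequency to its pitch class using the standard MIDI note formula. The helper pitch_class is hypothetical, and librosa's chroma_stft uses a filter bank over the whole spectrum rather than this one-frequency shortcut.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz):
    """Return the name of the semitone closest to freq_hz, ignoring the octave."""
    midi = 69 + 12 * math.log2(freq_hz / 440.0)   # A4 = 440 Hz = MIDI note 69
    return NOTE_NAMES[round(midi) % 12]           # fold every octave onto 12 bins

for f in (261.63, 329.63, 440.0, 880.0):
    print(f, "->", pitch_class(f))
```

Both 440 Hz and 880 Hz fall into the same bin because they are the same note one octave apart. That folding is what makes a chromagram useful for comparing harmony regardless of register.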

30 Second Features

The ‘features_30_sec.csv’ file contains audio features over a duration of 30 seconds. We will use it for studying beats per minute.

data = pd.read_csv('./Data/features_30_sec.csv')
data.head()

Beats per Minute

Let’s see the beats per minute for different genres.

x = data[["label", "tempo"]]
f, ax = plt.subplots(figsize=(15, 5))
sns.boxplot(x = "label", y = "tempo", data = x, palette = 'husl')
plt.title('BPM for Genres', fontsize = 20)
plt.xticks(fontsize = 14)
plt.yticks(fontsize = 10);
plt.xlabel("Genre", fontsize = 15)
plt.ylabel("BPM", fontsize = 15)
plt.show()

Loading the Data

The ‘features_3_sec.csv’ file contains audio features over a duration of 3 seconds. We will use it to train our model.

data = pd.read_csv('./Data/features_3_sec.csv')
# Drop the first column (the file name in the Kaggle CSV), which is not an audio feature
data = data.iloc[0:, 1:]

Preprocessing the Data

We will preprocess the data to make it suitable for our model. We’ll use MinMaxScaler for this purpose, which rescales every feature column to the range 0 to 1.

y = data['label']
X = data.loc[:, data.columns != 'label']
cols = X.columns
min_max_scaler = preprocessing.MinMaxScaler()
np_scaled = min_max_scaler.fit_transform(X)
X = pd.DataFrame(np_scaled, columns = cols)
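Min-max scaling is simple enough to write by hand, which helps to see what the scaler does to each column. The tiny matrix below is made-up toy data, not values from the GTZAN CSV.

```python
import numpy as np

# Toy feature matrix: 3 rows (samples), 2 columns (features) on different scales
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 40.0]])

col_min = X_toy.min(axis=0)
col_max = X_toy.max(axis=0)

# Each column is shifted to start at 0 and divided by its own range
X_toy_scaled = (X_toy - col_min) / (col_max - col_min)
print(X_toy_scaled)
```

Every column now spans 0 to 1 while keeping the relative spacing of its values. A column with a single repeated value would cause a division by zero in this sketch, so it is only meant to show the formula.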

Splitting the Data

Let’s split the data for training and testing purposes.

X_train, X_test, y_train, y_test = train_test_split(X, y)
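train_test_split shuffles the rows randomly, so each run produces a different split unless a seed is given. The sketch below is one possible variation, not the setup used in this story: it fixes random_state for repeatable results and uses stratify so every genre is equally represented in the test set. The synthetic arrays and the 20% test size are our assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_demo = rng.random((100, 4))              # 100 synthetic samples, 4 features
y_demo = np.repeat(np.arange(10), 10)      # 10 classes with 10 samples each

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.2,        # hold out 20% of the rows for testing
    stratify=y_demo,      # keep the class proportions the same in both parts
    random_state=42,      # fixed seed, so the split is identical on every run
)

print(len(X_tr), len(X_te))
print(np.bincount(y_te))   # number of test samples per class
```

With a fixed seed, the accuracy reported later can be reproduced exactly.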

Model Creation & Prediction

It’s time to create our model. We will use a Random Forest Classifier to build the model. We’ll fit the model using the training data and predict the genres of the testing data. Our model’s accuracy turns out to be 81.38%, which is great! The exact figure will vary slightly between runs because the train/test split is random.

model = RandomForestClassifier(n_estimators=1000, max_depth=10, random_state=0)
model.fit(X_train, y_train)
preds = model.predict(X_test)
# accuracy_score returns a fraction between 0 and 1; report it as a percentage
print('Accuracy:', round(accuracy_score(y_test, preds) * 100, 2), '%')
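Accuracy is the share of predictions that match the true labels, so it is a fraction between 0 and 1. The sketch below computes it by hand on five made-up labels and shows why the number of decimals passed to round matters when printing it.

```python
import numpy as np

# Toy labels for illustration only, not model output
y_true = np.array(["rock", "jazz", "blues", "rock", "jazz"])
y_hat = np.array(["rock", "jazz", "rock", "rock", "blues"])

accuracy = float(np.mean(y_true == y_hat))    # 3 correct out of 5
as_whole_number = round(accuracy)             # no decimals: collapses to 0 or 1
as_percent = round(accuracy * 100, 2)         # percentage with two decimals

print(accuracy, as_whole_number, as_percent)
```

Calling round without a second argument rounds to the nearest whole number, which hides the actual score. Multiplying by 100 and keeping two decimals gives a figure in the same form as the 81.38% quoted above.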

We have successfully built a model for music genre classification.