Classification of Music into different Genres using Keras

Sanket Doshi
Jan 5 · 4 min read

We’ll extract the various features explained in the blog here, and use these features to classify music clips into the genres present in our training set.

Classification after extracting features

We’ll use the GTZAN genre collection dataset. If this site doesn’t work, you can get the dataset from here. This dataset consists of 10 genres, and each genre contains 100 music clips, each 30 seconds long.

We’ll be using the Keras package, which uses the TensorFlow package as its backend.

Import packages

import librosa
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import os
import csv
# Preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
#Keras
import keras

Install all the packages required and import them.

Important note:

Check the version of TensorFlow: it should be greater than 1.1, otherwise various Keras features will fail.
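You can quickly verify the installed version from Python (a one-line check, assuming TensorFlow is already installed):

import tensorflow as tf
print(tf.__version__)  # should print a version greater than 1.1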

Pip may not fetch the latest TensorFlow version, so install TensorFlow using this command:

pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.6.0-py3-none-any.whl

Creating Dataset

We’ll process the dataset as per our requirements and create a CSV file with the data we require.

header = 'filename chroma_stft rmse spectral_centroid spectral_bandwidth rolloff zero_crossing_rate'
for i in range(1, 21):
    header += f' mfcc{i}'
header += ' label'
header = header.split()

Here we are generating headers for our CSV file.

If you have read the feature-extraction blog, you know we get 20 MFCCs for the given sampling rate; they are calculated per frame, and we average each coefficient over the frames, so the MFCCs contribute 20 columns.
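To make this concrete, here is a small sketch (the file path is only a hypothetical example of one GTZAN clip) showing that the MFCC matrix has 20 rows, one per coefficient, and that we average each row over the frames:

# hypothetical path to a single GTZAN clip
songname = './genres/blues/blues.00000.au'
y, sr = librosa.load(songname, mono=True, duration=30)
mfcc = librosa.feature.mfcc(y=y, sr=sr)
print(mfcc.shape)                    # (20, n_frames): 20 coefficients per frame
print(np.mean(mfcc, axis=1).shape)   # (20,): one averaged value per coefficient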

Now, we’ll calculate all the features.

file = open('data.csv', 'w', newline='')
with file:
    writer = csv.writer(file)
    writer.writerow(header)
genres = 'blues classical country disco hiphop jazz metal pop reggae rock'.split()
for g in genres:
    for filename in os.listdir(f'./genres/{g}'):
        songname = f'./genres/{g}/{filename}'
        y, sr = librosa.load(songname, mono=True, duration=30)
        chroma_stft = librosa.feature.chroma_stft(y=y, sr=sr)
        rmse = librosa.feature.rmse(y=y)  # renamed to librosa.feature.rms in librosa >= 0.7
        spec_cent = librosa.feature.spectral_centroid(y=y, sr=sr)
        spec_bw = librosa.feature.spectral_bandwidth(y=y, sr=sr)
        rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
        zcr = librosa.feature.zero_crossing_rate(y)
        mfcc = librosa.feature.mfcc(y=y, sr=sr)
        # average each feature over all frames and build one CSV row per clip
        to_append = f'{filename} {np.mean(chroma_stft)} {np.mean(rmse)} {np.mean(spec_cent)} {np.mean(spec_bw)} {np.mean(rolloff)} {np.mean(zcr)}'
        for e in mfcc:
            to_append += f' {np.mean(e)}'
        to_append += f' {g}'
        file = open('data.csv', 'a', newline='')
        with file:
            writer = csv.writer(file)
            writer.writerow(to_append.split())

We’ve calculated all the features using the librosa package, created a dataset in the file data.csv, and inserted each clip’s feature values under the corresponding headers.

Preprocessing Dataset

Reading a dataset

data = pd.read_csv('data.csv')
data.head()

Dropping unnecessary columns

# Dropping unnecessary columns
data = data.drop(['filename'],axis=1)
data.head()

The ‘filename’ column is not required for training.

Now, we’ll encode the genres as integers.

genre_list = data.iloc[:, -1]
encoder = LabelEncoder()
y = encoder.fit_transform(genre_list)
print(y)

Here, we created a mapping between genres and integers; each integer represents a specific genre.

Music genre
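To see which integer was assigned to which genre, you can inspect the fitted encoder (a small sketch using the encoder defined above):

mapping = dict(zip(encoder.classes_, range(len(encoder.classes_))))
print(mapping)  # e.g. {'blues': 0, 'classical': 1, ..., 'rock': 9}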

Normalizing the dataset

scaler = StandardScaler()
X = scaler.fit_transform(np.array(data.iloc[:, :-1], dtype = float))

Here, X is computed by removing the mean and dividing by the standard deviation of each feature column.
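As a quick sanity check (a small sketch using the X computed above), every feature column should now have roughly zero mean and unit standard deviation:

print(X.mean(axis=0).round(3))  # close to 0 for every column
print(X.std(axis=0).round(3))   # close to 1 for every column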

Splitting the dataset into training and testing dataset

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

We’ve split the dataset into training and testing sets in an 80:20 ratio.
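Since the full GTZAN set has 1000 clips and we kept 26 feature columns, the shapes should look like this (a quick check, assuming the whole dataset was processed):

print(X_train.shape, X_test.shape)  # (800, 26) (200, 26)
print(y_train.shape, y_test.shape)  # (800,) (200,)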

Model creating, training and testing

Creating a Model

from keras import models
from keras import layers
model = models.Sequential()
model.add(layers.Dense(256, activation='relu', input_shape=(X_train.shape[1],)))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

We’ll be using the Keras Sequential model.

There are 4 Dense layers in our network. The first one takes the input, so the input shape has to be given. Then there are 2 hidden layers, and the last layer is the output layer. The value passed to Dense is the dimension of that layer’s output space: the 1st layer has 256 neurons, so the dimension of its output space is 256, and the output layer has 10 neurons because we are classifying into 10 genres.
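You can confirm the architecture and the number of trainable parameters with:

model.summary()  # lists the four Dense layers with their output shapes and parameter counts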

Learning Process of a model

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

Before starting the training process we need to configure how the model should be trained. The optimizer is the optimization algorithm to be used; we’ll use the Adam algorithm, which is widely used in deep learning. To know more about the Adam optimizer, read this blog. The loss is the function by which we evaluate the network’s performance, and the model’s goal is to minimize it. Metrics are the quantities to be evaluated during training; we’ll be evaluating accuracy. The loss function sparse_categorical_crossentropy is the same as categorical_crossentropy, except it is used when the class labels are plain integers rather than one-hot encoded vectors.
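In other words, sparse_categorical_crossentropy lets us feed the integer labels produced by LabelEncoder directly. If we wanted to use categorical_crossentropy instead, we would first have to one-hot encode the labels, roughly like this:

from keras.utils import to_categorical
y_train_onehot = to_categorical(y_train, num_classes=10)  # shape (n_samples, 10)
# then compile with loss='categorical_crossentropy' and fit on y_train_onehot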

Training a model

history = model.fit(X_train,
                    y_train,
                    epochs=20,
                    batch_size=128)

Using the fit function, we train the model on the given training inputs and outputs for 20 epochs with a batch size of 128.
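The returned history object records the loss and accuracy per epoch, which we can plot to see how training progressed (a small sketch; older Keras versions store accuracy under the key 'acc', newer ones under 'accuracy'):

plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['acc'], label='training accuracy')  # use 'accuracy' on newer Keras
plt.xlabel('epoch')
plt.legend()
plt.show()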

Evaluate the model

test_loss, test_acc = model.evaluate(X_test, y_test)
print('test_acc: ', test_acc)

This tells us how accurately the model can predict the genre of a given music clip from the extracted features. This model achieved an accuracy of about 67%, which is not that good, but we can modify the model to achieve higher accuracy.
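For example, one simple modification (a hedged sketch, not part of the original model; the dropout rates are illustrative) is to add Dropout layers between the Dense layers to reduce overfitting:

model2 = models.Sequential()
model2.add(layers.Dense(256, activation='relu', input_shape=(X_train.shape[1],)))
model2.add(layers.Dropout(0.3))   # randomly drops 30% of activations during training
model2.add(layers.Dense(128, activation='relu'))
model2.add(layers.Dropout(0.3))
model2.add(layers.Dense(64, activation='relu'))
model2.add(layers.Dense(10, activation='softmax'))
model2.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])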

Prediction

predictions = model.predict(X_test)
np.argmax(predictions[0])

The predict function gives us, for each clip, the probability that it belongs to each genre. The genre with the highest probability is our final result, which is selected with argmax.
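To turn the predicted index back into a genre name, we can reuse the LabelEncoder fitted earlier:

predicted_index = np.argmax(predictions[0])
print(encoder.inverse_transform([predicted_index])[0])  # prints the predicted genre, e.g. 'metal'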

You can find the code here.
