Music Genre Classification using Convolutional Neural Network

Octaviano · BISA.AI · Mar 11, 2020
Music photo created by ArthurHidden (www.freepik.com)

In this post, music genre classification with Convolutional Neural Networks is performed using high-level features such as the spectrogram feature and the chroma feature. The Python programming language is used for every step of the work, from dataset collection and segmentation to feature extraction and classification. Knowing basic Python programming is therefore a must; please refer to the previous post to learn about Python programming for Artificial Intelligence.

To create the music genre classification program, we have to run through the following steps:

1. Music Dataset Collection

There are several music data providers, such as the Million Song Dataset, among other sources. We have to be careful about music licenses, because music and other audio files are usually proprietary or otherwise licensed, so we should use freely licensed (e.g., GPL) material instead. In this post, I used Indonesian traditional music consisting of two classes: Sundanese music and Minang music. Each class has 100 music files for training, 10 music files for validation, and 2 music files for testing. You can request the dataset from me by mailing octav@bisa.ai.
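A minimal sketch of how the collected files could be organized and enumerated is shown here; the directory layout and file extension are assumptions for illustration, not the actual dataset structure:

```python
# Hypothetical directory layout assumed here:
#   dataset/train/sundanese/*.wav, dataset/train/minang/*.wav
#   dataset/val/...,  dataset/test/...
import glob
import os

def list_files(split, genre, root="dataset"):
    """Return all .wav files for one split/genre pair."""
    return sorted(glob.glob(os.path.join(root, split, genre, "*.wav")))

genres = ("sundanese", "minang")
train_files = {g: list_files("train", g) for g in genres}
val_files = {g: list_files("val", g) for g in genres}
test_files = {g: list_files("test", g) for g in genres}

print({g: len(f) for g, f in train_files.items()})  # expect 100 files per genre
```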

2. Extract Features from Music Data

I used a popular music and speech library called Librosa. This library is powerful because it includes many functions, such as feature extraction. In this work, Librosa is used to extract the spectrogram feature as follows:

Spectrogram of music data over one minute
Preparing the music file
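A minimal sketch of how such a spectrogram (and, optionally, a chroma feature) could be extracted with Librosa; the file name, duration, and mel parameters below are assumptions, not the exact values used in this work:

```python
import librosa
import numpy as np

# Load roughly one minute of audio (file name and duration are assumptions)
y, sr = librosa.load("sundanese_001.wav", sr=22050, duration=60.0)

# Compute a mel-scaled spectrogram and convert it to decibels
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)

# A chroma feature can be extracted in a similar way
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

print(S_db.shape)   # (128, number_of_frames)
print(chroma.shape)  # (12, number_of_frames)
```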

3. Train Model

Train the spectrogram features of the music dataset with a Convolutional Neural Network (CNN). The architecture of the CNN can be seen below:

We can see from the architecture above that a CNN consists of several layers, such as an input layer, convolutional layers, subsampling/pooling layers, a fully connected layer, and so on. We made some modifications to the typical CNN:

  • The input data are 100 spectrograms each for Sundanese and Minang music. The size of each spectrogram was modified to 128x5168.
  • We use a Sequential model based on the Cho et al. reference. Below is our modification of the original one (data flows through the Sequential model from top to bottom).
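A minimal sketch of what such a Sequential CNN could look like in Keras; the layer counts, filter sizes, and other hyperparameters are assumptions, not the exact architecture used in this work:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    # Input: one-channel spectrogram of shape 128 x 5168
    Conv2D(32, (3, 3), activation="relu", input_shape=(128, 5168, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation="relu"),
    Dropout(0.5),
    # Two output classes: Sundanese and Minang
    Dense(2, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Training would then be a matter of calling model.fit on the spectrogram arrays and their one-hot genre labels, with the 10 validation files per class passed as validation data.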
