This is the first post in a series about our work to classify and tag Thai music on JOOX. It explains how we use the Python library LibROSA to extract audio spectrograms and the four audio features below. In later posts, we’ll use these as inputs for our classification models.
Here are all the necessary libraries:
First, let’s load an audio file. The file can be either .wav or .mp3; sr is the sampling rate, which defaults to 22,050 Hz.
We can listen to the audio using IPython.display:
1. Mel-scaled power spectrogram
The mel scale (the name mel comes from the word melody) is a perceptual scale of pitches: steps that are equal on the mel scale are judged by listeners to be equally far apart in pitch, even though they are not equally far apart in hertz.
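To make the idea concrete, here is a small sketch using one common hertz-to-mel formula, m = 2595 · log10(1 + f / 700). This is an illustration only; it is not necessarily the exact variant LibROSA applies by default, and the function names are our own.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (one common formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


# Ten points equally spaced on the mel scale between 0 Hz and 8,000 Hz.
n_points = 10
mel_lo, mel_hi = hz_to_mel(0.0), hz_to_mel(8000.0)
step = (mel_hi - mel_lo) / (n_points - 1)
centers_hz = [mel_to_hz(mel_lo + i * step) for i in range(n_points)]

# Gaps between neighbouring points, measured in Hz.
gaps_hz = [b - a for a, b in zip(centers_hz, centers_hz[1:])]

for f in centers_hz:
    print(round(f))
```

The points are evenly spaced in mels, but the gaps in hertz grow with frequency. A mel spectrogram therefore gives more resolution to low frequencies, where our hearing distinguishes pitches more finely.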
2. MFCC (Mel-Frequency Cepstral Coefficients)
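MFCCs are a compact summary of the shape of the spectrum in each frame, obtained by applying a discrete cosine transform to the log-scaled mel spectrogram. You can visit here for a detailed explanation of MFCC.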
3. Chromagram
A chromagram tells us the intensity of each of the 12 notes of the chromatic scale at each point in time, with all octaves of the same note folded together.
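The idea that every frequency belongs to one of 12 notes regardless of octave can be sketched in a few lines. This is not how LibROSA computes a chromagram; it only shows the frequency-to-note mapping, assuming equal temperament with A4 tuned to 440 Hz. The function name is our own.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(f_hz, a4_hz=440.0):
    """Map a frequency to a pitch-class index 0..11 (0 = C)."""
    # MIDI note number: A4 (440 Hz) is note 69, 12 notes per octave.
    midi = 69 + 12 * math.log2(f_hz / a4_hz)
    return int(round(midi)) % 12


for f in (261.63, 440.0, 880.0):
    print(f, NOTE_NAMES[pitch_class(f)])
```

The 440 Hz and 880 Hz tones land in the same bin because they are the same note one octave apart, which is exactly the folding a chromagram performs on spectral energy.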
4. Tempo and beats
Librosa’s tempo function estimates the tempo of the audio clip in beats per minute (BPM).
Beats are extracted in 3 stages, as explained in the Librosa documentation:
“Measure onset strength -> Estimate tempo from onset correlation -> Pick peaks in onset strength approximately consistent with estimated tempo”
These are just a few of the features that can be extracted using librosa. We focus on these four because we are currently experimenting with them to classify Thai music.
Please stay tuned for Part 2 where we will explore how the information we have extracted can help us tag music!