Using LibROSA to extract audio features

KK GG · Tencent (Thailand) · Feb 15, 2019

This is part of a series about our work to classify and tag Thai music on JOOX. This part explains how we use the Python library LibROSA to extract audio spectrograms and the following audio features. In later blog posts, we’ll explain how we build our classification models for Thai songs.

  1. Spectrogram
  2. MFCC
  3. Chromagram
  4. Tempo
  5. Beats

Here are all the necessary libraries:
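A minimal set of imports along these lines covers everything used below (exact versions may vary):

```python
# Core audio library plus plotting and notebook playback helpers.
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipd
```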

First, let’s load an audio file. The file can be either .wav or .mp3; sr is the sampling rate, which defaults to 22,050 Hz.
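For example, assuming a local file called song.mp3 (a placeholder name):

```python
# Load the audio as a floating-point waveform y, resampled to the default 22,050 Hz.
# Pass sr=None instead to keep the file's native sampling rate.
y, sr = librosa.load('song.mp3')
print(y.shape, sr)
```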

We can listen to the audio using IPython.display:
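A quick sketch, assuming the code runs in a Jupyter notebook:

```python
# Render an inline audio player for the loaded waveform.
ipd.Audio(y, rate=sr)
```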

1. Mel-scaled power spectrogram

The mel scale (the name comes from the word melody) is a perceptual scale of pitches that listeners judge to be equal in distance from one another.
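A sketch of computing and plotting the mel-scaled power spectrogram, reusing the y and sr loaded above; n_mels=128 is a common default, not necessarily the value we used:

```python
# Mel-scaled power spectrogram, converted to decibels for display.
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_dB = librosa.power_to_db(S, ref=np.max)

plt.figure(figsize=(10, 4))
librosa.display.specshow(S_dB, sr=sr, x_axis='time', y_axis='mel')
plt.colorbar(format='%+2.0f dB')
plt.title('Mel-scaled power spectrogram')
plt.tight_layout()
plt.show()
```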

2. MFCC or Mel Frequency Cepstral Coefficients

You can visit here for a detailed explanation of MFCC.
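A minimal example, again reusing y and sr; n_mfcc=20 is the library default, not necessarily what we used in our pipeline:

```python
# Compute 20 MFCCs per frame and visualize them over time.
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
print(mfccs.shape)  # (20, number_of_frames)

plt.figure(figsize=(10, 4))
librosa.display.specshow(mfccs, sr=sr, x_axis='time')
plt.colorbar()
plt.title('MFCC')
plt.tight_layout()
plt.show()
```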

3. Chromagram

The chromagram tells us the intensity of each of the 12 pitch classes (notes) at each point in time.
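A sketch using librosa’s STFT-based chromagram; the library also offers other variants such as chroma_cqt:

```python
# 12-bin chromagram computed from the short-time Fourier transform.
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
print(chroma.shape)  # (12, number_of_frames)

plt.figure(figsize=(10, 4))
librosa.display.specshow(chroma, sr=sr, x_axis='time', y_axis='chroma')
plt.colorbar()
plt.title('Chromagram')
plt.tight_layout()
plt.show()
```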

4. Tempo

Librosa’s tempo estimator returns the beats per minute (BPM) of the audio clip.
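A minimal sketch of the global tempo estimate:

```python
# Estimate the overall tempo of the clip in beats per minute.
tempo = librosa.beat.tempo(y=y, sr=sr)
print(tempo)  # e.g. an array containing a single BPM value
```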

5. Beats

Beats are extracted in three stages, as explained in the Librosa documentation:

“Measure onset strength -> Estimate tempo from onset correlation -> Pick peaks in onset strength approximately consistent with estimated tempo”
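A sketch of the beat tracker, which runs that pipeline internally and returns both the tempo and the beat positions:

```python
# beat_track returns the tempo estimate and the frame indices of detected beats.
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

# Convert beat frame indices to timestamps in seconds.
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
print(tempo, len(beat_times))  # BPM and the number of detected beats
```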

These are just a few of the features that can be extracted using librosa. We primarily focus on them because we are currently experimenting with them to classify Thai music.

Please stay tuned for other parts where we will explore how the information we have extracted can help us tag music!
