Introduction to LibROSA

By Technocrat | Published in CoderHack.com | Sep 14, 2023 | 3 min read

LibROSA is a Python package for audio and music analysis. It provides functions for loading audio, extracting common features, and applying basic effects. It can read a variety of formats, such as WAV, OGG, FLAC, and MP3.


Installing LibROSA

LibROSA can be installed using pip:

pip install librosa

It can also be installed using conda:

conda install -c conda-forge librosa

Loading Audio Files

LibROSA allows you to load various audio file formats. You can load an audio file like this:

import librosa

audio_data, sampling_rate = librosa.load('audio_file.wav', sr=22050)

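This loads the WAV file and returns the audio as a NumPy array of floating-point samples, together with the sampling rate. By default the audio is mixed down to mono and resampled to the rate given by sr (22050 Hz here); pass sr=None to keep the file's native rate.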

Inspecting Audio

Once an audio file is loaded, you can inspect various properties of it:

Duration: You can get the duration of the audio in seconds using librosa.get_duration(). Pass the signal with the y keyword. For example:

duration = librosa.get_duration(y=audio_data, sr=sampling_rate)

Sampling Rate: The sampling rate is the number of samples per second captured from the analog signal. It is the second value returned by librosa.load() (sampling_rate above). The audio itself is a plain NumPy array and does not store the rate, so keep that variable around.

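Shape: With the default mono=True, audio_data is a one-dimensional array of shape (number of samples,). If you load a multi-channel file with mono=False, the shape is (number of channels, number of samples). You can access it with audio_data.shape.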

Plotting the Waveform: You can visualize the audio waveform using librosa.display.waveshow(). Older releases called this function waveplot(), which has since been removed. For example:

import matplotlib.pyplot as plt
import librosa.display

plt.figure(figsize=(12, 4))
librosa.display.waveshow(audio_data, sr=sampling_rate)
plt.show()

This opens a matplotlib window showing the amplitude of the signal over time.

Feature Extraction

LibROSA allows you to extract various audio features from your data. Some examples:

MFCC: Mel Frequency Cepstral Coefficients are a very commonly used feature for speech/music analysis. You can extract MFCC features with librosa.feature.mfcc():

mfcc = librosa.feature.mfcc(y=audio_data, sr=sampling_rate, n_mfcc=13)

This returns a 2D array of shape (13, number of frames): 13 MFCC values for each frame of the audio.

Chroma Features: Chroma features describe how the signal's energy is distributed across the 12 pitch classes, which makes them useful for following harmony. You can extract chroma features with librosa.feature.chroma_cqt():

chroma = librosa.feature.chroma_cqt(y=audio_data, sr=sampling_rate)

Contrast: Spectral contrast measures the difference between the spectral peaks and valleys within each frequency sub-band. You can compute contrast features with librosa.feature.spectral_contrast():

contrast = librosa.feature.spectral_contrast(y=audio_data, sr=sampling_rate)

Tonnetz: The tonnetz features map the chroma features into a six-dimensional space. They can be extracted with librosa.feature.tonnetz():

tonnetz = librosa.feature.tonnetz(y=audio_data, sr=sampling_rate)

Manipulating Audio

LibROSA provides various functions to manipulate your audio:

Resampling: You can resample an audio signal to a different sampling rate with librosa.resample(). The original and target rates are passed as keyword arguments:

new_audio = librosa.resample(audio_data, orig_sr=sampling_rate, target_sr=new_sr)

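Trimming: You can remove leading and trailing silence from an audio signal with librosa.effects.trim(). It returns the trimmed signal together with the start and end sample indices of the part that was kept. The top_db parameter sets how far below the peak level (in decibels) a frame must fall to count as silence:

new_audio, index = librosa.effects.trim(audio_data, top_db=10)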

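Joining: LibROSA has no dedicated function for joining clips. Because audio is loaded as NumPy arrays, clips that share the same sampling rate can be joined with np.concatenate():

import numpy as np

new_audio = np.concatenate([audio1, audio2, audio3])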
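
Since loaded audio is just a NumPy array, joining clips comes down to array concatenation with np.concatenate(). The sketch below assumes three mono clips that already share one sampling rate; make_tone is a hypothetical helper that generates test clips.

```python
import numpy as np

sr = 22050  # all clips are assumed to share this sampling rate


def make_tone(frequency, seconds):
    """Hypothetical helper: a mono sine tone as a stand-in for a loaded clip."""
    t = np.arange(int(seconds * sr)) / sr
    return 0.5 * np.sin(2 * np.pi * frequency * t)


audio1 = make_tone(440.0, 0.5)
audio2 = make_tone(660.0, 0.25)
audio3 = make_tone(880.0, 0.25)

# Mono clips are 1-D arrays, so they are joined end to end along the sample axis.
joined = np.concatenate([audio1, audio2, audio3])
print(len(joined) / sr)  # total duration in seconds
```

Clips recorded at different rates should be resampled to a common rate first, otherwise the joined audio plays back at the wrong speed in places.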

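Fading: LibROSA does not provide a dedicated fade function. Because the audio is a NumPy array, a fade-in or fade-out can be applied by multiplying the start or end of the signal by a gain ramp (np is NumPy; fade_in_len and fade_out_len are lengths in samples):

faded_in_audio = audio_data.copy()
faded_in_audio[:fade_in_len] *= np.linspace(0.0, 1.0, fade_in_len)
faded_out_audio = audio_data.copy()
faded_out_audio[-fade_out_len:] *= np.linspace(1.0, 0.0, fade_out_len)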
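
A fade can be written directly with NumPy by scaling the first or last samples with a gain ramp. The apply_fade helper below is hypothetical (it is not part of LibROSA) and assumes mono audio and a linear ramp; other curve shapes are equally valid.

```python
import numpy as np


def apply_fade(y, fade_len, fade_out=False):
    """Hypothetical helper: linear fade over fade_len samples of mono audio.

    Returns a new array and leaves the input unchanged.
    """
    if not 0 < fade_len <= len(y):
        raise ValueError("fade_len must be between 1 and len(y)")
    out = np.array(y, dtype=float)  # np.array copies by default
    if fade_out:
        out[-fade_len:] *= np.linspace(1.0, 0.0, fade_len)
    else:
        out[:fade_len] *= np.linspace(0.0, 1.0, fade_len)
    return out


# A constant signal makes the gain ramp easy to see.
audio = np.ones(1000)
faded_in_audio = apply_fade(audio, 100)
faded_out_audio = apply_fade(audio, 100, fade_out=True)
print(faded_in_audio[:3], faded_out_audio[-3:])
```

To express the fade length in seconds, multiply by the sampling rate first, for example int(0.5 * sampling_rate) for a half-second fade.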

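Pitch Shifting: The pitch of the audio can be shifted with librosa.effects.pitch_shift(). The shift n_steps is given in semitones (positive values raise the pitch), and the duration stays the same:

new_audio = librosa.effects.pitch_shift(audio_data, sr=sampling_rate, n_steps=n_steps)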

Time Stretching: The speed of the audio can be changed without altering its pitch with librosa.effects.time_stretch(). A rate greater than 1 speeds the audio up, and a rate below 1 slows it down:

new_audio = librosa.effects.time_stretch(audio_data, rate=rate)

Conclusions

LibROSA is a powerful library for audio analysis and manipulation in Python. It can be used to extract features from audio files, manipulate audio in various ways, and prepare inputs for machine learning models for tasks like:

  • Speech recognition
  • Music genre classification
  • Instrument recognition

The capabilities discussed here only scratch the surface of what LibROSA can do. I hope this helps you get started with audio analysis in Python!
