Pre-Processing of Audio Data

Published in AI Skunks · 6 min read · Mar 27, 2023

INTRODUCTION:

Data preprocessing is crucial for successful machine learning models, as the accuracy and usefulness of results depend heavily on the quality of preprocessed data. In audio analysis and modeling, preprocessing is vital for converting complex and noisy raw audio data into a suitable format for further analysis. Techniques such as filtering, normalization, segmentation, feature extraction, and encoding are used to remove noise, extract relevant features, and improve analysis and modeling accuracy. Effective preprocessing is essential for achieving reliable results and enhancing the overall quality and usefulness of audio data analysis and modeling.

CONTENT:

Importing Libraries
Loading a data file
Spectrogram
Creating an Audio Signal
Spectral Rolloff
Spectral Bandwidth
Zero Crossing Rate
MFCC
Chroma Frequencies

IMPORTING LIBRARIES

LIBROSA

Librosa is a Python package for analyzing audio signals, with a particular focus on music. It provides the building blocks needed to create a music information retrieval system. The library is well documented and offers numerous examples and tutorials to help users apply its capabilities effectively.

Audio Data

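The librosa.load function returns the audio time series as a NumPy array together with its sampling rate (sr). By default the audio is resampled to 22,050 Hz and mixed down to mono. This behavior can be changed by passing a different rate, for example sr=44100 to resample at 44.1 kHz, or sr=None to keep the file's native sampling rate.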

VISUALIZING AUDIO:

We can plot the audio array using librosa.display.waveshow, which replaces librosa.display.waveplot in recent librosa releases.

SPECTROGRAM

A spectrogram is a visual representation of the signal strength, or “loudness,” of a signal over time at the different frequencies present in a waveform. It shows not only whether there is more or less energy at, say, 2 Hz versus 10 Hz, but also how those energy levels change over time. Spectrograms are commonly drawn as heat maps, which use color or brightness to indicate the strength of the signal.

librosa.display.specshow can be used to display a spectrogram.

The librosa.stft() function computes the short-time Fourier transform (STFT) of the signal, which gives the amplitude of each frequency within a short window of time. With the STFT we can therefore see which frequencies are present in the audio signal at a given moment and how strong each one is. To display the result as a spectrogram, we use the .specshow function. Frequency is shown on the vertical axis, ranging from 0 to about 10 kHz, and the time of the audio clip is shown on the horizontal axis. When most of the activity sits at the bottom of the spectrum, we can switch the frequency axis to a logarithmic scale.

CREATE AN AUDIO SIGNAL

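The librosa.feature.spectral_centroid function returns an array with one row and as many columns as there are frames in your sample. Each value is the spectral centroid of that frame, that is, the magnitude-weighted mean of the frequencies present in it.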

SPECTRAL ROLLOFF

The spectral rolloff describes the shape of a signal's spectrum. It is the frequency below which a specified percentage of the total spectral energy lies, typically 85%, so it indicates where the high-frequency content starts to taper off. To compute it, we find, for each frame, the lowest frequency bin such that the bins at or below it contain at least 85% of the frame's spectral energy.

The librosa.feature.spectral_rolloff function is used to calculate the rolloff frequency for every frame within an audio signal.

SPECTRAL BANDWIDTH

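Spectral bandwidth is a measure of the range of frequencies present in a signal. It is commonly defined as the width of the frequency band at half of the peak amplitude, that is, the full width at half maximum (FWHM). The two red lines and λSB on the wavelength axis of the figure mark the spectral bandwidth. Note that librosa.feature.spectral_bandwidth computes a related quantity: the p-th order spread of the spectrum around its spectral centroid, with p = 2 by default.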

ZERO-CROSSINGS

The zero-crossing rate is the rate at which a signal changes sign, that is, how often it passes from a positive value through zero to a negative value, or from a negative value through zero to a positive value.

MEL-FREQUENCY CEPSTRAL COEFFICIENTS (MFCCs)

Mel-frequency cepstral coefficients (MFCCs) compactly describe the overall shape of a signal's spectrum. In speech recognition they are typically used as a set of 39 features: 13 coefficients plus their first- and second-order differences (deltas). The MFCC feature extraction approach consists of windowing the signal, applying the DFT (Discrete Fourier Transform), warping the frequencies onto the Mel scale, taking the log of the magnitude, and then applying the DCT (Discrete Cosine Transform).

CHROMA FEATURE

Chroma features are an effective way to analyze music whose pitches can be meaningfully grouped into categories (generally twelve, one per pitch class) and whose tuning approximates the equal-tempered scale.

CONCLUSION

In this article, we explored pre-processing techniques for audio data that are important for applications such as source separation, speech recognition, and music information retrieval. We covered loading and visualizing audio, computing spectrograms, and extracting features such as the spectral centroid, spectral rolloff, spectral bandwidth, zero-crossing rate, MFCCs, and chroma features. We also showed how to apply these techniques using Librosa, a Python package designed specifically for analyzing audio signals.

REFERENCES

Librosa: https://librosa.org/doc/latest/index.html

License

All code in this notebook is available as open source through the MIT license.

All text and images are free to use under the Creative Commons Attribution 3.0 license. https://creativecommons.org/licenses/by/3.0/us/

These licenses let people distribute, remix, tweak, and build upon the work, even commercially, as long as they give credit for the original creation.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.