Audio Data Augmentation

Ali Buğra Kanburoğlu
May 27, 2018

Data augmentation is widely used in machine learning and deep learning to improve model performance by generating additional training data from existing samples. Common forms include image augmentation, used in image processing, and audio augmentation, used in speech processing. There are plenty of articles, tutorials, and code samples online about image augmentation, but far less material on audio data augmentation.

In this post, I will show how to generate new audio files from an input audio file using a few audio augmentation techniques. We start by importing the dependencies listed in the Prerequisites section below. Before we can apply any technique to an audio file, we need to read it; for that we use librosa (LibROSA), a Python package for music and audio analysis. With librosa, we read the input file and apply some effects to it. We then save the results as new audio files and plot the waveforms of the output sounds.

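Prerequisites

The examples in this post use librosa, NumPy, and Matplotlib.

import librosa
import numpy as np
import matplotlib.pyplot as plt

Reading Audio File

After importing the dependencies, we create a new class called “AudioAugmentation”. Its first method, “read_audio_file”, takes a single parameter, “file_path”. It loads the given WAV file with librosa and then truncates or zero-pads the signal to a fixed length of 16,000 samples, as shown below.

class AudioAugmentation:
    def read_audio_file(self, file_path):
        input_length = 16000
        # librosa.load returns (samples, sample_rate); keep only the samples.
        # sr=16000 resamples on load, so input_length is one second of audio
        # and matches the default sample_rate used when saving.
        data = librosa.load(file_path, sr=16000)[0]
        if len(data) > input_length:
            # Too long: keep the first input_length samples.
            data = data[:input_length]
        else:
            # Too short: pad the end with zeros.
            data = np.pad(data, (0, max(0, input_length - len(data))), "constant")
        return data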
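
The truncate-or-pad step in “read_audio_file” can be tried on its own, without an audio file. The sketch below is a minimal illustration that assumes a hypothetical helper named fix_length (it is not part of the original class) and uses synthetic arrays in place of loaded audio.

```python
import numpy as np


def fix_length(data, input_length=16000):
    """Truncate or zero-pad a 1-D signal to exactly input_length samples.

    Hypothetical helper for illustration; it mirrors the logic used in
    read_audio_file.
    """
    if len(data) > input_length:
        return data[:input_length]
    # Pad only at the end; the pad width is 0 when the lengths already match.
    return np.pad(data, (0, input_length - len(data)), mode="constant")


short = fix_length(np.ones(10000))    # 6000 zeros are appended
longer = fix_length(np.ones(20000))   # the last 4000 samples are dropped
print(len(short), len(longer))
```

Both results have the same length, which is what lets clips of different durations be stacked into a single training batch.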

Effects on Audio File

For now, we write three methods that apply effects to the audio data: “add_noise”, “shift”, and “stretch”. The “add_noise” method adds random Gaussian noise, generated with NumPy, to the signal. The “shift” method shifts the audio data with NumPy's roll function, which wraps samples around from the end to the beginning. Lastly, the “stretch” method applies the time_stretch function from librosa.effects.

The Python implementations of these three methods are shown below.

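    # Methods of the AudioAugmentation class
    def add_noise(self, data):
        # Add Gaussian noise scaled down to a small amplitude.
        noise = np.random.randn(len(data))
        data_noise = data + 0.005 * noise
        return data_noise

    def shift(self, data):
        # Circularly shift the signal by 1600 samples.
        return np.roll(data, 1600)

    def stretch(self, data, rate=1):
        input_length = 16000
        # rate > 1 speeds the audio up; rate < 1 slows it down.
        data = librosa.effects.time_stretch(data, rate=rate)
        # Truncate or zero-pad back to the fixed length.
        if len(data) > input_length:
            data = data[:input_length]
        else:
            data = np.pad(data, (0, max(0, input_length - len(data))), "constant")
        return data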
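
To see what the noise and shift steps do without loading a file, the sketch below applies them to a synthetic 440 Hz tone. The tone, the seeded random generator, and the variable names are assumptions made for illustration; only the 0.005 noise factor and the 1600-sample shift come from the methods above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable

# Synthetic stand-in for a loaded clip: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
data = 0.5 * np.sin(2 * np.pi * 440 * t)

# Additive Gaussian noise, scaled by the same 0.005 factor as add_noise.
noisy = data + 0.005 * rng.standard_normal(len(data))

# np.roll is a circular shift: samples pushed past the end wrap to the start.
shifted = np.roll(data, 1600)

print(noisy.shape, shifted.shape)
print(np.array_equal(shifted[1600:], data[:-1600]))  # delayed copy of the start
print(np.array_equal(shifted[:1600], data[-1600:]))  # wrapped-around tail
```

Because np.roll wraps around, the last 1600 samples reappear at the start of the clip rather than being replaced by silence.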

Save Generated Audio Files

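Once these effects have been applied, we can save the results as new audio files. To store them in a folder for later use, we can write them with the soundfile package, which librosa itself uses for audio I/O (older librosa releases offered “librosa.output.write_wav”, which was removed in version 0.8). This adds new sounds, with effects or noise applied, to your dataset.

    def write_audio_file(self, file, data, sample_rate=16000):
        # Imported here so that this method is self-contained.
        import soundfile as sf
        sf.write(file, data, sample_rate)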
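
If a dedicated audio I/O library is not installed, a mono signal can also be saved with Python's standard-library wave module. The sketch below is an alternative under stated assumptions, not the method used in this post: the helper name write_wav_pcm16 is hypothetical, and the signal is assumed to be floating point in the range [-1, 1].

```python
import os
import tempfile
import wave

import numpy as np


def write_wav_pcm16(path, data, sample_rate=16000):
    """Write a mono float signal in [-1, 1] as a 16-bit PCM WAV file."""
    clipped = np.clip(data, -1.0, 1.0)       # avoid integer overflow
    pcm = (clipped * 32767).astype("<i2")    # little-endian 16-bit integers
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)             # mono
        wav_file.setsampwidth(2)             # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())


# Usage: write one second of silence to a temporary file and read the header back.
out_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
write_wav_pcm16(out_path, np.zeros(16000))
with wave.open(out_path, "rb") as wav_file:
    n_frames, frame_rate = wav_file.getnframes(), wav_file.getframerate()
print(n_frames, frame_rate)
```

Clipping before the integer conversion matters for the noise effect in particular, since added noise can push samples slightly outside [-1, 1].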

Now we can create an instance of the “AudioAugmentation” class and call any of its methods. In the example below, we read the input sound, plot it (the “plot_time_series” method is defined in the next section), and add noise to it.

aa = AudioAugmentation()

# Read cat sound
data = aa.read_audio_file("data/cat.wav")
aa.plot_time_series(data)

# Adding noise to sound
data_noise = aa.add_noise(data)

Plotting Time Series

To view the waveforms of the generated sounds, we can write the following method using Matplotlib.

    def plot_time_series(self, data):
        # Plot amplitude against a normalized time axis from 0 to 1.
        plt.figure(figsize=(14, 8))
        plt.title('Raw wave')
        plt.ylabel('Amplitude')
        plt.plot(np.linspace(0, 1, len(data)), data)
        plt.show()

Finally, we can call “plot_time_series” to display the waveforms of the generated sounds. Below are the wave plots (amplitude over time) for each sound: the raw cat sound and the versions with effects applied.

Cat Sound Wave
Cat Sound Wave with Random Noise
Shifting the Cat Sound Wave
Stretching the Cat Sound Wave

Conclusion

In conclusion, we have learned how to augment an audio file with several effects using librosa and NumPy. All the code and files are available in my GitHub repository.
