How to do FastAI even faster

Speed up preprocessing using fastai’s built in ‘parallel’ function

Made-Up Masters
Mar 20 · 6 min read
The power of parallel processing. Photo by Marc-Olivier Jodoin on Unsplash

I am entering my first Kaggle competition, the already-ended
TensorFlow Speech Recognition Challenge, which involves processing
one-second raw audio clips to understand/predict which word is being said.

My method involves converting the raw audio to spectrograms before doing image classification. I had built a working and accurate model, and was excited to try it on the test set and make my first submission, when I saw the test set was 150,000+ wav files. I'm currently producing around 4 spectrograms/second at size 224x224. At that rate the test set would take around 10 hours; there must be a better way.
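A quick back-of-the-envelope check of that estimate, using the numbers above:

```python
files = 150_000   # wav files in the test set
rate = 4          # spectrograms generated per second
hours = files / rate / 3600
print(round(hours, 1))  # roughly 10.4 hours
```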

Immediately, multiprocessing and multithreading come to mind, but I don't know how to do those. They look scary, and they require more learning when my plate is already really full. Luckily fastai has a parallel feature that Jeremy casually mentions in Lesson 7. Step one is always to check the docs.

In-notebook (basic) documentation for fastai.core.parallel


Awesome, it looks as simple as can be. I pass in a function and a collection of arguments to that function to be executed in parallel, and fastai handles the rest.

A Toy Example of Parallel

The function below will take a number and print out its square.

def print_square(num):
    print(num ** 2)

Next we can generate the list of numbers we want to square using a list comprehension, and pass it to parallel along with our function.

num_list = [x for x in range(10)]
parallel(print_square, num_list)

Run this code and you’ll see the following:

It looks like it worked great, but where are our squares? We were supposed to print them out. If you read the full documentation more closely, you'll see what is happening:

Full documentation for fastai.core.parallel

"func must accept both the value and index of each element"

The function you pass in needs to have a specific signature, accepting exactly two arguments:

  1. A value (this is what arr contains, and is the normal argument to your function)
  2. The index* of that value in arr (note: your function doesn't actually need to do anything with the index, it just needs to have it in the function definition)

* Sylvain Gugger informed me on the fastai forums that the index is required to make parallel work. Also, parallel is meant to be a convenience function for internal use, so if you're more advanced, you can implement your own version using ProcessPoolExecutor.
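If you do want to roll your own, a minimal sketch with Python's concurrent.futures might look like the following. The names parallel_simple and square are made up for illustration, and fastai's real implementation differs in details (it shows a progress bar and discards results, for example):

```python
from concurrent.futures import ProcessPoolExecutor

def parallel_simple(func, arr, max_workers=None):
    # func must accept (value, index), matching fastai's convention
    if max_workers is not None and max_workers < 2:
        # serial fallback, handy for debugging
        return [func(o, i) for i, o in enumerate(arr)]
    with ProcessPoolExecutor(max_workers=max_workers) as ex:
        futures = [ex.submit(func, o, i) for i, o in enumerate(arr)]
        return [f.result() for f in futures]  # results in submission order

def square(num, index):
    return num ** 2

parallel_simple(square, range(5), max_workers=1)  # [0, 1, 4, 9, 16]
```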

Let’s rewrite our function to accept an index. Again, our function need not do anything with the index, and we could even name it _ in the function definition.

def print_square(num, index):
    print(num ** 2)

And now we try our call to parallel again…

It works! If you have a simple function that takes one argument, you’re done. You now know how to use fastai’s parallel function to do it 2–10x faster!* Just alter your function to accept an index, and pass it to parallel with a collection of arguments.
*This is a toy example, and here parallel is actually about 1000x slower due to process overhead; see below for a real example with real benchmarks
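As a quick sanity check outside the notebook, the corrected function can be driven by a plain enumerate loop, which is exactly the (value, index) call pattern parallel uses internally:

```python
def print_square(num, index):
    # index is required by parallel's calling convention but unused here
    print(num ** 2)

for i, n in enumerate(range(10)):
    print_square(n, i)  # parallel(print_square, num_list) makes these same calls concurrently
```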

What this looks like in practice, with a more realistic example…

Here is my actual code for generating and saving spectrograms.
Original code provided by John Hartquist and Kajetan Olszewski
TLDR: Read wav file at src_path/fname, create spectrogram, save to dst_path.

import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt
import pylab

def gen_spec(fname, src_path, dst_path):
    y, sr = librosa.load(src_path/fname)
    S = librosa.feature.melspectrogram(y, sr=sr, n_fft=1024,
                                       hop_length=512, n_mels=128,
                                       power=1.0, fmin=20, fmax=8000)
    plt.figure(figsize=(2.24, 2.24))
    pylab.axes([0., 0., 1., 1.], frameon=False, xticks=[], yticks=[])
    librosa.display.specshow(librosa.power_to_db(S, ref=np.max),
                             y_axis='mel', x_axis='time')
    save_path = f'{dst_path/fname}.png'
    pylab.savefig(save_path, bbox_inches=None, pad_inches=0, dpi=100)
    pylab.close()  # free the figure so memory doesn't grow across thousands of files

Before we go any further, let’s peek at the source code for parallel.

Source code for fastai.core.parallel, not nearly as scary as it looks

It looks intimidating, but all that is really happening here is that parallel takes each value from our collection arr and stores it in o, stores the index of said value in i, and then submits func(o, i) to a ProcessPoolExecutor.
For us, this means parallel will call gen_spec once for every fname in the collection (in our case, a list) that we submit, and it will handle the parallel processing for us, resulting in much faster processing.

But what do we do if our function accepts more than one argument?

As you can see, our function takes 3 arguments, but parallel is expecting a function that takes exactly two. The solution depends on whether our additional arguments are always the same, like a filepath or a constant, or whether they will vary.

A. If the additional arguments are fixed/static, make a new function with default values, or use Python’s functools.partial to create a function that fits parallel’s model. I prefer using partial, so that’s what I’ll demonstrate below.

B. If you have multiple arguments that are going to change with each function call, pass them as a tuple of arguments, and then unpack them.

Solution A: All 150,000 of my wav files are located in the same src_path, and I will be outputting all spectrograms to the same dst_path, so the only argument changing is fname. This is the perfect place to use a partial.

Since parallel passes the index positionally rather than by name in the function call, index always needs to be the 2nd argument in our definition. Let’s fix that. Now we have:

def gen_spec(fname, index, src_path, dst_path):

Next we make a new function with partial by passing in our static paths:

from functools import partial

gen_spec_partial = partial(gen_spec, src_path=path_audio,
                           dst_path=path_spec)  # the original snippet was cut off; path_spec stands in for your output folder

That’s it, we’re done. Let’s create 1000 spectrograms using gen_spec on its own, and with parallel, and compare how long each takes.

296 seconds without parallel, 104 seconds with it: nearly 3x faster.
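The partial trick itself is easy to verify without fastai or librosa. Here gen_spec_demo is a hypothetical stand-in that just reports which paths it would touch:

```python
from functools import partial

def gen_spec_demo(fname, index, src_path, dst_path):
    # stand-in for the real gen_spec: returns the paths it would process
    return f"{src_path}/{fname} -> {dst_path}/{fname}.png"

# freeze the static arguments; the result takes only (fname, index),
# which is exactly the shape parallel expects
gen_spec_partial = partial(gen_spec_demo, src_path="audio", dst_path="spectrograms")
gen_spec_partial("yes_001.wav", 0)  # 'audio/yes_001.wav -> spectrograms/yes_001.wav.png'
```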

Solution B: Now for our final case, what do we do if our additional arguments aren’t static? We rewrite our function to accept a tuple of arguments, and an index, and then we pass a collection of tuples containing the arguments. For our spectrogram example, that looks like this:

def gen_spec_parallel_tuple(arg_tuple, index):
    fname, src_path, dst_path = arg_tuple
    # the remaining code is the same and has been omitted

We then pack the arguments for each call into a tuple of the proper size, collect those tuples in a list, and pass that list and gen_spec_parallel_tuple to parallel.
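The pack/unpack pattern can be checked the same way; join_path below is a hypothetical stand-in for the spectrogram function:

```python
def join_path(arg_tuple, index):
    fname, src_path, dst_path = arg_tuple  # unpack the tuple inside the function
    return f"{src_path}/{fname} -> {dst_path}/{fname}.png"

fnames = ["yes_001.wav", "no_002.wav"]
arg_tuples = [(f, "audio", "spectrograms") for f in fnames]
# parallel(join_path, arg_tuples) would make these same calls concurrently:
results = [join_path(t, i) for i, t in enumerate(arg_tuples)]
```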

It works! Now you know how to take functions with an arbitrary number of arguments and run them in parallel to speed up your preprocessing and spend more time training.

Thanks to Sanyam Bhutani.
