Deep Learning: Training with Huge Amounts of Data, Part 3

Cristian Zantedeschi
2 min read · Oct 11, 2017


We left off in the previous story with the following configuration:

CPU 1: reads a big chunk from disk (3,200 samples), shuffles it, and inserts it into an in-memory buffer (a Queue).

CPU 2: runs the generator that extracts small batches from this buffer (batch size = 32 samples).

GPU: runs the Keras/TensorFlow computations.
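For reference, here is a minimal sketch of that pipeline. The file name, the array names (X, y) and the chunk layout are assumptions for illustration, not the original code from the previous story:

```python
import multiprocessing as mp
import numpy as np
import tables

CHUNK_SIZE = 3200   # samples read from disk in one go
BATCH_SIZE = 32     # samples handed to the GPU per step


def chunk_loader(h5_path, queue):
    """CPU 1: read big chunks from disk, shuffle them, push them into the buffer."""
    with tables.open_file(h5_path, mode="r") as h5:
        X, y = h5.root.X, h5.root.y                # assumed array names
        n = X.shape[0]
        while True:                                # loop over the file forever
            for start in range(0, n - CHUNK_SIZE + 1, CHUNK_SIZE):
                xc = X[start:start + CHUNK_SIZE]
                yc = y[start:start + CHUNK_SIZE]
                idx = np.random.permutation(len(xc))   # shuffle within the chunk
                queue.put((xc[idx], yc[idx]))


def batch_generator(queue):
    """CPU 2: pull a chunk from the buffer and yield small batches of 32."""
    while True:
        xc, yc = queue.get()
        for start in range(0, len(xc), BATCH_SIZE):
            yield xc[start:start + BATCH_SIZE], yc[start:start + BATCH_SIZE]


buffer_queue = mp.Queue(maxsize=4)                 # the in-memory buffer
loader = mp.Process(target=chunk_loader, args=("train.h5", buffer_queue), daemon=True)
loader.start()
```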

Now I am thinking: why not go further and load the data in parallel, from different disks/CPUs, into the memory queue?

First of all, I created two PyTables files as a 50/50 split of the original one:
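One possible way to do the split (the file paths and array names are assumptions; for a really huge file you would copy chunk by chunk instead of slicing each half into memory):

```python
import tables

with tables.open_file("train.h5", mode="r") as src:
    X, y = src.root.X, src.root.y                  # assumed array names
    half = X.shape[0] // 2
    for path, part in [("train_part1.h5", slice(0, half)),
                       ("train_part2.h5", slice(half, None))]:
        with tables.open_file(path, mode="w") as dst:
            # copy one half of the samples into each new file
            dst.create_array(dst.root, "X", X[part])
            dst.create_array(dst.root, "y", y[part])
```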

Then I define the two parallel Queue Loaders:
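A sketch of what the two loaders could look like, reusing the chunk_loader function from the sketch above: one process per file (ideally each on its own disk), both feeding the same in-memory queue.

```python
import multiprocessing as mp

buffer_queue = mp.Queue(maxsize=8)                 # shared in-memory buffer

loaders = [
    mp.Process(target=chunk_loader, args=(path, buffer_queue), daemon=True)
    for path in ("train_part1.h5", "train_part2.h5")
]
for p in loaders:
    p.start()                                      # two producers fill the same queue
```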

I have added a further trick: a multi-worker generator, which is consumed by Keras's .fit_generator.

To do that, it needs to be thread-safe:
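A common way to get thread safety is to wrap the generator so that every call to next() is protected by a lock. A minimal sketch (not the original code):

```python
import threading


class ThreadSafeIterator:
    """Wrap an iterator so several Keras workers can consume it safely."""

    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:          # only one worker advances the generator at a time
            return next(self.it)


def threadsafe_generator(gen_func):
    """Decorator form: make a generator function return a thread-safe iterator."""
    def wrapper(*args, **kwargs):
        return ThreadSafeIterator(gen_func(*args, **kwargs))
    return wrapper
```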

The only change to the Keras fit_generator call is the added workers=2 argument.
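Roughly what the training call then looks like, assuming model is an already-compiled Keras model and reusing the pieces sketched above (steps_per_epoch and epochs are placeholder values):

```python
train_gen = ThreadSafeIterator(batch_generator(buffer_queue))

model.fit_generator(
    train_gen,
    steps_per_epoch=2000,       # placeholder value
    epochs=10,                  # placeholder value
    workers=2,                  # two worker threads now pull from the same generator
)
```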

This is the result: core utilization is very high!
