Deep Learning: Training with a Huge Amount of Data, Part 3
We left off in the previous story with the following configuration:
CPU 1: reads a big chunk from disk (3,200 samples), shuffles it, and inserts it into a memory buffer (a queue).
CPU 2: the generator, which extracts a small batch from this buffer (batch size = 32 samples).
GPU: the Keras/TensorFlow computations, which run on the GPU.
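The producer/consumer structure above can be sketched with Python's standard `queue` and `threading` modules (the names `chunk_loader` and `batch_generator` are mine, not from the original gists; the disk read is replaced by a NumPy stand-in):

```python
import queue
import threading

import numpy as np

CHUNK_SIZE = 3200   # samples read from disk per chunk, as in the configuration above
BATCH_SIZE = 32     # samples per training batch

buffer = queue.Queue(maxsize=10)  # bounded memory buffer between CPU 1 and CPU 2

def chunk_loader(n_chunks):
    """CPU 1: read a big chunk, shuffle it, push it into the buffer."""
    for i in range(n_chunks):
        # stand-in for a real disk read of 3,200 samples
        chunk = np.arange(i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE)
        np.random.shuffle(chunk)
        buffer.put(chunk)
    buffer.put(None)  # sentinel: no more chunks

def batch_generator():
    """CPU 2: pull chunks from the buffer and yield small batches."""
    while True:
        chunk = buffer.get()
        if chunk is None:
            return
        for start in range(0, len(chunk), BATCH_SIZE):
            yield chunk[start:start + BATCH_SIZE]

threading.Thread(target=chunk_loader, args=(2,), daemon=True).start()
batches = list(batch_generator())
print(len(batches), len(batches[0]))  # 2 chunks of 3,200 -> 200 batches of 32
```

The bounded queue is the key design choice: it lets the disk reader run ahead of the GPU without ever holding more than ten chunks in RAM.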
Now I am thinking: why not go further and load the data in parallel, from different disks/CPUs, into the memory queue?
First of all, I created two PyTables files as a 50/50 split of the original one:
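The 50/50 partition itself is simple; here is a minimal sketch using a NumPy array as a stand-in for the original PyTables table (in the real pipeline, each half would be written to its own `.h5` file, e.g. via `tables.open_file(..., "w")` and an append loop):

```python
import numpy as np

# stand-in for the original PyTables table (one row per sample)
original = np.arange(10_000)

# split the rows 50/50 into two halves
half = len(original) // 2
part1, part2 = original[:half], original[half:]

# in the real pipeline, part1 and part2 would each be appended
# to a separate PyTables file, so two loaders can read in parallel
print(len(part1), len(part2))
```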
Then I define the two parallel queue loaders:
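A sketch of what the two loaders look like, assuming one loader thread per PyTables file, both feeding the same shared queue (NumPy arrays stand in for the two files; `queue_loader` is my name for the function):

```python
import queue
import threading

import numpy as np

buffer = queue.Queue(maxsize=10)  # shared memory buffer fed by both loaders

def queue_loader(source, chunk_size=3200):
    """One loader per PyTables file: read chunks, shuffle, enqueue."""
    for start in range(0, len(source), chunk_size):
        chunk = np.array(source[start:start + chunk_size])
        np.random.shuffle(chunk)
        buffer.put(chunk)

# stand-ins for the two 50/50 PyTables files
file1 = np.arange(0, 6400)
file2 = np.arange(6400, 12800)

loaders = [threading.Thread(target=queue_loader, args=(f,))
           for f in (file1, file2)]
for t in loaders:
    t.start()
for t in loaders:
    t.join()

print(buffer.qsize())  # 4 chunks total: 2 from each loader
```

Because `queue.Queue` is thread-safe, the two loaders can push concurrently without any extra locking.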
I have added a further trick: a multi-worker generator, which is consumed by Keras's .fit_generator.
For multiple workers to share it, it needs to be thread-safe:
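Plain Python generators are not thread-safe: if two Keras workers call `next()` on the same generator concurrently, it raises `ValueError: generator already executing`. The usual fix is a small wrapper that serializes `next()` behind a lock; a sketch of that pattern (class and decorator names are mine, not from the original gist):

```python
import threading

class ThreadSafeIterator:
    """Wraps a generator so next() can be called from multiple threads."""
    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:  # only one thread advances the generator at a time
            return next(self.it)

def threadsafe_generator(gen_func):
    """Decorator: make any generator function safe to share across workers."""
    def wrapper(*args, **kwargs):
        return ThreadSafeIterator(gen_func(*args, **kwargs))
    return wrapper

@threadsafe_generator
def batch_generator():
    for i in range(100):
        yield i

# two "workers" drain the same generator concurrently, as Keras would
gen = batch_generator()
results = []

def worker():
    for _ in range(50):
        results.append(next(gen))

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results) == list(range(100)))  # every item seen exactly once
```

With the lock in place, each item is handed out exactly once, no matter how many workers are pulling.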
The only change to the Keras fit_generator call is the added workers=2.
This is the result; core utilization is very high!