Deep Learning: Training with a Huge Amount of Data, Part 2

Cristian Zantedeschi
3 min read · Oct 10, 2017


If you read the first part, we left off at the point where we were about to build a multi-threaded, multi-process mechanism.

CPU 1: reads a big chunk from disk (3200 samples), shuffles it, and inserts it into a memory buffer (a queue).

CPU 2: runs the generator, which extracts a small amount from this buffer (the batch size: 32 samples).

GPU: performs the Keras/TensorFlow calculations.

For this purpose we use the Python Queue:

The Queue module implements multi-producer, multi-consumer queues.
It is especially useful in threaded programming when information must be exchanged safely between multiple threads.
The Queue class in this module implements all the required locking semantics.
It depends on the availability of thread support in Python; see the threading module.

Let’s define the queues:
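A minimal sketch follows; the variable names and the maxsize value of 10 are my assumptions, not the author's exact code:

```python
from multiprocessing import Queue

# maxsize caps how many chunks sit in RAM at once. With 3200-sample
# chunks, maxsize=10 bounds the buffer at 32000 samples; the value 10
# is illustrative, tune it to your available memory.
training_queue = Queue(maxsize=10)
validation_queue = Queue(maxsize=10)
```

We use multiprocessing.Queue here because the producers will run as separate processes; it offers the same put/get interface as the thread-level queue.Queue.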

One important thing to note about the Python Queue is that, by defining its max size, we decide HOW MANY IMAGES to keep in memory.

The <queue name>.put method will insert data until the queue is full, and then wait until the queue has free space again.

In our case, the put method will wait for TensorFlow to consume the data.
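To see this back-pressure in isolation, here is a tiny stand-alone sketch using the thread-level queue.Queue (the blocking behaviour is the same for the multiprocessing version):

```python
from queue import Queue

q = Queue(maxsize=2)   # room for only two items
q.put("chunk-1")
q.put("chunk-2")
print(q.full())        # True: a third put() would now block

q.get()                # the consumer (TensorFlow, in our case) frees a slot
print(q.full())        # False: the producer can put() again
```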

First, we need a function that can run in a separate thread or process:

And the generator function now becomes very short:

That is because the dirty work is done by the queue loader, read_images_into_queue. In the generator we only need to send samples back to TensorFlow, as many as the batch_size.

We will now create two processes.

The first for the training samples and the second for the validation samples. They are completely separate from each other:
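Wiring this up might look like the following. The stub producer stands in for the real chunk loader, and daemon=True is my addition as a safety net in case the processes are never terminated explicitly:

```python
from multiprocessing import Process, Queue

def read_images_into_queue(out_queue):
    # Stand-in for the real chunk loader: in the actual code this reads
    # 3200-sample chunks from PyTables, shuffles them, and puts them.
    while 1:
        out_queue.put("chunk")

training_queue = Queue(maxsize=10)
validation_queue = Queue(maxsize=10)

# Two completely independent producers, one per dataset.
training_producer = Process(target=read_images_into_queue,
                            args=(training_queue,), daemon=True)
validation_producer = Process(target=read_images_into_queue,
                              args=(validation_queue,), daemon=True)

training_producer.start()
validation_producer.start()
```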

Remember to terminate the producer processes when you are finished with them, because, as you may recall, the queue loader runs an infinite loop (while 1:).

Now we create the generator objects as usual, just using the queues instead of the PyTables (or NumPy) arrays:

Last step: Train the Network

As you can see, we are using two cores here, and the GPU stays busy around 70% of the time.

Here are the training statistics:

Epoch 1/1
2807/2806 [==============================] - 29s - loss: 0.0368 - val_loss: 0.0271

Total number of train samples: 89820 ( shape 128x128)

Batch Size : 32

Duration : 0:00:29.998415

.. model saved to selfdrive_model.h5

And… yes, because the Intel i3 I am using has Hyper-Threading, it nicely splits the work between the cores. But it is also thanks to the asynchronous pipeline we have built that each core's usage stays between 40–50%!

Remember to terminate the queue loaders:

training_producer.terminate()
validation_producer.terminate()

FOLLOW ON TO PART 3…
