Extract VGG16 bottleneck features of 6000 images without GPU

I recently started taking Jeremy Howard's great Deep Learning for Coders Part 1 MOOC. I cannot express enough my appreciation for his work and contribution to the data science community, and it's free! He just cannot get more awesome.


Deep learning without a GPU, why am I doing this to myself?

A brief introduction: I am currently a data scientist intern living in Paris, France. My mission for this internship is essentially an image classification/recognition task, so the knowledge I apply is what is taught in Lessons 1 through 3/4 of the MOOC. However, the difficult part of this mission is not the algorithm itself, but doing everything on a single laptop (Windows 7, 8GB RAM, CPU only) on which I don't have administrative privileges. Picture the scene: a 7-year Mac user, whose previous Windows experience dates back to the Windows Vista era, receives a Windows laptop on the first day of work and then spends the first week at the IT office trying to install all the necessary data science tools. Hands up if you feel the pain!

When I realise I will be using Windows full-time to do data science work for 6 months

Alright, so what does it mean to do deep learning on a laptop?

Speed, oh the speed…

I hadn't done much deep learning work beyond taking MOOCs before I started my internship. When you take any deep learning course, the first thing the instructor does is make sure you set up an AWS instance (we all agree, right?), and then comes the whole theory about how matrix computations run faster on a GPU than on a CPU. There is even this impressive demonstration video from NVIDIA that will surely blow your mind if you have not already watched it. And they are totally right: in my case, fitting a model with 4 convolutional layers on 18,946 images took Jeremy 159 seconds per epoch on an AWS p2 instance, whereas one epoch on 5,823 images took 2,223 seconds (37 minutes) on my machine.

Memory issues with transfer learning

Sometimes, not only the speed but also the memory is an issue. The other day I was running this one line of code, which pre-computes the convolutional features of the VGG16 model.

conv_feat = conv_model.predict_generator(batches, batches.samples)

conv_model is a Keras Sequential model containing all the convolutional layers of the VGG16 model. batches generates images in groups of a pre-set size (I used batch_size=50) from my training image set. So basically, it pre-computes the features recognised by the convolutional layers of VGG16, pre-trained on ImageNet images, and these features, of shape (1, 512, 14, 14) for each image, become the input of the customised dense layers (fully-connected layers), which are then trained to classify your own image set. This is one of the transfer learning techniques, learn more about it here.
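For context, here is roughly what conv_model and batches can look like. The course builds a Sequential model from the conv layers of its own Vgg16 wrapper; the sketch below gets an equivalent model with plain Keras, and the data directory and batch size are just placeholders matching the numbers above:

```python
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator

# VGG16 with ImageNet weights, cut at the last conv layer of block 5,
# so each 224x224 image maps to a 512 x 14 x 14 feature map
# (14 x 14 x 512 with TensorFlow's channels-last ordering).
base = VGG16(weights='imagenet', include_top=False)
conv_model = Model(inputs=base.input,
                   outputs=base.get_layer('block5_conv3').output)

# Generator yielding batches of 50 pre-processed training images,
# in a fixed order (no shuffling) so features stay aligned with labels.
gen = ImageDataGenerator(preprocessing_function=preprocess_input)
batches = gen.flow_from_directory('data/train',        # placeholder path
                                  target_size=(224, 224),
                                  batch_size=50,
                                  class_mode=None,
                                  shuffle=False)
```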

The problem is that each image produces a Numpy array of shape (1, 512, 14, 14), which makes conv_feat.shape = (5823, 512, 14, 14). That is already 584,349,696 32-bit floats, roughly 2.2 GiB. On top of that, your convolutional model is computing the features of 50 images at a time with millions of parameters (use conv_model.summary() to check the exact number if you are curious). You can do the math, but I'm afraid it isn't possible to run this code without exploding my 8GB of RAM.
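To make that back-of-the-envelope calculation explicit:

```python
n_images = 5823
floats_per_image = 512 * 14 * 14            # 100,352 values per feature map
total_floats = n_images * floats_per_image  # 584,349,696
total_bytes = total_floats * 4              # float32 = 4 bytes each
print(total_bytes / 2.0**30)                # ~2.2 GiB just for conv_feat itself
```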

And that brings us to the whole point of this blog post, sorry it is a bit lengthy (it's my first time writing a blog). This is when I had to reinvent the wheel a little, and I would like to share the result with everyone who feels the same pain.

This is definitely not the fastest algorithm out there, but it successfully extracts the VGG16 bottleneck features of 5,823 images without using a GPU.
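In spirit, it looks something like the minimal sketch below (assuming Keras 2, channels-first features, and a placeholder output file): predict one batch at a time and stream the results into a memory-mapped array on disk, so only one batch of features ever sits in RAM.

```python
import math
import numpy as np

batch_size = 50
n_samples = batches.samples                      # 5823 in my case
n_batches = int(math.ceil(n_samples / float(batch_size)))

# Pre-allocate the feature array on disk instead of in memory.
# Shape assumes channels-first ordering; with TensorFlow's channels-last
# it would be (n_samples, 14, 14, 512) instead.
conv_feat = np.lib.format.open_memmap('conv_feat.npy', mode='w+',
                                       dtype='float32',
                                       shape=(n_samples, 512, 14, 14))

i = 0
for _ in range(n_batches):
    imgs = next(batches)                         # one batch of pre-processed images
    feats = conv_model.predict(imgs)             # conv features for this batch only
    conv_feat[i:i + len(feats)] = feats
    i += len(feats)

conv_feat.flush()                                # features now live on disk, not in RAM
```

The saved conv_feat.npy can later be loaded lazily with np.load('conv_feat.npy', mmap_mode='r') when training the dense layers on top, so the full feature array never has to fit in RAM at once.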

This is it for my first blog post. Thank you for reading this far. Happy learning!
