Transfer Learning in TensorFlow on the Kaggle Rainforest competition

Luuk Derksen
Jul 31, 2017


When I first noticed the Kaggle competition “Planet: Understanding the Amazon from Space” I immediately thought of trying out Transfer Learning using a pre-trained model. I had never really played with Transfer Learning before, so this seemed like a good problem to try it out on. Transfer Learning is described by Wikipedia as:

“a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem”

where in this case the ‘relatedness’ of the problems is that both the Kaggle competition and the pre-trained model(s) address computer vision tasks. For more information on Transfer Learning there is a good resource from Stanford’s CS class and a fun blog by Sebastian Ruder.

The model I chose to use is the ResNet50 model developed by Microsoft (the original paper can be found here), which “won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.” As can be seen from the original GitHub repository, ResNet50 is one of a number of models (others are ResNet101 and ResNet152) where the number denotes the depth of the model. For more information about how residual networks work, Vincent Fung recently wrote an amazing writeup of the architecture and the innovations around it.

So let’s see how we can get the pre-trained model, implement it in TensorFlow and use it to do some transfer learning!

Note: this notebook is very much for myself, to keep track of what I did and why I did some steps. But otherwise: enjoy, and I hope it’s useful.

Pre-trained Model Weights

Keras has a number of implementations of the well-known vision models in its GitHub repository, but I wanted to try and build it myself to properly understand how to load weights into a TensorFlow graph. The Keras implementation of ResNet50 links to the weights it uses:

https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels.h5

So those weights are the ones I downloaded to get started.

To load the data using Python you will need the h5py library:

import h5py

PARAMS_PATH = ''
PARAMS_FILE = PARAMS_PATH + 'resnet50_weights_tf_dim_ordering_tf_kernels.h5'
data_h5 = h5py.File(PARAMS_FILE, 'r')

The data_h5 object now holds all the weights and is accessible like you would access a dictionary. To get the keys that actually hold weights (the activation layers, for example, will not), we check whether each entry holds any keys of its own:

variables = [ key for key in data_h5.keys() if len(data_h5[key])>0 ]
print variables
[out]
[u'bn2a_branch1', u'bn2a_branch2a', ...,
u'res5c_branch2b', u'res5c_branch2c']

Each of these keys corresponds to a layer and holds the weights associated with that layer. To see which variables a layer holds, index into it with its key:

print list(data_h5['bn2a_branch1'])
[out]
[u'bn2a_branch1_beta:0', u'bn2a_branch1_gamma:0', u'bn2a_branch1_running_mean:0', u'bn2a_branch1_running_std:0']

which shows you that this is a Batch Normalization layer. You can also inspect the shape or values of the individual weights:

print data_h5['bn2a_branch1']['bn2a_branch1_beta:0'].shape
print data_h5['bn2a_branch1']['bn2a_branch1_gamma:0'].value
[out]
(256,)
[ 0.30266267 1.10643625 1.773862
... 2.0589807 0.69311738]

Perfect, this gives us all the information we need to access the variables and load them into the graph. All we need is the graph itself.

TensorFlow model

To reconstruct the graph itself I first followed the visualization by Netscope, which gives you all the information on the blocks you need to replicate the network. The network consists of two big building blocks: a convolutional block and an identity block, both built up out of convolutional layers and batch normalization layers.

Before we can build the bigger blocks of the network, we first need to be able to easily create the convolution and batch-normalization layers.

Convolutional Layer for ResNet50
Batch Normalization layer for ResNet50
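Roughly, those two helpers look like the following minimal sketch. It assumes the data_h5 object from earlier, plain TF 1.x ops, and that the convolution weights follow the same key pattern as the batch-norm ones (with _W:0 and _b:0 suffixes); the helper names conv_layer and bn_layer are my own.

import tensorflow as tf

def conv_layer(input_tensor, data, layer_name, strides=(1, 1), padding='SAME'):
    # Look up the pre-trained kernel and bias by layer_name and bake
    # them into the graph as constants (i.e., not trainable).
    kernel = tf.constant(data[layer_name][layer_name + '_W:0'][:])
    bias = tf.constant(data[layer_name][layer_name + '_b:0'][:])
    x = tf.nn.conv2d(input_tensor, kernel,
                     strides=[1, strides[0], strides[1], 1], padding=padding)
    return tf.nn.bias_add(x, bias)

def bn_layer(input_tensor, data, layer_name, epsilon=1e-5):
    # Batch normalization with the stored statistics. Note the file
    # calls the last entry 'running_std', but it is used here as the
    # variance term, matching the Keras implementation.
    beta = tf.constant(data[layer_name][layer_name + '_beta:0'][:])
    gamma = tf.constant(data[layer_name][layer_name + '_gamma:0'][:])
    mean = tf.constant(data[layer_name][layer_name + '_running_mean:0'][:])
    var = tf.constant(data[layer_name][layer_name + '_running_std:0'][:])
    return tf.nn.batch_normalization(input_tensor, mean, var, beta, gamma, epsilon)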

Notice how we are passing the data and the layer_name into the layers. This allows us to look up the pre-trained weights (the layer_name serves as the key into the weights dictionary) and set them as constants in the network, essentially creating a static network that you cannot retrain, since the constants are not variables.

Now that we have those two building blocks we can create the larger blocks. Let’s start with the Identity block:

Identity block for ResNet50
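As a sketch (using the conv_layer and bn_layer helpers from above): the identity block chains three conv/batch-norm pairs and adds the block input back in via the shortcut.

def identity_block(input_tensor, data, stage, block):
    # Layer names follow the pattern from the weight file, e.g.
    # res2a_branch2a / bn2a_branch2a for stage=2, block='a'.
    name = str(stage) + block
    x = conv_layer(input_tensor, data, 'res' + name + '_branch2a')
    x = bn_layer(x, data, 'bn' + name + '_branch2a')
    x = tf.nn.relu(x)
    x = conv_layer(x, data, 'res' + name + '_branch2b')
    x = bn_layer(x, data, 'bn' + name + '_branch2b')
    x = tf.nn.relu(x)
    x = conv_layer(x, data, 'res' + name + '_branch2c')
    x = bn_layer(x, data, 'bn' + name + '_branch2c')
    # The shortcut: add the unmodified block input back in before the
    # final non-linearity.
    return tf.nn.relu(tf.add(x, input_tensor))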

Here we are passing a stage variable so we can dynamically generate the layer_name from the stage. Similar to the Identity block, we can now also create a function that provides the Convolutional block:
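A sketch of that block, under the same assumptions as before: the main branch is the same three conv/batch-norm pairs, but the shortcut branch gets a (strided) 1x1 convolution so the shapes of the two branches match before the addition.

def conv_block(input_tensor, data, stage, block, strides=(2, 2)):
    name = str(stage) + block
    # Main branch: three conv/batch-norm pairs, the first one strided.
    x = conv_layer(input_tensor, data, 'res' + name + '_branch2a', strides=strides)
    x = bn_layer(x, data, 'bn' + name + '_branch2a')
    x = tf.nn.relu(x)
    x = conv_layer(x, data, 'res' + name + '_branch2b')
    x = bn_layer(x, data, 'bn' + name + '_branch2b')
    x = tf.nn.relu(x)
    x = conv_layer(x, data, 'res' + name + '_branch2c')
    x = bn_layer(x, data, 'bn' + name + '_branch2c')
    # Shortcut branch: a strided 1x1 convolution to match shapes.
    shortcut = conv_layer(input_tensor, data, 'res' + name + '_branch1', strides=strides)
    shortcut = bn_layer(shortcut, data, 'bn' + name + '_branch1')
    return tf.nn.relu(tf.add(x, shortcut))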

and this gives us all the building blocks we need to configure the actual model:
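The full configuration is in the notebook linked below; schematically, the stages chain together along these lines (a sketch of the stem and stage 2, with padding details glossed over and the conv1/bn_conv1 names assumed from the weight file; stages 3–5 follow the same pattern with more identity blocks):

images = tf.placeholder(tf.float32, [None, 224, 224, 3])

# Stem: 7x7 convolution followed by max-pooling.
x = conv_layer(images, data_h5, 'conv1', strides=(2, 2))
x = bn_layer(x, data_h5, 'bn_conv1')
x = tf.nn.relu(x)
x = tf.nn.max_pool(x, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')

# Stage 2: one convolutional block and two identity blocks.
x = conv_block(x, data_h5, stage=2, block='a', strides=(1, 1))
x = identity_block(x, data_h5, stage=2, block='b')
x = identity_block(x, data_h5, stage=2, block='c')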

I have put a full notebook online that has all the code shown. I also tried out the model on some images to see if it is working properly:

It seems to be working great!

Transfer Learning

So, to the main bit: how can we leverage this pre-trained network on a different problem set (i.e., satellite images of the rainforest)? Can we use all or some of the layers of the original ResNet50 model and have it output the different classes for the satellite images?

From reading bits and pieces about how one could do transfer learning, I figured I had a few options to try out:

  1. Use the entire network up to the final connected layer and replace the top 1000 classes with the 17 classes from the Kaggle competition.
  2. Use the entire network up to the final connected layer and then extend the network (i.e., making it deeper) leading up to an output layer with 17 classes.
  3. Either of the two previous options, but also retraining or fine-tuning some of the original layers.

I thought the last option made the most sense: the network would probably benefit from keeping some of the early layers constant while retraining, for example, the last stage. To do this I had to augment my code from above to allow for locking a layer (i.e., keeping its weights as constants) or setting them as trainable variables so they can be updated. I modified the convolutional layer and batch normalization layer to:
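Sketching the idea for the convolutional layer (the batch-normalization layer is modified the same way):

def conv_layer(input_tensor, data, layer_name, strides=(1, 1),
               padding='SAME', lock=True):
    kernel_init = data[layer_name][layer_name + '_W:0'][:]
    bias_init = data[layer_name][layer_name + '_b:0'][:]
    if lock:
        # Frozen: the pre-trained weights are baked in as constants.
        kernel = tf.constant(kernel_init)
        bias = tf.constant(bias_init)
    else:
        # Trainable: variables initialized from the pre-trained
        # weights, so the optimizer can fine-tune them.
        kernel = tf.Variable(kernel_init, name=layer_name + '_W')
        bias = tf.Variable(bias_init, name=layer_name + '_b')
    x = tf.nn.conv2d(input_tensor, kernel,
                     strides=[1, strides[0], strides[1], 1], padding=padding)
    return tf.nn.bias_add(x, bias)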

where I have added a lock argument that indicates whether you want to lock the layer and use constants, or have the layer be trainable and use variables. In addition I modified the blocks to pass on the lock argument:
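For the blocks it is just a matter of forwarding the argument, along the lines of:

def identity_block(input_tensor, data, stage, block, lock=True):
    name = str(stage) + block
    # Every conv_layer/bn_layer call in the block gets the same lock
    # argument forwarded to it.
    x = conv_layer(input_tensor, data, 'res' + name + '_branch2a', lock=lock)
    x = bn_layer(x, data, 'bn' + name + '_branch2a', lock=lock)
    x = tf.nn.relu(x)
    x = conv_layer(x, data, 'res' + name + '_branch2b', lock=lock)
    x = bn_layer(x, data, 'bn' + name + '_branch2b', lock=lock)
    x = tf.nn.relu(x)
    x = conv_layer(x, data, 'res' + name + '_branch2c', lock=lock)
    x = bn_layer(x, data, 'bn' + name + '_branch2c', lock=lock)
    return tf.nn.relu(tf.add(x, input_tensor))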

So now we can specify per stage/block whether we want those layers available for fine-tuning. I then separated the learning into the fine-tuned part and the rest of the network that I added myself, grabbing the variables separately for stage 5 and the final layers (the part I added):
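A minimal sketch of that separation, assuming the trainable stage-5 variables keep the res5*/bn5* names from the weight file and that a loss tensor has already been defined; the optimizer choice and learning rates here are placeholders:

# Split the trainable variables into the fine-tuned stage-5 layers
# and the newly added layers on top.
all_vars = tf.trainable_variables()
stage5_vars = [v for v in all_vars if v.name.startswith(('res5', 'bn5'))]
head_vars = [v for v in all_vars if not v.name.startswith(('res5', 'bn5'))]

# One optimizer per group, with a smaller learning rate for the
# pre-trained layers (a common choice when fine-tuning).
train_stage5 = tf.train.AdamOptimizer(1e-5).minimize(loss, var_list=stage5_vars)
train_head = tf.train.AdamOptimizer(1e-3).minimize(loss, var_list=head_vars)
train_op = tf.group(train_stage5, train_head)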

One final tweak I made was to pre-calculate the stage-4 output for all images in the dataset, so that I would not have to feed every image forward through the frozen part of the network on every epoch. This saved me a lot of time during training, but has one massive downside: you cannot do any image augmentation (e.g., rotation, flipping, brightness) on the fly; you would need to do it during the pre-calculation, and that would mean saving the images multiple times. So I ended up training without any image augmentation, with only a few added layers and a lot of dropout in between them to avoid overfitting.
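A sketch of that pre-calculation step, assuming images is the input placeholder, stage4_output the tensor coming out of stage 4, and train_images a numpy array holding the dataset:

import numpy as np

# Feed every image through the frozen part of the network once and
# cache the stage-4 activations; training then starts from these.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    chunks = []
    for batch in np.array_split(train_images, max(1, len(train_images) // 64)):
        chunks.append(sess.run(stage4_output, feed_dict={images: batch}))
stage4_features = np.concatenate(chunks, axis=0)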

In the end it gave me a leaderboard score of 0.92125 with a single model, which is not too bad, I guess, without image augmentation. I did not explore further training, as I assumed the competition would be won by (big) ensembles with heavy image augmentation, like the solution described by team Urucu, who finished 13th overall, and I would not stand a chance anyway ;).
