Transfer Learning/Data Augmentation for ConvNets

Dharti Dhami
4 min read · Nov 24, 2018


Transfer Learning

Neural networks can be finicky to replicate: many details, such as the tuning of hyperparameters like learning rate decay, make a real difference to performance, so it's difficult to reproduce someone else's polished work just from reading their paper. Fortunately, many times the implementation is open sourced on GitHub.

git clone <repo>

One of the advantages of starting from GitHub is that these networks often take a long time to train, and someone else might have used multiple GPUs and a very large dataset to pre-train them. And that allows us to do transfer learning using these networks.

In all the different applications of deep learning, computer vision is one where transfer learning is something one should almost always do, unless we have an exceptionally large data set and can train everything from scratch ourselves.

Let’s say we are building a cat detector to recognize our own pet cats, called Tigger and Misty. We have a classification problem with three classes: is this picture Tigger, is it Misty, or is it neither? Now, we probably don’t have a lot of pictures of Tigger or Misty, so our training set will be small.

So let’s find some open source implementation of a neural network and download not just the code but also the weights. The ImageNet data set has a thousand different classes, so the network will have a softmax unit that outputs one of a thousand possible classes. What we can do is get rid of that softmax layer and create our own softmax unit that outputs Tigger, Misty, or neither.
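The steps above can be sketched in Keras. This is a minimal sketch, assuming TensorFlow is available and using ResNet50 as a stand-in for whatever downloaded network you start from; pass weights="imagenet" instead of None to actually fetch the pretrained ImageNet weights.

```python
import tensorflow as tf

# Load the convnet without its original 1000-class softmax head.
base = tf.keras.applications.ResNet50(
    weights=None,            # use "imagenet" to download the pretrained weights
    include_top=False,       # drop the original 1000-way softmax layer
    input_shape=(224, 224, 3),
)
base.trainable = False       # freeze every pretrained layer

# Attach our own 3-way softmax: Tigger / Misty / neither.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

Only the small new softmax head is trained, which is why this works even when our own training set is tiny.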

In terms of the network, all the layers are frozen, i.e., we use the exact same parameters/weights, except for our own softmax layer.

One rule of thumb is: the larger your labeled data set, the fewer layers you should freeze. There are a couple of ways to do this. You could take the weights of the last few layers and just use them as initialization, doing gradient descent from there, or you could blow away those last few layers entirely and replace them with your own new hidden units and your own final softmax output.

Finally, if you have a lot of data, one thing you might do is take the open source network and weights and use the whole thing just as initialization and train the whole network.
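This whole spectrum, from freezing everything to fine-tuning everything, reduces to choosing how many layers stay frozen. A hedged Keras sketch, where n_freeze is a made-up knob (not a library parameter): shrink it as your labeled data set grows, and set it to 0 to fine-tune the entire network from the downloaded weights.

```python
import tensorflow as tf

base = tf.keras.applications.ResNet50(
    weights=None, include_top=False, input_shape=(224, 224, 3)
)

n_freeze = 100                       # with more data, freeze fewer layers
for layer in base.layers[:n_freeze]:
    layer.trainable = False          # early layers keep the pretrained weights
for layer in base.layers[n_freeze:]:
    layer.trainable = True           # later layers are fine-tuned on our data

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),
])
# A small learning rate helps avoid wrecking the pretrained weights early on.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy")
```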

Data Augmentation

When you’re training a computer vision model, data augmentation will often help. And this is true whether you’re using transfer learning, starting from someone else’s pre-trained weights, or trying to train something yourself from scratch.

Mirroring and random cropping are frequently used, and in theory you could also use things like rotation, shearing of the image, various forms of local warping, and so on.
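The two most common augmentations are simple array operations. A minimal numpy sketch, where the image is an H x W x 3 array:

```python
import numpy as np

rng = np.random.default_rng(0)

def mirror(image):
    """Flip the image left-to-right (horizontal mirroring)."""
    return image[:, ::-1, :]

def random_crop(image, crop_h, crop_w):
    """Cut a random crop_h x crop_w patch out of the image."""
    h, w, _ = image.shape
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w, :]

image = rng.random((64, 64, 3))
flipped = mirror(image)
patch = random_crop(image, 56, 56)
```

In practice these are applied on the fly during training, so each epoch sees slightly different versions of every image.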

The second type of data augmentation that is commonly used is color shifting.

So, given a picture, let’s say you add different distortions to the R, G and B channels, with the offsets drawn from some probability distribution. The motivation for this is that if the sunlight was a bit yellow, or the indoor illumination was a bit more yellow, that could easily change the colors of an image, but the identity of the cat, the label y, stays the same. So introducing these color distortions, or color shifting, makes your learning algorithm more robust to changes in the colors of your images.
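A minimal numpy sketch of color shifting, assuming pixel values in [0, 1]: draw one random offset per channel, add it, and clip back to the valid range. The label y is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

def color_shift(image, scale=0.1):
    """Add an independent random offset to each of the R, G, B channels."""
    offsets = rng.uniform(-scale, scale, size=3)   # one offset per channel
    return np.clip(image + offsets, 0.0, 1.0)      # stay in the valid range

image = rng.random((64, 64, 3))
shifted = color_shift(image)
```

The scale parameter here is an assumed hyperparameter; how large to make it is exactly the kind of augmentation knob discussed below.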

And similar to other parts of training a deep neural network, the data augmentation process also has a few hyperparameters, such as how much color shifting to apply and exactly what parameters to use for random cropping.

So, similar to elsewhere in computer vision, a good place to get started might be to use someone else’s open source implementation for how they use data augmentation.

Next: Summarizing the state of computer vision.
