NanoNets : How to use Deep Learning when you have Limited Data

Sarthak Jain
Jan 30, 2017 · 9 min read

Disclaimer: I’m building nanonets.com to help build ML with less data

Part 2 : Building Object Detection Models with Almost no Hardware

I think AI is akin to building a rocket ship. You need a huge engine and a lot of fuel. If you have a large engine and a tiny amount of fuel, you won’t make it to orbit. If you have a tiny engine and a ton of fuel, you can’t even lift off. To build a rocket you need a huge engine and a lot of fuel.

The analogy to deep learning is that the rocket engine is the deep learning models and the fuel is the huge amounts of data we can feed to these algorithms. — Andrew Ng

Deep Learning has recently surged in popularity, achieving state-of-the-art performance in tasks such as Language Translation, playing Strategy Games and Self Driving Cars, each of which requires millions of data points. One common barrier to using deep learning is the amount of data needed to train a model. So much data is required because the model has a large number of parameters that the machine has to learn.

A few examples of the number of parameters in these recent models are:

Details of Deep Learning Models

Neural Networks aka Deep Learning are layered structures which can be stacked together (think LEGO)

Size(Model) ∝ Size(Data) ∝ Complexity(Problem)

What AlexNet sees at every step

Transfer Learning to the Rescue!

Qiang Yang, Sinno Jialin Pan, “A Survey on Transfer Learning”, IEEE Transactions on Knowledge & Data Engineering, vol. 22, no. , pp. 1345–1359, October 2010, doi:10.1109/TKDE.2009.191

Transfer Learning is like the best kept secret that nobody is trying to keep. Everybody in the industry knows about it but nobody outside does.

Google Trends Machine Learning vs Deep Learning vs Transfer Learning

According to Awesome - Most Cited Deep Learning Papers, a list of the top papers in Deep Learning, more than 50% of the papers use some form of Transfer Learning or Pretraining. Transfer Learning is becoming more and more applicable for people with limited resources (data and compute), but unfortunately the idea has not been socialised nearly as much as it should be. The people who need it the most don’t know about it yet.

If Deep Learning is the holy grail and data is the gatekeeper, transfer learning is the key.

With transfer learning, we take a pretrained model that was trained on a large, readily available dataset (for a completely different task, with the same kind of input but a different output). We then look for layers that output reusable features, and use the output of such a layer as the input features for a much smaller network with far fewer parameters. This smaller network only needs to learn the relations specific to your problem, because the pretrained model has already learnt the general patterns in the data. This way a model trained to detect cats can be reused to reproduce the work of Van Gogh.
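As a rough illustration of the idea, here is a minimal NumPy sketch. It is not how any particular pretrained model is loaded: `W_pretrained` is a random stand-in for real pretrained weights, and the layer sizes (64 inputs, 16 features), the 20-example dataset and the helper names are all assumptions made up for this example. The point is only that the pretrained part stays frozen while a small head is trained on its output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained layer. In practice these weights would be loaded
# from a model trained on a large dataset, not drawn at random (assumption).
W_pretrained = rng.normal(size=(64, 16)) / 8.0   # 64 raw inputs -> 16 features


def extract_features(x):
    """Frozen feature extractor: ReLU(x @ W_pretrained). Never trained here."""
    return np.maximum(x @ W_pretrained, 0.0)


def fit_head(features, labels):
    """Fit a small linear head (weights + bias) by least squares.

    This is the only part of the combined model that is trained.
    """
    design = np.hstack([features, np.ones((len(features), 1))])
    weights, *_ = np.linalg.lstsq(design, labels, rcond=None)
    return weights


# A small labelled dataset for the new task (synthetic, for illustration).
x_small = rng.normal(size=(20, 64))
y_small = (x_small[:, 0] > 0).astype(float)

frozen_before = W_pretrained.copy()
head = fit_head(extract_features(x_small), y_small)

trainable_params = head.size          # 16 weights + 1 bias
frozen_params = W_pretrained.size     # 64 * 16, reused as-is
print(trainable_params, frozen_params)
```

Only the 17 head parameters are fitted to the 20 examples; the 1024 frozen parameters are reused unchanged, which is where the saving in data comes from.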

Another major advantage of using transfer learning is how well the model generalizes. Larger models tend to overfit the data (i.e. they model the data more than the underlying phenomenon) and don’t work as well when you test them on unseen data. Since transfer learning allows the model to see different types of data, it learns the underlying rules of the world better.

Think of overfitting as memorizing as opposed to learning. — James Faghmous

Data Reduction because of Transfer Learning

Calculating the number of parameters needed to train for this problem using transfer learning:

No of parameters = [Size(inputs) + 1] * [Size(outputs) + 1]

= [2048 + 1] * [1 + 1] = 4098 parameters

We see a reduction in the number of parameters from 1.4*10⁸ to 4*10³, which is between 4 and 5 orders of magnitude. So we should be fine collecting fewer than a hundred images of dresses. Phew!
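The arithmetic can be checked with a few lines of Python. This sketch simply applies the formula as stated in this article, using the 2048 inputs, 1 output and 1.4*10⁸ figure quoted in the text; the function and variable names are made up for illustration.

```python
import math


def head_parameters(n_inputs, n_outputs):
    """Parameter count using the article's formula: (inputs + 1) * (outputs + 1)."""
    return (n_inputs + 1) * (n_outputs + 1)


full_model_parameters = 1.4e8            # figure quoted in the text
head = head_parameters(2048, 1)          # 2049 * 2 = 4098

reduction_factor = full_model_parameters / head
orders_of_magnitude = math.log10(reduction_factor)

print(head)
print(orders_of_magnitude)
```

The ratio comes out at roughly 4.5 orders of magnitude, so the small network has tens of thousands of times fewer parameters to learn.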

If you’re impatient and can’t wait to find out the actual color of the dress, scroll down to the bottom and see how to build the model for dresses yourself.

A step by step guide to Transfer Learning — Using a Toy Example for Sentiment Analysis

The toy dataset contains 72 sentences:

  1. 62 have no assigned sentiment; these will be used to pretrain the model
  2. 8 have sentiment assigned to them; these will be used to train the model
  3. 2 have sentiment assigned to them; these will be used to test the models

Since we only have 8 labelled sentences (sentences that have sentiment associated with them), we first pretrain the model to just predict context. A model trained on just the 8 sentences gives 50% accuracy, which is no better than flipping a coin.

To solve this problem we will use transfer learning, first training a model on the 62 unlabelled sentences. We then use a part of the first model and train a sentiment classifier on top of it. Trained on the 8 labelled sentences, it produces 100% accuracy when tested on the remaining 2.

Step 1

Step 2

Step 3

Instead of using sentences directly, we set the vector of a sentence to the average of the vectors of all of its words (in actual tasks we would use something like an LSTM instead). The sentence vector is passed as the input, and the output is the score of the sentence being positive or negative. We use one hidden layer in between and train the model on our labelled sentences. As you can see, with only 10 labelled examples (8 for training, 2 for testing) we have achieved 100% test accuracy using this model.
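The averaging step can be sketched as follows. This is not the code from the gist linked below: the 2-D word vectors are hand-made stand-ins for vectors learnt during pretraining, the sentences are invented, and for brevity the classifier is a plain logistic regression rather than a network with a hidden layer.

```python
import numpy as np

# Hypothetical "pretrained" word vectors. Real ones would come from the model
# pretrained on the unlabelled sentences (assumption for illustration).
WORD_VECTORS = {
    "good": np.array([1.0, 0.2]),
    "great": np.array([0.9, 0.1]),
    "bad": np.array([-1.0, 0.1]),
    "awful": np.array([-0.9, 0.2]),
    "movie": np.array([0.0, 1.0]),
    "plot": np.array([0.1, 0.9]),
}


def sentence_vector(sentence):
    """Represent a sentence as the average of its word vectors."""
    return np.mean([WORD_VECTORS[word] for word in sentence.split()], axis=0)


def train_classifier(X, y, steps=200, lr=0.5):
    """Logistic regression trained with plain gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(positive)
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def predict(sentence, w, b):
    """Return 1 for positive sentiment, 0 for negative."""
    return int(sentence_vector(sentence) @ w + b > 0)


# Tiny labelled set: 1 = positive, 0 = negative.
train_sentences = ["good movie", "great plot", "bad movie", "awful plot"]
train_labels = np.array([1.0, 1.0, 0.0, 0.0])

X_train = np.stack([sentence_vector(s) for s in train_sentences])
w, b = train_classifier(X_train, train_labels)

predictions = [predict(s, w, b) for s in ["great movie", "bad plot"]]
print(predictions)
```

Because the word vectors already separate positive from negative words, a handful of labelled sentences is enough for the small classifier on top.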

Even though this is a toy example, we can see a very significant accuracy improvement, from 50% to 100%, using Transfer Learning. To see the entire example and code check here:

https://gist.github.com/prats226/9fffe8ba08e378e3d027610921c51a78

Some Real Examples of Transfer Learning

In Text: Zero Shot Translation, Sentiment Classification

Difficulty implementing Transfer Learning

Some of the issues with transfer learning are listed below:

  1. Finding a large dataset to pretrain on
  2. Deciding which model to use for pretraining
  3. Debugging which of the two models is not working
  4. Not knowing how much additional data is enough to train the model
  5. Deciding where to stop using the pretrained model
  6. Deciding the number of layers and number of parameters in the model used on top of the pretrained model
  7. Hosting and serving the combined models
  8. Updating the pretrained model when more data or better techniques become available

Finding a data scientist is hard. Finding people who understand who a data scientist is, is equally hard. — Krzysztof Zawadzki

NanoNets make Transfer Learning easier

Transfer Learning with NanoNets (architecture is only for representation)

Building your first NanoNet (Image Classification)

2. With one click we Search the Web and build a model (you can also Upload Your own Images)

3. Solve the Mystery of the Blue vs Gold Dress (Once the model is ready we give you an easy to use web interface to upload a test image as well as a language agnostic API)

Get started building your first NanoNet at nanonets.com

