Intelligent Signals: Nuts and Bolts of Applying Deep Learning by Andrew Ng

Sayon Dutta
5 min read · May 2, 2017


This article is based on the talk given by Andrew Ng at the Bay Area Deep Learning School (held on 25th and 26th September, 2016). Andrew’s talk focused on when and how to apply deep learning in different use cases. These days Artificial Intelligence is taking the world by storm, and the last half decade has witnessed deep learning becoming an integral part of both the research and the applied arms of the AI arsenal.

Let’s dive into some of the important points he discussed.

Firstly, there is always the question of why deep learning has taken off now, all of a sudden, when the field isn’t new and researchers have been exploring it for over three decades. The main reason is scale: the scale of data and the scale of computational power available at this point in human civilisation.

The internet is now over four decades old. The digital footprints of businesses, research, everyday phenomena, etc. have grown over these decades, owing to better storage technology and increases in the computational power needed to process this massive amount of data. Today we finally have the capacity to use that data, together with heavy computation engines, to verify the discoveries that the best researchers have made over the past three decades.

So, what do we need to implement deep learning?

- Need a large amount of data

- Train a reasonably large network on that data

Why do we need these two things? Why not train a large neural network on a small amount of data? These are the questions that make you dig deeper into what a technology needs in order to thrive. Think of it in terms of data structures: you tend to use the structure that is sufficient for the kind of value you are handling, e.g. you would not store a scalar value in a variable with a tensor data type.

Similarly, large networks play a useful role when they are able to create good representations of the input data. These representations become distinct, and able to capture a pattern or behaviour in the data, when the data has high volume and wide variety.

Traditional Machine Learning algorithms converge after a certain point: their performance plateaus over time because they are not able to absorb additional data.

Check the bottom-left part of the graph, near the origin. This is the region where the relative ordering of the algorithms is not well defined. Since the size of the data is small, the learned representations are not very distinct, and at this level hand-engineered features perform better. These hand-engineered features fail as the data size increases; that is where deep neural networks come into the picture, as they are able to capture better representations from the vast data.

The point is clear: you don’t just fit a deep learning architecture to any data you encounter. There is a volume and variety requirement on the data. Sometimes a small dataset works better with traditional Machine Learning algorithms.
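To make the shape of that trade-off concrete, here is a minimal sketch (my own illustration, not code from the talk) that trains a traditional model and a larger neural network on growing slices of a synthetic dataset. With little data the simple model often holds its own; the larger network tends to pull ahead as the data grows. The dataset, model sizes and slice sizes are illustrative assumptions.

```python
# "Performance vs. data size" sketch using scikit-learn on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=20000, n_features=40,
                           n_informative=20, random_state=0)
X_train, y_train = X[:15000], y[:15000]
X_test, y_test = X[15000:], y[15000:]

for n in [100, 1000, 5000, 15000]:  # increasing amounts of training data
    small = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    large = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=300,
                          random_state=0).fit(X_train[:n], y_train[:n])
    print(f"n={n:6d}  traditional={small.score(X_test, y_test):.3f}  "
          f"deep={large.score(X_test, y_test):.3f}")
```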

Deep Learning can further be segregated into different buckets based on the area of research and application:

General Deep Learning: densely connected layers/fully connected networks

Sequence Models: Recurrent Neural Networks, Long Short-Term Memory networks, Gated Recurrent Units, etc.

Image Models 2D/3D (mainly spatial data): Convolutional Neural Networks, Generative Adversarial Networks

Other: unsupervised learning, reinforcement learning, sparse coding, etc.

Looking across the industry, value is mostly driven by the first three buckets but the fourth bucket is where the future of Artificial Intelligence lies.
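As a quick illustration of what those buckets look like in code, here is a minimal PyTorch sketch of one representative model from each of the first three buckets. The layer sizes are my own illustrative choices, not architectures from the talk.

```python
import torch
import torch.nn as nn

# General deep learning: a fully connected (dense) network.
dense = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 10))

# Sequence models: an LSTM over sequences of 32-dimensional inputs.
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)

# Image models: a small convolutional network for 3-channel images.
cnn = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                    nn.MaxPool2d(2), nn.Flatten(),
                    nn.Linear(16 * 16 * 16, 10))  # assumes 32x32 inputs

x_img = torch.randn(8, 3, 32, 32)  # batch of 8 fake images
print(cnn(x_img).shape)            # torch.Size([8, 10])
```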

Journey towards End-to-End Deep Learning?

Until now, ML models mostly gave real numbers as output, e.g. movie reviews (a sentiment score), image classification (a class label). But now outputs other than numbers are being generated as well, e.g. image captioning (input: image, output: text), machine translation (input: text, output: text), speech recognition (input: audio, output: text), etc.

After all this research and exploration, however, end-to-end deep learning is not always the solution, because for it to work properly it needs a large amount of labelled data.

For example, with the traditional approach to estimating a person’s age from an image, we need many hand-engineered features; with a CNN (Convolutional Neural Network), on the other hand, we can take an end-to-end approach. But here’s the catch: is there enough data with target labels to help a CNN model this problem accurately?
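As a sketch of what that end-to-end setup might look like, here is a hypothetical CNN that regresses age directly from an image. The sizes, loss, and synthetic stand-in data are all illustrative assumptions, not code from the talk.

```python
# Hypothetical end-to-end age regression: image in, age out.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 16x32x32
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32x16x16
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 1),  # single output: predicted age
)

images = torch.randn(16, 3, 64, 64)           # stand-in for face crops
ages = torch.randint(1, 90, (16, 1)).float()  # stand-in age labels

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = nn.MSELoss()(model(images), ages)      # regression loss on age
loss.backward()
optimizer.step()
print(loss.item())
```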

So when our model doesn’t work, we try all the approaches possible (a small sketch of a few of these knobs follows the list):

- Fetch more data

- Add more layers to the neural network

- Try a new approach in the neural network

- Train longer (increase the number of iterations)

- Change the batch size

- Try regularisation

- Check the bias-variance trade-off to avoid under- and overfitting

- Use more GPUs for faster computation
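Here is a minimal sketch of turning a few of those knobs in one place: training longer, changing the batch size, and adding regularisation (dropout plus L2 weight decay). The model, data, and hyperparameter values are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(2000, 20)
y = (X.sum(dim=1, keepdim=True) > 0).float()  # synthetic binary labels

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                      nn.Dropout(0.2),            # "try regularisation"
                      nn.Linear(64, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            weight_decay=1e-4)    # L2 regularisation
loader = DataLoader(TensorDataset(X, y), batch_size=64)  # "change batch size"

for epoch in range(50):                           # "train longer"
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = nn.BCEWithLogitsLoss()(model(xb), yb)
        loss.backward()
        optimizer.step()
```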

Jumping on to why human-level performance is so commonly used as a reference in deep learning: a model’s accuracy improves quickly up to human-level accuracy and then flattens out, converging towards the optimal error rate (the Bayes error rate, i.e. the lowest possible error rate for any classifier of a random outcome).

The reason behind this is that a lot of problems have a theoretical limit on performance owing to noise in the data. Human-level accuracy is therefore a good reference for improving your models: by comparing human-level error, training set error and validation set error, you can estimate bias and variance effects, and even decide when to get more human-labelled data.
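A minimal sketch of that error analysis, assuming the three error rates have already been measured; the helper and its decision rule are my own illustration of the recipe, not code from the talk.

```python
# Hypothetical helper for error analysis: compare human-level, training and
# validation error to see where the next effort should go.
def diagnose(human_err, train_err, val_err):
    bias = train_err - human_err       # avoidable bias (gap to human level)
    variance = val_err - train_err     # generalisation gap
    if bias >= variance:
        return f"high bias ({bias:.1%}): bigger model / train longer"
    return f"high variance ({variance:.1%}): more data / regularisation"

print(diagnose(0.01, 0.10, 0.11))  # high bias: underfitting the training set
print(diagnose(0.01, 0.02, 0.12))  # high variance: overfitting
```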

These comparisons let us benchmark improvements against each other and therefore help us make crucial decisions, i.e. whether to invest in deep learning, or to go with traditional approaches if they already cross the required threshold.

At the end, Andrew answered the two most important and frequently asked questions:

Q. What can AI do?

A. AI will possibly be able to do a lot of things in the future; presently, the big focus is on automating human tasks that require less than a second of thought, and on predicting the next event in a sequence.

Q. How do you build a career in ML?

A. As far as a career in ML is concerned, I totally agree with Andrew: develop a sheer interest in the field and learn things step by step through courses, reading papers, and implementing them in code. Working in AI looks cool, but it is equally dirty work.

He finally ended on a positive note, saying: “Just as electricity transformed agriculture and the energy industry, AI has the power to transform almost all sectors.”

You can watch the whole talk below.
