How ‘deep’ should it be to be called Deep Learning?

Deep Learning is everywhere now. It is the bleeding edge of AI, and everyone seems to be pursuing it.

When we first try to grasp the concept of Deep Learning, one question usually comes up:

“How deep does a Machine Learning model need to be for it to be considered a Deep Learning model?”

This may sound like a valid question. After all, in Deep Learning we are using deeper and more complex models.

It turns out we are asking the wrong question. We need to look at Deep Learning from a different angle to see why.

Let’s take a couple of definitions for Deep Learning.

“A sub-field within machine learning that is based on algorithms for learning multiple levels of representation in order to model complex relationships among data. Higher-level features and concepts are thus defined in terms of lower-level ones, and such a hierarchy of features is called a deep architecture” — Deep Learning: Methods and Applications.
“The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning.” — Deep Learning. MIT Press, Ian Goodfellow and Yoshua Bengio and Aaron Courville.

Both definitions describe a concept called Hierarchical Feature Learning. To understand it, let’s take a step back and see how a Deep Learning model works.

Let’s take Convolutional Neural Networks as an example.

Convolutional Neural Networks are a prime example of Deep Learning. They were inspired by how neurons are arranged in the visual cortex (the area of the brain that processes visual input). There, not every neuron is connected to every input from the visual field. Instead, the visual field is ‘tiled’ with partially overlapping regions (called receptive fields), each handled by its own group of neurons.

Convolutional Neural Networks (CNNs) work in a similar way. They process the input in overlapping blocks using mathematical convolution operators (which approximate how a receptive field works).
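To make the idea of overlapping blocks concrete, here is a minimal sketch in plain Python. The function name `convolve2d`, the tiny image, and the hand-written `vertical_edge` filter are all assumptions made for illustration; in a real CNN the filter values are learned during training. Like most Deep Learning libraries, the sketch slides the filter without flipping it, which is strictly speaking cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each output value depends on only one kh x kw block of the input.
            # Neighbouring blocks overlap because the window moves one pixel at a time.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# A 4x5 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hypothetical hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[3, 3, 0], [3, 3, 0]]
```

The filter responds strongly (3) wherever its window straddles the edge and stays at 0 where the window covers only bright pixels.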

A Convolutional Neural Network

The first convolution layer of a typical CNN uses a set of convolution filters to identify a set of low-level features from the input image. These low-level features are then pooled (by the pooling layers) and given as the input to the next convolution layer, which uses another set of convolution filters to identify higher-level features from the lower-level features identified earlier. This continues for several layers, where each convolution layer uses the output of the previous layer to identify higher-level features than the previous layer did. Finally, the output of the last convolution layer is passed on to a set of fully-connected layers for the final classification.

In essence, the convolution filters of a CNN attempt to identify lower-level features first, and then use those features to identify higher-level features gradually through multiple steps.

This is the Hierarchical Feature Learning we talked about earlier. It is the key to Deep Learning, and it is what differentiates Deep Learning from traditional Machine Learning algorithms.
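The flow described above (convolution, pooling, another convolution, then a fully-connected layer) can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the helper names, the 8x8 image, and the fixed filter and weight values. A real CNN learns its filters and weights from data and uses many filters per layer.

```python
def conv2d(fmap, kernel):
    """Convolution with no padding and stride 1 (filter is not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(fmap[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(len(fmap[0]) - kw + 1)
        ]
        for i in range(len(fmap) - kh + 1)
    ]


def relu(fmap):
    """Keep positive responses, zero out the rest."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# 8x8 input: dark left half, bright right half.
image = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(8)]

# Layer 1: a hypothetical low-level filter (vertical edge detector).
edge_filter = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
low_level = relu(conv2d(image, edge_filter))      # 6x6 feature map
pooled = max_pool(low_level)                      # 3x3 feature map

# Layer 2: a hypothetical filter that combines neighbouring low-level responses.
combine_filter = [[1, 1], [1, 1]]
high_level = relu(conv2d(pooled, combine_filter)) # 2x2 feature map

# Fully-connected layer: one weight per flattened feature (made-up values).
flat = [v for row in high_level for v in row]
weights, bias = [0.25, 0.25, 0.25, 0.25], 0.0
score = sum(w * x for w, x in zip(weights, flat)) + bias
print(pooled, high_level, score)
```

Note that the second convolution never looks at raw pixels. It only sees the pooled edge responses from the first layer, which is the layering the paragraph describes.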

Hierarchical Feature Learning

A Deep Learning model (such as a Convolutional Neural Network) does not try to understand the entire problem at once.

That is, it does not try to grasp all the features of the input at once, as traditional algorithms tried to do.

Instead, it looks at the input piece by piece and derives lower-level patterns/features from it. It then uses these lower-level features to gradually identify higher-level features, through many layers, hierarchically.
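One way to see why stacking layers yields higher-level features is to track how much of the original input a single unit can ‘see’ at each depth. The sketch below uses the standard receptive-field recurrence for stacked convolution and pooling layers; the function name and the example layer stack are assumptions chosen for illustration.

```python
def receptive_field_sizes(layers):
    """Return the receptive-field width (in input pixels) after each layer.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    """
    size = 1   # a single input pixel covers only itself
    jump = 1   # distance in input pixels between adjacent units at this depth
    sizes = []
    for kernel, stride in layers:
        size += (kernel - 1) * jump
        jump *= stride
        sizes.append(size)
    return sizes


# Hypothetical stack: 3x3 conv, 2x2 pool, 3x3 conv, 2x2 pool, 3x3 conv.
stack = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]
sizes = receptive_field_sizes(stack)
print(sizes)  # [3, 4, 8, 10, 18]
```

A unit in the first layer sees a 3x3 patch of the input, while a unit in the last layer of this stack is influenced by an 18x18 patch, so deeper units can respond to larger and more complex patterns built from the smaller ones.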

This allows Deep Learning models to learn complicated patterns by gradually building them up from simpler ones. It also allows them to comprehend the world better: they not only ‘see’ the features, but also see the hierarchy in which those features build on one another.

And of course, having to learn features hierarchically means that the model must have many layers, which means that such a model will be ‘deep’.

That brings us back to our original question: it is not that we call deep models Deep Learning. It is that, in order to achieve hierarchical learning, the models need to be deep. The depth is a by-product of implementing Hierarchical Feature Learning.

Hierarchical Feature Learning is what allows Deep Learning models to avoid the “Plateau in Performance” seen in traditional Machine Learning models.

The (lack of) Plateau in Performance in Deep Learning

So, how do we identify whether a model is a Deep Learning model or not?
Simply put, if the model uses Hierarchical Feature Learning (identifying lower-level features first, and then building upon them to identify higher-level features, e.g. by using convolution filters), then it is a Deep Learning model. If not, then no matter how many layers the model has, it is not considered a Deep Learning model.

This means that a neural network with 100 fully-connected layers (and only fully-connected layers) wouldn’t be a Deep Learning model, but a network with a handful of convolutional layers would be.

If you would like to learn more about Deep Learning, check out my book: Build Deeper: Deep Learning Beginners’ Guide at Amazon.

Originally published on September 5, 2017.