Deep Learning Book — Chapter 1 Introduction

Abhinav Singh
Sep 26, 2018

“Thinking is a human feature. Will AI someday really think? That’s like asking if submarines swim. If you call it swimming then robots will think, yes.” — Noam Chomsky

This is part 1 of my Deep Learning Book series. The series contains chapter-wise summaries of “Deep Learning” by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, along with intuitive explanations of the concepts, but it is not a substitute for the book itself.

Trying to build a machine that can think is a goal people have dreamed of for ages. Even today, the goal of the entire artificial intelligence research community is to build general AI that can solve any problem given to it. The approach this book presents can be thought of as a hierarchy of concepts: using simple concepts to build more complex ones by arranging them in a hierarchical structure. This deep structure of concepts, arranged hierarchically, is one reason the field is called deep learning.

Computers excel at tasks like playing chess or multiplying numbers, yet they can perform poorly at simple tasks like object recognition or speech recognition. The knowledge required to perform these simple tasks is difficult to express in a formal way, and this is one of the key challenges in artificial intelligence. We realized long ago that such knowledge cannot be hard-coded; it can only be acquired by finding patterns in raw data. This realization gave birth to the field of machine learning. A simple machine learning technique like logistic regression can be used to distinguish spam from non-spam e-mail. This is possible only if we can capture a “general pattern” in the data, or provide a definition, in terms of features, of what makes an e-mail spam; such techniques depend heavily on the representation of the data. Think of the technique as a function defined so that it correlates features, or representations, of the input with the desired output. This works when we can extract relevant features from the data and feed them to the algorithm, but for some problems it is not possible to find or extract relevant features. This is where representation learning comes into the picture.
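As a concrete illustration of the logistic-regression spam filter mentioned above, here is a minimal sketch using scikit-learn; the toy messages, labels, and bag-of-words features are invented for this example and are not from the book.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data: hand-written messages and labels (1 = spam, 0 = not spam).
messages = [
    "win a free prize now", "limited offer click here",
    "meeting rescheduled to monday", "please review the attached report",
]
labels = [1, 1, 0, 0]

# Represent each e-mail by hand-chosen features: here, raw word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# Logistic regression learns weights that correlate features with the label.
clf = LogisticRegression().fit(X, labels)
print(clf.predict(vectorizer.transform(["free prize offer"])))  # likely [1]
```

The quality of the prediction depends entirely on the word-count representation we chose; that dependence on hand-designed features is exactly the limitation the next paragraph addresses.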

Representation learning tries to solve this problem: it uses machine learning not only to find the correlation between representations and the desired output, but also to find the representations themselves. A classic example of representation learning is the autoencoder.
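The book covers autoencoders in detail later; as a rough sketch of the idea, here is a minimal autoencoder in Keras (assuming TensorFlow is installed; the 784-dimensional input and 32-dimensional code are arbitrary choices for illustration). The network is trained to reconstruct its own input, so the narrow middle layer is forced to learn a compact representation.

```python
import numpy as np
from tensorflow import keras

# Encoder compresses the input into a 32-dimensional "code";
# decoder tries to reconstruct the original input from that code.
inputs = keras.Input(shape=(784,))
code = keras.layers.Dense(32, activation="relu")(inputs)       # encoder
outputs = keras.layers.Dense(784, activation="sigmoid")(code)  # decoder

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Random data here only to show the mechanics; real inputs
# (e.g. flattened images) would yield meaningful representations.
x = np.random.rand(256, 784).astype("float32")
autoencoder.fit(x, x, epochs=1, batch_size=32, verbose=0)
```

The activations of the `code` layer are the learned representation: no one hand-designed them, the model discovered them from the data.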

A simple approach when designing features is to separate out all the influences that contribute to predicting the output. While it is necessary to capture the informative representations in the data, it is equally necessary to eliminate some representations; a classic example is that the angle from which an image is taken should not affect which objects are recognized in it. The idea of allowing computers to learn representations by themselves is one perspective on deep learning.

One key example of a deep learning model is the feed-forward deep network, or multilayer perceptron (MLP). Consider an MLP as a mathematical function that maps inputs to outputs. The function is composite: the MLP contains multiple layers, each containing neurons. Each neuron captures a certain representation of the data, and by stacking layers, neurons in later layers capture more complex representations built by hierarchically combining simpler ones. Fig. 1.0 illustrates a simple example of how an MLP builds up representations.

Fig. 1.0 source: https://www.deeplearningbook.org/

The structure above shows two perspectives: first, depth is used to capture more complex representations; second, it shows how these representations are related. The structure can also be understood as a sequence of instructions executed in order to predict the class. In the MLP above, each neuron can be understood as a logistic unit: we multiply each input feature by a weight, sum the results, and pass the sum through a sigmoid function (a small code sketch follows Fig. 1.1 below). Weights define the importance of each feature; a feature whose weight is small in magnitude contributes less to the prediction. This is a simple example of what deep learning looks like. Fig. 1.1 shows a Venn-diagram view of how the different types of learning are related.

Fig. 1.1 source: https://www.deeplearningbook.org/
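To make the logistic-unit view concrete, here is a minimal numpy sketch of a forward pass through a tiny two-layer MLP; the layer sizes and random weights are placeholders, since a real network would learn its weights from data.

```python
import numpy as np

def sigmoid(z):
    """Squash a weighted sum into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, w, b):
    """One neuron: weight each feature, sum, pass through the sigmoid."""
    return sigmoid(np.dot(w, x) + b)

# A tiny two-layer MLP as a composite function: layer 2 builds its
# representation out of the representations produced by layer 1.
rng = np.random.default_rng(0)
x = rng.random(4)                       # 4 input features
W1, b1 = rng.standard_normal((3, 4)), rng.standard_normal(3)
W2, b2 = rng.standard_normal(3), rng.standard_normal()

h = sigmoid(W1 @ x + b1)                # hidden representation (3 neurons)
y = logistic_unit(h, W2, b2)            # final logistic unit's output
print(y)
```

Note how the output neuron never sees the raw input `x`; it only sees the hidden representation `h`, which is the hierarchy-of-concepts idea in miniature.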

Learning deep learning is much easier with a well-chosen path. This book follows the curriculum illustrated in Fig. 1.2.

Fig. 1.2 source: https://www.deeplearningbook.org/

Most people new to deep learning try to compare the human brain with particular deep learning architectures, which is not quite correct. Although the human brain is a huge inspiration for deep learning researchers, it is no longer a guide for the field, since our understanding of the brain is still limited. Some rough conclusions can nevertheless be drawn:

  1. The human brain is made up of many units connected to each other in a hierarchical form.

  2. Connections between some neurons are stronger than between others.

These patterns can also be seen in deep learning architectures: artificial neurons are connected with each other in a hierarchical structure, and some connections are stronger than others.

Assuming that deep learning architectures are ultimately trying to simulate the brain is not the right way to describe the goal of deep learning. That may be true for the field of computational neuroscience, but in deep learning the goal is to build systems that can use intelligence to solve tasks.

During the 1980s a movement called connectionism came into the picture. Its core idea was that by connecting many simple computational units together and increasing their number, higher intelligence could be achieved. The movement did not last very long, but some of its key ideas survived, and one of them is central to deep learning: distributed representation. The idea of distributed representation is that “each input to a system should be represented by many features, and each feature should be involved in the representation of many possible inputs” — Deep Learning Book. As a simple example, instead of dedicating one neuron to every color-object combination, a neuron trained to activate on a particular color can participate in representing many different objects of that color (see the sketch below). The concept of distributed representation is discussed in chapter 15. Another idea from the connectionism movement was back-propagation, which is still used to train deep learning models. Around the mid-1990s, the AI research community began to make unrealistic claims about what neural networks were capable of, and the failure to meet those expectations caused the fall of neural networks. This also boosted research on kernel machines and graphical models, as both achieved good results.
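As a toy illustration of why distributed representations are efficient (the numbers and encoding below are my own, not the book’s): with 3 colors and 3 object types, a one-neuron-per-concept scheme needs 9 units, while a distributed scheme needs only 6, with each unit shared across many inputs.

```python
import numpy as np

colors = ["red", "green", "blue"]
objects = ["car", "truck", "bird"]

# One-hot ("one neuron per concept"): every (color, object) pair needs
# its own dedicated unit, so 3 * 3 = 9 units in total.
n_one_hot = len(colors) * len(objects)

# Distributed: one group of units encodes color, another encodes object,
# so only 3 + 3 = 6 units, and each unit helps represent many inputs.
def distributed(color, obj):
    vec = np.zeros(len(colors) + len(objects))
    vec[colors.index(color)] = 1.0
    vec[len(colors) + objects.index(obj)] = 1.0
    return vec

print(n_one_hot)                   # 9
print(distributed("red", "bird"))  # [1. 0. 0. 0. 0. 1.]
```

The gap widens quickly: with n colors and n objects, one-hot needs n² units while the distributed scheme needs only 2n.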

Rise of Deep Learning

After the fall of the neural network era, neural networks came back into the picture in 2006, when Geoffrey Hinton showed that a network called a deep belief network could be trained using a greedy layer-wise approach. Another significant milestone came in 2012, when a convolutional neural network named AlexNet cut the top-5 error in the ILSVRC from 26.2% to 15.3%. This was a turning point for deep learning, as a network with just eight learned layers beat all previous image-processing techniques by a huge margin. Deep learning has since proven to be one of the best techniques for complex problems, such as learning sequences of characters, writing content to memory cells with neural Turing machines, or playing Atari games with Q-learning.

Some of the major factors that contributed to this growth:

  1. Increase in the size of datasets.
  2. Increase in computational power.
  3. Increase in the size of deep learning models.

In general, deep learning is an approach to machine learning that draws heavily on the knowledge of applied mathematics developed over the past several decades.

Further Readings:

AlexNet: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

LSTM: https://www.bioinf.jku.at/publications/older/2604.pdf

Sequence to Sequence Learning with Neural Networks: https://arxiv.org/pdf/1409.3215.pdf
