# A Beginner's Guide to Deep Learning

# INTRODUCTION

Machine-learning technology powers many aspects of modern society: from web searches to content filtering on social networks to recommendations on e-commerce websites, and it is increasingly present in consumer products such as cameras and smartphones. Machine-learning systems are used to identify objects in images, transcribe speech into text, match news items, posts or products with users’ interests, and select relevant search results.

Increasingly, these applications make use of a class of techniques called deep learning.[5]

**Deep learning** (also known as deep structured learning, hierarchical learning or deep machine learning) is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data. In a simple case, you could have two sets of neurons: ones that receive an input signal and ones that send an output signal. When the input layer receives an input, it passes on a modified version of the input to the next layer. In a deep network, there are many layers between the input and output (the layers are not made of neurons, but it can help to think of them that way), allowing the algorithm to use multiple processing layers composed of multiple linear and non-linear transformations.[1][2][3][4][5][6][7][8][9]

Deep learning has recently revolutionized machine learning, driven by a body of influential work in the field. These methods have dramatically improved the state of the art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. The term “Deep Learning” itself is much older: it was first introduced to Machine Learning by Dechter (1986)[10], and to Artificial Neural Networks (NNs) by Aizenberg et al. (2000)[11]. It was further popularized by AlexNet, a convolutional network architecture developed by Alex Krizhevsky, which won the ImageNet competition in 2012 by a wide margin over competing image-recognition methods and opened the way for deep learning architectures in image processing.[12]

# DEEP LEARNING ARCHITECTURE

Deep learning architectures can be grouped into three broad classes:

1. **Generative deep architectures**, which are intended to characterize the high-order correlation properties of the observed or visible data for pattern analysis or synthesis purposes, and/or to characterize the joint statistical distributions of the visible data and their associated classes. In the latter case, Bayes' rule can turn this type of architecture into a discriminative one.
2. **Discriminative deep architectures**, which are intended to directly provide discriminative power for pattern classification, often by characterizing the posterior distributions of classes conditioned on the visible data.
3. **Hybrid deep architectures**, where the goal is discrimination but is assisted (often in a significant way) by the outcomes of generative architectures via better optimization and/or regularization, or where discriminative criteria are used to learn the parameters in any of the deep generative models in category 1 above.[13]
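A minimal sketch of the Bayes-rule remark in category 1, assuming a generative model has already supplied class priors p(c) and class-conditional likelihoods p(x | c) for a single observation x. The function name and the numbers are hypothetical; the point is only that p(c | x) = p(x | c) p(c) / Σ_c′ p(x | c′) p(c′) yields the posterior that a discriminative architecture would model directly.

```python
def posterior_from_generative(priors, likelihoods):
    """Bayes' rule: p(c|x) = p(x|c) p(c) / sum over c' of p(x|c') p(c').

    priors[c]      -- p(c), prior probability of class c (hypothetical values)
    likelihoods[c] -- p(x|c), likelihood of the observed x under class c
    """
    joint = {c: likelihoods[c] * priors[c] for c in priors}  # p(x, c)
    evidence = sum(joint.values())                           # p(x)
    return {c: joint[c] / evidence for c in joint}           # p(c|x)


# Hypothetical two-class example
priors = {"cat": 0.5, "dog": 0.5}
likelihoods = {"cat": 0.75, "dog": 0.25}
posterior = posterior_from_generative(priors, likelihoods)
print(posterior)  # {'cat': 0.75, 'dog': 0.25}
```

With equal priors the posterior simply follows the likelihoods; changing `priors` shows how prior beliefs shift the classification.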

Despite this elaborate categorization of deep learning architectures, the ones most used in practice are **deep feed-forward networks**, **convolutional networks** and **recurrent networks**.

# DEEP FEED-FORWARD NETWORKS

**Deep feed-forward networks**, also often called **feed-forward neural networks**, or **multilayer perceptrons (MLPs)**, are the quintessential deep learning models.

The goal of a feed-forward network is to approximate some function f∗. For example, for a classifier, y = f∗(x) maps an input x to a category y. A feed-forward network defines a mapping y = f(x; θ) and learns the value of the parameters θ that result in the best function approximation.[1]
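To illustrate what "learning θ" can mean, here is a minimal sketch, assuming a toy target f∗(x) = 2x + 1 and a model f(x; θ) that is just a line with θ = (w, b) rather than a deep network. Plain gradient descent on the mean squared error moves θ toward the values that best approximate f∗; the learning rate and step count are arbitrary choices for this example.

```python
def f_star(x):
    """Hypothetical target function; in practice f* is unknown and only sampled."""
    return 2.0 * x + 1.0


# Training samples of f* on [-1, 1]
xs = [i / 10 for i in range(-10, 11)]
ys = [f_star(x) for x in xs]


def f(x, theta):
    """The model y = f(x; theta), here a line with theta = (w, b)."""
    w, b = theta
    return w * x + b


def train(xs, ys, lr=0.1, steps=500):
    theta = [0.0, 0.0]
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error (1/n) * sum((f(x) - y)^2)
        grad_w = sum(2 * (f(x, theta) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (f(x, theta) - y) for x, y in zip(xs, ys)) / n
        # Step both parameters against their gradients
        theta[0] -= lr * grad_w
        theta[1] -= lr * grad_b
    return theta


theta = train(xs, ys)
print([round(t, 3) for t in theta])  # [2.0, 1.0]
```

A deep network is trained the same way in principle, except that θ holds many weight matrices and the gradients are computed by backpropagation.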

In simple terms, the network consists of input, hidden and output nodes: data comes in through the input nodes, processing is done in the hidden nodes, and the output is produced through the output nodes. Information flows through the function being evaluated from x, through the intermediate computations used to define f, and finally to the output y. There are no feedback connections in which outputs of the model are fed back into itself, hence the name feed-forward network. The model is shown in Figure 1.
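The forward flow described above can be sketched in plain Python. This is an illustration under assumptions: a 2-input, 3-hidden, 1-output network with ReLU hidden units and hand-picked (hypothetical) weights standing in for a learned θ. Data moves only from the input nodes through the hidden nodes to the output nodes, with no feedback.

```python
def dense(x, weights, biases):
    """One fully connected layer: out[j] = sum_i weights[j][i] * x[i] + biases[j]."""
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j
            for row, b_j in zip(weights, biases)]


def relu_vector(v):
    """Apply ReLU element-wise."""
    return [max(0.0, a) for a in v]


def mlp_forward(x, params):
    """Input -> hidden -> output; nothing is fed back into earlier layers."""
    (w1, b1), (w2, b2) = params
    hidden = relu_vector(dense(x, w1, b1))  # hidden nodes
    return dense(hidden, w2, b2)            # output nodes


# Hypothetical parameters theta: (weights, biases) for each layer
params = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, -1.0]),  # 2 -> 3
    ([[1.0, 1.0, 1.0]], [0.5]),                                   # 3 -> 1
]
print(mlp_forward([2.0, 1.0], params))  # [3.0]
```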

# CONVOLUTIONAL NEURAL NETWORKS

In machine learning, a **convolutional neural network** (CNN, or ConvNet) is a type of feed-forward artificial neural network in which the connectivity pattern between its neurons is inspired by the organization of the animal visual cortex.

Individual cortical neurons respond to stimuli in a restricted region of space known as the **receptive field**. The receptive fields of different neurons partially overlap such that they tile the visual field.

The response of an individual neuron to stimuli within its receptive field can be approximated mathematically by a convolution operation.[15] Convolutional networks were inspired by biological processes[16] and are variations of multilayer perceptrons designed to use minimal amounts of preprocessing.[17]

They have wide applications in image and video recognition, recommender systems[18] and natural language processing.[19]

**LeNet** was one of the very first convolutional neural networks and helped propel the field of Deep Learning. This pioneering work by Yann LeCun was named LeNet-5 after many previous successful iterations since 1988.

At that time the LeNet architecture was used mainly for character recognition tasks such as reading zip codes, digits, etc.

There are four main components in a ConvNet shown in Figure 2 above:

1. Convolutional Layer

2. Activation Function

3. Pooling Layer

4. Fully Connected Layer

**Convolutional layer**

The convolutional layer is named after *convolution*, a mathematical operation that combines two functions f and g to produce a third function, written f * g. It is closely related to cross-correlation. The input to a convolutional layer is an m × m × r image, where m is the height and width of the image and r is the number of channels; e.g., an RGB image has r = 3. The convolutional layer has k filters (or kernels) of size n × n × q, where n is smaller than the dimension of the image and q can either be the same as the number of channels r or smaller, and may vary for each kernel. The size of the filters gives rise to the locally connected structure: each filter is convolved with the image to produce one of k feature maps of size (m − n + 1) × (m − n + 1).[20]
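A minimal sketch of the size rule, assuming a single-channel image (r = q = 1), a single filter (k = 1), stride 1 and no padding. Like most deep learning libraries, the function computes cross-correlation, i.e. it does not flip the kernel; the hypothetical kernel below responds to a vertical edge between a dark and a bright region.

```python
def conv2d_valid(image, kernel):
    """Slide an n x n kernel over an m x m single-channel image (stride 1, no padding).

    Like most deep learning libraries, this computes cross-correlation:
    the kernel is not flipped. The result is (m - n + 1) x (m - n + 1).
    """
    m, n = len(image), len(kernel)
    out_size = m - n + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(n) for b in range(n))
             for j in range(out_size)]
            for i in range(out_size)]


# Hypothetical 4 x 4 image (m = 4): dark left half, bright right half
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
# Hypothetical 2 x 2 kernel (n = 2) that responds to left-to-right increases
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]], a 3 x 3 map since 4 - 2 + 1 = 3
```

The feature map is non-zero only where the window straddles the edge. A real layer repeats this for each of its k filters and also sums over the q input channels.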

**Activation Function**

Non-linear activation functions are needed to implement complex mapping functions: without them, any stack of linear layers collapses into a single linear transformation, whereas with them the network can approximate a very wide class of functions. Squashing activation functions such as sigmoid and tanh also bound the otherwise unbounded linearly weighted sum computed by each neuron, which helps avoid large values accumulating high up the processing hierarchy. Many activation functions exist; among the most widely used are sigmoid, tanh and ReLU.
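The three activation functions named above can be written directly with Python's `math` module. This sketch evaluates them on a few inputs to show that sigmoid and tanh squash their input into a bounded range while ReLU does not; the sigmoid uses a two-branch form (a common choice, assumed here) so that `math.exp` cannot overflow for large negative inputs.

```python
import math


def sigmoid(z):
    """Squashes any real z into (0, 1); two branches keep math.exp from overflowing."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tanh(z):
    """Squashes any real z into (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Zero for negative inputs, identity for positive ones (not bounded above)."""
    return max(0.0, z)


for z in (-10.0, 0.0, 10.0):
    print(z, round(sigmoid(z), 4), round(tanh(z), 4), relu(z))
```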

**Pooling Layer**

Pooling is a sample-based discretization process. The objective is to down-sample an input representation (image, hidden-layer output matrix, etc.), reducing its dimensionality and allowing for assumptions to be made about features contained in the binned sub-regions.

This is done in part to help prevent over-fitting by providing an abstracted form of the representation. It also reduces the computational cost, since the smaller representation means fewer parameters to learn in subsequent layers, and it provides basic translation invariance to the internal representation.

Some of the most prominently used pooling techniques are **Max-Pooling**, **Min-Pooling** and **Average-Pooling**.
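A minimal pooling sketch, assuming the common 2 × 2 window with stride 2 on a single hypothetical 4 × 4 feature map. Passing `max`, `min` or an averaging function as `op` gives max-, min- and average-pooling respectively; each halves the height and width.

```python
def pool2d(feature_map, op=max, size=2, stride=2):
    """Down-sample a 2D feature map by applying `op` to each size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [[op([feature_map[i + a][j + b]
                 for a in range(size) for b in range(size)])
             for j in range(0, cols - size + 1, stride)]
            for i in range(0, rows - size + 1, stride)]


def average(values):
    """Arithmetic mean of one window, used for average-pooling."""
    return sum(values) / len(values)


# Hypothetical 4 x 4 feature map
fm = [[1, 3, 2, 4],
      [5, 7, 6, 8],
      [0, 2, 1, 1],
      [4, 6, 3, 5]]
print(pool2d(fm, op=max))      # [[7, 8], [6, 5]]
print(pool2d(fm, op=min))      # [[1, 2], [0, 1]]
print(pool2d(fm, op=average))  # [[4.0, 5.0], [3.0, 2.5]]
```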

**Fully Connected Layer**

The term “Fully Connected” implies that every neuron in the previous layer is connected to every neuron in the next layer. The Fully Connected layer is a traditional Multi-Layer Perceptron that uses a softmax activation function, or a similar function, in the output layer.[21]
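A sketch of a fully connected output layer followed by softmax, assuming hypothetical flattened features and weights for a 3-class problem. Subtracting the largest score before exponentiating is a standard trick that leaves the softmax result unchanged but avoids overflow.

```python
import math


def fully_connected(x, weights, biases):
    """Every input feeds every output: out[j] = sum_i weights[j][i] * x[i] + biases[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical flattened features from the last pooling layer
features = [0.5, -1.0, 2.0, 0.0]
# Hypothetical weights and biases for a 3-class output layer
weights = [[0.2, 0.1, 0.4, 0.0],
           [-0.3, 0.2, 0.1, 0.5],
           [0.0, -0.1, 0.3, 0.2]]
biases = [0.0, 0.1, -0.2]

probs = softmax(fully_connected(features, weights, biases))
print([round(p, 3) for p in probs])
```

The predicted class is the index of the largest probability; here that is class 0.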

# RECURRENT NEURAL NETWORKS

In a traditional neural network we assume that all inputs (and outputs) are independent of each other. But for many tasks that’s a very bad idea: if you want to predict the next word in a sentence, you had better know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. Another way to think about RNNs is that they have a “memory” that captures information about what has been calculated so far.[22]

An RNN has loops that allow information to be carried from one step to the next while it reads its input. In Figure 4, x_t is an input, A is a part of the RNN and h_t is the output. Essentially, you can feed in words from a sentence, or even characters from a string, as x_t, and the RNN will produce an output h_t. Common RNN variants include LSTMs, bidirectional RNNs and GRUs.
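A minimal sketch of the loop in Figure 4, assuming the simplest (Elman-style) recurrent cell, h_t = tanh(W_x x_t + W_h h_{t−1} + b), rather than an LSTM or GRU. Here `rnn_step` plays the role of A: the same weights are applied at every step, and the hidden state h carries the "memory" forward. All weights and inputs are hypothetical.

```python
import math


def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One step of a simple RNN: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    return [math.tanh(sum(wx * xi for wx, xi in zip(W_x[j], x_t))
                      + sum(wh * hi for wh, hi in zip(W_h[j], h_prev))
                      + b[j])
            for j in range(len(b))]


def rnn_forward(sequence, W_x, W_h, b):
    """Apply the same cell to every element of the sequence, threading h through."""
    h = [0.0] * len(b)  # initial hidden state h_0
    outputs = []
    for x_t in sequence:
        h = rnn_step(x_t, h, W_x, W_h, b)
        outputs.append(h)
    return outputs


# Hypothetical weights: 2-dim inputs (e.g. tiny word vectors), 2-dim hidden state
W_x = [[0.5, -0.2], [0.1, 0.4]]
W_h = [[0.3, 0.0], [0.0, 0.3]]
b = [0.0, 0.0]
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

outputs = rnn_forward(seq, W_x, W_h, b)
for t, h_t in enumerate(outputs, start=1):
    print(t, [round(v, 3) for v in h_t])
```

Feeding the same input at two different steps gives two different h_t values, because the state also carries what came before.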

RNNs can be used in NLP, machine translation, language modeling, computer vision, video analysis, image generation, image captioning and more, because an RNN can be configured to take and produce varying numbers of inputs and outputs, making architectures from one-to-one to many-to-many possible. Some of the possible architectures are shown in Figure 5, with explanations of the models.

# APPLICATIONS

There has been a great deal of research in the field of deep learning, and many distinctive problems have been solved with deep learning models. Some of the best applications of deep learning are:

**Colorization of Black and White Images**

Deep learning can use the objects in a photograph and their context to color the image, much like a human operator might approach the problem. The approach involves very large convolutional neural networks and supervised layers that recreate the image with the addition of color.[24][28]

**Machine Translation**

Text translation can be performed without any pre-processing of the sequence, allowing the algorithm to learn the dependencies between words and their mapping to a new language. Stacked networks of large LSTM recurrent neural networks are used to perform this translation.[25]

**Object Classification and Detection in Photographs**

This task requires the classification of objects within a photograph as one of a set of previously known objects.

State-of-the-art results have been achieved on benchmark examples of this problem using very large convolutional neural networks. A breakthrough on this problem was AlexNet, the model by Alex Krizhevsky et al.[12] for the ImageNet classification problem.[28]

**Automatic Handwriting Generation**

Given a corpus of handwriting examples, the task is to generate new handwriting for a given word or phrase.

The handwriting is provided as a sequence of coordinates used by a pen when the handwriting samples were created. From this corpus the relationship between the pen movement and the letters is learned and new examples can be generated ad hoc.[26][28]

**Automatic Game Playing**

In this task, a model learns how to play a computer game based only on the pixels on the screen.

This very difficult task is the domain of deep reinforcement learning models, and it is the breakthrough that DeepMind (now part of Google) is renowned for achieving.[27][28]

**Generative Model Chatbots**

A sequence-to-sequence model was used to create a chatbot that learned to generate its own answers after being trained on large real-world conversational datasets.[29]

# CONCLUSION

It can be concluded from this article that deep learning models can be used in a wide variety of tasks, thanks to their brain-inspired ability to learn layered representations from data. A lot of research has already been done in this area, and much more is expected in the near future. Although trust issues remain at the moment, things should become clearer in the near future.

# REFERENCES

1. Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016). Deep Learning. MIT Press.
2. Deng, L.; Yu, D. (2014). "Deep Learning: Methods and Applications" (PDF). Foundations and Trends in Signal Processing. 7(3–4): 1–199. doi:10.1561/2000000039.
3. Bengio, Yoshua (2009). "Learning Deep Architectures for AI" (PDF). Foundations and Trends in Machine Learning. 2(1): 1–127. doi:10.1561/2200000006.
4. Bengio, Y.; Courville, A.; Vincent, P. (2013). "Representation Learning: A Review and New Perspectives". IEEE Transactions on Pattern Analysis and Machine Intelligence. 35(8): 1798–1828. arXiv:1206.5538. doi:10.1109/tpami.2013.50.
5. Schmidhuber, J. (2015). "Deep Learning in Neural Networks: An Overview". Neural Networks. 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003.
6. Bengio, Yoshua; LeCun, Yann; Hinton, Geoffrey (2015). "Deep Learning". Nature. 521: 436–444. doi:10.1038/nature14539. PMID 26017442.
7. Arel, Itamar; Rose, Derek C.; Karnowski, Thomas P. (2013). "Deep Machine Learning: A New Frontier in Artificial Intelligence Research" (survey paper). IEEE Computational Intelligence Magazine.
8. Schmidhuber, Jürgen (2015). "Deep Learning". Scholarpedia. 10(11): 32832. doi:10.4249/scholarpedia.32832.
9. Perez, Carlos E. "A Pattern Language for Deep Learning".
10. Dechter, R. (1986). University of California, Computer Science Department, Cognitive Systems Laboratory.
11. Aizenberg, I.; Aizenberg, N. N.; Vandewalle, J. P. L. (2000). Multi-Valued and Universal Binary Neurons: Theory, Learning and Applications. Springer Science & Business Media.
12. Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (2012). "ImageNet Classification with Deep Convolutional Neural Networks". Advances in Neural Information Processing Systems.
13. Deng, Li. "Three Classes of Deep Learning Architectures and Their Applications: A Tutorial Survey". Microsoft Research, Redmond, WA 98052, USA.
14. Feed forward Neural Network.
15. "Convolutional Neural Networks (LeNet) - DeepLearning 0.1 documentation". DeepLearning 0.1. LISA Lab. Retrieved 31 August 2013.
16. Matsugu, Masakazu; Mori, Katsuhiko; Mitari, Yusuke; Kaneda, Yuji (2003). "Subject independent facial expression recognition with robust face detection using a convolutional neural network" (PDF). Neural Networks. (5): 555–559. doi:10.1016/S0893-6080(03)00115-1.
17. LeNet-5, convolutional neural networks.
18. van den Oord, Aaron; Dieleman, Sander; Schrauwen, Benjamin (2013-01-01). Burges, C. J. C.; Bottou, L.; Welling, M.; Ghahramani, Z.; Weinberger, K. Q., eds. "Deep content-based music recommendation" (PDF). Curran Associates, Inc. pp. 2643–2651.
19. Collobert, Ronan; Weston, Jason (2008-01-01). "A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning". Proceedings of the 25th International Conference on Machine Learning. ICML '08. New York, NY, USA: ACM: 160–167. doi:10.1145/1390156.1390177. ISBN 978-1-60558-205-4.
20. CNN.
21. Intuitive Explanation ConvNets.
22. Recurrent Neural Networks Tutorial, Part 1 - Introduction to RNNs.
23. RNN effectiveness.
24. Cheng, Zezhou; Yang, Qingxiong; Sheng, Bin (2015). "Deep Colorization". Proceedings of the IEEE International Conference on Computer Vision.
25. Sutskever, Ilya; Vinyals, Oriol; Le, Quoc V. (2014). "Sequence to Sequence Learning with Neural Networks". arXiv preprint arXiv:1409.3215.
26. Graves, Alex (2013). "Generating Sequences with Recurrent Neural Networks". arXiv preprint arXiv:1308.0850.
27. Mnih, Volodymyr, et al. (2013). "Playing Atari with Deep Reinforcement Learning". arXiv preprint arXiv:1312.5602.
28. Application Deep Learning.
29. Generative Model Chatbots.
30. Understanding LSTMs.

To learn about the cool things we do at Botsupply, follow Giovanni Toschi's weekly posts.
