Introduction to Deep Learning

Allan Handan · Published in bawilabs · 5 min read · Sep 15, 2017

Continuing the topic presented in the article Desmitificando Machine Learning, this post covers some important concepts of Deep Learning. Don't worry, we will not go too deep into the math of the subject :P

First, let's talk about the data set, which represents the data that will be used to train and evaluate the network. The data set is divided into the training set, the validation set, and the test set.

The training set represents the data used in the training step. In this step, an input passes through the network and produces a response that is compared with the expected result, allowing us to estimate the error and correct the network's weights through backpropagation (keep calm, we will define this term!).

The validation set entries are also used during the training step, but their outputs are not used to correct the network. Instead, they indicate whether training is progressing efficiently and whether overfitting is occurring.

Finally, unlike the training set and the validation set, the test set is not used in the training stage. In fact, it should NEVER be used during training, because it serves only as input for the final evaluation of the network's accuracy.
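As a rough sketch of how this split might look in code (the 70/15/15 proportions and the synthetic data below are arbitrary choices for illustration):

```python
import numpy as np

# A hypothetical dataset: 1000 examples, 10 features each, plus binary labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 10))
y = rng.integers(0, 2, size=1000)

# Shuffle, then take 70% for training, 15% for validation, 15% for test.
indices = rng.permutation(len(X))
train_idx = indices[:700]
val_idx = indices[700:850]
test_idx = indices[850:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]  # never touched during training
```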

A network conventionally has three types of layers:

Input Layer

This layer represents the network's input. It is important to understand that each node in this layer represents one feature of the input that will pass through the network. As an example, imagine a network that tries to determine a person's personality: the input would be a set of data about that person, and each attribute of the person would be one feature of that input. In a matrix of network inputs, each row represents an input (an example) and each column represents a feature.

[Figure: Input Layer Identification]
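A minimal sketch of such an input matrix in code, with made-up features for the personality example:

```python
import numpy as np

# Hypothetical "personality" inputs: each row is one person (one example),
# each column is one feature, e.g. [age, hours_of_sleep, books_read].
X = np.array([
    [25, 7.5, 12],
    [31, 6.0,  3],
    [19, 8.0, 20],
])

print(X.shape)  # (3, 3): 3 inputs (rows), 3 features (columns)
```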

Hidden Layer

In fact, this is not just one layer: your network's architecture may have N hidden layers. These layers are responsible for abstracting the information contained in the inputs, allowing the network to interpret the data by applying weights to it.

Each hidden layer is represented by a matrix of weights. This matrix has one column for each node of the layer and one row for each input to the layer. Here, the inputs can be the features from the input layer or the outputs of an earlier hidden layer.

[Figure: Hidden Layer Identification]
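A minimal sketch of this convention, assuming random weights and a hidden layer with 5 nodes:

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(3, 4))   # input layer: 3 examples, 4 features
W1 = rng.normal(size=(4, 5))  # hidden layer weights: one row per input
                              # to this layer, one column per node (5 nodes)

hidden_input = X @ W1         # shape (3, 5): one row per example,
                              # one column per hidden node
```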

Output Layer

This is the output layer of your network, where the prediction or evaluation performed by the network is obtained. The number of nodes depends on the architecture of your network: it may be a single node, in the case of a simple quantitative or binary evaluation, or N nodes, where each node represents one possible evaluation of the input.

This layer is also represented by a matrix of weights, with one column for each output node and one row for each input (remembering that, in this case, the Output Layer's inputs are the outputs of a Hidden Layer).

[Figure: Output Layer Identification]
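A sketch of the output layer under the same convention (the 2 output nodes here are an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

hidden_output = rng.normal(size=(3, 5))  # outputs of a hidden layer:
                                         # 3 examples, 5 hidden nodes
W2 = rng.normal(size=(5, 2))             # one row per hidden-layer output,
                                         # one column per output node

predictions = hidden_output @ W2         # shape (3, 2): one evaluation
                                         # per output node, per example
```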

Both the hidden layer and the output layer nodes can have an activation function. This function takes as input the sum of each input multiplied by its respective weight, plus a bias.

There are several types of activation functions. One example is the sigmoid, an exponential function whose output is bounded between 0 and 1. If a node has an activation function, its output will be the result of that function.
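A minimal sketch of a sigmoid node, with made-up inputs, weights, and bias:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# A node's output: the weighted sum of its inputs, plus a bias,
# passed through the activation function.
inputs = np.array([0.5, -1.2, 3.0])
weights = np.array([0.4, 0.7, -0.2])
bias = 0.1

z = np.dot(inputs, weights) + bias
output = sigmoid(z)  # bounded between 0 and 1
```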

Now let’s finally understand how the network learns!

During training, your network makes predictions according to the input features. Each prediction is compared to the expected result to analyze whether the network was successful or not. For this, we use a quadratic error function, which sums the squares of the errors. The result is the cost of the network, a measure of how good or bad its predictions are. The next step is to use this error to perform backpropagation.
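A sketch of such a cost function (the 1/2 factor is a common convention that simplifies the derivative, not something from this article):

```python
import numpy as np

def quadratic_cost(predictions, targets):
    """Quadratic cost: half the sum of squared errors."""
    return 0.5 * np.sum((predictions - targets) ** 2)

predictions = np.array([0.8, 0.3, 0.6])
targets = np.array([1.0, 0.0, 1.0])
print(quadratic_cost(predictions, targets))  # 0.145
```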

Backpropagation

This is the training step in which the weight corrections are applied to each of the weight matrices, both in the output layer and in the hidden layers.

It is this correction of the weights that makes the network learn and adapt to its context. The correction occurs by propagating the error in the opposite direction, that is, from the output layer back to the first hidden layer, hence the name backpropagation. In this step, a correction for each weight is calculated through a technique called gradient descent.
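A minimal backpropagation sketch for a single sigmoid node with the quadratic cost above, applying the chain rule from the error back to each weight (all values are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # node inputs
w = np.array([0.4, 0.7, -0.2])   # node weights
b = 0.1
target = 1.0

# Forward pass
z = np.dot(x, w) + b
a = sigmoid(z)

# Backward pass: chain rule from the cost back to each weight.
d_cost_da = a - target   # derivative of 0.5 * (a - target)**2
d_a_dz = a * (1 - a)     # derivative of the sigmoid
d_z_dw = x               # derivative of the weighted sum w.r.t. each weight

grad_w = d_cost_da * d_a_dz * d_z_dw
grad_b = d_cost_da * d_a_dz
```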

Gradient Descent

Here, the partial derivative of the network's error function with respect to each weight is computed. In other words, we measure how the network's error varies with the weights, obtaining a vector that indicates how to correct the weights toward the region of minimum error. Simply put, this technique allows the weights to be updated in the right direction, instead of correcting them with random values. Gradient descent greatly increases the network's ability to converge to satisfactory results.
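A minimal sketch of the idea on a toy error function, f(w) = (w - 3)^2, assuming a learning rate of 0.1:

```python
# Gradient descent on f(w) = (w - 3)**2, whose gradient is 2 * (w - 3).
# The minimum is at w = 3.
w = 0.0
learning_rate = 0.1  # an assumed value; too large and the updates diverge

for step in range(50):
    grad = 2 * (w - 3)            # the gradient points uphill...
    w = w - learning_rate * grad  # ...so we step in the opposite direction

print(w)  # approaches 3.0
```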

In short, your network is represented by a series of input and weight matrices. During the training phase, learning takes place through backpropagation and gradient descent, which correct the weights present in the matrices of the hidden layers and the output layer. As training epochs go by, the weights are adjusted and the network adapts to the scenario. Finally, the network's efficiency is analyzed by passing the test set through it.

I hope this article helps you understand a few more concepts of Deep Learning! Until next time!
