Understanding the Multilayer Perceptron (MLP)

Nov 21, 2018

Welcome to my new post. In this post, I will discuss one of the basic algorithms of deep learning: the Multilayer Perceptron, or MLP.

If you are familiar with the Perceptron algorithm, recall that in the perceptron we just multiply the inputs by weights and add a bias, but we do this in one layer only.

We update the weights whenever an example is misclassified. The weight update equation is:

weight = weight + learning_rate * (expected - predicted) * x
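As a rough illustration of this update rule, here is a minimal sketch in plain Python. The function names (predict, train_perceptron), the step-function threshold at zero, the learning rate of 1.0, and the toy AND dataset are all assumptions made for the example; they are not taken from the implementation linked below.

```python
def predict(weights, bias, x):
    """Return 1 if the weighted sum plus bias is non-negative, else 0."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation >= 0.0 else 0


def train_perceptron(data, labels, learning_rate=1.0, epochs=20):
    """Apply the update rule to every example for a fixed number of epochs."""
    weights = [0.0] * len(data[0])
    bias = 0.0
    for _ in range(epochs):
        for x, expected in zip(data, labels):
            # error is 0 when the example is classified correctly,
            # so the weights only change on a misclassification
            error = expected - predict(weights, bias, x)
            # weight = weight + learning_rate * (expected - predicted) * x
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            # the bias behaves like a weight on a constant input of 1
            bias += learning_rate * error
    return weights, bias


# Toy data (an assumption for illustration): the logical AND function.
data = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(data, labels)
predictions = [predict(weights, bias, x) for x in data]
print(weights, bias, predictions)
```

Because AND is linearly separable, a single layer is enough here; the update stops changing the weights once every example is classified correctly.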

You can see the Python implementation of the Perceptron Algorithm here.

Now we come to the Multilayer Perceptron (MLP), or Feed Forward Neural Network (FFNN).

In the multilayer perceptron, there can be more than one linear layer (a combination of neurons). Take the simple example of a three-layer network: the first layer is the input layer, the last is the output layer, and the middle layer is called the hidden layer. We feed our input data into the input layer and take the output from the output layer. We can increase the number of hidden layers as much as we want to make the model more complex according to our task.

The feed forward network is the most typical neural network model. Its goal is to approximate some function f*. Given, for example, a classifier y = f*(x) that maps an input x to an output class y, the MLP finds the best approximation to that classifier by defining a mapping y = f(x; θ) and learning the best parameters θ for it. MLP networks are composed of many functions that are chained together. A network with three functions, or layers, would form f(x) = f^(3)(f^(2)(f^(1)(x))). Each of these layers is composed of units that apply an activation function to an affine transformation of their inputs (a weighted sum plus a bias). Each layer is represented as y = f(Wx^T + b), where f is the activation function (covered below), W is the set of parameters, or weights, in the layer, x is the input vector, which can also be the output of the previous layer, and b is the bias vector. An MLP consists of several fully connected layers, so called because each unit in a layer is connected to all the units in the previous layer. In a fully connected layer, the parameters of each unit are independent of those of the other units in the layer, which means each unit possesses a unique set of weights.
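To make the chaining of layers concrete, here is a small sketch in plain Python. It treats x as a plain list, so no explicit transpose is needed. The helper names (dense_layer, mlp), the choice of ReLU for the first two layers, and every weight and bias value are hypothetical and hand-picked for readability; a real network would learn them.

```python
def dense_layer(x, W, b, f):
    """Compute y = f(Wx + b) for one fully connected layer.

    x is a list of inputs, W holds one row of weights per unit,
    b holds one bias per unit, and f is the activation function.
    """
    return [f(sum(w * xi for w, xi in zip(row, x)) + bias)
            for row, bias in zip(W, b)]


def relu(z):
    return max(0.0, z)


def identity(z):
    return z


# Hypothetical hand-picked parameters for a chain of three layers.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),   # f(1): 2 inputs -> 2 units
    ([[1.0, 1.0], [-1.0, 2.0]], [0.0, 1.0], relu),   # f(2): 2 units -> 2 units
    ([[1.0, -1.0]], [1.0], identity),                # f(3): 2 units -> 1 output
]


def mlp(x):
    """Chain the layers: f(x) = f(3)(f(2)(f(1)(x)))."""
    for W, b, f in layers:
        x = dense_layer(x, W, b, f)  # the output of one layer feeds the next
    return x


print(mlp([2.0, 1.0]))
```

Each row of W belongs to exactly one unit, which is what "each unit possesses a unique set of weights" means in practice.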

In a supervised classification system, each input vector is associated with a label, or ground truth, that defines its class; this class label is given with the data. The output of the network gives a class score, or prediction, for each input. To measure the performance of the classifier, a loss function is defined. The loss will be high if the predicted class does not correspond to the true class, and low otherwise. Sometimes overfitting or underfitting occurs while training the model. In the case of overfitting, the model performs very well on the training data but not on the test data. In order to train the network, an optimization procedure is required, and for this we need a loss function and an optimizer. This procedure will find the values for the set of weights W that minimize the loss function.
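As one possible example of such a loss, the sketch below uses softmax followed by cross-entropy, a common choice for classification. The post does not say which loss it has in mind, so this pairing, the function names, and the class scores are assumptions for illustration only.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, true_class):
    """Negative log-probability that the model assigns to the true class."""
    return -math.log(softmax(scores)[true_class])


scores = [2.0, 0.5, -1.0]                # hypothetical class scores; class 0 scores highest
loss_correct = cross_entropy(scores, 0)  # the true class is the top-scoring one
loss_wrong = cross_entropy(scores, 2)    # the true class received the lowest score
print(loss_correct, loss_wrong)
```

The same scores give a small loss when the top-scoring class is the true one and a much larger loss when it is not, which is the behavior described above.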

A popular strategy is to initialize the weights to random values and refine them iteratively to get a lower loss. This refinement is achieved by stepping in the direction of the negative gradient of the loss function, since that is the direction in which the loss decreases. It is also important to set a learning rate, which defines how far the algorithm moves in every iteration.
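The idea can be sketched on a deliberately tiny problem: a single weight w and a made-up loss (w - 3)^2 whose minimum is known to be at w = 3. The loss, the learning rate of 0.1, the 50 steps, and the random seed are all assumptions chosen to keep the example short.

```python
import random


def loss(w):
    """A toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w, learning_rate=0.1, steps=50):
    """Repeatedly step against the gradient, recording the loss along the way."""
    history = [loss(w)]
    for _ in range(steps):
        w = w - learning_rate * gradient(w)  # the learning rate scales the step size
        history.append(loss(w))
    return w, history


random.seed(0)                    # fixed seed so the run is repeatable
w0 = random.uniform(-5.0, 5.0)    # random initial weight
w, history = gradient_descent(w0)
print(w0, w, history[0], history[-1])
```

With a learning rate that is too large the steps would overshoot the minimum, and with one that is too small the loss would fall very slowly, which is why this value matters.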

Activation function:

Activation functions, also known as non-linearities, describe the input-output relation in a non-linear way. This gives the model the power to be more flexible in describing arbitrary relations. Some popular activation functions are Sigmoid, ReLU, and TanH. I will describe these in my next blog.

Training the Model:
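For reference, the three functions named above can be written in a few lines of Python using their standard definitions. The sample inputs are arbitrary.

```python
import math


def sigmoid(z):
    """Squash z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Pass positive values through and clamp negative values to 0."""
    return max(0.0, z)


def tanh(z):
    """Squash z into the range (-1, 1)."""
    return math.tanh(z)


for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), relu(z), tanh(z))
```

None of these is a straight line through its inputs, which is what lets stacked layers represent more than a single linear map.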

There are basically three steps in the training of the model.

Forward pass
Calculate error or loss
Backward pass

1. Forward pass

In this step of training the model, we pass the input to the model, multiply it by the weights and add the bias at every layer, and obtain the calculated output of the model.

2. Loss Calculation

When we pass in a data instance (one example), we get some output from the model, called the predicted output (pred_out), and we have the label that comes with the data, which is the real or expected output (Expect_out). Based on these two, we calculate the loss that we have to backpropagate (using the backpropagation algorithm). There are various loss functions to choose from, depending on our output and requirements.

3. Backward Pass

After calculating the loss, we backpropagate it and update the weights of the model using the gradient. This is the main step in training the model. In this step, the weights are adjusted according to the gradient flowing back through the network. For an in-depth understanding of the backpropagation algorithm, check this nice blog by Andrej Karpathy here.
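The three steps can be put together in a short sketch. Everything specific here is an assumption made for illustration: a network with one hidden layer of four sigmoid units, a squared-error loss, a learning rate of 0.5, the XOR toy dataset, and the helper names forward, mean_loss, and train_step. The variables pred_out and expect_out mirror the names used in the loss step above.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, params):
    """Forward pass: multiply by weights, add bias, apply the activation."""
    W1, b1, W2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    pred_out = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, pred_out


def mean_loss(data, params):
    """Mean squared error between predicted and expected outputs."""
    return sum((forward(x, params)[1] - y) ** 2 for x, y in data) / len(data)


def train_step(x, expect_out, params, learning_rate):
    """Run the three steps on one example and return the updated parameters."""
    W1, b1, W2, b2 = params
    n_in, n_hid = len(x), len(b1)
    # 1. Forward pass
    hidden, pred_out = forward(x, params)
    # 2. Loss is (pred_out - expect_out) ** 2; take its gradient with respect
    #    to the output unit's pre-activation (chain rule through the sigmoid)
    d_out = 2.0 * (pred_out - expect_out) * pred_out * (1.0 - pred_out)
    # 3. Backward pass: propagate the gradient to each hidden unit
    d_hid = [d_out * W2[j] * hidden[j] * (1.0 - hidden[j]) for j in range(n_hid)]
    # Gradient step on every weight and bias
    new_W2 = [W2[j] - learning_rate * d_out * hidden[j] for j in range(n_hid)]
    new_b2 = b2 - learning_rate * d_out
    new_W1 = [[W1[j][i] - learning_rate * d_hid[j] * x[i] for i in range(n_in)]
              for j in range(n_hid)]
    new_b1 = [b1[j] - learning_rate * d_hid[j] for j in range(n_hid)]
    return new_W1, new_b1, new_W2, new_b2


random.seed(0)  # fixed seed so the run is repeatable
n_hidden = 4
params = (
    [[random.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)],  # W1
    [0.0] * n_hidden,                                                      # b1
    [random.uniform(-1, 1) for _ in range(n_hidden)],                      # W2
    0.0,                                                                   # b2
)

# XOR: a toy problem that a single-layer perceptron cannot solve.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

initial_loss = mean_loss(data, params)
for _ in range(5000):
    for x, y in data:
        params = train_step(x, y, params, learning_rate=0.5)
final_loss = mean_loss(data, params)
print(initial_loss, final_loss)
```

How far the loss falls depends on the initial weights and the number of epochs, so treat the printed numbers as a sanity check that training reduces the loss rather than as a benchmark.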