Multi-Layer Perceptrons (MLPs)

nahid
Udacity PyTorch Challengers
5 min read · Jan 6, 2019

As with any new subject or topic, it is important to lay down the basics to set the tone, line of thought and context. The basics may or may not be essential to the bigger picture, but they are a necessary starting point. And the starting point for the topic we are delving into here was my previous article: What is a Perceptron?

As simple and straightforward as the typical linear perceptron is, perceptrons are in fact far more versatile than that. Versatility is almost a necessity, since most AI problems do not have only one solution; there is always a better and more efficient way to approach them.

The problem with linear perceptrons

Perceptrons do really well when the data or situation we are dealing with is linearly separable, i.e. when the classes can be separated by a single straight line (or plane). Previously, we dealt with the cancer problem, and while that was a great example to start with, any cancer expert will tell you that the data that determines the result of a cancer test is a lot more complicated than that.

Let’s take an even simpler example: university admissions. The simplest approach would be to select the high school students with the best academic results: list all the applicants’ scores, arrange them in decreasing order and accept the top 100 students. This rule can easily be implemented by a linear perceptron.
But there are many more factors that determine a student’s contribution to and performance at university, for example co-curricular activities, sports, financial capacity and so on. If we were to factor in even a few of these, the problem would become more and more complicated and less and less linear. Suddenly, linear perceptrons aren’t looking so great.
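As a toy sketch in Python (the applicant IDs, scores and the cut-off below are made up purely for illustration), the single-score rule is nothing more than a sort and a threshold, exactly the kind of decision a linear perceptron can express:

```python
# Hypothetical admissions data: (applicant_id, exam_score) pairs.
applicants = [("s001", 88.5), ("s002", 92.0), ("s003", 76.3)]  # ...and so on

# Rank by the single academic score and accept the top 100.
ranked = sorted(applicants, key=lambda a: a[1], reverse=True)
admitted = ranked[:100]

# Equivalently, a linear rule with one input: admit if score >= cut_off.
cut_off = 85.0  # assumed threshold, for illustration only
decisions = [(sid, score >= cut_off) for sid, score in ranked]
```

The moment you add a second or third factor with its own interactions, a single cut-off like this stops being enough.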

Basically, linear perceptrons struggle with problems that aren’t so black and white, i.e. non-linear data. Another great example of this is how easily a perceptron can describe the typical Boolean gates AND and OR, yet fails at describing the XOR gate. More on this in the resources provided below.
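To make that concrete, here is a small sketch (the weights are hand-picked rather than learned) showing a single step-activated unit encoding AND and OR, and why no single such unit can encode XOR:

```python
def perceptron(x1, x2, w1, w2, b):
    """A single linear unit with a step activation."""
    return 1 if (w1 * x1 + w2 * x2 + b) > 0 else 0

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Hand-picked weights that realise AND and OR.
print([perceptron(x1, x2, 1, 1, -1.5) for x1, x2 in inputs])  # AND -> [0, 0, 0, 1]
print([perceptron(x1, x2, 1, 1, -0.5) for x1, x2 in inputs])  # OR  -> [0, 1, 1, 1]

# XOR should give [0, 1, 1, 0], but no choice of w1, w2, b achieves it:
# the points (0,1) and (1,0) cannot be split from (0,0) and (1,1) by one
# straight line. Stacking two layers of such units, however, can do it.
```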

Linear data (above), Non-linear data (below)

Multi-Layer Perceptrons (MLPs)

In order to solve more complex, non-linear problems, we can begin by stacking perceptrons on top of each other. This network of stacked perceptrons, a deep artificial neural network, is known as a Multi-Layer Perceptron: “deep” because of the presence of multiple layers, and “network” because this layered arrangement works together towards a common goal.

An MLP consists of an input layer, an output layer and some number of layers in between known as hidden layers. How many hidden layers you use varies depending on the application. Also note that, just as in a single linear perceptron, each node in the hidden and output layers has weights (and a bias) attached to its inputs.
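As a minimal sketch in PyTorch (the layer sizes here are arbitrary choices for illustration, not a prescription), an MLP is just an input layer, one or more hidden layers and an output layer chained together:

```python
import torch.nn as nn

# A small MLP: 2 input features -> one hidden layer of 16 units -> 1 output.
# The hidden size and the number of hidden layers are choices you tune per problem.
model = nn.Sequential(
    nn.Linear(2, 16),   # input layer -> hidden layer (weights + a bias per node)
    nn.ReLU(),          # non-linear activation between the layers
    nn.Linear(16, 1),   # hidden layer -> output layer
    nn.Sigmoid(),       # squash the output to a probability for classification
)
```

The activation between the linear layers is what gives the network its non-linearity; without it, stacking linear layers would collapse back into a single linear perceptron.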

Source: https://www.oreilly.com/library/view/getting-started-with/9781786468574/ch04s04.html
Source: https://www.kdnuggets.com/2015/04/preventing-overfitting-neural-networks.html

The images above illustrate different approaches one can take with an MLP to classify your data (more on these approaches in future articles!). Here I’d like to highlight the non-linearity in the data points (green and red dots). As you can clearly see, it is impossible for a single-layer or linear perceptron to classify these points by plotting a straight line, because NO such straight line separates them. Hence the use of an MLP: while not perfect, it does an extremely good job of classifying most of these data points correctly.

MLPs in action!

Just like the linear perceptron, MLPs are often applied to supervised learning problems, where the network learns from labelled data it is given. Due to the presence of multiple layers, more precise relationships and connections can be drawn out of that data. This process of making sense of the data is called training, and it is achieved by adjusting variables such as the weights and biases with the goal of minimizing an error function.
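For example, a common choice of error function in a supervised setting is the mean squared error between the network’s predictions and the known answers; a quick sketch with toy numbers:

```python
import torch

prediction = torch.tensor([0.8, 0.2, 0.6])  # toy network outputs
target = torch.tensor([1.0, 0.0, 1.0])      # toy expected answers

# Mean squared error: the average of the squared differences.
error = ((prediction - target) ** 2).mean()
print(error)  # tensor(0.0800)

# PyTorch provides the same thing as a built-in loss module.
loss_fn = torch.nn.MSELoss()
print(loss_fn(prediction, target))  # tensor(0.0800)
```

Training means nudging the weights and biases so that numbers like this get smaller.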

As you can imagine, there are many ways one can train their Artificial Neural Network (ANN) and one approach that we will discuss here is the Backpropagation Algorithm.
Remember the first time you rode a bicycle? That probably came about after falling over at least 10 times. Essentially, each time you fell over your brain kept figuring out what your mistakes were, and every time you tried again you slowly fixed those mistakes until eventually you were able to ride mistake-free.
Your brain’s ability to learn is essentially what the backpropagation algorithm does. It corrects the mistakes made by the ANN.

Let’s say you feed an ANN some data at the input. This flows through the network, whose weights and biases start out as random values, and an output is calculated. This output is compared with the expected answer in the data, given that this is a supervised learning problem. The comparison naturally gives rise to an error between the network’s result and the expected result. The backpropagation algorithm now “propagates”, or sends, this error back into the network and, based on it, the weights are adjusted in order to minimize the error. The process continues until it reaches a stopping condition, which can be either a specific error threshold or a specific number of training rounds.
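Putting that loop together in PyTorch (the XOR data, learning rate and stopping conditions below are assumptions made for illustration, not a prescribed setup), a forward pass, an error, a backward pass and a weight update look roughly like this:

```python
import torch
import torch.nn as nn

# Toy supervised data: XOR inputs and their expected outputs.
x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)  # assumed learning rate

for epoch in range(2000):            # stopping condition: a fixed number of rounds
    prediction = model(x)            # forward pass: data flows through the network
    loss = loss_fn(prediction, y)    # compare the output with the expected result
    optimizer.zero_grad()
    loss.backward()                  # backpropagation: send the error back through the network
    optimizer.step()                 # adjust weights and biases to reduce the error
    if loss.item() < 0.01:           # alternative stopping condition: an error threshold
        break
```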
Consider your network trained! It is now capable of dealing with new or unfamiliar inputs and will likely give results/predictions that are in keeping with the data it was trained on.

Further reading:

For more details on the math and coding -
