For Dummies — The Introduction to Neural Networks we all need ! (Part 1)
This is the first article in a two-part series. It gives an introduction to perceptrons (single-layer neural networks).
Update: Part 2 of the series is now available!
Our brain uses an extremely large, interconnected network of neurons to process information and to model the world around us. Simply put, a neuron collects inputs from other neurons through its dendrites. The neuron sums all the inputs, and if the resulting value is greater than a threshold, it fires. The fired signal is then sent to other connected neurons through the axon.
Now, how do we model artificial neurons?
The figure depicts a neuron connected to n other neurons, which therefore receives n inputs (x1, x2, ..., xn). This configuration is called a perceptron.
The inputs (x1, x2, ..., xn) and weights (w1, w2, ..., wn) are real numbers and can be positive or negative.
The perceptron consists of weights, a summation processor and an activation function.
Note: It also contains a threshold processor (known as bias) but we will talk about that later!
All the inputs are individually weighted, added together and passed into the activation function. There are many different types of activation function, but one of the simplest is the step function. A step function typically outputs 1 if its input is higher than a certain threshold; otherwise its output is 0.
Note: Other activation functions, such as the sigmoid, are also used in practice.
An example would be,
Input 1 (x1) = 0.6
Input 2 (x2) = 1.0
Weight 1 (w1) = 0.5
Weight 2 (w2) = 0.8
Threshold = 1.0
Weighting the inputs and adding them together gives,
x1w1 + x2w2 = (0.6 × 0.5) + (1.0 × 0.8) = 1.1
Here, the total input is higher than the threshold and thus the neuron fires.
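The same calculation can be sketched in a few lines of Python. This is only a minimal illustration of the worked example above; the function names step and perceptron_output are made up for this sketch and do not come from any library.

```python
def step(total, threshold):
    """Step activation: output 1 only if the weighted sum exceeds the threshold."""
    return 1 if total > threshold else 0


def perceptron_output(inputs, weights, threshold):
    """Weight each input, add the results, and pass the sum to the step function."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total, step(total, threshold)


# Values from the example: x1 = 0.6, x2 = 1.0, w1 = 0.5, w2 = 0.8, threshold = 1.0
total, fired = perceptron_output([0.6, 1.0], [0.5, 0.8], threshold=1.0)
print(round(total, 2), fired)  # 1.1 1
```

A 1 in the second position means the neuron fires, which matches the hand calculation.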
Training in perceptrons!
Imagine teaching a child to recognize a bus. You show her examples, telling her, “This is a bus. That is not a bus,” until the child learns the concept of what a bus is. Furthermore, if the child sees new objects that she hasn’t seen before, we could expect her to recognize correctly whether the new object is a bus or not.
This is exactly the idea behind the perceptron.
Similarly, input vectors from a training set are presented to the perceptron one after the other and weights are modified according to the following equation,
For all inputs i,
W(i) = W(i) + a*(T-A)*P(i), where a is the learning rate
Note: The more general form of the equation is W(i) = W(i) + a*g’(weighted sum of the inputs)*(T-A)*P(i), where g’ is the derivative of the activation function. Since the derivative of the step function is problematic to deal with (it is zero everywhere except at the threshold, where it is undefined), we drop it from the equation here.
Here, W is the weight vector, P is the input vector, T is the correct output that the perceptron should have produced, and A is the output the perceptron actually gave.
When an entire pass through all of the input training vectors is completed without an error, the perceptron has learnt!
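To make the update rule concrete, here is a minimal sketch that trains a two-input perceptron on the boolean AND function. Several things are assumptions made for this illustration and are not from the article: the AND training set, the starting weights of zero, the learning rate of 0.25, and a threshold held fixed at 1.0 (the bias is left for later, as noted above). The names predict and train are hypothetical.

```python
def predict(weights, inputs, threshold):
    """Step-function perceptron: 1 if the weighted sum exceeds the threshold."""
    total = sum(w * p for w, p in zip(weights, inputs))
    return 1 if total > threshold else 0


def train(samples, learning_rate=0.25, threshold=1.0, max_epochs=100):
    """Repeat passes over the training set until one pass makes no errors."""
    weights = [0.0] * len(samples[0][0])  # assumed starting point: all zeros
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for inputs, target in samples:
            actual = predict(weights, inputs, threshold)
            if actual != target:
                errors += 1
            # W(i) = W(i) + a*(T-A)*P(i); no change when T equals A
            weights = [w + learning_rate * (target - actual) * p
                       for w, p in zip(weights, inputs)]
        if errors == 0:
            return weights, epoch  # a full pass without an error
    return weights, None  # gave up after max_epochs


# Assumed training set: boolean AND as (input vector P, correct output T)
AND_SAMPLES = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, epochs = train(AND_SAMPLES)
print(weights, epochs)  # [0.75, 0.75] 4
```

With these settings, the fourth pass is the first one without an error, so training stops there. Other learning rates or starting weights would give different final weights.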
At this point, if an input vector P that is already in the training set is given to the perceptron, it will output the correct value. If P is not in the training set, the network will respond with an output similar to that of other training vectors close to P.
What is the perceptron actually doing?
The perceptron is adding all the inputs and separating them into 2 categories, those that cause it to fire and those that don’t. That is, it is drawing the line:
w1x1 + w2x2 = t, where t is the threshold
and looking at where the input point lies. Points on one side of the line fall into one category, and points on the other side fall into the other. Because the weights and the threshold can take any value, this can be any line across the two-dimensional input space.
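As a rough illustration of this geometric view, the sketch below reuses the weights and threshold from the earlier worked example (w1 = 0.5, w2 = 0.8, t = 1.0) and checks two points just above and just below the line. The helper names fires and boundary_x2 are invented for this sketch.

```python
W1, W2, T = 0.5, 0.8, 1.0  # weights and threshold from the earlier example


def fires(x1, x2):
    """True if the point (x1, x2) lies on the firing side of the line."""
    return W1 * x1 + W2 * x2 > T


def boundary_x2(x1):
    """The x2 value on the line w1*x1 + w2*x2 = t for a given x1 (needs w2 != 0)."""
    return (T - W1 * x1) / W2


x1 = 0.6
edge = boundary_x2(x1)  # the point (0.6, edge) sits on the line itself
print(fires(x1, edge + 0.1), fires(x1, edge - 0.1))  # True False
```

Moving a point across the line is the only thing that changes the perceptron's answer.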
Limitation of Perceptrons
Not every set of inputs can be divided by a line like this. Those that can be are called linearly separable. If the vectors are not linearly separable, learning will never reach a point where all vectors are classified properly. The most famous example of the perceptron’s inability to solve problems with linearly non-separable vectors is the boolean XOR problem.
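One way to get a feel for this limitation is a brute-force search. The sketch below tries every combination of w1, w2 and threshold t on a coarse grid and counts how many classify all four points of AND and of XOR correctly. The grid range (-2 to 2 in steps of 0.25) and the helper names are assumptions made for illustration. A finite grid is not a proof, but it agrees with the known result that no line separates XOR.

```python
from itertools import product

# Truth tables as ((x1, x2), correct output)
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# Assumed search grid: -2.0, -1.75, ..., 2.0 for each of w1, w2 and t
GRID = [i * 0.25 for i in range(-8, 9)]


def classifies_all(w1, w2, t, samples):
    """True if a step-function perceptron gets every sample right."""
    return all((1 if w1 * x1 + w2 * x2 > t else 0) == target
               for (x1, x2), target in samples)


def count_solutions(samples):
    """Number of (w1, w2, t) grid combinations that classify all samples."""
    return sum(classifies_all(w1, w2, t, samples)
               for w1, w2, t in product(GRID, repeat=3))


print(count_solutions(AND) > 0, count_solutions(XOR))  # True 0
```

AND has working combinations on the grid, while XOR has none.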
The next part of this article series will show how to handle such problems using multi-layer neural networks and the backpropagation training method.