Network of Perceptrons, The need for a smooth function and sigmoid neuron

Shikhar Goswami · Analytics Vidhya · Jun 20, 2020

A single perceptron can only represent linearly-separable boolean functions. Before doing anything, let’s first informally define what is meant by linearly-separable and linearly-inseparable functions.

A boolean function y = f(x) is said to be linearly separable if there exists a line w.x + b = 0 (or a plane, in 3-D) separating the points (xi, yi) with output 1 from the points (xj, yj) with output 0, such that all the points with output 1 lie on or above the line and all the points with output 0 lie below it.
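As a quick illustration (not from the original post), AND is linearly separable: the line x1 + x2 = 1.5 separates (1,1) from the other three inputs, so a single perceptron can compute it. A minimal sketch, with the weights below being just one hand-picked choice:

```python
# A single perceptron: fires (outputs 1) when w0 + w1*x1 + w2*x2 >= 0.
def perceptron(x1, x2, w0, w1, w2):
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0

# AND is linearly separable: the line x1 + x2 = 1.5 puts (1, 1) on one side
# and (0, 0), (0, 1), (1, 0) on the other.
for x1 in (0, 1):
    for x2 in (0, 1):
        y = perceptron(x1, x2, w0=-1.5, w1=1.0, w2=1.0)
        print(f"AND({x1},{x2}) = {y}")
```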

Now here’s the problem:

A perceptron with bias weight w0 fires when w0 + w1·x1 + w2·x2 ≥ 0. Writing out the four XOR cases: input (0,0) must give 0, so w0 < 0; inputs (1,0) and (0,1) must give 1, so w1 ≥ -w0 and w2 ≥ -w0; but input (1,1) must give 0, so w1 + w2 < -w0. Two numbers that are each at least -w0 > 0 cannot sum to less than -w0, so the conditions contradict each other. Also, from the graph, we see that you can’t separate all the X’s from all the O’s with a single line. Therefore, what we need is a network of perceptrons.
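To make the contradiction concrete, here is a small sketch (my addition, not from the course) that brute-forces a grid of candidate weights: it finds weights for OR, which is linearly separable, but none for XOR:

```python
import itertools

def perceptron(x1, x2, w0, w1, w2):
    """A perceptron fires (outputs 1) when w0 + w1*x1 + w2*x2 >= 0."""
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0

def find_weights(truth_table, grid):
    """Search a grid of (w0, w1, w2) for weights that reproduce the truth table."""
    for w0, w1, w2 in itertools.product(grid, repeat=3):
        if all(perceptron(x1, x2, w0, w1, w2) == y
               for (x1, x2), y in truth_table.items()):
            return (w0, w1, w2)
    return None

grid = [i / 2 for i in range(-10, 11)]      # candidate weights from -5.0 to 5.0
OR  = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

print("OR :", find_weights(OR, grid))       # some valid weights are found
print("XOR:", find_weights(XOR, grid))      # None: the four inequalities contradict
```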

How a network of perceptrons solves this problem

Let’s consider a network with an input layer, one hidden layer, and an output layer, and take the weights between the input and hidden layer as shown:

The output neuron will “fire” only if one of the four conditions is satisfied. Also, there is no contradiction between any two g(x) inequalities, as there was earlier. In other words, there is a separate hidden neuron for each combination of inputs, so all the points are nicely separated.
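Here is a minimal sketch of that construction; the specific weights below are an assumption, chosen so that each hidden perceptron fires for exactly one input combination:

```python
def perceptron(x, w, b):
    """Fires (returns 1) when w·x + b >= 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0

# Hidden layer: one perceptron per input combination, so exactly one of them
# fires for any given input (these weights are hand-picked for illustration).
hidden = [
    ((-1, -1),  0.5),   # fires only for (0, 0)
    ((-1,  1), -0.5),   # fires only for (0, 1)
    (( 1, -1), -0.5),   # fires only for (1, 0)
    (( 1,  1), -1.5),   # fires only for (1, 1)
]

# Output perceptron: positive weight only on the hidden neurons whose input
# combination should map to 1 (for XOR, these are (0, 1) and (1, 0)).
w_out, b_out = (0, 1, 1, 0), -0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = tuple(perceptron(x, w, b) for w, b in hidden)
    y = perceptron(h, w_out, b_out)
    print(f"XOR{x} = {y}")
```

Note that changing only the output weights lets the same hidden layer represent any of the 16 boolean functions of two inputs.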

Need for smoothness

A perceptron has a very strict decision boundary. If the threshold is θ=0.5, then a g(x) just below it, say 0.49, gives y=0, while g(x)=0.51 gives y=1. But in real life these two values are practically the same, and you would expect the same y for both. What we want instead is a somewhat smooth function:

Now there’s no longer a hard if-else condition in y around the decision boundary: y varies smoothly and can take any value between 0 and 1.
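A small sketch of the difference (my addition), assuming the sigmoid neuron computes y = 1/(1 + e^-(g(x) - θ)), so that the bias plays the role of the old threshold:

```python
import math

def step(g, theta=0.5):
    """Perceptron: hard if-else around the threshold."""
    return 1 if g >= theta else 0

def sigmoid(g, theta=0.5):
    """Sigmoid neuron: smooth output between 0 and 1 (here b = -theta)."""
    return 1 / (1 + math.exp(-(g - theta)))

for g in (0.49, 0.50, 0.51):
    print(f"g(x) = {g}: step -> {step(g)}, sigmoid -> {sigmoid(g):.4f}")
# The step output jumps from 0 to 1 between 0.49 and 0.51;
# the sigmoid output moves only from ~0.4975 to ~0.5025.
```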

And we are particularly interested in the sigmoid because it is a continuous and differentiable function, which makes it easy to find weights that minimize the error. The sigmoid output can also be read as a probability: it tells us how likely an output y is, given w and x.
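The differentiability is what gradient-based learning relies on: the sigmoid has the closed-form derivative σ′(z) = σ(z)(1 − σ(z)), which this small check (my addition) verifies numerically:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# The sigmoid's derivative has the closed form sigma(z) * (1 - sigma(z)),
# which is what gradient descent uses when minimizing the error w.r.t. weights.
for z in (-2.0, 0.0, 2.0):
    numeric = (sigmoid(z + 1e-6) - sigmoid(z - 1e-6)) / 2e-6   # central difference
    closed = sigmoid(z) * (1 - sigmoid(z))
    print(f"z = {z:+.1f}: numeric {numeric:.6f} vs closed form {closed:.6f}")
```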

A network of perceptrons can ‘represent’ any boolean function. In the same way, a network of sigmoid neurons can ‘approximate’ any continuous function to arbitrary accuracy. That’s why deep neural networks are so powerful!
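One way to build intuition for the ‘approximate any function’ claim (an illustration of mine, not from the original post): the difference of two steep, shifted sigmoids forms a localized “bump”, and a sum of such bumps with different heights and positions can trace out an arbitrary curve:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def bump(x, left, right, height, steepness=50.0):
    """Difference of two steep sigmoids ~ a rectangular bump between left and right."""
    return height * (sigmoid(steepness * (x - left)) - sigmoid(steepness * (x - right)))

# Two bumps crudely approximating a step-like curve:
# height ~1.0 on (0.2, 0.5), ~0.3 on (0.5, 0.8), ~0 elsewhere.
for x in [0.1, 0.3, 0.5, 0.7, 0.9]:
    y = bump(x, 0.2, 0.5, 1.0) + bump(x, 0.5, 0.8, 0.3)
    print(f"f({x:.1f}) ~= {y:.3f}")
```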

Now that we have a function relating all the x’s and y’s, we need the network to learn weights that bring our predictions as close as possible to the ground truth. Feed-forward neural networks and the gradient descent algorithm in the next post!

Note: The concept of this article was based on videos of course CS7015: Deep Learning taught at NPTEL Online.
