Perceptron: Theory and Practice

Zihan Guo
Data Alchemist
4 min read · May 26, 2019

the gateway to neural networks and deep learning…


Introduction

A neural network is nothing like the sentient machines of popular imagination, so what is a neural network?

In a nutshell, a neural network is a mathematical model whose structure mimics the mechanism of biological neural networks. It follows precise mathematical rules and is nowhere close to a strong A.I.

Therefore, if we break the model down component by component, we can fathom its complexity and learn to use it effectively.

Classification Problem

Almost always, we start with a binary classification problem. Say we want to distinguish salmon from sea bass using two features: width and lightness. The classification problem is to come up with a mathematical equation that can differentiate the two. In other words, we want a decision boundary that separates the dots on this plane, such that data points falling on one side of the line are salmon, while those on the other side are sea bass.

image reference: https://mhesham.wordpress.com/tag/decision-boundary/
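
For intuition, a linear decision boundary in two dimensions is just a weighted sum of the features compared against a threshold. A minimal sketch (the weights and bias below are made up for illustration, not fitted to real data):

# hypothetical linear decision boundary for the salmon / sea bass example
def classify(width, lightness):
    w_width, w_lightness, bias = 0.8, -1.2, 0.5  # illustrative values only
    score = w_width * width + w_lightness * lightness + bias
    return "salmon" if score >= 0 else "sea bass"

print(classify(width=1.0, lightness=0.2))  # lands on the salmon side
print(classify(width=0.3, lightness=1.5))  # lands on the sea bass side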

However, things are rarely this simple. Often we don’t have just one or two features; we might have many. The graph below shows what a non-linear decision boundary looks like in a three-dimensional space. In that case, data points inside the circle and data points outside of it are classified under two different labels.

Image Reference: https://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html
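
For instance, a circular boundary can be written directly as a rule on the squared distance from the origin. A minimal sketch (the radius here is arbitrary):

# non-linear (circular) decision boundary: the label depends on whether
# a point falls inside or outside a circle of radius r
def inside_circle(x, y, r=1.0):
    return x**2 + y**2 < r**2

# lifting each point into 3-D with z = x**2 + y**2 turns this circular
# boundary into a flat plane (z = r**2), which is what the graph illustrates
print(inside_circle(0.2, 0.3))  # True  -> one label
print(inside_circle(1.5, 0.0))  # False -> the other label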

Perceptron

The simplest neural network is a one-layer neural network (yeah, there are layers, like onions…). There are four components to the perceptron shown below:

  • input layer (X, Y, bias) and output
  • weights (w0, w1, w2)
  • weighted sum equation
  • activation function (sigmoid in this case)

*Notice*: the X and Y here are just variables. Y is not the label in this case; it is just another predictor in the input layer.

Image Source: https://www.cs.utexas.edu/~teammco/misc/perceptron/
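
In code, the four components combine into a single forward pass. Here is a minimal sketch, assuming the sigmoid activation from the diagram (the weight values are made up):

import math

def perceptron_forward(X, Y, w0, w1, w2):
    weighted_sum = w0 + w1 * X + w2 * Y       # weighted sum equation
    return 1 / (1 + math.exp(-weighted_sum))  # sigmoid activation

# X and Y are both predictors; w0 is the weight on the bias input
print(perceptron_forward(X=0.5, Y=0.8, w0=-1, w1=0.5, w2=0.5))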

Now, let’s take a moment to appreciate the beauty of nature. The design of the perceptron was inspired by the structure of neurons inside our bodies. Dendrites are like the input layer (X, Y, bias), and the axon is like the output path: the weighted sum (using weights w0, w1, w2) is processed through the activation function to produce the output response.

image source: https://biology.stackexchange.com/questions/9026/what-are-the-functions-and-differences-between-axons-and-dendrites

Perceptrons as Logical Operators

To take the theory into practice, let’s look at the AND operator. The combinations of the two inputs and the corresponding output are shown in the table below.

image source: Udacity deep learning
# write w0, w1, w2 yourself to see if you can figure out their values
w0 = None
w1 = None
w2 = None
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
correct_outputs = [False, False, False, True]
outputs = []
for x1, x2 in inputs:
    # the perceptron fires (1) when the weighted sum crosses zero
    outputs.append(int(w0 + w1 * x1 + w2 * x2 >= 0))

There are several solutions. The simplest is to set w0 = -1, w1 = 0.5, and w2 = 0.5. To practice, write down weights for OR and NOT in a single perceptron. How about XOR? Can you find a perceptron that represents XOR?

image source: Udacity deep learning
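
You can verify the suggested weights (or your own candidates for OR and NOT) with a quick check like this:

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_outputs = [False, False, False, True]
w0, w1, w2 = -1, 0.5, 0.5
# the perceptron fires exactly when the weighted sum crosses zero
predictions = [w0 + w1 * x1 + w2 * x2 >= 0 for x1, x2 in inputs]
print(predictions == and_outputs)  # True: these weights implement AND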

It appears that a single perceptron can only create a linear boundary. To represent XOR, we will have to construct a multi-layer perceptron, i.e. a neural network. However, before diving into multi-layer perceptrons, we are missing an important concept: so far, we have been acting as the perceptron ourselves, deciding which weights to assign to each input. In reality, we can’t do that with gigabytes of data. The idea is to let the perceptron learn the weights itself. So how should a perceptron learn its weights?

How does a perceptron learn?

Well, if we can implement the perceptron algorithm, the answer will become crystal clear, so let’s implement a perceptron. Before we start coding, however, we need to know the algorithm itself (i.e. how the model learns its weights). Initially, we assign random weights (w0, w1, w2) and predict with them. There are two outcomes: either the prediction is correct or it is incorrect. If it is correct, we don’t need to update anything. If it is incorrect, we update the weights: if we predict 0 but the label is 1, we increase each weight by its corresponding input value times the learning rate; if we predict 1 but the label is 0, we decrease each weight by its corresponding input value times the learning rate. The bias term simply has the learning rate itself added or subtracted. The last question is: when should we stop? We specify num_epochs, which represents the number of iterations we go through before stopping.

Below is a code implementation (all credit for the original code goes to Udacity Deep Learning).

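Here is a minimal sketch of the update rule described above, using a step (0/1) prediction; the names predict, perceptron_step, train, learn_rate, and num_epochs are illustrative and may differ from the original Udacity code.

import random

def predict(x1, x2, w1, w2, bias):
    # fire (1) when the weighted sum crosses zero, otherwise stay silent (0)
    return int(w1 * x1 + w2 * x2 + bias >= 0)

def perceptron_step(data, labels, w1, w2, bias, learn_rate):
    # one pass over the data, applying the update rule described above
    for (x1, x2), label in zip(data, labels):
        pred = predict(x1, x2, w1, w2, bias)
        if label == 1 and pred == 0:
            # predicted 0 but the label is 1: increase the weights
            w1 += learn_rate * x1
            w2 += learn_rate * x2
            bias += learn_rate
        elif label == 0 and pred == 1:
            # predicted 1 but the label is 0: decrease the weights
            w1 -= learn_rate * x1
            w2 -= learn_rate * x2
            bias -= learn_rate
    return w1, w2, bias

def train(data, labels, learn_rate=0.1, num_epochs=25):
    # start from random weights, then repeat the step for num_epochs iterations
    w1, w2, bias = random.random(), random.random(), random.random()
    for _ in range(num_epochs):
        w1, w2, bias = perceptron_step(data, labels, w1, w2, bias, learn_rate)
    return w1, w2, bias

# example: learn the AND operator from its truth table
data = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
w1, w2, bias = train(data, labels)
print([predict(x1, x2, w1, w2, bias) for x1, x2 in data])  # expected: [0, 0, 0, 1]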