Building a Neural Network Zoo From Scratch: The Perceptron

Gavin Hull
6 min read · Oct 2, 2022


Visualization of the Perceptron from the Asimov Institute.

The Perceptron, widely considered the first neural network, was first demonstrated by Cornell psychologist Frank Rosenblatt in 1958. Initially hailed as the saviour of early AI, the Perceptron was soon found to recognize only the most basic patterns, such as the AND gate. It wasn't until many years later that Multilayer Perceptrons were invented, changing the course of machine learning and AI forever.

So, what is a Perceptron?

In simple terms, a Perceptron is a function that we can fine-tune over time to give us the result we’re looking for.
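That function is the Perceptron function f(x) = Wx + b, where x is our input, W is a weight and b is a bias.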

If you think back to your high school math class, this function should look familiar — it’s the equation for a line. This is why the Perceptron is considered a linear binary classifier: it’s a straight line on a graph that can separate two groups.

AND function (left) separated by a potential Perceptron line & XOR function (right), which cannot be separated by a Perceptron line.

Here, then, lies the problem with the Perceptron: it can only differentiate between things that can be separated by a straight line. The AND function (shown on the left) can easily be separated; the XOR function, on the other hand, cannot. It is simply not possible to draw a straight line with the green X’s on one side and the red on the other.

How does a Perceptron work?

The Perceptron has two steps: a forward pass and a backward pass. During the training phase, the Perceptron ‘function’ is called an arbitrary number of times (the forward pass), and each time it is called we compare the output that we get to the output we want, changing the function ever so slightly to get an answer closer to the one we were looking for (the backward pass).

This backward pass is called Backpropagation and is generally the bane of every new AI enthusiast’s existence. Backpropagation is a mess of high-dimensional matrices and multivariate calculus, but the easiest way to understand it is with computational graphs.

Step 1 of the computational graph for the Perceptron.

Computational graphs are very easy to make: reading left to right, each circle represents a function (in this case multiplication and addition), each variable going into a circle is an input to the aforementioned function, and any line coming from a circle is the function’s output. This particular graph represents our Perceptron, because it first multiplies x and W, and then adds b to the result.

Step 2 of the computational graph of the Perceptron.

Next, we write the expression carried by each connection above its line, in green, to make things easier to understand.

Step 3 for the computational graph of the Perceptron.

The last step is the most difficult. Starting at the rightmost connection (the arrow coming out of the addition function), we write the error underneath each connection. The error of any connection can be calculated as follows:

  1. The last connection will just be e, which stands for error.
  2. The error of any other connection will be the upstream error (the error directly to the right) multiplied by the derivative of the upstream function (the function directly to the right) with respect to the connection you are calculating.
Error of b.
Error of Wx.
Error of W.
Error of x.
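Written out for our Perceptron f(x) = Wx + b, those two rules give the four errors captioned above:

    error(b)  = e · ∂(Wx + b)/∂b    = e
    error(Wx) = e · ∂(Wx + b)/∂(Wx) = e
    error(W)  = error(Wx) · ∂(Wx)/∂W = e · x
    error(x)  = error(Wx) · ∂(Wx)/∂x = e · W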

If any of this goes over your head, I recommend giving this video a watch; it goes into much more detail on the process of building and understanding a computational graph. You will need to be familiar with this process to understand neural networks.

Enough math… show me the code!

First, import NumPy. This will be the only library we use in this tutorial so that you can get an in-depth understanding of what’s going on behind the scenes of the Perceptron.

Import NumPy.
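In code, that's a single line:

    import numpy as np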

We will be using a class to represent our Perceptron to keep everything organized, but feel free to implement it however you like.

Perceptron class.

Our class will take three inputs: input_size, which is how large our input will be; num_epochs, which is how many times we would like to update our function (generally, more epochs means more accuracy!); and learning_rate, which is an extra constant that will change how quickly our Perceptron learns.

We also need to initialize our network. self.weights is our W, which will be set to a matrix of zeros the same size as our input. self.bias is our b which will be set to zero to start as well.
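Put together, a sketch of the class and its constructor might look like this (the exact code in the original screenshots may differ slightly):

    import numpy as np

    class Perceptron:
        def __init__(self, input_size, num_epochs, learning_rate):
            self.num_epochs = num_epochs          # how many times we update the function
            self.learning_rate = learning_rate    # how quickly the Perceptron learns
            self.weights = np.zeros(input_size)   # W: zeros, the same size as our input
            self.bias = 0                         # b: also starts at zero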

Forward propagation function.

Next is our forward() function, which computes our Perceptron function f(x). It takes an input appropriately named input, multiplies it by our self.weights variable (W) and adds self.bias (b). The next line may look a little complicated to those of you who aren’t familiar with NumPy, but I assure you it’s straightforward. Because we’re trying to classify a binary function, we want our output to be either a 1 or a 0, so np.where(layer_output > 0, 1, 0) returns 1 if layer_output is greater than 0 and 0 otherwise.
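Continuing the Perceptron class, forward() might look something like this (using np.dot for the multiplication is my assumption):

        def forward(self, input):
            # f(x) = Wx + b
            layer_output = np.dot(input, self.weights) + self.bias
            # turn the result into a binary prediction: 1 if positive, 0 otherwise
            return np.where(layer_output > 0, 1, 0)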

Backpropagation function.

Next is everyone’s favourite backpropagation function. Using our computational graph, we know that the error of self.weights (W) is going to be ex, or our error multiplied by our input, and the error of self.bias is simply e. As mentioned earlier, you should also use a learning rate to control how fast your network learns. This will be factored in by multiplying our errors by the learning rate.
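In code, again continuing the class:

        def backward(self, input, error):
            # from the computational graph: the error of W is e·x and the error of b is e,
            # each scaled by the learning rate to control how fast the network learns
            self.weights += self.learning_rate * error * input
            self.bias += self.learning_rate * error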

Train & test functions.

Finally, the train() and test() functions. The train() function will take two inputs: inputs and labels. These are lists of inputs to our Perceptron function and their corresponding desired outputs. Using the zip() function, we iterate through each pair, call our forward() function with our input, and call backward() with the error being the difference between the output we want and the output we predicted. If it isn’t immediately clear why this is how we calculate the error, here is a table to show it more clearly.

Table of our output, desired output, error and the output + the error.
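The table comes down to the fact that error = desired output - prediction:

    Output   Desired output   Error   Output + error
    0        0                 0       0
    0        1                 1       1
    1        0                -1       0
    1        1                 0       1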

Now you see: when the error and prediction are summed, we get our desired output. This is why we add the error to the weights and biases in the backpropagation function: to get an output closer to what we’re looking for.
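In code, continuing the class (I'm assuming the epoch loop lives inside train(), and that test() simply reports the fraction of correct predictions):

        def train(self, inputs, labels):
            # make num_epochs passes over the training data
            for _ in range(self.num_epochs):
                for input, label in zip(inputs, labels):
                    prediction = self.forward(input)
                    # the error is the difference between the output we want and the output we got
                    self.backward(input, label - prediction)

        def test(self, inputs, labels):
            # fraction of inputs the Perceptron classifies correctly
            correct = sum(self.forward(input) == label for input, label in zip(inputs, labels))
            return correct / len(labels)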

Perceptron initialization & utilization.

Finally, we initialize our Perceptron, train it, and test it. In this example I used 1000 epochs (or training steps) and a learning rate of 0.01, but I would highly recommend playing around with those hyperparameters yourself to get a feel for why they’re there.
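As a concrete example (the training data here is my own, for illustration), here is the Perceptron learning the AND gate:

    inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    labels = np.array([0, 0, 0, 1])  # the AND gate

    perceptron = Perceptron(input_size=2, num_epochs=1000, learning_rate=0.01)
    perceptron.train(inputs, labels)
    print(perceptron.test(inputs, labels))  # 1.0 once the AND gate has been learned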

That concludes the first article in this series. Eventually, I’m hoping to write an article explaining each of the networks in the Asimov Institute’s Neural Network Zoo. The full code for this article can be found here. The next tutorial on Feed Forward Neural Networks can be found here.

Thanks to Emily Hull for editing!
