19-line Line-by-line Python Perceptron

Learning Machine Learning Journal #4

Skip the noise; get the code: here, “regression” type tests here.

So far, we’ve been doing a lot of learning, with not a lot of “machine.” Today, that changes, because we’re going to implement a perceptron in Python.

What makes this Python perceptron unique, is that we’re going to be as explicit as possible with our variable names and formulas, and we’ll go through it all, line-by-line, before we get clever, import a bunch of libraries, and refactor.

Before we begin, we’ll start with a little recap and summary.

Recap & Summary

In Learning Machine Learning Journal #1, we looked at what a perceptron was, and we discussed the formula that describes the process it uses to binarily classify inputs. We learned that the perceptron takes in an input vector, x, multiplies it by a corresponding weight vector w, and then adds it to a bias, b. It then uses an activation function, (the step function, in this case), to determine if our resulting summation is greater than 0, in order to to classify it as 1 or 0.


In Learning Machine Learning Journal #2, we looked at how we could use a perceptron to mimic the behavior of an AND logic gate. We walked through, and reasoned about, how to determine the values of the weight vector, w, and the bias, b, in order for our perceptron to accurately classify the inputs from the AND truth table.

In Learning Machine Learning Journal #3, we looked at the Perceptron Learning Rule. We learned that by using labeled data, we could have our perceptron predict an output, determine if it was correct or not, and then adjust the weights and bias accordingly. In the end, we ended up with two formulas to describe the perceptron:

f(x) = 1 if w · x + b > 0
0 otherwise
w <- w + (y - f(x)) * x

In Summary, we now have in our arsenal a classification algorithm.

Classification is a subcategory of supervised learning where the goal is to predict the categorical class labels of new instances, based on past observations.

- Sebastian Raschka, Vahid Mirjalili, Python Machine Learning — 2nd Ed.

Supervised learning, is a subcategory of Machine Learning, where learning data is labeled, meaning that for each of the examples used to train the perceptron, the output in known in advanced.

When considering what kinds of problems a perceptron is useful for, we can determine that it’s good for tasks where we want to predict if an input belongs in one of two categories, based on it’s features and the features of inputs that are known to belong to one of those two categories.

These tasks are called binary classification tasks. Real-world examples include email spam filtering, search result indexing, medical evaluations, financial predictions, and, well, almost anything that is “binarily classifiable.”

Today, we’ll be continuing with AND:

 A   B  | AND 
--- --- |-----
1 1 | 1
1 0 | 0
0 1 | 0
0 0 | 0

The Code:

I would be remiss to say, “that’s it,” because it took me quite a bit of work to write these 19 lines (minus newlines), but when considering what these 19 lines can do, it’s kind of surprising that this is all it takes. Let’s walk through it.


import numpy as np

If you’re like me, not familiar with the numpy module, the only important thing to know here is that we’re using it to evaluate our dot product w · x during our summation. numpy lets us create vectors, and gives us both linear algebra functions and python list-like methods to use with it. We access its functions by calling them on np.

class Perceptron(object):

Here, we’re creating a new class Perceptron. This will, among other things, allow us to maintain state in order to use our perceptron after it has learned and assigned values to its weights.

def __init__(self, no_of_inputs, threshold=100, learning_rate=0.01):

In our constructor, we accept a few parameters that represent concepts that we looked at the end of Learning Machine Learning Journal #3.

The no_of_inputs is used to determine how many weights we need to learn.

The threshold, is the number of epochs we’ll allow our learning algorithm to iterate through before ending, and it’s defaulted to 100.

The learning_rate is used to determine the magnitude of change for our weights during each step through our training data, and is defaulted to 0.01.

The threshold and learning_rate variables can be played with to alter the efficiency of our perceptron learning rule, because of that, I’ve decided to make them optional parameters, so that they can be experimented with at runtime.

self.threshold = threshold        
self.learning_rate = learning_rate

These two lines set the threshold and learning_rate arguments to instance variables.

self.weights = np.zeros(no_of_inputs + 1)

Here, we initialize our weight vector. np.zeros(n), will create a vector with an n-number of 0’s. Here, we use the no_of_inputs, (which again, is number of inputs in our input vector, x), plus 1.

Remember in Learning Machine Learning Journal #3, we move our bias into the weight vector, so that we didn’t have to deal with it independently of our other weights? This bias is the +1 to our weight vector, and is referred to as the bias weight.

def predict(self, inputs):

Now, we define our predict method. This is the method we first looked at, way back in Learning Machine Learning Journal #1. This method will house the f(x) = 1 if w · x + b > 0 : 0 otherwise algorithm.

The predict method takes one argument, inputs, which it expects to be an numpy array/vector of a dimension equal to the no_of_inputs parameter that the perceptron was initialized with on line 5.

summation = np.dot(inputs, self.weights[1:]) + self.weights[0]

This is where the numpy dot product function comes in, and it works exactly how you might expect. np.dot(a, b) == a · b. It’s important to remember that dot products only work if both vectors are of equal dimension. [1, 2, 3] · [1, 2, 3, 4] is invalid. Things get a bit tricky here because we’ve added an extra dimension to our self.weights vector to act as the bias.

There are two options here, either we can add a 1 to the beginning of our inputs vector, like we discussed in Learning Machine Learning Journal #3, or, we can take the dot product of the inputs and the self.weights vector with the the first value “removed”, and then add the first value of the self.weights vector to the dot product. Either way works, I just happened to think that this way was cleaner.

We then store the result in the variable, summation.

if summation > 0:          
activiation = 1
activation = 0
return activation

This is our step function. It kind of reads like pseudocode: if the summation from above is greater than 0, we store 1 in the variable activation, otherwise, activation = 0, then we return that value.

We don’t need the temporary variable activation, but for now, the goal is to be explicit.

def train(self, training_inputs, labels):

Next, we define the train method, which takes two arguments: training_inputs and labels.

training_inputs is expected to be a list made up of numpy vectors to be used as inputs by the predict method.

labels is expected to be a numpy array of expected output values for each of the corresponding inputs in the training_inputs list.

In essence, the input vector at training_inputs[n] has the expected output at labels[n], therefore len(training_inputs) == len(labels).

for _ in range(self.threshold):

This creates a loop wherein the following code block will be run a number of times equal to the threshold argument we passed into the Perceptron constructor. If one hasn’t been passed in, it’s defaulted to 100 epochs. Because we don’t care to use an iterator variable, convention has us set it to _.

for inputs, label in zip(training_inputs, labels):

There are three important steps happening in this line:

  1. We zip training_inputs and labels together to create a new iterable object
  2. We loop through the new object
  3. While we iterate through, we store each elements in the training_inputs list into the inputs variable, and each of the elements in labels, in the variable label.

In the code block after this line, when we reference label, we get the expected output of the input vector stored in the inputs variable, and we do this once for every inputs/label pair.

prediction = self.predict(inputs)

Here, we pass the inputs vector into our previously defined predict method, and we store the result in the prediction variable.

self.weights[1:] += self.learning_rate * (label - prediction) * inputs

This is almost all of the learning rule implementation:

w <- w + α(y — f(x))x

We find the error, label — prediction, then we multiply it by our self.learning_rate, and by our inputs vector, we then add that result to the weight vector (with the bias weight removed), and store it back into self.weights[1:].

Remember that self.weights[0] is our bias weight, so we can’t add self.weights and inputs vectors directly, as they’re of different dimensions.

There were several options to take care of this, but I think the most explicit was is to mimic what we have done early, by only considering the vector created by “removing” the bias weight at self.weights[0].

We can’t just ignore the bias, so we deal with it next:

self.weights[0] += self.learning_rate * (label - prediction)

We update the bias in the same way as the other weights, except, we don’t multiply it by the inputs vector.


In just 19 lines of explicit code, we were able to implement a perceptron in Python!


Let’s put it to work and finally wrap up implementing AND

import numpy as np
from perceptron import Perceptron

First, we import numpy so that we can create our vectors, then we import our new perceptron.

training_inputs = []
training_inputs.append(np.array([1, 1]))
training_inputs.append(np.array([1, 0]))
training_inputs.append(np.array([0, 1]))
training_inputs.append(np.array([0, 0]))

Next, we generate our training data. These inputs are the A and B columns from the AND truth table stored in an array of numpy arrays, called training_inputs.

labels = np.array([1, 0, 0, 0])

Here, we store the expected outputs, or labels in the label variable, making sure that each label index lines up with the index of the input it’s meant to represent.

perceptron = Perceptron(2)

We instantiate a new perceptron, only passing in the argument 2 therefore allowing for the default threshold=100 and learning_rate=0.01. Note that such a large threshold and such a small learning rate probably isn’t needed, so feel free to play around to find what’s most efficient! What happens if learning_rate=10? What if threshold=2?

perceptron.train(training_inputs, labels)

Now we train the perceptron by calling perceptron.train and passing in our training_inputs and labels.

This should finish rather quickly. Even though there are 100 epochs, our training data is so small and numpy is very efficient!

inputs = np.array([1, 1])
#=> 1
inputs = np.array([0, 1])
#=> 0

That’s it! Now, we can start to use the perceptron as a logic AND!

It may seem a bit bizarre that we’ve trained our perceptron with four inputs and we only really need it to classify those four inputs. Is that all perceptrons are good for? No! Remember, perceptrons can be used to classify almost any number of binarily classifiable things, (though there are some major caveats, see below).

What would happen if you removed one of the training inputs? Removed two of them? Are you able to remove the [1, 1] training input? What other logic operators can you train the perceptron on? What happens if we add more inputs?

Test! Experiment! Play!


This concludes our AND implementation, so now is a good time to sum up everything we’ve learned.

Perceptrons were first published in 1957 by Frank Rosenblatt at the Cornell Aeronautical Laboratory. He proposed a rule that could automatically determine the weights for each of the artificial neuron’s input features, (one input vector example), by using supervised learning to determine a decision boundary, (see below), between two binary classes.

The perceptron classifies inputs by finding the dot product of an input feature vector and weight vector and passing that number into a step function, which will return 1 for numbers greater than 0, or 0 otherwise.

f(x) = 1 if w · x + b > 0
0 otherwise

In order to the determine the weights, the Perceptron Learning Rule:

  • Predicts an output based on the current weights and inputs
  • Compares it to the expected output, or label
  • Update its weights, if the prediction != the label
  • Iterate until the epoch threshold has been reached

To update the weights during each iteration, it:

  • Finds the error by subtracting the prediction from the label
  • Multiplies the error and the learning rate
  • Multiplies the result to the inputs
  • Adds the resulting vector to the weight vector
w <- w + α(y - f(x))x

Appendix and Further Exploration

There are a few concepts we haven’t touch on yet. Notably, the limitations of the perceptron.

The Perceptron Convergence Theorem is, from what I understand, a lot of math that proves that a perceptron, given enough time, will always be able to find a decision boundary between two linearly separable classes.

It is important to note that the convergence of the perceptron is only guaranteed if the two classes are linearly separable and the learning rate is sufficiently small. If the two classes can’t be separated by a linear decision boundary, we can set a maximum number of passes over the training dataset (epochs) and/or a threshold for the number of tolerated misclassifications — the perceptron would never stop updating the weights otherwise.

- Sebastian Raschka, Vahid Mirjalili, Python Machine Learning — 2nd Ed.

Linearly separable means that there exists a linear hyperplane, (line), that can separate input vectors into their correct classes; one class’ vectors falling on one side of the hyperplane, and the other class’, on the other.

In terms of our binary operator AND, linear separability means that:


We plot each of our A and B inputs, from our truth table, as points,(A, B), on a 2-D plane…


We could draw a single line on that plane in such a way so that all of the (A, B)points on one side of the line are the A and B inputs that give us 1, and all the points on the other side, give us 0.

Linear Separability of AND

Here is ourAND and its truth table:

( A , B ) | AND 
--- --- |-----
( 0 , 0 ) | 0
( 0 , 1 ) | 0
( 1 , 0 ) | 0
( 1 , 1 ) | 1

We see that all of the pairs of inputs that return 0 are red and on one side of the line, and the input that gives us 1, is on the other side of the line.

This is a graphical representation of what our perceptron does! Our perceptron defines a line to draw in the sand, so to speak, that classifies our inputs binarily, depending on which side of the line they fall on! This line is call the decision boundary, and when employing a single perceptron, we only get one.

In other words, if there is no single line that can separate our training data into two classes, our perceptron will never find weights that can satisfy all of our data. It doesn’t take long to hit this limitation. Take a look the XOR Perceptron Problem.

Perceptrons have gotten us pretty far, but we’re not done with them yet. Now that we’ve gotten our hands on some code, we can begin digging deeper into using Python as a tool to further explore machine learning and neural networks.

Next, we’ll refactor our perceptron code, take a look at how we can use our model to classify more complex data, and look at how to use tools like matplotlib to visualize decision boundaries.


Perceptron Convergence Theorem

Python Machine Learning — 2nd Ed. by Sebastian Raschka & Vahid Mirjalili

Single-Layer Neural Networks and Gradient Descent

10.2: Neural Networks: Perceptron Part 1 — The Nature of Code

Appendix F — Introduction to NumPy from Introduction to Artificial Neural Networks and Deep Learning A Practical Guide with Applications in Python by Sebastian Raschka

I'm a piece of work. Actor, Dancer, Climber, and Coffee Snob. | Software Consultant @ 8th Light