So far, we’ve been doing a lot of learning, with not a lot of “machine.” Today, that changes, because we’re going to implement a perceptron in Python.
What makes this Python perceptron unique is that we’re going to be as explicit as possible with our variable names and formulas, and we’ll go through it all, line by line, before we get clever, import a bunch of libraries, and refactor.
Before we begin, we’ll start with a little recap and summary.
Recap & Summary
In Learning Machine Learning Journal #1, we looked at what a perceptron was, and we discussed the formula that describes the process it uses to binarily classify inputs. We learned that the perceptron takes in an input vector, x, multiplies it by a corresponding weight vector, w, and then adds the result to a bias, b. It then uses an activation function (the step function, in this case) to determine whether the resulting summation is greater than 0, in order to classify it as 1 or 0.
In Learning Machine Learning Journal #2, we looked at how we could use a perceptron to mimic the behavior of an AND logic gate. We walked through, and reasoned about, how to determine the values of the weight vector, w, and the bias, b, in order for our perceptron to accurately classify the inputs from the AND truth table.
In Learning Machine Learning Journal #3, we looked at the Perceptron Learning Rule. We learned that by using labeled data, we could have our perceptron predict an output, determine if it was correct or not, and then adjust the weights and bias accordingly. In the end, we ended up with two formulas to describe the perceptron:
f(x) = 1 if w · x + b > 0
       0 otherwise
w <- w + (y - f(x)) * x
In summary, we now have a classification algorithm in our arsenal.
Classification is a subcategory of supervised learning where the goal is to predict the categorical class labels of new instances, based on past observations.
- Sebastian Raschka, Vahid Mirjalili, Python Machine Learning — 2nd Ed.
Supervised learning is a subcategory of machine learning where the training data is labeled, meaning that for each of the examples used to train the perceptron, the output is known in advance.
When considering what kinds of problems a perceptron is useful for, we can see that it’s good for tasks where we want to predict whether an input belongs to one of two categories, based on its features and the features of inputs that are known to belong to each of those categories.
These tasks are called binary classification tasks. Real-world examples include email spam filtering, search result indexing, medical evaluations, financial predictions, and, well, almost anything that is “binarily classifiable.”
Today, we’ll be continuing with the AND logic gate and its truth table:
A B | AND
--- --- |-----
1 1 | 1
1 0 | 0
0 1 | 0
0 0 | 0
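The full perceptron.py listing isn’t reproduced in this text, so here is a sketch reassembled from the lines walked through below, with comments added. Two details are my assumptions rather than a copy of the original file: the class header is written as class Perceptron(object), and the bias weight is read and updated at self.weights[0], which is where the walkthrough says the bias weight lives.

```python
import numpy as np


class Perceptron(object):

    def __init__(self, no_of_inputs, threshold=100, learning_rate=0.01):
        self.threshold = threshold          # number of epochs to train for
        self.learning_rate = learning_rate  # magnitude of each weight update
        self.weights = np.zeros(no_of_inputs + 1)  # +1 slot for the bias weight (index 0)

    def predict(self, inputs):
        # w · x + b, with the bias weight stored at self.weights[0]
        summation = np.dot(inputs, self.weights[1:]) + self.weights[0]
        # Step function: 1 only if the summation is strictly positive
        if summation > 0:
            activation = 1
        else:
            activation = 0
        return activation

    def train(self, training_inputs, labels):
        for _ in range(self.threshold):
            for inputs, label in zip(training_inputs, labels):
                prediction = self.predict(inputs)
                # Perceptron Learning Rule for the input weights
                self.weights[1:] += self.learning_rate * (label - prediction) * inputs
                # Same rule for the bias weight, whose "input" is always 1
                self.weights[0] += self.learning_rate * (label - prediction)
```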
It’s tempting to say “that’s it,” but it took me quite a bit of work to write these 19 lines (not counting blank lines). Still, considering what these 19 lines can do, it’s kind of surprising that this is all it takes. Let’s walk through it.
import numpy as np
If you’re like me and not familiar with the numpy module, the only important thing to know here is that we’re using it to evaluate our dot product, w · x, during our summation. numpy lets us create vectors, and gives us both linear algebra functions and Python list-like methods to use with them. We access its functions by calling them on np.
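To make that concrete, here’s a tiny tour of the numpy features the perceptron relies on: np.array, np.dot, np.zeros, and list-style slicing. The vectors are arbitrary example values, not anything from the perceptron itself.

```python
import numpy as np

# np.array turns a Python list into a numpy vector
x = np.array([1, 0])
w = np.array([0.5, 0.5])

# np.dot computes the dot product w · x = 1 * 0.5 + 0 * 0.5
print(np.dot(x, w))  # 0.5

# np.zeros(n) creates a vector of n zeros (as floats)
print(np.zeros(3))  # [0. 0. 0.]

# Slicing works like a Python list: [1:] drops the first element
v = np.array([10, 20, 30])
print(v[1:])  # [20 30]
```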
Here, we’re creating a new class, Perceptron. This will, among other things, allow us to maintain state, so that we can use our perceptron after it has learned and assigned values to its weights.
def __init__(self, no_of_inputs, threshold=100, learning_rate=0.01):
In our constructor, we accept a few parameters that represent concepts we looked at near the end of Learning Machine Learning Journal #3:
- no_of_inputs is used to determine how many weights we need to learn.
- threshold is the number of epochs we’ll allow our learning algorithm to iterate through before ending, and it defaults to 100.
- learning_rate is used to determine the magnitude of change for our weights during each step through our training data, and it defaults to 0.01.
The threshold and learning_rate variables can be played with to alter the efficiency of our perceptron learning rule. Because of that, I’ve decided to make them optional parameters, so that they can be experimented with at runtime.
self.threshold = threshold
self.learning_rate = learning_rate
These two lines set the threshold and learning_rate arguments to instance variables.
self.weights = np.zeros(no_of_inputs + 1)
Here, we initialize our weight vector. np.zeros(n) will create a vector of n 0’s. Here, we use no_of_inputs (which, again, is the number of inputs in our input vector, x) plus 1 as n.
Remember how, in Learning Machine Learning Journal #3, we moved our bias into the weight vector so that we didn’t have to deal with it independently of our other weights? That bias is the +1 in our weight vector, and it’s referred to as the bias weight.
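Here’s a quick sketch of the resulting layout, assuming two inputs as in the AND example: the vector has three slots, and the first one is the bias weight.

```python
import numpy as np

no_of_inputs = 2  # two inputs, A and B, as in the AND example

# One weight per input, plus one extra slot for the bias weight
weights = np.zeros(no_of_inputs + 1)
print(weights)  # [0. 0. 0.]

bias_weight = weights[0]     # first slot: the bias weight
input_weights = weights[1:]  # remaining slots line up with x1, x2
print(bias_weight, input_weights)  # 0.0 [0. 0.]
```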
def predict(self, inputs):
Now, we define our predict method. This is the method we first looked at, way back in Learning Machine Learning Journal #1. This method will house the f(x) = 1 if w · x + b > 0 : 0 otherwise algorithm.
The predict method takes one argument, inputs, which it expects to be a numpy array/vector with a dimension equal to the no_of_inputs parameter that the perceptron was initialized with.
summation = np.dot(inputs, self.weights[1:]) + self.weights[0]
This is where the numpy dot product function comes in, and it works exactly how you might expect: np.dot(a, b) == a · b. It’s important to remember that dot products only work if both vectors are of equal dimension; [1, 2, 3] · [1, 2, 3, 4] is invalid. Things get a bit tricky here because we’ve added an extra dimension to our self.weights vector to act as the bias.
There are two options here: either we can add a 1 to the beginning of our inputs vector, like we discussed in Learning Machine Learning Journal #3, or we can take the dot product of the inputs and the self.weights vector with the first value “removed”, and then add the first value of the self.weights vector to the dot product. Either way works; I just happened to think that this way was cleaner.
We then store the result in the variable summation.
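The two options really are interchangeable. This sketch uses hypothetical weights (bias first) and checks that prepending a 1 to the inputs gives the same summation as the slice-and-add approach; it also shows numpy rejecting the mismatched [1, 2, 3] · [1, 2, 3, 4] product with a ValueError.

```python
import numpy as np

# Hypothetical weights laid out as [bias, w1, w2]
weights = np.array([-0.02, 0.01, 0.02])
inputs = np.array([1, 1])

# Option 1: prepend a constant 1 so the bias is multiplied like any other weight
with_bias_input = np.insert(inputs, 0, 1)  # array([1, 1, 1])
option_1 = np.dot(with_bias_input, weights)

# Option 2 (the one predict uses): leave the bias out of the dot product, then add it
option_2 = np.dot(inputs, weights[1:]) + weights[0]

print(np.isclose(option_1, option_2))  # True

# Mismatched dimensions: numpy raises a ValueError
try:
    np.dot([1, 2, 3], [1, 2, 3, 4])
except ValueError as err:
    print("ValueError:", err)
```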
if summation > 0:
    activation = 1
else:
    activation = 0
return activation
This is our step function. It kind of reads like pseudocode: if the summation from above is greater than 0, we store 1 in the variable activation; otherwise, activation = 0. Then we return that value.
We don’t need the temporary variable activation, but for now, the goal is to be explicit.
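Here’s the step function pulled out on its own, next to the shorter form you could write without the temporary variable. The names step and step_compact are illustrative, not from the perceptron code. Note the edge case: a summation of exactly 0 produces 0, because the comparison is strictly greater than.

```python
def step(summation):
    # Explicit form, mirroring predict: 1 only when the summation is strictly positive
    if summation > 0:
        activation = 1
    else:
        activation = 0
    return activation


def step_compact(summation):
    # Same behavior without the temporary activation variable
    return 1 if summation > 0 else 0


for s in (-0.5, 0, 0.01):
    print(s, step(s), step_compact(s))
# -0.5 0 0
# 0 0 0
# 0.01 1 1
```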
def train(self, training_inputs, labels):
Next, we define the train method, which takes two arguments, training_inputs and labels:
- training_inputs is expected to be a list made up of numpy vectors to be used as inputs by the predict method.
- labels is expected to be a numpy array of expected output values for each of the corresponding inputs in the training_inputs list.
In essence, the input vector at training_inputs[n] has the expected output at labels[n], therefore len(training_inputs) == len(labels).
for _ in range(self.threshold):
This creates a loop wherein the following code block will be run a number of times equal to the threshold argument we passed into the Perceptron constructor. If one hasn’t been passed in, it defaults to 100 epochs. Because we don’t need the iterator variable, convention has us name it _.
for inputs, label in zip(training_inputs, labels):
There are three important steps happening in this line:
- We zip training_inputs and labels together to create a new iterable object
- We loop through the new object
- While we iterate through, we store each element of the training_inputs list in the inputs variable, and each element of labels in the variable label
In the code block after this line, when we reference label, we get the expected output of the input vector stored in the inputs variable, and we do this once for every inputs/label pair.
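To see what zip hands the loop, here’s the AND training data used later in this article, paired up the same way. One detail worth knowing: zip stops at the shorter of its arguments, so if training_inputs and labels had different lengths, the extra items would be silently skipped, which is one more reason len(training_inputs) == len(labels) matters.

```python
import numpy as np

training_inputs = [np.array([1, 1]), np.array([1, 0]), np.array([0, 1]), np.array([0, 0])]
labels = np.array([1, 0, 0, 0])

# Each iteration receives training_inputs[n] and labels[n] together
for inputs, label in zip(training_inputs, labels):
    print(inputs, label)
# [1 1] 1
# [1 0] 0
# [0 1] 0
# [0 0] 0

# zip stops at the shorter argument, so a missing label silently drops an input
print(len(list(zip(training_inputs, labels[:3]))))  # 3
```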
prediction = self.predict(inputs)
Here, we pass the inputs vector into our previously defined predict method, and we store the result in the prediction variable.
self.weights[1:] += self.learning_rate * (label - prediction) * inputs
This is almost all of the learning rule implementation:
w <- w + α(y - f(x))x
We find the error, label - prediction, then we multiply it by our self.learning_rate and by our inputs vector. We then add that result to the weight vector (with the bias weight removed), and store it back into self.weights[1:].
Remember that self.weights[0] is our bias weight, so we can’t add the self.weights and inputs vectors directly, as they’re of different dimensions. There were several options to take care of this, but I think the most explicit way is to mimic what we did earlier, by only considering the vector created by “removing” the bias weight at self.weights[0].
We can’t just ignore the bias, so we deal with it next:
self.weights[0] += self.learning_rate * (label - prediction)
We update the bias in the same way as the other weights, except that we don’t multiply it by the inputs vector.
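To see both updates in action, here’s a single step of the rule worked through by hand, starting from the all-zero weights the constructor creates, with the AND input [1, 1] and its label 1. It’s a standalone sketch that mirrors the train method’s two update lines rather than calling the class.

```python
import numpy as np

learning_rate = 0.01
weights = np.zeros(3)  # [bias, w1, w2], as a fresh Perceptron(2) would start
inputs = np.array([1, 1])
label = 1

# Predict: 0 * 1 + 0 * 1 + 0 = 0, which is not > 0, so the prediction is 0
summation = np.dot(inputs, weights[1:]) + weights[0]
prediction = 1 if summation > 0 else 0

error = label - prediction  # 1 - 0 = 1

# Input weights: learning rate * error * inputs
weights[1:] += learning_rate * error * inputs
# Bias weight: learning rate * error (no inputs to multiply by)
weights[0] += learning_rate * error

print(weights)  # [0.01 0.01 0.01]
```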
In just 19 lines of explicit code, we were able to implement a perceptron in Python!
Let’s put it to work and finally wrap up implementing our AND perceptron.
import numpy as np
from perceptron import Perceptron
First, we import
numpy so that we can create our vectors, then we import our new perceptron.
training_inputs = [np.array([1, 1]), np.array([1, 0]), np.array([0, 1]), np.array([0, 0])]
Next, we generate our training data. These inputs are the A and B columns from the AND truth table, stored in a list of numpy arrays called training_inputs.
labels = np.array([1, 0, 0, 0])
Here, we store the expected outputs, or labels, in the labels variable, making sure that each label’s index lines up with the index of the input it’s meant to represent.
perceptron = Perceptron(2)
We instantiate a new perceptron, only passing in the argument 2, therefore allowing for the defaults of threshold=100 and learning_rate=0.01. Note that such a large threshold and such a small learning rate probably aren’t needed, so feel free to play around to find what’s most efficient! What happens if learning_rate=10? What if you change the threshold?
perceptron.train(training_inputs, labels)
Now we train the perceptron by calling perceptron.train and passing in our training_inputs and labels.
This should finish rather quickly: even though there are 100 epochs, our training data is tiny, and numpy is very efficient!
inputs = np.array([1, 1])
perceptron.predict(inputs) #=> 1
inputs = np.array([0, 1])
perceptron.predict(inputs) #=> 0
That’s it! Now, we can start to use the perceptron as a logical AND!
It may seem a bit bizarre that we’ve trained our perceptron with four inputs and we only really need it to classify those four inputs. Is that all perceptrons are good for? No! Remember, perceptrons can be used to classify almost any number of binarily classifiable things (though there are some major caveats; see below).
What would happen if you removed one of the training inputs? Removed two of them? Are you able to remove the [1, 1] training input? What other logic operators can you train the perceptron on? What happens if we add more inputs?
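As one answer to the “other logic operators” question, here’s a minimal sketch that trains on OR instead of AND. To keep it in one self-contained block, it restates the same learning rule as two plain functions; the names train_perceptron and predict are hypothetical, and the bias weight again sits at index 0. Only the labels change: OR outputs 1 whenever at least one input is 1.

```python
import numpy as np


def train_perceptron(training_inputs, labels, threshold=100, learning_rate=0.01):
    # Hypothetical helper: same learning rule as Perceptron.train, bias weight at index 0
    weights = np.zeros(len(training_inputs[0]) + 1)
    for _ in range(threshold):
        for inputs, label in zip(training_inputs, labels):
            prediction = predict(weights, inputs)
            weights[1:] += learning_rate * (label - prediction) * inputs
            weights[0] += learning_rate * (label - prediction)
    return weights


def predict(weights, inputs):
    # Hypothetical helper: step function applied to w · x + b
    return 1 if np.dot(inputs, weights[1:]) + weights[0] > 0 else 0


training_inputs = [np.array([1, 1]), np.array([1, 0]), np.array([0, 1]), np.array([0, 0])]
or_labels = np.array([1, 1, 1, 0])  # OR: 1 if at least one input is 1

weights = train_perceptron(training_inputs, or_labels)
print([predict(weights, x) for x in training_inputs])  # [1, 1, 1, 0]
```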
Test! Experiment! Play!
This concludes our AND implementation, so now is a good time to sum up everything we’ve learned.
Perceptrons were first published in 1957 by Frank Rosenblatt at the Cornell Aeronautical Laboratory. He proposed a rule that could automatically determine the weights for each of the artificial neuron’s input features (the elements of one input vector example) by using supervised learning to determine a decision boundary (see below) between two binary classes.
The perceptron classifies inputs by finding the dot product of an input feature vector and a weight vector, adding the bias, and passing that number into a step function, which returns 1 for numbers greater than 0, and 0 otherwise:
f(x) = 1 if w · x + b > 0
       0 otherwise
In order to determine the weights, the Perceptron Learning Rule:
- Predicts an output based on the current weights and inputs
- Compares it to the expected output, or label
- Updates its weights, if the prediction != the label
- Iterates until the epoch threshold has been reached
To update the weights during each iteration, it:
- Finds the error by subtracting the prediction from the label
- Multiplies the error by the learning rate
- Multiplies the result by the inputs
- Adds the resulting vector to the weight vector
w <- w + α(y - f(x))x
Appendix and Further Exploration
There are a few concepts we haven’t touched on yet, notably the limitations of the perceptron.
The Perceptron Convergence Theorem is, from what I understand, a lot of math that proves that a perceptron, given enough time, will always be able to find a decision boundary between two linearly separable classes.
It is important to note that the convergence of the perceptron is only guaranteed if the two classes are linearly separable and the learning rate is sufficiently small. If the two classes can’t be separated by a linear decision boundary, we can set a maximum number of passes over the training dataset (epochs) and/or a threshold for the number of tolerated misclassifications — the perceptron would never stop updating the weights otherwise.
- Sebastian Raschka, Vahid Mirjalili, Python Machine Learning — 2nd Ed.
Linearly separable means that there exists a hyperplane (in two dimensions, a line) that can separate input vectors into their correct classes, with one class’s vectors falling on one side of the hyperplane and the other class’s on the other.
In terms of our binary operator AND, linear separability means that if we plot each of our A and B inputs from the truth table as points, (A, B), on a 2-D plane, we can draw a single line on that plane such that all of the (A, B) points on one side of the line are the A and B inputs that give us 1, and all the points on the other side give us 0.
Here is our AND truth table, with each pair of inputs written as a point, (A, B):
( A , B ) | AND
--- --- |-----
( 0 , 0 ) | 0
( 0 , 1 ) | 0
( 1 , 0 ) | 0
( 1 , 1 ) | 1
When we plot these points, all of the pairs of inputs that return 0 fall on one side of a single line, and the input that gives us 1 falls on the other side.
This is a graphical representation of what our perceptron does! Our perceptron defines a line to draw in the sand, so to speak, that classifies our inputs binarily, depending on which side of the line they fall on! This line is called the decision boundary, and when employing a single perceptron, we only get one.
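Here’s a small sketch of that idea using hypothetical, hand-picked weights (w = [1, 1], b = -1.5; these are not values from the earlier journals or from training). It reports which side of the line A + B = 1.5 each AND input falls on, using the same > 0 test as predict, and then solves w1 * A + w2 * B + b = 0 for B to get points on the boundary itself.

```python
import numpy as np

# Hypothetical hand-picked weights for AND: the decision boundary is A + B - 1.5 = 0
w = np.array([1.0, 1.0])
b = -1.5

for point in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    summation = np.dot(w, point) + b
    print(point, 1 if summation > 0 else 0)
# (0, 0) 0
# (0, 1) 0
# (1, 0) 0
# (1, 1) 1

# Points on the boundary: B = -(w1 * A + b) / w2
for A in (0.0, 0.5, 1.0):
    print(A, -(w[0] * A + b) / w[1])  # (0, 1.5), (0.5, 1.0), (1, 0.5)
```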
In other words, if there is no single line that can separate our training data into two classes, our perceptron will never find weights that can satisfy all of our data. It doesn’t take long to hit this limitation: take a look at the XOR Perceptron Problem.
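One way to watch this limitation happen is to count misclassifications in each epoch, which is also the kind of tolerance the Raschka quote above mentions. This sketch restates the same learning rule in a hypothetical helper, errors_per_epoch, and runs it on AND and XOR. AND eventually has an error-free epoch. XOR never can: an error-free epoch would mean the weights never changed during it, so those weights would classify all four points correctly, which no single line can do for XOR.

```python
import numpy as np


def errors_per_epoch(training_inputs, labels, threshold=100, learning_rate=0.01):
    # Hypothetical helper: the same learning rule, plus a misclassification count per epoch
    weights = np.zeros(len(training_inputs[0]) + 1)  # bias weight at index 0
    history = []
    for _ in range(threshold):
        errors = 0
        for inputs, label in zip(training_inputs, labels):
            prediction = 1 if np.dot(inputs, weights[1:]) + weights[0] > 0 else 0
            errors += int(label != prediction)
            weights[1:] += learning_rate * (label - prediction) * inputs
            weights[0] += learning_rate * (label - prediction)
        history.append(errors)
    return history


training_inputs = [np.array([1, 1]), np.array([1, 0]), np.array([0, 1]), np.array([0, 0])]
and_history = errors_per_epoch(training_inputs, np.array([1, 0, 0, 0]))
xor_history = errors_per_epoch(training_inputs, np.array([0, 1, 1, 0]))

print(min(and_history))  # 0: AND is linearly separable
print(min(xor_history))  # at least 1: every XOR epoch has a mistake
```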
Perceptrons have gotten us pretty far, but we’re not done with them yet. Now that we’ve gotten our hands on some code, we can begin digging deeper into using Python as a tool to further explore machine learning and neural networks.
Next, we’ll refactor our perceptron code, take a look at how we can use our model to classify more complex data, and look at how to use tools like matplotlib to visualize decision boundaries.
Python Machine Learning — 2nd Ed. by Sebastian Raschka & Vahid Mirjalili
Appendix F — Introduction to NumPy from Introduction to Artificial Neural Networks and Deep Learning A Practical Guide with Applications in Python by Sebastian Raschka