# 19-line Line-by-line Python Perceptron

## Learning Machine Learning Journal #4

**Skip the noise; get the code: here, “regression” type tests here.**

So far, we’ve been doing a lot of learning, with not a lot of “machine.” Today, that changes, because we’re going to implement a perceptron in Python.

What makes this Python perceptron unique is that we’re going to be as explicit as possible with our variable names and formulas, and we’ll go through it all, line by line, before we get clever, import a bunch of libraries, and refactor.

Before we begin, we’ll start with a little recap and summary.

# Recap & Summary

**In Learning Machine Learning Journal #1**, we looked at what a perceptron was, and we discussed the formula that describes the process it uses to binarily classify inputs. We learned that the perceptron takes in an input vector, `x`, multiplies it by a corresponding weight vector, `w`, and then adds the result to a bias, `b`. It then uses an activation function (the step function, in this case) to determine whether the resulting summation is greater than `0`, in order to classify it as `1` or `0`.

**In Learning Machine Learning Journal #2**, we looked at how we could use a perceptron to mimic the behavior of an `AND` logic gate. We walked through, and reasoned about, how to determine the values of the weight vector, `w`, and the bias, `b`, in order for our perceptron to accurately classify the inputs from the `AND` truth table.

**In Learning Machine Learning Journal #3**, we looked at the Perceptron Learning Rule. We learned that by using labeled data, we could have our perceptron predict an output, determine if it was correct or not, and then adjust the weights and bias accordingly. In the end, we ended up with two formulas to describe the perceptron:

`f(x) = 1 if w · x + b > 0, 0 otherwise`

`w <- w + (y - f(x)) * x`

**In Summary**, we now have in our arsenal a **classification algorithm**.

Classification is a subcategory of supervised learning where the goal is to predict the categorical class labels of new instances, based on past observations.

- Sebastian Raschka, Vahid Mirjalili, Python Machine Learning — 2nd Ed.

**Supervised learning** is a subcategory of Machine Learning where the learning data is **labeled**, meaning that for each of the examples used to train the perceptron, the output is known in advance.

When considering what kinds of problems a perceptron is useful for, we can determine that it’s good for tasks where we want to predict whether an input belongs in one of two categories, based on its features and the features of inputs that are known to belong to one of those two categories.

These tasks are called **binary classification tasks**. Real-world examples include email spam filtering, search result indexing, medical evaluations, financial predictions, and, well, almost anything that is “binarily classifiable.”

Today, we’ll be continuing with `AND`:

| A | B | AND |
| --- | --- | --- |
| 1 | 1 | 1 |
| 1 | 0 | 0 |
| 0 | 1 | 0 |
| 0 | 0 | 0 |

# The Code:

I would be remiss to say, “that’s it,” because it took me quite a bit of work to write these 19 lines (minus newlines), but when considering what these 19 lines can do, it’s kind of surprising that this is all it takes. Let’s walk through it.
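For reference, here is how the snippets from the walkthrough below fit together into one class. It is a reassembly of the lines discussed in this post, so treat it as a sketch rather than the exact file linked at the top; the blank lines and indentation are an assumption.

```python
import numpy as np

class Perceptron(object):

    def __init__(self, no_of_inputs, threshold=100, learning_rate=0.01):
        self.threshold = threshold
        self.learning_rate = learning_rate
        self.weights = np.zeros(no_of_inputs + 1)

    def predict(self, inputs):
        # Weighted sum of the inputs plus the bias weight at index 0
        summation = np.dot(inputs, self.weights[1:]) + self.weights[0]
        if summation > 0:
            activation = 1
        else:
            activation = 0
        return activation

    def train(self, training_inputs, labels):
        for _ in range(self.threshold):
            for inputs, label in zip(training_inputs, labels):
                prediction = self.predict(inputs)
                # Perceptron learning rule: nudge the weights by the error
                self.weights[1:] += self.learning_rate * (label - prediction) * inputs
                self.weights[0] += self.learning_rate * (label - prediction)
```

Leaving out the comments and blank lines, that comes to the 19 lines we’re about to go through.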

# Line-by-line

`import numpy as np`

If you’re like me, not familiar with the `numpy` module, the only important thing to know here is that we’re using it to evaluate our dot product `w · x` during our summation. `numpy` lets us create vectors, and gives us both linear algebra functions and Python `list`-like methods to use with them. We access its functions by calling them on `np`.
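If the dot product is new to you too, here is a small sketch of what `np.dot` does, compared against the same sum written out by hand. The vectors `w` and `x` are made-up values for illustration only.

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])  # hypothetical weight vector
x = np.array([1.0, 2.0, 3.0])   # hypothetical input vector

# np.dot multiplies the vectors element by element and sums the results
dot = np.dot(w, x)

# The same thing, spelled out: 0.5*1.0 + (-1.0)*2.0 + 2.0*3.0
manual = sum(wi * xi for wi, xi in zip(w, x))

print(dot, manual)  # both are 4.5
```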

`class Perceptron(object):`

Here, we’re creating a new class, `Perceptron`. This will, among other things, allow us to maintain state in order to use our perceptron after it has learned and assigned values to its `weights`.

`def __init__(self, no_of_inputs, threshold=100, learning_rate=0.01):`

In our constructor, we accept a few parameters that represent concepts we looked at at the end of Learning Machine Learning Journal #3.

The `no_of_inputs` is used to determine how many `weights` we need to learn.

The `threshold` is the number of **epochs** we’ll allow our learning algorithm to iterate through before ending, and it defaults to `100`.

The `learning_rate` is used to determine the magnitude of change for our weights during each step through our training data, and it defaults to `0.01`.

The `threshold` and `learning_rate` variables can be played with to alter the efficiency of our perceptron learning rule. Because of that, I’ve decided to make them optional parameters, so that they can be experimented with at runtime.

```python
self.threshold = threshold
self.learning_rate = learning_rate
```

These two lines store the `threshold` and `learning_rate` arguments as instance variables.

`self.weights = np.zeros(no_of_inputs + 1)`

Here, we initialize our weight vector. `np.zeros(n)` will create a vector of `n` `0`’s. Here, we use the `no_of_inputs` (which, again, is the number of inputs in our input vector, `x`), plus `1`.

Remember in Learning Machine Learning Journal #3, we moved our bias into the weight vector, so that we didn’t have to deal with it independently of our other weights? This bias is the `+1` to our weight vector, and is referred to as the **bias weight**.

`def predict(self, inputs):`

Now, we define our `predict` method. This is the method we first looked at, way back in Learning Machine Learning Journal #1. This method will house the `f(x) = 1 if w · x + b > 0 : 0 otherwise` algorithm.

The `predict` method takes one argument, `inputs`, which it expects to be a `numpy` array/vector of a dimension equal to the `no_of_inputs` parameter that the perceptron was initialized with on line `5`.

`summation = np.dot(inputs, self.weights[1:]) + self.weights[0]`

This is where the `numpy` dot product function comes in, and it works exactly how you might expect: `np.dot(a, b) == a · b`. It’s important to remember that dot products only work if both vectors are of equal dimension. `[1, 2, 3] · [1, 2, 3, 4]` is invalid. Things get a bit tricky here because we’ve added an extra dimension to our `self.weights` vector to act as the bias.

There are two options here. Either we can add a `1` to the beginning of our `inputs` vector, like we discussed in Learning Machine Learning Journal #3, or we can take the dot product of the `inputs` and the `self.weights` vector with the first value “removed”, *and then* add the first value of the `self.weights` vector to the dot product. Either way works; I just happened to think that this way was cleaner.

We then store the result in the variable `summation`.
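To convince ourselves that the two options really are the same thing, here is a small sketch that computes both. The weight values are arbitrary numbers picked for the example, and `np.concatenate` is just one way to prepend the `1`.

```python
import numpy as np

weights = np.array([-0.02, 0.01, 0.02])  # [bias weight, w1, w2], example values
inputs = np.array([1, 1])

# Option 1: prepend a constant 1 to the inputs so it lines up with the bias weight
inputs_with_one = np.concatenate(([1], inputs))
option_1 = np.dot(inputs_with_one, weights)

# Option 2 (the one used in this post): slice off the bias weight, add it separately
option_2 = np.dot(inputs, weights[1:]) + weights[0]

print(np.isclose(option_1, option_2))  # True
```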

```python
if summation > 0:
    activation = 1
else:
    activation = 0
return activation
```

This is our step function. It kind of reads like pseudocode: if the summation from above is greater than `0`, we store `1` in the variable `activation`, otherwise, `activation = 0`, then we return that value.

We don’t *need* the temporary variable `activation`, but for now, the goal is to be explicit.

`def train(self, training_inputs, labels):`

Next, we define the `train` method, which takes two arguments: `training_inputs` and `labels`.

`training_inputs` is expected to be a list made up of `numpy` vectors to be used as inputs by the `predict` method.

`labels` is expected to be a `numpy` array of expected output values for each of the corresponding inputs in the `training_inputs` list.

In essence, the input vector at `training_inputs[n]` has the expected output at `labels[n]`, therefore `len(training_inputs) == len(labels)`.

`for _ in range(self.threshold):`

This creates a loop wherein the following code block will be run a number of times equal to the `threshold` argument we passed into the `Perceptron` constructor. If one hasn’t been passed in, it defaults to `100` epochs. Because we don’t care to use an iterator variable, convention has us set it to `_`.

`for inputs, label in zip(training_inputs, labels):`

There are three important steps happening in this line:

- We `zip` `training_inputs` and `labels` together to create a new iterable object
- We loop through the new object
- While we iterate through, we store each element of the `training_inputs` list in the `inputs` variable, and each element of `labels` in the `label` variable

In the code block after this line, when we reference `label`, we get the *expected output* of the input vector stored in the `inputs` variable, and we do this once for every `inputs`/`label` pair.
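If `zip` is unfamiliar, this small sketch shows the pairs it hands us for the `AND` data. The `pairs` list is only there so we can look at the result; the perceptron itself never builds it.

```python
import numpy as np

training_inputs = [np.array([1, 1]), np.array([1, 0]),
                   np.array([0, 1]), np.array([0, 0])]
labels = np.array([1, 0, 0, 0])

# Collect each (inputs, label) pair in the order the training loop would see it
pairs = [(inputs.tolist(), int(label))
         for inputs, label in zip(training_inputs, labels)]

print(pairs)
# [([1, 1], 1), ([1, 0], 0), ([0, 1], 0), ([0, 0], 0)]
```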

`prediction = self.predict(inputs)`

Here, we pass the `inputs` vector into our previously defined `predict` method, and we store the result in the `prediction` variable.

`self.weights[1:] += self.learning_rate * (label - prediction) * inputs`

This is almost all of the learning rule implementation:

`w <- w + α(y - f(x))x`

We find the error, `label - prediction`, then we multiply it by our `self.learning_rate` and by our `inputs` vector. We then add that result to the `weights` vector (with the bias weight removed), and store it back into `self.weights[1:]`.

Remember that `self.weights[0]` is our bias weight, so we can’t add the `self.weights` and `inputs` vectors directly, as they’re of different dimensions.

There were several options to take care of this, but I think the most explicit way is to mimic what we did earlier, by only considering the vector created by “removing” the bias weight at `self.weights[0]`.

We can’t just ignore the bias, so we deal with it next:

`self.weights[0] += self.learning_rate * (label - prediction)`

We update the bias in the same way as the other weights, except we don’t multiply it by the `inputs` vector.
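To see both update lines in action, here is a single training step worked through by hand, outside of the class. It assumes we start from all-zero weights and see the `[1, 1]` input (label `1`) first, with the default learning rate of `0.01`.

```python
import numpy as np

learning_rate = 0.01
weights = np.zeros(3)        # [bias weight, w1, w2], all zero before training
inputs = np.array([1, 1])
label = 1

# Predict: the summation is 0, which is not greater than 0, so we predict 0
summation = np.dot(inputs, weights[1:]) + weights[0]
prediction = 1 if summation > 0 else 0

# The prediction was wrong, so the error is 1 and every weight gets nudged up
error = label - prediction
weights[1:] += learning_rate * error * inputs
weights[0] += learning_rate * error

print(prediction, error, weights)  # 0 1 [0.01 0.01 0.01]
```

Had the prediction been correct, `error` would be `0` and both update lines would leave the weights untouched.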

## TA DA!

In just 19 lines of explicit code, we were able to implement a perceptron in Python!

# Usage

Let’s put it to work and finally wrap up implementing `AND`:

```python
import numpy as np
from perceptron import Perceptron
```

First, we import `numpy` so that we can create our vectors, then we import our new perceptron.

```python
training_inputs = []
training_inputs.append(np.array([1, 1]))
training_inputs.append(np.array([1, 0]))
training_inputs.append(np.array([0, 1]))
training_inputs.append(np.array([0, 0]))
```

Next, we generate our training data. These inputs are the `A` and `B` columns from the `AND` truth table, stored in a list of `numpy` arrays called `training_inputs`.

`labels = np.array([1, 0, 0, 0])`

Here, we store the expected outputs, or *labels*, in the `labels` variable, making sure that each label index lines up with the index of the input it’s meant to represent.

`perceptron = Perceptron(2)`

We instantiate a new perceptron, only passing in the argument `2`, therefore allowing for the default `threshold=100` and `learning_rate=0.01`. Note that such a large threshold and such a small learning rate probably aren’t needed, so feel free to play around to find what’s most efficient! What happens if `learning_rate=10`? What if `threshold=2`?

`perceptron.train(training_inputs, labels)`

Now we train the perceptron by calling `perceptron.train` and passing in our `training_inputs` and `labels`.

This should finish rather quickly. Even though there are 100 epochs, our training data is tiny and `numpy` is very efficient!

```python
inputs = np.array([1, 1])
perceptron.predict(inputs)
#=> 1

inputs = np.array([0, 1])
perceptron.predict(inputs)
#=> 0
```

That’s it! Now, we can start to use the perceptron as a logic `AND`!

It may seem a bit bizarre that we’ve trained our perceptron with four inputs and we only really need it to classify those four inputs. Is that all perceptrons are good for? No! Remember, perceptrons can be used to classify almost any number of binarily classifiable things, (though there are some major caveats, see below).

What would happen if you removed one of the training inputs? Removed two of them? Are you able to remove the `[1, 1]` training input? What other logic operators can you train the perceptron on? What happens if we add more inputs?
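As one possible starting point for those experiments, here is a sketch that trains on `OR` instead of `AND`. To keep it self-contained, it uses two small helper functions, `predict` and `train`, that are a condensed stand-in for the class in this post (same update rule, same defaults), not the class itself.

```python
import numpy as np

def predict(weights, inputs):
    # Step function over the weighted sum; weights[0] is the bias weight
    return 1 if np.dot(inputs, weights[1:]) + weights[0] > 0 else 0

def train(training_inputs, labels, threshold=100, learning_rate=0.01):
    # Condensed version of the training loop from this post
    weights = np.zeros(len(training_inputs[0]) + 1)
    for _ in range(threshold):
        for inputs, label in zip(training_inputs, labels):
            error = label - predict(weights, inputs)
            weights[1:] += learning_rate * error * inputs
            weights[0] += learning_rate * error
    return weights

training_inputs = [np.array([1, 1]), np.array([1, 0]),
                   np.array([0, 1]), np.array([0, 0])]
or_labels = np.array([1, 1, 1, 0])  # the OR column instead of the AND column

or_weights = train(training_inputs, or_labels)
or_predictions = [predict(or_weights, inputs) for inputs in training_inputs]
print(or_predictions)  # [1, 1, 1, 0]
```

Swapping in a different `labels` array is all it takes to try another operator, which makes it easy to find out which ones the perceptron can and can’t learn.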

Test! Experiment! Play!

# Conclusion

This concludes our `AND` implementation, so now is a good time to sum up everything we’ve learned.

**Perceptrons** were first published in 1957 by Frank Rosenblatt at the Cornell Aeronautical Laboratory. He proposed a rule that could automatically determine the **weights** for each of the artificial neuron’s **input features** (one input vector example), by using **supervised learning** to determine a **decision boundary** (see below) between two **binary classes**.

The perceptron classifies inputs by finding the **dot product** of an **input feature vector** and **weight vector** and passing that number into a **step function**, which will return `1` for numbers greater than `0`, or `0` otherwise.

`f(x) = 1 if w · x + b > 0, 0 otherwise`

In order to determine the **weights**, the **Perceptron Learning Rule**:

- **Predicts** an output based on the current weights and inputs
- Compares it to the expected output, or **label**
- Updates its weights, if the **prediction** != the **label**
- Iterates until the **epoch threshold** has been reached

To update the weights during each iteration, it:

- Finds the **error** by subtracting the **prediction** from the **label**
- Multiplies the **error** and the **learning rate**
- Multiplies the result by the **inputs**
- Adds the resulting vector to the **weight** vector

`w <- w + α(y - f(x))x`

# Appendix and Further Exploration

There are a few concepts we haven’t touched on yet. Notably, the limitations of the perceptron.

The **Perceptron Convergence Theorem** is, from what I understand, a lot of math that proves that a perceptron, given enough time, will always be able to find a **decision boundary** between two **linearly separable** classes.

It is important to note that the convergence of the perceptron is only guaranteed if the two classes are linearly separable and the learning rate is sufficiently small. If the two classes can’t be separated by a linear decision boundary, we can set a maximum number of passes over the training dataset (epochs) and/or a threshold for the number of tolerated misclassifications — the perceptron would never stop updating the weights otherwise.

- Sebastian Raschka, Vahid Mirjalili, Python Machine Learning — 2nd Ed.

**Linearly separable** means that there exists a linear hyperplane, (line), that can separate input vectors into their correct classes; one class’ vectors falling on one side of the hyperplane, and the other class’, on the other.

In terms of our binary operator `AND`, linear separability means that:

If…

We plot each of our `A` and `B` inputs, from our truth table, as points, `(A, B)`, on a 2-D plane…

Then…

We could draw a single line on that plane in such a way so that all of the `(A, B)` points on one side of the line are the `A` and `B` inputs that give us `1`, and all the points on the other side give us `0`.

Here is our `AND` and its truth table:

| (A, B) | AND |
| --- | --- |
| (0, 0) | 0 |
| (0, 1) | 0 |
| (1, 0) | 0 |
| (1, 1) | 1 |

We see that all of the pairs of inputs that return `0`

are red and on one side of the line, and the input that gives us `1`

, is on the other side of the line.

This is a graphical representation of what our perceptron does! Our perceptron defines a line to draw in the sand, so to speak, that classifies our inputs binarily, depending on which side of the line they fall on! This line is called the **decision boundary**, and when employing a single perceptron, we only get one.
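We can check the idea numerically, too. The sketch below uses a line I picked by hand, `A + B - 1.5 = 0`, as one example of a decision boundary for `AND`; it is not the line our trained perceptron ends up with, just one of many that work.

```python
import numpy as np

points = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # the (A, B) points
and_outputs = np.array([0, 0, 0, 1])                  # the AND column

# A hand-picked line: 1.0 * A + 1.0 * B - 1.5 = 0
line_weights = np.array([1.0, 1.0])
line_bias = -1.5

# Points with a positive value fall on the "1" side of the line
side = (points @ line_weights + line_bias > 0).astype(int)

print(side)                               # [0 0 0 1]
print(np.array_equal(side, and_outputs))  # True
```

Only `(1, 1)` lands on the positive side, which is exactly the `AND` column.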

In other words, if there is no single line that can separate our training data into two classes, our perceptron will never find weights that can satisfy all of our data. It doesn’t take long to hit this limitation. Take a look at the **XOR Perceptron Problem.**
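Here is a sketch of that limitation using `XOR`. As before, `predict` and `train` are condensed stand-ins for the class in this post rather than the class itself, and the `threshold` of `1000` epochs is an arbitrary number chosen to show that more training doesn’t help.

```python
import numpy as np

def predict(weights, inputs):
    # Step function over the weighted sum; weights[0] is the bias weight
    return 1 if np.dot(inputs, weights[1:]) + weights[0] > 0 else 0

def train(training_inputs, labels, threshold=100, learning_rate=0.01):
    # Condensed version of the training loop from this post
    weights = np.zeros(len(training_inputs[0]) + 1)
    for _ in range(threshold):
        for inputs, label in zip(training_inputs, labels):
            error = label - predict(weights, inputs)
            weights[1:] += learning_rate * error * inputs
            weights[0] += learning_rate * error
    return weights

training_inputs = [np.array([1, 1]), np.array([1, 0]),
                   np.array([0, 1]), np.array([0, 0])]
xor_labels = np.array([0, 1, 1, 0])  # 1 only when exactly one input is 1

xor_weights = train(training_inputs, xor_labels, threshold=1000)
xor_predictions = [predict(xor_weights, inputs) for inputs in training_inputs]

# Count how many of the four inputs are still classified incorrectly
misclassified = sum(int(p != y) for p, y in zip(xor_predictions, xor_labels))
print(xor_predictions, misclassified)
```

No matter how long we train, `misclassified` never reaches `0`, because no single line can put `(1, 0)` and `(0, 1)` on one side and `(0, 0)` and `(1, 1)` on the other.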

Perceptrons have gotten us pretty far, but we’re not done with them yet. Now that we’ve gotten our hands on some code, we can begin digging deeper into using Python as a tool to further explore machine learning and neural networks.

Next, we’ll refactor our perceptron code, take a look at how we can use our model to classify more complex data, and look at how to use tools like `matplotlib` to visualize decision boundaries.

# Resources

Perceptron Convergence Theorem

*Python Machine Learning — 2nd Ed*. by Sebastian Raschka & Vahid Mirjalili

Single-Layer Neural Networks and Gradient Descent

10.2: Neural Networks: Perceptron Part 1 — The Nature of Code

*Appendix F — Introduction to NumPy* from Introduction to Artificial Neural Networks and Deep Learning A Practical Guide with Applications in Python by Sebastian Raschka