# The Perceptron Algorithm: pseudo code

What you need to remember before diving in (probably just me talking to my future self):

A **perceptron** is a classification model that consists of a set of weights, or scores, one for every feature, and a threshold. The perceptron multiplies each feature value by its corresponding weight and adds them up, obtaining a score. If this score is greater than or equal to the threshold, the perceptron returns a ‘yes’, or a ‘true’, or the value 1. If it is smaller, it returns a ‘no’, or a ‘false’, or the value 0.
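As a tiny sanity check, here is that rule in code (all the numbers are made up for illustration):

```python
# A perceptron by hand: weights, a threshold, and one data point.
weights = [0.5, -0.2, 0.3]
features = [1, 2, 0]        # e.g., word counts
threshold = 0.0

# Multiply each feature by its weight and add them up.
score = sum(w * x for w, x in zip(weights, features))  # 0.5 - 0.4 + 0.0 = 0.1
prediction = 1 if score >= threshold else 0            # 0.1 >= 0, so: 1 ('yes')
```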

*Input:*

- A dataset of points, where every point has a positive or negative label.
- A number of epochs, n.
- A learning rate.

*Procedure:*

- Start with a random line. In other words, start with random values for the weight of each word, and the bias.
- Repeat the following procedure n times:
  - Pick a random point.
  - Apply the perceptron trick to the point and the line. In other words, if the point is well classified, do nothing, and if it is misclassified, move the line a little bit closer to the point.
- Enjoy your well-fitted line!

Now the question: **how many times should we run this procedure of picking a random point and adjusting the line to fit this point better?** There are several criteria for deciding, such as the following:

- Run the algorithm a fixed number of times, which could be based on our computing power, or the amount of time we have.
- Run the algorithm until the error is lower than a certain threshold we set beforehand.
- Run the algorithm until the error doesn’t change significantly for a certain amount of time.
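The third criterion can be sketched as a small helper (a sketch only; `run_one_step` is a hypothetical stand-in for one pick-a-point-and-adjust step, and the tolerance and patience values are made up):

```python
# Stop when the error hasn't improved noticeably for `patience` steps in a row.
def train_until_plateau(run_one_step, tol=1e-4, patience=20, max_iters=10_000):
    best = float("inf")
    stale = 0
    for i in range(max_iters):
        err = run_one_step()
        if best - err > tol:   # meaningful improvement: keep going
            best = err
            stale = 0
        else:                  # no real improvement this step
            stale += 1
        if stale >= patience:  # error has plateaued: stop
            return i + 1
    return max_iters

# Dummy step whose error decays toward 1.0, so improvement eventually stalls.
state = {"err": 10.0}
def fake_step():
    state["err"] = 1.0 + (state["err"] - 1.0) * 0.9
    return state["err"]

iters = train_until_plateau(fake_step)  # stops long before max_iters
```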

Normally, if we have the computing power, it’s ok to run it many more times than needed, since once we have a well-fitted line, it won’t move very much. In the following section I code the perceptron algorithm and analyze it by measuring the error at each step, so we’ll get a better idea of when to stop running it.

*Some small terms before we start:*

- The **features** are the number of times each word appears in the sentence; the feature for word i is denoted x_i.
- The scores for the words are called the **weights**, denoted w_i. They also include the **bias**, denoted b.
- The **score** for a particular sentence is, just as before, the sum of the number of times each word appears (x_i) times the weight of that word (w_i), plus the bias (b).
- The **prediction** is simply 1 (happy) if the score is greater than or equal to zero, or 0 (sad) if the score is less than zero.

Finally! There we go:

Coding the score function:

```python
def score(weights, bias, features):
    return features.dot(weights) + bias
```
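The error function below calls a `prediction` function that the notes describe but never define in code. A minimal version, following the rule above (repeating `score` here so the snippet stands alone):

```python
import numpy as np

# Repeated from above so this snippet is self-contained.
def score(weights, bias, features):
    return features.dot(weights) + bias

# Prediction rule from the notes: 1 (happy) if the score is >= 0, else 0 (sad).
def prediction(weights, bias, features):
    return int(score(weights, bias, features) >= 0)
```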

The error function:

```python
import numpy as np

def error(weights, bias, features, label):
    pred = prediction(weights, bias, features)
    if pred == label:
        return 0
    else:
        return np.abs(score(weights, bias, features))
```

The total error:

```python
def total_error(weights, bias, X, y):
    total = 0
    for i in range(len(X)):
        total += error(weights, bias, X.loc[i], y[i])
    return total
```

The perceptron trick: it checks whether the point is correctly classified. If it is, it does nothing. If the point is misclassified, it nudges the weights and bias a small step toward classifying it correctly:

```python
def perceptron_trick(weights, bias, features, label, learning_rate=0.01):
    pred = prediction(weights, bias, features)
    if pred == label:
        return weights, bias
    else:
        if label == 1 and pred == 0:
            for i in range(len(weights)):
                weights[i] += features[i] * learning_rate
            bias += learning_rate
        elif label == 0 and pred == 1:
            for i in range(len(weights)):
                weights[i] -= features[i] * learning_rate
            bias -= learning_rate
    return weights, bias

A shortcut: notice that the difference between the label and the prediction (namely label - pred) is 0 when they’re equal, +1 when the label is positive and the prediction is negative, and -1 when the label is negative and the prediction is positive. We can use this quantity to remove the if statement and simplify the perceptron trick as follows:

```python
def perceptron_trick_clever(weights, bias, features, label, learning_rate=0.01):
    pred = prediction(weights, bias, features)
    for i in range(len(weights)):
        weights[i] += (label - pred) * features[i] * learning_rate
    bias += (label - pred) * learning_rate
    return weights, bias
```

Last step: for **the perceptron algorithm**, all we need to do is repeat the perceptron trick many times (as many as the number of epochs). For convenience, we’ll also keep track of the error at each epoch. As inputs, we not only have the data (features and labels), we also have the learning rate, which we default to 0.01, and the number of epochs, which we default to 200. Here is the code for the perceptron algorithm:

```python
import random

def perceptron_algorithm(X, y, learning_rate=0.01, epochs=200):
    weights = [1.0 for i in range(len(X.loc[0]))]
    bias = 0.0
    errors = []
    for i in range(epochs):
        errors.append(total_error(weights, bias, X, y))
        j = random.randint(0, len(X) - 1)
        weights, bias = perceptron_trick(weights, bias, X.loc[j], y[j], learning_rate)
    return weights, bias, errors
```
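To try the whole thing end to end, here is a self-contained sketch that bundles compact versions of the pieces above and runs them on a tiny made-up word-count dataset (the column names and labels are invented for illustration):

```python
import random
import numpy as np
import pandas as pd

# Compact bundle of the functions above so this snippet runs on its own.
def prediction(weights, bias, features):
    return int(features.dot(weights) + bias >= 0)

def total_error(weights, bias, X, y):
    total = 0.0
    for i in range(len(X)):
        if prediction(weights, bias, X.loc[i]) != y[i]:
            total += np.abs(X.loc[i].dot(weights) + bias)
    return total

def perceptron_algorithm(X, y, learning_rate=0.01, epochs=200):
    weights = [1.0 for _ in range(len(X.loc[0]))]
    bias = 0.0
    errors = []
    for _ in range(epochs):
        errors.append(total_error(weights, bias, X, y))
        j = random.randint(0, len(X) - 1)
        row = X.loc[j].to_numpy()
        pred = int(row.dot(weights) + bias >= 0)
        # The "clever" perceptron trick: (label - pred) is 0, +1, or -1.
        for i in range(len(weights)):
            weights[i] += (y[j] - pred) * row[i] * learning_rate
        bias += (y[j] - pred) * learning_rate
    return weights, bias, errors

# Made-up data: label 1 ('happy') roughly when 'aack' outweighs 'beep'.
random.seed(0)
X = pd.DataFrame({"aack": [1, 2, 3, 0, 0, 1], "beep": [0, 0, 1, 2, 3, 3]})
y = [1, 1, 1, 0, 0, 0]
weights, bias, errors = perceptron_algorithm(X, y, epochs=300)
```

Plotting `errors` against the epoch number is exactly the analysis mentioned earlier: the curve flattens once the line is well fitted, which tells you when you could have stopped.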

Run! Yay! Done for the day zzz.

If you need a little bit more help ;)