Perceptron from (Almost) Zero and 3D Visualization

Neto Figueira · Published in Analytics Vidhya · Jan 25, 2021 · 4 min read

In a previous post I introduced the Perceptron algorithm through a specific problem (the AND gate). The idea was to get a feel for how the algorithm works, and now we can generalize it to solve any linearly separable problem (hence the 'almost' in the title).

The main adjustment to our previous AND-gate Perceptron is to generalize the number of inputs the algorithm receives, so it can handle datasets with an arbitrary number of features. That is, instead of finding a line to separate the space, the generalized version will be able to find hyperplanes.
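In code, the only change is that the weighted sum becomes a dot product over an arbitrary number of inputs. A minimal sketch of the decision rule (the names here are just illustrative):

import numpy as np

def perceptron_output(x, w, b):
    # the decision boundary w·x + b = 0 is a line in 2D,
    # a plane in 3D, and a hyperplane in higher dimensions
    return 1 if np.dot(w, x) + b > 0 else 0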


The general structure of the Perceptron looks like this:

General structure of the Perceptron (image by the author)

I’ve wrapped the algorithm into a class, and its usage is divided into three methods:

  • Initialization: the __init__ function sets the input length of our Perceptron, the activation function used (for now, let’s work with the classic step function), the learning rate ‘eta’ (default 0.1), and the number of iterations (epochs) used (default 1000).
  • The Training Step: we call this method to run the learning algorithm on a training dataset (binary, linearly separable classification problems).
  • Prediction: finally, we can use this method to make predictions on a test set; it also provides an accuracy attribute for evaluating the performance of the algorithm.

The full code follows:
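(This is a minimal numpy version, consistent with how the class is used in the rest of the post; the version on my GitHub may differ in details.)

import numpy as np

class Perceptron:
    def __init__(self, input_length, activ_f, eta=0.1, epochs=1000):
        # weight[0] is the bias; weight[1:] are the input weights,
        # all randomly initialized
        self.weight = np.random.rand(input_length + 1)
        self.activ_f = activ_f
        self.eta = eta
        self.epochs = epochs

    def _output(self, x):
        # weighted sum plus bias, passed through the activation function
        return self.activ_f(np.dot(self.weight[1:], x) + self.weight[0])

    def train(self, X, y):
        # classic perceptron learning rule: w <- w + eta * (target - output) * x
        for _ in range(self.epochs):
            for xi, target in zip(X, y):
                update = self.eta * (target - self._output(xi))
                self.weight[1:] += update * xi
                self.weight[0] += update

    def predict(self, X, y):
        # classify each sample and store the accuracy against the true labels
        preds = np.array([self._output(xi) for xi in X])
        self.accuracy = np.mean(preds == y)
        return preds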

All right, to see our algorithm in action, let’s get some data. We’ll load the iris dataset and make some predictions:

from sklearn import datasets
import matplotlib.pyplot as plt

iris = datasets.load_iris()

# divide into features and target variables
X = iris.data[:100, :]
Y = iris.target[:100]

The iris dataset provides four features (sepal length, sepal width, petal length, petal width) for three different kinds of iris plants (setosa, versicolor, and virginica). If you take a look at the full dataset, it has a distinct label for each of the three classes:

Iris dataset classes
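You can check the labels directly from the dataset object:

print(iris.target_names)         # ['setosa' 'versicolor' 'virginica']
print(sorted(set(iris.target)))  # [0, 1, 2]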

To get a binary classification problem, we took only the first 100 entries of the X and Y arrays above (classes 0 and 1). We can use the function below to plot two of the features:

# Plot the training points
def plot_iris(X1, X2, Y, X1_label, X2_label):
    plt.scatter(X1, X2, c=Y, cmap=plt.cm.Set1, edgecolor='k')
    plt.xlabel(X1_label)
    plt.ylabel(X2_label)

plot_iris(X[:, 0], X[:, 1], Y, iris.feature_names[0], iris.feature_names[1])
Sepal length vs. sepal width for the two classes (image by the author)

The image above clearly shows that classes 0 and 1 can be linearly separated by sepal length and sepal width. This is similar to the AND-gate problem we already solved, so let’s use one more feature with our Perceptron implementation this time. First, we separate the data into train and test sets with scikit-learn’s help:

from sklearn.model_selection import train_test_split

# choosing three features to work with here: X[:, :3]
X_train, X_test, y_train, y_test = train_test_split(X[:, :3], Y, test_size=0.2)

We can visualize our data in 3D space:

Iris dataset for the first two classes (setosa and versicolor)
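The plotting code for this figure is on my GitHub; a sketch along the lines of matplotlib’s scatter3d example (linked in the references) would be:

from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection on older matplotlib

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X_train[:, 0], X_train[:, 1], X_train[:, 2],
           c=y_train, cmap=plt.cm.Set1, edgecolor='k')
ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel(iris.feature_names[1])
ax.set_zlabel(iris.feature_names[2])
plt.show()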

Now we instantiate our Perceptron (you will need to create the activation function to pass to the class instance):

def step_function(x):
    if x > 0:
        return 1
    else:
        return 0

percep_iris = Perceptron(input_length=X_train.shape[1], eta=0.1, activ_f=step_function, epochs=1000)

We can get the randomly initialized weights with ‘percep_iris.weight’, and plot the randomly generated plane:

Plane generated with initialization (author)
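The plane-plotting code is also on my GitHub; a hypothetical plot_plane helper, assuming the weight layout from the class sketch above (weight[0] as the bias), could look like this:

def plot_plane(weights, X, labels):
    # the decision boundary satisfies w0 + w1*x + w2*y + w3*z = 0,
    # so we solve for z over a grid of (x, y) values
    w0, w1, w2, w3 = weights
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 10),
                         np.linspace(X[:, 1].min(), X[:, 1].max(), 10))
    zz = -(w0 + w1 * xx + w2 * yy) / w3
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.plot_surface(xx, yy, zz, alpha=0.3)
    ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=labels, cmap=plt.cm.Set1, edgecolor='k')
    plt.show()

plot_plane(percep_iris.weight, X_train, y_train)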

Now, let’s actually train our algorithm:

percep_iris.train(X_train, y_train)

You can then evaluate the algorithm on the test set (you should get an accuracy of 1, because this is an easy problem):

percep_iris.predict(X_test, y_test)
percep_iris.accuracy

We can also plot the new plane with the weights learned in the training step.
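With the hypothetical plot_plane helper sketched above, that is a one-liner:

plot_plane(percep_iris.weight, X_test, y_test)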

Separating plane with trained weights

We can see that it really separates all the points of the test set, which is why we get 100% accuracy!

Conclusion

A quick post to review one of the most classic classification algorithms. I think that implementing machine learning algorithms from scratch, even at a basic level, is the best way to really understand them and to evolve toward solving real-world problems. Any questions, concerns, and criticism will be appreciated. You can go to my GitHub to check the full code used to generate the plots.

References

https://matplotlib.org/3.1.1/gallery/mplot3d/scatter3d.html

Mitchell, T. M. (1997). Machine Learning. New York: McGraw-Hill. ISBN 978-0-07-042807-2.
