Iris classification with SVM in Python

Studying and implementing a Support Vector Machine to classify the type of iris

We are going to create a model that classifies the type of iris based on the variables in the dataset.

First, we identify the variables.

The sepal is the part that forms the calyx of a flower. It typically functions as protection for the flower in bud, and often as support for the petals when in bloom.

We have two sepal variables:

  • The sepal length in centimeters
  • The sepal width in centimeters

Petals are modified leaves that surround the reproductive parts of flowers.

We have two petal variables:

  • The petal length in centimeters
  • The petal width in centimeters

Iris is a genus of 260–300 species of flowering plants with showy flowers. It takes its name from the Greek word for a rainbow, Iris.

In the dataset we have three types of iris:

  • Iris Setosa
  • Iris Versicolour
  • Iris Virginica

Let’s code

Import packages

This template needs a few packages: pandas for data handling, plus plotting and machine-learning libraries for the heatmap and the SVM.
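A plausible set of imports for the steps in this article is sketched here. It assumes pandas, matplotlib and scikit-learn are the libraries in play; only pandas is named in the text, so the other two are assumptions.

```python
# Assumed package set: only pandas is named in the article itself.
import pandas as pd                      # reading the CSV and encoding the label column
import matplotlib.pyplot as plt          # drawing the correlation heatmap
from sklearn.svm import SVC              # the Support Vector Machine classifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import confusion_matrix
```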

We are going to read the dataset directly from the UCI Machine Learning Repository. The file has no header row, so we first define our column names and then read the dataset.
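Reading the file could look roughly like the sketch below. The column names and the `load_iris_csv` helper are hypothetical choices made for illustration, and the URL is the commonly used location of `iris.data` on the UCI site, which may have moved.

```python
import pandas as pd

# Assumed location of the raw file on the UCI repository; adjust if it has moved.
UCI_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# Hypothetical column names: the raw file has no header row.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "type"]


def load_iris_csv(source=UCI_URL):
    """Read the headerless iris CSV from a URL, path or file-like object."""
    # header=None tells pandas that the first line is data, not column names.
    return pd.read_csv(source, header=None, names=COLUMNS)


# Usage (needs network access):
# df = load_iris_csv()
# print(df.head())
```

Passing a local path or an open file to `load_iris_csv` works the same way, which is handy when working offline.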

The result is a table with four numeric measurement columns and one text column holding the type of iris.

In this example we have just one categorical column, the type of iris, so we encode it with pandas.
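One way to do this encoding with pandas is through category codes, as in the minimal sketch below. The `type` column name, the `encode_labels` helper and the three-row stand-in frame are assumptions for illustration; the article's own encoding call may have been different.

```python
import pandas as pd

# Tiny stand-in frame; "type" is an assumed name for the label column.
df = pd.DataFrame(
    {
        "petal_length": [1.4, 4.7, 6.0],
        "type": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
    }
)


def encode_labels(frame, column="type"):
    """Return a copy of the frame with the text labels replaced by integer codes."""
    out = frame.copy()
    # Categories are sorted alphabetically, then numbered 0, 1, 2, ...
    out[column] = out[column].astype("category").cat.codes
    return out


encoded = encode_labels(df)
print(encoded)
```

Integer codes keep the label in a single column, which is what makes it possible to correlate it with the measurements later on.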

After encoding, the type column holds numbers instead of text.

First, the dataset is perfectly balanced: there are 50 rows for each type of flower, so counting rows per class tells us nothing useful.
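The balance is easy to confirm with a quick count. This sketch uses the copy of the dataset bundled with scikit-learn as an offline stand-in for the UCI file (an assumption), where the label column is called `target`.

```python
from sklearn.datasets import load_iris

# Bundled copy of the dataset, used here as a stand-in for the UCI download.
frame = load_iris(as_frame=True).frame

# Number of rows per class, ordered by class code.
class_counts = frame["target"].value_counts().sort_index()
print(class_counts)
```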

So let's look at the correlation between the columns. This shows how important each column is for choosing the type of flower.
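A correlation heatmap can be drawn along these lines. This is a sketch under assumptions: it uses scikit-learn's bundled copy of the dataset (label column `target`) and plain matplotlib, which may not match the plotting library used for the original figure.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Bundled copy of the dataset; "target" is the numerically encoded flower type.
frame = load_iris(as_frame=True).frame

# Pairwise Pearson correlation between every pair of columns.
corr = frame.corr()

fig, ax = plt.subplots(figsize=(6, 5))
image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)

# Write each coefficient inside its cell.
for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")

fig.colorbar(image, ax=ax)
fig.tight_layout()

# The column of interest: correlation of each measurement with the flower type.
target_corr = corr["target"].drop("target")
print(target_corr.sort_values(ascending=False))
```

Calling `fig.savefig("heatmap.png")` (a hypothetical file name) would write the figure to disk.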

The result is a heatmap of the pairwise correlations.

For our project we need to look at the last column of the heatmap. As we can see, the petal length and petal width are the columns most strongly correlated with the type of flower. The sepal length follows with a lower, but still positive, correlation. Last comes the sepal width, which has a negative correlation. That does not mean it is unimportant: it still matters, but it is inversely related to the type of flower.

So all the columns matter for the model. If we wanted to drop some columns, the candidates would be the sepal columns, starting with the sepal width.

For this classification problem we'll use the SVM classifier. This is a personal choice: with a small dataset and well-chosen parameters, we can get an accurate model.
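Training an SVM on this data might look like the sketch below. The split ratio, the random seed and the `SVC` parameters are illustrative assumptions, not the article's settings, and the data again comes from scikit-learn's bundled copy.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Features (four measurements) and integer-encoded labels.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows for testing, keeping the classes balanced in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Assumed parameters: an RBF kernel with scikit-learn's default C and gamma.
model = SVC(kernel="rbf", C=1.0, gamma="scale")
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

The exact accuracy depends on the split and the parameters, so it will not necessarily match the figure reported in this article.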

Finally, to check the accuracy of the model, we'll use the confusion matrix and cross-validation.
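Both checks are available in scikit-learn. The sketch below assumes a default `SVC`, a 70/30 split and 5-fold cross-validation, none of which is confirmed by the article.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = SVC().fit(X_train, y_train)

# Rows are the true classes, columns the predicted ones;
# off-diagonal entries are misclassified samples.
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)

# 5-fold cross-validation on the full dataset: one accuracy score per fold.
cv_scores = cross_val_score(SVC(), X, y, cv=5)
print(f"mean CV accuracy: {cv_scores.mean():.3f}")
```

The confusion matrix shows which classes get mixed up on one particular split, while the cross-validation mean gives an estimate that depends less on how that split happened to fall.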

The results are:

We get 98% accuracy, which indicates a very good model, and the confusion matrix shows just one misclassified sample.

Conclusion

The iris classification problem is a good project for predicting a class and for evaluating the columns to check their importance for the predictions.

Math and Machine Learning Student at Sergio Arboleda University