Machine Learning: Iris Classification

Hi everyone,

Welcome to part two of the machine learning tutorial. Today we are going to develop a model that classifies iris flowers for us. Before we get started, I recommend going through the first tutorial if you have not seen it yet. In the first tutorial, we built a model that classifies whether a given fruit is an apple or an orange.

This tutorial is very similar to the first one. But here I will introduce you to one of the built-in datasets and show how to perform various operations on it.

Let’s get started.

Problem Statement:

Create a model that can classify the different species of the Iris flower.

Problem solving:

  1. Create the dataset
  2. Build the model
  3. Train the model
  4. Make predictions

Iris Flower:

Iris is a genus of flowering plants that contains several species, such as Iris setosa, Iris versicolor, and Iris virginica.

1. Create the dataset:

In order to classify the different species of Iris, we need a dataset with features and labels. Fortunately, sklearn comes with a built-in dataset for the iris classification problem.

Let us first understand the dataset.

The dataset consists of:

  • 150 samples
  • 3 labels: species of Iris (Iris setosa, Iris virginica and Iris versicolor)
  • 4 features: sepal length, sepal width, petal length, and petal width, all in cm

Scikit-learn expects the feature data to be numeric, whether the problem is regression or classification. It also expects the data to be stored as NumPy arrays, which keeps the computations efficient. Since this dataset is loaded from scikit-learn, everything is already formatted appropriately.

So now let us write the Python code to load the Iris dataset.

from sklearn import datasets
iris = datasets.load_iris()

Assign the data and target to separate variables.

x = iris.data
y = iris.target

x contains the features and y contains the labels.
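To get a feel for what x and y hold, we can print a few of their properties. This is only a small sketch for exploring the data; the feature_names and target_names attributes come with the loaded dataset.

```python
from sklearn import datasets

iris = datasets.load_iris()
x = iris.data
y = iris.target

# 150 samples with 4 features each, and one label per sample
print(x.shape)  # (150, 4)
print(y.shape)  # (150,)

# Names of the four measurements and the three species
print(iris.feature_names)
print(iris.target_names)

# The first flower: its four measurements and its numeric label
print(x[0], y[0])
```

The labels are stored as the numbers 0, 1 and 2, and target_names tells us which species each number stands for.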

Splitting the dataset:

Since our process involves both training and testing, we should split our dataset. This can be done with the following code, where test_size=.5 keeps half of the samples for testing.

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.5)

x_train contains the training features

x_test contains the testing features

y_train contains the training labels

y_test contains the testing labels
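We can check the result of the split by printing the shapes of the four arrays. In this sketch I also pass random_state=0, which is my own addition and not part of the tutorial code; it only makes the shuffle repeatable so that every run gives the same split.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x = iris.data
y = iris.target

# random_state=0 is an arbitrary choice to make the split repeatable
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=0
)

# With 150 samples and test_size=0.5, each half gets 75 samples
print(x_train.shape, x_test.shape)  # (75, 4) (75, 4)
print(y_train.shape, y_test.shape)  # (75,) (75,)
```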

2. Build the model:

We can use any classification algorithm to solve the problem. Since we solved the previous problem with the decision tree algorithm, I will go with that again.

from sklearn import tree
classifier = tree.DecisionTreeClassifier()

The above code creates an empty, untrained model. Before it can do anything useful, we have to train it.

Note: We can also use KNeighborsClassifier, which may give slightly higher accuracy on this dataset.

from sklearn import neighbors
classifier = neighbors.KNeighborsClassifier()

At this point, we have only created the model. It cannot yet predict which species of Iris a given flower belongs to. For that, we have to train the model with the features and the labels.
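To see that an untrained model really cannot predict, we can call predict before fit and catch the error. This is just a small sketch; the fitted variable is a name I made up for illustration.

```python
from sklearn import datasets, tree
from sklearn.exceptions import NotFittedError

iris = datasets.load_iris()
classifier = tree.DecisionTreeClassifier()

try:
    # The model has not seen any data yet, so this call fails
    classifier.predict(iris.data[:1])
    fitted = True
except NotFittedError as error:
    fitted = False
    print("Model is not trained yet:", error)
```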

3. Train the model:

We can train the model with the fit function.

classifier.fit(x_train, y_train)

Now the model is ready to make predictions.

4. Make predictions:

Predictions can be made with the predict function.

predictions = classifier.predict(x_test)
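The predict function returns the numeric labels 0, 1 and 2. If we want species names instead, one possible way is to index target_names with the predictions, as in this sketch. The random_state values and the predicted_names variable are my own assumptions, added only to make the example repeatable.

```python
from sklearn import datasets, tree
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0
)

classifier = tree.DecisionTreeClassifier(random_state=0)
classifier.fit(x_train, y_train)
predictions = classifier.predict(x_test)

# Turn the numeric labels into species names
predicted_names = iris.target_names[predictions]

print(predictions[:5])
print(predicted_names[:5])
```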

These predictions can be compared with the expected output to measure the accuracy.

from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, predictions))
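The accuracy here is simply the fraction of test samples where the prediction equals the true label. The sketch below computes it by hand with NumPy and compares it with accuracy_score. The names manual_accuracy and library_accuracy and the random_state values are my own choices for illustration.

```python
import numpy as np
from sklearn import datasets, tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0
)

classifier = tree.DecisionTreeClassifier(random_state=0)
classifier.fit(x_train, y_train)
predictions = classifier.predict(x_test)

# Fraction of predictions that match the true labels
manual_accuracy = np.mean(predictions == y_test)
library_accuracy = accuracy_score(y_test, predictions)

print(manual_accuracy, library_accuracy)
```

Both numbers should be the same.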

Full code:

from sklearn import datasets, tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the dataset and separate the features from the labels
iris = datasets.load_iris()
x = iris.data
y = iris.target

# Keep half of the samples for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.5)

# Build and train the model
classifier = tree.DecisionTreeClassifier()
classifier.fit(x_train, y_train)

# Predict on the test set and measure the accuracy
predictions = classifier.predict(x_test)
print(accuracy_score(y_test, predictions))

Output:

D:\iris_classification>python classifier.py
0.96

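So the accuracy is 96%. Since train_test_split shuffles the data randomly, the exact value can vary a little from run to run.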
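Because the split is random, it can help to try a few different splits and look at the spread of accuracies. The sketch below does this for both the decision tree and KNeighborsClassifier. The five seeds and the variable names are arbitrary choices of mine, and the numbers you get are only a rough comparison, not a proper benchmark.

```python
from sklearn import datasets, neighbors, tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

tree_scores = []
knn_scores = []
for seed in range(5):  # five arbitrary seeds, chosen only for illustration
    x_train, x_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.5, random_state=seed
    )

    # Train both classifiers on the same training half
    tree_model = tree.DecisionTreeClassifier(random_state=seed)
    tree_model.fit(x_train, y_train)
    knn_model = neighbors.KNeighborsClassifier()
    knn_model.fit(x_train, y_train)

    # Score both on the same testing half
    tree_scores.append(accuracy_score(y_test, tree_model.predict(x_test)))
    knn_scores.append(accuracy_score(y_test, knn_model.predict(x_test)))

print("decision tree:", tree_scores)
print("k-neighbors:  ", knn_scores)
```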

Congrats, we have made it!