Iris Flower Classification using KNN

Jun 28, 2020

Classification is an important part of machine learning. This machine learning project classifies the species of an iris flower on the basis of the length and width of its sepals and petals.

I have used Jupyter Notebooks for this project.

Step 1: Loading the data set

Importing the Pandas library, reading the CSV file with read_csv and storing the result in a DataFrame named iris.

Output: A table full of details about the iris flower
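As a minimal sketch of this step: the original CSV file is not reproduced here, so the example reads a handful of rows of the standard iris data from an in-memory string instead. The column names and the species spelling are assumptions; with a real file you would pass its path (for example a hypothetical "iris.csv") to read_csv.

```python
import io

import pandas as pd

# A few rows in the assumed CSV layout; the real file would hold all 150 flowers.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""

# read_csv accepts a file path or any file-like object.
iris = pd.read_csv(io.StringIO(csv_text))
print(iris.head())
```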

Step 2: Data Analysis

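Using describe() to get summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) for each numeric column.

Using value_counts() to count how many rows there are for each unique value, here the number of flowers per species.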
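The two calls can be sketched as follows. To keep the example self-contained, the DataFrame is rebuilt from the copy of the iris data bundled with Scikit-learn; the column names, including species, are assumptions about how the CSV is laid out.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Stand-in for the DataFrame loaded in Step 1 (assumed column names).
data = load_iris()
iris = pd.DataFrame(
    data.data,
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
)
iris["species"] = data.target_names[data.target]

# Summary statistics for the four numeric columns.
summary = iris.describe()
print(summary)

# Number of rows per species.
counts = iris["species"].value_counts()
print(counts)
```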

Step 3: Data Visualization

Data visualization can reveal patterns that are not obvious and communicate the insights more effectively.

Importing the Matplotlib library to get a graphical view of the data. Using iris.hist() to plot one histogram for each numeric column.
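A possible version of the histogram cell is sketched below. The DataFrame is again rebuilt from Scikit-learn's bundled iris data with assumed column names, and the figure size is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

# Stand-in for the DataFrame loaded in Step 1 (assumed column names).
data = load_iris()
iris = pd.DataFrame(
    data.data,
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
)
iris["species"] = data.target_names[data.target]

# DataFrame.hist() draws one histogram per numeric column
# and skips the non-numeric species column.
axes = iris.hist(figsize=(8, 6))
plt.tight_layout()
plt.show()
```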

Using a scatter plot to differentiate the species based on their sepal_length and sepal_width. A for loop is used to plot each species in its own color. The axis labels explain what the plotted values are: we set the x and y axis labels and a title using plt.xlabel(), plt.ylabel() and plt.title(). To add a legend we use plt.legend().
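The scatter plot described above might look like this. The colors, the title text and the column names are assumptions made for the sketch, not the values used in the original notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

# Stand-in for the DataFrame loaded in Step 1 (assumed column names).
data = load_iris()
iris = pd.DataFrame(
    data.data,
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
)
iris["species"] = data.target_names[data.target]

# One color per species (arbitrary choice).
colors = {"setosa": "red", "versicolor": "green", "virginica": "blue"}

# Plot each species separately so that it gets its own color and legend entry.
for species, color in colors.items():
    subset = iris[iris["species"] == species]
    plt.scatter(subset["sepal_length"], subset["sepal_width"],
                color=color, label=species)

plt.xlabel("sepal_length")
plt.ylabel("sepal_width")
plt.title("Sepal length vs. sepal width")
plt.legend()
ax = plt.gca()  # keep a handle on the axes for later inspection
plt.show()
```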

Step 4: Dividing into Input and Output values

We divide the data into input and output values, taking x as the features and y as the target. In machine learning, the concept of input and output values is important to understand. In the data set above you can see that the species of a flower depends on the values of sepal_length, sepal_width, petal_length and petal_width. Therefore, columns 0, 1, 2 and 3 are the features and column 4 is the target.
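One way to select the columns by position is iloc, sketched here on a DataFrame rebuilt from Scikit-learn's bundled iris data. The column names are assumptions; the positions 0 to 3 and 4 follow the text.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Stand-in for the DataFrame loaded in Step 1 (assumed column names).
data = load_iris()
iris = pd.DataFrame(
    data.data,
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
)
iris["species"] = data.target_names[data.target]

# Columns 0-3 are the features, column 4 is the target.
x = iris.iloc[:, 0:4]
y = iris.iloc[:, 4]

print(x.shape, y.shape)
```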

Step 5: Training and Testing Data

To check how well our model performs, we have to train and test it. Our model will be making predictions on data we don’t know the answer to, so we’d like to evaluate how well it does on new data, not just the data it has already seen. To do this we divide the data set into training and testing data. The training data is used to build the model, whereas the testing data is used to evaluate it. We divide the data in a ratio of 70:30: 70% for training and 30% for testing.

Importing the train_test_split function from Scikit-learn.
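A 70:30 split can be sketched as below. For brevity, x and y come straight from Scikit-learn's bundled iris data as NumPy arrays, so the species are encoded as 0, 1 and 2. The random_state value is an assumption added only to make the split reproducible.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in for the x and y of Step 4.
x, y = load_iris(return_X_y=True)

# test_size=0.3 keeps 30% of the rows for testing; random_state is an assumed seed.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42
)

print(x_train.shape, x_test.shape)
```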

Step 6: Normalization of Data

Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is to change the values of numeric columns in the dataset to use a common scale, without distorting differences in the ranges of values or losing information.

Min-max normalization is one of the most common ways to normalize data. For every feature, the minimum value of that feature gets transformed into a 0, the maximum value gets transformed into a 1, and every other value gets transformed into a decimal between 0 and 1.

We have used MinMaxScaler.
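The sketch below applies MinMaxScaler and compares it with the min-max formula (value - min) / (max - min) computed by hand. It assumes the scaler is fitted on the training data only and then reused on the testing data, which is a common practice but not something the original notebook is known to do; the random_state is also an assumption.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42
)

# Learn each feature's minimum and maximum from the training data,
# then apply the same transformation to the testing data.
scaler = MinMaxScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

# The same transformation written out by hand for the training data.
col_min = x_train.min(axis=0)
col_max = x_train.max(axis=0)
manual = (x_train - col_min) / (col_max - col_min)

print(np.allclose(manual, x_train_scaled))
```

Because the minimum and maximum come from the training data, scaled test values can fall slightly outside the range 0 to 1.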

Step 7: Calling the KNN Classifier

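KNN (k-nearest neighbors) keeps the training data and classifies a new data point by looking at the k training points most similar to it, usually the closest ones by distance, and assigning the class that is most common among them.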

Importing KNeighborsClassifier from Scikit-learn.
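Creating and fitting the classifier could look like this. The value n_neighbors=5 is Scikit-learn's default and is used here as an assumption, since the value chosen in the original notebook is not stated; the random_state is an assumption too.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42
)

scaler = MinMaxScaler()
x_train_scaled = scaler.fit_transform(x_train)

# k = 5 neighbors (assumed); fit() stores the scaled training points and their labels.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train_scaled, y_train)

print(knn.classes_)
```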

Step 8: Predicting the testing data

Predicting the testing data using predict() function.

Checking the predicted output against the real output, which is stored in y_test.
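A sketch of the prediction and the side-by-side check follows. It repeats the earlier steps under the same assumptions (bundled iris data with species encoded as 0, 1, 2, random_state=42, k=5).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42
)

scaler = MinMaxScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train_scaled, y_train)

# Predict the species of every flower in the testing data.
y_pred = knn.predict(x_test_scaled)

# Compare predictions with the true labels element by element.
matches = y_pred == y_test
print(y_pred)
print(y_test)
print(matches.sum(), "of", len(y_test), "predictions match")
```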

Step 9: Predicting new data

Predicting the species of a new flower by creating a new array with its four measurements.
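As an illustration, the sketch below classifies one made-up set of measurements. The values in new_flower are hypothetical, and the setup repeats the earlier assumptions (bundled iris data, random_state=42, k=5). Note that the new array has to go through the same scaler as the training data before it is passed to predict().

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

data = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

scaler = MinMaxScaler()
x_train_scaled = scaler.fit_transform(x_train)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train_scaled, y_train)

# Hypothetical flower: sepal_length, sepal_width, petal_length, petal_width.
# predict() expects a 2D array with one row per sample.
new_flower = np.array([[5.1, 3.5, 1.4, 0.2]])

# Scale with the already fitted scaler, then predict.
prediction = knn.predict(scaler.transform(new_flower))

# Map the numeric label back to the species name.
species = data.target_names[prediction[0]]
print(species)
```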

Step 10: Accuracy Score, Confusion Matrix and Classification Report

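Accuracy Score: Classification accuracy is what we usually mean when we use the term accuracy. It is the ratio of the number of correct predictions to the total number of input samples.

Confusion Matrix: A confusion matrix is a table that is often used to describe the performance of a classification model. For a two-class problem, the confusion matrix consists of 4 values: true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). With the three iris species it becomes a 3 x 3 table in which each row is a true species, each column is a predicted species, and the correct predictions lie on the diagonal.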
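The three metrics named in this step are available in sklearn.metrics and can be sketched as below, under the same assumptions as before (bundled iris data, random_state=42, k=5). The exact numbers depend on the split, so none are quoted here.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

data = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

scaler = MinMaxScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train_scaled, y_train)
y_pred = knn.predict(x_test_scaled)

# Fraction of test flowers whose species was predicted correctly.
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)

# Rows are true species, columns are predicted species.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall and F1-score for each species.
report = classification_report(y_test, y_pred, target_names=data.target_names)
print(report)
```

The accuracy equals the sum of the diagonal of the confusion matrix divided by the total number of test samples.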