“Building a Flask-Based Iris Flower Classification Model: From Prediction to Deployment”

Snehal W.
8 min read · Jan 3, 2024

“Machine learning is the practice of uncovering insights from data. It draws on statistics, artificial intelligence, and computer science, and is often referred to as predictive analytics or statistical learning. It is now part of many everyday applications. The Iris flower dataset is one of the best-known datasets in machine learning and data science, and for good reason. It consists of 150 records of Iris flowers, including their sepal and petal length and width, as well as the type of Iris flower. In this blog post, we will explore the Iris dataset and learn about the techniques we can use to analyze it, build a predictive model, and deploy that model. Whether you are a beginner or an experienced data scientist, this post will provide valuable insights and tips for working with this classic dataset.”

Figure 1. Iris dataset species

The Iris flower dataset is a classic dataset in the field of machine learning and statistical analysis. It consists of 150 observations of iris flowers, including the sepal and petal length and width for each flower, as well as the species of the flower. The dataset was introduced by British statistician and biologist Ronald Fisher in his 1936 paper, “The use of multiple measurements in taxonomic problems.”

In this notebook, we will explore the Iris dataset and use various statistical and machine learning techniques to better understand the relationships between the different features and the species of the flowers. We will also use the dataset to build and evaluate a classifier that can predict the species of an iris flower based on its measurements.

• Problem statement: Create a model that can classify the species of an Iris flower from its measurements.

• Labels: Iris setosa, Iris virginica, and Iris versicolor

• Features: Sepal length, sepal width, petal length, and petal width, all in cm

• Problem solving: These are the basic steps we perform when creating a machine learning model.

  1. Create a dataset
  2. Build a model
  3. Train the model
  4. Make predictions

Getting Set Up

The first step is to load the Iris dataset that ships with the scikit-learn Python library. Let’s start by importing the necessary libraries and loading the dataset.

The data description will also give more information on the features, statistics, and sources.
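As a minimal sketch of this step, assuming pandas and scikit-learn are installed, the dataset can be loaded and placed into two data frames. The variable names `features_df` and `target_df` are illustrative choices, not names from the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

# load_iris() returns a Bunch with the measurements, the class ids,
# and a text description of the dataset.
iris = load_iris()

# Wrap the raw arrays in data frames (names are illustrative).
# Without explicit column names, pandas numbers the columns 0..3.
features_df = pd.DataFrame(iris.data)
target_df = pd.DataFrame(iris.target)

print(features_df.shape)   # rows x columns of the feature table
print(features_df.head())  # first five flowers
print(iris.DESCR[:600])    # start of the built-in data description
```

Printing `iris.DESCR` in full shows the feature list, summary statistics, and the source references mentioned above.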

When the feature array is placed in a data frame on its own, the columns are not labeled, so we can refer to the scikit-learn documentation for more information about the data and features.

In the documentation the data features are listed as:

  1. sepal length in cm
  2. sepal width in cm
  3. petal length in cm
  4. petal width in cm

What do these features refer to?

• The sepal is the green part around the petals that encloses the flower before it blooms. Its length and width are measured in centimeters and used as features for predicting the type of iris.

• The length and width of the petals, the colorful leaves of the flower, are also measured in centimeters.

Let’s rename the columns after these features so we know what the variables refer to when we model the data later.
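One way to do the renaming is sketched below. The snake_case column names are an assumption made for this example; any descriptive names would work.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
features_df = pd.DataFrame(iris.data)

# Assign descriptive names in the order given by the documentation.
# These exact names are an illustrative choice.
features_df.columns = [
    "sepal_length",
    "sepal_width",
    "petal_length",
    "petal_width",
]

print(features_df.head())
```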

What do these numbers refer to in the target column?

0 = Iris Setosa

1 = Iris Versicolour

2 = Iris Virginica

The target data frame is only one column, and it gives a list of the values 0, 1, and 2. We will use the information from the feature data to predict if a flower belongs in group 0, 1, or 2.
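The mapping between the numbers and the species names can also be read directly from the loaded dataset. The sketch below assumes the scikit-learn copy of the data; the column name `target` and the variable names are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# One-column data frame holding the class id of every flower.
target_df = pd.DataFrame(iris.target, columns=["target"])

# How many flowers fall into each class id.
class_counts = target_df["target"].value_counts().sort_index()
print(class_counts)

# Map each class id to the species name stored with the dataset.
id_to_species = {i: str(name) for i, name in enumerate(iris.target_names)}
print(id_to_species)
```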

Preprocess the data

In the preprocessing step, you will typically perform a series of operations on your raw data to get it ready for further analysis or modeling. The specific steps you take will depend on your goals and the characteristics of your data, but some common tasks include:

  1. Cleaning the data: This involves fixing errors or missing values, and removing duplicates or irrelevant information.
  2. Normalizing or scaling the data: You may want to rescale the numeric features so that they share a common scale, and decide how to handle outliers.
  3. Encoding categorical data: If you have categorical data (data that can be divided into a fixed number of categories), you may need to encode it as numerical data so that it can be used in a machine learning model.
  4. Splitting the data: You may want to split your data into a training set, a validation set, and a test set in order to evaluate the performance of your model.
  5. Dimensionality reduction: If you have a large number of features, you may want to reduce the number of dimensions by selecting a subset of the most important features, or by using techniques such as principal component analysis (PCA).

Overall, the goal of the preprocessing step is to get your data into a form that is suitable for further analysis or modeling.
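For the cleaning step, a quick check for missing values and duplicate rows might look like the sketch below. It only reports what it finds; whether to drop duplicates is a judgment call that depends on the project.

```python
from sklearn.datasets import load_iris

# as_frame=True returns a pandas DataFrame with the four features
# plus a numeric "target" column.
df = load_iris(as_frame=True).frame

# Number of missing values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Number of rows that exactly repeat an earlier row.
duplicate_rows = int(df.duplicated().sum())
print("duplicate rows:", duplicate_rows)

# Column types, to confirm that every feature is numeric.
print(df.dtypes)
```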

Exploratory Data Analysis (EDA)

To help us understand our data better, let’s first combine the two data frames we just created. By doing this we can see the features and class determination of the flowers together.
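A minimal sketch of combining the feature and target data frames, assuming they were built as in the earlier steps (the column names here are illustrative):

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
features_df = pd.DataFrame(
    iris.data,
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
)
target_df = pd.DataFrame(iris.target, columns=["target"])

# Place the two frames side by side; rows are matched on the index.
iris_df = pd.concat([features_df, target_df], axis=1)

print(iris_df.head())
print(iris_df.describe())
```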

Visualizing

The next step in the EDA process is to start visualizing some relationships.

Correlations

The Seaborn library has a great heat map visual for mapping the correlations between features. Correlation values range from -1 to 1, and the larger the absolute value, the stronger the linear relationship between the two elements. A value close to 1 indicates a strong positive linear relationship (as one increases, the other also increases), a value close to -1 indicates a strong negative linear relationship (as one increases, the other decreases), and a value close to 0 indicates little or no linear relationship.

Petal length and petal width are the most strongly correlated with the target, meaning that they are good features to consider when deciding which class a flower belongs to. Sepal width is the only feature that is negatively correlated with the target, and its correlation is the weakest of the four features, so on its own it says the least about the class. There is also some intercorrelation among the sepal and petal features.
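The correlations can be computed with pandas and drawn with Seaborn, roughly as sketched here. The helper `plot_heatmap` is a hypothetical name for this example, and it imports the plotting libraries only when called, so the numeric part runs without them.

```python
from sklearn.datasets import load_iris

# Features plus the numeric "target" column in one data frame.
df = load_iris(as_frame=True).frame

# Pearson correlation between every pair of columns.
corr = df.corr()

# Correlation of each feature with the target, strongest first.
print(corr["target"].drop("target").sort_values(ascending=False))


def plot_heatmap(corr_matrix):
    """Draw the correlation matrix as an annotated heat map."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.tight_layout()
    plt.show()
```

Calling `plot_heatmap(corr)` displays the heat map when Seaborn and Matplotlib are installed.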

Outlier Checking

Checking for outliers helps you identify unusual or unexpected data points, which can point to errors or anomalies in the data.
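The original check is not shown, so the sketch below uses one common convention, the 1.5 × IQR rule that box plots are based on. Treat it as an assumption about the method rather than the exact approach used here.

```python
from sklearn.datasets import load_iris

# Only the four feature columns are needed for the outlier check.
features = load_iris(as_frame=True).data

# Interquartile range (IQR) of each feature.
q1 = features.quantile(0.25)
q3 = features.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outlier_mask = (features < lower_bound) | (features > upper_bound)

# Number of flagged values per feature, and the affected rows.
outlier_counts = outlier_mask.sum()
print(outlier_counts)
print(features[outlier_mask.any(axis=1)])
```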

Modeling

Now that we have cleaned and explored the data, we can begin to develop a model. Our goal is to create a Logistic Regression classification model that will predict which class the flower is based on petal and sepal sizes.

Split into train and test data

Here 80% of the data will be the training data, and 20% will be the test data used to evaluate our model.

By stratifying on y, we ensure that the classes are represented in the same proportions as in the full dataset (this prevents, for example, most of class 1 from ending up in the test group).
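A sketch of the stratified 80/20 split using scikit-learn's `train_test_split`. The `random_state` value is an arbitrary choice that makes the split reproducible.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% for testing; stratify=y keeps the class proportions
# the same in both parts. random_state is an arbitrary seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Number of flowers per class in each part.
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print("train:", train_counts, "test:", test_counts)
```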

Standardize X values

Standardizing rescales each feature to zero mean and unit variance. This puts all the X values on a comparable scale while preserving the relative differences between the values within each feature.
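With scikit-learn this is typically done with `StandardScaler`, as in the sketch below. Fitting the scaler on the training data only and reusing it for the test data is a common practice to avoid leaking test information. The split parameters are assumptions that mirror the 80/20 split described earlier.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Learn the mean and standard deviation from the training data only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Apply the same transformation to the test data.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(3))
print(X_train_scaled.std(axis=0).round(3))
```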

Find baseline prediction

The baseline is the accuracy we would expect from predicting the class before any model is implemented. If the data is split evenly into 2 classes, there is already a 50% chance of randomly assigning an element to the correct class. The Iris dataset has three classes of 50 flowers each, so the baseline here is about 33%. The goal of our model is to improve on this baseline, or random prediction. Also, if there were a strong class imbalance (for example, if 90% of the data were in class 1), we would need to adjust the threshold for our model.
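The baseline can be computed from the class proportions. As a cross-check, the sketch also uses scikit-learn's `DummyClassifier`, which always predicts the most frequent class. This is one reasonable way to define the baseline, not necessarily the one used originally.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier

X, y = load_iris(return_X_y=True)

# Share of each class in the data; the largest share is the accuracy
# reached by always guessing the most common class.
class_share = np.bincount(y) / len(y)
baseline = class_share.max()
print("class shares:", class_share.round(3))
print("baseline accuracy:", round(float(baseline), 3))

# The same idea expressed as a trivial model that ignores the features.
dummy = DummyClassifier(strategy="most_frequent").fit(X, y)
dummy_accuracy = dummy.score(X, y)
print("dummy classifier accuracy:", round(float(dummy_accuracy), 3))
```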

Algorithms

Logistic Regression

Support Vector Machine(SVM)

KNN (K-Nearest Neighbors)

Gaussian Naive Bayes

Decision Tree
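The five algorithms listed above can be compared on the same split, for example as sketched here. The hyperparameters are mostly scikit-learn defaults and the seed values are arbitrary, so the exact accuracies may differ from the original results.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# One entry per algorithm; settings are illustrative.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": SVC(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

# Standardize inside a pipeline so the scaler is fit on training data only,
# then record the accuracy of each model on the held-out test data.
accuracies = {}
for name, model in models.items():
    pipeline = make_pipeline(StandardScaler(), model)
    pipeline.fit(X_train, y_train)
    accuracies[name] = pipeline.score(X_test, y_test)

for name, accuracy in accuracies.items():
    print(f"{name}: {accuracy:.3f}")
```

With only 30 test flowers, small differences between the models are not very meaningful; cross-validation would give a steadier comparison.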

RESULT

We use the Decision Tree Classifier algorithm to train the final model.

Building a Predictive System

After building the predictive system, check whether the model works properly by passing it an input and inspecting the output.
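A small predictive system along these lines might look like the following sketch. The function name `predict_species` and the sample measurements are illustrative, and the decision tree uses default settings with an arbitrary seed.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=42
)

# Train the decision tree on the training part of the data.
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("test accuracy:", round(model.score(X_test, y_test), 3))


def predict_species(sepal_length, sepal_width, petal_length, petal_width):
    """Return (class id, species name) for one flower measured in cm."""
    sample = np.array([[sepal_length, sepal_width, petal_length, petal_width]])
    class_id = int(model.predict(sample)[0])
    return class_id, str(iris.target_names[class_id])


# Check the system with a flower that has short, narrow petals.
class_id, species = predict_species(5.1, 3.5, 1.4, 0.2)
print(class_id, species)
```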

Deploy model using Flask

API (Deployed Model)
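The original Flask code is not shown, so the following is only a minimal sketch of what such an API could look like. The `/predict` route, the `features` JSON field, and training the model at startup (instead of loading a saved model file) are all assumptions made for this example.

```python
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Train the model once when the app starts (assumption: a real deployment
# would more likely load a model that was trained and saved beforehand).
iris = load_iris()
model = DecisionTreeClassifier(random_state=42).fit(iris.data, iris.target)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])  # hypothetical route name
def predict():
    """Classify one flower sent as JSON: {"features": [sl, sw, pl, pw]}."""
    payload = request.get_json(silent=True)
    features = payload.get("features") if isinstance(payload, dict) else None
    if not isinstance(features, list) or len(features) != 4:
        return jsonify(error="'features' must be a list of 4 numbers"), 400
    try:
        values = [float(v) for v in features]
    except (TypeError, ValueError):
        return jsonify(error="'features' must contain only numbers"), 400

    class_id = int(model.predict([values])[0])
    return jsonify(class_id=class_id, species=str(iris.target_names[class_id]))


# To serve the API locally, call app.run() or use the flask command-line tool.
```

A client would then send a POST request with a JSON body such as `{"features": [5.1, 3.5, 1.4, 0.2]}` and receive the class id and species name in the response.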

For three unknown flowers, we got the results 0, 1, and 2:

  • 0 is the result for the first unknown flower, which means it is an Iris setosa
  • 1 is the result for the second unknown flower, which means it is an Iris versicolor
  • 2 is the result for the third unknown flower, which means it is an Iris virginica

Conclusion

Hopefully this walk-through helped to show some major steps in the process of a data science project. Of course this is not an exhaustive list of steps that could be taken with this data set, but it aims to carefully show some of the important steps of classification.

We also built a machine learning model to predict the species of iris based on the sepal and petal measurements, and were able to evaluate the performance of the model on a test set. Overall, the Iris flower dataset provides a great opportunity to learn about and practice data analysis and machine learning.

This is a classic data set because it is relatively straightforward, but the steps highlighted here can be applied to a classification project of any kind. Follow for more simple (and advanced) data set walk-throughs in the future!

Thanks for reading this article. Happy coding!

Snehal W.

Techie - Data Enthusiast | A Life-Long Learner | Enthusiast Earth Citizen | An Artist