Workflow of a Machine Learning Project

Kavalipurapu Harika
Life in Data Science
May 28, 2020

From detecting fraud to classifying vegetables to predicting house prices, machine learning has only one thing to say, with a wink: “Oh! I can do this easily in just a few steps!” ;)

Let’s now see what those few steps are. Get ready for a short journey through the basic steps involved in a machine learning project.

If you are new to machine learning, visit the link below to get some background first.

The workflow of a machine learning project goes like this:

  1. Gathering Data
  2. Data Preprocessing
  3. Exploratory Data Analysis
  4. Choosing a Model
  5. Training
  6. Testing
  7. Prediction
  8. Deploy (optional)

By now you must all be wondering what these steps are. Don’t worry, let’s go through them one by one :) To make things clearer, let’s take a simple example.

Let’s pretend that we have been asked to create a system that can predict whether a person is suffering from coronavirus. This question-answering system that we are going to build is called a “model”, and the model is created through a process called “training”. The goal of training is to create an accurate model that answers our questions correctly most of the time. But in order to train a model, we first need to collect data to train on. This is where we begin.

1. GATHERING DATA

It’s time for our first step: gathering data. We have to gather all the data that can help tell whether a person is suffering from coronavirus. The data can include features like symptoms associated with the virus, the person’s age, body temperature, etc. The data will obviously vary from one problem statement to another. This step is very important because the quality and quantity of the data we gather directly determine how good our predictive model can be.

Suggestion: Always store the collected data in tabular format.
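
To make the suggestion concrete, here is a minimal Python sketch (standard library only) of what “tabular format” means: one row per person and one column per feature. The column names (`age`, `body_temp_c`, `has_cough`, `has_covid`) and the three rows are made up for illustration; they are not real patient data.

```python
import csv
import io

# Hypothetical column names for the coronavirus example.
FIELDS = ["age", "body_temp_c", "has_cough", "has_covid"]

# Made-up rows for illustration only; these are not real patient records.
rows = [
    {"age": 34, "body_temp_c": 38.6, "has_cough": 1, "has_covid": 1},
    {"age": 52, "body_temp_c": 36.8, "has_cough": 0, "has_covid": 0},
    {"age": 27, "body_temp_c": 37.9, "has_cough": 1, "has_covid": 1},
]

def to_csv_text(records, fieldnames):
    """Write records as CSV: one row per person, one column per feature."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def from_csv_text(text):
    """Read CSV text back into a list of dicts (every value comes back as a string)."""
    return list(csv.DictReader(io.StringIO(text)))

csv_text = to_csv_text(rows, FIELDS)
table = from_csv_text(csv_text)
print(csv_text)
print(table[0]["body_temp_c"])  # prints 38.6, stored as the string "38.6"
```

In a real project you would write to a `.csv` file on disk, and many people load such files with pandas (`pandas.read_csv`) instead of the `csv` module.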

2. DATA PREPROCESSING

In simple words: just as we clean vegetables before cooking to make the dish healthier and tastier, we have to clean the collected data before using it to build a good machine learning model. That is what this step is about. If there are any NaN (missing) values or unsuitable data, remove them or replace them using one of the various methods available. Also, if the data contains many outliers, our model may not be able to predict correct results, so we have to remove outliers (if any) from the data. Data preprocessing is a very important step that affects the quality of any ML model’s predictions.
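
Here is a minimal sketch of the two cleaning steps described above, assuming the data is a list of Python dicts with a made-up `body_temp_c` column. Outliers are detected with Tukey's 1.5 × IQR rule, which is just one common choice among many, and missing values (`None` here, NaN in pandas) are replaced with the median of the remaining readings.

```python
import statistics

# Made-up rows for illustration; None marks a missing measurement.
raw_rows = [
    {"age": 34, "body_temp_c": 36.9},
    {"age": 52, "body_temp_c": 37.2},
    {"age": 27, "body_temp_c": None},   # missing value (NaN in pandas terms)
    {"age": 61, "body_temp_c": 38.4},
    {"age": 45, "body_temp_c": 36.8},
    {"age": 39, "body_temp_c": 58.0},   # implausible reading, likely a data-entry error
    {"age": 23, "body_temp_c": 37.5},
]

def remove_outliers(rows, column, k=1.5):
    """Drop rows whose value lies outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].
    Rows with a missing value are kept so they can be imputed afterwards."""
    observed = [r[column] for r in rows if r[column] is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if r[column] is None or low <= r[column] <= high]

def impute_median(rows, column):
    """Replace missing values in `column` with the median of the observed values."""
    median = statistics.median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: median if r[column] is None else r[column]} for r in rows]

clean_rows = impute_median(remove_outliers(raw_rows, "body_temp_c"), "body_temp_c")
print([r["body_temp_c"] for r in clean_rows])
# → [36.9, 37.2, 37.2, 38.4, 36.8, 37.5]
```

Removing outliers before computing the median keeps an extreme value such as 58.0 from influencing the fill value. The median is already fairly robust, so this ordering is a judgment call rather than a rule.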

3. EXPLORATORY DATA ANALYSIS

Once we have clean data, we need to understand it. This step is all about exploring and playing with the data. Data scientists often spend a lot of time here, because understanding the data is very important. There are various ways to understand the data, such as plotting graphs between different features and finding correlations. In our case, we can look for correlations between body temperature and other features to draw a few inferences and conclusions that might be helpful in building the model.
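
As a small illustration of “finding correlations”, the sketch below computes the Pearson correlation coefficient between each feature and the label by hand. The values are invented for the example, and a strong correlation found this way hints at a relationship but does not prove that one feature causes the other.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up values for illustration only.
body_temp_c = [36.8, 38.6, 37.0, 39.1, 36.9, 38.2]
age = [25, 60, 41, 33, 70, 48]
has_covid = [0, 1, 0, 1, 0, 1]

# Correlate each feature with the label to see which ones look informative.
for name, column in [("body_temp_c", body_temp_c), ("age", age)]:
    print(f"corr({name}, has_covid) = {pearson(column, has_covid):.2f}")
```

With pandas, `DataFrame.corr()` produces the same kind of coefficient for every pair of numeric columns at once.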

4. CHOOSING A MODEL

The next step in our workflow is choosing a model. Researchers and data scientists have created many models over the years. Some are well suited to image data, others to sequences (like text or music), and others to numerical or text-based data. In our case, we can go with a simple classification model that tells whether or not the person is infected with coronavirus. Every problem calls for a different approach, and the right decision during model selection leads to more effective results in the end.
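
One common way to make the choice less of a guess is to try a few candidate models and keep the one that scores best on data held out from training. The sketch below compares two deliberately simple, hypothetical candidates: a baseline that always predicts the most common label, and a one-feature rule that learns a body-temperature threshold. The data is made up.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

class MajorityClassModel:
    """Baseline: always predicts the most common label seen during training."""
    def fit(self, xs, ys):
        self.label = max(set(ys), key=ys.count)
        return self

    def predict(self, xs):
        return [self.label for _ in xs]

class ThresholdModel:
    """Predicts 1 when the feature exceeds a threshold picked to maximise training accuracy."""
    def fit(self, xs, ys):
        candidates = sorted(set(xs))
        self.threshold = max(candidates, key=lambda t: accuracy([int(x > t) for x in xs], ys))
        return self

    def predict(self, xs):
        return [int(x > self.threshold) for x in xs]

# Made-up data: body temperature (deg C) and infection label.
train_x, train_y = [36.8, 38.6, 37.0, 39.1, 36.9, 38.2, 37.1], [0, 1, 0, 1, 0, 1, 0]
val_x, val_y = [36.7, 38.9, 37.3, 38.0], [0, 1, 0, 1]

# Fit each candidate on the training rows, score it on the held-out validation rows.
scores = {}
for model in (MajorityClassModel(), ThresholdModel()):
    model.fit(train_x, train_y)
    scores[type(model).__name__] = accuracy(model.predict(val_x), val_y)

best = max(scores, key=scores.get)
print(scores, "->", best)
```

A baseline like `MajorityClassModel` is worth keeping around: if a fancier model cannot beat it, something is probably wrong.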

5. TRAINING

Now let’s move on to the main step: training. In this step we use our data to incrementally improve our model’s ability to predict whether a person is infected with coronavirus. It is similar to a child learning to walk: first the child crawls, then slowly learns to stand, and finally starts walking. Similarly, after many iterations of practice, the model becomes able to make accurate predictions. Before training, we split the data into training data and testing data, and we train only on the training data. You will learn about the testing data in the next section.
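
The sketch below shows both ideas from this step: holding out test data before training, and improving the model a little at a time. It assumes a logistic-regression classifier with a single feature (body temperature minus 37 °C) trained by plain gradient descent; the learning rate, number of epochs and data are illustrative choices, not values from this article. Each pass over the training data (an “epoch”) lowers the loss slightly, much like the child taking one more step.

```python
import math
import random

def train_test_split(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle row indices reproducibly and hold out a fraction of rows for testing."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(xs) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in test], [ys[i] for i in test])

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_loss(w, b, xs, ys):
    """Average cross-entropy of the predictions sigmoid(w*x + b) against labels ys."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)

def train_logistic_regression(xs, ys, lr=0.5, epochs=2000):
    """Batch gradient descent: every epoch nudges w and b to reduce the log-loss a little."""
    w, b = 0.0, 0.0
    history = [log_loss(w, b, xs, ys)]
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / len(xs)
        b -= lr * sum(errors) / len(xs)
        history.append(log_loss(w, b, xs, ys))
    return w, b, history

# Made-up data: body temperature shifted by 37 deg C (x) and infection label (y).
temps = [36.8, 38.6, 37.0, 39.1, 36.9, 38.2, 37.1, 38.8]
labels = [0, 1, 0, 1, 0, 1, 0, 1]
xs = [t - 37.0 for t in temps]

# Split first, then train only on the training part.
train_x, train_y, test_x, test_y = train_test_split(xs, labels)
w, b, history = train_logistic_regression(train_x, train_y)
train_acc = sum(int(sigmoid(w * x + b) >= 0.5) == y for x, y in zip(train_x, train_y)) / len(train_y)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}, training accuracy: {train_acc:.2f}")
```

Libraries such as scikit-learn provide ready-made versions of these pieces (`sklearn.model_selection.train_test_split` and `sklearn.linear_model.LogisticRegression`); note that scikit-learn's `train_test_split` returns `X_train, X_test, y_train, y_test` in that order.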

6. TESTING

In this step we evaluate how well our model predicts. Suppose a student has been taught to add two numbers using the example “2+3=5”. We can judge the student’s performance as good only if he answers the new question “6+3” with “9”. Similarly, we test our model on data that it has never seen before.
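
Continuing the student analogy, the sketch below scores a model on rows it never saw during training. The `predict` function is a hypothetical stand-in for a trained model (it simply flags temperatures above 37.5 °C), and the test rows are invented. Besides accuracy, it reports precision and recall, which matter in a diagnosis setting because missing an infected person (a false negative) and wrongly flagging a healthy one (a false positive) have different costs.

```python
def evaluate(predicted, actual):
    """Compare predictions with true labels (1 = infected, 0 = not infected)."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == 1 and a == 1 for p, a in pairs)  # infected, correctly flagged
    tn = sum(p == 0 and a == 0 for p, a in pairs)  # healthy, correctly cleared
    fp = sum(p == 1 and a == 0 for p, a in pairs)  # healthy, wrongly flagged
    fn = sum(p == 0 and a == 1 for p, a in pairs)  # infected, missed
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

def predict(temps, threshold=37.5):
    # Hypothetical stand-in for a trained model: flag a fever above the threshold.
    return [int(t > threshold) for t in temps]

# Made-up held-out rows the "model" never saw during training.
test_temps = [36.7, 38.9, 37.8, 38.0, 37.2, 37.4]
test_labels = [0, 1, 0, 1, 0, 1]

metrics = evaluate(predict(test_temps), test_labels)
print(metrics)
```

On this toy data the model gets 4 of 6 answers right, and accuracy, precision and recall all come out to about 0.67.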

7. PREDICTION

This is the step all of us have been waiting for, because it finally answers the question we started with. In our case, we can now use our machine learning model to predict whether a person is suffering from coronavirus.
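
Once a model is trained and tested, prediction is just applying it to a new input. The sketch assumes a logistic-regression model with made-up parameters (`WEIGHT`, `BIAS`) over a single body-temperature feature; it returns a probability and a yes/no label using a 0.5 cut-off, which is a common default but can be tuned.

```python
import math

# Hypothetical parameters, as if produced by training a logistic-regression
# model on (body temperature - 37.0 deg C); they are not fitted to real data.
WEIGHT, BIAS = 4.0, -2.6

def predict_infection(body_temp_c, threshold=0.5):
    """Return (probability, label) for one new person; label 1 means 'likely infected'."""
    z = WEIGHT * (body_temp_c - 37.0) + BIAS
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability, int(probability >= threshold)

for temp in (36.6, 38.7):
    p, label = predict_infection(temp)
    print(f"{temp} C -> p={p:.2f}, label={label}")
```

With these parameters, 36.6 °C yields a probability of about 0.01 (label 0) and 38.7 °C about 0.99 (label 1).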

8. DEPLOY (Optional Step)

We can now deploy our model in a mobile app or web app so that people can use it easily. This step is optional.
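
Deployment details depend heavily on the platform, but one common pattern is to save the trained parameters to a file and load them inside the app's back end. This sketch stores hypothetical logistic-regression parameters as JSON and exposes a `handle_request` function that maps a JSON request to a JSON response; wiring it to an actual web framework route or mobile app is left out, since that depends on your stack.

```python
import json
import math
import os
import tempfile

def save_model(params, path):
    """Write model parameters to a JSON file that a web or mobile back end can ship with."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)

def load_model(path):
    """Load the parameters back when the app starts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def handle_request(model, request_body):
    """Turn a JSON request such as {"body_temp_c": 38.4} into a JSON response.
    A web framework route (hypothetical here) would call this function."""
    features = json.loads(request_body)
    z = model["weight"] * (features["body_temp_c"] - model["center"]) + model["bias"]
    probability = 1.0 / (1.0 + math.exp(-z))
    return json.dumps({"probability": round(probability, 3),
                       "likely_infected": probability >= 0.5})

# Hypothetical parameters; a temporary directory stands in for the app's storage.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    save_model({"weight": 4.0, "bias": -2.6, "center": 37.0}, path)
    model = load_model(path)

response = handle_request(model, '{"body_temp_c": 38.7}')
print(response)
```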

Hope you enjoyed reading this :D

What’s Next?

Now that you have a basic knowledge of the workflow of a machine learning project, next time we will build our first “real” machine learning model using code. Till then, happy learning! :)
