Building your first Predictive Model?

MUKUL JOSHI
IEEE Student Branch DIT University
6 min read · Nov 12, 2020

Follow these model building stages to get the best results.

Photo by Samuel Bourke on Unsplash

Introduction

Hey reader, as you start your journey in the field of Machine Learning, first of all I wish you good luck, though I know you don’t need it. Since you allowed yourself to be a beginner and your passion led you here, it’s my responsibility to guide you towards your end goal.

You can’t build a great building on a weak foundation.

As this quote suggests, the same holds true in Machine Learning.
As a Machine Learning practitioner, it is a necessity to have a very strong foundation in this field, and one should know these six stages of predictive modelling.

Predictive Modelling

Before heading into the stages of Machine Learning, let’s discuss: what is predictive modelling?

Predictive modelling, also called predictive analytics, is a mathematical process that uses past data to predict the future, i.e. it is what we use when we have to predict future values based on historical data.
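To make the definition concrete, here is a minimal sketch in Python. It assumes a toy series of four past monthly values (made-up numbers) and uses a straight-line least-squares fit, one of the simplest possible predictive models, to predict the next, unseen month. The names `history` and `fit_line` are purely illustrative.

```python
# Hypothetical past observations: (time step, value), e.g. monthly sales.
history = [(1, 10.0), (2, 12.0), (3, 14.0), (4, 16.0)]


def fit_line(points):
    """Ordinary least-squares fit of value = slope * t + intercept."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in points)
    var = sum((t - mean_t) ** 2 for t, _ in points)
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return slope, intercept


slope, intercept = fit_line(history)
forecast = slope * 5 + intercept  # predict the unseen future step t = 5
print(forecast)
```

With these numbers the fitted line has slope 2 and intercept 8, so the prediction for month 5 is 18. Real data is rarely this clean; the point is only that past observations determine the model, and the model then produces the future value.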

Is predicting stock price movement a predictive modelling task?
The factors involved would be:
:> Analysing past stock prices.
:> Analysing similar stocks.
:> The required output: the future stock price.
So what do you think now?
… Since we need to predict the future movement of the stock price based on past data, along with other types of data, this is a predictive modelling task.
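One way to see why this is a predictive modelling task is to turn the price history into examples of "past values, then next value". The sketch below assumes a single stock and a hypothetical window of 3 past closing prices; prices of similar stocks could be appended to each feature list in the same way. All prices and names here are made up for illustration.

```python
def make_supervised(prices, window):
    """Turn a price series into (features, target) pairs.

    Each feature list holds the `window` most recent past prices and the
    target is the price that came right after them.
    """
    samples = []
    for i in range(window, len(prices)):
        features = prices[i - window:i]  # past prices only
        target = prices[i]               # the "future" value to predict
        samples.append((features, target))
    return samples


# Hypothetical closing prices, oldest first.
closes = [100.0, 101.5, 99.8, 102.3, 103.0]
for features, target in make_supervised(closes, window=3):
    print(features, "->", target)
```

Once the data is in this shape, any model that learns a mapping from features to target can be trained on it.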

Now that we are comfortable with the term predictive modelling, it’s the right time to dive into its stages.

We can broadly divide the model-building life cycle into the following six stages:

1> Problem Definition

The very first stage is one of the most important stages of predictive modelling, as most learners fail to identify the problem statement correctly, and this leads to bad models.
One should identify the right problem statement and, ideally, formulate the problem mathematically.

Let me discuss this with an example.
Bad problem statement: Want to improve the profitability of credit card customers.
A few ways by which we could do this are:
* Want to increase the APR (annual percentage rate, i.e. the yearly rate of interest) of credit cards.
* Want to deploy different APRs for different segments of customers.
* Want to identify the customer segments having a lower default rate.
* Want to have different APRs and other benefits for different customer segments (based on the expected default rate) to maximise profit.
But if we look closely, these are all intermediate steps, and every one of them depends on the customers’ default rate.

So instead, a good problem statement would have been:
Want to predict the default rate of customers.
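One possible way to state this problem mathematically is sketched below. It assumes a binary label $y_i$ per customer (defaulted or not within some fixed horizon, such as the 3 months used in the bank example later) and a feature vector $\mathbf{x}_i$ describing that customer; the notation is illustrative rather than the author’s own.

```latex
y_i =
\begin{cases}
  1 & \text{if customer } i \text{ defaults within the chosen horizon},\\
  0 & \text{otherwise},
\end{cases}
\qquad
\hat{p}_i = f(\mathbf{x}_i) \approx P(y_i = 1 \mid \mathbf{x}_i),
\qquad
\widehat{\mathrm{DR}}_S = \frac{1}{|S|} \sum_{i \in S} \hat{p}_i
```

The expected default rate of a customer segment $S$ is then the average predicted probability over its customers, and the intermediate goals listed above (segment-wise APRs and benefits) can all be built on top of $\hat{p}_i$.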

2> Understanding Hypothesis Generation

Listing down all possible variables that might influence the problem objective is known as Hypothesis Generation.

These variables should be free from personal bias and preference, since the quality of the model depends heavily on the quality of the hypotheses.

Here is a tricky but very important question for you.
Should Hypothesis Generation be done before or after looking at the data?

What do you think about this? :)

Well, Hypothesis Generation should be done before looking at the data.
Why?
It lets you think about all the factors that might affect the problem without being biased by what the data happens to contain.
If we look at the data beforehand, it becomes very difficult to look beyond the available data.
It also avoids wasting time analysing all of the available data.
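A hypothesis list can be kept as plain data, written before the dataset is opened. The sketch below assumes four hypothetical variables for the credit card default problem. When a data extract arrives later, a small helper shows which hypotheses the data can test and which still need new data, which helps resist the pull of only looking at what happens to be available. All variable names are illustrative.

```python
# Hypotheses written down BEFORE opening the dataset (hypothetical examples).
# Each entry: candidate variable -> why we think it affects default.
hypotheses = {
    "income": "lower income may mean less capacity to repay",
    "credit_utilisation": "high utilisation may signal financial stress",
    "missed_payments_12m": "past missed payments may predict future ones",
    "job_tenure_years": "short tenure may mean less stable income",
}


def coverage(hypotheses, available_columns):
    """Split hypotheses into those the data can test and those it cannot."""
    available = set(available_columns)
    testable = sorted(h for h in hypotheses if h in available)
    missing = sorted(h for h in hypotheses if h not in available)
    return testable, missing


# Columns of a hypothetical data extract received later.
testable, missing = coverage(hypotheses, ["income", "credit_utilisation", "age"])
print("testable:", testable)
print("still to collect:", missing)
```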

3> Data Extraction or Collection

Extract/collect data from different sources and combine it for exploration and model building.
Also, when we look at the data, we might come across a few more hypotheses that can improve the model; make sure to add them to the list of hypotheses.

Example of Data Extraction and collection to predict the default rate of customers.
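As an illustration of combining sources for the default problem, here is a minimal sketch. It assumes a hypothetical credit bureau table keyed by customer id and a hypothetical transactions log; transactions are summed per customer and joined to the bureau record. This behaves like a left join on the customers who have transactions, so bureau-only customers are dropped, which is a design choice you may want to change.

```python
# Two hypothetical sources keyed by customer id.
bureau = {101: {"credit_score": 710}, 102: {"credit_score": 640}}
transactions = [
    {"customer_id": 101, "amount": 120.0},
    {"customer_id": 101, "amount": 80.0},
    {"customer_id": 102, "amount": 300.0},
    {"customer_id": 103, "amount": 50.0},  # no bureau record
]


def combine(bureau, transactions):
    """Aggregate transactions per customer and join them with bureau data."""
    spend = {}
    for row in transactions:
        cid = row["customer_id"]
        spend[cid] = spend.get(cid, 0.0) + row["amount"]
    combined = {}
    for cid, total in spend.items():
        # Customers missing from the bureau get None so the gap stays visible.
        combined[cid] = {
            "total_spend": total,
            "credit_score": bureau.get(cid, {}).get("credit_score"),
        }
    return combined


for cid, record in sorted(combine(bureau, transactions).items()):
    print(cid, record)
```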

4> Data Exploration and Transformation

Basically, exploration gives us insights into the data. But why is gaining insights important?

Because data is almost always provided in tabular form, and making sense of it just by looking at the raw table is very hard.

Exploring and transforming data is one of the most crucial stages of model building, and this stage clearly shows whether you are a good analyst or a bad one.

So what is the difference between a good analyst and a bad analyst?

A good analyst knows his/her data well, and so can modify the data and choose the best technique, while a bad analyst always relies blindly on tools and libraries.

There are around 7 steps of Data Exploration, which are described beautifully in the link given below, so please check it out after reading this article completely.
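Two basic exploration and transformation steps are sketched below in Python, using only the standard library `statistics` module. The income column is made up. The summary already gives an insight: the mean sits far above the median because of one very large value, so the sketch fills missing values with the median rather than the mean. Median imputation is just one common choice, not necessarily one of the 7 steps mentioned above.

```python
import statistics

# A hypothetical "income" column with missing values (None).
income = [42000, 55000, None, 38000, 250000, None, 47000]


def summarise(values):
    """Basic univariate summary: size, missing count, centre and range."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
    }


def impute_median(values):
    """Transformation: replace missing entries with the median of the rest."""
    fill = statistics.median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


print(summarise(income))
print(impute_median(income))
```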

5> Model Building

Model building is the process of creating a mathematical model to estimate/predict future behaviour based on past data.

So what is a model?
It’s a specific mathematical or computational description that expresses the relationship between a set of input variables and one or more outcome variables that are being studied or predicted.

Example: A retail bank wants to know the default behaviour of its credit card customers. It wants to predict each customer’s probability of default within the next 3 months.

Now let’s move on to the steps of Model Building.
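To make the idea of "a mathematical description relating inputs to an outcome" concrete for the bank example, here is a sketch of one common model form, logistic regression, which maps a customer’s inputs to a probability between 0 and 1. The two input variables and the coefficient values are hypothetical; in a real project the coefficients would come out of the training step.

```python
import math

# Hypothetical coefficients; in practice these are learned during training.
WEIGHTS = {"credit_utilisation": 3.0, "missed_payments_12m": 0.8}
BIAS = -4.0


def probability_of_default(customer):
    """Logistic model: p = 1 / (1 + exp(-(bias + sum(w_j * x_j))))."""
    z = BIAS + sum(w * customer[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


low_risk = {"credit_utilisation": 0.2, "missed_payments_12m": 0}
high_risk = {"credit_utilisation": 0.9, "missed_payments_12m": 3}
print(round(probability_of_default(low_risk), 3))
print(round(probability_of_default(high_risk), 3))
```

The model itself is just the formula plus its coefficients; everything else in model building is about choosing that form and finding good coefficients.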

Steps of Model Building

i> Algorithm selection
We select the algorithm based on certain factors as depicted :-

ii> Training model
It is the process of learning the relationship/correlation between the independent and dependent variables.

Using the training dataset, we train our model, i.e. learn the relationship between the independent and dependent variables.
The training data represents the past.
Test data, on the other hand, is data whose dependent variable is not given to the model; it is used to make predictions.

It represents the future data, whose dependent variable is unknown.
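A minimal sketch of the training step, assuming a tiny made-up dataset with one independent variable (credit utilisation) and a binary dependent variable (defaulted or not). A hand-written helper `split_train_test` (hypothetical, not a library function) shuffles the rows reproducibly and holds back a quarter as test data; a one-feature logistic regression is then fitted by plain gradient descent on the log-loss using only the training rows. The learning rate and epoch count are arbitrary illustrative values.

```python
import math
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows reproducibly and hold back a test fraction."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def train_logistic(rows, lr=0.5, epochs=2000):
    """Learn w, b of p = sigmoid(w * x + b) by gradient descent on the
    average log-loss, using ONLY the training rows (x, y)."""
    w, b = 0.0, 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in rows:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical data: x = credit utilisation, y = 1 if the customer defaulted.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.35, 0), (0.4, 0), (0.5, 1),
        (0.55, 0), (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1), (0.95, 1)]
train, test = split_train_test(data)
w, b = train_logistic(train)
print(len(train), len(test), w > 0)
```

The held-back `test` rows are never touched during training, which is what lets them stand in for future data.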

iii> Prediction/Scoring
It is the process of estimating/predicting the dependent variable of the test dataset by applying the rules the model has learned.

Note: We always apply what was learned during training to the test dataset to check the quality of our model, as making predictions on the training data may give inaccurate and overly optimistic results.
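The note about optimistic results is easy to demonstrate. The sketch below uses a 1-nearest-neighbour classifier (chosen here only because it memorises the training set) on hypothetical, already-split data: every training point is its own nearest neighbour, so training accuracy is a perfect 1.0, while accuracy on the unseen test rows comes out at 0.75.

```python
def predict_1nn(train, x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    _, nearest_y = min(train, key=lambda row: abs(row[0] - x))
    return nearest_y


def accuracy(train, rows):
    """Share of rows whose label the model predicts correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in rows)
    return hits / len(rows)


# Hypothetical (credit utilisation, defaulted) pairs, already split.
train = [(0.1, 0), (0.3, 0), (0.45, 1), (0.6, 1), (0.7, 0), (0.9, 1)]
test = [(0.15, 0), (0.5, 0), (0.62, 1), (0.95, 1)]

print("train accuracy:", accuracy(train, train))
print("test accuracy:", accuracy(train, test))
```

Only the test score says anything about how the model will behave on new customers.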

6> Model Deployment/Implementation

We have now arrived at the last stage of Predictive Modelling.

A study found that:
90% of machine learning models built by students, practitioners and data scientists never actually make it into production.

Deployment is also a skill, and an aspiring data scientist needs to learn how to get their models into production.

There are a few tools we can use to deploy a model, like Docker, Kubernetes, Heroku, etc.

But first, one should focus on getting the code to run at all; making it fast and optimised comes later.
In the beginner stages, achieving the outcome is more important. In the later stages of code development, running it efficiently becomes very important.
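Whatever deployment tool is used, the trained model first has to be saved in a form that a separate serving process can load. The sketch below assumes the parameters of a logistic model are stored as JSON in a hypothetical file `default_model.json` (written to a temporary directory); the coefficient values are made up. Tools like Docker would then package code of this kind, but that part is not shown.

```python
import json
import math
import os
import tempfile


def save_model(path, weights, bias):
    """Persist learned parameters so another process can load them."""
    with open(path, "w") as f:
        json.dump({"weights": weights, "bias": bias}, f)


def load_model(path):
    """Read the parameters back, e.g. once when the service starts."""
    with open(path) as f:
        return json.load(f)


def score(model, customer):
    """Apply the loaded logistic model to one incoming request."""
    z = model["bias"] + sum(w * customer[k] for k, w in model["weights"].items())
    return 1.0 / (1.0 + math.exp(-z))


# Training side: write the (hypothetical) parameters to disk.
path = os.path.join(tempfile.mkdtemp(), "default_model.json")
save_model(path, {"credit_utilisation": 3.0, "missed_payments_12m": 0.8}, -4.0)

# Serving side: load once at start-up, then score requests.
model = load_model(path)
print(score(model, {"credit_utilisation": 0.0, "missed_payments_12m": 5}))
```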

End Notes:

In this article, we have looked at the 6 stages of Predictive Modelling and how to approach a problem stage by stage to get the best modelling results.

Now that you have the theoretical knowledge of cooking up a model, it’s the right time to get your hands dirty and explore each stage for a better understanding.
So once again, all the best!
I hope you enjoyed this content.

Hey!!!

If you want to build your first classification model, or learn more about how to build a model in practice, check out these links.
