The Fine Line Between Predictive and Prescriptive Analytics (with examples)

And how to climb the stairs to the state of the art in analytics

Victor Almeida
Geek Culture
Apr 14, 2021

Photo by Jukan Tateisi on Unsplash

Since 2012, we have heard Gartner's famous claim that analytics, broadly speaking, has four levels before we reach the state of the art. Even though Gartner's article was published 9 years ago, it still represents the reality of most companies well (perhaps too well).

The four steps, or the four major kinds of analytics, can be defined as:

· Descriptive — What has happened?

· Diagnostic — Why did it happen?

· Predictive — What will happen?

· Prescriptive — How can we make it happen?

The famous Gartner ascendancy model

Descriptive Analytics

We can easily understand the first two, since their ideas have spread widely across companies. We can say that descriptive analytics came with the first generation of BI and made Excel spreadsheets famous. The main idea is that we can describe (in the most literal sense) the numbers of our organization:

· What is the monthly revenue?

· How much did we spend on certain expenses?

· What are the inventory levels?

It is the first step of understanding our data, working on information from the past.
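Here is a minimal pandas sketch of the first question, "What is the monthly revenue?", to make this concrete. The transaction table, its `date` and `amount` columns, and the values are all hypothetical and exist only for illustration. In this case, descriptive analytics is just aggregating past records.

```python
import pandas as pd

# Hypothetical transaction records (columns and values invented for illustration).
sales = pd.DataFrame({
    "date": pd.to_datetime(
        ["2021-01-05", "2021-01-20", "2021-02-03", "2021-02-17", "2021-03-09"]
    ),
    "amount": [1200.0, 800.0, 950.0, 1050.0, 1500.0],
})

# Descriptive analytics: summarize what already happened, here revenue per month.
monthly_revenue = sales.groupby(sales["date"].dt.to_period("M"))["amount"].sum()
print(monthly_revenue)
```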

Diagnostic Analytics

Once we know what happened in the past, the second step is diagnostic analytics: asking why it happened. This question is answered with the second generation of BI, in which we can correlate data coming from different sources. Some example questions are:

· Why is our revenue always higher at the end of the year?

· Why do we spend so much on automotive expenses?

· Why is our inventory decreasing so quickly?

This next level requires a little curiosity from the user, and the will to solve problems. Here we focus on the actual problems and why they happen.
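Below is a minimal pandas sketch of this step. It assumes two hypothetical sources: monthly revenue from one system and monthly campaign counts from another, both sharing a `month` key. All numbers are invented. Joining the sources and measuring their correlation is one simple way to start asking why revenue peaks at the end of the year. A strong correlation is a lead to investigate, not proof of cause.

```python
import pandas as pd

# Hypothetical data from two different sources (values invented for illustration).
revenue = pd.DataFrame({
    "month": ["2020-10", "2020-11", "2020-12", "2021-01"],
    "revenue": [100_000, 120_000, 180_000, 95_000],
})
campaigns = pd.DataFrame({
    "month": ["2020-10", "2020-11", "2020-12", "2021-01"],
    "campaigns": [2, 4, 9, 1],
})

# Diagnostic analytics: bring the sources together on a shared key...
combined = revenue.merge(campaigns, on="month", how="inner")

# ...and check how strongly the two measures move together (Pearson correlation).
corr = combined["revenue"].corr(combined["campaigns"])
print(f"Correlation between revenue and campaigns: {corr:.2f}")
```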

If we take a closer look, we could say that descriptive analytics looks back at the past, while diagnostic analytics tends to observe the present. By this logic, you might think the next level should look to the future. And guess what? You are right! What most people miss is that only predictive analytics looks to the future; prescriptive analytics does not.

And now, after this analytics maturity 101, we come back to the main point of this article.

What is the difference between predictive and prescriptive analytics?

Both the third and fourth levels were popularized by the advent of Data Science and Machine Learning. Do not mistake these terms for "foreseeing" the future; that is still impossible, even with today's technology. What we data scientists do is infer (or predict) the most probable scenario based on historical data. So, the real questions are not like:

· What will be my revenue for the next 6 months?

· How much will we spend on automotive expenses next month?

· How much inventory should we hold to avoid a backlog over the next few weeks?

But rather:

· What will be my revenue for the next 6 months based on the last two years?

· How much will we spend on automotive expenses next month, based on the historical usage of our fleet, the vehicle brand, and the number of open delivery orders?

· How much stock should we hold to avoid a backlog over the next few weeks, based on RFM (recency, frequency, monetary value) and the number of campaigns run last month?

Machine Learning always has to be based on historical data; we cannot infer something out of thin air. We also use the questions answered in the diagnostic phase as the starting point for model development.

What is important to notice is the output of predictive analytics:

· What will be my revenue for the next 6 months based on the last two years?

Ans: $ 14,242,924.52

· How much will we spend on automotive expenses next month, based on the historical usage of our fleet, the vehicle brand, and the number of open delivery orders?

Ans: $ 241,242.08

· How much inventory should we hold to avoid a backlog over the next few weeks, based on RFM and the number of campaigns run last month?

Ans: 6,236 items

We also get a model, which can be a mathematical equation, that we use to generate these numbers. For example:

Revenue = 1152.94 + 32.98 * <number of products sold> + 790 * <number of campaigns made>

Let us say we sold 35,743 products and made 15 campaigns. We get:

Revenue = 1152.94 + 32.98 * 35,743 + 790 * 15

Revenue = $ 1,191,807.08
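The arithmetic above can be checked with a few lines of Python. This is only a sketch that evaluates the example equation. The function name `predicted_revenue` is hypothetical, and the coefficients are the illustrative ones from the text, not the output of a real fit.

```python
def predicted_revenue(products_sold: int, campaigns: int) -> float:
    """Evaluate the example linear model (hypothetical helper, illustrative coefficients)."""
    intercept = 1152.94
    coef_products = 32.98    # revenue added per product sold
    coef_campaigns = 790.0   # revenue added per campaign
    return intercept + coef_products * products_sold + coef_campaigns * campaigns


revenue = predicted_revenue(35_743, 15)
print(f"Revenue = $ {revenue:,.2f}")  # → Revenue = $ 1,191,807.08
```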

Of course, we use much more advanced techniques to create models, but that is the main idea of machine learning: creating an ‘equation’ based on historical data to predict the most probable future scenario. Focus on that word, probable.

And what about prescriptive analytics? We have used up our timeline: past, present, and future. What is left?

In prescriptive analytics, we focus on achieving these probable numbers. The questions now are:

· How do we achieve a revenue of $ 14,242,924.52?

· How do we budget the automotive expenses at $ 241,242.08?

· How do we guarantee we will not have a backlog for a certain item?

And some of the answers could be:

· How do we achieve a revenue of $ 14,242,924.52?

Ans: By reducing the margin by 3% and increasing sales by 10%

· How do we budget the automotive expenses at $ 241,242.08?

Ans: Use this route for vehicle 1, this route for vehicle 2…

· How do we guarantee we will not have a backlog for a certain item?

Ans: Use the 6,236 items as a baseline and add 10% more as a safety margin. Also, buy these items from these suppliers: <list>
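The inventory answer is a simple rule that can be written down directly. This sketch assumes the 10% margin is applied on top of the predicted 6,236 items and that the result is rounded up to whole items. The rounding is our assumption, since the text does not specify it, and the function name `order_quantity` is hypothetical.

```python
import math


def order_quantity(predicted_demand: float, safety_margin: float = 0.10) -> int:
    """Hypothetical prescriptive rule: forecast as baseline plus a safety margin."""
    # Round up (an assumption) because we cannot order a fraction of an item.
    return math.ceil(predicted_demand * (1 + safety_margin))


print(order_quantity(6236))  # → 6860
```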

In summary, prescriptive analytics is focused on the decision and/or the action.

The main difference between predictive and prescriptive analytics is that in predictive analytics, a machine helps us make decisions, while in prescriptive analytics, the machine tells us what to do to achieve the numbers we got from predictive analytics. Whether or not we follow the machine's recommendation remains a human decision.

But why do we need a predictive model to build a prescriptive one?

Time for some hands-on

Let us look at an example. Pricing is the retail practice of setting a price that maximizes profit (not revenue). In this example, we will use a private database and focus not on the code but on the business case.

We already know from basic business administration courses that:

Profit = Revenue - Cost

Revenue = Quantity * Price

So,

Profit = Quantity * Price - Cost

We can estimate a simple demand model in which we predict the quantity from the price alone. We know that the higher the price, the fewer products are sold.

Let us use some data about a certain product in retail:

Monthly sales of a single random product

If we plot this data, we can see a downward pattern. As expected, the more we increase the price, the fewer products we sell.

Linear Model for Price x Quantity (made with Seaborn)

From demand theory, we should expect a line equation (y = ax + b) of the form:

Quantity = -a * Price + b

where the coefficients a and b will be estimated by the statistical model. The minus sign in front of a reflects the downward trend.

We will use the simplest model available to us, a Linear Regression (using Ordinary Least Squares or OLS). Once we create the model, we get the following results:

Model created with the Python library statsmodels

Even though there is a lot of information here, we will focus on the coef column, which gives the coefficients for the Intercept and the Price. The rest of the information tells us about the model's performance, which we will not cover here.

Our Quantity x Price model now becomes:

Quantity = 818.72 - 5.14 * Price

Of course, there is an error associated with this equation: the lighter blue band around the line in the Quantity x Price scatterplot shows the uncertainty of the fit. A huge part of our job is to minimize this error as much as possible, so we get more accurate results.

Anyway, the final equation represents the most probable scenario (there is that important word again).
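The original database is private, so this sketch simulates 200 hypothetical observations scattered around the reported line and fits them with Ordinary Least Squares. It uses NumPy's `polyfit` with `deg=1`, which returns the least-squares slope and intercept. The article itself used statsmodels, where `statsmodels.formula.api.ols("Quantity ~ Price", data=df).fit()` gives the same coefficients in `.params` plus the full `.summary()` table. The price range, noise level, and seed are all assumptions.

```python
import numpy as np

# Simulated stand-in for the private data set (assumption, for illustration only):
# 200 observations scattered around the line reported in the article.
rng = np.random.default_rng(seed=42)
price = rng.uniform(60.0, 150.0, size=200)
quantity = 818.72 - 5.14 * price + rng.normal(0.0, 20.0, size=200)

# OLS fit of a straight line: polyfit(deg=1) minimizes the sum of squared
# residuals and returns the coefficients highest power first: [slope, intercept].
slope, intercept = np.polyfit(price, quantity, deg=1)

# Residual standard error (two fitted parameters): the typical error around the line.
residuals = quantity - (intercept + slope * price)
print(f"Quantity = {intercept:.2f} {slope:+.2f} * Price")
print(f"residual std: {residuals.std(ddof=2):.2f}")
```

The recovered coefficients land close to 818.72 and -5.14 only because the data was simulated from them. With real data, the fit reflects whatever the history supports.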

Once we have this (predictive) model, we can substitute it into our profit equation:

Profit = Quantity * Price - Cost = (818.72 - 5.14 * Price) * Price - Cost

Notice that the price now appears squared. Going back to our college classes again, we know that second-order (quadratic) equations like this one describe a parabola. Because the Price² term has a negative coefficient (-5.14), the curve opens downward, so it has a maximum point that represents the maximum profit we can get from a certain product.

For the cost, we will assume a fixed value, although, as you may have noticed, we could also build a cost model based on demand. To keep things simple, we will assume a cost of $90.00 per product, so Cost = 90 * Quantity and the profit equation becomes Profit = (818.72 - 5.14 * Price) * (Price - 90). Plotting this equation, we get the following chart:

Profit curve

From it, we can easily read off the maximum profit, which we get by setting the price at $124.70. That concludes our prescriptive analysis.

Maximum value chart and value
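The maximum on the chart can also be found analytically. Assume, as in the example, a unit cost of $90.00, so that Cost = 90 * Quantity. Then the profit is a downward parabola, and its peak is where the derivative equals zero:

```latex
$$
\begin{aligned}
\text{Profit}(P) &= (818.72 - 5.14\,P)(P - 90) \\
                 &= -5.14\,P^2 + 1281.32\,P - 73684.8 \\
\frac{d\,\text{Profit}}{dP} &= -10.28\,P + 1281.32 = 0
  \quad\Longrightarrow\quad P^{*} = \frac{1281.32}{10.28} \approx 124.64
\end{aligned}
$$
```

With the rounded coefficients shown in the text, the vertex sits at about $124.64, a few cents from the $124.70 read off the chart. The small gap is presumably due to rounding of the coefficients or the price grid used for plotting.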

At the end of the day, what we will have is a robot recommending a price for every single product the company sells.
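The "robot" is essentially the same calculation, repeated for each product. This minimal sketch assumes that each product has its own fitted linear demand model and a constant unit cost. The function `recommend_price` and the second catalog entry are hypothetical. The sketch uses the closed-form vertex of the profit parabola, then double-checks the example product with a brute-force grid search, which mirrors reading the maximum off the profit curve.

```python
import numpy as np


def recommend_price(intercept: float, slope: float, unit_cost: float) -> float:
    """Profit-maximizing price for linear demand Quantity = intercept + slope * Price.

    Profit(P) = (intercept + slope * P) * (P - unit_cost) is a downward parabola
    when slope < 0; its vertex, (slope * unit_cost - intercept) / (2 * slope),
    is the maximum.
    """
    if slope >= 0:
        raise ValueError("expected demand to fall as price rises (slope < 0)")
    return (slope * unit_cost - intercept) / (2 * slope)


# Hypothetical catalog: one fitted demand model and unit cost per product.
catalog = {
    "example product": {"intercept": 818.72, "slope": -5.14, "unit_cost": 90.00},
    "product B (hypothetical)": {"intercept": 400.0, "slope": -2.0, "unit_cost": 50.0},
}

for name, params in catalog.items():
    print(f"{name}: recommended price $ {recommend_price(**params):.2f}")

# Sanity check for the example product: evaluate profit on a 1-cent price grid.
prices = np.arange(90.0, 160.0, 0.01)
profits = (818.72 - 5.14 * prices) * (prices - 90.0)
best_price = prices[np.argmax(profits)]
print(f"grid search optimum: $ {best_price:.2f}")
```

The closed form is exact for this linear demand model. The grid search is the more general fallback if the demand or cost model is later replaced by something without a neat formula.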

Wrap-up

I hope you enjoyed reading this article as much as I enjoyed writing it, and I hope it helps you understand where you are and where you want to be in your analytics journey. Notice that every analytics level is a step toward the next one, so do not rush to the last phase without going through the previous levels. Each of the major kinds of analytics gives you the information you need to ascend to the next one.
