**The Fine Line Between Predictive and Prescriptive Analytics (with examples)**

## And how to climb the stairs to the state of the art in analytics

Since 2012, we have heard Gartner's famous saying that analytics, broadly speaking, has four levels before we reach the state of the art. Even though Gartner's article was published 9 years ago, it still represents well (perhaps too well) the reality of most companies.

The four steps, or the four major kinds of analytics, can be defined as:

· Descriptive: What has happened?

· Diagnostic: Why did it happen?

· Predictive: What will happen?

· Prescriptive: How can we make it happen?

# Descriptive Analytics

We can easily understand the first two, since the idea behind them is well established across companies. We can say that **descriptive** analytics came with the **first BI generation** and made Excel spreadsheets famous. The main idea is that we can describe (the most obvious meaning) the numbers of our organization:

· **What is** the monthly revenue?

· **How much** did we spend on certain expenses?

· **How are** the inventory levels?

It is the first step of understanding our data, working on information from the **past**.

# Diagnostic Analytics

Once we know what happened in the past, the second step is **diagnostic** analytics, asking why it happened. This is addressed by the **second BI generation**, in which we can correlate different data coming from different sources. Some example questions are:

· **Why** is our revenue always higher at the end of the year?

· **Why** do we spend so much on automotive expenses?

· **Why** is our inventory decreasing so quickly?

This next level requires a little curiosity from the user, and the will to solve problems. Here we focus on the **current** problems and **why** they happen.

If we take a closer look, we could say that descriptive analytics looks back at the **past** while diagnostic analytics tends to observe the **present**. By this logic, you may be thinking that the next level should look to the **future**. And guess what, you are right! What most people misunderstand is that **only** predictive analytics looks at the future; prescriptive analytics does not.

And then, after all this analytics maturity 101, we come back to the main point of this article.

**What is the difference between predictive and prescriptive analytics?**

The third and the fourth levels were both made famous by the advent of Data Science and Machine Learning. Do not mistake these terms for ‘foreseeing’ the future; that is still impossible, even with the technology we have today. What we data scientists do is **infer** (or predict) the most probable scenario, **based on historical data**. So, the real questions are **not** like:

· What will be my revenue for the next 6 months?

· How much will we spend on automotive expenses next month?

· How much inventory should we have to not have backlog for the next weeks?

But rather:

· What will be my revenue for the next 6 months **based on** the last two years?

· How much will we spend on automotive expenses next month **based on** the historical usage of our fleet, brand of the vehicle and the number of open delivery orders?

· How much inventory should we have to not have backlog for the next weeks **based on** RFM and number of campaigns made last month?

Machine Learning **always** has to be based on historical data; we cannot infer something out of thin air. We also use the questions answered in the diagnostic phase as the initial stage of model development.

What is important to notice is the output of predictive analytics:

· What will be my revenue for the next 6 months **based on** the last two years? Ans: $ 14,242,924.52

· How much will we spend on automotive expenses next month **based on** the historical usage of our fleet, brand of the vehicle and the number of open delivery orders? Ans: $ 241,242.08

· How much inventory should we have to not have backlog for the next weeks **based on** RFM and number of campaigns made last month? Ans: 6,236 items

We also get a model, which can be a mathematical equation, that we use to generate these numbers. For example:

Revenue = 1152.94 + 32.98 * <number of products sold> + 790 * <number of campaigns made>

For example, let us say we sold **35,743** products and made **15** campaigns. We get:

Revenue = 1152.94 + 32.98 * 35,743 + 790 * 15

Revenue = $ 1,191,807.08
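The substitution above can be reproduced in a few lines of Python. This is only a sketch of the example equation; the function name `predict_revenue` and its parameter names are illustrative and do not come from any real model.

```python
# Coefficients taken from the example revenue equation in the text.
INTERCEPT = 1152.94
PER_PRODUCT = 32.98    # revenue added per product sold
PER_CAMPAIGN = 790.0   # revenue added per campaign made


def predict_revenue(products_sold: int, campaigns_made: int) -> float:
    """Evaluate the example linear model for the given inputs."""
    return INTERCEPT + PER_PRODUCT * products_sold + PER_CAMPAIGN * campaigns_made


revenue = predict_revenue(products_sold=35_743, campaigns_made=15)
print(f"Revenue = $ {revenue:,.2f}")  # Revenue = $ 1,191,807.08
```

Changing either input and rerunning shows how the same "equation" produces a new most probable revenue for a different scenario.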

Of course, we use much more advanced techniques to create models, but that is the main idea of machine learning: creating an ‘equation’ based on historical data to predict the most **probable** future scenario. Focus on that word, **probable**.

And what about prescriptive analytics? We have used up our timeline: past, present, and future. What is left?

In prescriptive analytics, we focus on **achieving** these probable numbers. The questions now are:

· **How** do we achieve a revenue of $ 14,242,924.52?

· **How** do we budget the automotive expenses at $ 241,242.08?

· **How** do we guarantee we will not have backlog for a certain item?

And some of the answers could be:

· **How** do we achieve a revenue of $ 14,242,924.52? Ans: Reducing margin by 3% and increasing sales by 10%

· **How** do we budget the automotive expenses at $ 241,242.08? Ans: Use this route for vehicle 1, this route for vehicle 2…

· **How** do we guarantee we will not have backlog for a certain item? Ans: Use the 6,236 items as a baseline and add 10% more as safety margin. Also, buy these items with these suppliers: <list>
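As a small illustration of how a prediction turns into a recommendation, the inventory answer can be computed directly. This is a sketch under the numbers given in that answer (a 6,236-item forecast and a 10% safety margin); the function name `recommended_stock` is hypothetical, and rounding up to a whole item is an assumption made here.

```python
def recommended_stock(predicted_items: int, margin_percent: int = 10) -> int:
    """Add a safety margin to the predicted demand, rounding up to whole items.

    Integer arithmetic is used so that the rounding is exact.
    """
    scaled = predicted_items * (100 + margin_percent)
    return -(-scaled // 100)  # ceiling division


# 6,236 items is the predictive output; the stock level is the prescriptive one.
print(recommended_stock(6_236))  # 6860
```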

In summary, prescriptive analytics is focused on the **decision** and/or the **action**.

The main difference between predictive and prescriptive analytics is that, in predictive analytics, we have a machine **helping us to make decisions**, while in prescriptive analytics the machine is **telling us what to do** to achieve the numbers we got from predictive analytics. Whether or not we follow the machine's recommendation remains a human decision.

But why do we need a predictive model to build a prescriptive one?

# Time for some hands-on

Let us work through an example. **Pricing** is the retail practice of defining a price that optimizes **profit** (not revenue). In the example, we will use a private database, and we will focus not on the code but on the business case.

We already know from basic business administration courses at university that:

Profit = Revenue - Cost

Revenue = Quantity * Price

So,

Profit = Quantity * Price - Cost

We can estimate a simple demand model, where we target the **quantity** based on the **price** only. And we know that the higher the price, the fewer products are sold.

Let us use some data about a certain product in retail:

If we plot this data, we can see a downward pattern. As we expected, the more we increase the price, the fewer products we sell.

From demand theory, we should expect a line equation (y = ax + b) of the form:

Quantity = -a * Price + b

where the coefficients a and b will be defined by the statistical model. The minus sign in front of **a** reflects the downward trend.

We will use the simplest model available to us, a Linear Regression (using Ordinary Least Squares or OLS). Once we create the model, we get the following results:

Even though there is a lot of information here, we will focus on the **coef** column, where we get the values for the Intercept and the Price. The rest of the information tells us about the model performance, which we will not cover here.

Our Quantity x Price model now becomes:

Quantity = 818.72 - 5.14 * Price
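For readers curious where such coefficients come from, here is a minimal sketch of a one-variable OLS fit in plain Python. The article's data is private, so the price and quantity pairs below are made up, deliberately scattered around the fitted line above; the function name `fit_ols` is also illustrative.

```python
def fit_ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept follows from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical observations (NOT the article's private data): points on the
# line 818.72 - 5.14 * price, shifted by small made-up deviations.
prices = [80.0, 90.0, 100.0, 110.0, 120.0, 130.0, 140.0]
deviations = [2.0, -3.0, 0.0, 2.0, 0.0, -3.0, 2.0]
quantities = [818.72 - 5.14 * p + d for p, d in zip(prices, deviations)]

intercept, slope = fit_ols(prices, quantities)
print(f"Quantity = {intercept:.2f} - {abs(slope):.2f} * Price")
```

The made-up deviations were chosen to cancel out, so the fit recovers the line the points were built from. With real data the coefficients would carry estimation error, which is what the rest of a regression summary describes.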

Of course, there is an error associated with this equation: on the Quantity x Price scatterplot we can notice a lighter blue area, which represents that error. A huge part of our job is to minimize this error as much as possible, so that we get more accurate results.

Anyway, the final equation we get represents the most **probable** (here is that important word again) scenario.

Once we have this (predictive) model, we can substitute in our profit equation:

Profit = Quantity * Price - Cost = (818.72 - 5.14 * Price) * Price - Cost

We notice that the price is now squared. Going back to our college classes, we know that equations in which the variable is raised to the power of 2 (second-order equations) trace a curve, on which we can find the **maximum point**: the maximum profit we can get from a certain product.

For the cost, we will assume a fixed value per product. As you may have noticed, we could also build a cost model based on demand, but let us keep things simple: we will assume that the cost is $ 90.00 per product, so the total cost is 90.00 * Quantity. Plotting this equation, we get the following chart:

From the chart, we can easily see that profit is maximized by setting the price at $124.70, which completes our **prescriptive** analysis.
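The maximum can also be found without a chart. The sketch below assumes, as in the text, a cost of $ 90.00 per product (so total cost is 90 * Quantity) and uses the rounded coefficients of the demand model; the function names are illustrative. With those rounded numbers the optimum lands at about $124.64, a few cents from the $124.70 quoted above, presumably because of rounding in the published coefficients or the resolution of the chart.

```python
INTERCEPT = 818.72   # b in Quantity = b - a * Price (rounded, from the demand model)
SLOPE = 5.14         # a, the units of demand lost per $ 1 of price increase
UNIT_COST = 90.00    # assumed fixed cost per product


def profit(price: float) -> float:
    """Profit = Quantity * Price - Cost, with Cost = UNIT_COST * Quantity."""
    quantity = INTERCEPT - SLOPE * price
    return quantity * (price - UNIT_COST)


def best_price() -> float:
    """Vertex of the profit parabola: set d(Profit)/d(Price) = 0 and solve.

    Profit = (b - a*P) * (P - c)  =>  dProfit/dP = b + a*c - 2*a*P = 0
    """
    return (INTERCEPT + SLOPE * UNIT_COST) / (2 * SLOPE)


price = best_price()
print(f"Recommended price: $ {price:.2f}")
print(f"Expected profit:   $ {profit(price):,.2f}")
```

The predictive model supplies `INTERCEPT` and `SLOPE`; the prescriptive step is the single call to `best_price()`, which turns the prediction into an action.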

At the end of the day, what we will have is a robot recommending your company a price for every single product you sell.

# Wrap-up

I hope you enjoyed reading this article as much as I enjoyed writing it, and I hope it helps you understand where you are, and where you want to be, in your analytical journey. Notice that every analytical level is a step toward the next one, so do not rush to the last phase without crossing the previous levels. Each of the major kinds of analytics will give you the information you need to ascend to the next one.