Our data tell us everything we need

Saphirion AG
Performance Pricing
Aug 31, 2015

In the last post we concluded that it would be very helpful to have a method for calculating a shouldcost, a monetary value, from clearly measurable properties of a product.

If we can do this, we can calculate the shouldcost for every product we want to analyze and immediately see which ones are too expensive, and which ones should actually cost more than they do at the moment.

Such a method exists. Sounds like magic? Well, we have implemented it in our NLPP tool, and the concept has been known for quite a long time. The approach we use is known as multiple regression analysis. It’s a statistical process for estimating the relationships between independent and dependent variables.

NLPP result showing “Actual Price” VS “Shouldcost” and three benchmark lines

In our case, the variables are the price-drivers (properties) of the parts and the price. The price is called the “dependent variable” because we assume that a change in a price-driver value somehow changes the price. A simple example shows this: when you buy a car with more horsepower, it normally costs more. Here the horsepower is a price-driver, hence an “independent variable”, and the price is the “dependent variable”.

Now, the goal is to find a function of the independent variables (our price-drivers) to calculate the dependent variable (the price). Such a function is referred to as the regression function.

OK, so our task is to find a function that takes our price-drivers as variables and returns the shouldcost as its result. Something like (PD = price-driver, C = the coefficient for that price-driver):

PD1 * C1 + PD2 * C2 + … = Shouldcost
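To make the shape of this formula concrete, here is a minimal sketch in Python. The function name `shouldcost` and the example values (a part described by its weight and its number of drilled holes) are made up for illustration and are not taken from NLPP.

```python
def shouldcost(price_drivers, coefficients):
    """Evaluate PD1 * C1 + PD2 * C2 + ... for one product."""
    if len(price_drivers) != len(coefficients):
        raise ValueError("need exactly one coefficient per price-driver")
    return sum(pd * c for pd, c in zip(price_drivers, coefficients))


# Hypothetical part: weight = 2.0 kg, drilled holes = 4.
# Hypothetical coefficients: 3.0 per kg, 0.5 per hole.
example = shouldcost([2.0, 4.0], [3.0, 0.5])
print(example)  # 2.0 * 3.0 + 4.0 * 0.5 = 8.0
```

The price-drivers come from the data. The open question is where the coefficients come from.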

Let’s think about this for a moment. How many different functions could exist? What’s your guess? Correct, there are infinitely many possible functions we could write down. So something is missing before we can move on. What’s missing is a simple constraint, a property that our result (the regression function) should fulfill:

The resulting regression function should minimize the difference between the price we already know and the calculated shouldcost using the formula.
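One common way to make “minimize the difference” precise is to minimize the sum of squared differences (ordinary least squares). The article does not say which measure NLPP uses, so take the following as a sketch under that assumption, reduced to a single price-driver so the formula is PD * C = shouldcost. The data values are invented.

```python
def fit_coefficient(drivers, prices):
    """Least-squares coefficient C for the one-driver formula PD * C."""
    numerator = sum(pd * p for pd, p in zip(drivers, prices))
    denominator = sum(pd * pd for pd in drivers)
    return numerator / denominator


def squared_difference(c, drivers, prices):
    """Sum of squared gaps between known prices and PD * C."""
    return sum((p - pd * c) ** 2 for pd, p in zip(drivers, prices))


# Invented data: one price-driver value and one known price per product.
drivers = [1.0, 2.0, 3.0, 4.0]
prices = [2.1, 3.9, 6.2, 7.8]

c = fit_coefficient(drivers, prices)
print(c)  # about 1.99

# Any other coefficient gives a larger total difference on this data.
print(squared_difference(c, drivers, prices))
print(squared_difference(c + 0.1, drivers, prices))
```

With several price-drivers the idea is the same, only the arithmetic for finding the coefficients gets longer.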

Now things become more interesting. What this constraint actually means is that the function should fit reality as well as possible. We don’t want random results for our shouldcosts. Instead, we want shouldcosts that differ as little as possible from the actual prices. Of course, this makes sense. And performance pricing is all about finding this “single best regression function”.

Perfect, so let’s just do it. Mathematics has ways to calculate such a function. Then we can calculate our shouldcosts and see where, and by how much, the actual price differs from the calculated shouldcost. Where the difference is biggest is where our biggest potential saving is. Your savings goals will be achieved in just a few moments. You will be the hero of your company.
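The “where is the biggest difference” step can be sketched in a few lines of Python. The part names, prices and shouldcosts here are invented, and `savings_ranking` is a hypothetical helper, not an NLPP function.

```python
def savings_ranking(parts):
    """Sort parts by (actual price - shouldcost), largest gap first."""
    gaps = [(name, actual - should) for name, actual, should in parts]
    return sorted(gaps, key=lambda gap: gap[1], reverse=True)


# Invented data: (part name, actual price, calculated shouldcost).
parts = [
    ("A", 10.0, 9.0),
    ("B", 20.0, 14.0),
    ("C", 5.0, 6.0),
]

ranking = savings_ranking(parts)
for name, gap in ranking:
    print(name, gap)
```

Part B tops the list because its actual price exceeds its shouldcost by the most. Part C ends up with a negative gap, meaning its shouldcost is higher than what it costs today.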

Sounds too good to be true? Well, in a way you are right, because there is no single “best method” for calculating a regression formula that fulfills our constraint. Or more precisely, there are a lot of different methods for calculating a regression formula, and I really mean a lot. The only way to be sure would be to calculate all possible regression formulas and compare them to see which is “best”. That’s impossible to do. So, are we no further along than before we added the constraint? In a sense, yes: there are still infinitely many candidates for the best regression formula. However, things become a lot simpler now.

There are many separate regression methods for finding our “best” regression formula. Some are better suited to price analysis, which is what we do; others are much better for other use-cases. What we can do now is narrow the regression methods down to a set that fits our price-analysis use-case. If we do this, the number of methods becomes limited, and hence the number of different regression formulas as well.

Knowing what we want to do (price analysis), and knowing a few things about how market prices are formed, how they behave, which fundamental economic rules apply, etc., we can limit the number of possible methods to the point where we can calculate all the remaining regression formulas.

What we then need is a means of comparing two formulas to decide which one is “better” in our sense: the one that minimizes the difference between actual prices and shouldcosts for all the products we analyze in one data-set.
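Such a comparison could look like the following Python sketch. It scores each candidate formula by its total squared difference over the whole data-set, which is one possible measure and an assumption here, not necessarily the one NLPP uses. Both formulas and the data are invented.

```python
def total_squared_difference(formula, dataset):
    """Sum of squared gaps between actual price and shouldcost over all products."""
    return sum((price - formula(pd)) ** 2 for pd, price in dataset)


def better_formula(candidates, dataset):
    """Return the name of the candidate with the smallest total difference."""
    return min(
        candidates,
        key=lambda name: total_squared_difference(candidates[name], dataset),
    )


# Invented data-set: (price-driver value, actual price) per product.
dataset = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]

# Two invented candidate formulas with a single price-driver.
candidates = {
    "a": lambda pd: 2.0 * pd,
    "b": lambda pd: 1.5 * pd + 1.0,
}

winner = better_formula(candidates, dataset)
print(winner)  # "a" fits this data-set more closely
```

Note that the score is computed over every product in the data-set, not over a single product.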

Please re-read the previous sentence. A significant (maybe a bit subtle) new constraint has been added to the game. Did you find it? It’s the “for all the products we analyze in one data-set” part. This means there is no single “best” regression formula that is valid worldwide, for all data, forever. If there were, we would only have to find one magic formula and would be done forever. Instead, we get the “best” regression formula based on our input data (among the regression methods we apply, of course).

Following this approach leads to another very interesting property of the price-analysis regression problem we want to solve. Our input data “decides” which method gives the “best” regression formula. So we can’t know upfront which method will be used. We just have to try it out.

Conversely, this means that there can’t be a “one size fits all” method. By using several different methods we actually raise the chance of finding a model that fits reality appropriately. If you only have one method available, your chance of getting an appropriate result is a lot lower.
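To illustrate how the data can “decide” the method, here is a small Python sketch with two candidate methods: a straight-line fit and a power-law fit. These two were picked only for illustration; the article does not say which methods NLPP applies. Both data-sets are invented.

```python
import math


def fit_linear(xs, ys):
    """Least-squares straight line: price = a + b * driver."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def fit_power(xs, ys):
    """Power law price = a * driver ** b, fitted as a straight line in log space."""
    line = fit_linear([math.log(x) for x in xs], [math.log(y) for y in ys])
    return lambda x: math.exp(line(math.log(x)))


def best_method(xs, ys):
    """Fit every method and return the name of the one closest to the actual prices."""
    methods = {"linear": fit_linear, "power": fit_power}
    scores = {}
    for name, fit in methods.items():
        formula = fit(xs, ys)
        scores[name] = sum((y - formula(x)) ** 2 for x, y in zip(xs, ys))
    return min(scores, key=scores.get)


# Same price-driver values, two invented sets of prices.
xs = [1.0, 2.0, 4.0, 8.0]
linear_prices = [5.0 + 2.0 * x for x in xs]
power_prices = [3.0 * x ** 0.5 for x in xs]

print(best_method(xs, linear_prices))  # linear
print(best_method(xs, power_prices))   # power
```

The same code picks a different winner for each data-set, so the choice of method is not known until the data has been tried.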

This is, overall, how performance pricing works. There are plenty of things to take care of when doing the math. In the next part, we will give you a rough overview of the challenges a good performance pricing solution must solve.
