Evaluating “Performance Pricing” Solutions (Part 2)

Saphirion AG · Performance Pricing · Aug 6, 2017

As we explained in “Evaluating Performance Pricing Solutions (Part 1)”: the first and most critical point to understand is that one regression method (one mathematical approach) cannot handle all the cases you face in everyday work. There is simply no single “one size fits all” approach for statistical price analysis/performance pricing.

OK, agreed, we have to back up that claim… This time we elaborate on the topic with examples. These examples show immediately why this aspect matters, and that there is no “good enough” result when you use the wrong regression method.

Before we start, some context

Performance pricing builds on the idea of using regression analysis for price analysis. Regression analysis determines the relationship between a dependent variable (price) and one or more independent variables (product properties). It helps you understand how the dependent variable (price) changes when any one of the independent variables (product properties) changes. Regression analysis is widely used for prediction and forecasting.

The result of a performance pricing analysis, using regression analysis, is a target-price formula. You feed product properties into this formula and get back a target-price/should-cost for that particular product. Once you have a target-price formula, you can predict the price of any product. Here is how such a formula for a machined part might look:

target-price = 3.558 + 0.021 * ‘Quantity [Pcs]’ + 0.102 * ‘Weight [kg]’ + 0.013 * ‘Diameter [mm]’ + 0.020 * ‘Width [mm]’

Quantity, Weight, Diameter and Width are product properties that differentiate between different machined parts.

A performance pricing solution uses the information (price and product properties) of a set of known machined parts as input to find this target-price formula.
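To make this concrete, here is a minimal sketch in Python (not part of NLPP) that applies the example formula above to one hypothetical part; the property values of the part are made up for illustration:

    # Minimal sketch: applying the example target-price formula for a machined part.
    # Coefficients are taken from the example formula above and are illustrative only.
    def target_price(quantity_pcs, weight_kg, diameter_mm, width_mm):
        """Return the target-price predicted by the example formula."""
        return (3.558
                + 0.021 * quantity_pcs
                + 0.102 * weight_kg
                + 0.013 * diameter_mm
                + 0.020 * width_mm)

    # Hypothetical part: quantity 500 pcs, weight 2.4 kg, diameter 80 mm, width 35 mm.
    print(round(target_price(500, 2.4, 80, 35), 3))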

From theory to reality

So far, so good. But reality makes such a regression analysis for price analysis pretty complicated. The following list shows some aspects that need to be considered:

  1. You want to get a reliable and realistic target-price formula you can trust. It doesn’t make sense if a performance pricing solution gives you target-prices that are unrealistic.
  2. Regression methods provide reliable results (in our case the target-price formula) only if their mathematical pre-conditions are fulfilled, and the method correctly captures the structure (how product properties impact the price) of the input data.
  3. It is easy to calculate many different regression models (target-price formulas) that do not capture the structure of the input data and therefore produce an unreliable and incorrect target-price for each part number.
  4. Only reliable regression models can capture the relationship between product properties and price.
  5. The regression methods used must extract the maximum amount of information from the input data (gain of knowledge) to calculate a regression model (target-price formula) with the best possible predicting power.

In summary, this leads to:

Only models that capture the structure of the input data and extract as much information as possible give reliable and usable results.

And now comes the big problem: every regression method makes assumptions about how the input data is structured, and its result is only reliable if those assumptions hold. The problem is that we are unable to prove these assumptions about the data structure up front for the methods we want to use.

What can we do in such a situation?

The first thing that should be clear is that it’s very likely that one regression method alone cannot give us a reliable target-price formula in all cases.

Next, if we are unable to prove the assumption about how the input data is structured before analysis, the only option left is to check how good the target-price formula is afterward.

If we have several regression methods at hand, we need to find out which one gives the best target-price formula.
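One generic way to do such an after-the-fact check (a sketch only; this is not NLPP's internal selection logic, and the synthetic data parameters below are invented) is to fit several candidate models and compare how well each one predicts data it has not seen:

    # Sketch: comparing a linear and a log-linear candidate model on held-out data.
    # Generic hold-out comparison; the synthetic data and its parameters are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(1, 100, size=100)                                    # one product property
    price = np.exp(4.9 + 0.05 * x) * rng.lognormal(0.0, 0.1, size=100)   # non-linear price structure

    train, test = np.arange(70), np.arange(70, 100)                      # simple hold-out split

    def holdout_rmse(fit, predict):
        """Fit on the training part and return the prediction error on the hold-out part."""
        model = fit(x[train], price[train])
        err = price[test] - predict(model, x[test])
        return float(np.sqrt(np.mean(err ** 2)))

    # Candidate 1: linear, price = a + b * property (LPP-LSM style).
    lin = holdout_rmse(lambda xs, ys: np.polyfit(xs, ys, 1),
                       lambda m, xs: np.polyval(m, xs))

    # Candidate 2: log-linear, price = exp(a + b * property).
    loglin = holdout_rmse(lambda xs, ys: np.polyfit(xs, np.log(ys), 1),
                          lambda m, xs: np.exp(np.polyval(m, xs)))

    print(f"hold-out RMSE, linear fit:     {lin:,.0f}")
    print(f"hold-out RMSE, log-linear fit: {loglin:,.0f}")

The candidate with the smaller hold-out error is the one whose assumptions better match the data.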

Reality ruins your analysis result

Our Non-Linear Performance Pricing solution NLPP supports six different regression methods. It can automatically find out which of the six methods best captures the structure of the input data.

If you are planning to do performance pricing with statistical tools such as Minitab, or if you are evaluating different performance pricing solutions, you will end up using a method called LPP-LSM, which stands for:

Linear Performance Pricing using the Least-Squares Method

On Wikipedia, you can read more about the Ordinary Least Squares (OLS) method.

Synthetically Generated Data

We created a tool that generates data with a specific structure using a Monte Carlo simulation, adding a bit of randomness to the result. So, for the generated data, we know up front which structure it has.

With such a data-set we can evaluate two things pretty easily:

  1. Does the performance pricing tool recognize the structure of the input data correctly?
  2. What happens if you apply a wrong method to the data?

The following sections will show what happens if you use a wrong method on your input data.

For our tests, we created data sets containing 100 entries (products) with one product property and a price. Such input data has the simplest structure for a regression analysis because there is only one parameter. Here is what such a generated file looks like:

Example of generated data with LPP-LSM structure
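For readers who want to experiment themselves, here is a rough sketch of how a comparable linear data set could be generated; the intercept, slope, and noise level below are invented for illustration and are not the values Saphirion's generator uses:

    # Sketch: generate 100 products with one property and a linear price plus normal noise.
    # All parameters (intercept, slope, noise level) are illustrative assumptions.
    import csv
    import numpy as np

    rng = np.random.default_rng(42)
    prop = rng.uniform(5, 150, size=100)                        # product property
    price = 11000 + 370 * prop + rng.normal(0, 2000, size=100)  # linear structure + noise

    with open("lpp_lsm_data.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["product-property", "price"])
        writer.writerows(zip(prop.round(2).tolist(), price.round(2).tolist()))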

LPP-LSM Case

Let us start with the simplest case, which most statistical software tools and performance pricing solutions support: the data is linear and normally distributed.

The following two graphics show distribution plots of the independent variable (product property) and the dependent variable (price). The dashed line is the average, the solid line is the median, and 50% of all values lie within the red box. To make the test even simpler, there are no outliers.

Distribution Plot of the “Product Property” for LPP-LSM data
Distribution Plot of the “Price” for LPP-LSM data.

Now let us take a look at the result. The following graphic shows the actual price from our input data on the vertical axis and, on the horizontal axis, the target-price calculated with the target-price formula from our regression result.

Actual vs. Target price LPP-LSM

The three lines are benchmark lines showing the most likely upper (red), target (blue), and lower (green) price bounds for every data point.

The result looks good, which is no surprise because we use the LPP-LSM method on LPP-LSM data.

The target-price formula (regression analysis result) looks like this:

target-price = 11,087.462 + 368.099 * ‘product-property’
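An ordinary least-squares fit of this kind can be reproduced with standard tooling. The sketch below refits synthetic linear data of the same shape as above; the data-generation parameters are assumptions, and the fitted coefficients will not match NLPP's output.

    # Sketch: an ordinary-least-squares fit (LPP-LSM style) on synthetic linear data.
    # The fit should roughly recover the (illustrative) parameters used to generate the data.
    import numpy as np

    rng = np.random.default_rng(42)
    prop = rng.uniform(5, 150, size=100)
    price = 11000 + 370 * prop + rng.normal(0, 2000, size=100)

    slope, intercept = np.polyfit(prop, price, 1)   # least-squares line through the data
    print(f"target-price = {intercept:,.3f} + {slope:.3f} * 'product-property'")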

Since NLPP supports six different regression methods, it calculates how many times more likely the above result is than the result of each of the other five methods.

As you can see, the next most likely model is the non-linear version (NLPP) using the same structure (LSM), followed by the linear version (LPP) with the QR structure, and so on.

The order in the list is exactly as expected and shows that NLPP can determine this reliably and correctly.

Correct NLPP-QR Case

Now let us do the same with an NLPP-QR case. The data is non-linear and exponentially distributed. Such input data is pervasive in real-life price analysis.

Distribution Plot of the “Product Property” for NLPP-QR data
Distribution Plot of the “Price” for NLPP-QR data.

As you can see, the distribution plot for the product property looks pretty much the same as in the LPP-LSM case, but the distribution plot for the price now looks entirely different. This already indicates that the relationship between product property and price must be of a different kind.

Again, the plot of the regression result:

Actual vs. Target price NLPP-QR

The three benchmark lines are now not parallel because the data is non-linear.

And the regression formula:

target-price = exp(4.923 + 0.049 * ‘product-property’)
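A common way to arrive at a formula of this exponential form (a sketch; we do not know the exact estimator NLPP uses internally, and the synthetic data parameters are invented) is to regress the logarithm of the price on the product property and exponentiate the fitted line:

    # Sketch: fitting a log-linear model, price = exp(a + b * property).
    # OLS on log(price) is an illustrative stand-in, not necessarily NLPP's internal estimator.
    import numpy as np

    rng = np.random.default_rng(7)
    prop = rng.uniform(1, 100, size=100)
    price = np.exp(4.9 + 0.05 * prop) * rng.lognormal(0.0, 0.1, size=100)  # exponential structure

    b, a = np.polyfit(prop, np.log(price), 1)   # regress log(price) on the property
    print(f"target-price = exp({a:.3f} + {b:.3f} * 'product-property')")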

Taking a look at how many times more likely the above result is than the results of the other five methods is quite interesting now:

For NLPP-QR structured input data, the other non-linear methods are much more likely than any linear method. This result makes a lot of sense.

If you compare the above table to the LPP-LSM table, you see that applying a non-linear method to linear data is much “better” than using a linear method with non-linear data.

Wrong Case: Using LPP-LSM method on NLPP-QR data

In this section, we show what happens if you have only one method, such as LPP-LSM, available and apply it to data that does not fulfill the necessary assumptions.

The distribution of the input data is, of course, the same as in the previous case, so we do not repeat it here.

Here is the analysis result plot:

Actual vs. Target price misusing LPP-LSM on NLPP-QR data

To be clear, we used the same input data as before but now used the LPP-LSM method to analyze it. It is pretty obvious that this result plot looks suspicious and strange.

Since we use perfectly generated data, you can even see that there is a non-linear structure in the data. But software that only offers LPP-LSM cannot do any better; it gives you a result that simply does not fit.

But please keep in mind that we used perfectly generated data; that is why you can see the problem. Real-life data is not perfect, and you would not be able to see immediately that a result cannot be correct.

Maybe you are lucky and discover the problem when looking at every data point in detail. One hint, which you can also see above, is that some negative target-prices are predicted: all points to the left of 0 on the horizontal axis have negative target-prices.

Not very reliable for a result of a price analysis tool.

By the way, the target-price regression formula is:

target-price = -10,678.879 + 249.936 * ‘product-property’

We can now compare the two results and plot the absolute difference between the correct NLPP-QR result and the wrong LPP-LSM result. This shows how dramatically wrong the LPP-LSM results are:

The absolute difference between correct NLPP-QR and wrong LPP-LSM target-price sorted by size

We used 100 products in our test data set. As you can see, almost all LPP-LSM target-prices are not just a bit off or close enough to be usable; they are simply wrong.
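The same kind of comparison can be sketched on synthetic exponential data (illustrative parameters; the magnitudes will differ from the figure above):

    # Sketch: absolute difference between target-prices from a correct log-linear fit
    # and from a linear (LPP-LSM style) fit misapplied to the same exponential data.
    import numpy as np

    rng = np.random.default_rng(7)
    prop = rng.uniform(1, 100, size=100)
    price = np.exp(4.9 + 0.05 * prop) * rng.lognormal(0.0, 0.1, size=100)

    b, a = np.polyfit(prop, np.log(price), 1)       # correct: log-linear model
    correct = np.exp(a + b * prop)

    slope, intercept = np.polyfit(prop, price, 1)   # wrong: linear model on non-linear data
    wrong = intercept + slope * prop

    diff = np.sort(np.abs(correct - wrong))[::-1]   # differences sorted by size
    print(f"negative linear target-prices: {int((wrong < 0).sum())} of {len(wrong)}")
    print(f"median absolute difference:    {np.median(diff):,.0f}")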

If you are using a result calculated with the wrong method, reality has just ruined your analysis result and every decision you base on it.

Conclusion

We showed the need to use the correct regression method that fits the structure of the data. Otherwise, the results are just nonsense.

Our test case is a straightforward one. We used perfectly generated data and only one independent variable. It cannot get any simpler for analysis tools. Our NLPP tool can reliably recognize the structure of your input data fully automatically and choose a regression method which gives the best target-price formula.

However, most tools are not even able to recognize and handle this simple case correctly. How likely is it that the result will be reliable for real-life data?

And with real-life data, you may not recognize these types of problems. The results will still be totally wrong; if the target-price for some parts happens to be correct, it is pure luck.

In terms of reliability and quality, using the wrong regression method on your data is equivalent to throwing dice to get a target-price. You would not do the latter; do you really want to do the former?

If you would like our perfectly generated data sets, whether to evaluate your current tool or because you are comparing performance pricing solutions, just send an email to info@saphirion.com, and we will be more than happy to help.
