Evaluating “Performance Pricing” Solutions (Part 1)

Saphirion AG
Performance Pricing
Mar 23, 2016

In the last part, we explained how a performance pricing solution works and what math challenges it needs to solve. In this part, we dig a bit deeper into some of these challenges and show which aspects a good performance pricing solution like NLPP needs to support.

No worries: understanding some of these aspects is possible and makes a lot of sense even if you are not a mathematician. You will get a better understanding of the pitfalls of applying statistical methods.

One method can’t handle every case

As we explained at the end of the last post: the first and most critical aspect to understand is that one regression method (one mathematical approach) can't handle all the cases you have. There is just no single "one size fits all" approach for statistical price analysis.

Ok, I agree, I have to prove my statement…

Most performance pricing solutions support only one method, called LPP-LSM. LPP-LSM stands for "Linear Performance Pricing with Least-Square-Method". It is the classical approach mentioned and used in almost everything you can find on this topic. It's a linear approach using a simple approximation method. It's so popular because there is a lot of literature about it and it's pretty straightforward to implement.

I want to focus on just one property of LPP-LSM that proves it can't handle all cases correctly. By this, I don't mean "doesn't give a satisfactory result"; I mean "the result is wrong, totally wrong". The problem is that the price estimation formula produced by LPP-LSM can give negative target prices. In other words, the model predicts a negative purchasing price.
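To see this concretely, here is a minimal sketch with made-up part data (the weights and prices are invented for illustration, not taken from NLPP): because the least-squares fit ends up with a negative intercept, a very light part gets a negative target price.

```python
# Minimal sketch (hypothetical data): ordinary least squares can
# return a negative target price for a small, light part.
import numpy as np

weight = np.array([1.0, 2.0, 4.0, 8.0])      # price driver: weight [kg]
price  = np.array([5.0, 12.0, 30.0, 70.0])   # observed purchase price

# Fit price ~ b0 + b1 * weight via least squares
A = np.column_stack([np.ones_like(weight), weight])
(b0, b1), *_ = np.linalg.lstsq(A, price, rcond=None)
print(b0, b1)            # roughly b0 = -6.0, b1 = 9.4

# Target price for a very light part (0.1 kg)
print(b0 + b1 * 0.1)     # about -5.06: a negative "target price"
```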

Need I say more?

And to fix this, the standard approach is to ignore the parts that get negative target prices. Using only the target price predictions that are positive is a loss of reality. One can't reason like: "This part of the result doesn't fit my expectation, so I don't use it, but this part here does, hence I use it." That's unsound. Either a model holds, or it doesn't.

There are more examples demonstrating that supporting only one method can't be the right way to build a good performance pricing solution. It's like the saying: if all you have is a hammer, everything looks like a nail.

Good performance pricing solutions support more than one analysis method to better handle reality.

R2 (R-squared) looks nice, but is not your friend

Another aspect is how to check whether a model is good or not. When using a performance pricing solution, we want to be confident that the resulting price estimation formula is right and that we can trust it. A very reasonable requirement, of course.

I bet most solutions utilize the great R2 indicator for this. I further bet that most users use R2, or want to use it, to get a good feeling about the result. And most will apply a simple rule:

The higher the R2 value is, the better my model is.

And when doing so, questions like "what's a good value for R2?" or "how big does R2 need to be for the regression model to be valid?" are asked next. If people are adventuresome, they even make claims like: "a model is not useful unless R2 is at least x", where x is some fraction greater than 50%.

The correct response to these questions and ideas should always be a polite smile followed by a “That depends!” or a “These are not the right questions to ask when doing performance pricing.” Everything else should make you suspicious. You risk getting a “too good to be true” pseudo-precision solution that doesn’t correspond to reality.

But before we start, I want to explain what R2 is and does (mathematically):

R2 is the “percent of variance explained” by the model. R2 is the fraction by which the variance of the errors is less than the variance of the dependent variable.
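Written as a formula (the standard definition, with y_i the observed prices, ŷ_i the model's predictions, and ȳ the mean of the observed prices):

$$ R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} $$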

Please note, this does not mean "how good my model is", even though most performance pricing solutions use R2 in exactly that wrong manner.

The following properties of R2 show why R2 is not a reliable indicator for performance pricing solutions, in my opinion. My points here mostly apply to "adjusted R2" as well.

  1. R2 becomes better the more independent variables (price drivers) you use. Adjusted R2 tries to compensate for this and is always smaller than R2, but the difference is usually very tiny (unless you are trying to estimate too many coefficients from too small a sample in the presence of too much noise, which is a different problem).

    Let's assume for a moment, wrongly, that R2 means "how good my model is". With the stated property, this would translate to "adding more variables makes my model better". So, just by using more price drivers, I can get a better model, independently of whether the added variables make any sense or not (see the sketch after this list).
  2. For some variables, the "gets better" jump is bigger than for others. Hence, some performance pricing solutions will show you by how many % the R2 value will get "better" when you add a particular variable. Such suggestions give you the illusion that you are making a good move in your analysis. You aren't: R2 gets "better" anyway, but your resulting model does not necessarily.
  3. LSM (least-squares) based models get the highest R2 values. This is caused by how R2 is calculated: the calculation fits the LSM idea, hence you get good R2 values. If we use different regression methods, the R2 value is smaller or, even worse, isn't reasonable to calculate at all.

    For example, LSM is a mean-based approach, and R2 is mean-based. That's why the combination "works".

    When you use a median-based approach, what is the meaning of R2? R2 uses a mean-based calculation, so how do you apply it to a median-based result? That's impossible, or better stated: it doesn't make any sense.
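Here is the sketch referenced in point 1, with invented data (the numbers and the random seed are arbitrary): fitting the price on weight alone, and then on weight plus a purely random column, shows that R2 can only go up when a variable is added, no matter how meaningless that variable is.

```python
# Minimal sketch (made-up data): adding a pure-noise "price driver"
# never lowers R2, even though it cannot improve the model.
import numpy as np

rng = np.random.default_rng(42)
n = 50
weight = rng.uniform(1, 10, n)                     # real price driver [kg]
price = 4.0 + 9.0 * weight + rng.normal(0, 5, n)   # price with some noise
noise = rng.normal(size=n)                         # meaningless extra "driver"

def r_squared(X, y):
    # R2 of an ordinary least-squares fit of y on X (with intercept)
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return 1 - residuals.var() / y.var()

print(r_squared(weight.reshape(-1, 1), price))             # weight only
print(r_squared(np.column_stack([weight, noise]), price))  # weight + noise: never smaller
```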

Wikipedia lists some other aspects. R2 does not indicate whether:

  • the independent variables are a cause of the changes in the dependent variable;
  • omitted-variable bias exists;
  • the correct regression was used;
  • the most appropriate set of independent variables has been chosen;
  • there is collinearity present in the data on the explanatory variables;
  • the model might be improved by using transformed versions of the existing set of independent variables;
  • there are enough data points to make a solid conclusion.

I think at the end of the day, a user of a performance pricing solution shouldn’t care about these things at all.

Good performance pricing solutions find the “best” possible model automatically and let the user focus on the results.

Impact of a price-driver

It is interesting to understand which of the price drivers (the independent variables) affect the price (the dependent variable) most. Knowing the impact helps to prove supplier arguments like "This part is so expensive because A, B, C are so …" wrong. If you know that A, B, and C have low impact, you know how to respond.

Here is an example: target-price = 3.558 + 0.021 * 'Quantity [Pcs]' + 0.102 * 'Weight [kg]' + 0.013 * 'Diameter [mm]' + 0.020 * 'Width [mm]'
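For instance, plugging a hypothetical part (100 pieces, 2 kg, 50 mm diameter, 30 mm width; the values are invented for illustration) into this formula gives:

```python
# Evaluating the example target-price formula for one hypothetical part
quantity, weight_kg, diameter_mm, width_mm = 100, 2.0, 50.0, 30.0
target_price = (3.558 + 0.021 * quantity + 0.102 * weight_kg
                + 0.013 * diameter_mm + 0.020 * width_mm)
print(target_price)   # 7.112, in the same currency as the training prices
```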

Most would answer the plain question “Which of the price drivers has the highest impact on the price?” by stating: “Weight, because the coefficient has the biggest value.” Some performance pricing solutions do the same thing.

But it's wrong and pretty simple to prove: let's change the unit of "Weight" from "kg" to "g". Hence, we multiply every value by 1000. What happens to the coefficient in this case? It gets divided by 1000, because, at the end of the day, just changing the unit of one variable can't lead to any other result.

However, if you now just look at the coefficients, “Weight” is no longer the one with the biggest impact. Nonetheless, nothing changed in your data.
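A small sketch with invented data illustrates the point. The standardized ("beta") coefficient computed at the end is one common unit-independent impact measure; it is not necessarily the exact impact number a particular solution reports.

```python
# Minimal sketch (made-up data): switching weight from kg to g divides its
# coefficient by 1000, while the fitted model stays the same.
import numpy as np

rng = np.random.default_rng(7)
n = 40
weight_kg = rng.uniform(0.5, 5.0, n)
diameter  = rng.uniform(10, 100, n)                      # [mm]
price = 3.5 + 0.1 * weight_kg + 0.013 * diameter + rng.normal(0, 0.2, n)

def fit(X, y):
    # ordinary least-squares coefficients (intercept first)
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

coef_kg = fit(np.column_stack([weight_kg, diameter]), price)
coef_g  = fit(np.column_stack([weight_kg * 1000, diameter]), price)
print(coef_kg[1])   # weight coefficient per kg, roughly 0.1
print(coef_g[1])    # same model, but per g: roughly 0.0001

# One unit-independent impact measure: standardized ("beta") coefficients,
# i.e. each coefficient scaled by std(driver) / std(price)
beta_kg = coef_kg[1:] * np.array([weight_kg.std(), diameter.std()]) / price.std()
beta_g  = coef_g[1:]  * np.array([(weight_kg * 1000).std(), diameter.std()]) / price.std()
print(beta_kg)
print(beta_g)       # identical: the unit change cancels out
```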

Good performance pricing solutions show you an impact number that will be the same when you change units.

Conclusion

None of us is a good intuitive mathematician, and we are even worse intuitive statisticians. Don't be fooled by simple explanations that sound so "logical" that you don't bother to check them. Most of these are just plain dishonest.

We have created a short questionnaire that lists even more aspects to look at. If you are interested, just drop us a message.
