The Measurement Problem Part 3: Desperate measures

Ed Roberts
Published in We Are Systematic · 10 min read · Jan 12, 2021

The third measurement problem is a paradox: You can’t manage what you can’t measure, but when any metric becomes a target, it ceases to be a good measure. How do we measure and target well when observation changes the outcome?

This is the third in a series of posts based on quantum physics as an extended metaphor for product development. You can read the first post here, and the second post here. I should caveat by saying I don’t understand quantum mechanics: no-one does. But I do find it fascinating and find it to be a wonderfully compelling and surprisingly comprehensive analogy, so why not?

A tale of two quotes

We all love a good quote. Something that sums up your feelings about a topic so succinctly, or applies just the right idea at just the right angle to blow your mind. If we’re being cynical, quotes are often the perfect alternative to actually knowing things. That’s certainly how I use them.

This post was inspired by two quotes that I’ve seen do the rounds at a number of companies I’ve worked with. These two quotes are interesting in that they cover exactly the same strategic ground, but appear completely contradictory.

The thesis:

“You can’t manage what you can’t measure.”
— Management Consultant Peter Drucker

The antithesis:

“When a measure becomes a target, it stops being a good measure.”
— Economist Charles Goodhart

We won’t play favourites and assume that Goodhart is necessarily correct because he is an economist and Drucker is necessarily full of shit because he’s a management consultant. For the purposes of this problem, let’s assume an equal weight of expertise.

And of course, I have another quote to perfectly describe this paradox; that two equally qualified experts can come to opposite conclusions:

“For every PhD there is an equal and opposite PhD.”
— Gibson’s Law

In this post, I’m going to attempt a synthesis of these two positions. Let’s see how I get on.

The Observer Effect, again

The paradox described above is the third way I’m applying the Observer Effect as an analogy for agency and business strategy. I described this effect in detail in the first post in this series, so if you haven’t already I’d encourage you to start there to save covering old territory here.

If you don’t have time for that or would rather not, here’s a quick recap:

  • The Observer Effect is a phenomenon in quantum mechanics whereby the act of observing an event changes the outcome of the event
  • The most famous thought experiment on the Observer Effect is Schrödinger’s Cat
  • There are many similar Measurement Problems for user researchers, analysts and marketers, the subject of this mini-series
  • In Part 1, I argued the more clients are involved in projects given to agencies, the more they change the outcome — a creative and strategic error
  • In Part 2, we saw how peeking at your A/B tests before they’ve finished can invalidate your results

In this post, we’ll discuss whether it’s possible to use any measure as a target, given that observation necessarily changes the outcome.

You can’t manage what you can’t measure

I’m a great believer in the sentiment Drucker is describing here. By any definition of ‘management’, it’s hard to imagine a scenario where you could control, lead, correct or react well without first knowing what’s occurring.

I’m sure there are a great many successful companies that don’t care much for measurement, but they can’t be lucky forever, and they will certainly waste a great many resources in what is best described as the business equivalent of ‘flapping about’.

When a measure becomes a target, it stops being a good measure

This is the confounding factor to Drucker’s quote. There’s no denying there’s a problem here: as soon as someone knows their job or reputation is tied so concretely to a single measure, you can guarantee that target is going to be met somehow.

This is what economists refer to as the management of incentives. If you set a target with no other strings attached or contextual measures, a person is incentivised to make sure that target is hit.

The strategic problem here is not so much the concern that people will lie. It’s hard to maintain a lie like this without a bubble bursting or getting found out eventually, and most smart people who aren’t desperate will recognise this. Rather, the problem is that the scrutiny will mean the measure itself ceases to be a good reflection of reality, similar to the problems with probability I discussed in part 2 of this series.

Let’s take a recent and high profile example. On 2nd April 2020, the UK health secretary Matt Hancock announced a new target of 100,000 coronavirus tests being conducted per day by the end of the month. On the 1st May, Hancock announced that the target had been met with 122,347 tests conducted in the previous 24 hours.

On the same day, an article in the HSJ revealed “The government changed the way it count[ed]” tests in order to meet that target. The final figure included test kits that had been despatched but not yet completed and returned. This late addition was expected to make up around a third of the final tally.

Whether this is cheating or not is another argument, but it is clear that counting despatched tests is against the spirit of the target. It also demonstrates how the number being tracked no longer represented a fair measure of the number of tests undertaken on that day due to the intense pressure to meet the target.

This is another example of how observation (in this case, targets and scrutiny) changes the outcome (in this case, the quality of a measure to describe the effect it is measuring).

So is all reporting futile? On the basis that measurement is necessary for an effective strategy, how best to do this and maintain the integrity of your targets?

A systematic approach to measurement

Here I can share our approach. In order to maintain the integrity and validity of measures, we prefer to keep things lean and straightforward. Our lean framework is simple enough to provide clarity and focus, but sufficiently transparent that it is clear when a measure has become compromised.

When we start on any project or begin working with a new product, we draft a measurement framework that describes how we will evaluate performance. The framework will be used to track our progress so we know what works and can take action to course-correct if we see the trends start to move in the wrong direction.

Measurement frameworks don’t need to be complicated. Ideally, they are small enough to summarise on a single slide or sketched on the back of a meeting agenda. Keeping it lean makes it easier to reach an agreement between stakeholders and get started.

Step 1: Start with a table

Start with a blank table. Label each row with an area of interest. This could be stages of the customer journey, functional areas of a product or teams and departments, depending on the project.

Measurement frameworks can include a lot of information, but I’ve found you really only need three more columns to get started. So sketch your table out like this:

Lean measurement framework: Areas of interest

Step 2: Key Performance Indicators

First, each area of interest needs a Key Performance Indicator (KPI). This is annoying business jargon, but it’s useful. Your KPI is the one metric that really matters. When it all comes down to it, which number do you really want to see going up when you present your results?

Your KPI should ideally be a contextual metric. That is, a number that describes the rate, percentage or ratio of an effect, not an absolute amount. This is to avoid vanity metrics: reporting big numbers just because they’re big.

If you record 1000 app downloads in a month, is that good? Maybe, it depends on who you are and what the app is. A more useful metric might be the download growth rate or the percentage of users who download then successfully onboard.

Now you’re tracking numbers that mean something. These are contextual metrics. They have their context built-in, and they make the best KPIs.
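
To make that concrete, here’s a minimal sketch of how contextual metrics are derived from raw counts. All the figures are made up for illustration; swap in your own analytics numbers.

```python
# Illustrative raw counts only, not real data.
downloads_last_month = 800
downloads_this_month = 1000
onboarded_this_month = 350

# Download growth rate: change relative to the previous period.
growth_rate = (downloads_this_month - downloads_last_month) / downloads_last_month

# Onboarding rate: share of downloaders who successfully onboard.
onboarding_rate = onboarded_this_month / downloads_this_month

print(f"Download growth rate: {growth_rate:.0%}")  # 25%
print(f"Onboarding rate: {onboarding_rate:.0%}")   # 35%
```

The raw count (1000 downloads) tells you little on its own; the two derived rates carry their context with them.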

And you only get one. Use more than one KPI, and you’re losing focus. It also allows you to cherry-pick good news. Having confidence in your numbers means taking a more scientific approach, which means calling your shots. State in advance which number you care about, then measure the results.

Lean measurement framework: Key Performance Indicators (KPIs)

Don’t get me wrong, other numbers are important. As we discussed before, it can be easy to get one needle to move if you’re happy to sacrifice integrity and context. That’s where our next column comes in: health metrics.

Step 3: Health metrics

Health metrics are how we manage Goodhart’s law. They are everything you need to monitor to know your efforts to improve your KPI aren’t causing harm in other areas. They also provide more context to your KPI, to ensure improvements are being felt in real life, not solely in the numbers.

For example, if your Conversion Rate is going up, but your Average Transaction Value is going down, that’s a problem for your revenue and an indication the system is being gamed to meet targets rather than create real improvement.
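The arithmetic behind that example is worth seeing once. With hypothetical figures (these are assumptions for illustration, not benchmarks), revenue is roughly visitors × conversion rate × average transaction value, so a rising conversion rate can still mean falling revenue:

```python
# Hypothetical figures to illustrate the point, not real data.
visitors = 10_000

# Before: conversion rate 2%, average transaction value £50.
revenue_before = visitors * 0.02 * 50  # £10,000

# After "optimisation": conversion rate up to 3%,
# but average transaction value slips to £30.
revenue_after = visitors * 0.03 * 30   # £9,000

# The KPI (conversion rate) improved, yet revenue fell —
# exactly what the health metric is there to flag.
print(revenue_before, revenue_after)
```

The KPI alone would have looked like a win; the health metric exposes the trade-off.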

Lean measurement framework: Health metrics

Step 4: Where the numbers come from

Finally, we need to agree on how we’ll measure our KPIs. There are often multiple sources of the same data, and unless you’re very lucky they rarely agree exactly, due to the different ways different platforms track and record data. Numbers will also differ over different measurement periods. Just to make sure everyone’s using the same numbers and to avoid an argument, we’ll include this as the final column in our measurement framework.

Lean measurement framework: Source/frequency

As I mentioned above, there are many columns you could include to add richness to your framework, depending on the project. Some of my favourites are:

  • Target: where do we need our KPI to be?
  • Baseline: what’s our starting point based on a rolling average or the last time this was measured?
  • Owner: who is responsible for each KPI?
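
The baseline column in particular benefits from a worked example. Here’s a small sketch of a baseline computed as a rolling average of recent KPI readings; the weekly conversion rates are invented for illustration.

```python
# Invented weekly conversion rates for illustration only.
weekly_conversion = [0.021, 0.024, 0.019, 0.026, 0.023]

# Baseline: rolling average of the most recent 3 periods,
# which smooths out week-to-week noise in the starting point.
window = 3
baseline = sum(weekly_conversion[-window:]) / window

print(f"Baseline (3-week rolling average): {baseline:.3f}")
```

A rolling average makes the baseline less sensitive to one unusually good or bad week than a single last-measured value would be.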

So that's the lean measurement framework — a simple way to create a plan to measure the success of your product, easily fitting on a single slide or constructed collaboratively using a whiteboard and/or sticky notes.

The best approach to any project document is to believe in it strongly and hold it lightly. These things are not carved in stone. Check in with your framework at regular intervals to ensure it’s still a good measure of your progress and performance. If not, change and adapt. This is far preferable to having to explain why you are no longer able to accurately measure your progress against goals!

What about areas of interest that can’t be measured?

Occasionally I do come across the claim that some areas of interest simply can’t be measured. The impact of good PR, for example. Or the level of positive customer feeling towards your product.

Like the good empiricist that I am, I believe that whilst these effects are certainly harder to measure, this claim is basically a cop-out.

To rearrange Drucker’s quote from earlier: If you can manage it, you can measure it.

The basis of the scientific method is investigating the relationship between two variables, in order to prove a causal relationship. The thing you manage, prod or generally mess with is known as an independent variable. The thing that is affected by your ‘messing’ is known as a dependent variable.

For example, if I give my cat more food (independent variable), her weight (dependent variable) will increase.

There are also confounding variables. These are externalities that spoil your cause-and-effect by impacting either the independent or the dependent variable.

For example, if my cat starts eating more mice at night (confounding variable), her weight will increase independently of my giving her food.

There are always ways to measure your area of interest, though it may require clever use of proxies.

Take our example from earlier about the effect of good PR. Maybe you can’t directly measure the effect on your dependent variable (good feeling towards your product or brand), but you can allocate proxy measures from which to triangulate your progress.

In this case, you might measure social media sentiment around your PR activity, or Google searches containing keywords from your PR campaign. These are proxies for the impact you are having. You will have to control for confounding variables of course (perhaps an industry news story breaks that generates interest in your product independent of your PR), but this is doable with sufficient analysis of the data.

Even better, you could create a formula that calculates a single KPI from a range of proxies to ensure clarity in your measurement framework. This has the advantage of creating a meaningful KPI that is bespoke to your specific context.
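One way such a formula might look is a weighted average of normalised proxies. The proxy names and weights below are entirely illustrative assumptions, not a recommended scheme; the point is simply that several proxies collapse into one number for the KPI column.

```python
# Illustrative proxy readings, each normalised to a 0-1 scale.
# The names and values are assumptions for this sketch.
proxies = {
    "social_sentiment_score": 0.72,
    "branded_search_growth": 0.15,
    "press_mention_growth": 0.40,
}

# Weights reflect how much you trust each proxy; they sum to 1.
weights = {
    "social_sentiment_score": 0.5,
    "branded_search_growth": 0.3,
    "press_mention_growth": 0.2,
}

# Weighted average gives a single bespoke KPI for the framework.
composite_kpi = sum(proxies[k] * weights[k] for k in proxies)
print(f"Composite brand KPI: {composite_kpi:.2f}")
```

The weighting is a judgement call, so record it in the framework alongside the data sources; otherwise the composite itself becomes an opaque number that’s easy to game.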

So next time an agency claims their artistic bravery and brilliance is simply far too nebulous for such a base practice as measuring effect, treat this with some scepticism.

Conclusion

The third measurement problem is a paradox: you can’t manage what you can’t measure, but as soon as a measure becomes important enough to be a target it ceases to be useful. This is the Observer Effect at work again, where the mere act of measurement changes the outcome.

Measurement is important. In fact, anything you intend to manage can, and should, be measured to ensure you’re having the effect you intend. Agencies have a responsibility to their clients to actively report on their own success and course-correct when things are clearly not working.

So to ensure you can measure without compromising the integrity of your results, try a lean measurement framework as described in this post. Assign each area of interest one (and only one) KPI based on a good quality contextual metric. Identify health metrics to contextualise and ensure the validity of your primary measure. Agree where the data will come from and how often it will be reported. Believe in it strongly, but hold it lightly.

I’ll end this post as I started, with another favourite quote of mine:

“The amount of energy needed to refute bullshit is an order of magnitude bigger than to produce it.”
— Brandolini’s Law


Partner and product strategist at We Are Systematic, an agency specialising in evidence-driven design.