Michael Blastland
Feb 4 · 8 min read

Question: How many sheep?

Two? Or maybe one?

Does the lamb count?

No? A lamb isn’t a sheep, then?

All right, how about one and a half?

But not so fast… the ewe is about to give birth, imminently (I made that up, but let’s pretend).

So, what about one and two halves? That makes two, like some of you said, even if you didn’t mean that kind of two. Or maybe one, plus a half, plus a quarter?

And no, the white thing just visible above the one sheep that’s most uncontestably a sheep doesn’t count: that’s a rock.

Oh, come on… make up your mind!

You take the point: counting higher than one just got difficult, and the reason is all to do with definition. Whenever we count something, we define it. And if we want to count two, we’re saying they’re in some important way the same thing.

The simple idea here is that definitional ambiguity leads to uncertainty: how many sheep are there if we don’t agree on what a sheep is, or on what season to count them in (maybe not spring), and so on?

What we measure, using what definitions and methods, can be highly contestable, and in much of our public data these things can and do change. This means that even if our counting itself is perfectly accurate, there will still be uncertainty.

Sheep are at the relatively easy end of the problem and — you’ll gather — not really what this post is about. We have bigger statistical fish to fry (don’t you love metaphors?), like comparisons of rates of cancer or other disease between countries, for example, or comparisons of productivity or GDP growth.

It’s the latter, the economic measures, that are our focus here. Vital to our understanding of what’s going on in the economy and capable of rattling politics with one bad set of numbers, these too are full of definitional headaches.

That matters for two main reasons. First, the effects of definitional changes in economics can be huge. Second, despite this, the implications for the reliability of any given economic number, and so for any policy based on that number, are often ignored.

Occasionally, though, the issue is forced into the open. Take UK productivity. In the long run, said the economist Paul Krugman, productivity is almost everything. As we produce more value for each hour worked (in the case of labour productivity) a modern economy grows richer.

Among current concerns about UK productivity are the apparent facts that its growth has slowed dramatically in recent years but we’re not entirely sure why — known as the ‘productivity puzzle’ — and that labour productivity per hour is strikingly lower than in many other large economies. You don’t have to look far to find people who are agitated. Take this comment piece, for example, from The Times in January.

But the UK is not lagging 20% behind. Or at least we don’t think so. And not because it abruptly caught up. About a month earlier in fact, the best estimate of the productivity gap fell overnight from about 20% to about 10% — an enormous correction. The OECD, working with the UK’s Office for National Statistics, said it had spotted differences in the way countries counted how many hours people worked. And if hours worked are counted differently, then output per hour also changes.

The UK had been making very few adjustments to reported working hours. In France, by contrast, those reported hours are corrected for holidays, strikes, sick leave and so on. This means French working hours appear shorter, and output per hour higher, than they would under the old UK formula. Once on a more like-for-like basis, it turns out that the notoriously overworked Brits actually work shorter hours than the notoriously long-lunching French — and the productivity gap between the two countries roughly halves.* Here’s how the OECD showed the new data.

As a consequence of the change in estimated working hours, the gap in labour productivity with the United States is also around 8 percentage points smaller than previously thought — closing from 24% to 16%. With Germany, the gap shrinks from 22% to 14%. UK productivity also leapfrogs Italy.
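The arithmetic behind a shift like this can be sketched in a few lines. This is only an illustration, not the OECD’s method: it assumes that output and the comparison country’s productivity are held fixed and that only the UK’s measured hours change, which is a simplification (the real revision adjusted hours across several countries). The function names `gap` and `implied_hours_revision` are made up for the example.

```python
def gap(own: float, benchmark: float) -> float:
    """Fractional shortfall of `own` output per hour against `benchmark`."""
    return 1.0 - own / benchmark


def implied_hours_revision(old_gap: float, new_gap: float) -> float:
    """Fractional change in measured hours that would move the gap from
    old_gap to new_gap, ASSUMING output and the benchmark stay fixed."""
    # own / benchmark = 1 - gap, and own = output / hours, so
    # hours_new / hours_old = (1 - old_gap) / (1 - new_gap).
    return (1.0 - old_gap) / (1.0 - new_gap) - 1.0


if __name__ == "__main__":
    # Gap figures are the ones quoted above; everything else is assumed.
    for label, old, new in [("US", 0.24, 0.16), ("Germany", 0.22, 0.14)]:
        change = implied_hours_revision(old, new)
        print(f"{label}: gap {old:.0%} -> {new:.0%} "
              f"implies measured hours revised by {change:+.1%}")
```

Under those assumptions, a cut of a bit under a tenth in measured UK hours is enough to close eight percentage points of the gap, without anyone producing a penny more.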

Clearly, the gaps with France, Germany and the US haven’t disappeared. They are still, some might argue, embarrassing — though there might also be benign reasons for some of the gap in some cases. For example, France has fewer people in work and higher unemployment — UK about 4% unemployment, France about 9% (OECD). Are many of the UK’s extra jobs low-cost, low-wage, therefore often calculated as low productivity? (NB: This is no reflection on how hard low-paid people work, and it is not a moral valuation. It’s simply how the calculation of economic output is done). We could probably raise average productivity per hour in the UK if we sacked a few. But do you think we should? In other words, being behind might not be all bad.

However we interpret the gap, this was unquestionably a massive correction, with hindsight suggesting that we should have had serious doubts about how big the gap really was.

And yet, there was little sign of doubt in the presentation of that old data, nor much in those who discussed it. When The Times picked up the figure of 20%, presumably from an ONS press release or website, the published data had not yet been revised to take account of this latest methodological correction, and the paper appears to have been unaware of the OECD’s findings, or that a correction was on the way. Yet with hindsight we see that, because of the potential for methodological or definitional differences, we should not have accepted those old numbers with anything like confidence.

The problem extends further. Labour productivity has two components: hours worked — discussed above — and how much value we crank out of them. Definitional problems apply to both. Any measure of national output, like GDP, also rests on a thousand-and-one definitions, some subject to intense research and argument about what might have changed in an increasingly digital and service-based economy where output is far from easy to define. Some commentators suggest that current definitions fail to capture significant chunks of that output. Presumably, these definitions might also be revised.

The extensive doubts about what’s going on in these cases are essentially caused by definitional ambiguity, incorporated into methodology. What should we count when we count hours worked? When we compare counts in one place with those in another, are we sure we have every potential variation in method nailed down?

As for differences between places, so for differences in time, which can also raise methodological questions about GDP. Are we sure that our methods, whatever they are, don’t need to be changed in some way to account for changing patterns of work or output?

No, we are not sure, because we can’t be sure. We can’t know all the problems that might lurk in the data, old, new or emerging, as a result of our choice of method or definition, or with what magnitude of effect.

Then what do we do?

Do we assume we have now fixed the productivity comparison? Or could there be something else going on? Should we just whack on an extra ten percentage point potential error just in case, in either direction? What if definitions of output do also change — as seems likely at some point — when we find new ways to describe value in a service-based, digital economy, and what if these changes have varied effects on different economies which have different balances of economic activity — some more manufacturing based, others more service heavy? Then productivity will change again, and so will the gaps, even though real productivity — whatever that is — will be chugging along in its own sweet way wholly unaffected by any of these changes to what we say it’s doing.

When I discussed the productivity correction with the Winton Centre’s David Spiegelhalter, he said he was amazed at how little attention it attracted.

“This is so important, and it is generally glossed over,” he said. “I think politicians and journalists are deeply disturbed at realising the frailty of these concepts that they use in arguments, and would simply rather not know.”

I think he’s right that few are ready to face up to the frailties. On the other hand, I suspect a lot of us know deep down that these things couldn’t be other than frail. So we finish up with a classic cognitive dissonance: we know it’s dodgy, but we’d really, really like it to be reliable, and the way that we talk about the published figures, obsessing over small changes, implies that we simultaneously think there’s a solid ‘productivity’ or ‘GDP’ out there — and that the number is somehow true. The problem, then, is how to force the chatter out of its denial about what many know are the awkward limitations.

But David also invites a question: whether we should think of definitional problems as ‘uncertainty’: “I would not even call this uncertainty, since what is it we are uncertain of? Unless you think there is some true, platonic, ‘productivity’ out there, this is all convention, and so it is all ambiguity.

“‘Uncertain’ should mean there is at least a theoretical possibility of being certain, either through additional knowledge or simply waiting to see what happens. The hint is in the term: ‘uncertain’.”

From a practical point of view, I guess what most matters is whether the economy — including productivity — is going in a good direction for the right reasons at a reasonable rate, so that if it’s not we can try to change policy. In that sense, I think we use the word ‘uncertain’ to qualify our confidence in what we think is a reasonable judgement based on the data available.

But if we’re altogether uncertain how uncertain we should be, or what the thing is that we’re uncertain of, perhaps the problem goes beyond that.

The least we can do is acknowledge it — which journalism almost never does, and even the ONS doesn’t do much. I’d be delighted to hear any ideas about what a suitable qualification to the data should look like in these cases… if it’s possible to frame one. Meanwhile, we should use examples like the OECD’s revisions of labour productivity to remind ourselves that numbers we might be inclined to take for granted can be badly wrong in ways we hadn’t even thought of or understood.

*Do the French really work less than the Brits? Average working hours might not be the best measure. If a larger part of the working population in the UK than in France is part-time (UK about 23%, France about 14%, according to the OECD), then average UK hours will look low, even though full-time UK employees might be the bigger slaves to the desk. A definition of working hours that might sound sensible for the purposes of assessing productivity might be less good at telling us what we want to know about a long working hours culture.
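A rough sketch of that averaging effect, for anyone who wants to see it in numbers. The part-time shares are the OECD figures quoted above; the weekly hours for part-time and full-time workers are invented purely for illustration, and `average_hours` is a hypothetical helper, not anyone’s official formula.

```python
def average_hours(part_time_share: float,
                  part_time_hours: float,
                  full_time_hours: float) -> float:
    """Average weekly hours across all workers, as a weighted mean."""
    return (part_time_share * part_time_hours
            + (1.0 - part_time_share) * full_time_hours)


if __name__ == "__main__":
    # Shares from the text; the hours below are made-up assumptions.
    uk = average_hours(0.23, part_time_hours=20, full_time_hours=42)
    france = average_hours(0.14, part_time_hours=20, full_time_hours=40)
    print(f"UK average:     {uk:.1f} hours")
    print(f"France average: {france:.1f} hours")
```

With these invented hours, UK full-timers work longer than French full-timers, yet the UK average still comes out lower, simply because more of the UK workforce is part-time.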


The Winton Centre for Risk and Evidence Communication is hosted within the Department of Pure Mathematics and Mathematical Statistics in the University of Cambridge. Transparent evidence designed to inform, not to persuade.
