CODEX

Measuring the impossible

Finn Shewell
Published in CodeX
4 min read · Mar 11, 2021


🐑🐑🐑🐑🐑

Imagine for a second you’re a farmer. You’ve got a flock of sheep, and at ~$150 a ewe, you want to make sure they’re well looked after — right down to the quality of grass they graze on every day.

Given the choice, would you spend $200,000 installing sub-terrain humidity detectors and taking regular samples of your dirt to ship off to a lab, or would you put accelerometers on your sheep for $50 a pop?

The answer seems clear — and it outlines the massive (and overlooked) value of proxy data. Let’s jump into it.

The term ‘proxy data’ was coined by paleoclimatologists, who work in an unfortunate field if you’re looking to directly measure your focus of study.
Since a paleoclimatologist can’t directly measure the climate at any point other than the present, they rely on other measurements to infer the climate of past ages.
You probably know one of them already: tree rings. The growth of a tree’s rings is influenced largely by the climate during its growth, so if you can measure different properties of a fallen tree’s rings, you can estimate what the climate was like from its days as a sapling through to the day it fell.

This concept can be applied across almost every industry, and you don’t need tree rings to make it happen. It’s all about second-order thinking and a deep understanding of correlation and causation.

Let’s take the farmer’s situation as an example, one I’ve had the fortune to work on myself. Big gratitude to Peter, Sharl, and Luke from Abacus Bio for letting me present this work as a case in this article. The work I did with them is what spurred my thinking on ‘proxy data’ in the first place.

The problem was this:

If a farmer wanted to track a flock’s location, measure sheep health, estimate grass quality, and judge when lambs may be in heat, they had to cobble together several different solutions, at high cost and with a large amount of effort.

We developed a single product that can infer everything stated above and more.

With accelerometer and temperature data, we can calculate the relative distance between any given lamb and ewe, track whether a sheep is straying from the flock (correlated with poor health), measure its rate of grazing (correlated with the nutrient density of your grass), and estimate the likelihood that a given sheep is in heat.
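As a rough illustration of how a raw signal can become a proxy, here is a minimal Python sketch. It is not the product described above: the data layout, the `activity_level` and `flag_outliers` helpers, the sheep IDs, and the threshold of 3 are all assumptions made up for this example. It reduces accelerometer samples (in g) to a single activity score, then flags animals whose score sits far from the flock median.

```python
import math
from statistics import median

def activity_level(samples):
    """Average deviation of acceleration magnitude from 1 g (gravity alone).

    `samples` is a list of (x, y, z) readings in g. A sheep standing still
    reads close to 1 g, so a larger score suggests more movement.
    """
    deviations = [abs(math.sqrt(x * x + y * y + z * z) - 1.0) for x, y, z in samples]
    return sum(deviations) / len(deviations)

def flag_outliers(activity_by_sheep, threshold=3.0):
    """Return IDs whose activity is far from the flock median.

    Uses a robust score: distance from the median divided by the median
    absolute deviation (MAD). The threshold is a hypothetical choice.
    """
    values = list(activity_by_sheep.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # every sheep looks the same, so nothing stands out
    return [sid for sid, v in activity_by_sheep.items() if abs(v - med) / mad > threshold]

# Hypothetical activity scores for five sheep over the same time window.
flock = {"ewe_01": 0.20, "ewe_02": 0.22, "ewe_03": 0.21, "ewe_04": 0.19, "ewe_05": 0.02}
flagged = flag_outliers(flock)
print(flagged)  # ewe_05 moves far less than the rest of the flock
```

A flagged animal is only a prompt to go and look. Whether low movement really signals poor health is exactly the kind of correlation that has to be checked against real outcomes.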

So with only two forms of measurement, we can get clarity on a massive range of insights, for a fraction of the price.

The value proposition is clear: If you can benefit from data that is either prohibitively expensive or outright impossible to measure directly, explore how you might be able to capture insight through proxy data.

A complementary perspective to take on this matter is to ask the following question:

Would you rather capture data that’s 60% accurate that can lead to $20 conversions, or data that’s 100% accurate, but will lead to $10 conversions?

The answer, once you’ve done the math, is simple. Say you take 200 actions based on these insights. With perfectly accurate data, you’ll end up with $2,000 (200 × 100% × $10). Not bad. With almost unusably bad data, you’ll end up with approximately $2,400 (200 × 60% × $20). (Side point: if any data you’re collecting is 60% accurate… let’s fix that.)
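The arithmetic above can be written out as a short Python sketch. It assumes, as the example implicitly does, that an action based on a wrong insight simply earns nothing and costs nothing. The function names are made up for illustration.

```python
def expected_return(actions, accuracy, value_per_conversion):
    """Expected total when each action converts with probability `accuracy`."""
    return actions * accuracy * value_per_conversion

def break_even_accuracy(baseline_accuracy, baseline_value, proxy_value):
    """Accuracy the proxy data needs to match the baseline's expected return."""
    return baseline_accuracy * baseline_value / proxy_value

perfect = expected_return(200, 1.00, 10)  # 100% accurate, $10 conversions
proxy = expected_return(200, 0.60, 20)    # 60% accurate, $20 conversions
threshold = break_even_accuracy(1.00, 10, 20)

print(f"Perfect data: ${perfect:,.0f}")
print(f"Proxy data:   ${proxy:,.0f}")
print(f"Proxy data breaks even at {threshold:.0%} accuracy")
```

Under these assumptions the less accurate data only has to be right half the time to match the perfect data, which is why 60% comes out ahead.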

Anything measured through proxy has the potential to be less reliable — that’s just what happens when we trust correlations. The risk of spurious associations and confounding variables will always be present, unless what you’re measuring has a causal link to what you’re inferring.

So the question you need to answer is this: How valuable is what you’re seeking to measure — and how reliable do you need your data to be?

In some circumstances, the latter question is essentially binary. When diagnosing cancers, you can’t afford to be wrong. A false positive can send a family into financial turmoil without need or reason, and a false negative can cost someone their life. In others, the risk is much lower.
There’s a real skill in balancing risk against reward when weighing the accuracy of your data against the value of the insight.
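One way to put numbers on that balance is to give wrong decisions a cost, which the earlier conversion example left at zero. The Python sketch below is an illustration under that added assumption. The function names and dollar figures are hypothetical and not taken from any real case.

```python
def net_value_per_action(accuracy, gain_if_right, loss_if_wrong):
    """Expected value of one action when mistakes carry a cost."""
    return accuracy * gain_if_right - (1 - accuracy) * loss_if_wrong

def minimum_accuracy(gain_if_right, loss_if_wrong):
    """Accuracy at which acting on the data exactly breaks even."""
    return loss_if_wrong / (gain_if_right + loss_if_wrong)

# Low stakes: a wrong call wastes a little money.
low_stakes = minimum_accuracy(gain_if_right=20, loss_if_wrong=5)

# High stakes: a wrong call is far more expensive than a right one is valuable.
high_stakes = minimum_accuracy(gain_if_right=20, loss_if_wrong=1980)

print(f"Low-stakes break-even accuracy:  {low_stakes:.0%}")
print(f"High-stakes break-even accuracy: {high_stakes:.0%}")
```

As the cost of being wrong grows relative to the gain from being right, the accuracy you need climbs toward 100%, which is the "essentially binary" end of the spectrum.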

If you have something you’d like to measure through proxy but aren’t sure if you can, or if you have a theory you want to test out — get in touch. I’d love to explore that with you.
