Simpson’s Paradox — An Issue of Mix

Explanation & Implications

Decision-First AI


This article will not be the first to use The Simpsons to teach readers about the Yule-Simpson Effect, or Simpson’s Paradox. Oddly, it may be the first to include a mixing board. Homer has little in common with our paradox other than his name. The mixing board has far more, but let’s come back to that…

What is the Yule-Simpson Effect?

Simply stated, it is a situation where a trend that appears in every segment of two data sets reverses when those segments are aggregated. That sounds very academic, so what does it really mean?

Suppose we have two groups of workers. Each group is composed of doctors and janitors. The doctors in Group A make more money than the doctors in Group B AND the janitors in Group A also make more money than the janitors in Group B. Which group has the higher income?

If the “mix” of doctors and janitors is similar in each group, the answer is likely A. But if Group A has a much larger share of janitors than Group B does, the answer will likely reverse. The latter situation creates Simpson’s Paradox. In common parlance, you have a “mix issue”.
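To make the doctors-and-janitors case concrete, here is a minimal sketch in Python. Every headcount and salary in it is invented for illustration; none of these figures come from real data. Group A pays more in both roles, but it is mostly janitors, while Group B is mostly doctors.

```python
# Hypothetical headcounts and salaries, invented purely for illustration.
# Each role maps to (headcount, salary per person).
groups = {
    "A": {"doctor": (10, 220_000), "janitor": (90, 40_000)},
    "B": {"doctor": (90, 200_000), "janitor": (10, 35_000)},
}

def average_income(group):
    """Headcount-weighted mean income across all roles in one group."""
    total_people = sum(count for count, _ in group.values())
    total_income = sum(count * salary for count, salary in group.values())
    return total_income / total_people

# Segment view: within each role, Group A earns more than Group B.
a_wins_every_role = all(
    groups["A"][role][1] > groups["B"][role][1] for role in groups["A"]
)

# Aggregate view: the mix of roles decides which group comes out on top.
avg_a = average_income(groups["A"])
avg_b = average_income(groups["B"])

print(f"A pays more in every role: {a_wins_every_role}")
print(f"Group A average: {avg_a:,.0f}")
print(f"Group B average: {avg_b:,.0f}")
```

With these made-up numbers, Group A averages 58,000 and Group B averages 183,500, even though A wins in both roles. Nothing changed except the mix.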

Averages Are Only Meaningful For Homogeneous Groups

I have railed repeatedly about “the myth of the average”. I have even noted that to verify averages — one should ask to see them segmented or distributed. But doesn’t Simpson’s Paradox refute that? No, it only adds more context.

Averages work well for highly homogeneous groups. In a laboratory, where I can design and control the composition of two groups, averages are a simple and effective tool. The real world is another thing entirely. In the real world, I need to consider “mix issues”.

Even cartoons change over time and production budgets.

In the real world, things tend to change over time. Populations are especially sensitive to change. People age, change jobs, adopt new interests, and even behaviors.

This is doubly true if the population is open to new membership. New members can change a population much faster. They change the “mix”. New customers, new channels, new products, new experiences can create out-sized impacts — because they change the composition of the group.

Let’s Dig Into A Real World Example

For analysts, Simpson’s Paradox can be a gift (or a nightmare). Suppose the average time to close a new sale has been increasing. This is not a good thing at most companies. It is also a trend that almost everyone will be aware of, assuming your company has even the most rudimentary reporting. The question is how? (Actually, you will be asked why… but let’s continue.)

If you dig into the time series and find that within the individual segments the sales time is actually dropping, you are witnessing the Yule-Simpson Effect. Not perfectly, but practically. Real-world analysis rarely fits so tightly in a box! You also know why the sales cycle is lengthening: you have a mix issue.

In this example, you can go back to the sales team and exec staff and report that the composition of your lead population is changing. This in turn is causing the “perceived” lag in sales. While this is only a portion of some real actionable intelligence, it goes over much better than “averages are rising everywhere”. People understand mix issues, but only great reporting or thoughtful analysis makes them visible.
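One way to make a mix issue visible is to split the change in the overall average into a "mix effect" and a "rate effect". The sketch below is only one of several common conventions for that split, and every name and number in it is hypothetical: the segments `smb` and `enterprise`, the quarters, the deal counts, and the days to close are all invented for illustration.

```python
# Hypothetical deal counts and average days-to-close, invented for illustration.
# Each segment maps to (number of deals, average days to close).
periods = {
    "Q1": {"smb": (80, 20.0), "enterprise": (20, 90.0)},
    "Q2": {"smb": (50, 18.0), "enterprise": (50, 80.0)},
}

def shares(period):
    """Fraction of all deals that falls in each segment."""
    total = sum(deals for deals, _ in period.values())
    return {seg: deals / total for seg, (deals, _) in period.items()}

def overall_days(period):
    """Deal-weighted average days to close across all segments."""
    s = shares(period)
    return sum(s[seg] * days for seg, (_, days) in period.items())

before, after = periods["Q1"], periods["Q2"]
s0, s1 = shares(before), shares(after)

# Mix effect: the shift in each segment's share, valued at the old days-to-close.
mix_effect = sum((s1[seg] - s0[seg]) * before[seg][1] for seg in before)

# Rate effect: the change in each segment's days-to-close, weighted by the new share.
rate_effect = sum(s1[seg] * (after[seg][1] - before[seg][1]) for seg in before)

total_change = overall_days(after) - overall_days(before)

print(f"Overall: {overall_days(before):.1f} -> {overall_days(after):.1f} days")
print(f"Mix effect:  {mix_effect:+.1f} days")
print(f"Rate effect: {rate_effect:+.1f} days")
```

In this made-up case the overall average climbs from 34 to 49 days even though both segments got faster. The rate effect is about -6 days, the mix effect is about +21 days, and together they account for the full +15. That is the kind of statement you can take back to the sales team.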

D’OH

Just to close the tree… there were two branches here.

If you find no solid evidence of the Yule-Simpson Effect, something else has changed. There is a really good chance the data set you are using does not have your answer.

You are going to have to dig into your sales pipeline in an operational and forensic style. Aren’t you lucky!

Even if you manage to find the answer — odds are someone is going to be blamed for breaking, changing, or otherwise delaying a process. People don’t like that…

Who shot Mr Burns?

Wrapping things up: Simpson’s Paradox indicates a mix issue between two groups. While the aggregate of one group is higher than the other, the underlying sub-segments tell a different story. Great analysis will dig into the distributions of those groups across an array of dimensions rather than focusing on averages.

Perhaps the Simpsons have one final role to play? While the world is full of Homers who are unaware of this effect and its implications, there are also plenty of Maggies. They may look innocent but they leverage this effect to make the data tell the story they want to tell. Be careful! And thanks for reading!


Decision-First AI

FKA Corsair's Publishing - Articles that engage, educate, and entertain through analogies, analytics, and … occasionally, pirates!