[Image: Homer Simpson Paradox]

Simpson’s Paradox — How statistics can be misleading

Super Albert
Aug 9, 2018 · 3 min read

Homer Simpson is not the author of this paradox, but he suffers from it: he thinks he sees four beers when in reality he is being misled by his simple data analysis.

If you’re starting to dive into Data Science, sooner or later you will probably have to dip your toes into some Statistics. According to Joel Grus, in Data Science from Scratch, “Statistics refers to the mathematics and techniques with which we understand data”. We use Statistics to persuade others to believe what we’re telling them. Most people, organisations and governments base their most important decisions on this kind of organised data. Nevertheless, if you make the mistake of not looking closely enough at your data, you might be misled by it.

Simpson’s Paradox occurs when separate groups of data each show a particular trend, but that trend is reversed when the groups are combined.

Let’s learn by looking at an example!

# Choosing between two hospitals

Imagine you have an elderly relative who needs to go to hospital for emergency surgery, and you have to choose between two facilities (Hospital A or Hospital B). According to the latest data available, of the last 1000 patients who went through surgery at each hospital, 900 survived at Hospital A whereas only 800 survived at Hospital B.

[Image: Survival rate comparison between the two hospitals for the last 1000 surgery episodes]

So the first assumption would be to choose Hospital A over Hospital B, right? Wrong…

We have to take into account that not all patients arrived at the hospital in the same state of health. If we divide the results into patients who arrived in good health and patients who arrived in poor health, we start to see a difference.

In the case of Hospital A, of the 100 patients who arrived in poor health, 30 survived (a survival rate of 30%). Hospital B, on the other hand, had 400 patients arriving in poor health, of whom 210 survived. Thus, we can conclude that Hospital B, with a survival rate of 52.5%, is the better choice for patients who arrive in poor health.
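To check the arithmetic so far, here is a minimal Python sketch. The counts are the ones quoted in the text; the dictionary layout and the `survival_rate` helper are just illustrative names, not part of any library.

```python
# Counts quoted in the article: (survived, total patients).
overall = {"Hospital A": (900, 1000), "Hospital B": (800, 1000)}
poor_health = {"Hospital A": (30, 100), "Hospital B": (210, 400)}


def survival_rate(survived: int, total: int) -> float:
    """Return the share of patients who survived, as a fraction between 0 and 1."""
    return survived / total


for hospital in overall:
    all_rate = survival_rate(*overall[hospital])
    poor_rate = survival_rate(*poor_health[hospital])
    print(f"{hospital}: overall {all_rate:.1%}, poor health {poor_rate:.1%}")
```

Hospital A leads on the overall figure, while Hospital B leads among the patients who arrived in poor health.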

[Image: Surgery survival rate comparison between the two hospitals, for patients arriving in poor health]

Ok, that’s if the patient is in poor health… what about if the patient is in good health?

Curiously, Hospital B continues to be the better choice, with a survival rate of 98%! So now comes the million-dollar question: how can Hospital A have a better overall survival rate if Hospital B has a better survival rate for patients in each of the two groups!? The answer is that the two hospitals did not treat the same mix of patients: only 100 of Hospital A’s 1000 patients arrived in poor health, compared with 400 of Hospital B’s, so Hospital A’s overall figure is dominated by the easier cases. Congratulations, you’ve now understood Simpson’s Paradox! The same set of data can appear to show opposite trends depending on how it is grouped. This occurs when aggregated data hides a conditional variable (also known as a lurking variable), a hidden factor that significantly influences the results. Here the conditional variable is the patient’s health when arriving at the hospital.
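The good-health counts are not spelled out in the text, but they follow by subtraction if we assume every patient falls into exactly one of the two groups. The Python sketch below works them out under that assumption and then rebuilds each hospital’s overall rate as a weighted average of its two group rates, which is where the reversal comes from. The variable and function names are illustrative only.

```python
# Figures stated in the article: (survived, total patients).
overall = {"Hospital A": (900, 1000), "Hospital B": (800, 1000)}
poor = {"Hospital A": (30, 100), "Hospital B": (210, 400)}

# Assumption: every patient is either in poor or in good health, so the
# good-health counts are what remains after subtracting the poor-health ones.
good = {
    name: (overall[name][0] - poor[name][0], overall[name][1] - poor[name][1])
    for name in overall
}


def rate(survived: int, total: int) -> float:
    """Fraction of patients who survived."""
    return survived / total


for name in overall:
    poor_share = poor[name][1] / overall[name][1]  # fraction of patients in poor health
    good_share = good[name][1] / overall[name][1]  # fraction of patients in good health
    # The overall rate is a weighted average of the two group rates.
    blended = poor_share * rate(*poor[name]) + good_share * rate(*good[name])
    print(
        f"{name}: good health {rate(*good[name]):.1%} ({good[name][1]} patients), "
        f"poor health {rate(*poor[name]):.1%} ({poor[name][1]} patients), "
        f"overall {blended:.1%}"
    )

# Hospital B wins inside each group, yet Hospital A wins on the combined data.
b_wins_each_group = (
    rate(*good["Hospital B"]) > rate(*good["Hospital A"])
    and rate(*poor["Hospital B"]) > rate(*poor["Hospital A"])
)
a_wins_overall = rate(*overall["Hospital A"]) > rate(*overall["Hospital B"])
print("Simpson's paradox present:", b_wins_each_group and a_wins_overall)
```

Under that assumption, Hospital A treated 900 good-health patients (870 survived, about 96.7%) and Hospital B treated 600 (590 survived, about 98.3%, the 98% quoted above). Because 90% of Hospital A’s patients sit in the high-survival group against 60% of Hospital B’s, the weighting alone is enough to flip the overall comparison.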

So, in the case of Homer Simpson, his conditional variable was possibly the fact that he was already under the influence of alcohol, and therefore what he was seeing was not actually true.

This text was based on Mark Liddell’s YouTube video; watch it to see more awesome examples.

Don’t forget, if you liked it, please give it a round of applause!

[Image: You the mighty reader applauding!]
