Homer Simpson is not the author of this paradox, but he does suffer from it: he thinks he sees four beers when in reality he is being misled by his simple data analysis.
If you’re starting to dive into Data Science, sooner rather than later you will have to dip your toes into some Statistics. As *Data Science from Scratch* puts it, “Statistics refers to the mathematics and techniques with which we understand data”. We use Statistics to persuade others of what we’re telling them, and most people, organisations and governments base their most important decisions on this kind of organised data. Nevertheless, if you make the mistake of not looking closely enough at your data, it can mislead you.
Simpson’s Paradox occurs when several groups of data each show a particular trend, but that trend reverses when the groups are combined.
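To make the definition concrete, here is a minimal Python sketch that checks for this kind of reversal. The option names `X` and `Y`, the two subgroups and all of the counts are hypothetical, invented purely to illustrate the pattern; they are not data from this article. Each entry is a `(successes, total)` pair, and `Fraction` is used so the comparisons are exact.

```python
from fractions import Fraction

# Hypothetical (successes, total) counts, invented for illustration only.
counts = {
    "X": {"group 1": (8, 10), "group 2": (30, 90)},
    "Y": {"group 1": (70, 90), "group 2": (2, 10)},
}


def rate(successes, total):
    """Exact success rate for one subgroup."""
    return Fraction(successes, total)


def overall(groups):
    """Success rate after pooling every subgroup together."""
    successes = sum(s for s, _ in groups.values())
    total = sum(n for _, n in groups.values())
    return Fraction(successes, total)


def is_simpsons_reversal(a, b):
    """True if one option wins in every subgroup but loses once the groups are combined."""
    a_wins_all = all(rate(*a[g]) > rate(*b[g]) for g in a)
    b_wins_all = all(rate(*b[g]) > rate(*a[g]) for g in a)
    a_wins_overall = overall(a) > overall(b)
    b_wins_overall = overall(b) > overall(a)
    return (a_wins_all and b_wins_overall) or (b_wins_all and a_wins_overall)


for name, groups in counts.items():
    for group, (s, n) in groups.items():
        print(f"{name} {group}: {float(rate(s, n)):.0%}")
    print(f"{name} combined: {float(overall(groups)):.0%}")

print("Reversal:", is_simpsons_reversal(counts["X"], counts["Y"]))
```

With these made-up numbers, `X` does better within each group (80% vs 78%, and 33% vs 20%) yet worse when the groups are pooled (38% vs 72%), because most of `X`'s cases fall in the harder group 2.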
Let’s learn by looking at an example!
# Choosing between two hospitals
Imagine you have an elderly relative who needs to go to the hospital for emergency surgery, and you have to choose between two facilities (Hospital A or Hospital B). According to the latest data available, of the last 1000 patients who went through surgery at each hospital, 900 survived at Hospital A, whereas only 800 survived at Hospital B.
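The naive comparison is a single division per hospital. The sketch below, assuming (as the text implies) 1000 surgical patients at each hospital, computes only these aggregate survival rates; nothing here is broken down by patient group, which is exactly the kind of simple analysis that can mislead us.

```python
# Aggregate figures from the text, assuming 1000 surgical patients per hospital.
patients = {"Hospital A": 1000, "Hospital B": 1000}
survivors = {"Hospital A": 900, "Hospital B": 800}

# Survival rate = survivors / patients, computed on the pooled data only.
survival_rate = {h: survivors[h] / patients[h] for h in patients}

for hospital, r in survival_rate.items():
    print(f"{hospital}: {r:.0%} survival")

# Picking the hospital with the highest pooled rate is the naive decision.
naive_choice = max(survival_rate, key=survival_rate.get)
print("Naive choice:", naive_choice)
```

On these numbers the sketch prints 90% for Hospital A and 80% for Hospital B, so the naive choice is Hospital A.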