On our flight home from a hot summer holiday in Sardinia, my daughter’s attention is caught by a visualisation in TIME magazine in an interesting article titled What are my risk factors?
The colourful visualisation displays the top five causes of death in 2013 for nine age groups in the US.
The visualisation is structured in two parts. On top of the spread, a bump chart conveys the top five causes of death in each age group through colored bands of varying thickness. The bands represent percentages of total deaths for each cause. For instance, in the first age group, Children under 3, the diagram does not tell us how many deaths there are in total, but it tells us that, out of those who die, 20% die because of birth defects, 18% because of premature births, 7% because of pregnancy complications, and so on.
Further down the page, a diagram represents the total number of deaths in each age group using black circles. For instance, in the first age group, Children under the age of 3, there is a circle representing 23,440 total deaths.
Scared of becoming shark-food during the entire holiday while swimming in crystalline water, my daughter Matty dives into her own age group’s statistics and gets very surprised by the causes of death she is most exposed to.
In her age group, 10 to 14 years old, the top five causes of death are:
- Accidents 27%
- Cancer 15%
- Suicide 13%
- Birth defects 6%
- Homicide 5%
Her eyes do not notice the little black circle placed below the colorful display. That black circle would tell her that the total number of deaths in her age group is 2,913, far from the biggest number when compared with other age groups. For example, check the age group 45–54, where the number of deaths is 177,724.
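To see how the colorful percentages and the black circle fit together, here is a minimal Python sketch that multiplies each share by the total for the 10 to 14 age group. The shares and totals are the figures quoted above. The variable and function names are only illustrative, and because the published percentages are rounded, the resulting counts are approximations.

```python
# Figures quoted above from the TIME visualisation (US, 2013).
TOTAL_DEATHS_10_TO_14 = 2_913
TOTAL_DEATHS_45_TO_54 = 177_724

# Share of all deaths in the 10-14 age group, per cause (rounded percentages).
SHARES_10_TO_14 = {
    "Accidents": 0.27,
    "Cancer": 0.15,
    "Suicide": 0.13,
    "Birth defects": 0.06,
    "Homicide": 0.05,
}


def approximate_counts(total_deaths, shares):
    """Turn each share of total deaths into an approximate number of deaths."""
    return {cause: round(total_deaths * share) for cause, share in shares.items()}


counts_10_to_14 = approximate_counts(TOTAL_DEATHS_10_TO_14, SHARES_10_TO_14)
for cause, count in counts_10_to_14.items():
    print(f"{cause}: about {count} deaths")

# How many times more deaths occur in the 45-54 group than in the 10-14 group.
ratio = TOTAL_DEATHS_45_TO_54 / TOTAL_DEATHS_10_TO_14
print(f"The 45-54 group has about {ratio:.0f} times as many deaths")
```

The widest band in Matty's age group, accidents at 27%, corresponds to roughly 787 deaths, while the 45–54 group as a whole has about 61 times as many deaths as hers.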
The visual emphasis on the colorful ‘rivers’ of the bump chart that fills up the entire spread makes it look as if loads of people are dying in every age group all the time. Matty looks a bit upset, and she is not the only one. I’m also struggling to make sense of the numbers.
As a dad and as an information designer, I feel the need, not to say the duty, to explain to her why the colorful percentages look so overwhelming.
That is easier said than done.
“Matty, the colorful ‘bands’ do not represent the actual number of people dying but the proportion of different causes of death in each age group. The total number of people dying in each age group can differ a lot.”
“The thing is, different numbers of people die at different ages.”
My attempt does not help her understand the big picture or the probability of death.
Percentages are like that, tricky to understand. Not just for my daughter, but also for scientists, journalists and, of course, information designers.
Prof Gerd Gigerenzer is a leading expert in risk perception and he tries to explain why we struggle to understand probability and risk in percentages.
In an entertaining TEDx talk he tells the funny story of a TV newscaster who once announced the weather like this: ‘The probability that it will rain on Saturday is 50%. The probability that it will rain on Sunday is also 50%. Therefore,’ the presenter concluded, ‘the chance that it will rain on the weekend is 100%.’
The audience laughs. You too, probably!
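Why the newscaster is wrong can be checked with a few lines of Python. This is only a sketch under one big assumption that is not part of the story: that rain on Saturday and rain on Sunday are independent events. Real weather on consecutive days is usually correlated, so the exact figure would differ, but it can never be found by simply adding the two percentages. The function name is made up for this illustration.

```python
def chance_of_rain_on_either_day(p_saturday, p_sunday):
    """Probability of rain on at least one of the two days.

    ASSUMPTION: the two days are independent, which real weather rarely is.
    """
    # The weekend stays dry only if both days stay dry.
    p_dry_weekend = (1 - p_saturday) * (1 - p_sunday)
    return 1 - p_dry_weekend


weekend = chance_of_rain_on_either_day(0.5, 0.5)
print(f"Chance of rain at some point during the weekend: {weekend:.0%}")
```

Under that assumption the answer is 75%, not 100%: there is still a one-in-four chance that both days stay dry.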
Prof Gigerenzer continues: ‘Most of us smile at that, but do you know what it means if the weather report announces a 30% chance of rain tomorrow? 30% of what?’
Prof Gigerenzer, who lives in Berlin, shares what Berliners think a 30% chance of rain tomorrow means:
- Some think it means that it will rain 30% of the day tomorrow (about 7 hours)
- Some think it means it will rain in 30% of the region tomorrow
- Some think it means that 3 of 10 meteorologists think it will rain tomorrow
All of the above are wrong, but they show that many of us struggle to evaluate probability in percentages.
Getting soaked is a minor problem, but think about what happens when this confusion concerns medical information given to you in order to make a decision about your health or the health of people you love. Suddenly it is a much more serious issue.
100% increased probability — what does that mean?
In the UK, the media informed the public about the “doubled” probability of thrombosis women are exposed to when they switch to the new, third-generation birth control pill. The claim in the media was that there is a 100% increased probability of getting a blood clot if women switch from the second- to the third-generation pill.
This sounds really bad. Many women thought so too and stopped taking the pill altogether. The result was a dramatic increase in abortions and unwanted teen pregnancies.
If the media had used absolute numbers instead of percentages, it would have been a lot easier to understand the risk.
In fact, the absolute risk for the second- and third-generation birth control pills looks like this:
- Second generation: 1 in 7,000 had a thrombosis
- Third generation: 2 in 7,000 had a thrombosis
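The two figures above are enough to reproduce both ways of telling the story. The short Python sketch below computes the relative increase (the headline number) and the absolute increase (the number that matters for a decision). It uses only the 1 in 7,000 and 2 in 7,000 figures from the list; the variable names are illustrative.

```python
from fractions import Fraction

# Thrombosis risk per woman, as quoted in the list above.
risk_second_generation = Fraction(1, 7000)
risk_third_generation = Fraction(2, 7000)

# Relative increase: how much bigger the new risk is compared with the old one.
relative_increase = (risk_third_generation - risk_second_generation) / risk_second_generation

# Absolute increase: how many extra cases per woman taking the pill.
absolute_increase = risk_third_generation - risk_second_generation

print(f"Relative increase: {float(relative_increase):.0%}")
print(f"Absolute increase: {absolute_increase.numerator} in {absolute_increase.denominator}")
print(f"Absolute increase: {float(absolute_increase) * 100:.3f} percentage points")
```

The same change is a frightening “100%” when expressed relatively, and one extra case in 7,000 women (about 0.014 percentage points) when expressed absolutely.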
People in general have a hard time understanding relative probability (percentages).
Here is a recent case from 2015, where the media communicated relative risk in an unnecessarily sensational way: Newer contraceptive pills raise risk of blood clot four fold (The Telegraph).
Many journalists might think this (the link above) is within the boundaries of good journalism as it follows a well-known template of structuring a story.
So what do people die of in Norway?
My daughter’s fear of getting eaten by a shark while swimming in Sardinia’s water was very real to her, but the probability of that actually happening was incredibly low.
Back at work after our holiday trip, I couldn’t get the TIME visualisation out of my head.
I fully understand why the visualisation looks the way it does, but the link between the relative values (colorful bands) and the absolute values (black circles) is not apparent. This missing link is a fundamental part of the story; visualising that link is both a challenge and a brilliant opportunity in the journey of visualising data in a more understandable way.
This is what the causes of death within different age groups look like in Norway (2003–2013; source: The Norwegian Institute of Public Health). The categorisation for the Norwegian numbers is slightly different from the US numbers.
When we show the relative numbers in percentages per age group, we are able to see the pattern of what causes death within each group (bands), regardless of the number of deaths we have in each group. The downside of that image is that it doesn’t convey the absolute risk of dying in each age group.
The image below does that, because it visualises the data-set in absolute values.
In my daughter’s age group (14–16) the overall number of deaths is far lower than in the age group 80–84, for example. In this visualisation it is harder to see how the slabs (causes of death) are distributed in her age group, but the visual story carries more common sense, as it depicts reality more honestly: people (mostly) die when they are old.
Eaten by shark
So, getting to the end of this post. My daughter’s fear of getting eaten by a shark was very real for her. But the probability of that actually happening was disappointingly low. According to the same magazine article, the shark was rated as one of the least dangerous animals around. It only devoured 3 humans in 2014.
In the meantime you can chew on this. How do you understand the forecast “there is a 30% chance of rain tomorrow”?
Visualisations are based on numbers from the Norwegian Institute of Public Health (www.fhi.no). To make the visualisations more legible, I have simplified some variable names and merged very small groups. The variable ‘Andre’ collects different causes that on their own became too small. The variable ‘Voldsomme dødsfall’ is merged with ‘Andre voldsomme dødsfall’. The variable ‘Komplikasjoner under svangerskap og fødsel’ is merged with ‘Komplikasjoner ved svangerskap, fødsel, barsel (O00-O99)’ and ‘Visse tilstander som oppstår i perinatalperioden (P00-P96)’.
Please visit www.fhi.no for more information on the variables.
A huge thanks to Angela Morelli and Sarah Rosenbaum for helping me out with proofreading and good suggestions :-)