The Curve (aka Know Where Your Data Has Come From)

Sophie Shawdon
ClearScore
Published Apr 8, 2020

This is part of the Class of COVID-19 series.

If one image has dominated recent news coverage, it’s this one: the number of confirmed coronavirus cases by country.

[Image: Graph showing the number of confirmed COVID-19 cases by country. Source: Financial Times, "Confirmed Cases by Country"]

It’s everywhere. And with good reason: it’s a marker of how we’re doing, how everyone else is doing, a glimpse into the future. “Are we flattening the curve?” scream the headlines. Are we going to be a South Korea, or an Italy, or even (whisper it) a United States?

But how truthful is it to the story it’s trying to tell? This graph measures the number of confirmed cases, not the number of actual cases. The accuracy of the data relies heavily on the accuracy of both testing and reporting, which varies hugely from country to country. It’s an important lesson in understanding where your data has come from, and how it has been put together.

How is the data being collected?

The United States has been heavily criticised for its approach to testing. Having initially rejected test kits from the WHO in favour of tests developed by its own public health agency, the CDC (tests that later turned out to have significant flaws), it has been slow to roll out testing. A few days ago, President Trump proudly said of the US testing programme:

“We’re now conducting… more than any other country in the world, both in terms of the raw number and also on a per capita basis, the most.”

Vice President Mike Pence, however, stated that approximately 1.2m Americans had been tested: roughly 1 in 273 of the population. Many Americans who showed symptoms of the disease were also unable to get tested.

By contrast, Australia and South Korea have implemented widespread testing. Australia claims the highest testing rate per capita, at 1 in 100 people, and tests not only people with symptoms but also those who are at risk of the disease without showing symptoms.

[Image: A doctor in protective gear swabs a patient]

With an estimated 50% of cases asymptomatic, it is impossible to measure the true number of coronavirus cases accurately without widespread testing. Furthermore, a rise or fall in the number of confirmed cases on a given day may be skewed by a sudden change in the availability of testing kits, problems with certain tests, or simply a change in policy on who gets tested.
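To see how testing alone can move the curve, here is a toy model with entirely hypothetical numbers (not real data). The true number of infections is held constant, and only the share of people who can get a test changes. The 50% asymptomatic share follows the estimate above; everything else, including the function name, is an illustrative assumption, and tests are assumed to be perfectly accurate.

```python
# Toy model (hypothetical numbers, not real data): true infections stay
# constant while the share of infected people who get tested changes.

def confirmed_cases(true_cases: int, share_symptomatic: float,
                    test_symptomatic: float, test_asymptomatic: float) -> int:
    """Cases that appear in the confirmed count, assuming every test is accurate."""
    symptomatic = true_cases * share_symptomatic
    asymptomatic = true_cases - symptomatic
    return round(symptomatic * test_symptomatic + asymptomatic * test_asymptomatic)

TRUE_CASES = 10_000        # hypothetical, identical on every day
SHARE_SYMPTOMATIC = 0.5    # i.e. an estimated 50% of cases asymptomatic

# Day 1: a kit shortage means only half of symptomatic people get tested.
day1 = confirmed_cases(TRUE_CASES, SHARE_SYMPTOMATIC, 0.5, 0.0)
# Day 2: more kits arrive, so every symptomatic person is tested.
day2 = confirmed_cases(TRUE_CASES, SHARE_SYMPTOMATIC, 1.0, 0.0)
# Day 3: policy widens to test some at-risk people without symptoms.
day3 = confirmed_cases(TRUE_CASES, SHARE_SYMPTOMATIC, 1.0, 0.2)

print(day1, day2, day3)  # 2500 5000 6000
```

The confirmed count more than doubles over three days while the true number never changes, and even on day 3 it captures only 60% of infections. Real tests also produce false negatives, which would push the confirmed figures lower still.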

Is the data being reported properly?

The data also depends on countries and regions reporting accurately. For instance, Indonesia (the world’s fourth most populous country, and one that has struggled to persuade its citizens to follow lockdown advice) has reported a little over 2,500 cases, having confirmed its first few cases only in March. External modelling, however, suggests the true figure could be up to 100 times higher, with poor testing and an unwillingness to admit just how out of hand things are getting both likely to be playing a part.

[Image: Crowds of people wearing orange face masks attend Friday prayers at a mosque in Surabaya, Indonesia, on March 20]

Know where your data has come from

With any data you come across, it is important to understand where it has come from, how it is calculated, and who is doing the reporting. If data has been taken from surveys, understand what questions were asked; if it has been collected by other means, think about any barriers to gathering all of it, or anything that may have skewed it. Look closely at the reporters, too: do they have an agenda in showing the results in a certain light, or in being selective about what they show?

With data, as with coronavirus, tracking your sources is key.

For other data distractions, visit @thecolourofdata.

Mathematics and linguistics geek. Ice cream-fuelled ultrarunner. Analytics Lead @ ClearScore