Covid-19, Data and the Misinterpretation
People often quote the rise in total COVID-19 cases alone as an indicator that the situation is getting worse, or use only total cases to compare the situation in two regions. But is that the right way to make the comparison? In this article, I briefly cover the major variable that people are missing.
It’s good to be correct, though I have always believed it is good to be correct only if the reasoning behind your answer is also right. Sometimes we do a faulty analysis but still reach the correct conclusion. I believe something similar is happening with the COVID-19 situation. Most people are looking only at the total number of cases to date and commenting on that basis. I am not saying the total number of cases is insignificant or of no use, but looking only at the absolute total to date can be misleading.
This is a quick article written purely from the data presentation perspective, to offer people different ways of looking at the COVID-19 situation through the data. I want to provide a disclaimer right away:
“This article is not an analysis of the COVID-19 situation, nor a statement on its current or future state. It only aims to show how to present data in the correct form for relevant comparisons or analysis.”
To illustrate this, I ran a few analyses on COVID-19 cases in India. The details are below.
I gathered day-wise, national-level COVID-19 data from 9th April 2020 to 26th June 2020 from the https://www.covid19india.org/ API. The data was provided in JSON format; I used Python to extract the relevant fields from the JSON and convert them to Excel format for my analysis. This is not the main focus of the article, so I will not spend time explaining how I did it.
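For readers curious about this step, the extraction could look something like the sketch below. This is not the original script. The JSON layout, the field names (`daily`, `date`, `total_confirmed`, `total_tested`) and the numbers are all made up for illustration, and the real API response may be structured differently. The sketch writes CSV, which Excel can open, using only the Python standard library.

```python
import csv
import io
import json

# Made-up sample in a hypothetical layout; the real API response may differ.
SAMPLE_JSON = """
{"daily": [
  {"date": "2020-04-09", "total_confirmed": "100", "total_tested": "2000"},
  {"date": "2020-04-10", "total_confirmed": "130", "total_tested": "2900"}
]}
"""

FIELDS = ["date", "total_confirmed", "total_tested"]


def extract_rows(json_text):
    """Keep only the fields of interest, converting counts to integers."""
    records = json.loads(json_text)["daily"]
    return [
        {
            "date": rec["date"],
            "total_confirmed": int(rec["total_confirmed"]),
            "total_tested": int(rec["total_tested"]),
        }
        for rec in records
    ]


def rows_to_csv(rows):
    """Serialise the rows as CSV text, which Excel can open directly."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_rows(SAMPLE_JSON)
csv_text = rows_to_csv(rows)
print(csv_text)
```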
The variable people talk about most is the total number of confirmed COVID-19 cases to date, so I plotted a day-by-day line chart of it from 9th April 2020 to 26th June 2020. Looking at the result, we can understand why everyone is freaking out: the graph appears to show exponential growth.
Some people also talk about the daily new cases of COVID-19. We can already get an idea of these from the graph above, but to keep things simple I have plotted them separately too. As we can see, this curve is also getting steeper.
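The link between the two charts is simple: daily new cases are the day-over-day differences of the cumulative total. A minimal sketch of that step, using made-up numbers rather than the actual Indian data:

```python
def daily_from_cumulative(cumulative):
    """Return day-over-day increases; the first day has no previous value."""
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]


# Made-up cumulative totals for five consecutive days (not real data).
total_cases = [100, 130, 170, 220, 290]
new_cases = daily_from_cumulative(total_cases)
print(new_cases)  # [30, 40, 50, 70]
```

A cumulative curve that keeps getting steeper is the same thing as a daily series that keeps growing, which is why the two charts tell a similar story.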
So is it right to conclude that COVID-19 cases are rising exponentially and getting out of control? The numbers are definitely increasing and heading towards exponential growth, but so far we haven’t analyzed one significant variable: the number of tests being conducted, both to date and daily. Isn’t it important to consider COVID-19 cases with respect to the tests being conducted? Yes, it definitely is.
It is said that you should let the data tell you the story, and not force or manipulate it into saying what you want to hear. That’s completely right, but it applies only once we know what data we are looking for. To know that, we need some idea that we want to check or verify with the data or, to be more precise, a hypothesis. In my case I didn’t have a hypothesis, but I was interested in the trend of cases per test conducted. Hence I plotted two graphs: one of total cases to date per total tests to date for each day from 9th April to 26th June, and another of daily cases per daily tests.
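Both ratios can be computed from just two cumulative series. The sketch below shows one way to do it; the numbers are made up for illustration and the function names are my own, not part of any library.

```python
def daily_increase(cumulative):
    """Day-over-day differences of a cumulative series."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def ratio_series(cases, tests):
    """Element-wise cases / tests; None where no tests were recorded."""
    return [c / t if t else None for c, t in zip(cases, tests)]


# Made-up cumulative series for five consecutive days (not real data).
total_cases = [100, 130, 170, 220, 290]
total_tests = [2000, 2900, 4100, 5600, 7600]

# Total cases to date per total tests to date, one value per day.
cumulative_ratio = ratio_series(total_cases, total_tests)

# Daily new cases per daily new tests, one value per day after the first.
daily_ratio = ratio_series(daily_increase(total_cases), daily_increase(total_tests))

print([round(r, 4) for r in cumulative_ratio])
print([round(r, 4) for r in daily_ratio])
```

The daily ratio has one fewer point than the cumulative ratio because the first day has no previous day to subtract from.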
I know and understand that the confirmed cases reported on a given day do not all come from the tests counted on that same day. My aim is not to present daily accuracy reports or to comment on what caused the rise in cases on a particular day. It is only to convey that the variable “total cases per total test” is an important indicator, if not the main indicator, to look for.
Total cases will keep rising for a while, and the daily number of new cases will rise too, but the early indicator for us will be the ratio of new cases to new tests. Once this starts declining, it will show that our situation is improving, even if total or daily cases keep rising for some more time. By improvement I mean only the COVID-19 spread rate, NOT the country’s social and economic situation.
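To make the point concrete, here is a small sketch with made-up numbers in which daily cases rise every single day while the cases-per-test ratio falls, because tests rise faster. The trailing window is an assumption added here to smooth out the mismatch between the day a test is done and the day a case is reported; the charts in this article use plain daily values.

```python
def rolling_ratio(new_cases, new_tests, window=7):
    """Cases per test over a trailing window of `window` days.

    Assumes every window contains at least one test.
    """
    ratios = []
    for end in range(window, len(new_cases) + 1):
        cases = sum(new_cases[end - window:end])
        tests = sum(new_tests[end - window:end])
        ratios.append(cases / tests)
    return ratios


def strictly_increasing(values):
    return all(b > a for a, b in zip(values, values[1:]))


def strictly_decreasing(values):
    return all(b < a for a, b in zip(values, values[1:]))


# Made-up daily counts (not real data): cases rise, but tests rise faster.
new_cases = [100, 110, 120, 130, 140, 150]
new_tests = [1000, 1200, 1400, 1600, 1800, 2000]

ratios = rolling_ratio(new_cases, new_tests, window=3)
cases_rising = strictly_increasing(new_cases)
ratio_falling = strictly_decreasing(ratios)

print([round(r, 4) for r in ratios])
print(cases_rising, ratio_falling)
```

A short window of 3 days is used only because the toy series is short; with real data a 7-day window would also even out weekday effects in reporting.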
I know many people will say that this is also saying the same thing: the number of cases is rising. I never denied that. The whole point of the article is to use the right variable when making a statement or doing a comparison.
We can’t compare two regions’ total cases or daily cases if their test numbers are not of similar scale.
It is also wrong to compare the COVID-19 situation in two regions using ‘only’ their total cases and populations. The percentage of the population tested and the number of tests need to be included in the comparison too.
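A small sketch of such a comparison follows. The two regions and all their numbers are hypothetical. Tests per 1,000 people is used as a rough stand-in for the percentage of the population tested, which is an approximation because one person can be tested more than once.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    population: int
    total_tests: int
    total_cases: int

    @property
    def cases_per_test(self):
        return self.total_cases / self.total_tests

    @property
    def tests_per_thousand(self):
        return 1000 * self.total_tests / self.population


# Hypothetical regions with the same population (not real data).
region_a = Region("A", population=10_000_000, total_tests=500_000, total_cases=20_000)
region_b = Region("B", population=10_000_000, total_tests=50_000, total_cases=5_000)

for region in (region_a, region_b):
    print(
        f"Region {region.name}: {region.total_cases} cases, "
        f"{region.cases_per_test:.1%} cases per test, "
        f"{region.tests_per_thousand:.0f} tests per 1,000 people"
    )
```

On total cases alone, region A looks four times worse than region B. Once testing is included, A has tested ten times as many people and finds far fewer cases per test, so the raw case counts say little about which region is actually worse off.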
I hope this article has convinced people that the number of cases alone, total or daily, is not enough. We need to look at it with respect to the number of tests we are conducting, and if we choose to compare the situation in two regions, we also need to take into account the percentage of the population tested. This applies to all countries and to the world as a whole.
Disclaimer: No metric is perfect, and neither is the one above. This is just an attempt to show the importance of the number of tests and to help people understand that the number of cases will rise as we test more. Keep an eye on the cases-per-test ratio for a better understanding.
 Raw JSON data fetched from https://www.covid19india.org/ API