COVID-19 and Data Visualization

Greta Faccio, PhD
3 min read · Mar 21, 2020


How to visualise and understand the pandemic — graphs by Dr. AZM

Caught up in the new rules that limit our freedom of movement and that shed new light on the value of an efficient healthcare system, it is easy to lose sight of the bigger picture: this is a global viral spread.

What do we know about COVID-19?

Coronavirus COVID-19 emerged in late 2019 in Wuhan, a city of more than 11 million inhabitants in central China. Italy is currently the focal point of the pandemic, with more than 1K new cases each day and more than 2K deceased at the time of writing.

Coronaviruses are spherical particles containing a single strand of (positive-sense) RNA. The viral envelope is decorated with sugar-bearing spike proteins that give it a visible halo, or corona, when visualised under the electron microscope. One image of representative coronavirus particles is available here.

This new strain of coronavirus has been identified as the cause of a new series of pneumonia cases recorded in China in late 2019. Coronaviruses are known to infect mammals and birds, and they can acquire the genetic material needed to cross species and infect humans. In this case, pangolins have been suggested as the animal that passed the virus to humans, but the finding is not definitive.

What are scientists doing?

Scientists have been struggling to gather all the relevant data and so provide a model able to predict the spread of the COVID-19 coronavirus.

Scientific magazines and portals have made freshly produced research freely available to the public, and it can also be accessed through a specialised website.

Using the data that is made freely available on Wikipedia, we tried to give an overview of the situation in three graphs.

The graph shows the reported death rate (or fatality rate) as a percentage, calculated as the ratio between the reported deaths and the reported positive cases. Only countries with at least two deaths are included in the statistics. The error bar illustrates how uncertain it is that the value will hold in future updates of the statistics. A larger error bar means that the country has reported a low number of cases, deaths, or both, so any future change in the figures could significantly change the values shown.
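The calculation described above can be sketched in a few lines of Python. The article does not say how its error bars were computed, so the binomial standard error used here is only an assumption, and the function name and the counts are hypothetical rather than real data.

```python
import math
from typing import Tuple


def reported_death_rate(deaths: int, cases: int) -> Tuple[float, float]:
    """Return (rate, error), both in percent.

    The error is a binomial standard error. This is an assumption: the
    article does not state how its error bars were computed.
    """
    if cases <= 0 or not 0 <= deaths <= cases:
        raise ValueError("need cases > 0 and 0 <= deaths <= cases")
    p = deaths / cases
    standard_error = math.sqrt(p * (1.0 - p) / cases)
    return 100.0 * p, 100.0 * standard_error


# Hypothetical counts, not real data: the same 4% rate with very
# different numbers of reported cases.
large_rate, large_err = reported_death_rate(deaths=400, cases=10_000)
small_rate, small_err = reported_death_rate(deaths=2, cases=50)
print(f"many cases: {large_rate:.1f}% +/- {large_err:.2f}")
print(f"few cases:  {small_rate:.1f}% +/- {small_err:.2f}")
```

With the same rate, the country with fewer reported cases gets the larger error bar, which is the behaviour the paragraph describes.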

The number of unreported cases is unknown, even for the countries with the lowest reported death rates, like South Korea or Germany. Basic epidemiological considerations tell us that the reported deaths reflect the state of the pandemic at the moment of contagion, which for COVID-19 is around two weeks before death. Back then, the number of detected cases was significantly lower (the number of cases can double roughly every 6 days). On the other hand, the fraction of hidden cases (unreported cases, or people with mild symptoms who go undetected) can be as high as 10–100% of the reported cases. Since both effects are large but drive the death rate in opposite directions, it is difficult to decide what the real death rate is. However, it looks like the number of cases that go undetected or unreported may dominate the final value. If this assumption is correct, the reported death rates are overestimates of the real death rate.
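The two corrections discussed above pull in opposite directions, and a small sketch can make the directions concrete. The function names, the simple exponential-growth model for the delay, and the counts are assumptions for illustration, not the author's calculation. Which correction dominates depends heavily on the delay, the doubling time and the hidden fraction chosen, and a fixed doubling time only holds while growth is still exponential.

```python
def lag_adjusted_rate(deaths: float, cases_now: float,
                      lag_days: float = 14.0,
                      doubling_days: float = 6.0) -> float:
    """Divide deaths by the estimated case count at the time of contagion.

    Assumes pure exponential growth with a fixed doubling time.
    """
    cases_then = cases_now / 2.0 ** (lag_days / doubling_days)
    return deaths / cases_then


def hidden_case_adjusted_rate(deaths: float, cases: float,
                              hidden_fraction: float) -> float:
    """Divide deaths by reported cases plus hidden (undetected) cases.

    hidden_fraction is expressed relative to the reported cases,
    so 1.0 means 100% more cases than reported.
    """
    return deaths / (cases * (1.0 + hidden_fraction))


# Hypothetical counts, not real data.
deaths, cases = 100, 10_000
naive = deaths / cases
with_lag = lag_adjusted_rate(deaths, cases)
with_hidden = hidden_case_adjusted_rate(deaths, cases, hidden_fraction=1.0)
print(f"naive {naive:.4f}, lag-adjusted {with_lag:.4f}, "
      f"hidden-adjusted {with_hidden:.4f}")
```

The delay correction raises the estimate and the hidden-case correction lowers it, so the naive ratio sits between the two.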

Figure 2: MissingCases.png
The graph shows an estimate of the missing cases as a fraction of the reported cases. The calculation is based on the death rate calculated from the reported cases. The underlying assumption is that the true death rate is similar in all countries, so a higher reported death rate is attributed to a fraction of cases not being detected. The base death rate is assumed to be that of South Korea, which has one of the lowest death rates and, following the assumption above, a lower number of unreported cases. It is also the country that has performed the most tests per inhabitant, which gives confidence in its figures. Some countries have a lower death rate and appear as negative fractions. However, they have a higher statistical uncertainty, so it is not yet certain whether the calculated death rate will hold in the future. Negative values have no clear meaning, but they have been kept to illustrate the benefit of having a low death rate. For countries with a high reported death rate and a high number of deaths, like Italy, China or Iran, the heavy stress placed on the health system must be considered but cannot be quantified. It is worth noting that Italy has one of the best healthcare systems in the world.

What does the future hold?
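One way to read the estimate described above: if every country shared the same true death rate, the true number of cases would be the deaths divided by that base rate, and the missing fraction would follow directly. The formula below is a reconstruction under that assumption, not the author's published code, and the base rate and counts are hypothetical.

```python
def missing_case_fraction(deaths: float, cases: float,
                          base_rate: float) -> float:
    """Estimate missing cases as a fraction of reported cases.

    Assumes the true death rate equals base_rate everywhere, so that
    true_cases = deaths / base_rate and
    missing / reported = (deaths / cases) / base_rate - 1.
    """
    if cases <= 0 or base_rate <= 0:
        raise ValueError("cases and base_rate must be positive")
    reported_rate = deaths / cases
    return reported_rate / base_rate - 1.0


# Hypothetical numbers, not real data.
base_rate = 0.01  # stand-in for the reference country's death rate
high = missing_case_fraction(deaths=80, cases=1_000, base_rate=base_rate)
low = missing_case_fraction(deaths=5, cases=1_000, base_rate=base_rate)
print(f"high reported rate: {high:.1f} missing cases per reported case")
print(f"low reported rate:  {low:.1f} (negative, no clear meaning)")
```

A country whose reported rate is below the base rate comes out negative, which matches the negative fractions mentioned in the text.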

The graph shows the reported death rate versus the fraction of patients who have recovered or died. This last quantity is a measure of how close the country is to the end of the epidemic. Countries like China, the origin of the pandemic, have a high fraction of recovered patients, and the number of newly reported cases is already low. In countries like Germany, the number of reported cases is still growing fast and the reported death rate can change significantly in the future, depending on how the people who are already sick evolve. The 20 countries shown with red markers have the highest number of reported deaths as of March 21, 2020. The size of each marker is proportional to the logarithm of the number of reported cases.
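The two plotted quantities can be sketched as follows. The base of the logarithm and the scale factor for the marker size are not given in the article, so base 10 and a factor of 20 are assumptions here, as are the function names and the counts.

```python
import math


def closed_fraction(recovered: float, deaths: float, cases: float) -> float:
    """Fraction of reported cases that have ended in recovery or death."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return (recovered + deaths) / cases


def marker_size(cases: float, scale: float = 20.0) -> float:
    """Marker size proportional to log10 of the reported cases.

    Both the base-10 logarithm and the scale factor are assumptions.
    """
    if cases < 1:
        raise ValueError("cases must be at least 1")
    return scale * math.log10(cases)


# Hypothetical counts, not real data.
late_stage = closed_fraction(recovered=700, deaths=50, cases=1_000)
early_stage = closed_fraction(recovered=20, deaths=5, cases=1_000)
print(f"late stage: {late_stage:.2f}, early stage: {early_stage:.3f}")
print(f"marker size for 1,000 cases: {marker_size(1_000):.1f}")
```

A value close to 1 means most reported cases have already resolved, while a value close to 0 means the outcome of most cases is still open and the death rate can still move.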
