Data Visualization: An Epidemic Turned into a Pandemic!

Reia Natu
Published in Analytics Vidhya
5 min read · Sep 3, 2020

Background

COVID-19 was first identified in Wuhan, China, in December 2019. The WHO declared the outbreak a pandemic on March 11, 2020. The virus spread across the globe with startling speed despite preventive measures such as shutdowns and quarantines around the world.

Data Source

The Johns Hopkins University Center for Systems Science and Engineering created a publicly available dataset consolidated from sources such as the WHO, the Centers for Disease Control and Prevention (CDC), and the ministries of health of multiple countries: https://github.com/RamiKrispin/coronavirus

Objective

Visualize COVID-19 data from the initial weeks of the outbreak to see at what point the virus became a global pandemic, and to gain insight into the most affected countries during that period.

Analysis

The table below shows some of the cumulative confirmed cases of COVID-19 worldwide by date.
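As a rough illustration of how such a cumulative table can be derived, here is a minimal Python sketch using only the standard library. The daily counts and the names `daily_new_cases` and `cumulative_cases` are made up for illustration; they are not the article's data or code.

```python
from itertools import accumulate

# Hypothetical daily new confirmed cases (made-up values, NOT real data).
daily_new_cases = [
    ("2020-01-22", 10),
    ("2020-01-23", 5),
    ("2020-01-24", 20),
    ("2020-01-25", 15),
]

def cumulative_cases(daily):
    """Return (date, running total) pairs in chronological order."""
    ordered = sorted(daily)  # ISO-formatted dates sort chronologically as strings
    dates = [d for d, _ in ordered]
    totals = accumulate(n for _, n in ordered)  # running sum of the daily counts
    return list(zip(dates, totals))

cum_cases = cumulative_cases(daily_new_cases)
for day, total in cum_cases:
    print(day, total)
```

Each row of the result pairs a date with the total number of cases confirmed up to and including that date, which is the shape of data the plots in this article rely on.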

To make better sense of the data, let us plot it, as seen below.

The plot shows the global total of confirmed cases rising rapidly toward roughly 200,000. However, there is an odd jump in mid-February, after which the rate of new cases slows for a while before speeding up again in March. Let's dive deeper to see what is happening!

Early in the outbreak, cases were concentrated in China. Let us now plot confirmed COVID-19 cases in China and in the rest of the world separately to compare them.

The data used for this comparison is shown in the table below:

Plotting China against the rest of the world ("Not China") shows two clearly different trends. In February, the majority of cases were in China. This changed in March, when the outbreak went global and the total number of cases outside China overtook the number in China.
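The China versus Not China split can be sketched as a small grouping step. The rows below are invented purely for illustration, and the helper names `china_vs_rest` and `first_overtake` are hypothetical, not part of the original analysis.

```python
from collections import defaultdict

# Hypothetical rows: (date, country, new confirmed cases). Made-up values.
rows = [
    ("2020-02-01", "China", 100),
    ("2020-02-01", "Italy", 2),
    ("2020-02-02", "China", 50),
    ("2020-02-02", "Italy", 10),
    ("2020-02-02", "Spain", 5),
    ("2020-02-03", "China", 20),
    ("2020-02-03", "Italy", 120),
    ("2020-02-03", "Spain", 80),
]

def china_vs_rest(rows):
    """Return cumulative (date, total) series for 'China' and 'Not China'."""
    daily = defaultdict(lambda: defaultdict(int))
    for day, country, cases in rows:
        group = "China" if country == "China" else "Not China"
        daily[group][day] += cases  # sum all countries within each group
    result = {}
    for group, by_date in daily.items():
        running = 0
        series = []
        for day in sorted(by_date):  # ISO dates sort chronologically
            running += by_date[day]
            series.append((day, running))
        result[group] = series
    return result

def first_overtake(series):
    """First date on which the 'Not China' total exceeds the 'China' total.

    Assumes both groups have an entry for every date in the series.
    """
    china = dict(series["China"])
    for day, total in series["Not China"]:
        if total > china[day]:
            return day
    return None

series = china_vs_rest(rows)
print(first_overtake(series))
```

With real data, the date returned by `first_overtake` would correspond to the crossover point visible in the plot.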

Studies hinted at a couple of landmark events during the outbreak. The most visible one is a huge jump in the China line on February 13, 2020. This wasn't just a bad day for the outbreak: China changed the way it reported figures on that day.

Let us now annotate events like these on the plot to make the changes easier to interpret, as seen in the following plot.
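One possible way to add such annotations in Python is sketched below. It assumes matplotlib is available, uses the two dates mentioned in this article as events, and relies on hypothetical helper names (`events_in_range`, `plot_with_events`) and on a caller-supplied series; it is not the code behind the original figure.

```python
from datetime import date

# Event dates taken from the article text; the labels are our own wording.
EVENTS = [
    (date(2020, 2, 13), "China reporting change"),
    (date(2020, 3, 11), "WHO declares pandemic"),
]

def events_in_range(series, events):
    """Pair each event with the cumulative count on its date.

    `series` is a list of (date, cumulative cases). Events whose date is
    not present in the series are skipped.
    """
    by_date = dict(series)
    return [(d, label, by_date[d]) for d, label in events if d in by_date]

def plot_with_events(series, events):
    """Draw the series as a line and mark each event with a dashed line and label."""
    # Imported here so the helper above can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    dates = [d for d, _ in series]
    values = [v for _, v in series]
    fig, ax = plt.subplots()
    ax.plot(dates, values)
    for d, label, value in events_in_range(series, events):
        ax.axvline(d, linestyle="--", color="grey")
        ax.annotate(label, xy=(d, value), xytext=(5, 5), textcoords="offset points")
    ax.set_xlabel("Date")
    ax.set_ylabel("Cumulative confirmed cases")
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return fig
```

Calling `plot_with_events(series, EVENTS)` on a list of `(date, cumulative cases)` pairs would return a figure with one dashed vertical line and label per event that falls inside the series.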

Next, to assess how serious the problem is likely to become, we need to measure how quickly the number of cases is increasing. A good starting point is to check whether cases are growing faster or slower than linearly.

The clear surge of cases around February 13, 2020 coincides with the reporting change in China. A couple of days after that, however, the growth of cases in China appears to slow down.

Let us now describe COVID-19’s growth in China after February 15, 2020.

The plot above indicates that growth in China is slower than linear. This suggests that the virus was at least partly contained in China between late February and early March.

Let's see how the rest of the world compares with linear growth!

In the plot above, a straight line is not a good fit: cases in the rest of the world are growing much faster than linearly.
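The faster-or-slower-than-linear comparison can also be checked numerically rather than by eye. The sketch below fits a least-squares line and looks at the residuals at the two ends of the series: if the data sits above the fitted line at the ends, the curve bends upward (faster than linear), and if it sits below, the curve bends downward (slower than linear). This is a simple heuristic chosen for illustration, with made-up numbers and hypothetical function names, not the method used to produce the plots.

```python
def fit_line(ys):
    """Least-squares fit of y = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept

def growth_shape(ys):
    """Classify growth as 'faster', 'slower' or 'about linear'.

    Uses the sum of the first and last residuals of the fitted line.
    """
    slope, intercept = fit_line(ys)
    residuals = [y - (intercept + slope * t) for t, y in enumerate(ys)]
    ends = residuals[0] + residuals[-1]
    tol = 1e-9 * max(1.0, max(abs(y) for y in ys))  # ignore rounding noise
    if ends > tol:
        return "faster"
    if ends < -tol:
        return "slower"
    return "about linear"

# Made-up cumulative series, for illustration only.
print(growth_shape([0, 10, 20, 30, 40]))  # steady increase
print(growth_shape([0, 8, 12, 14, 15]))   # flattening out
print(growth_shape([1, 2, 4, 8, 16]))     # accelerating
```

On real data the verdict depends on the date window chosen, which is why the analysis of China starts only after February 15, 2020.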

To achieve a better fit, let us log transform the y-axis and see the results.

With a logarithmic y-axis, a straight line fits the data well. There are two takeaways from this:

  1. The upside is that a good fit is great news from a data science point of view.
  2. The downside, from a public health point of view, is that a straight line on a logarithmic scale means cases of COVID-19 in the rest of the world are growing at an exponential rate, which is terrible news!
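To make the link between a straight line on a log scale and exponential growth concrete: if log(cases) grows linearly with time with slope b, then cases are multiplied by exp(b) every day and double every log(2)/b days. A minimal sketch, using a made-up series and a hypothetical helper name `fit_exponential`:

```python
import math

def fit_exponential(ys):
    """Fit log(y) = a + b * t by least squares for t = 0, 1, 2, ...

    Returns (daily growth factor, doubling time in days).
    Assumes every y is positive and the series is growing (b > 0).
    """
    logs = [math.log(y) for y in ys]
    n = len(logs)
    t_mean = (n - 1) / 2
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in enumerate(logs))
    b = sxy / sxx  # slope of the straight line on the log scale
    return math.exp(b), math.log(2) / b

# Made-up series that doubles every day, for illustration only.
factor, doubling_days = fit_exponential([100, 200, 400, 800, 1600])
print(round(factor, 3), round(doubling_days, 3))
```

The doubling time is often the easier number to communicate: a shorter doubling time means a steeper line on the log-scale plot.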

COVID-19 does not affect all countries equally, so it helps to look for the most severely affected ones. Let us now find the countries other than China with the most confirmed cases.

The top 7 countries other than China by confirmed COVID-19 cases can be seen in the table below:

The results show that even though the outbreak was first seen in China, South Korea is the only other Asian country among the top 7 by confirmed cases; the others include France, Italy, Spain and Germany, which are geographically close to one another in Europe.
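A ranking like this can be produced with a simple per-country total. The sketch below uses placeholder country names and invented counts, and the function name `top_countries` is hypothetical; only the idea of excluding China and keeping the top entries comes from the article.

```python
from collections import Counter

# Hypothetical rows: (date, country, new confirmed cases). Made-up values.
rows = [
    ("2020-03-01", "China", 500),
    ("2020-03-01", "Country A", 40),
    ("2020-03-01", "Country B", 10),
    ("2020-03-02", "China", 100),
    ("2020-03-02", "Country A", 60),
    ("2020-03-02", "Country B", 120),
    ("2020-03-02", "Country C", 5),
]

def top_countries(rows, n=7, exclude=("China",)):
    """Return the n countries with the most total cases, skipping `exclude`."""
    totals = Counter()
    for _day, country, new_cases in rows:
        if country not in exclude:
            totals[country] += new_cases
    return totals.most_common(n)  # list of (country, total), largest first

top = top_countries(rows, n=2)
print(top)
```

The default `n=7` mirrors the top 7 used in this article; the example passes `n=2` only because the toy data has three non-excluded countries.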

For more context, we can plot these countries' confirmed cases over time, as seen below:

