The impact of COVID-19 — data visualization using Plotly and comparative analysis with SARS
Derive interesting data insights on COVID-19 and visualize the outbreak of the coronavirus compared to SARS
Introduction:
With the rapid spread of the novel coronavirus across countries, the World Health Organisation (WHO) and several countries have published the latest results on the impact of COVID-19 over the past few months.
I have been going through many sources and articles to understand the fatality trend. I was excited to come across this data source and decided to build some visualizations from it. The aim here is to understand how visualization helps to derive informative insights from data sources.
For the visualization part, I am using Plotly. Plotly is a visualization library available in Python and R that supports a number of interactive, high-quality graphs and is a great tool for data science beginners.
Data-set:
The data-set sources are accumulated and processed, and the latest updates are made available, by “Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)” on their GitHub page.
Terms of use: As stated by JHU CSSE in link. [Data to be used only for research purposes].
Details on the data set are as follows:
1. Daily reports data
This CSV file contains information on the affected countries [in blue], which helps to identify the virus spread, along with the number of infected cases, deaths and recoveries across countries. The country coordinates are also provided for analysis.
2. Time-series data
The time-series data contains the counts of infected cases, deaths and recoveries across countries. It has an individual file for each case type and needs to be processed before visualization. The country coordinates are also provided for time-series visualization on geo plots such as choropleth maps.
Code: All the code for producing the following charts, and the data set used, can be found in the link provided below.
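As a rough illustration of the processing step mentioned above, here is a minimal sketch that reshapes one wide time-series table into a long one with pandas. It assumes the file has the identifier columns `Province/State`, `Country/Region`, `Lat` and `Long`, followed by one column per date; the helper name `to_long` and the sample numbers are hypothetical placeholders, not values from the data set.

```python
import pandas as pd

# Toy stand-in for one time-series file (wide format: one column per date).
# The counts below are illustrative placeholders, not real figures.
wide = pd.DataFrame({
    "Province/State": ["Hubei", None],
    "Country/Region": ["Mainland China", "Italy"],
    "Lat": [30.97, 43.0],
    "Long": [112.27, 12.0],
    "1/22/20": [10, 0],
    "1/23/20": [15, 1],
})


def to_long(wide_df: pd.DataFrame, value_name: str) -> pd.DataFrame:
    """Reshape a wide table into one row per (location, date)."""
    id_cols = ["Province/State", "Country/Region", "Lat", "Long"]
    long_df = wide_df.melt(id_vars=id_cols, var_name="Date", value_name=value_name)
    # Date headers are assumed to look like month/day/two-digit-year.
    long_df["Date"] = pd.to_datetime(long_df["Date"], format="%m/%d/%y")
    return long_df.sort_values(["Country/Region", "Date"]).reset_index(drop=True)


confirmed_long = to_long(wide, "Confirmed")
```

The same helper could be called once per file (infected, deaths, recoveries) with a different `value_name`, and the results merged on the location and date columns.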
Important notes:
Data as of 3rd March has been used for the analysis below. Please refrain from using the data or the insights derived from the analysis for medical guidance or for commercial use. It is solely for learning purposes.
The same code template can be leveraged for various other data sources. I would encourage readers to try other charts in Plotly as well and to customize the code according to application requirements.
Another important aspect of presenting your key findings is to use only a small set of charts that convey key insights from the data, rather than showing too many charts with redundant information.
Analysis:
1. The global impact of COVID-19
2. Descriptive analysis on infected, mortality and recovery rates
3. Timeline analysis on spread of COVID-19 between [Jan — Mar]
4. Impact of COVID-19 over SARS: a comparative analysis
The global impact of COVID-19
To understand the impact of the virus on a geographical landscape, I used the geographical scatter plot from Plotly. The code for this interactive plot can be found in the shared link, which gives a clearer, interactive visualization.
Observation:
- From the chart, we can see the disease has infected the largest number of people in China, where the virus was first discovered
- Even though the infected region is large [in blue], we can observe that the number of deaths is considerably low and that there have been a lot of recovered patients to date
Descriptive analysis on infected, mortality and recovery rates
Here I have used various charts to show how information can be mined from data sources.
Note: Since China accounts for more than 85% of infected cases, it is useful to exclude China and look at the numbers for other countries. This will be applied to all the following charts.
1. Confirmed cases: Analysis using pie chart
Mainland China has the highest number of affected cases compared to other regions. After China, South Korea, Italy and Iran show high numbers of infected patients.
2. Reported number of deaths: Analysis using bar chart
From the graphs we can see that even though South Korea has more confirmed cases than Italy and Iran, its number of reported deaths is far lower in comparison. An analysis of recoveries across province/state is shown below.
3. Recovery rates: Analysis using tree maps
The recovery rate across countries gives a wider view of how countries are mitigating the outbreak. Here, the levels of the hierarchy come from the data set in order: “world”, “Country”, followed by “Province/Region”, as can be seen for Mainland China (“world”, “Mainland China”, [“Hubei”, “Henan”, “Anhui”, etc.]).
An interesting point here is that Iran and Italy have a higher number of recoveries than South Korea. From here, a deep dive into the impact of other attributes (age, ethnicity and region) on recovery/death could give more clarity on these numbers.
Timeline analysis on spread of COVID-19 between [Jan — Mar]
It is crucial to show how rapidly the virus has spread in different countries in a short span of time. Timeline analysis requires some pre-processing of the original data before it can be visualized in Plotly; this is provided in the notebook.
Here I have only shown the trend for infected cases. A similar trend can also be observed for deaths and recoveries across countries. A consolidated view is given using a line chart below.
Timeline analysis across countries: Scatter plot
Observation:
While steady growth in the virus spread can be observed in China, a rapid increase in the number of cases within the past few days can be seen in some other countries.
Timeline analysis: Multiple line chart
Overall affected cases, deaths and recoveries in the world.
Impact of COVID-19 vs SARS: a comparative analysis
The most intriguing part is to compare the impact of COVID-19 with that of Severe Acute Respiratory Syndrome (SARS), which is similar in nature. The data set for SARS was obtained from a Kaggle data source. It contained only the overall infected and mortality counts, so those are what is visualized here. I have analysed the impact of both viruses in a time window starting from discovery and covering the following three months.
Timeline analysis for infected cases:
Timeline analysis for reported deaths:
Observation:
Evidently, within this three-month window COVID-19 is spreading faster and has caused more reported deaths than SARS. The advancements in transport could be a key factor in this case.
Conclusion:
This article gives a detailed analysis of how COVID-19 has impacted the world and how the derived insights can be used for downstream analysis. The charts can also be applied to other scenarios to infer key data insights.
Code: link to code and data sets
Future work:
- To learn more about other attributes such as patient gender, ethnicity and age and how they affect the fatality rate
- A dashboard of interactive charts to provide an overall summary
References: