COVID-19 Analysis With Python

Create a complete COVID-19 report using Python and its powerful data science packages.

Nikhil Adithyan
Published in CodeX
7 min read · Aug 31, 2020

Introduction

Python is a powerful general-purpose programming language that is easy to learn and offers data scientists a wide variety of tools and packages. During the pandemic, I decided to use it to analyze data on the novel coronavirus.

In this article, I am going to walk you through the steps I undertook for this analysis with visuals and code snippets.

Steps involved in Data Analysis:

1. Importing required packages

2. Gathering Data

3. Transforming Data to our needs (Data Wrangling)

4. Exploratory Data Analysis (EDA) and Visualization

Step — 1: Importing required Packages

Importing the required packages is the starting point of any data analysis program in Python. As mentioned, Python provides a wide variety of packages for data scientists. In this analysis, I used its most popular data science packages, Pandas and NumPy, for data wrangling and EDA. For data visualization, I used Plotly, which produces interactive charts, and Matplotlib. Importing packages in Python is simple:

These are the primary packages for our data analysis; we will add a few more in Step 2. Yay! We have finished our first step.

Step — 2: Gathering Data

For a clean analysis, the most important element is collecting quality data. For this analysis, I collected data from several sources for better accuracy.

Our primary dataset comes from Esri, which provides regularly updated data on the coronavirus, and is retrieved through a query URL. Follow the code snippets to extract the data from Esri:
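A minimal sketch of this extraction. The query URL itself is not given here, so `QUERY_URL` is a placeholder, and the helper names are hypothetical. The response layout (`features` holding a list of records, each with an `attributes` dictionary) is an assumption based on the usual shape of ArcGIS query results:

```python
import pandas as pd
import requests

# Placeholder: substitute the actual Esri query URL (not given here).
QUERY_URL = "<Esri query URL>"


def features_to_frame(payload: dict) -> pd.DataFrame:
    """Flatten a query response into a DataFrame.

    Assumes the layout {"features": [{"attributes": {...}}, ...]}.
    """
    rows = [feature["attributes"] for feature in payload.get("features", [])]
    return pd.DataFrame(rows)


def fetch_cases(url: str, timeout: float = 30.0) -> pd.DataFrame:
    """Download the JSON behind `url` and return it as a DataFrame."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # stop early on HTTP errors
    return features_to_frame(response.json())
```

Once the real URL is filled in, `df = fetch_cases(QUERY_URL)` would give one row per reported region.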

Requests is a Python package for making HTTP requests. In this code, I used it to fetch the JSON data returned by Esri’s query URL. We are now ready to do some data wrangling! (Note: We will import more datasets in Step 4 of our analysis.)

Step — 3: Data Wrangling

Data wrangling is the process of transforming and cleaning data to fit our needs. The raw extracted data cannot be analyzed as it is, so we have to transform it before proceeding with our analysis. Here’s the code for our data wrangling:
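A minimal sketch of such a cleaning step. The column names (`Country_Region`, `Province_State`, `Last_Update`, `Confirmed`, `Deaths`, `Recovered`, `Active`) and the assumption that `Last_Update` holds epoch milliseconds are not confirmed by the article, so adjust them to the fields actually returned:

```python
from datetime import datetime, timezone

import pandas as pd

COUNT_COLUMNS = ["Confirmed", "Deaths", "Recovered", "Active"]  # assumed names


def wrangle(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw case data."""
    # Rows without a country or a timestamp cannot be used in the analysis.
    df = raw.dropna(subset=["Country_Region", "Last_Update"]).copy()

    # Convert epoch milliseconds into timezone-aware datetimes.
    df["Last_Update"] = df["Last_Update"].apply(
        lambda ms: datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
    )

    # Treat missing counts as zero and store them as integers.
    df[COUNT_COLUMNS] = df[COUNT_COLUMNS].fillna(0).astype(int)

    # Country-level rows have no province; use an empty string instead of NaN.
    df["Province_State"] = df["Province_State"].fillna("")
    return df.reset_index(drop=True)
```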

Note that we have imported a new Python module, ‘datetime’, which helps us work with dates and times in a dataset. Now, get ready to see the big picture of our analysis: ‘EDA and Data Visualization’.

Step — 4: Exploratory Data Analysis and Data Visualization

This process is quite long as it is the heart and soul of data analysis. So, I’ve divided this process into three steps:

a. Ranking countries and provinces (based on COVID-19 aspects)

b. Time Series on COVID-19 Cases

c. Classification and Distribution of cases

Ranking countries and provinces

Using the data we extracted earlier, we are going to rank countries and provinces by confirmed, death, recovered, and active cases through some EDA and visualization. Follow the code snippets for the upcoming visuals. (Note: Every visualization is interactive, and you can hover over it to see its data points.)

Part 1 — Ranking Most affected countries

i) Top 10 Confirmed Cases Countries:

The following code will produce a plot ranking the top 10 countries based on confirmed cases.

Graph by Author

ii) Top 10 Death Cases Countries:

The following code will produce a plot ranking the top 10 countries based on death cases.

Graph by Author

iii) Top 10 Recovered Cases Countries:

The following code will produce a plot ranking the top 10 countries based on recovered cases.

Graph by Author

iv) Top 10 Active Cases Countries:

The following code will produce a plot ranking the top 10 countries based on active cases.

Graph by Author

Part 2 — Ranking most affected States in largely affected Countries:

EDA for ranking states in largely affected Countries:
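A minimal sketch of this state-level extraction. The column names `Country_Region` and `Province_State`, and the country spellings in `COUNTRIES`, are assumptions about how the dataset labels them:

```python
import pandas as pd

# Spellings as they might appear in the dataset (assumed).
COUNTRIES = ["US", "Brazil", "India", "Russia"]


def top_states(df: pd.DataFrame, country: str, metric: str = "Confirmed", n: int = 5) -> pd.DataFrame:
    """Return the n states of `country` with the highest `metric` totals."""
    # Keep only rows of this country that actually name a state or province.
    rows = df[(df["Country_Region"] == country) & df["Province_State"].notna()]
    totals = rows.groupby("Province_State", as_index=False)[metric].sum()
    return totals.nlargest(n, metric).reset_index(drop=True)


def states_by_country(df: pd.DataFrame, countries=COUNTRIES, n: int = 5) -> dict:
    """Map each country name to its table of top n states."""
    return {country: top_states(df, country, n=n) for country in countries}
```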

We extract state-level data for the USA, Brazil, India, and Russia because these are the countries most affected by COVID-19. Now, let’s visualize it!

Visualization of Most affected states in largely affected Countries:

i) Most affected States in the USA:

The following code will produce a plot ranking the top 5 most affected states in the United States of America.

Graph by Author

ii) Most affected States in Brazil:

The following code will produce a plot ranking the top 5 most affected states in Brazil.

Graph by Author

iii) Most affected States in India:

The following code will produce a plot ranking the top 5 most affected states in India.

Graph by Author

iv) Most affected States in Russia:

The following code will produce a plot ranking the top 5 most affected states in Russia.

Graph by Author

Time Series on COVID-19 Cases

To perform time series analysis on COVID-19 cases, we need a new dataset. Follow this link and the image shown below to download it: https://covid19.who.int/

Image by Author

After following the link above, you will land on this page. At the bottom right of the map, you will find the download button. From there, you can download the dataset and save it locally. Good work! We have fetched our data! Let’s import it:
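A minimal sketch of the import, assuming the download is a CSV file with a `Date_reported` column and a `Country` column. Those column names and the file name in the usage note are assumptions about the WHO export:

```python
import pandas as pd


def load_who_data(path_or_buffer) -> pd.DataFrame:
    """Read the downloaded CSV and parse its date column."""
    # skipinitialspace guards against headers written as "a, b, c".
    who = pd.read_csv(path_or_buffer, skipinitialspace=True)
    who.columns = who.columns.str.strip()
    who["Date_reported"] = pd.to_datetime(who["Date_reported"])
    # Sort so that each country's rows are in chronological order.
    return who.sort_values(["Country", "Date_reported"]).reset_index(drop=True)
```

Usage would look like `who = load_who_data("WHO-COVID-19-global-data.csv")`, with the file name replaced by whatever the download was saved as.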

From the above-extracted dataset, we are going to perform two types of time series analysis, ‘COVID-19 cases Worldwide’ and ‘Most affected countries over time’.

i) COVID-19 cases worldwide:

EDA for COVID-19 cases worldwide:
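A minimal sketch of this aggregation, assuming the imported data has one row per country and date with the columns `Date_reported`, `New_cases`, `Cumulative_cases`, `New_deaths`, and `Cumulative_deaths` (names assumed):

```python
import pandas as pd

VALUE_COLUMNS = ["New_cases", "Cumulative_cases", "New_deaths", "Cumulative_deaths"]


def worldwide_totals(who: pd.DataFrame) -> pd.DataFrame:
    """Add up all countries so there is one row per reporting date."""
    totals = who.groupby("Date_reported", as_index=False)[VALUE_COLUMNS].sum()
    return totals.sort_values("Date_reported").reset_index(drop=True)
```

The resulting table holds the four series used by the worldwide charts: cumulative cases, cumulative deaths, daily new cases, and daily deaths.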

a) Cumulative cases worldwide:

The following code produces a time series chart of cumulative cases worldwide right from the beginning of the outbreak.

Graph by Author

b) Cumulative death cases worldwide:

The following code produces a time series chart of cumulative death cases worldwide right from the beginning of the outbreak.

Graph by Author

c) Daily new cases worldwide:

The following code produces a time series chart of daily new cases worldwide right from the beginning of the outbreak.

Graph by Author

d) Daily death cases worldwide:

The following code produces a time series chart of daily death cases worldwide right from the beginning of the outbreak.

Graph by Author

ii) Most affected countries over time:

EDA for Most affected countries over time:
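A minimal sketch of this extraction, reshaping the data so that each selected country becomes one column. The column names and the country spellings in `MOST_AFFECTED` are assumptions about how the WHO file labels them:

```python
import pandas as pd

# Country names as the WHO file might spell them (assumed).
MOST_AFFECTED = [
    "United States of America",
    "Brazil",
    "India",
    "Russian Federation",
    "Peru",
]


def countries_over_time(who: pd.DataFrame, countries=MOST_AFFECTED,
                        column: str = "Cumulative_cases") -> pd.DataFrame:
    """Return a date-indexed table with one column per selected country."""
    subset = who[who["Country"].isin(countries)]
    wide = subset.pivot_table(
        index="Date_reported", columns="Country", values=column, aggfunc="sum"
    )
    return wide.sort_index()
```

Each column of the result can then be drawn as one line of the time series chart.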

Note that we have extracted data for the USA, Brazil, India, Russia, and Peru, as they are among the countries most affected by COVID-19.

a) Most affected Countries’ Cumulative cases over time

The following code will produce a time series chart of the most affected countries’ cumulative cases right from the beginning of the outbreak.

Graph by Author

b) Most affected Countries’ cumulative death cases over time:

The following code will produce a time series chart of the most affected countries’ cumulative death cases right from the beginning of the outbreak.

Graph by Author

c) Most affected Countries’ daily new cases over time:

The following code will produce a time series chart of the most affected countries’ daily new cases right from the beginning of the outbreak.

Graph by Author

d) Most affected Countries’ daily death cases:

The following code will produce a time series chart of the most affected countries’ daily death cases right from the beginning of the outbreak.

Graph by Author

Case Classification and Distribution

Here we are going to analyze how COVID-19 cases are distributed. For this, we need a new dataset, which you can get by following this link: https://www.kaggle.com/imdevskp/corona-virus-report

i) WHO Region-Wise Case Distribution:

For this analysis, we are going to use the ‘country_wise_latest.csv’ file, which is included in the downloaded Kaggle dataset. The following code produces a pie chart of the case distribution across WHO regions.

Graph by Author

ii) Most affected Countries’ case distribution:

For this analysis, we are going to use the same ‘country_wise_latest.csv’ dataset which we imported for the previous analysis.

EDA for Most affected countries’ case distribution:
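A minimal sketch of this step, splitting each selected country's cases into deaths, recoveries, and active cases. The column names `Country/Region`, `Deaths`, `Recovered`, and `Active` are assumptions about the file:

```python
import pandas as pd

CLASSES = ["Deaths", "Recovered", "Active"]  # assumed column names


def case_classification(latest: pd.DataFrame, countries) -> pd.DataFrame:
    """Percentage of deaths, recoveries and active cases per country."""
    subset = latest[latest["Country/Region"].isin(countries)]
    counts = subset.set_index("Country/Region")[CLASSES]
    # Divide every row by its own total so each row sums to 100.
    shares = counts.div(counts.sum(axis=1), axis=0) * 100
    return shares.round(2)
```

Each row of the result holds the three slices of one country's pie chart.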

The following code will produce a pie chart representing the case classification of the Most affected Countries.

Graph by Author

iii) Most affected continents’ Negative case vs Positive case percentage composition:

For this analysis, we need a new dataset, which you can get by following this link: https://ourworldindata.org/coronavirus-source-data

EDA for Negative case vs Positive case percentage composition :
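A minimal sketch of one way to derive these percentages. It assumes the dataset has the columns `continent`, `location`, `date`, `total_cases`, and `total_tests`, and it treats every test that did not produce a confirmed case as negative. Both the column names and that definition of "negative" are assumptions, not the article's confirmed method:

```python
import pandas as pd


def outcome_shares(owid: pd.DataFrame) -> pd.DataFrame:
    """Positive and negative percentages of all tests, per continent."""
    known = owid.dropna(subset=["continent", "total_tests", "total_cases"])

    # Keep the most recent row that reports test totals for each country.
    latest = known.sort_values("date").groupby("location", as_index=False).last()

    by_continent = latest.groupby("continent")[["total_tests", "total_cases"]].sum()
    positive = 100 * by_continent["total_cases"] / by_continent["total_tests"]
    return pd.DataFrame({
        "Positive (%)": positive.round(2),
        "Negative (%)": (100 - positive).round(2),
    })
```

Each row of the result gives the two slices of one continent's pie chart.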

The following code will produce a pie chart illustrating the percentage composition of Negative cases and Positive cases in most affected Continents.

Graph by Author

Conclusion

Hurrah! We have created our own COVID-19 report with Python. If you missed any of the steps above, I have provided the full code for this analysis below. Apart from this analysis, there is much more you can do with Python and its powerful packages, so don’t stop exploring, and create your own reports and dashboards.

You can find many useful resources on data science in Python on the internet, for example, edX, Coursera, and Udemy. Never stop learning. I hope you found this article useful and informative.

Happy Analyzing!

Full Code:

Nikhil Adithyan
CodeX

Founder @BacktestZone (https://www.backtestzone.com/), a no-code backtesting platform | Top Writer | Connect with me on LinkedIn: https://bit.ly/3yNuwCJ