Data Analysis on Covid-19 in The World

From a worldwide, Canadian and Brazilian perspective

Vinicius Porfirio Purgato
Analytics Vidhya
12 min read · Mar 9, 2021


In December 2019, the whole world heard about a new pneumonia outbreak happening in Wuhan, China. A few days later, scientists found out that China was not dealing with an ordinary pneumonia. Instead, it was caused by a new coronavirus, later named SARS-CoV-2, and the disease it causes became known as Covid-19.

Photo by Yohann LIBOT on Unsplash

No one expected it to become the problem it is today. When I first heard about Covid, at the beginning of January, I was backpacking with my family through Central America, and I didn’t think it would become a thing. However, here we are, in a pandemic in which businesses have had to close, over 2.57M people have died (as I write this), everyone has to wear masks and anti-vaccine movements have grown far and wide.

Unfortunately, the world changed a lot.

Schools are closed and there is a political war over vaccination. Due to fake news and non-scientific beliefs, over 40% of people in the USA say they won’t take the vaccines, claiming there weren’t enough studies, even though scientists worked hard to ensure the vaccines are safe and effective. There were also huge investments and studies showing how safe they are. I pay my respects to all the frontline workers and scientists who have been giving up their personal lives in order to save others.

Keep in mind that I did this analysis on March 6th, 2021. Therefore, numbers might have changed, depending on when you read this.

The data is from Our World in Data, a project run in collaboration with researchers at the University of Oxford, which only uses information from official sources, like the World Health Organization (WHO) and governments. Thus I’m not responsible for any misleading data in the CSV table. Make sure to check out their dataset here.

Photo by Edwin Hooper on Unsplash

Before we start

The analysis is quite long, but it is worth your reading. Also, if you want to check all the coding and plotting procedures, access my Google Colab Notebook.

Importing Data and Modules

We only need three libraries for this analysis:

  • pandas - Manipulate our table.
  • matplotlib - Plotting graphs.
  • seaborn - Styling the plots.

Let's import them:

# Importing our libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Settings for style and appearance
%matplotlib inline
sns.set()
DATA_PATH = 'https://covid.ourworldindata.org/data/owid-covid-data.csv' # Link to our data
df = pd.read_csv(DATA_PATH) # Assigning df to our dataset

After having a look at our data, checking its size and cleaning it, it turns out we have 73384 rows and 59 columns. Quite a big table, eh?

The variable location contained all the countries; however, it also contained some continents. We don’t want that, since there is already a separate variable for them called continent, so I removed them from location.

I also cast date to a datetime type and set it as our index.

Then I checked shape again. After cleaning, we had 70973 rows and 59 columns.
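The cleaning steps described above are not shown in the article, so here is a minimal sketch of how they could look. The toy table and the `continent_names` list are assumptions made for illustration; the real notebook may have done this differently.

```python
import pandas as pd

# Toy stand-in for the OWID table; the values are made up for illustration.
raw = pd.DataFrame({
    "location": ["World", "Europe", "Brazil", "Canada"],
    "continent": [None, None, "South America", "North America"],
    "date": ["2021-03-06"] * 4,
    "new_cases": [400000.0, 150000.0, 60000.0, 3000.0],
})

# Assumption: continent-level aggregates appear by name in `location`.
continent_names = [
    "Africa", "Asia", "Europe", "North America", "Oceania", "South America",
]
clean = raw[~raw["location"].isin(continent_names)].copy()

# Cast `date` to a datetime type, then use it as the index.
clean["date"] = pd.to_datetime(clean["date"])
clean = clean.set_index("date")

print(clean.shape)
```

Calling `shape` before and after the filter is a quick way to confirm how many rows were dropped.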

Country With Highest Number of New Cases on March 6th, 2021

I want to check which countries had the most new cases today, March 6th, 2021.

The .loc[] indexer selects rows by index label. Since date is our index, passing a date returns every row for that day.

I will pass 2021-03-06 as the parameter. Then I will sort the values by the number of new_cases in descending order.

By the way, I will jump to the second row, because the first one will be World, which contains the sum of all countries.

df.loc['2021-03-06'].sort_values(ascending=False, by='new_cases')[1:3]

Brazil is top-ranked, followed by the USA. These are their new cases on March 6th:

  • Brazil — 69609
  • USA — 58062
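Skipping the first row by position works as long as World really is the first row. A slightly more defensive variant, sketched here on made-up numbers, filters the aggregate out by name before sorting. The toy frame and its values are assumptions for illustration only.

```python
import pandas as pd

# Toy data indexed by date; the numbers are invented for illustration.
toy = pd.DataFrame(
    {
        "location": ["World", "Brazil", "United States", "Canada"],
        "new_cases": [400.0, 70.0, 60.0, 3.0],
    },
    index=pd.to_datetime(["2021-03-06"] * 4).rename("date"),
)

# All rows for the chosen day, as in the article.
day = toy.loc["2021-03-06"]

# Drop the "World" aggregate by name instead of slicing it off by position.
top_two = (
    day[day["location"] != "World"]
    .sort_values(by="new_cases", ascending=False)
    .head(2)
)
print(top_two["location"].tolist())
```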

As someone who lives in Brazil, this makes a lot of sense to me. People aren’t following quarantine or social distancing at all. Many even say the pandemic is over, which it is not. Sadly, the government also doesn’t provide useful help to the people. Instead, they’ve been closing businesses and cutting social assistance.

Top 5 Countries With The Highest Number of Total Cases

Now I want to see the top five countries in the number of total_cases.
The way we do this is pretty much the same as the example above, but instead of using new_cases, I will use total_cases.

# The 5 countries with the highest number of total cases
df.loc['2021-03-06'].sort_values(by='total_cases', ascending=False)[1:6]

Let’s plot a bar graph to see how these numbers compare:

Plot by Author

The USA is far ahead of all the other countries.
It has more than twice as many cases as India, which is the second most populous country in the world.

Top 5 Countries With The Most Deaths

# The 5 countries with the highest number of total deaths
df.loc['2021-03-06'].sort_values(by='total_deaths', ascending=False)[1:6]

Some studies have tried to directly link low income to a higher percentage of new cases. However, countries like the USA and the U.K. made it to the top 5, even though they are by no means poor.

Let’s also plot this:

Plot by Author

Day The World Had The Most Deaths

On which day did the world have the most deaths?
To answer this question, we will use the same logic as the examples above. However, we will only get the top-ranked row.

# Day with the most deaths
df.sort_values(by='new_deaths', ascending=False)[0:1]
By Author

January 20th, 2021 was when the world had the most deaths in a single day by Covid-19.
This is actually surprising to me. I thought it would be around September-November 2020. It suggests that, all over the world, people have not been quarantining enough.

  • January 20th, 2021: 17 886 deaths
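Sorting the whole table works because the World rows carry the largest values. As an alternative sketch, the peak day can be looked up directly on the World rows with `idxmax`. The toy numbers below are invented for illustration and are not the real figures.

```python
import pandas as pd

# Toy data indexed by date; values are invented for illustration.
toy = pd.DataFrame(
    {
        "location": ["World", "World", "World", "Brazil"],
        "new_deaths": [100.0, 250.0, 180.0, 40.0],
    },
    index=pd.to_datetime(
        ["2021-01-19", "2021-01-20", "2021-01-21", "2021-01-20"]
    ).rename("date"),
)

# Keep only the worldwide aggregate, then ask for the date of the maximum.
world = toy[toy["location"] == "World"]
peak_day = world["new_deaths"].idxmax()
peak_value = world["new_deaths"].max()

print(peak_day.date(), peak_value)
```

Filtering on World first makes the intent explicit and avoids relying on the aggregate row happening to sort to the top.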

Relation Between Human Development Index and Total Deaths

Now let’s check which country has the lowest death rate relative to its population. To do this, I will use the variable total_deaths_per_million, which measures deaths per million people.
Again, the logic is the same as the other examples, but instead of getting the highest number, we want the lowest, therefore ascending=True.
I will also check the number of total deaths.

# Country with the fewest total deaths per million people
df.loc['2021-03-06'].sort_values(by='total_deaths_per_million', ascending=True)[:1]
by Author

Burundi is the country with the lowest death rate per one million people. These are the statistics of the place:

  • Total Cases (total_cases) - 2 299
  • Total Deaths (total_deaths) - 3
  • Share of the population with handwashing facilities (handwashing_facilities) - 6.144%
  • Population (population) - 11 890 781
  • Median Age (median_age) - 17.5
  • Extreme Poverty (extreme_poverty) - 71.7%
  • Life Expectancy (life_expectancy) - 61.58 years old
  • Human Development Index (human_development_index) - 0.433.
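A profile like the one above can be pulled out in one step by locating the row with the smallest value and selecting the columns of interest. This is only a sketch on a made-up table with hypothetical countries X, Y and Z; the column names match the dataset, the values do not.

```python
import pandas as pd

# Hypothetical countries and values, for illustration only.
toy = pd.DataFrame({
    "location": ["X", "Y", "Z"],
    "total_deaths_per_million": [120.5, 0.3, None],
    "median_age": [30.1, 17.5, 41.0],
    "human_development_index": [0.76, 0.43, 0.90],
})

# Ignore countries with no reported value before looking for the minimum.
ranked = toy.dropna(subset=["total_deaths_per_million"])
lowest = ranked.loc[ranked["total_deaths_per_million"].idxmin()]

# Keep only the columns we want to report.
profile = lowest[["location", "median_age", "human_development_index"]]
print(profile)
```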

Let’s visualize the progression of Covid-19 in Burundi:

Plot by Author

Total Number of COVID-19 Vaccination Doses Administered

Now let’s see which countries have administered the most vaccine doses.
I will use the most recent date (2021-03-06) as my parameter, and then I will sort the values in descending order by total_vaccinations.

Plot by Author

Even though the States have vaccinated lots of people, vaccination success depends on what percentage of your population takes it. Therefore, let’s analyze which countries had the most doses administered relative to their population. Let’s use total_vaccinations_per_hundred.
Note: I will use the second row as the top result since the first one will be Gibraltar, which has all of its values empty. So I will just ignore it.

# Top five countries by total vaccinations per one hundred people
df.loc['2021-03-06'].sort_values(by='total_vaccinations_per_hundred', ascending=False)[1:6]
Plot by Author

There we go, now that’s a fair comparison.
Of course, the Maldives wouldn’t have more people vaccinated than the USA, since it has a much smaller population. However, when you analyze doses per one hundred people, you see that the Maldives actually comes out ahead.
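Instead of skipping a row with empty values by position, one option is to drop rows with a missing `total_vaccinations_per_hundred` before ranking. The sketch below uses hypothetical locations P to S and invented values to show the idea.

```python
import pandas as pd

# Hypothetical locations and values; P has no reported figure.
toy = pd.DataFrame({
    "location": ["P", "Q", "R", "S"],
    "total_vaccinations_per_hundred": [None, 95.0, 12.5, 60.2],
})

# Remove rows without a value, then rank from highest to lowest.
ranked = (
    toy.dropna(subset=["total_vaccinations_per_hundred"])
    .sort_values(by="total_vaccinations_per_hundred", ascending=False)
)
print(ranked["location"].tolist())
```

With the empty rows gone, `ranked.head(5)` gives the top five without needing to know how many rows to skip.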

Deaths and Spread of Covid-19

I want to see both the increase of deaths in the world (new_deaths) and the spread of the virus (new_cases).

Plot by Author

Both graphs were increasing massively. However, around January 2021, they started to decrease. That’s around the same time vaccination started in most countries, which is consistent with vaccines saving lives, although this plot alone can’t prove it.

Scatter Plot of per Capita Gross Domestic Product (GDP) by Total Number of Deaths

Are there any relations between the country’s income and the number of deaths?

It seems like the richest and the poorest countries are the ones with the fewest deaths. Burundi has a low GDP, and even though it has a population of about 11.9M people, it had only three deaths by Covid.

Let’s check which country has the world's median gdp_per_capita.

Now we know the highest, lowest, mean and median values of the world’s per capita income. Which countries represent these values?

First of all, let’s check the median GDP value:

World’s Median GDP per Capita

# Printing the country with the median GDP per capita
df[df['gdp_per_capita'] == 12951.839].sort_values(by='total_deaths', ascending=False)[:1]
by Author
Saint Lucia Flag

Saint Lucia is a country in the Caribbean. It has the world’s median GDP per capita, which is about US$12 951.

This value is below the world’s mean, which is around 19 126 US dollars.
With a population of 183 629 people and 43 deaths by Covid, around 0.02% of the population died of Covid.
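Matching a float with `==` only works when the typed value is exactly the one stored in the table. A more forgiving sketch is to compute the median and pick the country whose value is closest to it. The series below mixes the three GDP figures quoted in this article with two hypothetical values (B and D), and the country labels A to E are placeholders.

```python
import pandas as pd

# One value per country. A, C and E reuse figures quoted in the article;
# B and D are hypothetical fillers so that the median is well defined.
gdp = pd.Series(
    {"A": 661.24, "B": 5000.0, "C": 12951.839, "D": 40000.0, "E": 116935.6},
    name="gdp_per_capita",
)

# Summary statistics: count, mean, min, median (50%), max, and so on.
print(gdp.describe())

# Country closest to the median, without relying on exact float equality.
median_value = gdp.median()
closest_country = (gdp - median_value).abs().idxmin()
print(closest_country, median_value)
```

On the real table this would first need one row per country, for example by taking the rows of a single date.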

World’s highest GDP per Capita

# Printing the country with the highest GDP per capita in the world
df[df['gdp_per_capita'] == 116935.6].sort_values(by='total_deaths', ascending=False)[:1]
by Author
Qatar Flag

Qatar is an Asian country and has the world’s leading GDP per capita, which is around US$116 935.
As of today, Qatar has a population of roughly 2.9 million people, of whom around 166 475 had Covid, which is around 6% of the population.
However, out of 166 475 Covid cases, only 262 people died, which represents around 0.16%.

World’s Lowest GDP per Capita

# Finding which country has the lowest GDP per capita
df[df.gdp_per_capita == 661.24].sort_values(by='total_deaths', ascending=False)[:1]
Central African Republic Flag

The Central African Republic has the world’s lowest GDP per Capita.
Out of a population of 4 829 764 people, 5018 had Covid.
Out of the 5018 cases, 63 people died, which is around 1.26%.
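The percentages in these three sections are simple ratios, and it helps to be explicit about the denominator. The sketch below recomputes them from the counts quoted in the text: deaths divided by confirmed cases for Qatar and the Central African Republic, and deaths divided by population for Saint Lucia, since the text gives no case count for it.

```python
# (confirmed cases, deaths) as quoted in the text above.
figures = {
    "Qatar": (166_475, 262),
    "Central African Republic": (5_018, 63),
}

# Share of confirmed cases that ended in death, in percent.
rates = {
    country: 100 * deaths / cases
    for country, (cases, deaths) in figures.items()
}
for country, rate in rates.items():
    print(f"{country}: {rate:.2f}% of confirmed cases died")

# Saint Lucia: only population and deaths are quoted, so this is
# the share of the whole population, not of confirmed cases.
saint_lucia_share = 100 * 43 / 183_629
print(f"Saint Lucia: {saint_lucia_share:.2f}% of the population died")
```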

Let’s visualize how the gdp_per_capita of these three nations can be compared:

Plot by Author

There it is, the distribution of the countries with the highest, median and lowest gdp_per_capita.

Even after this, we can’t say whether there is a relation between GDP and Covid deaths. Three countries are not enough to conclude anything, and a real answer would depend on proper statistical tests. However, it’s still interesting.

Canada

Canada Flag

Canada has many times been considered the best place to live in the world. It is beautiful and diverse. The people are warm and, in general, willing to help. If you have a chance, visit it. Even though I am Brazilian, one of my parents is Canadian, which makes me really close to the country because of family, but also because I love it. Therefore, I pay attention to its politics, etc. Unfortunately, Canada hasn’t been doing well during the pandemic, which makes me curious to see how it’s been. For that reason, let’s analyze it.

I will assign all of its corresponding data to df_canada.

# Assigning Canada's data to df_canada
df_canada = df[df.location == 'Canada']

Now let’s plot three graphs for new_cases, new_vaccinations and total_deaths.

by Author

Canada started its vaccination in mid-December. Pay attention to how, in January-February, both deaths and new cases started to drop as vaccination increased.

Now let’s check total_deaths and total_cases in Canada.
I'll assign the rows for today's date (March 6th, 2021) to df_canada_today, so that we only get the most up-to-date result.
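The article does not show this step, so here is one way it could be written. The toy table and its values are invented for illustration; only the column names and the `df_canada` / `df_canada_today` names follow the text.

```python
import pandas as pd

# Toy data indexed by date; values are invented for illustration.
toy = pd.DataFrame(
    {
        "location": ["Canada", "Canada", "Brazil"],
        "total_cases": [880.0, 884.0, 10900.0],
        "total_deaths": [22.1, 22.2, 264.0],
    },
    index=pd.to_datetime(["2021-03-05", "2021-03-06", "2021-03-06"]).rename("date"),
)

df_canada = toy[toy["location"] == "Canada"]

# Keep only the row for the chosen day and the two columns of interest.
today = pd.Timestamp("2021-03-06")
df_canada_today = df_canada.loc[
    df_canada.index == today, ["total_cases", "total_deaths"]
]
print(df_canada_today)
```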

Canada vs Brazil in The Pandemic

Photo by @thami__cascales

Unlike New Zealand and Australia, the majority of nations didn’t do well in the pandemic. Canada and Brazil are no exceptions.
However, even though Canada has many cases because lots of people don’t follow social distancing, the Government tries to keep them at home. Canada even gave citizens who lost their jobs due to the pandemic quite a good amount of social assistance.
Brazil’s president, Bolsonaro, on the other hand, is a denialist and claimed not to believe in Covid during many public interviews. He also didn’t really cooperate to help the states buy foreign vaccines (like Pfizer, Moderna and others), nor did he offer help to Brazil’s own producer, Butantan, simply because it was partnering with Sinovac, a Chinese brand. He also refused to wear masks many times.
Therefore, let’s compare both countries.

I will assign all of Brazil’s data to df_brazil.

# Assigning Brazil's data to df_brazil
df_brazil = df[df['location'] == 'Brazil']

Now, let’s plot a line chart comparing both countries’ new_cases_per_million.
That way the comparison becomes fair, since the two populations are very different: Canada has 7 million fewer people than São Paulo, the state I live in.

Plot by Author
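The plotting code is not included in the article. One possible way to prepare the data for such a chart is to reshape it so that each country becomes a column, as sketched below on invented numbers.

```python
import pandas as pd

# Long format: one row per (date, country). Values are invented.
long = pd.DataFrame({
    "date": pd.to_datetime(
        ["2021-03-05", "2021-03-05", "2021-03-06", "2021-03-06"]
    ),
    "location": ["Brazil", "Canada", "Brazil", "Canada"],
    "new_cases_per_million": [350.0, 80.0, 330.0, 75.0],
})

# Wide format: dates as the index, one column per country.
wide = long.pivot(
    index="date", columns="location", values="new_cases_per_million"
)
print(wide)
```

With the data in this shape, `wide.plot()` draws one line per country on a shared date axis.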

In the beginning, we can see a few moments in which Canada was worse than Brazil. However, the scenario quickly changed. Around January 2021 Canada had one or two days worse than Brazil, but then, after the vaccination plan was changed, the process became more robust and new cases started dropping.

To make this more visual, I will download and import the library pywaffle, which will provide us with a waffle chart. That's one of my favourite libraries, and I highly encourage you to learn it.

# Installing the library
!pip install pywaffle -q
# Importing
from pywaffle import Waffle

I will pass the rounded mean of total_cases_per_million (which means total confirmed cases of COVID-19 per 1,000,000 people) of each country as our parameters.

Plot by Author

As you can see, out of 75 icons, Canada occupied only 19, while Brazil occupied 56. It shows a big difference in how Brazil and Canada handled the pandemic. Since the values are per million people, the size of each country's population doesn’t matter.
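To see how two averages turn into a split of 75 icons, the proportional arithmetic can be sketched in plain Python. The `icon_shares` helper and the two mean values are hypothetical, chosen only so the result matches the 19 and 56 split in the chart; this is not a description of how pywaffle works internally.

```python
def icon_shares(values, total_icons):
    """Split total_icons among the entries in proportion to their values."""
    total = sum(values.values())
    return {
        name: round(total_icons * value / total)
        for name, value in values.items()
    }

# Hypothetical means of total_cases_per_million, for illustration only.
means = {"Canada": 12_500, "Brazil": 37_500}

shares = icon_shares(means, total_icons=75)
print(shares)
```

Simple rounding like this can leave the icons one short or one over in other cases, so treat it as an illustration of the proportion rather than a general allocation rule.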

Just so you have an idea in numbers, here’s the mean of total_cases_per_million of both countries:

Conclusion

As seen in the data analysis, Covid-19 has only gotten worse. These are some of the insights we took:

  • The day we had the most deaths was January 20th, 2021.
  • Burundi is the country with the fewest deaths per million people, even though about 11.9M people live there.
  • Qatar has the world’s highest GDP per capita.
  • The USA is the country with the highest number of deaths.
  • Israel is the place with the most vaccinated people, taking into consideration the nation's size.
  • Canada had its death rate by Covid-19 increasing until vaccination started to take place, then it dropped drastically.
  • The Central African Republic has the lowest GDP per capita in the world; however, it had only 63 deaths out of about 4.8M people living there.

Connect With Me

If you like this article, share it and leave a comment telling me your thoughts. What could I do better? Don’t forget to add me on LinkedIn and GitHub. See you next time!


Vinicius Porfirio Purgato
Analytics Vidhya

Computer Science student in love with teaching and learning Data Science. Python lover and R bully :)