Ebenezerakpati
9 min readMay 16, 2023

AFRICA CONTINENT COVID-19 DATA ANALYSIS AS OF JANUARY 2020 TO MAY 2023: AN SQL SERVER (SSMS) AND TEABLU PROJECT

INTRODUCTION

As of May 2023, according to the data analysed the number of confirmed COVID-19 cases in Africa amounted to around 8,630,485,021, 188,807,977 deaths and 2.19% death rate. The African continent first came in contact with the coronavirus pandemic on February 14th 2020, in the northernmost part, particularly Egypt. Since then, the different governments took severe restrictive measures to try to curb the spread of the disease. However, South Africa, Morocco and Tunisia have the highest population infected respectively in terms of numbers, but in terms of average sum of percentage by population (rate) for the period analysed, Seychelles, Reunion and Mauritius top the chat with 20.82%, 18.26% and 7.90% respectively, South Africa came 10th with 3.89%.

My goal in this project in one hand was to keep generally this work simple and in other hand was to give a detailed exploratory analysis on covid-19 dataset. This dataset was from www.ourworldindata.org, it has 307,831 entries (rows) and 66 columns. The dataset has many null values; I was first faced with two choices, to clean out right away in Excel Spreadsheet and lose huge chunk of the dataset or import the dataset with the null values into my SQL Server and start the process of cleaning up and analysing the dataset.

STEPS I TOOK TO ARRIVE AT MY DASHBOAD

The first thing I did was to create a database call ‘portfolio’ and a file call ‘projects’ by following these steps:

1) Right click on database on SQL server

2) Click on new database in the dialogue box fill the name of the database of your choice and click on add, mine was ‘Portfolio’

3) Fill the name of the file, it is the space at the beginning of the last row under database file and click ok, mime was ‘project’

4) Refresh your data base and you will see your portfolio

DATA IMPORTATION

I split the dataset into two different data files which I named “covidDeath$” and “covidVaccinations$” respectively. The two data files were imported into my newly created database. This is to reduce the size of the dataset being analyse at a given time, thus, increasing the speed of executing a task by my SQL Server.

CALLING UP THE IMPORTED DATASET

SELECT *

FROM PortfolioProjects..covidDeath$

ORDER BY 3,4;

— has 307,831 entries(rows) and 26 columns

— has 307,831 entries(rows) and 26 columns

SELECT *

from PortfolioProjects..covidVaccinations$;

ORDER BY 3,4;

— has 307,831 entries and 44 columns

DATA EXPLORATORY

I started by creating a Common Table Expression (CTE), I used this process to select the columns that I needed for this analysis.

;With Glober_death As (

SELECT Continent

, location

, date

, population

,total_cases

,new_cases

,total_deaths

,new_deaths

,icu_patients

,hosp_patients

FROM PortfolioProjects..covidDeath$

)

— viewing the created CTE

— SELECT * FROM Glober_death;

— I Still have 307,831 entries, this shows my CTE table is working, Then I — — pass my CTE table to another CTE: ‘Africa_Deaths’ where I filtered or — — — narrowed the dataset down to Africa Continent.

,Africa_Deaths AS

(

SELECT *

FROM Glober_death

WHERE continent LIKE ‘%Africa%’

)

— Select * From Africa_Deaths;

— 69,370 entries were left for further analysis

— Passing ‘CTE Africa_Deaths’ into a Tempt table: ‘#deaths_byCovid2’

Select * INTO #deaths_byCovid2

FROM Africa_Deaths;

This Tempt table: #deaths_byCovid2, is what I will be analyzing, it has 69,370 entries and 10 columns, the 10 columns where columns of choice and the rows are entries for Africa. it is necessary to note that, during all this process the data were not lost, I only filtered out what I do not need for this analysis

— viewing my new tempt table

Select * FROM #deaths_byCovid2;

ANALYSING #DEATHS_BYCOVID2 DATASET

1. SHOWING TOTAL AFRICA CONTINENT CASES, TOTAL DEATH, AND DEATH_RATE:

SELECT Sum(total_cases ) as total_case,

Sum( total_deaths) as total_death,

(Sum( total_deaths *1.0)/Sum(total_cases ) )* 100 as death_rate

FROM #deaths_byCovid2;

During the covid pandemic, the rate of death in Africa was 2.19%, a total of 188,807,977 deaths were recorded and a total of 8,630,485,021 cases where recorded

2 LOOKING AT HIGHEST CASES BY COUNTRY

SELECT location,

Max(total_cases) as Max_total_caseCount,

SUM(total_cases) as SUM_totalcases

From #deaths_byCovid2

Where total_cases is not null

Group by location

order by 3 Desc;

South Africa top the record with 4,072,533 highest cases ever recorded and a sum total of 2,832,987,163 cases. Saint Helena has the lowest 2,166 4,747,76

2A LOOKING AT HIGHEST CASES VS LOCATION WITH WINDOW FUNCTION

SELECT location,

MAX(total_cases) OVER (PARTITION BY location) AS Max_totalcaseCount,

SUM(total_cases) OVER (PARTITION BY location) AS SUM_totalcases

FROM #deaths_byCovid2

Where total_cases is not null

ORDER BY 2,3 DESC; (I got same result as the above query)

2B LOOKING AT THE TOP 20 COUNTRIES WITH HIGHEST CASES

SELECT TOP 20 location,

SUM(total_cases) as Sumtotalcase

FROM #deaths_byCovid2

GROUP BY location

ORDER BY 2 DESC;

South Africa top the list followed in sequence five North Africa countries, Central africal holds 9th west Africa holds 10th position

3 LOOK AT DEATH RATE BY LOCATION AND BY YEAR

Showing percentage of total death by number of cases: the * 1.0 added to the query helps to give a float result when dividing an integer by an integer in SQL

SELECT

location,

DATEPART(YEAR, date) AS Year,

(SUM(total_deaths*1.0)/ SUM(total_cases))*100 AS total_deathpercentage

FROM #deaths_byCovid2

WHERE total_deaths IS NOT NULL

Group by location, DATEPART(YEAR, date)

ORDER BY 2,3 DESC;

In 2020 , Niger, Chad and Sudan top percentage list of total death by Cases with 7.33%, 7.29% and 6.39% respectively. In 2021, Sudan, Egypt and Somalia top the list with 7.08%, 5.74% and 5.23% respectively. In 2022, Sudan, Somalia and Egypt top the list with 7.64%, 5.10% and 4.89% respectively. In 2023, Sudan, Somalia and Egypt top the list with 7.86%, 4.98% and 4.81% respectively. North Africa region top the percentage death list consecutively for the period of study (2020–2023)

4 LOOKING AT COUNTRIES WITH HIGHEST POPULATION INFECTION RATE

Shows population got covid by Countries

SELECT

location,

Sum(population) as total_population,

Sum(total_cases) as population_Infected,

(Sum(total_cases) * 1.0/Sum(population)) * 100 as pecentage_PopulationInfected

FROM #deaths_byCovid2

GROUP BY location, population

ORDER BY pecentage_PopulationInfected DESC;

Seychelles toped the list with 20.82%

5 LOOKING AT COUNTRIES THAT MANAGED THEIR CASES IN THE HOSPITAL

SELECT location,

Sum(total_deaths) As totalDeath,

Sum(hosp_patients) AS hosp_patients,

Sum(icu_patients) As icu_patients

FROM #deaths_byCovid2

Group by location

ORDER BY 2,3;

Only South Africa and Algeria have records out of 58 entries, thus I decided to filter further

SELECT location,

Max(total_cases) as highest_CountCases,

Max(icu_patients) as highestCount_Icupatients,

Max(hosp_patients) as highestCount_Hosp_patients

FROM #deaths_byCovid2

where icu_patients is not null

Group by location

ORDER BY 1,2;

the result happens to be only two recodes, this could by as a result of Ignorance, social Stigmatization, Myth and Conspiracy Theories that surrounds the covid-19 outbreak in Africa

6 LOOKING AT NEW CASES VS NEW DEATH

SELECT location, date,

SUM(new_cases) over(partition by location order by location, date) as totalNew_cases,

SUM(new_deaths) over(partition by date ) as totalNew_deaths

FROM #deaths_byCovid2

where new_cases is not null

ORDER BY 3,4 Desc;

I took a look at the vaccination table performed a few analyses on it, but first, I had to Join my imported tables and select few columns I needed to complete this project. I piped my jointed table into a Tempt table: #VaccineCTE

Joining my imported tables on location and date columns and I narrowed down to Africa by filtering the dataset

SELECT pcd.Continent,

pcd.location

,pcd.date

,pcd.population

,pcv.total_vaccinations

,pcv.people_vaccinated

,pcv.people_fully_vaccinated

,pcv.new_vaccinations

INTO #VaccineCTE

FROM PortfolioProjects..covidDeath$ pcd

FULL Join PortfolioProjects..covidVaccinations$ pcv

ON pcd.location = pcv.location

AND pcd.date = pcv.date

Where pcd.continent Like ‘%Africa%’

AND pcd.continent is not null;

VIEWING MY NEW DATASET:

select * from #VaccineCTE

— 69,370 entries

— I decided to use another method to handle datatype as I Continue this analysis

7 SHOWING PECENTAGE TOTAL VACINATION BY POPULATION

SELECT

location,

population,

(sum(Cast(total_vaccinations as bigint)* 1.0)/sum(Cast(population as bigint)))* 100 AS percentage_TotalVaccinated

FROM #VaccineCTE

Where total_vaccinations is not null

Group by location,population

ORDER BY location, population, percentage_TotalVaccinated DESC;

8 SHOWING PECENTAGE OF PEOPLE FULLY VACCINATED

SELECT

location,

population,

(Sum(Cast(people_fully_vaccinated AS bigint) *1.0)/Sum(Cast(population AS bigint)))* 100 AS percentage_fullyVaccinated

FROM #VaccineCTE

Where people_fully_vaccinated is not null

Group by location,population

ORDER BY location, population, percentage_fullyVaccinated DESC;

9 VISUALIZATIONS OF DATA

All my calculations and 95% of data cleaning were done in SQL server, Tableau was strictly use for visualization. From the visualization, North African countries Top the chat in death by population rate: Sudan, Egypt and Liberia were on top with 6.39% death in 2020, 7.08% in 2021, 7.65% in 2022 and 7.86% 2023 for Sudan; Egypt has 4.81% for 2023, 4.88% for 2022, 5.75% for 2021 and 5.41% 2020; Liberia has 3.64% in 2023, 3.85% in 2022, 4.75% in 2021 and 6.91% in 2020.

South African had the highest cases recoded, this could be as a result of temperature falls between June and August, temperature can get as low to -20C In some part of South Africa. At the heel of South Africa, were Morocco Tunisia and Egypt respectively, Ethiopia, Central Africa came 5th.

January, July, August and December where the months that have the highest out break of new cases and January, February July, August, December has the highest records of new deaths. It should be noted that only South Africa and Algeria had record of cases managed in hospital, South Africa have 2694 patients in ICU and 18034 hospital Patient, Algeria has 67 patients in ICU none as hospital patient. This could be as a result of social Stigmatization, Myth and Conspiracy Theories that surrounds the covid-19 outbreak in Africa

Seychelle, Morocco, Tunisia and South Africa top the chart of percentage of people fully vaccinated by population with 53.04%, 33.16%, 30.04% and 23.82% respectively

SUMMARY

The continent, has had about 188,807,977 death cases from 2020 to 2023 according to dataset from www.ourworldindata.org. There was less death in Central and West Africa compare to other part of the continent in that Central Africa did not make the list of top 15 countries with high percentage death rate, West has only one among the top 15 countries.

Some countries with high population had low infection and death rate, this might be as a result of under-reporting of some of the cases in certain countries and the efficient handling of the outbreak in some others. There was low hospital record in the continent, this again could be as a result of pool health facilities, fear and ignorance about the virus that led to conspiracy theories. The vaccination rate did not translate to the population of each country, it was only Seychelle, that has 50% of her population vaccinated.

thanks