AFRICA CONTINENT COVID-19 DATA ANALYSIS AS OF JANUARY 2020 TO MAY 2023: AN SQL SERVER (SSMS) AND TEABLU PROJECT
INTRODUCTION
As of May 2023, according to the data analysed the number of confirmed COVID-19 cases in Africa amounted to around 8,630,485,021, 188,807,977 deaths and 2.19% death rate. The African continent first came in contact with the coronavirus pandemic on February 14th 2020, in the northernmost part, particularly Egypt. Since then, the different governments took severe restrictive measures to try to curb the spread of the disease. However, South Africa, Morocco and Tunisia have the highest population infected respectively in terms of numbers, but in terms of average sum of percentage by population (rate) for the period analysed, Seychelles, Reunion and Mauritius top the chat with 20.82%, 18.26% and 7.90% respectively, South Africa came 10th with 3.89%.
My goal in this project in one hand was to keep generally this work simple and in other hand was to give a detailed exploratory analysis on covid-19 dataset. This dataset was from www.ourworldindata.org, it has 307,831 entries (rows) and 66 columns. The dataset has many null values; I was first faced with two choices, to clean out right away in Excel Spreadsheet and lose huge chunk of the dataset or import the dataset with the null values into my SQL Server and start the process of cleaning up and analysing the dataset.
STEPS I TOOK TO ARRIVE AT MY DASHBOAD
The first thing I did was to create a database call ‘portfolio’ and a file call ‘projects’ by following these steps:
1) Right click on database on SQL server
2) Click on new database in the dialogue box fill the name of the database of your choice and click on add, mine was ‘Portfolio’
3) Fill the name of the file, it is the space at the beginning of the last row under database file and click ok, mime was ‘project’
4) Refresh your data base and you will see your portfolio
DATA IMPORTATION
I split the dataset into two different data files which I named “covidDeath$” and “covidVaccinations$” respectively. The two data files were imported into my newly created database. This is to reduce the size of the dataset being analyse at a given time, thus, increasing the speed of executing a task by my SQL Server.
CALLING UP THE IMPORTED DATASET
SELECT *
FROM PortfolioProjects..covidDeath$
ORDER BY 3,4;
— has 307,831 entries(rows) and 26 columns
— has 307,831 entries(rows) and 26 columns
SELECT *
from PortfolioProjects..covidVaccinations$;
ORDER BY 3,4;
— has 307,831 entries and 44 columns
DATA EXPLORATORY
I started by creating a Common Table Expression (CTE), I used this process to select the columns that I needed for this analysis.
;With Glober_death As (
SELECT Continent
, location
, date
, population
,total_cases
,new_cases
,total_deaths
,new_deaths
,icu_patients
,hosp_patients
FROM PortfolioProjects..covidDeath$
)
— viewing the created CTE
— SELECT * FROM Glober_death;
— I Still have 307,831 entries, this shows my CTE table is working, Then I — — pass my CTE table to another CTE: ‘Africa_Deaths’ where I filtered or — — — narrowed the dataset down to Africa Continent.
,Africa_Deaths AS
(
SELECT *
FROM Glober_death
WHERE continent LIKE ‘%Africa%’
)
— Select * From Africa_Deaths;
— 69,370 entries were left for further analysis
— Passing ‘CTE Africa_Deaths’ into a Tempt table: ‘#deaths_byCovid2’
Select * INTO #deaths_byCovid2
FROM Africa_Deaths;
This Tempt table: #deaths_byCovid2, is what I will be analyzing, it has 69,370 entries and 10 columns, the 10 columns where columns of choice and the rows are entries for Africa. it is necessary to note that, during all this process the data were not lost, I only filtered out what I do not need for this analysis
— viewing my new tempt table
Select * FROM #deaths_byCovid2;
ANALYSING #DEATHS_BYCOVID2 DATASET
1. SHOWING TOTAL AFRICA CONTINENT CASES, TOTAL DEATH, AND DEATH_RATE:
SELECT Sum(total_cases ) as total_case,
Sum( total_deaths) as total_death,
(Sum( total_deaths *1.0)/Sum(total_cases ) )* 100 as death_rate
FROM #deaths_byCovid2;
During the covid pandemic, the rate of death in Africa was 2.19%, a total of 188,807,977 deaths were recorded and a total of 8,630,485,021 cases where recorded
2 LOOKING AT HIGHEST CASES BY COUNTRY
SELECT location,
Max(total_cases) as Max_total_caseCount,
SUM(total_cases) as SUM_totalcases
From #deaths_byCovid2
Where total_cases is not null
Group by location
order by 3 Desc;
South Africa top the record with 4,072,533 highest cases ever recorded and a sum total of 2,832,987,163 cases. Saint Helena has the lowest 2,166 4,747,76
2A LOOKING AT HIGHEST CASES VS LOCATION WITH WINDOW FUNCTION
SELECT location,
MAX(total_cases) OVER (PARTITION BY location) AS Max_totalcaseCount,
SUM(total_cases) OVER (PARTITION BY location) AS SUM_totalcases
FROM #deaths_byCovid2
Where total_cases is not null
ORDER BY 2,3 DESC; (I got same result as the above query)
2B LOOKING AT THE TOP 20 COUNTRIES WITH HIGHEST CASES
SELECT TOP 20 location,
SUM(total_cases) as Sumtotalcase
FROM #deaths_byCovid2
GROUP BY location
ORDER BY 2 DESC;
South Africa top the list followed in sequence five North Africa countries, Central africal holds 9th west Africa holds 10th position
3 LOOK AT DEATH RATE BY LOCATION AND BY YEAR
Showing percentage of total death by number of cases: the * 1.0 added to the query helps to give a float result when dividing an integer by an integer in SQL
SELECT
location,
DATEPART(YEAR, date) AS Year,
(SUM(total_deaths*1.0)/ SUM(total_cases))*100 AS total_deathpercentage
FROM #deaths_byCovid2
WHERE total_deaths IS NOT NULL
Group by location, DATEPART(YEAR, date)
ORDER BY 2,3 DESC;
In 2020 , Niger, Chad and Sudan top percentage list of total death by Cases with 7.33%, 7.29% and 6.39% respectively. In 2021, Sudan, Egypt and Somalia top the list with 7.08%, 5.74% and 5.23% respectively. In 2022, Sudan, Somalia and Egypt top the list with 7.64%, 5.10% and 4.89% respectively. In 2023, Sudan, Somalia and Egypt top the list with 7.86%, 4.98% and 4.81% respectively. North Africa region top the percentage death list consecutively for the period of study (2020–2023)
4 LOOKING AT COUNTRIES WITH HIGHEST POPULATION INFECTION RATE
Shows population got covid by Countries
SELECT
location,
Sum(population) as total_population,
Sum(total_cases) as population_Infected,
(Sum(total_cases) * 1.0/Sum(population)) * 100 as pecentage_PopulationInfected
FROM #deaths_byCovid2
GROUP BY location, population
ORDER BY pecentage_PopulationInfected DESC;
Seychelles toped the list with 20.82%
5 LOOKING AT COUNTRIES THAT MANAGED THEIR CASES IN THE HOSPITAL
SELECT location,
Sum(total_deaths) As totalDeath,
Sum(hosp_patients) AS hosp_patients,
Sum(icu_patients) As icu_patients
FROM #deaths_byCovid2
Group by location
ORDER BY 2,3;
Only South Africa and Algeria have records out of 58 entries, thus I decided to filter further
SELECT location,
Max(total_cases) as highest_CountCases,
Max(icu_patients) as highestCount_Icupatients,
Max(hosp_patients) as highestCount_Hosp_patients
FROM #deaths_byCovid2
where icu_patients is not null
Group by location
ORDER BY 1,2;
the result happens to be only two recodes, this could by as a result of Ignorance, social Stigmatization, Myth and Conspiracy Theories that surrounds the covid-19 outbreak in Africa
6 LOOKING AT NEW CASES VS NEW DEATH
SELECT location, date,
SUM(new_cases) over(partition by location order by location, date) as totalNew_cases,
SUM(new_deaths) over(partition by date ) as totalNew_deaths
FROM #deaths_byCovid2
where new_cases is not null
ORDER BY 3,4 Desc;
I took a look at the vaccination table performed a few analyses on it, but first, I had to Join my imported tables and select few columns I needed to complete this project. I piped my jointed table into a Tempt table: #VaccineCTE
Joining my imported tables on location and date columns and I narrowed down to Africa by filtering the dataset
SELECT pcd.Continent,
pcd.location
,pcd.date
,pcd.population
,pcv.total_vaccinations
,pcv.people_vaccinated
,pcv.people_fully_vaccinated
,pcv.new_vaccinations
INTO #VaccineCTE
FROM PortfolioProjects..covidDeath$ pcd
FULL Join PortfolioProjects..covidVaccinations$ pcv
ON pcd.location = pcv.location
AND pcd.date = pcv.date
Where pcd.continent Like ‘%Africa%’
AND pcd.continent is not null;
VIEWING MY NEW DATASET:
select * from #VaccineCTE
— 69,370 entries
— I decided to use another method to handle datatype as I Continue this analysis
7 SHOWING PECENTAGE TOTAL VACINATION BY POPULATION
SELECT
location,
population,
(sum(Cast(total_vaccinations as bigint)* 1.0)/sum(Cast(population as bigint)))* 100 AS percentage_TotalVaccinated
FROM #VaccineCTE
Where total_vaccinations is not null
Group by location,population
ORDER BY location, population, percentage_TotalVaccinated DESC;
8 SHOWING PECENTAGE OF PEOPLE FULLY VACCINATED
SELECT
location,
population,
(Sum(Cast(people_fully_vaccinated AS bigint) *1.0)/Sum(Cast(population AS bigint)))* 100 AS percentage_fullyVaccinated
FROM #VaccineCTE
Where people_fully_vaccinated is not null
Group by location,population
ORDER BY location, population, percentage_fullyVaccinated DESC;
9 VISUALIZATIONS OF DATA
All my calculations and 95% of data cleaning were done in SQL server, Tableau was strictly use for visualization. From the visualization, North African countries Top the chat in death by population rate: Sudan, Egypt and Liberia were on top with 6.39% death in 2020, 7.08% in 2021, 7.65% in 2022 and 7.86% 2023 for Sudan; Egypt has 4.81% for 2023, 4.88% for 2022, 5.75% for 2021 and 5.41% 2020; Liberia has 3.64% in 2023, 3.85% in 2022, 4.75% in 2021 and 6.91% in 2020.
South African had the highest cases recoded, this could be as a result of temperature falls between June and August, temperature can get as low to -20C In some part of South Africa. At the heel of South Africa, were Morocco Tunisia and Egypt respectively, Ethiopia, Central Africa came 5th.
January, July, August and December where the months that have the highest out break of new cases and January, February July, August, December has the highest records of new deaths. It should be noted that only South Africa and Algeria had record of cases managed in hospital, South Africa have 2694 patients in ICU and 18034 hospital Patient, Algeria has 67 patients in ICU none as hospital patient. This could be as a result of social Stigmatization, Myth and Conspiracy Theories that surrounds the covid-19 outbreak in Africa
Seychelle, Morocco, Tunisia and South Africa top the chart of percentage of people fully vaccinated by population with 53.04%, 33.16%, 30.04% and 23.82% respectively
SUMMARY
The continent, has had about 188,807,977 death cases from 2020 to 2023 according to dataset from www.ourworldindata.org. There was less death in Central and West Africa compare to other part of the continent in that Central Africa did not make the list of top 15 countries with high percentage death rate, West has only one among the top 15 countries.
Some countries with high population had low infection and death rate, this might be as a result of under-reporting of some of the cases in certain countries and the efficient handling of the outbreak in some others. There was low hospital record in the continent, this again could be as a result of pool health facilities, fear and ignorance about the virus that led to conspiracy theories. The vaccination rate did not translate to the population of each country, it was only Seychelle, that has 50% of her population vaccinated.
thanks