A PROJECT ON EXPLORING COVID-19 DEATHS AND VACCINATIONS DATA WITH SQL USING POSTGRESQL

Oluwagbenga Ajetomobi
2 min readNov 14, 2023

--

OBJECTIVE: This SQL-only project delves into COVID-19 data using PostgreSQL, focusing on death statistics and vaccination details. The analysis aims to reveal key metrics and answer critical questions related to death likelihood in India, percentage of deaths to the entire population, global and regional case percentages, and continent-wise data.

DATASET: The dataset encompasses a comprehensive collection of COVID-19 information, including death and vaccination data, providing a holistic view of the pandemic’s impact. The original dataset which was gotten from Our World in Data. The data was separated into two files-CovidDeaths and CovidVaccinations(link here)

In PostgreSQL, “Portfolioproject” database was created and two tables(CovidDeath and CovidVaccinations) along with their columns and its datatype was created under the database. The csv files was imported into the SQL table to prepare them for analysis.

QUESTIONS TO EXPLORE:

The tables were queried to gain insights and answer the following questions from the data

1. Likelihood of Dying in India when infected with COVID19 by date

Query;

select date,total_cases,total_deaths from “CovidDeaths” where location like “%India%”

2. Total percentage of death due to COVID19 in entire India population

Query;

select max(total_deaths)/avg(population)*100 from “CovidDeaths” where location like “%India%”

3. Country with the highest death rate as a percentage of their population

Query;

select location,(max(total_deaths)/avg(population)*100) as percentage from “CovidDeaths” group by location order by percentage desc;

4. Total percentage of positive COVID19 cases in India

Query;

select (cast(max(total_cases) as double precision)/avg(cast(population as double precision))*100) as percentage from "CovidDeaths" where location like '%India%’;

5. Total percentage of positive COVID19 cases in the world

Query;

select location,(cast(max(total_cases) as double precision)/avg(cast(population as double precision))*100) as percentage from "CovidDeaths" group by location order by percentage desc;

6. Total number of positive COVID19 cases by continent

Query;

select location,max(total_cases) as total_cases from "CovidDeaths" where continent is null group by location order by total_cases desc;

7. Total number of deaths caused by COVID19 by continent

Query;

select location,max(total_deaths) as total_deaths from "CovidDeaths" where continent is null group by location order by total_deaths desc;

8. Total number of vaccinated persons by country

Query;

select "CovidDeaths".location as country,max("CovidVaccinations".people_fully_vaccinated) as Fully_vaccinated from "CovidDeaths" join "CovidVaccinations" on "CovidDeaths".iso_code="CovidVaccinations".iso_code and "CovidDeaths".date="CovidVaccinations".date where "CovidDeaths".continent is not null group by country order by Fully_vaccinated desc;

The completely documented list of the queries can be downloaded here.

CONCLUSION

This SQL project utilizes PostgreSQL to extract and analyze COVID-19 data, presenting key metrics and insights into death likelihood, percentage of deaths, and case percentages. The findings provide valuable information for understanding the pandemic’s dynamics at both a global and regional level, supporting data-driven decision-making and public health strategies.

--

--