THE DESIGN AND ANALYSIS OF COVID-19 DATA USING SQL PART-TWO

Akande Imisioluwa
9 min readApr 21, 2020

--

Good job for following through the part-one of this article. In the second phase of this article, we will be analyzing the organized COVID19 data collected on the 17th of April, 2020 in order to draw conclusions about certain information using Structured Query Language(SQL).

To start with, here are some SQL formatting best practices I have observed in part-one and some new standards we will observe in this part.

  • Using upper cases for SQL Clauses, statements, functions and conditions.
  • Using lower cases for table and column names.
  • Adding semicolon to the end of each statement.
  • Ensuring that there is no space between table and column names.

Case-Study

Let’s assume Mark is a Data Analyst at the European Centre for Disease Prevention and Control(ECDC). The Director of ECDC asked him numerous research questions on COVID-19. Mark is expected to respond to her request in two days by generating a report on the analysis of COVID-19 data in the database. He is expected to present this report during a board-meeting coming up in four days time. Let’s assume Mark has SQL as the only tool to implement this job.

Next, we are working as Mark(the Data Analyst) and we will be simulating some research questions to analyze our COVID-19 data. You can run the queries provided as answers on your query panel in your Database Management System(DBMS) in order to make sense of the data. You will learn some SQL concepts and data analytics here.

Note: We will assume the current date to be the data collection date.

Question 1

Show the highest number of confirmed coronavirus cases the world has recorded daily. Return a table with a record of 10 highest number of confirmed coronavirus cases in the world to date and do not display the affected countries.

SELECT DISTINCT ca.occurred_at, ca.confirmed_case
FROM cases ca
ORDER BY confirmed_case DESC
LIMIT 10;
Figure 1: 10 highest number of coronavirus

Question 2

We would like to know if a death toll of over 1000 has been recorded on any day as a result of the coronavirus to date. Do not mention the affected countries.

SELECT DISTINCT ca.occurred_at, ca.death_toll 
FROM cases ca
WHERE ca.death_toll > 1000
LIMIT 3;
Figure 2: The coronavirus death toll of over 1000

Question 3

Return a record of all coronavirus affected countries with the keyword ‘United’.

SELECT name
FROM countries
WHERE name LIKE '%United%';
Figure 3: coronavirus affected countries with the keyword ‘United’

Question 4

Return a record of all confirmed coronavirus cases contained on the 11th and 15th of April, 2020.

SELECT confirmed_case, occurred_at
FROM cases
WHERE occurred_at IN ('11/04/2020', '15/04/2020');
Figure 4: Confirmed coronavirus cases contained on the 11th and 15th of April, 2020

Question 5

Return a record of all coronavirus affected countries with names outside the United Kingdom, Canada, and the United States.

SELECT name
FROM countries
WHERE name NOT IN ('United_Kingdom', 'United_States_of_America',
'Canada');
Figure 5: All coronavirus affected countries outside the United Kingdom, Canada, and the United States

Question 6

Return a record of all coronavirus affected countries without the keyword ‘United’.

SELECT name
FROM countries
WHERE name NOT LIKE '%United%';
Figure 6: All coronavirus affected countries without the keyword ‘United’

Question 7

Return a record of all coronavirus affected countries with names within the United Kingdom, United Arab Emirates, United Republic of Tanzania, United States Virgin Islands and the United States of America and select only countries with ids between 193 and 196.

SELECT *
FROM countries
WHERE name IN ('United_Kingdom', 'United_Arab_Emirates',
'United_Republic_of_Tanzania',
'United_States_Virgin_Islands',
'United_States_of_America')AND id BETWEEN 193 AND 196
ORDER BY id DESC;
Figure 7: All coronavirus affected countries with ids between 193 and 196

Question 8

Return a record of all the death toll associated with the United States of America to date.

SELECT co.name, ca.death_toll, ca.occurred_at
FROM cases ca
JOIN countries co
ON co.id = ca.country_id
WHERE co.name = 'United_States_of_America';
Figure 8: Death toll associated with the United States of America to date

Question 9

Return the countries with coronavirus confirmed cases to date in descending order.

SELECT co.name, ca.confirmed_case, ca.occurred_at
FROM cases ca
JOIN countries co
ON co.id = ca.country_id
ORDER BY co.name DESC;
Figure 9: Descending order of Coronavirus affected countries

Question 10

Return the record of coronavirus confirmed cases and deaths with countries starting with the letter ‘C’ in ascending order to date.

SELECT co.name, ca.confirmed_case, ca.death_toll, ca.occurred_at
FROM cases ca
JOIN countries co
ON co.id = ca.country_id
WHERE co.name LIKE 'C%'
ORDER BY co.name;
Figure 10: Confirmed cases and deaths with countries starting with letter ‘C’

Question 11

Return the coronavirus confirmed cases greater than 5000 and death toll greater than 500 with associated countries.

SELECT co.name, ca.confirmed_case, ca.death_toll, ca.occurred_at
FROM cases ca
JOIN countries co
ON co.id = ca.country_id
WHERE ca.confirmed_case > 5000 AND ca.death_toll > 500;
Figure 11: Confirmed cases greater than 500 and deaths greater than 500

Question 12

Return the total number of coronavirus confirmed cases and deaths since the inception of the disease in the world.

SELECT SUM(ca.confirmed_case) Confimed_cases, 
SUM(ca.death_toll) Deaths
FROM cases ca;
Figure 12: Total cases and deaths

Question 13

Return an average of the coronavirus death toll for France.

SELECT co.name, AVG(ca.death_toll) toll_average
FROM cases ca
JOIN countries co
ON co.id = ca.country_id
WHERE co.name = 'France';
Figure 13: Average Coronavirus death toll in France

Question 14

Return a record of 10 countries with the highest average number of the coronavirus death toll.

SELECT co.name, AVG(ca.death_toll) toll_average
FROM cases ca
JOIN countries co
ON co.id = ca.country_id
GROUP BY co.name
ORDER BY toll_average DESC
LIMIT 10;
Figure 14: Highest average numbers of the coronavirus death toll for 10 countries

Question 15

Return the highest number of coronavirus cases ever in China

SELECT co.name, MAX(ca.confirmed_case) 
FROM cases ca
JOIN countries co
ON co.id = ca.country_id
WHERE co.name = 'China' ;
Figure 15: Highest number of coronavirus cases in China

Question 16

Return the lowest number of coronavirus cases ever in the United States of America(USA).

SELECT co.name, MIN(ca.confirmed_case) confirmed_case
FROM cases ca
JOIN countries co
ON co.id = ca.country_id
WHERE co.name = 'United_States_of_America' ;
Figure 16: Lowest number of Coronavirus cases in the USA

Question 17

Return a record of all coronavirus affected countries with their highest number of cases in descending order.

SELECT co.name, MAX(ca.confirmed_case) confirmed_case
FROM cases ca
JOIN countries co
ON co.id = ca.country_id
GROUP BY co.name
ORDER BY confirmed_case DESC;
Figure 17: Coronavirus affected countries with their highest number of cases in descending order

Question 18

Return a record of all coronavirus affected countries with their highest number of death tolls ever in descending order.

SELECT co.name, MAX(ca.death_toll) death_toll
FROM cases ca
JOIN countries co
ON co.id = ca.country_id
GROUP BY co.name
ORDER BY death_toll DESC;
Figure 18: Coronavirus affected countries with their highest number of deaths in descending order

Question 19

Return a record of coronavirus affected countries and their population in descending order.

SELECT DISTINCT co.name, co.population_count 
FROM cases ca
JOIN countries co
ON co.id = ca.country_id
ORDER BY population_count DESC;
Figure 19: Coronavirus affected countries and their population in descending order

Question 20

Return a record of coronavirus affected countries with a population of above 200,000,000.

SELECT DISTINCT co.name, co.population_count 
FROM cases ca
JOIN countries co
ON co.id = ca.country_id
WHERE co.population_count > 200000000
ORDER BY population_count DESC;
Figure 20: Coronavirus affected countries with a population of above 200,000,000.

Question 21
Return a table of coronavirus affected countries with their population and those with a number of daily records(count) greater than 90. In all, select only five countries with records greater than 100.

SELECT DISTINCT co.name, co.population_count population_count, COUNT(*)
FROM cases ca
JOIN countries co
ON co.id = ca.country_id
GROUP BY co.id
HAVING COUNT(*) > 100
ORDER BY COUNT(*) DESC
LIMIT 5;
Figure 21: Five countries with daily records greater than 100.

Question 22

Return a table of coronavirus affected countries with cases and tolls to date in decreasing order with the status description label of high risk if the number of confirmed cases is greater than 1000 and death tolls greater than 135 else it should be categorized as low risk.

SELECT co.name, ca.confirmed_case confirmed_case, ca.death_toll, 
occurred_at,
CASE WHEN ca.confirmed_case > 1000 AND ca.death_toll > 135
THEN 'High Risk' ELSE 'Low Risk' END AS Risk_level
FROM cases ca
JOIN countries co
ON co.id = ca.country_id
ORDER BY confirmed_case DESC;
Figure 22: High-risk versus Low-risk days of the coronavirus disease

Question 23

Return the sum of death tolls by coronavirus affected countries in descending order and label death tolls greater than 1000 as high risk, death tolls greater than 500 as middle risk and death toll less than 500 as low risks.

SELECT co.name, SUM(ca.death_toll) death_toll,
CASE WHEN SUM(ca.death_toll) > 1000 THEN 'high risk'
WHEN SUM(ca.death_toll) > 500 THEN 'middle risk'
ELSE 'low risk' END AS risk_level
FROM cases ca
JOIN countries co
ON co.id = ca.country_id
GROUP BY 1
ORDER BY 2 DESC;
Figure 23: Sum of the death tolls by coronavirus affected countries with risk levels

Question 24

Sequel to question 23 above, return a record of countries that suffer a high-risk threat to the coronavirus disease.

SELECT name, risk_level
FROM (SELECT co.name, SUM(ca.death_toll) death_toll,
CASE WHEN SUM(ca.death_toll) > 1000 THEN 'high risk'
WHEN SUM(ca.death_toll) > 500 THEN 'middle risk'
ELSE 'low risk' END AS risk_level
FROM cases ca
JOIN countries co
ON co.id = ca.country_id
GROUP BY 1
ORDER BY 2 DESC) temp_table
WHERE risk_level = 'high risk';
Figure 24:High-risk coronavirus infected countries

Question 25

Sequel to question 23 above, return a record with the countries that suffer a low-risk and middle-risk threat to the coronavirus disease.

WITH t1 AS (
SELECT co.name , SUM(ca.death_toll) death_toll,
CASE WHEN SUM(ca.death_toll) > 1000 THEN 'high risk'
WHEN SUM(ca.death_toll) > 500 THEN 'middle risk'
ELSE 'low risk' END AS risk_level
FROM cases ca
JOIN countries co
ON co.id = ca.country_id
GROUP BY 1),
t2 AS (
SELECT name, risk_level, MAX(death_toll) death_toll
FROM t1
GROUP BY 1)
SELECT t1.name, t1.death_toll, t1.risk_level
FROM t1
JOIN t2
ON t1.Name = t2.name AND t1.risk_level = t2.risk_level
WHERE t1.risk_level IN ('low risk', 'middle risk')
ORDER BY 2 DESC;
Figure 25: Low and Middle-risk coronavirus infected countries

Good job for you to have followed through to this point. We have utilized a diverse concept in SQL to solve a number of problems in this article. We applied SQL basics such as SELECT, FROM, ON, WHERE, ORDER BY, LIMIT, NOT, LIKE, AND, BETWEEN, OR, JOINS, Alias and many more. Alongside with SQL aggregations such as MIN, MAX, COUNT, SUM, AVG, HAVING, CASE and many more. We also covered SQL Subqueries and the WITH command which is also known as the Common Table Expression(CTE). You can play around this subject topic by running additional queries.

Overall, we have been able to analyze the data and put it out as a piece of information that is meaningful to a Non-technical stakeholder. This extracted information can still be processed another step further by visualizing it in a graphical format using Line graphs, Bar graphs, histograms, and Line plots. We would not be going into data visualization because it is outside the scope of this course. However, If you are interested in moving a step further you might consider using Tableau for your Data Visualization.

Don’t just skim through. Read, Practice and Please feel free to drop your feedback and comments.

--

--