Introduction to SQL (Part 3)
A beginner guide for learning SQL from scratch
In this section, we will explore aggregate functions in SQL that allow us to summarize data on a selected column. These functions can be used to answer questions like:
“How many customers churned this week?”
“What is the total revenue for each region?”
“What is the most favorite pizza?”
The functions themselves are similar to aggregate functions in spreadsheet or any other analytics program. These are some common aggregate functions that will be covered in this section.
Each aggregate function is going to run on the whole set of records and return a single value when there is no specific grouping. Aggregate functions are often used with the GROUP BY
clause, which allows us to group our data based on a specific column. We will also learn about aliases, which will make aggregate function results easier to read and process. Let’s get started.
1. Aggregate Functions
COUNT Function
The COUNT
function is a fundamental aggregation function that allows you to count the number of records in a table or the number of records that meet specific criteria. These are some common queries for COUNT
function.
-- Count all records in a table, null values are included
SELECT
COUNT(*)
FROM
table_1;
-- Count all records in column_1, non null values
SELECT
COUNT(column_1)
FROM
table_1;
-- Count all distinct values in column_2, non null values
-- Alias the result as count_value
SELECT
COUNT(DISTINCT column_2) AS count_value
FROM
table_1;
MIN and MAX Function
The MIN
function returns the lowest numerical value or earliest date in a specific column. On the other hand MAX
function does the opposite, it returns the highest numerical value or latest date in a specific column.
-- Find the minimum value in a specific column, non null values
-- Alias the result as min_value
SELECT
MIN(column_1) AS min_value
FROM
table_1;
-- Find the maximum value in a specific column, non null values
-- Alias the result as max_value
SELECT
MAX(column_1) AS max_value
FROM
table_1;
SUM and AVG Function
The SUM
function is used to calculate the sum of all numerical values in a specific column and the AVG
function is used to calculates the average of numerical values in a specific column.
-- Sum all records in a specific column, non null values
-- Alias the result as total_value
SELECT
SUM(column_1) AS total_value
FROM
table_1;
-- Calculate the average of all records in a specific column, non null values
-- Alias the result as avg_value
SELECT
AVG(column_1) AS total_value
FROM
table_1;
2. GROUP BY and HAVING clause
The GROUP BY
clause is an essential part of aggregation. It allows us to group our data based on a specific column.
-- Select customer_id and sum of price colum, then alias the result as total_price
-- Filter the results where product_id = ramen and sum of price column greater than 25
-- Then group the results based on customer_id
SELECT
customer_id,
SUM(price) AS total_price
FROM
sales
WHERE
product_id = 'ramen'
GROUP BY
customer_id
HAVING
SUM(price) > 25;
You’ve probably noticed that the GROUP BY
clause is put and executed following the WHERE
clause, so you might be wondering how to filter the result after the GROUP BY
clause has been applied. After the GROUP BY
clause, there is another filter called the HAVING
clause that is used specifically to filter grouped records from the result. The HAVING
clause is similar to the WHERE
clause in terms of operators that can be applied and how it is executed. You can learn more about the WHERE
clause in the previous section, Introduction to SQL (Part 2).
3. ALIASES
The expressions used in queries can get long and hard to read as queries become more complex. In order to make query and its result clearer and easier to read, we can use alias to assign a new name for the long and complex expression in the query. In the previous queries, we’ve already used AS
to assign a new name for the selected column, for example SUM(price) AS total_price
. In the next series of Intermediate SQL we will learn how to apply alias for joining tables, creating subqueries and creating common table expressions (CTE).
Let’s solve some exercises from SQL Bolt.
To answer the 1st and 2nd questions in Exercise 10 of SQL Bolt you need to do these queries:
-- 1st Question
SELECT
MAX(years_employed) AS longest_period
FROM
employees;
-- 2nd Question
SELECT
role,
AVG(years_employed) AS avg_years_employed
FROM
employees
GROUP BY
role;
To answer the 2nd question in Exercise 11 of SQL Bolt you need to do these query:
-- 2nd Question
SELECT
role,
COUNT(name)
FROM
employees
GROUP BY
role;
Now you can answer the 3rd question in Exercise 10 and the 1st and 3rd questions in Exercise 11 of SQL Bolt. If you found any difficulties around these exercises, please leave a comment and I will do my best to assist you.
So far, we’ve covered aggregate functions, the GROUP BY
clause, the HAVING
clause, and aliases in SQL. It’s excellent that you have progressed this far in your SQL learning. See you in the next series, Intermediate SQL.
This is part of the Introduction to SQL series, here is the list of Introduction to SQL series.
References
1. SQL Basics Cheat Sheet from DataCamp