Introduction to SQL (Part 3)

Faisal Afif
Data Science Indonesia
5 min readOct 16, 2023

A beginner guide for learning SQL from scratch

Photo by Austin Distel on Unsplash

In this section, we will explore aggregate functions in SQL that allow us to summarize data on a selected column. These functions can be used to answer questions like:

“How many customers churned this week?”

“What is the total revenue for each region?”

“What is the most favorite pizza?”

The functions themselves are similar to aggregate functions in spreadsheet or any other analytics program. These are some common aggregate functions that will be covered in this section.

Figure 1. Aggregate function in SQL (Image by author)

Each aggregate function is going to run on the whole set of records and return a single value when there is no specific grouping. Aggregate functions are often used with the GROUP BY clause, which allows us to group our data based on a specific column. We will also learn about aliases, which will make aggregate function results easier to read and process. Let’s get started.

1. Aggregate Functions

COUNT Function

The COUNT function is a fundamental aggregation function that allows you to count the number of records in a table or the number of records that meet specific criteria. These are some common queries for COUNT function.

-- Count all records in a table, null values are included
SELECT
COUNT(*)
FROM
table_1;

-- Count all records in column_1, non null values
SELECT
COUNT(column_1)
FROM
table_1;

-- Count all distinct values in column_2, non null values
-- Alias the result as count_value
SELECT
COUNT(DISTINCT column_2) AS count_value
FROM
table_1;

MIN and MAX Function

The MIN function returns the lowest numerical value or earliest date in a specific column. On the other hand MAX function does the opposite, it returns the highest numerical value or latest date in a specific column.

-- Find the minimum value in a specific column, non null values
-- Alias the result as min_value
SELECT
MIN(column_1) AS min_value
FROM
table_1;

-- Find the maximum value in a specific column, non null values
-- Alias the result as max_value
SELECT
MAX(column_1) AS max_value
FROM
table_1;

SUM and AVG Function

The SUM function is used to calculate the sum of all numerical values in a specific column and the AVG function is used to calculates the average of numerical values in a specific column.

-- Sum all records in a specific column, non null values
-- Alias the result as total_value
SELECT
SUM(column_1) AS total_value
FROM
table_1;

-- Calculate the average of all records in a specific column, non null values
-- Alias the result as avg_value
SELECT
AVG(column_1) AS total_value
FROM
table_1;

2. GROUP BY and HAVING clause

The GROUP BY clause is an essential part of aggregation. It allows us to group our data based on a specific column.

-- Select customer_id and sum of price colum, then alias the result as total_price
-- Filter the results where product_id = ramen and sum of price column greater than 25
-- Then group the results based on customer_id
SELECT
customer_id,
SUM(price) AS total_price
FROM
sales
WHERE
product_id = 'ramen'
GROUP BY
customer_id
HAVING
SUM(price) > 25;

You’ve probably noticed that the GROUP BY clause is put and executed following the WHERE clause, so you might be wondering how to filter the result after the GROUP BY clause has been applied. After the GROUP BY clause, there is another filter called the HAVING clause that is used specifically to filter grouped records from the result. The HAVING clause is similar to the WHERE clause in terms of operators that can be applied and how it is executed. You can learn more about the WHERE clause in the previous section, Introduction to SQL (Part 2).

3. ALIASES

The expressions used in queries can get long and hard to read as queries become more complex. In order to make query and its result clearer and easier to read, we can use alias to assign a new name for the long and complex expression in the query. In the previous queries, we’ve already used AS to assign a new name for the selected column, for example SUM(price) AS total_price. In the next series of Intermediate SQL we will learn how to apply alias for joining tables, creating subqueries and creating common table expressions (CTE).

Let’s solve some exercises from SQL Bolt.

Figure 2. Exercise 10 in SQL Bolt

To answer the 1st and 2nd questions in Exercise 10 of SQL Bolt you need to do these queries:

-- 1st Question
SELECT
MAX(years_employed) AS longest_period
FROM
employees;

-- 2nd Question
SELECT
role,
AVG(years_employed) AS avg_years_employed
FROM
employees
GROUP BY
role;
Figure 3. Exercise 11 in SQL Bolt

To answer the 2nd question in Exercise 11 of SQL Bolt you need to do these query:

-- 2nd Question
SELECT
role,
COUNT(name)
FROM
employees
GROUP BY
role;

Now you can answer the 3rd question in Exercise 10 and the 1st and 3rd questions in Exercise 11 of SQL Bolt. If you found any difficulties around these exercises, please leave a comment and I will do my best to assist you.

So far, we’ve covered aggregate functions, the GROUP BY clause, the HAVING clause, and aliases in SQL. It’s excellent that you have progressed this far in your SQL learning. See you in the next series, Intermediate SQL.

--

--