Aggregation in SQL

Duyen
Analytics Vidhya
Published in
10 min readMay 18, 2021

--

A right question is also an answer.

Aggregation is the process of performing calculations over a set of data, normally to gain its statistical figures. Splitting data into groups is a necessary step if you want to aggregate data by certain criteria, not the whole data set. This article talks about AGGREGATE FUNCTIONS and WINDOW FUNCTIONS, two main approaches to aggregate data in SQL, through an explanation section followed by a practical section with the Pokemon dataset. All technical points are based on PostgreSQL engine.

Photo by Lia Panidara on Unsplash

Ok we have two methods here. So when should we use which?

AGGREGATE FUNCTIONS + GROUP BY

Let’s begin with the combination of aggregate functions and group by. Here is a basic example:

SELECT col1, col2, aggregate_function(colx)
FROM table
GROUP BY col1, col2

With this simple query, the database engine scans the table, fetches only relevant columns col1, col2, colx, splits rows into groups of (col1, col2) and then aggregate colx on each group. A select statement with group by clause therefore can return only grouping columns and aggregating columns. It means that if there is a column col4 in the table, we cannot show it raw. All we can do is aggregating its values. (It’s SQL standard, but please note that it also depends on the database engine. For example, MySQL…

--

--