Aggregate functions using Group By, Over and Partition By clause in SQL

Karthik Joshi S
4 min readMay 17, 2020

--

An aggregate function performs a calculation on a set of values and returns a single value. These are often used with the GROUP BY clause of the SELECT statement.

Let us understand the following queries with an example in this article:

  1. How to use an aggregate function?
  2. How group by clause works?
  3. How to apply aggregate functions without using group by clause?

For ease of our understanding let us consider the below entity as an example:

customer entity

The customer entity has the above four attributes where id is the primary key and dept_id is the foreign key reference from the department entity.

  1. Aggregate function

Let us try to get the count of all departments along with their name.

Get count of departments for a customer (P.S. This query won’t work will explain the reason below)

Here count is an aggregate function that combines all the departments for a customer. If we run the above query we will end up with the following error

aggregate function without group by

Problem:

  • As we are trying to get the count of all departments along with their name the above error makes perfect sense. We need to aggregate the rows by another field here, in this case, it is dept_name.
  • Aggregate functions can be used on their own or in conjunction with the GROUP BY clause.
  • When it is used without group by clause the aggregate operation is applied across the entire table.
  • As we are trying to get the count of dept along with their name it is intended for us to group values with department name.

Solution:

So we need to mention the field to aggregate by, which is done using Group By clause.

2.Group By with an aggregate function

Count with the group by (working query)

By executing the above query, all the rows will be aggregated by department name and the output will look like below where department A, B are having 4 and 3 customers respectively.

group by result

3. Aggregate functions without group by:

Let us consider a use case where we need to get the latest reported date for each department. There is an attribute called reported_date in the above customer entity, for which we need to print the customer name, department, and the latest reported date. Here the latest reported date should be common across each department.

Using the MAX aggregate function:

In order to solve the above query we will use MAX aggregate function as below:

MAX function to get the latest reported date

Problem:

Although the above query executes without any errors it doesn’t serve the expected result.

Max query using group by

As you can see there are different date(s) for the same department in each row/tuple. Our expected result was to have the latest reported date common across the same department. (First 3 rows should have the same date and the last 4 rows should have a common date)

Solution:

Use Over and Partition by clause without applying group by clause on aggregate functions

To use the OVER and PARTITION BY clauses, you simply need to specify the column that you want to partition your aggregated results by. This is best explained in the below query.

Over and Partition by clause
Result of Over and Partition by clause

Understanding Partition by and Over clause:

  • Over allows you to get aggregate information without using a GROUP BY. In other words, you can retrieve all the rows, and get aggregate data alongside.
  • The Partition By clause used with Over clause divides the result set into partitions and returns the name, department, and the latest reported date (using MAX here) across all the departments in the same row. This does mean we will have repeated data but it does suit the use case where we don’t want to lose any row unlike group by.
  • The Partition By clause does not reduce the number of rows returned whereas group by does.

Note:

The above queries used in our example will work for almost all RDBMS.(MySQL,Postgres,Oracle…)

Commonly used aggregate functions are:

AVG(expression)Calculate the average of the expression.

COUNT(expression)Count occurrences of non-null values returned by the expression.

COUNT(*)Counts all rows in the specified table.

MIN(expression)Finds the minimum expression value.

MAX(expression)Finds the maximum expression value.

SUM(expression)Calculate the sum of the expression.

Thank you.

--

--