“Having” Clause and “Group by” Clause in SQL
Data without SQL is like Bread and Butter without Bread. SQL is invaluable when we talk about data. SQL stands for Structured Query Language, which lets you access and manipulate databases.
The Group By clause comes in very handy in SQL, especially as it lets you segregate data into groups and gives more insight into a set of data—the GROUP BY statement, groups rows that have the same values into summary rows. For instance, it would “find the names of customers” and group them by their countries.
- GROUP BY clause is used with the SELECT statement.
- In the query, the GROUP BY clause is placed after the WHERE clause.
- In the query, the GROUP BY clause is placed before the ORDER BY clause if used.
- In the query, the Group BY clause is placed before the Having clause.
The GROUP BY statement is often used with aggregate functions; COUNT(), MAX(), MIN(), SUM(), and AVG() to group the result set by one or more columns. An aggregate function performs a calculation on a set of values and returns a single value. Except for, count(), aggregate functions ignore null values. Aggregate functions are often used with the GROUP BY clause of the SELECT statement.
Deterministic functions always return the same result any time they’re called with a specific set of input values and given the same state of the database. Nondeterministic functions may return different results each time they’re called with a specific set of input values even if the database state that they access remains the same.
For example,
AVG()
always returns the same result given the qualifications stated above, but theGETDATE()
function, which returns the current datetime value, always returns a different result.Also,
abs(X)
will always return the same answer, as far as X is the same. However, a nondeterministic functionrandom()
will give random results each time it is applied.
All aggregate functions are deterministic, i.e. return the same value each time that they are called, when called with a specific set of input values.
A deterministic function always gives the same answer when it has the same inputs. Most built-in SQL functions in SQLite are deterministic. For example, the abs(X) function always returns the same answer as long as its input X is the same. Non-deterministic functions might give different answers on each invocation, even if the arguments are always the same.
This is a sample of the logic for a group by statement.
SELECT column1, function_name(column2)
FROM table_name
WHERE condition
GROUP BY column1, column2
ORDER BY column1, column2;
--function_name: Name of the function used for example COUNT(),AVG(),SUM(),MIN(),MAX().
--table_name: Name of the table.
--condition: Condition used.
--column1 or column2: Name of column, eg country
To give a basic example, here is a table before using the group by Clause
SELECT COUNT(Country) as number_of_persons, Country
FROM People
GROUP BY Country
ORDER BY Country;
You can clearly see, how the group by statement summarises the table: people, by their countries. However, if we apply an aggregation directly on the group by Clause, we will get an error, indicating that aggregate expressions are not allowed or are illegal in the GROUP BY clause.
This could not be quite confusing, as we said earlier, aggregate functions can be used with the group by Clause. Yes! That is true. An aggregate expression applies an aggregate function across the rows or groups of rows selected by a query. When we use the count() with the group by clause, it returns the number of rows in each group. This includes NULL
values and duplicates. But using count() without group by clause returns the cardinality (i.e. number of rows) in the resultset. This includes rows comprised of all-NULL values and duplicates.
However, we cannot apply an aggregate function or expression directly on the group by clause in this format: group by count(Country)
.This is wrong and will return an error.
However, we use aggregate functions as expressions only in the following situations:
- The select list of a SELECT statement (either a subquery or an outer query).
- A HAVING clause.
The Having Clause
In an instance where you want to get data for People, with a minimum age of 30, using the aggregate function min() will result in an error because it doesn’t work with the where keyword, and cannot be in the group by Clause. This is why the Having clause, was added to SQL. Below, is the syntax for the Having clause.
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
To understand more clearly, below is an example of using the having clause
SELECT COUNT(Country) as number_of_persons, Country
FROM People
GROUP BY Country
Having min(Age) = 31
ORDER BY Country;
You can see clearly, that the number_of_persons in Nigeria decreased to 2, after applying the filter min(Age) = 31.
Sources: learn.microsoft.com, w3schools