Statuscode
Published in

Statuscode

The many faces of DISTINCT in PostgreSQL

3 powerful uses of DISTINCT in PostgreSQL.

Duplication is scary (The shining, 1980)

DISTINCT

I created a simple Employee table with name, department and salary using mock data from this site:

haki=# \d employee
Table "public.employee"
Column | Type | Modifiers
------------+-----------------------+-----------
id | integer | not null
name | character varying(30) |
department | character varying(30) |
salary | integer |
Indexes:
"employee_pkey" PRIMARY KEY, btree (id)

haki=# select * from employee limit 5;
id | name | department | salary
----+----------------+----------------------+--------
1 | Carl Frazier | Engineering | 3052
2 | Richard Fox | Product Management | 13449
3 | Carolyn Carter | Engineering | 8366
4 | Benjamin Brown | Business Development | 7386
5 | Diana Fisher | Services | 10419
(5 rows)

What is DISTINCT?

haki=# SELECT DISTINCT department FROM employee;
department
--------------------------
Services
Support
Training
Accounting
Business Development
Marketing
Product Management
Human Resources
Engineering
Sales
Research and Development
Legal
(12 rows)
SELECT department FROM employee GROUP BY department;

DISTINCT ON

A classic job interview question is finding the employee with the highest salary in each department.

SELECT  
*
FROM
employee
WHERE
(department, salary) IN (
SELECT
department,
MAX(salary)
FROM
employee
GROUP BY
department
)
ORDER BY
department;
id | name | department | salary
----+------------------+--------------------------+--------
30 | Sara Roberts | Accounting | 13845
4 | Benjamin Brown | Business Development | 7386
3 | Carolyn Carter | Engineering | 8366
20 | Janet Hall | Human Resources | 2826
14 | Chris Phillips | Legal | 3706
10 | James Cunningham | Legal | 3706
11 | Richard Bradley | Marketing | 11272
2 | Richard Fox | Product Management | 13449
25 | Evelyn Rodriguez | Research and Development | 10628
17 | Benjamin Carter | Sales | 6197
24 | Jessica Elliott | Services | 14542
7 | Bonnie Robertson | Support | 12674
8 | Jean Bailey | Training | 13230
WITH ranked_employees AS (
SELECT
ROW_NUMBER() OVER (
PARTITION BY department ORDER BY salary DESC
) AS rn
,
*
FROM
employee
)
SELECT
*
FROM
ranked_employees
WHERE
rn = 1
ORDER BY
department;
 rn | id |       name       |        department        | salary 
----+----+------------------+--------------------------+--------
1 | 30 | Sara Roberts | Accounting | 13845
1 | 4 | Benjamin Brown | Business Development | 7386
1 | 3 | Carolyn Carter | Engineering | 8366
1 | 20 | Janet Hall | Human Resources | 2826
1 | 14 | Chris Phillips | Legal | 3706
1 | 11 | Richard Bradley | Marketing | 11272
SELECT DISTINCT ON (department)
*
FROM
employee
ORDER BY
department,
salary DESC
;

IS DISTINCT FROM

Comparing values in SQL can result in three outcomes — true, false or unknown:

WITH t AS (
SELECT 1 AS a, 1 AS b UNION ALL
SELECT 1, 2 UNION ALL
SELECT NULL, 1 UNION ALL
SELECT NULL, NULL
)
SELECT
a,
b,
a = b as equal
FROM
t;
a | b | equal
------+------+-------
1 | 1 | t
1 | 2 | f
NULL | 1 | NULL
NULL | NULL | NULL
(a is null and b is null)
or
(a is not null and b is not null and a = b)
  a   |  b   | equal | full_condition
------+------+-------+----------
1 | 1 | t | t
1 | 2 | f | f
NULL | 1 | NULL | f
NULL | NULL | NULL | t
SELECT
a,
b,
a = b as equal,
a IS DISTINCT FROM b AS is_distinct_from
FROM
t;
a | b | equal | is_distinct_from
------+------+-------+------------------
1 | 1 | t | f
1 | 2 | f | t
NULL | 1 | NULL | t
NULL | NULL | NULL | f

How other databases handle this?

  • MySQL — a special operator <=> with similar functionality.
  • Oracle — Provide a function called LNNVL to compare nullable fields (good luck with that…).
  • MSSQL — Couldn’t find a similar function.

ARRAY_AGG (DISTINCT)

ARRAY_AGG was one of the major selling points of PostgreSQL when I was transitioning from Oracle.

SELECT
department,
ARRAY_AGG(name) AS employees
FROM
employee
GROUP BY
department;
department | employees
--------------------------+-------------------------------------
Services | {"Diana Fisher","Jessica Elliott"}
Support | {"Bonnie Robertson"}
Training | {"Jean Bailey"}
Accounting | {"Phillip Reynolds","Sean Franklin"}
Business Development | {"Benjamin Brown","Brian Hayes"}
Marketing | {"Richard Bradley","Arthur Moreno"}
Product Management | {"Richard Fox","Randy Wells"}
Human Resources | {"Janet Hall"}
Engineering | {"Carl Frazier","Carolyn Carter"}
Sales | {"Benjamin Carter"}
Research and Development | {"Donna Reynolds","Ann Boyd"}
Legal | {"James Cunningham","George Hanson"}
SELECT
department,
ARRAY_AGG(DISTINCT salary) AS salaries
FROM
employee
GROUP BY
department;
department | salaries
--------------------------+---------------
Accounting | {11203}
Business Development | {2196,7386}
Engineering | {1542,3052}
Human Resources | {2826}
Legal | {1079,3706}
Marketing | {5740}
Product Management | {9101,13449}
Research and Development | {6451,10628}
Sales | {6197}
Services | {2119}
Support | {12674}
Training | {13230}

How other databases handle this?

  • MySQL — Has a similar function called GROUP_CONCAT.
  • Oracle — Has an aggregate function called ListAgg. It has no support for DISTINCT. Oracle introduced the function in version 11.2 and up until then the world wide web was filled with custom implementations.
  • MsSQL — The closest I found was a function called STUFF that accepts an expression.

Take away

The only take away from this article is that you should always go back to the basics!

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store