SQL 101: Union vs Union All

Sean Coyne
4 min readMar 16, 2020

--

photo credit wocintechchat.com

A common operator found in SQL reports and analytics is the UNION operator. UNION (and UNION ALL) are used to combine the results of two or more SELECT statements.

UNION

Union allows you to combine the results of two (or more) select statements into a single result data set. Union removes any duplicates in the two queries. Imagine taking the results of query 1, the results of query 2, combining them into a data set and then taking DISTINCT records from this set.

Each of the select statements must have the same number of fields, and the fields across the two select statements must have compatible data types. To summarize the union operation combines two more queries and removes the duplicates.

sql union diagram
SQL Union, Duplicates Removed

For example, let’s take the the tables Actresses and Musicians and write a UNION statement.

two tables to be used in the sql union statement, actresses and musicians
Actresses and Musicians tables
SELECT
actress_name AS artist
FROM actress
UNIONSELECT
musician_name AS artist
FROM musicians

The output of the above will be all of the actresses and all of the musicians, with duplicates removed. Take note that Jennifer Lopez only appears once.

results of the sql union statement
Results of the SQL UNION statement

UNION ALL

Union All allows you to combine the results of two (or more) select statements into a single result data set. Union All DOES NOT remove any duplicates in the two queries. Imagine taking the results of query 1, the results of query 2, combining them into a data set, if a record existed in both query 1 and query two it would occur twice in the result set.

sql union diagram
SQL Union ALL, Duplicates Remain

For example, let’s take the same two tables Actresses and Musicians and write a UNION ALL statement.

two tables to be used in the sql union statement, actresses and musicians
SELECT
actress_name AS artist
FROM actress
UNION ALLSELECT
musician_name AS artist
FROM musicians

The output of the above will be all of the actresses and all of the musicians, but instead of removing duplicates, duplicates remain. Take note that Jennifer Lopez only two times in the result set when using UNION ALL.

Results of the SQL UNION ALL statement
Results of the SQL UNION ALL statement

The Difference between UNION and UNION ALL

  • UNION removes duplicate rows
  • UNION ALL does not remove duplicate rows

Practical Examples

What should you do if you have two queries where the result data sets have a different number of columns? Since UNION requires the two queries to have the same number of columns, you can’t just combine the two results. You can use NULL or a hard coded value to populate the missing columns and ensure that both queries have the same number of columns.

Actresses and Musicians Tables
Actresses and Musicians Tables

We can use the below query to combine actresses and the number of films they have performed in and musicians and the number of films they have performed in.

SELECT
actress_name AS artist,
number_of_films
FROM actresses
UNIONSELECT
musician_name AS artist,
0 AS number_of_films
FROM musicians

One issue we are going to run into is Jennifer Lopez. Because UNION removes duplicates you might assume that we will only see JLo once in our result set, but because UNION removes duplicates across all columns we will see two records for JLo, one with a film count of 38, and one with a film count of 0.

We can solve this issue by adding a little logic on to our query as seen below. By adding an outer query, grouping by artist and summing the number_of_films we can reduce the two JLo records down to one.

SELECT
artist,
SUM(number_of_films) AS number_of_films
FROM (
SELECT
actress_name AS artist,
number_of_films
FROM actresses
UNION SELECT
musician_name AS artist,
0 AS number_of_films
FROM musicians
)
GROUP BY artist

Tip: be careful when using the UNION operation on the same table as it can cause the query to read the same table multiple time. Instead of UNION consider refactoring the query iso that all conditions are placed in one SELECT instruction, or using an OUTER JOIN instead of UNION.

I have also covered all of the above in this helpful video tutorial:

You can crush the data engineering interview by learning the tips and tricks in my book, Ace The Data Engineering Interview, on kindle and paperback, and my free companion app on iOS.

--

--

Sean Coyne

Author of "Ace the Data Engineer Interview", Director Software Engineering at GoodRx, Adjunct Professor at USD