Joining Constellations with SQL
A tutorial on intermediate SQL Queries
“In the mystical moist night-air, and from time to time, Look’d up in perfect silence at the stars.” — Walt Whitman
When was the last time you looked up at the twilight sky? Ever count the stars as they become visible? One, two, six… suddenly there are too many to count. It’s easy to start looking for patterns in the stars. You might have found common asterisms like the Big and Little Dippers or the huge Summer Triangle. Maybe you’ve recognized some constellations like Cassiopeia, which appears like a giant ‘W’ in the sky.
I could write a lot about the constellations and objects that you can see in the night sky, but I digress. This post is about SQL queries — specifically different JOIN
statements. Let’s begin.
The Data Tables
Instead of querying one table, we are going to be using two tables — circ_consts
and sum_consts
. The table circ_consts
has information for five constellations that are in the circumpolar region of the sky. It has two columns:
const_name
— the name of the circumpolar constellationarea
— which represents the area of the sky that the boundaries of the constellations enclose, in degrees
Using a SELECT
query, we can view the table:
The other table, sum_consts
, has information for six constellations that can be seen after sunset in the (Nothern Hemisphere’s) summer. It also has two columns:
const_name
— the name of the summer constellationmain_stars
— how many main stars make up the constellation. These are (usually) the brightest stars in the constellation and are connected in images of the stick figures.
Again, using a SELECT
query, we can quickly view the table:
According to W3Schools, a JOIN clause is used to combine rows from two (or more tables), based on a related column between them. Looking at the two tables each has a column namedconst_name
, and there are some names that are the same. This is the related column between the tables. I want to note that in practice, tables that we want to combine DO NOT NECESSARILY have the same column names. In fact, they rarely do. In this case, we need to rely on a map called a database table schema. If that isn’t available, look for a data dictionary or someone with experience working in the database.
Full Outer Join
Consider the following. Say we want to join both tables, and all the data in them, into one table. We use a FULL OUTER JOIN
for this.
SELECT *
FROM circ_consts AS c
FULL OUTER JOIN sum_consts AS s ON c.const_name = s.const_name;
Result:
See the blank spaces in the image above? The blank spaces represent NULL values. A NULL value is different from a zero value or a field that contains spaces. Here, it literally means that the field was missing when the two tables were combined.
A lot of folks learn SQL JOIN statement’s with Venn Diagrams. So, for historical purposes, I will add Venn Diagrams. First, look at a diagram with the entire sets — both tables. The circle on the left represents circ_consts
(the five circumpolar constellations) and the circle on the right represents the table with six summer constellations, sum_consts
. The two constellations in the middle are shared constellations, that is, they are both circumpolar constellations and constellations you can view in the summer months.
This is the Venn Diagram for the FULL OUTER JOIN
. Notice how both circles are shaded?
Inner Join
Next up: the Inner Join. We want to address those two constellations that appear in both tables — Cassiopeia and Ursa Major. Here is the diagram. We want the information and data about the constellations highlighted in the middle.
The SQL query is:
SELECT *
FROM circ_consts AS c
INNER JOIN sum_consts AS s ON c.const_name = s.const_name;
Result:
The join merged two tables with common constellation names. Notice that we have repeating information const_name
? We can refine the query and eliminate the duplicate information by specifying which columns we want in this table view.
SELECT c.const_name, area, main_stars
FROM circ_consts AS c
INNER JOIN sum_consts AS s ON c.const_name = s.const_name;
Result:
Left Outer Join
The last join I want to introduce to you is the LEFT OUTER JOIN
. This join returns all the rows of the table on the left side of the join — also referred to as the left table — and matching rows for the table on the right side of the join. In terms of our Venn Diagrams, this is the area of the left circle (including the middle common area).
So, what should we expect from this query? Three things:
- All the information from the left table,
circ_consts
, which includes information from theconst_name
andarea
. - The common constellations will have three points of data:
area
,main_stars
, andconst_name
. - The rows for which there is no matching row on the right side will contain NULL. These are the constellations in
circ_consts
.
Let’s see this in action.
SELECT c.const_name, area, main_stars
FROM circ_consts AS c
LEFT OUTER JOIN sum_consts AS s ON c.const_name = s.const_name;
Result:
Again, the blank spaces are NULL values.
Full outer join, inner join, and left outer join — I hope I’ve demystified these concepts or, at least, added some examples to your toolset.
For your reference, I have a series of basic SQL tutorials. Take a look!
- Black holes, planets, and SQL
- Touring the Solar System with SQL
- Searching for moons with SQL
- Classifying black holes with SQL
As always, ask questions and post comments below. Let’s chat!