Many (to many) relations among the stars

A tutorial on an intermediate SQL topic

Karen Warmbein
The Startup
6 min readJun 13, 2020

--

Image credit: CESAR @ ESA

In the mystical moist night-air, and from time to time, Look’d up in perfect silence at the stars.” — Walt Whitman

Recently, I came across a personal journal where I noted the objects I observed in the night sky with my telescope. Most of the entries are from high school and college summer breaks when I could stay up later at night and (somewhat) sleep in the next day. I’d start every summer by observing the late Spring constellations like Hercules with his Great Globular Cluster. Sometime around the middle of July, I would see a beautiful crescent Moon setting over Lake Michigan. This was also my favorite time to see The Summer Triangle with Lyra and it’s Ring Nebula. Finally, I’d end the summer by looking for the Andromeda Galaxy.

These deep-sky objects (objects like globular clusters and galaxies) occur relative to the same constellations. However, there are some objects (those inside the solar system) like the Moon, planets, and comets that change positions and move relative to the constellations throughout the year. In other words, I can observe the Moon or Jupiter in different constellations such as Taurus in the winter and Leo in the spring.

These observations are a great example of a database with a many-to-many relationship. Again, let me use astronomy for a SQL example. Let’s begin.

Many-to-many relationships

What is a many-to-many relationship in a SQL database? It’s an instance where multiple records in one table are related to multiple records in another table. After a bit of searching, a common example is a database made for a school. The school has many students that can take many classes at a time. At the same time, there isn’t one student per class — but many students per class. (For more on this example, I recommend reading this DZone tutorial.)

I want to use a different example to showcase this concept. Consider a list of constellations you can see in the summer (in the below picture on the left) and a list of objects that can be seen in the summer night sky (on the right). The list of objects are in some constellation from the left list. For example, the Ring Nebula can be observed in the constellation Lyra but the Moon can be observed next to many constellations in the Summer (here, Scorpius and Pisces). You can also see M-80 in Scorpius but M-82 is seen in Ursa Major.

The relationship between items in two lists.

The Problem

How do you store this information in a database? Well, let’s try using one table. Coded up in a table called wrong_table, this looks like the following.

One table to store the information.

But, how can we query this table? Specifically, look at the record for Scorpius and the record for Pisces.

Querying wrong_table.

For Scorpius, M-80 and The Moon occur together as a text field. The Moon also occurs in the Pisces record. Say I want to count how many times The Moon occurs. Let’s try.

SELECT *
FROM wrong_table
WHERE name_obs = 'The Moon';

Result:

Querying wrong_table.

Alright, that’s odd — the result is just the Pisces record. Why? The Moon’s instance in the record for Scorpius is tied to M-80 in that M-80, The Moon are in the same field and not distinct from each other. How can I count occurrences of The Moon if one record is not distinct from M-80?

Ok, let’s try another approach. Say we put each occurrence in its own column, like in wrong_table_2.

A view of wrong_table_2.

There are a few issues with this. Let’s try finding the instances of The Moon.

SELECT *
FROM wrong_table_2
WHERE name_obs_1 = 'The Moon' OR name_obs_2 = 'The Moon';

Result:

Querying wrong_table_2.

Ok, now let’s try to count the instances… but when we do, the columns used in the WHERE line can theoretically grow (countably) infinitely long. Meaning, I can add more observations to this table for the Scorpius record.

What a mess!

The solution

The solution to this problem is clever — use three tables in a database: one for each data entity and one for the relationship which holds the connection between the two tables.

The solution to this issue — three tables.

A coded example

Let’s code this up. I created three tables in my database: sum_consts_obs, sum_obs, and rel_obs.

  • The sum_consts_obs (short for summer constellations observations) table has three columns, id, const_name, main_stars that has an identification number, constellation name, and the number of main stars in the constellation, respectively. There are seven constellations listed in this table.
The sum_conts_obs table.
  • The sum_obs (short for summer observations) table also has three columns, id, ds_name, and app_mag which are the table’s id of the object, the name of the object, and the apparent magnitude of the object (that is, how bright the object is from Earth). There are five objects listed in this table.
The sum_obs table.
  • The rel_obs (shortened from the relationship between the observations) table has two columns, id_sum_const and id_ds. This is a table that stores the relationship between the two tables above. id_sum_const is the identification number of the Summer constellation in sum_consts_obs and id_ds is the id of the object in the sum_obs table. There are six relationships in this table.
The rel_obs table.

Now that we have a dataset, let’s see if we can solve this issue. In my article Joining Constellations with SQL, I blogged about the basics of joining tables. There, I only wrote about joining two tables together. But here I need to join three. In order to do that I just need to use two joins to connect the first table to the second and the second table to the third. I use the ID’s in each table to join on.

SELECT const_name
, ds_name
FROM sum_consts_obs AS c
INNER JOIN rel_obs AS r ON c.id = r.id_sum_const
INNER JOIN sum_obs AS o ON r.id_ds = o.id;

(Technical note: I use an INNER JOIN to just get the matches, not all the const_names in the first table.)

Result:

The solution view.

Notice the ON part of the JOIN clause matched the keys in the first and third tables with those in the middle (second) table. We can now count the number of times The Moon appears.

SELECT o.ds_name
, COUNT(o.ds_name)
FROM sum_consts_obs AS c
INNER JOIN rel_obs AS r ON c.id = r.id_sum_const
INNER JOIN sum_obs AS o ON r.id_ds = o.id
WHERE ds_name = 'The Moon'
GROUP BY o.ds_name;

Result:

Finally!

There it is folks — a tutorial on many-to-many relationships in a database, one more thing to add to your SQL toolbox.

For your reference, I have a series of SQL tutorials that cover basic and intermediate queries. Take a look!

Like these SQL posts? Have a question? Let’s chat in the comments!

--

--