Searching for Moons with SQL

Another tutorial on basic SQL Queries

Karen Warmbein
The Startup
5 min readMay 16, 2020

--

Image of the near side of the Moon.

A full moon rising above the horizon is beautiful. This phase is when the Moon appears fully illuminated from Earth’s perspective and occurs because of the geometry of the Earth, Sun, and Moon. (By the way, the next time we can see a full moon is June 5th, 2020!)

But, imagine that we are at a planet that has multiple moons. Take Jupiter — the King of the Planets with a huge red-orange storm in its clouds — it has 79 orbiting moons. Or there’s Saturn, known for its beautiful rings. At the latest count, Saturn has 82 moons. Which ones are part of the 20 largest objects in the Solar System? Let’s see if we can discover this with SQL.

I’m going to make queries against a table that was used in the post Touring the Solar System with SQL (link here). This table has facts about the 20 largest objects in the Solar System. (I define “largest” as the width, or the diameter, of the objects). Again in this post, I won’t show the entire table filled with data, but encourage you to read through this post to explore the table. This mimics a real-world scenario where you have a big dataset to query and you can’t possibly view all the entries.

Here are some facts about the table: I built solar_system_20, a relational database table with the following columns:

  • num — the number that identifies the object
  • ss_name — the unique name of the Solar System object
  • diameter — the equatorial width of the object (in miles)
  • classification — the type of object (like star or planet)
  • ss_location — the location of the object in the Solar System

I’ll query the table throughout this post.

To start off, I am going to run through a quick refresher query. Recall that Aggregate Functions are functions that provide summary statistics (like the average, count, and mode) for a group of rows in a table. As in the previous post, we can cluster all the objects together in one group and find how many distinct objects are in this table.

SELECT COUNT(ss_name) 
FROM solar_system_20;

Result:

20 objects in the table

Are the objects all the same? In other words, what classifications are in this table? To figure this out, let’s use the GROUP BY clause. Here, I’m going to find the classification “bins”, and count the number of records that belong to each group.

SELECT classification, COUNT(*) 
FROM solar_system_20
GROUP BY classification;

Result:

The 20 objects are part of 4 groups — or bins.

In addition, we can use WHERE with GROUP BY and Aggregate Functions. Suppose I want to find the bins that have small diameters, say less than 3,000 miles.

SELECT classification, COUNT(*) 
FROM solar_system_20
WHERE diameter < 3000
GROUP BY classification;

Result:

9 of the 20 objects are small — these are dwarf planets and moons.

What are the names of these 9 objects?

SELECT ss_name 
FROM solar_system_20
WHERE diameter < 3000;

Result:

Behold! The names!

I’m getting a bit off track, here. I initially wanted to know what are the moons in the Solar System that are in this top 20 list. So, let’s determine that. Using the fact we discovered above, that the classification bin name is ‘moon’, we have the following.

SELECT ss_name, diameter, ss_location
FROM solar_system_20
WHERE classification = 'moon';

Result:

Names, diameters (in miles), and locations of the 9 moons listed in the table solar_system_20

I’m going to take another detour and add one more key concept to this post: the HAVING clause. This is used in conjunction with GROUP BY to filter results on the aggregate function. For example, suppose we have a list of 4 bins (classifications) that the 20 objects are grouped in. And, we have the number of objects that are in each group.

SELECT classification, COUNT(*) 
FROM solar_system_20
GROUP BY classification
ORDER BY COUNT(classification);

Result:

1 star + 2 dwarf planets + 8 planets + 9 moons = 20 largest solar system objects.

But, say we wanted to search for the bins (classifications) that had 3 or more objects in them.

SELECT classification, COUNT(*) 
FROM solar_system_20
GROUP BY classification
HAVING COUNT(classification) > 2
ORDER BY COUNT(classification);

Result:

Bins with more than 2 objects.

Finally — we can create a more complex query. Suppose we want a list of the bins with more than 2 objects in each bin, plus some summary statistics about those bins. Also, the diameter of those objects should be less than 10,000 miles.

SELECT classification, COUNT(*), 
MIN(diameter) AS min_diameter,
MAX(diameter) AS max_diameter,
AVG(diameter) AS avg_diameter
FROM solar_system_20
WHERE diameter < 10000
GROUP BY classification
HAVING COUNT(classification) > 2;

Result:

4 planets + 9 moons meet this crazy criterion.

This last query used Aggregate Functions, the WHERE, HAVING and GROUP BY clauses and aliasing — almost everything I have talked about so far!

This is a short post as it only introduces the GROUP BY and HAVING clauses. It also reviews last week's material. I hope you’ve enjoyed these tutorials on SQL; let me know your thoughts in the comments!

--

--