Touring the Solar System with SQL

Another tutorial on basic SQL Queries

Karen Warmbein
The Startup
4 min readMay 9, 2020

--

When it comes to retrieving a variety of (structured) data, SQL is universal. It does not discriminate against the subject of the data. It is simply a tool to use to discover more information and insights.

Take my post, Black holes, planets, and SQL. Check it out here. In it, I covered some very basic commands SQL uses to query data, specifically, data about supermassive black holes, a stellar-mass black hole, and planets in the Solar System.

I want to continue from the last blog post with examples from astronomy. Specifically, I want to make SQL queries against a table that has facts about the 20 largest objects in the Solar System. (I define “largest” as the width, or the diameter, of the objects). I won’t show the entire table filled with data, but I will leave discovering this knowledge to exploration through queries.

I built solar_system_20, a relational database table with the following columns:

  • num — the number that identifies the object
  • ss_name — the name of the Solar System object
  • diameter — the equatorial width of the object (in miles)
  • classification — the type of object (like star or planet)
  • ss_location — the location of the object in the Solar System

I’ll query this table throughout this post.

Let’s check this and discover the first two items in this table.

SELECT *
FROM solar_system_20
LIMIT 2;

Result:

Nice — a planet (Jupiter) and our star, the Sun.

Through intuition and knowledge of the Solar System, the Sun will be the largest object in the table. Let’s check this!

SELECT ss_name, diameter
FROM solar_system_20
ORDER BY diameter DESC
LIMIT 1;

Result:

Indeed, the Sun is the largest object listed.

Adjust the previous query to find the smallest object in the table. Answer?

SELECT ss_name, diameter
FROM solar_system_20
ORDER BY diameter
LIMIT 1;

Result:

The result is Rhea — a moon orbiting Saturn.

It’s interesting that we don’t need to specifically state that the diameters should be ordered in ascending order. We can, though, and if we do the information is the same.

SELECT ss_name, diameter
FROM solar_system_20
ORDER BY diameter ASC
LIMIT 1;

Let’s get on with a new topic: Aggregate Functions. These functions provide summary statistics (like the average, count, and mode) for a group of rows in a table. Here are some aggregate functions that I’ve run across:

  • AVG()
  • SUM()
  • MAX()
  • MIN()
  • COUNT()

For a first example, let’s count the number of rows in this table. Notice that the aggregate function, COUNT(), goes after the SELECT, and the group of rows we are applying this function to is all rows in the table.

SELECT COUNT(*)
FROM solar_system_20;

Result:

There are 20 rows in this table.

To make sure each row has a solar system object name:

SELECT COUNT(ss_name)
FROM solar_system_20;
Yup — 20 names.

We can use aggregate functions to find the statistical range of the diameters.

SELECT MAX(diameter) — MIN(diameter) AS range
FROM solar_system_20;

Result:

A very large range.

There are two new parts to this query. First, we subtracted two aggregate functions. Second, we used AS to name the resulting column range. This is called aliasing in SQL.

Finally, let’s find the objects that are larger than the average size. This task can be broken up into two tasks. First to find the average,

SELECT AVG(diameter)
FROM solar_system_20;

then to find the objects larger than the average.

SELECT *
FROM solar_system_20
WHERE diameter > 56817.3;

But SQL has a method that we can use to combine these two steps. We can write a subquery after the WHERE function like so.

SELECT *
FROM solar_system_20
WHERE diameter > (SELECT AVG(diameter)
FROM solar_system_20);

Result:

Only 3 of the 20 objects are larger than the average size.

Aggregate functions, subqueries, and aliases — three more topics to consider when writing SQL queries. Have any questions or thoughts? Let’s chat in the comments!

--

--