A window to the solar system
Peering into an intermediate SQL topic
Ah, windows — I could look out my childhood bedroom window and see a great sunset in the west, the moon rising in the east, and stars. Lots of stars. Sagittarius, the archer who looked like a teapot, was one of my favorite things to look at. So was Orion and his two dogs.
Today I want to write about a different kind of window — specifically, SQL window functions. A window function calculates an aggregate value based on a group of rows and return multiple rows for each group.
Let’s jump into an example and break it down.
I am going to use a table that I created called solar_system_20
. I’ve queried this table multiple times this past month for various SQL examples. Here’s some metadata about the table:
solar_system_20
is a relational database table with the following columns:
ss_name
— the unique name of the Solar System objectdiameter
— the equatorial width of the object (in miles)classification
— the type of object in the record (like star or planet)ss_location
— the location of the object in the Solar System
Recall, this table has facts about the 20 largest objects in the Solar System. I used solar_sytem_20
in two previous tutorials, Touring the Solar System with SQL and Searching for Moons with SQL, to show hands-on examples of aggregate functions, subqueries, aliases, and the GROUP BY
and HAVING
commands in SQL.
Here’s the query with window functions that we want to understand:
SELECT ss_name
, diameter
, classification
, SUM(diameter) OVER(PARTITION BY classification) AS total_class
, CAST(CAST(diameter AS decimal) / CAST(SUM(diameter)
OVER(PARTITION BY classification) AS decimal) * 100 AS int)
AS pct
FROM solar_system_20
ORDER BY 5 DESC
, ss_name;
Before I show you the full resulting view of this query, let’s break the code down.
- The first few lines are just a SELECT statement that returns the
ss_name
,diameter
, andclassification
of the object. - The next two columns in the new view are aliased as
total_class
andpct
. - The window functions happen first at
SUM(diameter) OVER(PARTITION BY classification)
. This is essentially saying group the classifications and sum the diameters of each group. The first few records of the new view look like this:
(See how the value intotal_class
for the Sun — the only star in solar_system_20
— is the same as its diameter? But, the diameters for two dwarf planets are summed to equal the resultant number in total_class
?)
- Now that we have calculated the total diameter for every class of objects in the table, we can calculate the percent that an objects diameter is to the value in
total_class
. In other words,diameter / total_class * 100%
for each record. In SQL we want something like
diameter / SUM(diameter) OVER(PARTITION BY classification) * 100
- But, there is a problem with this — it only produces values of 0 or 100. Since all the
diameter
s are stored asint
I need to cast everything as a decimal to obtain the values in between 0 and 100. UsingCAST <column_name> AS decimal
, the query takes this form.
CAST(CAST(diameter AS decimal) / CAST(SUM(diameter)
OVER(PARTITION BY classification) AS decimal) * 100 AS int)
Result (first few records):
ORDER BY
has a nice little trick I learned last week. In PostgresSQL, you can refer to each column here by its ordinal number. So, we want the query to order by thepct
value first, then byss_name
.
So, the final view looks like this:
That’s it for this post, folks! Window functions are an important tool to have in your SQL toolbox.
For your reference, I have a series of SQL tutorials that cover basic and intermediate queries. Take a look!
- Black holes, planets, and SQL
- Touring the Solar System with SQL
- Searching for moons with SQL
- Classifying black holes with SQL
- Joining Constellations with SQL
- Solar System classifications with SQL
- Many (to many) relations among the stars
- Creating tables in SQL
Like these SQL posts? Have a question? Let’s chat in the comments!