A window to the solar system

Peering into an intermediate SQL topic

Karen Warmbein
The Startup
4 min readJun 27, 2020

--

Photo by Ihor Malytskyi on Unsplash.

Ah, windows — I could look out my childhood bedroom window and see a great sunset in the west, the moon rising in the east, and stars. Lots of stars. Sagittarius, the archer who looked like a teapot, was one of my favorite things to look at. So was Orion and his two dogs.

Today I want to write about a different kind of window — specifically, SQL window functions. A window function calculates an aggregate value based on a group of rows and return multiple rows for each group.

Let’s jump into an example and break it down.

I am going to use a table that I created called solar_system_20. I’ve queried this table multiple times this past month for various SQL examples. Here’s some metadata about the table:

solar_system_20 is a relational database table with the following columns:

  • ss_name — the unique name of the Solar System object
  • diameter— the equatorial width of the object (in miles)
  • classification — the type of object in the record (like star or planet)
  • ss_location — the location of the object in the Solar System

Recall, this table has facts about the 20 largest objects in the Solar System. I used solar_sytem_20 in two previous tutorials, Touring the Solar System with SQL and Searching for Moons with SQL, to show hands-on examples of aggregate functions, subqueries, aliases, and the GROUP BY and HAVING commands in SQL.

Here’s the query with window functions that we want to understand:

SELECT ss_name
, diameter
, classification
, SUM(diameter) OVER(PARTITION BY classification) AS total_class
, CAST(CAST(diameter AS decimal) / CAST(SUM(diameter)
OVER(PARTITION BY classification) AS decimal) * 100 AS int)
AS pct
FROM solar_system_20
ORDER BY 5 DESC
, ss_name;

Before I show you the full resulting view of this query, let’s break the code down.

  • The first few lines are just a SELECT statement that returns the ss_name, diameter, and classification of the object.
  • The next two columns in the new view are aliased as total_class and pct.
  • The window functions happen first at SUM(diameter) OVER(PARTITION BY classification). This is essentially saying group the classifications and sum the diameters of each group. The first few records of the new view look like this:

(See how the value intotal_class for the Sun — the only star in solar_system_20 — is the same as its diameter? But, the diameters for two dwarf planets are summed to equal the resultant number in total_class?)

  • Now that we have calculated the total diameter for every class of objects in the table, we can calculate the percent that an objects diameter is to the value in total_class. In other words, diameter / total_class * 100% for each record. In SQL we want something like
diameter / SUM(diameter) OVER(PARTITION BY classification) * 100
  • But, there is a problem with this — it only produces values of 0 or 100. Since all the diameters are stored as int I need to cast everything as a decimal to obtain the values in between 0 and 100. Using CAST <column_name> AS decimal, the query takes this form.
CAST(CAST(diameter AS decimal) / CAST(SUM(diameter)       
OVER(PARTITION BY classification) AS decimal) * 100 AS int)

Result (first few records):

  • ORDER BY has a nice little trick I learned last week. In PostgresSQL, you can refer to each column here by its ordinal number. So, we want the query to order by the pct value first, then by ss_name.

So, the final view looks like this:

That’s it for this post, folks! Window functions are an important tool to have in your SQL toolbox.

For your reference, I have a series of SQL tutorials that cover basic and intermediate queries. Take a look!

Like these SQL posts? Have a question? Let’s chat in the comments!

--

--