How to use SQL Aggregations to add value to raw data

Dominic Imbuga
DevCNairobi
Published in
5 min readJul 22, 2021

Imagining myself as a marketing manager at Kiwanda, I wanted to compare Facebook as a marketing channel against all other channels.I knew Facebook is a great channel for my business, but is it better than all the rest combined ? How would I use SQL tool to answer this question?

So far on my journey, I used JOINs to explore the data, and I shared In our last article (https://link.medium.com/IC4qug5w6hb) , as we looked at how to leverage SQL to link tables together. You saw why SQL is one of the most popular environments for working with data as we learned how to write JOINs.

Along the way, I found myself using raw level outputs for the early exploratory work , when searching my database to better understand the data and JOINs were handy , as I got a sense for how the data looks , and begin looking answers to my questions, aggregates became more useful.

Raw data is overwhelming and less valuable than aggregated data, fortunately, databases are great at aggregating data and the SQL language was easy to learn. I’m familiar with Excell ,and found out that SQL commands are similar ! how convenient.

  • COUNT — counts how many rows are in a particular colum
  • SUM adds together all the values in a particular colum
  • MIN and MAX returns the lowest and highest values in a particular colum

These function operates down colum not across a rows. So I could do things like sum up all quantities of paper ever delivered at Kiwanda, Instead of getting a results set many thousands of rows long, I just got one line with the answer !

Kiwanda is not a real company we fabricated it in our first article to help us answers some business questions with data, check it out https://link.medium.com/IC4qug5w6hb

Let us get our hands dirty while we help Kiwanda answer some of its tricky questions. We will look at the following aggregation’s salient features.

  • NULL
  • SUM
  • GROUP BY
  • DATE Functions

We are using Termux and PostgreSQL. Check out our previous article on getting your environment ready here https://link.medium.com/qlS3FPfx6hb

Null

Before we dive deep into SQL aggregations, let’s take a look at the concept of NULLs , NULL means no data, it is different from 0 or space and there is a good reason. This is a very crucial concept guys, because from a business perspective, for Kiwanda a 0 means that no paper was sold, which could mean a sell was attempted but not made, a NULL could mean no sell was even attempted.

This is a very meaningful difference. So it's important to know what NULLs are and how to work with them to gain meaningful insights from data.

Scenario

Imagining yourself as a Sales manager at Kiwanda, you might want want to know all the accounts for which the person of contact (primary_poc) is NULL, if you don’t have a person of contact , chances are you are not going to be able to keep that customer for much longer. Right? Let’s find these accounts !

SELECT id , name , website, sales_rep_id

FROM accounts

WHERE primary_poc IS NULL;

Rows for NULL values in primary_poc returned

Note: We have used is NULL instead of = NULL because NULL is not a value but is the property the data.

Date function.

I noticed that date are abit hard to work with, aggregating by date fields in particular does not work in a practical way, it treats each time stamp as unique , when it will be more practical to round to the nearest day, week or month and aggregated across that period.

The good news though, there are plenty of special functions to make date easy to work with.

When storing dates , different format exists , like for those who live in the United States you probably used to seeing dates formated as mm/dd/yy or similar month first format. which is non-conventional compared to the rest of the world standard because the rest of the world the dates are formated as dd/mm/yy. This is not necessary better or worse it just different.

Different date and time formats in the world

Databases on the other side do it differently, in a yet another interesting way, ordering from least to most granular part of the date, yy/mm/dd. This is a very specific utility because it makes it so that the date sorted alphabetically are also in chronological order, in other words, date ordering is same whether you think dates or bits of text.

The year first way that databases store dates is idea for sorting the way we want to retrieve this information in the future.Whether we want the most recent or oldest information, day first and month first date formats are sort funny ways that don't make any sense if you think about it.

The benefit I love the most about this date format is that, dates can easily be truncated inorder to group them for analysis.

Scenario

Lets explore this date functions with our Kiwanda data. On what day of the week are the most sales made ?

SELECT DATE_PART('dow', occurred_at) AS day_of_week,

SUM(total) AS total_qty

FROM orders

GROUP BY 1

ORDER BY 2 DESC;

Looks like the most paper was ordered on a Sunday

Note: dow stands for the day of the week and returns a value from 0 to 6 , where 0 is Sunday and 6 is Saturday.

Now,

Imagining yourself as marketing manager at Kiwanda you want to compare Facebook as a marketing channel against all other channels.You know Facebook is a great channel for your business, but is it better than all the rest combined ? How would use SQL to answer this question?

--

--