How to use SQL subqueries to substantially increase the types of problems you can solve

Dominic Imbuga
DevCNairobi
Published in
5 min readAug 2, 2021

How do I substantially increase the volume of data I can comfortably work with? How do I increase the problems I can solve ? I had just realized that, a major part of machine learning is data.

It was so important that I spend more time on exploring data.Data is the most important part of Machine learning. If I have no data, then machine learning cannot help me, so before I get myselfself elbow-deep in machine learning, it was important for me to set myself for success

At this point on my machine learning journey, I had learnt all the basics on SQL tools to work creatively with data, I understood database structure and generally knew how to access information. I could even do some simple aggregation.

However there were few things left to learn that would make me able to solve any problem I'm confronted with and this would set me along the path to becoming a skilled practitioner of the art .

In this article ,let us happily explore and enjoy the journey as I share what I learnt, we are going to learn to query from the results of another query. This allowed me to build more transformation into my query , organize work more easily and even make my query run faster !

Subqueries also known as inner queries or nested queries which is a tool for performing operations in multiple steps. This tool will allow us to answer more complex questions.

Now that we are getting into more complex queries using subqueries the Termux postgres terminal workspace for executing our SQL maybe alittle crumped .If you have not already been writing your code in an editor, its great time to start .

In my workflow in termux, I have two sessions running,one with vim file open with subquery.sql extension which I can edit the queries then copy paste the query for evaluation In the second Termux session that has the postgre running.

Cheers to the creators of Termux and Vim , it is an awesome tools for developers and Engineers alike to leverage on their mobile phones to do some cool stuff like this series of articles you are reading on SQL Data Analysis.

Kiwanda is not a real company, We’ve fabricated it and all the data for the sake of this article.The questions we will answer using Kiwanda are meant to simulate real world problems,To get your environment set go here https://link.medium.com/1c3YbDqDoib, we had done that .

Let us put our marketing manager hats on, we would like to know which channel sends the most traffic per day on average to Kiwanda , which is tricky because in order to do this we will have to aggregate events by channel, by day then we need to take those and average them.

Let's break down that example,

First we start by quering the underline table to make sure the data makes sense for what we are trying to do

Next we will count up all the events in each channel in each day.

NOTE: I'm using vim to write, edit , copy and paste to postgreSQL .The file is saved as subquery.sql that is why my vim has highlighted my SQL code.

The last step is that we want to average across the events colum we have created, in order to do that we would want to query against the results from this query.

We can do that by wrapping our query in a parentheses and using it in our FROM cluase in the next query that we write. As you can now see it is a query inside a query in other words a subquery !

Subqueries are required to have aliases, which are added after the parentheses the same way you will add an alias to a table. Here we are just selecting all the data from the subquery.

Let us go the last mile and average events for each channel, since the subquery acts like a one table in the FROM clause , we will put a GROUP BY cluase after the subquery .

Since we are now reordering based on this new aggregation, we no longer need the ORDER BY statement in the subquery, so let us take that out to keep things clean.

Just so we are clear on how things are really happening here, let us breakdown on how this new query runs. First your inner query will run, it may sound like a no brainer, but it is important !

The inner query must actually run on its own as the database will treat it as an independent query. Once the inner query is complete, the rest of the query also known as the outer query will run across the results set created by the inner query.

Some SQL editor actually allows you to highlight and run the inner query and do some edits before you run the outer query. This is a great when you want say to edit the inner query, quickly see its output if it is correct before running the outer query again. This is a good practice because it saves on resources.

Subqueries can be used in several places within a query, they can really be used in anywhere you might use a table name or even a column name or an individual value.

They are especially useful in conditional logic in conjunction with WHERE or JOIN clauses or in the WHEN potion of a CASE statement. Writing subqueries allows you to substantially increase the types of problems you can solve and the volume of data you can comfortably work with in SQL

Congratulations for reading to the end !

--

--