The real power of SQL is JOINs

Dominic Imbuga
DevCNairobi
Published in
4 min readJul 14, 2021

The real power of SQL comes from working with data across multiple tables at once. The term relational database reffers to the fact that tables within it realate to one another. They contain common identifiers that allow information from multiple table to be easily combined.

Keeping all companies data from purchasing transactions to employee job satisfaction to inventory in a single Excell dataset doesn’t make aton of sense.The table would hold a ton of information and it will be hard to determine row colum structure for so many different kind of data.

Databases give us the flexibility to keep things organized neatly in their own tables, So they are easy to find and work with , while also allowing us to combine tables as needed to solve problems that require several types of data.

Our goal is to use SQL’s data analysis superpower to answer tricky businesses questions). Now you are about to see why SQL is one of the most popular environments for working with data as we learn how to write joins

We'll be using Termux! This way we can practice to write SQL queries anywhere using Android smartphones, don't let that long ride bore you, performs some data science ! You can find Termux on Google store and more information on how to install Postgres and git on Termux , and all the cools things Termux can do , head over here https://wiki.termux.com . Please feel free to reach out if you having issues setting it up.

Setup

When in Termux, create a directory

mkdir kiwanda && cd Kiwanda

I have already created a repository on gitlab that has a data set name kiwanda.sql we will use postgres ,so go ahead and clone the repo

git clone https://gitlab.com/Domminique/kiwanda.git

To use analysis kiwanda.sql dataset we need PostgreSQL with our kiwanda.sql.Before we can use PostgreSQL we need to install it in Termux of course. Head here https://wiki.termux.com/wiki/Postgresql to do it

Make sure your Postgress is running since PostgreSQL uses a client/server model. A PostgreSQL session should be running.

Create a new database with

$ psql createdb kiwanda

Then copy the kiwanda.sql dataset we cloned from gitlab to that database for our analysis. Note the path of the SQL file location

$ psql kiwanda < /data/data/com.termux/files/home/kiwanda/kiwanda.sql

psql kiwanda tells PostgreSQL we are using the kiwanda database and the kiwanda=# means our database it ready for queries and \qt list tables in the database

If you are here, you are set 🚀

To understand what joins and why they are helpful, lets think about Kiwanda's orders table.

Looking back at this table , you will notice that non of the orders say the name of the customer. Instead the table reffers to customers by numerical values in the account_id colum. We will need to join another table inorder to connect this data to names.

Lets look at the Entity Relationship Diagram of Kiwanda database

There are reasons someone might have made the decision to separate orders from the information about the customer placing those orders, two of the most important ones reasons are,

Orders and accounts are two different kinds of objects and would be easy to organize if kept separate.

This multiple-table structure allows queries to execute more quickly

JOINs will make your queries alittle more complicated . Let's first break down all the parts of a JOIN and to see what they do before you write some of your own.

We will start by an INNER JOIN then we will move into the others later. In order to write a JOIN the first thing we will need is a SELECT and FROM clauses just like any other query.

In this case, we are looking to add account name to each order, so we start with the orders table. The next thing we need is a JOIN clause, you can think of this as a second FROM clause, it identifys the table where the data we want to join lives. In this case account name lives in the account table. Finally we need to specify the relationship between the two tables . This happens by writing a logical statement in the ON clause.

You can see the results contain orders and the accounts that placed them. The tricky part about JOINs is that they are means to do other types of analysis.

Combining data isn’t an end into itself , It’s a tool that allows us to filter or aggregate with an expanded set of information. That means queries involving JOINs can get pretty complicated . In order to get the exact results we are after we need to be careful about exactly how we filter our data.

Congratulations for reading to the end ! You have skilled up

--

--