SQL for Data Analytics-2

Nilay Chauhan
Data Stash
Published in
4 min readJul 18, 2020

What I have noticed, when you search for interview questions for Data Science you will see two things the most of times — Statistics and SQL. So according to me, everyone who is planning to move forward in this field should have good knowledge of SQL. And for those who are planning to go for Data Engineering SQL is the most important prerequisite.

In the previous blog, we saw some basic SQL functions, In this blog, we will see SQL JOINS.

So now, let’s continue our blog series SQL for Data Analytics and see what are SQL JOINs?

Before going into that, we need to understand the concept of Primary Key(PK) and Foreign Key(FK).

Primary Key(PK)

It is a unique column in a particular table, which uniquely identifies each record in a table and a table can have only ONE primary key.

Generally, it is the first column of our table and it must contain UNIQUE values, and cannot contain NULL values.

Foreign Key(FK)

It is a column in one table that is the primary key in another table.

It is used to link two tables together.

(Source: Udacity)

By looking at the above you can see that id from the accounts table is the primary key in the accounts table and primary key in orders and web_events table.

SQL JOINs

Before seeing what is SQL JOINs let’s think about why we need JOINs ??

See when you see big databases all the information are not stored in the single table. There are many tables in which all the information is stored, for example, refer to the image above, all the data is stored in multiple tables and to access data from multiple tables in a single query we need SQL JOINs. (SQL JOINs are not the only way to access data from multiple tables in a single query).

JOINs are useful for allowing us to pull data from multiple tables.

They are both simple and powerful all the same time.

As now we have added JOINs to our toolkit, we will also be adding the ON statement. ON to specify a JOIN condition which is a logical statement to combine the table in FROM and JOIN statement.

SELECT t1.*

FROM table_one as t1

JOIN table_two as t2

ON t1.id = t2.t1_id;

The SELECT statement indicates which columns of data you'd like to see in the result. The FROM statement indicates the first table from where we are pulling data, The JOIN statement indicates the table we would like to join. The ON clause specifies the columns on which user would like to merge the two tables together.

Types of SQL JOINs

There are different types of JOINs in SQL:

Inner JOINs

Inner JOINs always returns rows only if they match across two tables.

SELECT *
FROM t1
INNER JOIN t2
ON t1.column_name = t2.column_name;

LEFT JOIN(or LEFT OUTER JOIN)

LEFT JOINs always returns all the rows from the left side and the matched rows across two tables. There is a NULL result from the right side if there is no match.

SELECT *
FROM t1
LEFT JOIN t2
ON t1.column_name = t2.column_name;

RIGHT JOIN(or RIGHT OUTER JOIN)

RIGHT JOINs always returns all the rows from the RIGHT side and the matched rows across two tables. There is a NULL result from the left side if there is no match.

SELECT *
FROM t1
RIGHT JOIN t2
ON t1.column_name = t2.column_name;

FULL OUTER JOIN

FULL OUTER JOIN returns rows when there is a match in one of the tables.

SELECT *
FROM t1
FULL OUTER JOIN t2
ON t.column_name = t2.column_name
WHERE condition;

This was the second blog of my SQL FOR DATA ANALYTICS series, In the next part, I will be writing about SQL AGGREGATE FUNCTIONS.

THANK YOU!!

BE SAFE!!

If you are also learning Data Science and if you want to write blogs in my publication you can DM me at LinkedIn/ Instagram or you can mail me.

You can connect with me at:

https://www.linkedin.com/in/nilay-chauhan-2756b4153/

https://www.instagram.com/nilay.codes/

--

--