Joins — You Need to Know Them.

You may not like them but they’ll always be around for you.

Peter Hui
Peter Hui
Sep 1 · 4 min read
Image for post
Image for post
Photo by Laura Ockel on Unsplash

here ever you go, data is every where. Look at your car dashboard. It is a dashboard of data. It tells you your speed, mileage and gas in a few simple visuals. Go to a cash register and it tells you how much you have to pay. Go to work and you may be confronted with spreadsheets. Data is everywhere and to appreciate and work with them, you need to understand joins. If you are like me and have worked in this industry, the below may sound familiar to you. :)

Person A is asking if dataset can be joined to B, nothing seems to work. Which join should be used?
Person A is asking if dataset can be joined to B, nothing seems to work. Which join should be used?
Free stock images creative commons

Data comes in various sources, and to combine it into something useful requires clear understanding of the various “Joins” between data sets.

For simplicity sake, joins are can be thought of as a collection of common and different information between data sets.

A “connection” is the common key between the data sets. This connection is required among a vast majority of data sets in order to obtain anything insightful between them. The “Name” column between these two data sets acts as a connection between the two. For there to be a “connection”, there has to be a common column or columns between the two data sets.

I hope the below helps to explain this concept a bit further.

Image for post
Image for post
image by author

I won’t get into the technicalities here, but connecting data sets essentially requires you to understand joins and connections. There are a few things you need to worry about when it comes to connecting data sets — but I’ll write about this in another article.

Let’s get into various join examples.

I have two data sets — one, name and job title of employees in 2007 and another a list of employees in 2020. Each of these have their own rows.

Image for post
Image for post
Image by author

I want to answer to answer 3 questions based on the data sets.

  1. Who are still working in this company and what jobs are they doing?
  2. Who are no longer in the company?
  3. Who are new to the company?

Who are still my working in this company and what jobs are they doing?

Inner join can answer this question. Inner join takes the employees in 2007 and combines it with 2020 using their employee name. The connection here is the “Name” column.

Here is a picture of inner join.

Image for post
Image for post
Image by author

The common rows between them in this case are Peter, Jennifer and Alan.

Peter was a manager in 2007, now he is a CEO. Jennifer was a team lead, now she is a divisional lead. Alan is now a intern in a different sector.

Who are no longer in the company?

This is an example of left anti join. The data set is on the left compared to the right. This is why it is called a “left” anti join. We want to know the employees that do not appear in 2020. Here is an example below.

Image for post
Image for post
Image by author

See those X’s? We can’t find Tom or Lisa in the 2020 list. Looks like Tom and Lisa are no longer in with company.

Who are new to the company?

This is an example of a right anti join. It is a “Right” anti join because the data set we are interested in is on the right hand side. It contains Sarah and Abraham who did not work for the company in 2007. Looks like they are new to the company.

Image for post
Image for post

Simple enough right? ☺

There is also the left join — it is the example of the left anti and inner join combined together.

How about the union/append/binding type joins? they are just data sets stacked one on top of another.

In summary,

Inner Join — Common rows between data sets

Anti Join — Row difference between data sets

Left Join — Common and row difference between data sets together in one view includes only one side

Full Join — Common and row difference between data sets together in one view includes both left and right hand side

Union Join — data sets stacked one on top of another.

TL:DR ? I put this together for you.

Image for post
Image for post
Image by author

In the next article, we will examine some simple ways of doing these joins on Power BI.

Hope you like this article!

The Startup

Medium's largest active publication, followed by +734K people. Follow to join our community.

Peter Hui

Written by

Peter Hui

I love working with data. HR and Psychology background. Microsoft certified data analyst. Power BI and R are my favorite tools. Cleaning and wrangling? YES.

The Startup

Medium's largest active publication, followed by +734K people. Follow to join our community.

Peter Hui

Written by

Peter Hui

I love working with data. HR and Psychology background. Microsoft certified data analyst. Power BI and R are my favorite tools. Cleaning and wrangling? YES.

The Startup

Medium's largest active publication, followed by +734K people. Follow to join our community.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store