Beginner’s Guide to Using Joins in R

A join merges two tables into a single one.

Rishi Sidhu
The CodeHub

Photo by Bryson Hammer on Unsplash

Data in the real world exists as a collection of distinct pieces of information. More often than not, this information is organised as tables, each collecting data on a different aspect of the ecosystem. For example, a school might organise its data into the following tables:

  1. Teacher information
  2. Student attendance
  3. Student marks
  4. Salary information, etc…

We can clearly see that tables 2 and 3 will have something in common (student IDs for one), just like tables 1 and 4 (teacher IDs for instance).
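
To make that overlap concrete, here is a minimal sketch of tables 2 and 3, assuming both carry a `student_id` column. The table names, column names, and values are made up for illustration and are not the data used in the rest of this guide. Matching rows on the shared column with dplyr's `inner_join()` combines the two tables:

```r
library(dplyr)

# Hypothetical attendance table (table 2); values are invented.
attendance_df <- tribble(
  ~student_id, ~days_present,
  1, 180,
  2, 172,
  3, 165
)

# Hypothetical marks table (table 3); values are invented.
marks_df <- tribble(
  ~student_id, ~marks,
  2, 88,
  3, 74,
  4, 91
)

# Match rows on the shared student_id column and keep only
# the students that appear in both tables.
combined_df <- inner_join(attendance_df, marks_df, by = "student_id")
print(combined_df)
```

Only students 2 and 3 appear in both tables, so the result has two rows, each holding that student's attendance and marks side by side.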

Such overlap of information across tables is quite common and becomes the basis for a join. A join lets you combine information from different tables. In R we have 6 types of joins. Let’s explore each of them with an example. But first let’s look at the data we are going to be working with.

Dummy Data

We will visit the world of Hogwarts and pick up some data.

library(dplyr)
citizen_df <- tribble(
~Person, ~Citizenship,
"Harry", "UK",
"Harry", "USA",
"Ron", "India",
"Ron", "Pakistan",
"Hermoine", "UK",
"Hermoine", "USA",
"Hermoine", "Russia",
"Dumbeldore"…
