Beginner’s Guide to Using Joins in R

A join merges two tables into a single one.

Rishi Sidhu
The CodeHub
6 min read · Jun 2, 2020

Data in the real world exists as a collection of distinct pieces of information. More often than not, this information is organised as tables, with each table collecting data on a different aspect of the ecosystem. For example, a school might organise its data into the following tables:

  1. Teacher information
  2. Student attendance
  3. Student marks
  4. Salary information, etc…

We can clearly see that tables 2 and 3 will have something in common (student IDs for one), just like tables 1 and 4 (teacher IDs for instance).

Such overlap of information across tables is quite common and becomes the basis for a join. A join lets you combine information from different tables by matching rows on the columns they share. In R, using the dplyr package, we will look at 6 types of joins. Let’s explore each of them with an example. But first let’s look at the data we are going to be working with.
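Before getting to that data, here is a minimal sketch of the idea using the school example from above. The two tables, their column names (`student_id`, `days_present`, `marks`) and all values are made up for illustration. `inner_join()` is used only as one representative join from dplyr.

```r
library(dplyr)

# Hypothetical school tables; names and values are assumptions for illustration.
attendance_df <- tribble(
  ~student_id, ~days_present,
  1,           180,
  2,           172,
  3,           165
)

marks_df <- tribble(
  ~student_id, ~marks,
  2,           88,
  3,           74,
  4,           91
)

# The column the two tables have in common acts as the join key.
shared_cols <- intersect(names(attendance_df), names(marks_df))

# inner_join() keeps only the students that appear in both tables.
joined_df <- inner_join(attendance_df, marks_df, by = "student_id")
print(joined_df)
```

In this sketch students 1 and 4 drop out of the result because each appears in only one of the two tables. Other join types differ mainly in how they treat such unmatched rows.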

Dummy Data

We will visit the world of Hogwarts and pick up some data.

library(dplyr)
citizen_df <- tribble(
~Person, ~Citizenship,
"Harry", "UK",
"Harry", "USA",
"Ron", "India",
"Ron", "Pakistan",
"Hermoine", "UK",
"Hermoine", "USA",
"Hermoine", "Russia",
"Dumbeldore"…
