Beginner’s Guide to Pivoting Data Frames in R

A step-by-step tutorial on how to convert a wide data frame to a long one

Rishi Sidhu
Published in The CodeHub
8 min read · Apr 21, 2020

Photo by Jeff Sheldon on Unsplash

Data is often collected column by column: every new data series gets its own column in the table. The Johns Hopkins COVID-19 dataset, for example, is built this way, with a new column added for every new day.

1 column for every day of data.

This results in very wide data frames, which are generally difficult to analyse. The tidyr package, part of R’s tidyverse, provides a neat function, pivot_longer(), to pivot a data frame from a wide format to a long one. Let’s take a look at a few examples.

Basic Pivot Longer

pivot_longer() makes datasets longer by increasing the number of rows and decreasing the number of columns. To illustrate the most basic use of pivot_longer(), we generate a dummy dataset using the tribble() function.

Income Data Country-wise

This dummy dataset describes wealth distribution by country. Each row corresponds to a single country and contains the country’s name and the percentage of people in each of the five wealth categories.

dummy_data_1
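A minimal sketch of how such a dataset could be built with tribble(). The country names, the five category labels, and the percentages are invented placeholders, not the values from the original table; only the Country column name is taken from the pivot code in this article.

```r
library(tibble)  # provides tribble()

# Hypothetical stand-in for dummy_data_1: one row per country, one column
# per wealth category. All labels and numbers are made up for illustration.
dummy_data_1 <- tribble(
  ~Country,    ~very_low, ~low, ~middle, ~high, ~very_high,
  "Country A",        20,   30,      25,    15,         10,
  "Country B",        10,   20,      40,    20,         10,
  "Country C",        35,   30,      20,    10,          5
)

dim(dummy_data_1)  # 3 rows, 6 columns: the wide format
```

tribble() lets you type the table row by row, which makes small examples like this easy to read.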

This is the wide format. Let’s convert it into a long format, which will have only three columns:

  1. Country name
  2. Income category
  3. Percentage of people in that category
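
library(tidyverse)  # loads tidyr (pivot_longer) and the %>% pipe
# Pivot every column except Country into name/value pairs
income_data <- dummy_data_1 %>%
  pivot_longer(-Country, names_to = "income", values_to = "percentage")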

Points to be noted

  • dummy_data_1 is the input data frame (created with the tribble() function)
  • income_data is the output data frame
  • %>% is the pipe operator. It passes the value on its left as the first argument to the function on its right, so x %>% f(y) is equivalent to f(x, y). This article explains how piping works in R.
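To make the points above concrete, here is a small sketch that runs the same pivot with and without the pipe and compares the shapes before and after. The table `wide` and its values are hypothetical and chosen only to keep the example short.

```r
library(magrittr)  # %>%
library(tibble)    # tribble()
library(tidyr)     # pivot_longer()

# Hypothetical two-country table; names and values are invented.
wide <- tribble(
  ~Country,    ~low, ~middle, ~high,
  "Country A",   30,      50,    20,
  "Country B",   45,      40,    15
)

# Piped form, as used in this article
piped <- wide %>%
  pivot_longer(-Country, names_to = "income", values_to = "percentage")

# Direct call: the pipe simply supplies `wide` as the first argument
direct <- pivot_longer(wide, -Country, names_to = "income", values_to = "percentage")

identical(piped, direct)  # TRUE

dim(wide)   # 2 rows, 4 columns
dim(piped)  # 6 rows, 3 columns: Country, income, percentage
```

Each of the three category columns becomes one row per country, so the row count triples while the column count drops to three.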
