Beginner’s Guide to Pivoting Data Frames in R
A step-by-step tutorial on how to convert a wide data frame to a long one
--
Data is often collected column by column: every new data series gets its own column in the data table. The Johns Hopkins COVID-19 dataset, for example, is built this way, with a new column added for every new day.
This results in very wide data frames, which are generally difficult to analyse. The tidyverse collection of R packages (specifically tidyr) provides a neat function for pivoting a data frame from a wide format to a long one. Let’s take a look at a few examples.
Basic Pivot Longer
pivot_longer() makes datasets longer by increasing the number of rows and decreasing the number of columns. To illustrate the most basic use of pivot_longer(), we generate a dummy dataset using the tribble() function.
Income Data Country-wise
This dummy dataset describes the wealth distribution of several countries. Each row corresponds to a single country and contains the country’s name and the percentage of people in each of the five wealth categories.
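The original table is not reproduced in this text, so here is a minimal sketch of what such a dataset might look like. Only the Country column name and the dummy_data_1 object name come from this tutorial; the five category names and every percentage are invented for illustration.

```r
library(tibble)

# Hypothetical data: the category names and all values are made up and do
# not describe real countries. Each row sums to 100 percent.
dummy_data_1 <- tribble(
  ~Country,    ~Poor, ~LowerMiddle, ~Middle, ~UpperMiddle, ~Rich,
  "Country A",    20,           30,      30,           15,     5,
  "Country B",    35,           30,      20,           10,     5,
  "Country C",    10,           20,      40,           20,    10
)

dummy_data_1
```

tribble() lets you write the table row by row, which makes small hand-made datasets like this easy to read.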
This is a wide format, so let’s convert it into a long format. In the long format we will have only three columns:
- Country name
- Income category
- Percentage of people in that category
library(tidyverse)  # provides pivot_longer() and the %>% pipe
income_data <- dummy_data_1 %>%
  pivot_longer(-c(Country), names_to = "income", values_to = "percentage")
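To see the effect on the shape of the data, here is a small self-contained sketch. It uses a reduced, hypothetical input with two countries and three invented categories (rather than five) so that the result stays short.

```r
library(tibble)
library(tidyr)

# Hypothetical input: country names, category names and values are invented.
wide <- tribble(
  ~Country,    ~Poor, ~Middle, ~Rich,
  "Country A",    50,      40,    10,
  "Country B",    30,      50,    20
)

# Every column except Country is stacked into name/value pairs.
long <- pivot_longer(wide, -c(Country),
                     names_to = "income", values_to = "percentage")

dim(wide)  # 2 rows, 4 columns
dim(long)  # 6 rows, 3 columns
long
```

Each country contributes one row per category, so the 2 x 3 block of values becomes 6 rows, and the three category columns collapse into the income and percentage columns.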
Points to be noted
- dummy_data_1 is the input data (created using tribble())
- income_data is the output data frame
- %>% is the pipe operator. It passes the result of the expression on its left as the first argument to the function on its right. This article explains how piping works in R.
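The points above can be checked with a short sketch. It compares the piped call with the equivalent direct call, using a one-row hypothetical input whose category names and values are invented. In both calls, -c(Country) selects every column except Country, names_to names the new column that holds the former column names, and values_to names the new column that holds the cell values.

```r
library(tibble)
library(tidyr)
library(magrittr)  # origin of %>%; also attached by library(tidyverse)

# Hypothetical one-row input; the values are invented.
wide <- tribble(
  ~Country,    ~Poor, ~Rich,
  "Country A",    70,    30
)

# Piped form: `wide` becomes the first argument of pivot_longer().
piped <- wide %>%
  pivot_longer(-c(Country), names_to = "income", values_to = "percentage")

# Direct form: the same call with `wide` passed explicitly.
direct <- pivot_longer(wide, -c(Country),
                       names_to = "income", values_to = "percentage")

# Both forms should produce the same data frame.
identical(piped, direct)
```

The pipe does not change what pivot_longer() does; it only lets the code read left to right, which helps once several steps are chained.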