Splitting Text in R

Trevor French
2 min read · Oct 18, 2022

If you’ve worked in a spreadsheet application before, you’re likely familiar with the “text-to-columns” tool, which splits one column of data into multiple columns based on a delimiter. R offers the same functionality through functions such as “separate” from the “tidyr” package.

To test this function out, let’s first load the “tidyr” package and then create a test data frame to work with.

library(tidyr)
df <- data.frame(person = c("John_Doe", "Jane_Doe"))

We now have a data frame with a single column, “person”, in which each value is a first name and a last name joined by an underscore. Let’s now split the two names into their own separate columns.

df <- df %>% separate(person, c("first_name", "last_name"), "_")

Let’s break down what just happened. By typing “df <-”, we assigned the output of the expression that follows back to “df”, overwriting the original data frame. By typing “df %>%”, we used the pipe operator to pass the existing data frame “df” to the separate function as its first argument.
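To make the role of the pipe concrete, here is a minimal sketch that compares the piped call with an ordinary function call. The variable names “piped” and “direct” are made up for this illustration and are not part of the original example.

```r
library(tidyr)

df <- data.frame(person = c("John_Doe", "Jane_Doe"))

# Piped form: df is handed to separate() as its first argument
piped <- df %>% separate(person, c("first_name", "last_name"), "_")

# Direct form: the same call with df written out as the first argument
direct <- separate(df, person, c("first_name", "last_name"), "_")

# Both forms should produce the same data frame
identical(piped, direct)

print(piped)
#   first_name last_name
# 1       John       Doe
# 2       Jane       Doe
```

Neither form changes “df” on its own. The split columns only replace the original because the result is assigned back with “df <-”.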

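We then gave the separate function three more arguments. The first was the column we wanted to split, “person”. The second was a character vector with the names of our two new columns, “first_name” and “last_name”. Finally, the third was our desired delimiter, “_”. Note that separate treats a text delimiter as a regular expression, so characters with a special meaning in regular expressions, such as “.”, need to be escaped.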
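The same call can also be written with the arguments named, which may make it easier to see which value plays which role. This is a small sketch using the argument names “col”, “into”, and “sep” from the separate function; the result name “split_df” is made up for this illustration.

```r
library(tidyr)

df <- data.frame(person = c("John_Doe", "Jane_Doe"))

split_df <- separate(
  df,
  col  = person,                        # column to split
  into = c("first_name", "last_name"),  # names of the new columns
  sep  = "_"                            # delimiter to split on
)

# The original "person" column is replaced by the two new columns
names(split_df)
```

Naming the arguments is optional here, but it guards against accidentally passing them in the wrong order.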

Trevor French

I am an Analytics Manager in the Crypto industry with an M.S. in Data Analytics and a B.S. in Business Analytics. I talk about R, Python, and Data Science.