R Function of the Week: Pipe + Mutate

Statistics Without Borders
Mar 22, 2022

Chaining Functions with the Pipe Character and Mutate

This week at SWB, we are highlighting the underappreciated pipe character and the mutate function. While this combination is well known to intermediate R users, it can be a game changer for those just starting out with the language. The pipe character, represented as “%>%” in R, is a powerful way to chain processes and skip the intermediary steps that often make data processing tedious. It makes code much easier to read, as each transformation flows directly into the next.

Fun fact: The chaining technique comes from function composition in mathematics. For example, if f is applied to the result of g(x), we write f(g(x)): g(x) is evaluated first and its value is then passed to f.
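To make the composition idea concrete, here is a minimal sketch. The functions f and g are made up purely for illustration; the only thing borrowed from the real world is the %>% operator, which dplyr re-exports from the magrittr package.

```r
library(dplyr)  # makes the %>% operator available

# Two toy functions, invented for this illustration only
g <- function(x) x + 1
f <- function(x) x * 2

nested <- f(g(3))            # read inside-out: g runs first, then f
piped  <- 3 %>% g() %>% f()  # read left-to-right: same order of evaluation

identical(nested, piped)     # TRUE, both evaluate to 8
```

The piped version reads in the same order the work is actually done, which is the main readability win.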

The mutate function is used to “mutate” a dataframe by adding new variables (or overwriting existing ones). Called on its own, it expects a dataframe followed by one or more name = definition pairs as arguments. When combined with the pipe character, the dataframe does not need to be specified explicitly, because the pipe supplies it as the first argument. Keeping this in mind, let’s venture into some code.

Both mutate and the pipe character are available through the dplyr package. Let’s say we have a dataframe called df defined as:

## create a dataframe with individual attributes
df <- data.frame(person_name = c('Chandler', 'Joey', 'Monica', 'Pheobe', 'Rachel', 'Ross'),
                 hair_color = c('Brunette', 'Brunette', 'Brunette', 'Blonde', 'Blonde', 'Brunette'),
                 apartment_color = c('White', 'White', 'Yellow', 'Purple', 'Purple', 'Brown'),
                 no_of_jobs = c('2', '1', '4', '2', '3', '1'))

Dataframe “df”

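Here, “no_of_jobs” indicates the number of careers each person has had. Note that it is stored as text (the values are quoted), so it has to be converted with as.numeric() before doing any arithmetic on it.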
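Before moving on, here is a small sketch of the two ways to call mutate: with the dataframe passed explicitly, and with the dataframe supplied by the pipe. It uses a trimmed-down copy of df, and the column name jobs_num is a hypothetical one chosen just for this example.

```r
library(dplyr)

# Trimmed-down copy of the article's dataframe (two columns only)
df <- data.frame(person_name = c('Chandler', 'Joey', 'Monica', 'Pheobe', 'Rachel', 'Ross'),
                 no_of_jobs  = c('2', '1', '4', '2', '3', '1'))

# Style 1: dataframe passed explicitly as the first argument
explicit <- mutate(df, jobs_num = as.numeric(no_of_jobs))

# Style 2: dataframe supplied by the pipe; jobs_num is a hypothetical column name
piped <- df %>% mutate(jobs_num = as.numeric(no_of_jobs))

identical(explicit, piped)  # TRUE: both calls build the same dataframe
```

Either style gives the same result; the piped form simply scales better once more steps are added.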

Now, let’s say we are only interested in individuals with “colorful” apartments and want to create a “colorful apartment” flag, then use this flag to filter out “White” apartments. The traditional way to do this would be using the following steps:

> df$colorful_flag <- ifelse(df$apartment_color == 'White', 0, 1)  ## Create flag and resave
> df <- df[which(df$colorful_flag==1),] ## filter out 0s and resave
> df
  person_name hair_color apartment_color no_of_jobs colorful_flag
3      Monica   Brunette          Yellow          4             1
4      Pheobe     Blonde          Purple          2             1
5      Rachel     Blonde          Purple          3             1
6        Ross   Brunette           Brown          1             1

This works, but when you have to run several of these simple data processing tasks, it can get messy really quickly. This is where %>% and mutate() come in: together they can turn this into a single chained statement, like so:

> library(dplyr)
> df %>% mutate(colorful_flag = ifelse(apartment_color=='White', 0, 1)) %>% filter(colorful_flag==1)
  person_name hair_color apartment_color no_of_jobs colorful_flag
1      Monica   Brunette          Yellow          4             1
2      Pheobe     Blonde          Purple          2             1
3      Rachel     Blonde          Purple          3             1
4        Ross   Brunette           Brown          1             1

Voila, same results in a single statement! The most convenient part of all of this is that my original dataframe “df” remains as is, even though I have been able to slice and dice it. Notice how the code does not include variable reassignment but a chaining of functions instead, which preserves the original value of the dataframe (the base R version above, by contrast, overwrote df at each step).
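If you want to convince yourself that the pipe really leaves df untouched, a quick check along these lines should do it. This is only a sketch; the names before and colorful are hypothetical and exist just to hold a snapshot and the piped result.

```r
library(dplyr)

df <- data.frame(person_name = c('Chandler', 'Joey', 'Monica', 'Pheobe', 'Rachel', 'Ross'),
                 hair_color = c('Brunette', 'Brunette', 'Brunette', 'Blonde', 'Blonde', 'Brunette'),
                 apartment_color = c('White', 'White', 'Yellow', 'Purple', 'Purple', 'Brown'),
                 no_of_jobs = c('2', '1', '4', '2', '3', '1'))

before <- df  # snapshot of the original (hypothetical helper name)

# Same chain as in the article, stored under a new (hypothetical) name
colorful <- df %>%
  mutate(colorful_flag = ifelse(apartment_color == 'White', 0, 1)) %>%
  filter(colorful_flag == 1)

identical(df, before)          # TRUE: df was not modified by the chain
'colorful_flag' %in% names(df) # FALSE: the new column lives only in the result
nrow(colorful)                 # 4 rows survive the filter
```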

Let’s break this down further. The value of “df” is passed to mutate, as its first argument, through the “%>%” character. The mutate function then adds a new column to the dataframe by taking the name of the column and the column definition as inputs:

mutate(colorful_flag = ifelse(apartment_color=='White', 0, 1))

The results of the mutate function are then passed as is to the filter function via the “%>%”:

%>% filter(colorful_flag==1)
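One way to see what the pipe is doing here is to write the same two steps without it. The sketch below compares the piped chain with a nested call and with a version that uses temporary variables; the names piped, nested, step1 and step2 are hypothetical and only serve the comparison.

```r
library(dplyr)

df <- data.frame(person_name = c('Chandler', 'Joey', 'Monica', 'Pheobe', 'Rachel', 'Ross'),
                 hair_color = c('Brunette', 'Brunette', 'Brunette', 'Blonde', 'Blonde', 'Brunette'),
                 apartment_color = c('White', 'White', 'Yellow', 'Purple', 'Purple', 'Brown'),
                 no_of_jobs = c('2', '1', '4', '2', '3', '1'))

# Piped: read left to right
piped <- df %>%
  mutate(colorful_flag = ifelse(apartment_color == 'White', 0, 1)) %>%
  filter(colorful_flag == 1)

# Nested: the same calls, read inside-out
nested <- filter(mutate(df, colorful_flag = ifelse(apartment_color == 'White', 0, 1)),
                 colorful_flag == 1)

# Step by step: the same calls with temporary variables
step1 <- mutate(df, colorful_flag = ifelse(apartment_color == 'White', 0, 1))
step2 <- filter(step1, colorful_flag == 1)

identical(piped, nested)  # TRUE
identical(piped, step2)   # TRUE
```

All three produce the same dataframe; the pipe just avoids both the nesting and the throwaway variables.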

Since filter() is the last function in the chain, the results of this final step are printed in the console. If we wanted to do any further manipulations, we could do so by adding another pipe character. Let’s say we’re interested in grouping by the colorful apartment flag, computing the average number of jobs for people in colorful apartments vs. white apartments, inserting these averages into a column at the end, and keeping only the brunettes. We can do so by chaining these individual calls together via the pipe character:

> df %>%
+   mutate(colorful_flag = ifelse(apartment_color=='White', 0, 1)) %>%
+   group_by(colorful_flag) %>%
+   mutate(avg_jobs = mean(as.numeric(no_of_jobs))) %>%
+   filter(hair_color == 'Brunette')
# A tibble: 4 x 6
# Groups:   colorful_flag [2]
  person_name hair_color apartment_color no_of_jobs colorful_flag avg_jobs
  <chr>       <chr>      <chr>           <chr>              <dbl>    <dbl>
1 Chandler    Brunette   White           2                      0      1.5
2 Joey        Brunette   White           1                      0      1.5
3 Monica      Brunette   Yellow          4                      1      2.5
4 Ross        Brunette   Brown           1                      1      2.5

If you’re still with me, you may have noticed the result is a grouped tibble and not a plain dataframe. This can easily be remedied by adding yet another pipe character followed by the as.data.frame() function, which also drops the grouping.

> df %>%
+   mutate(colorful_flag = ifelse(apartment_color=='White', 0, 1)) %>%
+   group_by(colorful_flag) %>%
+   mutate(avg_jobs = mean(as.numeric(no_of_jobs))) %>%
+   filter(hair_color == 'Brunette') %>%
+   as.data.frame()
  person_name hair_color apartment_color no_of_jobs colorful_flag avg_jobs
1    Chandler   Brunette           White          2             0      1.5
2        Joey   Brunette           White          1             0      1.5
3      Monica   Brunette          Yellow          4             1      2.5
4        Ross   Brunette           Brown          1             1      2.5
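For the curious, the difference between the two results can be inspected with class(). The sketch below stores the grouped result and its converted version under the hypothetical names grouped and plain, then compares their classes.

```r
library(dplyr)

df <- data.frame(person_name = c('Chandler', 'Joey', 'Monica', 'Pheobe', 'Rachel', 'Ross'),
                 hair_color = c('Brunette', 'Brunette', 'Brunette', 'Blonde', 'Blonde', 'Brunette'),
                 apartment_color = c('White', 'White', 'Yellow', 'Purple', 'Purple', 'Brown'),
                 no_of_jobs = c('2', '1', '4', '2', '3', '1'))

# Result of the chain without the final conversion (hypothetical name)
grouped <- df %>%
  mutate(colorful_flag = ifelse(apartment_color == 'White', 0, 1)) %>%
  group_by(colorful_flag) %>%
  mutate(avg_jobs = mean(as.numeric(no_of_jobs))) %>%
  filter(hair_color == 'Brunette')

# Converted version (hypothetical name)
plain <- as.data.frame(grouped)

class(grouped)          # "grouped_df" "tbl_df" "tbl" "data.frame"
class(plain)            # "data.frame"
is.data.frame(grouped)  # TRUE: a tibble is still a kind of dataframe
```

So the tibble was a dataframe all along, just with extra classes layered on top; as.data.frame() strips those extras, including the grouping.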

Lastly, to save the results, all we have to do is add “df <- ” before the “df %>%” call (keeping in mind that this overwrites the original df with the result), like so:

## Save filtered and transformed df
df <- df %>%
  mutate(colorful_flag = ifelse(apartment_color=='White', 0, 1)) %>%
  group_by(colorful_flag) %>%
  mutate(avg_jobs = mean(as.numeric(no_of_jobs))) %>%
  filter(hair_color == 'Brunette') %>%
  as.data.frame()

There you have it: a dataframe of brunettes showing the average number of careers held by people with colorful apartments vs. non-colorful apartments.

Want to learn more about Statistics Without Borders? Check out our website or follow us on Twitter and LinkedIn.

Already a volunteer and interested in contributing to this blog? Reach out to SWB Marketing Communications at statisticswithoutborders@gmail.com.


Statistics Without Borders (SWB) is an apolitical pro bono organization under the auspices of the American Statistical Association.