The purrr Package: A Conductor’s Baton for the Tidyverse Orchestra in R

Numbers around us
Numbers around us
Published in
11 min readApr 13, 2023

The purrr package is a vital player in the tidyverse, an ecosystem of R packages designed to streamline data analysis tasks. As the Swiss Army knife of functional programming in R, purrr provides a versatile toolkit for working with data structures, especially lists and data frames. By simplifying complex operations, it brings clarity and elegance to your code, enabling you to manipulate, transform, and summarize data with ease. Picture purrr as the conductor of an orchestra, harmonizing the different sections of the tidyverse to create a beautiful symphony of data analysis. In this article, we’ll delve into the intricacies of purrr and discover how it can help you harness the full potential of R in your data analysis journey.

Understanding purrr: The Functional Programming Paradigm

To fully appreciate the purrr package, it’s essential to understand the functional programming paradigm, which serves as the foundation of purrr’s capabilities.

Functional programming is a programming approach that treats computation as the evaluation of mathematical functions while avoiding changing state and mutable data. This style of programming emphasizes the use of pure functions, which are functions that, given the same input, will always produce the same output without any side effects. Picture functional programming like a composer, who brings together various instruments, playing their individual parts in perfect harmony, to create a unified and elegant piece of music.

Functional programming offers several benefits when working with R, particularly for data analysis. Some of these advantages include:

  1. Readability: Functional programming promotes writing clean and modular code, making it easier for others (and yourself) to understand and maintain the code. Think of it as a well-organized musical score, with each section clearly marked and easy to follow.
  2. Reusability: Pure functions can be easily reused across different parts of your code, as they don’t rely on any external state. This reduces the need to write repetitive code and allows you to create a library of versatile functions, much like a conductor reusing musical motifs throughout a symphony.
  3. Ease of debugging: By minimizing the use of mutable data and global state, functional programming reduces the likelihood of unexpected bugs, making the code more predictable and easier to debug. It’s akin to a conductor being able to isolate and resolve any discordant notes within the orchestra.
  4. Parallel processing: The absence of side effects in functional programming allows for more efficient parallel processing, enabling you to harness the full power of modern multi-core processors. It’s like having multiple conductors working in perfect sync, seamlessly leading the orchestra in harmony.

The purrr package is designed to work seamlessly with R’s functional programming capabilities. One of its key strengths lies in its ability to apply functions to elements within data structures, such as lists and data frames. The package offers a range of “map” functions that allow you to elegantly iterate over these structures, transforming and manipulating the data as needed. This powerful feature of purrr serves as the conductor’s baton, guiding the flow of your data analysis and helping you create a harmonious and efficient workflow.

In the following sections, we will explore purrr’s key functions and demonstrate how they can help you streamline your data analysis process in R.

A Closer Look at purrr’s Key Functions

Now that we have a solid understanding of the functional programming paradigm, let’s dive into some of the key functions that the purrr package offers. These functions, like a conductor’s hand gestures, guide the flow of data through various operations, ensuring an efficient and harmonious analysis.

map() and its variants: Turning a caterpillar of code into a butterfly

The map() function is the cornerstone of the purrr package, allowing you to apply a function to each element of a list or vector. This versatile function can simplify your code by replacing cumbersome for loops and lapply() calls with a more concise and expressive syntax. The map() function comes in several variants, each tailored to return a specific type of output, such as map_lgl() for logical, map_chr() for character, and map_dbl() for double values. This flexibility enables you to transform your code into a more elegant and streamlined form, much like a caterpillar metamorphosing into a beautiful butterfly.

pmap(): Mastering multiple inputs like juggling balls

The pmap() function is designed to handle multiple input lists or vectors, iterating over them in parallel and applying a specified function. This powerful function allows you to juggle multiple inputs effortlessly, enabling complex data manipulation and transformation with ease. Like a skilled juggler, pmap() keeps all the input “balls” in the air, ensuring that they’re processed and combined as intended.

keep() and discard(): Handpicking data like sorting apples

When you need to filter data based on specific criteria, purrr’s keep() and discard() functions come to the rescue. keep() retains elements that meet a given condition, while discard() removes elements that meet the condition. These functions let you handpick data elements as if you were sorting apples, keeping the good ones and discarding the bad. With their intuitive syntax and functional programming approach, keep() and discard() make data filtering a breeze.

reduce(): Folding data like origami

The reduce() function in purrr allows you to successively apply a function to elements of a list or vector, effectively “folding” the data like an intricate piece of origami. This function is particularly useful when you need to aggregate data or combine elements in a specific manner. By iteratively applying a specified function, reduce() skillfully folds your data into the desired shape or form.

safely(): Handling errors gracefully like a trapeze artist

In data analysis, errors and unexpected situations can arise. The safely() function in purrr enables you to handle these scenarios with grace and poise, much like a trapeze artist performing a complex routine. safely() takes a function as input and returns a new function that, when applied, captures any errors and returns them as part of the output, rather than halting the execution. This allows you to identify and address errors without disrupting the flow of your analysis.

These key functions, along with many others in the purrr package, provide a powerful toolkit for efficient and harmonious data analysis in R. In the next sections, we’ll explore how to apply these functions to real-life data analysis tasks and demonstrate their practical applications.

Applying purrr to Real-Life Data Analysis Tasks

Now that we’ve explored the key functions of the purrrpackage, let’s examine how they can be applied to real-life data analysis tasks. By integrating purrr into your workflow, you can master the art of data analysis like a skilled conductor, guiding the flow of data through various operations and producing harmonious results.

Data transformation: Cleaning up a messy room

Data transformation is an essential step in the data analysis process, as real-world data can often be messy and unstructured. Using purrr’s map() functions, you can easily apply cleaning and transformation operations to your data, much like tidying up a cluttered room. For example, you might use map_chr() to extract specific information from text strings, or map_dbl() to convert data types within a data frame. By applying these functions iteratively, you can transform and reshape your data into a more structured and usable format.

Data aggregation: Assembling a puzzle

In many cases, you’ll need to aggregate data from multiple sources or perform complex calculations to derive insights. The reduce() function in purrr allows you to combine data elements like puzzle pieces, iteratively applying a function to merge or aggregate data as needed. Whether you’re summing up values, calculating averages, or performing custom aggregations, reduce() can help you assemble the data puzzle and reveal the bigger picture.

Data summarization: Condensing a novel into a short story

Data summarization is the process of distilling large amounts of information into concise, meaningful insights. Using purrr’s functional programming approach, you can create custom summary functions that extract relevant information from your data, much like condensing a novel into a short story. By chaining together map() functions with other tidyverse tools, such as dplyr’s summarize() and mutate() functions, you can generate insightful summaries that highlight the most important aspects of your data.

Iterative operations: Unraveling the threads of data

Many data analysis tasks require performing iterative operations, such as running simulations, fitting models, or processing data in chunks. With purrr’s pmap() function, you can effortlessly juggle multiple inputs and apply functions across them in parallel. This enables you to unravel the threads of data, revealing patterns and relationships that might otherwise remain hidden. Additionally, by combining purrr’s functions with other R tools, such as parallel processing packages or machine learning libraries, you can further enhance the efficiency and power of your iterative operations.

In summary, purrr’s functional programming capabilities enable you to tackle a wide range of data analysis tasks with elegance and efficiency. By integrating purrr into your workflow, you can master the art of data analysis, conducting your data orchestra in perfect harmony.

Case Study: Building Models and Creating Visualizations with purrr and Nested Data

In R we usually have many function vectorized which mean that for example they can be used on column of dataframe without using loop, apply or map. Purrr’s map functions can of course be used to apply vectorized functions, but is too easy. Let me show you something little bit harder and showing more of purrr’s capabilities.

In this case study, we will demonstrate how to use purrr with nested data to build multiple models and create custom visualizations.

Introducing the dataset: A collection of diverse species

Imagine we have a dataset containing measurements of various iris species, including sepal length, sepal width, petal length, and petal width, as well as the species classification. Our goal is to create separate linear regression models for each species to predict petal length based on petal width and visualize the results.

Data preparation: Nesting the data like a matryoshka doll

To begin, we need to split the dataset by species and create a nested data frame. We can use dplyr’s group_by() and tidyr’s nest() functions for this task:

library(tidyverse)
nested_data <- iris %>%
group_by(Species) %>%
nest()

This results in a nested data frame with one row for each species and a nested column containing the data for each group.

Model building: Crafting artisanal models for each species

Now that our data is nested, we can use purrr’s map() function to fit a linear regression model for each species:

model_data <- nested_data %>%
mutate(models = map(data, ~lm(Petal.Length ~ Petal.Width, data = .x)))

This adds a new column to our data frame containing the fitted linear regression models for each species.

Model evaluation: Extracting golden nuggets of insight

To evaluate the performance of our models, we can use purrr to calculate performance metrics, such as the R-squared value, for each model:

model_data <- model_data %>%
mutate(r_squared = map_dbl(models, ~summary(.x)$r.squared))

This adds a new column to our data frame with the R-squared value for each species’ model, allowing us to compare their performance.

Visualization: Painting a vibrant portrait of species-specific trends

Finally, we can use purrr and ggplot2 to create custom visualizations for each species. We will create a scatter plot of petal length versus petal width, along with the fitted regression line for each species:

plot_data <- model_data %>%
mutate(plots = map2(data, models, ~{
ggplot(.x, aes(x = Petal.Width, y = Petal.Length)) +
geom_point() +
geom_abline(aes(intercept = coef(.y)[1], slope = coef(.y)[2]), color = 'blue') +
ggtitle(paste('Species:', unique(Species))) +
theme_minimal()
}))

This adds a new column to our data frame containing a ggplot object for each species. To display the plots, we can use purrr’s walk() function:

walk(plot_data$plots, print)

This will display the three separate scatter plots with the fitted regression lines, allowing us to visually compare the relationships between petal length and petal width for each iris species.

In conclusion, this case study demonstrates how the purrr package can be used to work with nested data, fit multiple models, and create custom visualizations. By leveraging the power of purrr and functional programming, you can efficiently analyze complex datasets and gain valuable insights.

Tips and Best Practices for Using purrr

When incorporating the purrr package into your R programming toolbox, following these tips and best practices can help you make the most of its powerful functional programming capabilities:

  1. Embrace functional programming principles: By familiarizing yourself with the core concepts of functional programming, such as immutability, function purity, and higher-order functions, you’ll be better equipped to leverage purrr’s full potential.
  2. Combine purrr with other tidyverse packages: The purrr package is designed to work seamlessly with other tidyverse packages, such as dplyr, tidyr, and ggplot2. By integrating purrr into your existing tidyverse workflow, you can streamline your data analysis and visualization processes.
  3. Use the appropriate map() variants: The map() function in purrr has several variants tailored for specific output types, such as map_lgl(), map_chr(), and map_dbl(). Make sure to choose the appropriate variant for your use case to simplify your code and improve its readability.
  4. Opt for purrr functions over for loops and *apply() functions: Purrr functions are often more concise, expressive, and efficient than for loops and base R *apply() functions. Whenever possible, use purrr functions to iterate over lists or vectors and apply functions to their elements.
  5. Make use of pmap() for parallel iteration: When working with multiple input lists or vectors, use the pmap() function to iterate over them in parallel. This can simplify your code and make it easier to read and understand.
  6. Utilize safely() and other adverbs for error handling: The safely() function, along with other purrr adverbs like quietly() and possibly(), can help you handle errors and unexpected situations gracefully. Use these functions to create robust, error-resistant code.
  7. Experiment with list columns and nested data frames: Purrr excels at working with complex data structures, such as list columns and nested data frames. By nesting your data and using purrr functions, you can perform sophisticated analysis and manipulation tasks with ease.
  8. Familiarize yourself with purrr’s other functions: Beyond its core functions, purrr offers a wide array of additional tools for working with functions, lists, and other data structures. Explore the purrr documentation to discover functions like reduce(), pluck(), and flatten() that can help you further streamline your data analysis workflow.
  9. Test your code with smaller datasets: When working with large or complex datasets, it’s a good idea to test your purrr functions on smaller subsets of data first. This can help you identify errors and fine-tune your code before scaling up to the full dataset.
  10. Share your knowledge: Finally, as you gain expertise with the purrr package, share your knowledge and insights with others. By discussing your experiences and collaborating with fellow R users, you can contribute to the vibrant R community and help others make the most of purrr’s powerful functional programming capabilities.

In conclusion, the purrr package is a powerful and versatile tool that can significantly enhance your data analysis capabilities in R. By embracing the functional programming paradigm and integrating purrr with other tidyverse packages, you can create elegant, efficient, and expressive code to tackle a wide array of data analysis tasks.

The case studies presented in this article demonstrate how purrr can be used to work with nested data, fit multiple models, and create custom visualizations, among other applications. By following the tips and best practices outlined, you can make the most of purrr’s features and streamline your data analysis workflow.

Ultimately, incorporating the purrr package into your R programming toolbox will empower you to conduct your data orchestra with finesse, producing harmonious results and extracting valuable insights from even the most complex datasets. As you continue to explore and master purrr, remember to share your knowledge and experiences with the broader R community, contributing to the collective understanding and appreciation of this powerful tool.

Special gift at the end.

--

--

Numbers around us
Numbers around us

Self developed analyst. BI Developer, R programmer. Delivers what you need, not what you asked for.