Teaching the tidyverse to R novices
This semester I am running my R workshops once again, and as always I start by teaching people the packages of the tidyverse. As part of Endangered Data Week, I am teaching two workshops introducing beginner R programmers to data tidying/manipulation and data visualization.
I’ve taken this approach of using the tidyverse instead of base R for two primary reasons. First, manipulating data with dplyr and tidyr is easy to understand conceptually and often easier than learning the idiosyncrasies of base R. When I show students two lines of code that achieve the same thing in base R and dplyr, I always get the same answer: the dplyr way is much easier to read and understand.
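As a minimal sketch of that comparison, here is the same filter-and-select step written both ways. The data frame, its town names, and its values are invented for illustration and are not the workshop data.

```r
library(dplyr)

# Hypothetical population data, invented for illustration
population <- data.frame(
  town  = c("Town A", "Town B", "Town C", "Town D"),
  state = c("NY", "ID", "NJ", "DE"),
  pop   = c(98000, 229000, 71000, 38000)
)

# Base R: keep rows where pop exceeds 50,000, then keep two columns
base_result <- population[population$pop > 50000, c("town", "pop")]

# dplyr: the same operation written as a chain of verbs
dplyr_result <- population %>%
  filter(pop > 50000) %>%
  select(town, pop)
```

Both versions return the same rows and columns; the difference is in how much bracket and $ syntax a newcomer has to parse before the intent is visible.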
I’m not alone in my approach here: David Robinson has made the same case in regard to ggplot2. My rationale largely follows his: teaching students the basics of the tidyverse means they can be up and running with a powerful set of tools quickly. In the case of Endangered Data Week, that means introducing students to messy government data, tidying that data, working with data to produce new data, and drawing conclusions. I’m able to teach these concepts relatively quickly thanks to the power behind dplyr and tidyr. I don’t need to worry about teaching the syntax around [[]] or $ or c(). If students need base R techniques or have questions, they can always get in touch with me for more pointers.
For the data manipulation exercises in our workshop, we work through an RMarkdown worksheet together during the session. I provide students with some population data I compiled for a project I worked on last year, and we work through most of the functions available in dplyr and tidyr. If we don’t get through it all, that’s fine; they have the worksheet to complete on their own time. (I make teaching these workshops a little easier for myself by also installing RStudio Server and the necessary packages on DigitalOcean so we can be up and running quickly.)
Second, students can be up and running with a good amount of knowledge about R, data manipulation, and visualization in a relatively short amount of time. After an hour and a half together, even students who haven’t programmed previously are learning to work with the language. The grammar of data tidying allows these concepts to be grasped quickly since each step builds upon the previous one. Chaining together a series of tidyverse functions allows the students to see the steps necessary to reshape, clean, and explore a dataset. And those skills can be applied to any dataset, meaning students can take what they learn and use it in other projects or classes. Likewise, I prize tidyverse methods for their consistency. I’ve seen some wild ways people have accessed or manipulated columns in a data frame (just spend some time on Stack Overflow), but anytime I read someone’s tidyverse example the process clicks faster. That consistency, again, makes using the language, finding answers, and learning that much easier.
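To illustrate what such a chain can look like, here is a small sketch that reshapes, cleans, and summarizes a wide table. The census_wide data frame and its column names are hypothetical stand-ins, not the workshop's population data, and pivot_longer() assumes tidyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format data: one column per census year, invented values
census_wide <- data.frame(
  place    = c("Town A", "Town B", "Town C"),
  pop_1900 = c(1200, 850, NA),
  pop_1910 = c(1500, 900, 400)
)

census_summary <- census_wide %>%
  # Reshape: one row per place-year instead of one column per year
  pivot_longer(
    cols = starts_with("pop_"),
    names_to = "year",
    names_prefix = "pop_",
    values_to = "population"
  ) %>%
  # Clean: store the year as an integer and drop missing counts
  mutate(year = as.integer(year)) %>%
  filter(!is.na(population)) %>%
  # Explore: total population and number of places reported per year
  group_by(year) %>%
  summarise(total = sum(population), places = n())
```

Each line of the pipeline maps onto one step a student can say out loud: reshape, clean, group, summarize.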
This isn’t to say I don’t teach any base R. Even in the above workshops, students still learn about sum(), logical operators, and other base methods, alongside dplyr verbs such as slice(). Pairing some of the base R methods with the tidyverse makes for a powerful set of tools that can have students manipulating and visualizing data quickly.
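A short sketch of that pairing, using an invented towns data frame: base R's logical operators and sum() do their work inside dplyr verbs, while slice(), which comes from dplyr itself, picks rows by position.

```r
library(dplyr)

# Hypothetical data, invented for illustration
towns <- data.frame(
  town       = c("Town A", "Town B", "Town C", "Town D"),
  region     = c("north", "south", "north", "south"),
  population = c(1200, 850, 400, 2300)
)

# Base R logical operators (==, |, >) inside dplyr's filter()
north_or_large <- towns %>%
  filter(region == "north" | population > 2000)

# Base R sum() inside dplyr's summarise()
total_pop <- towns %>%
  summarise(total = sum(population))

# dplyr's slice() selects rows by position
first_two <- towns %>%
  slice(1:2)
```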
This approach of teaching the tidyverse with an interactive worksheet has worked well: students are up and running with R and applying new skills quickly. My goal is to help people to work with data, and the tidyverse provides a powerful way to get started.