A dplyr reference for the time-constrained

A no-nonsense approach to get up and running — or remember the basics.

Ponder that for a while. The subject matter and the code. Mostly you’re seeing what genocide looks like in dry statistics on average life expectancy.
 — Jenny Bryan

Scope

This is a quick reference for the most-used (by me at least) functions / verbs provided by the dplyr R package from Hadley Wickham. No join functions are included (you can check Jenny Bryan’s excellent cheatsheet here if you’re interested in those).


Quick summary

  • filter(): subset with row logic, e.g.: filter(Country == "Greece"), filter(speed < 5) etc.
  • select(): subset specific columns by name i.e. choose which variables to work with, e.g.: select(year), select(Voltage, Current).
  • mutate(): calculate new variables (columns), e.g.: mutate(Power = Voltage * Current), mutate(speed = x / t).
  • arrange(): think of it as sorting (combine with desc() for descending order) and note that it can be done with 1+ variables at the same time, e.g.: arrange(year, continent), arrange(desc(lifeExp)).
  • rename(): as implied, change the variable names for a data frame / tibble, e.g.: rename(new_name = oldName).
  • summarise(): calculate quick-n-dirty statistics, most useful when combined with the next item. E.g.: summarise(min = min(lifeExp), avg = mean(lifeExp), max = max(lifeExp)).
  • group_by(): create groups in your data by variable. Saves you from exhausting your loop-fu and repeating yourself, e.g.: mtcars %>% group_by(cyl) %>% summarise(avg = mean(mpg)) will return the average miles per gallon (mpg) for each group defined by the number of cylinders (cyl).
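To make the list above concrete, here is a minimal sketch that runs each verb once on the built-in mtcars data set. The object names (cars, four_cyl, hp_per_wt and so on) are made up for illustration, and hp / wt is just an arbitrary ratio to show mutate() at work.

```r
library(dplyr)

# mtcars ships with R; turn it into a tibble and keep the car names as a column
cars <- as_tibble(mtcars, rownames = "model")

# filter(): keep only the rows that satisfy a condition
four_cyl <- cars %>% filter(cyl == 4)

# select(): keep only the named columns
small <- cars %>% select(model, mpg, cyl)

# mutate(): add a column computed from existing ones (illustrative ratio)
with_ratio <- cars %>% mutate(hp_per_wt = hp / wt)

# arrange(): sort rows, here by descending mpg
sorted <- cars %>% arrange(desc(mpg))

# rename(): the syntax is new_name = old_name
renamed <- cars %>% rename(cylinders = cyl)

# summarise(): collapse the whole table to one row of statistics
stats <- cars %>% summarise(min = min(mpg), avg = mean(mpg), max = max(mpg))

# group_by() + summarise(): one row of statistics per group
by_cyl <- cars %>% group_by(cyl) %>% summarise(avg = mean(mpg))
```

Print any of the results (e.g. by_cyl) to see what each verb returned.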

Tip (& fun fact): in RStudio, you can quickly type the pipe operator %>% by pressing Ctrl+Shift+M (or Cmd+Shift+M on a Mac). If M doesn’t make sense, remember that the pipe was introduced with the magrittr package, an excellent reference to Magritte’s painting “The treachery of images”, where he famously wrote “Ceci n’est pas une pipe.” That should help you remember it :)
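If the pipe itself is new to you: x %>% f(y) is simply another way of writing f(x, y). A small sketch on mtcars, writing the same computation both ways (the names nested and piped are just for illustration):

```r
library(dplyr)

# Nested calls: you have to read them inside-out
nested <- summarise(group_by(mtcars, cyl), avg = mean(mpg))

# Piped calls: the same steps, read top to bottom
piped <- mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(mpg))

# Compare the two results
all.equal(nested, piped)
```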


Combined Example

Work your way through the lines one by one, so you can see in detail the effect of each new verb introduced. Don’t worry about large print-outs: a great characteristic of tibbles is that they print compactly in the console, showing only the first few rows and the columns that fit on screen.
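The original code for this section is not reproduced here, so what follows is a stand-in sketch rather than the author's example. It chains the verbs from the summary on the built-in mtcars data; the threshold hp > 100 and the names weight, hp_per_wt and result are arbitrary choices made for illustration.

```r
library(dplyr)

result <- mtcars %>%
  as_tibble(rownames = "model") %>%      # keep the car names as a column
  filter(hp > 100) %>%                   # rows: drop the low-power cars
  select(model, mpg, cyl, hp, wt) %>%    # columns: keep only what we need
  rename(weight = wt) %>%                # new_name = old_name
  mutate(hp_per_wt = hp / weight) %>%    # new variable (illustrative ratio)
  group_by(cyl) %>%                      # one group per cylinder count
  summarise(
    n         = n(),                     # number of cars in the group
    avg_mpg   = mean(mpg),
    avg_ratio = mean(hp_per_wt)
  ) %>%
  arrange(desc(avg_mpg))                 # most economical group first

result
```

To step through it, run the pipeline up to the end of any line (leaving off the trailing %>%) and look at the intermediate tibble.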

That’s it!

The next obvious step is to visualize. As an example, here’s an R script of how I went about it:


Credits

Based on Jenny Bryan’s original tutorial. Essentially this is the tl;dr version, compiled from my notes as I read the original.