A dplyr reference for the time-constrained
A no-nonsense approach to get up and running — or remember the basics.
Ponder that for a while. The subject matter and the code. Mostly you’re seeing what genocide looks like in dry statistics on average life expectancy.
— Jenny Bryan
This is a quick reference for the most-used (by me at least) functions / verbs provided by the dplyr R package from Hadley Wickham. No join functions are included (you can check Jenny Bryan’s excellent cheatsheet here if you’re interested in those).
filter(): subset rows with logical conditions, e.g.:
filter(Country == "Greece"),
filter(speed < 5), etc.
select(): subset specific columns by name, i.e. choose which variables to work with.
mutate(): calculate new variables (columns) or replace existing ones (if the names match), e.g.:
mutate(Power = Voltage * Current),
mutate(speed = x / t).
arrange(): think of it as sorting (combine with desc() for descending order), and note that it can sort by one or more variables at the same time.
rename(): as implied, change the variable names for a data frame / tibble, e.g.:
rename(new_name = oldName).
summarise(): calculate quick-n-dirty statistics, most useful when combined with the next item, e.g.:
summarise(min = min(lifeExp), avg = mean(lifeExp), max = max(lifeExp)) or
summarise(count = n()).
group_by(): create groups in your data by variable. Saves you from exhausting your loop-fu and repeating yourself, e.g.:
mtcars %>% group_by(cyl) %>% summarise(avg = mean(mpg)) will return the average miles per gallon (mpg) for each group defined by the number of cylinders (cyl).
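Each verb can be tried on its own before chaining anything. The following is only a minimal sketch: it assumes the built-in mtcars data frame as the example dataset, and the object names (four_cyl, slim, with_kpl, sorted, renamed, overall) as well as the kpl and horsepower columns are made up for illustration.

```r
library(dplyr)

# filter(): keep only the rows where the condition is TRUE
four_cyl <- filter(mtcars, cyl == 4)

# select(): keep only the named columns
slim <- select(mtcars, mpg, cyl, hp)

# mutate(): add a column computed from existing ones
# (kpl is a hypothetical name; 0.4251 roughly converts miles/gallon to km/litre)
with_kpl <- mutate(mtcars, kpl = mpg * 0.4251)

# arrange(): sort by cyl ascending, then by mpg descending within each cyl
sorted <- arrange(mtcars, cyl, desc(mpg))

# rename(): new name on the left, existing name on the right
renamed <- rename(mtcars, horsepower = hp)

# summarise(): collapse the whole data frame into one row of statistics
overall <- summarise(mtcars,
                     min = min(mpg),
                     avg = mean(mpg),
                     max = max(mpg),
                     count = n())
```

Every verb takes the data frame as its first argument and returns a new data frame, which is what makes them easy to chain with the pipe.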
Tip (& fun fact): in RStudio, you can quickly type the pipe operator %>% by pressing Ctrl+Shift+M (Cmd+Shift+M on a Mac). If M doesn’t make sense, remember that the pipe was introduced with the magrittr package, a reference to Magritte’s painting “The Treachery of Images”, where he famously wrote “Ceci n’est pas une pipe.” That should help you remember it :)
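To see what the pipe actually buys you: x %>% f(y) is evaluated as f(x, y), so a piped chain is just a nested call written in reading order. A small sketch, assuming the built-in mtcars dataset (the object names nested and piped are made up for illustration):

```r
library(dplyr)

# Nested call: has to be read from the inside out
nested <- summarise(group_by(mtcars, cyl), avg = mean(mpg))

# Piped call: the same steps, read top to bottom
piped <- mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(mpg))

# Compare the two results
all.equal(nested, piped)
```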
Work your way through the lines one by one, so you can see in detail the effect of each new verb introduced. Don’t worry about large print-outs: a great characteristic of tibbles is how they print in the console.
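One chain to step through, as a minimal sketch only: it assumes the built-in mtcars dataset rather than any particular data of your own, and the names model, kpl, cylinders, cars and avg_kpl are made up for illustration. To run it line by line, select everything up to (but not including) a trailing %>% and execute the selection.

```r
library(dplyr)

result <- mtcars %>%
  as_tibble(rownames = "model") %>%      # turn the row names into a column
  filter(hp > 100) %>%                   # keep the more powerful cars
  select(model, mpg, cyl, hp) %>%        # keep only the columns we need
  mutate(kpl = mpg * 0.4251) %>%         # hypothetical column: approx. km per litre
  rename(cylinders = cyl) %>%            # new name = old name
  group_by(cylinders) %>%                # one group per cylinder count
  summarise(cars = n(),                  # number of cars in each group
            avg_kpl = mean(kpl)) %>%     # average km per litre in each group
  arrange(desc(avg_kpl))                 # most economical group first

result
```

Because the first step converts the data frame to a tibble, every intermediate result prints only its first rows and the columns that fit on screen.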
That’s it!
The next obvious step is to visualise. As an example, here’s an R script of how I went about it: