5 Shortcuts in R You Need To Know (Part 1)

Part 1: magrittr, ifelse(), ggthemes, mapply(), and anonymous functions.

1. Magrittr Operator

R code often ends up as deeply nested function calls, and the stacked parentheses make it hard to read.

Let’s say you have data on people’s hair and eye colors. You want to 1) create a two-way contingency table, 2) express the frequencies as proportions, and 3) round the numbers to one decimal place.

In vanilla R, you need to do the following:

round(prop.table(table(eye_color, hair_color)), 1)

The pipe operator %>% from the magrittr package (also re-exported by dplyr) lets you do the same task by chaining operations in the order they are performed:

table(eye_color, hair_color) %>% prop.table() %>% round(1)

The code just got much more readable. And you’ll feel the difference while coding. No need to match parentheses. Instead of thinking inside out, think sequentially.

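There are nuances when you pipe into a function that takes multiple arguments. By default, the left-hand side becomes the first argument; to send it somewhere else, mark the spot with the dot placeholder (.). Check out the magrittr manual for the details.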
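Here is a minimal sketch of both behaviors. The eye_color and hair_color vectors are made-up sample data, not from any real dataset, and the last line uses the dot placeholder to route the piped value into a named argument.

```r
library(magrittr)  # provides the %>% operator

# Hypothetical sample data, invented for this sketch
eye_color  <- c("brown", "blue", "brown", "green", "blue", "brown")
hair_color <- c("black", "blond", "brown", "red", "blond", "black")

# Default: the left-hand side becomes the FIRST argument,
# so x %>% round(1) is the same as round(x, 1)
piped  <- table(eye_color, hair_color) %>% prop.table() %>% round(1)
nested <- round(prop.table(table(eye_color, hair_color)), 1)

# Dot placeholder: the piped value goes where the dot is,
# so this call is the same as round(pi, digits = 2)
rounded_pi <- 2 %>% round(pi, digits = .)
```

Both piped and nested hold the same table; the only difference is the order in which you read the code.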

2. ifelse( )

Let’s say you want to give free tickets to anyone under 13. You could write a loop:

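free <- character(length(age))  # pre-allocate the result vector
for (i in seq_along(age)) {     # seq_along() is safe even when age is empty
  if (age[i] < 13) {
    free[i] <- "Yes"
  } else {
    free[i] <- "No"
  }
}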

The R-ight way to do it is:

free <- ifelse(age < 13, "Yes", "No")

ifelse() is vectorized: it tests every element of age at once and returns a new vector of the same length, all in one line of code.
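A small sketch of what that looks like on actual values. The age vector is hypothetical, and it includes a missing value to show that ifelse() passes NA through rather than guessing.

```r
# Hypothetical ages, including one missing value
age <- c(5, 12, 13, 40, NA)

# One call tests every element and builds the whole result vector
free <- ifelse(age < 13, "Yes", "No")

print(free)
```

Note that the boundary matters: a 13-year-old is not "under 13", so the third element comes back as "No".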

3. ggthemes

ggplot2 allows you to create nice-looking, publication-quality graphics. But you may need to write many lines of code.

The theme() function in ggplot2 is used to modify the overall look of a graphic. An example might be:

ggplot() + ... +
  theme(legend.position = c(1, 1), legend.justification = c(1, 1))

A package called ggthemes lets you skip that coding and use templates inspired by well-known graphics producers such as The Wall Street Journal and FiveThirtyEight.

Here’s a default ggplot:

ggplot() + geom_point(aes(hp, wt), data = mtcars)

Let’s apply a Wall Street Journal template:

... + theme_wsj()
Figure: Left: a default plot | Middle: using the WSJ template | Right: using the 538 template.

You can try a FiveThirtyEight template:

... + theme_fivethirtyeight()

Here’s a list of available templates.

Note that while this package solves the issue of creating a template, you still have to customize other elements like labels, scale, color, type of chart, etc.
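Putting the pieces together, here is a sketch that assumes both ggplot2 and ggthemes are installed. The plot title is made up for illustration, and the variable names (base_plot, p_wsj, p_538) are just placeholders.

```r
library(ggplot2)
library(ggthemes)  # provides theme_wsj() and theme_fivethirtyeight()

# Build the plot once, then swap templates on top of it
base_plot <- ggplot(mtcars, aes(hp, wt)) + geom_point()

p_wsj <- base_plot + theme_wsj()

# A template only sets the overall look; labels are still up to you
p_538 <- base_plot +
  theme_fivethirtyeight() +
  labs(title = "Weight vs. horsepower")  # hypothetical title
```

Storing the untemplated plot in a variable makes it easy to compare templates side by side without repeating the geom code.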

4. mapply( )

Let’s say you created a function to calculate the power of a hypothesis test under a certain condition. The function is called estimatePower and takes two arguments. You specify a real mean and a sample size.

What you really want to do is to see how powerful the hypothesis test is under multiple real means and multiple sample sizes:

real_means <- c(100, 150, 200, 250, 300, 500)
sample_sizes <- c(10, 30, 60, 100, 200, 500, 2000)

To do so, you want to call the function many times.

You can do the following:

results <- numeric(length(real_means) * length(sample_sizes))  # 6 * 7 = 42 conditions
iter <- 1
for (i in seq_along(real_means)) {
  for (j in seq_along(sample_sizes)) {
    results[iter] <- estimatePower(real_means[i], sample_sizes[j])
    iter <- iter + 1
  }
}

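A more concise way uses mapply. Note that mapply walks its arguments in parallel (first with first, second with second), so build the full grid of combinations with expand.grid first:

grid <- expand.grid(sample_size = sample_sizes, real_mean = real_means)
results <- mapply(estimatePower, grid$real_mean, grid$sample_size)

No need to initialize a vector to store the results and no need to write a nested for loop.

This pattern is handy whenever you need to call a function over many combinations of conditions.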
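One thing worth seeing in action: mapply pairs its arguments element by element, so covering every combination means building the grid of combinations first. The sketch below uses a stand-in estimatePower (a one-sided z-test with a made-up null mean and standard deviation), since the original function is not shown; only its two-argument shape is taken from the text.

```r
# Hypothetical stand-in for estimatePower: power of a one-sided z-test.
# null_mean, sd, and alpha are assumptions made for this sketch.
estimatePower <- function(real_mean, sample_size,
                          null_mean = 100, sd = 200, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha)
  pnorm((real_mean - null_mean) / (sd / sqrt(sample_size)) - z_crit)
}

real_means   <- c(100, 150, 200, 250, 300, 500)
sample_sizes <- c(10, 30, 60, 100, 200, 500, 2000)

# mapply alone pairs elements: two means with two sizes gives 2 results, not 4
paired <- mapply(estimatePower, c(100, 150), c(10, 30))

# expand.grid builds all 6 * 7 = 42 combinations; its first column varies fastest
grid <- expand.grid(sample_size = sample_sizes, real_mean = real_means)
results <- mapply(estimatePower, grid$real_mean, grid$sample_size)

# The nested loop, for comparison (sample size varies fastest here too)
loop_results <- numeric(nrow(grid))
iter <- 1
for (i in seq_along(real_means)) {
  for (j in seq_along(sample_sizes)) {
    loop_results[iter] <- estimatePower(real_means[i], sample_sizes[j])
    iter <- iter + 1
  }
}
```

Putting sample_size first in expand.grid keeps the results in the same order as the nested loop, which makes the two approaches directly comparable.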

5. Anonymous Function

The apply family of functions is one of the best shortcuts in R, allowing you to perform an operation across all rows or columns of data at once. A common use is to compute a summary statistic for each column:

apply(data, 2, mean)

The following is equivalent:

apply(data, 2, FUN = mean)

Here’s the takeaway: you can define any custom function right there, as the value of FUN.

Let’s say I want to know which columns show little-to-no variance, perhaps to select the variables that are useful for a later task. I can do the following:

apply(data, 2, FUN = function(x) {
  ifelse(var(x) < 0.5,
         "Low-variance",
         "Non-low-variance")
})

In the output, every column whose variance is below 0.5 is labeled “Low-variance” and all others are labeled “Non-low-variance”.

Note that the function itself is unnamed and cannot be reused, hence anonymous. In general, anonymous functions are used one-time only in a particular place.
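To make the contrast concrete, here is a sketch that runs the same check twice: once with an anonymous function and once with a named one. The matrix is made-up data, and label_variance is a hypothetical name chosen for this example.

```r
# Hypothetical data: three numeric columns with different spreads
data <- cbind(a = c(1.0, 1.1, 0.9, 1.0),
              b = c(1, 5, 9, 13),
              c = c(2.0, 2.5, 1.5, 2.0))

# Anonymous: defined inline, used once, never given a name
anon_labels <- apply(data, 2, function(x) {
  if (var(x) < 0.5) "Low-variance" else "Non-low-variance"
})

# Named: the same logic, reusable elsewhere
label_variance <- function(x) {
  if (var(x) < 0.5) "Low-variance" else "Non-low-variance"
}
named_labels <- apply(data, 2, label_variance)

print(anon_labels)
```

The two calls return the same labels. The anonymous version simply saves you from adding a name to your workspace for a function you only need once.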

Another example: let’s say you want to produce a scatterplot and relabel the y-axis ticks as “Low” where the value is less than or equal to 3, and as “High” otherwise:

ggplot() +
  geom_point(aes(hp, wt), data = mtcars) +
  theme_wsj() +
  scale_y_continuous(breaks = 0:6,  # wt is numeric, so use a continuous scale
                     labels = function(x) {
                       ifelse(x <= 3, "Low", "High")
                     })
Figure: Changing the y labels using an anonymous function

Here, the anonymous function relabels the breaks along the y axis according to their numerical values.

You might be confused as to what x in the function declaration corresponds to.

In the apply example, it corresponds to each column of data since we specified the second argument (MARGIN) to be 2.

In this example, x corresponds to the vector of breaks, since ggplot2 passes the breaks to the labels function.
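You can check what the labeling function does without drawing anything by calling it on the breaks yourself. This sketch gives the function a name (label_breaks, a hypothetical name) purely so it can be called directly; it assumes the scale hands over the breaks 0 through 6 as a numeric vector.

```r
# The same logic as the anonymous function, named only so we can call it here
label_breaks <- function(x) {
  ifelse(x <= 3, "Low", "High")
}

breaks <- 0:6                        # the breaks requested in the scale
tick_labels <- label_breaks(breaks)  # one label per break

print(tick_labels)
```

Because ifelse() is vectorized, the function returns one label for each break in a single call, which is the shape a labels function needs to return.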

Being able to create a function on-the-fly is useful for customizing many base R functions and extending the features of libraries.

Learn more here.
