R Function Of The Week: apply() and sapply()

Apr 13, 2022

Author: Neha Anwer, Statistics Without Borders

This Article Contains

A brief overview of the apply() concept and general use case
apply() vs sapply() with examples

This week at Statistics Without Borders, we go over the apply() concept and discuss when it is most helpful to use. While programming, you may come across situations that require you to iterate over a set of values and apply some kind of function or calculation to each value. Traditionally, this is accomplished using a loop construct.

Overview of `apply()`

The apply() function can be used in lieu of explicit loops and often makes your code shorter and easier to read. apply() takes matrix-like structures as input and returns its results as a vector, array, or list. Because it handles the iteration over rows or columns for you and collects the results into a single output object, using this function eliminates the need to write lengthy loops.
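To make the loop-versus-apply() comparison concrete before turning to real data, here is a minimal sketch on a small matrix. The matrix `m` and the variable names are hypothetical and exist only for illustration.

```r
# Hypothetical 3 x 2 matrix, used only for illustration
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)

# Loop version: compute the sum of each row
row_sums_loop <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  row_sums_loop[i] <- sum(m[i, ])
}

# apply() version: the second argument, 1, means "once per row"
row_sums_apply <- apply(m, 1, sum)

# Both approaches give the same values
all.equal(row_sums_loop, row_sums_apply)
```

The loop needs a pre-allocated result and an index variable, while apply() expresses the same idea in a single call.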

In the example code below, we will perform some simple operations on a built-in R dataset called Orange. This dataset records growth of orange trees.

First, we will load the data set into our environment using the data command.

## Load in data
data('Orange')
head(Orange)

Output

#   Tree  age circumference
# 1    1  118            30
# 2    1  484            58
# 3    1  664            87
# 4    1 1004           115
# 5    1 1231           120
# 6    1 1372           142

Next, we will engineer a new feature. This new feature will calculate each tree’s circumference value divided by the largest circumference value in the data.
We will first perform this calculation using a for loop and then compute the same values using apply().

Let’s write a for loop to express each circumference value as a percentage of the largest value in the circumference column:

max_circ <- max(Orange$circumference) # maximum circumference
pct_circ <- list()  ## create an empty list to store values

for(i in 1:nrow(Orange)) { # loop over each row
  
  # Divide each value by the max and multiply by 100
  pct_circ[i] <- (Orange$circumference[i] / max_circ) * 100
  
} # End for loop

pct_circ

Result: screenshot of the for loop output in the R console.

As you can see, we were able to iterate over each row and express each circumference as a percentage of the maximum. However, we can do this in one line using the apply() function. The general syntax of the function is as follows:

apply(data, Margin, Function)

The first argument expects an array or matrix (a data frame is converted to a matrix first). The second argument, the margin, can be 1, 2, or a vector of such indices. It tells apply() whether the function should be applied to each row (1), to each column (2), or to each individual cell (c(1, 2)). The last argument is the function that is going to be applied. In R's documentation these three arguments are named X, MARGIN, and FUN. In our case, since there is no built-in function that expresses a given number as a percentage of another number, we will write our own function:

pct_func <- function(x) {
  result <- x / max_circ   # x is a vector/matrix
  result <- result * 100 # convert from decimal to %
  return(result)
}
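Before putting pct_func to work, the margin argument described above can be illustrated on a small matrix. This is only a sketch: the matrix `m` and the result names are made up for the example.

```r
# Hypothetical 2 x 3 matrix: rows are (1, 3, 5) and (2, 4, 6)
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# Margin 1: the function sees one row at a time
row_max <- apply(m, 1, max)

# Margin 2: the function sees one column at a time
col_max <- apply(m, 2, max)

# Margin c(1, 2): the function sees one cell at a time,
# so the result keeps the shape of the input
doubled <- apply(m, c(1, 2), function(v) v * 2)

row_max
col_max
doubled
```

With margin 1 the result has one value per row, with margin 2 one value per column, and with c(1, 2) one value per cell.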

Now that we have our function ready, we just need to specify the correct inputs to the apply() function. Remember that it expects data to be a matrix or something that can be converted to one, such as a data frame. We will supply the circumference column (column 3) as our input. When indexing column 3 of the Orange data frame, we have to pass the additional argument drop=F (short for drop = FALSE). Otherwise R would simplify the single column to a plain vector, which has no dimensions for apply() to iterate over.

apply(Orange[,3,drop=F], 1, pct_func)

Output

# output from console
# 1         2         3         4         5         6         7         8         9        10 
# 14.01869  27.10280  40.65421  53.73832  56.07477  66.35514  67.75701  15.42056  32.24299  51.86916 
# 11        12        13        14        15        16        17        18        19        20 
# 72.89720  80.37383  94.85981  94.85981  14.01869  23.83178  35.04673  50.46729  53.73832  64.95327 
# 21        22        23        24        25        26        27        28        29        30 
# 65.42056  14.95327  28.97196  52.33645  78.03738  83.64486  97.66355 100.00000  14.01869  22.89720 
# 31        32        33        34        35 
# 37.85047  58.41121  66.35514  81.30841  82.71028
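A short note on the drop argument used in the call above. The sketch below compares the two ways of selecting column 3; the variable names `col_vec`, `col_df`, and `attempt` are introduced here purely for illustration.

```r
data("Orange")

# Without drop = FALSE, a single column collapses to a plain vector
col_vec <- Orange[, 3]
dim(col_vec)   # NULL, so there are no rows or columns to iterate over

# With drop = FALSE, the column keeps its two-dimensional shape
col_df <- Orange[, 3, drop = FALSE]
dim(col_df)    # 35 rows, 1 column

# Calling apply() on the plain vector raises an error, captured here with try()
attempt <- try(apply(col_vec, 1, function(x) x), silent = TRUE)
inherits(attempt, "try-error")
```

Spelling out FALSE rather than F is generally the safer habit, since F is an ordinary variable that can be reassigned.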

As you can see, we were able to achieve the same results in one line. Using apply(), we can quickly apply any function to a set of data. In this case we wrote a named, user-defined function, but apply() also accepts an anonymous function (commonly referred to as a “lambda expression”), which lets you define the function inline in the apply() call itself. While we will not be discussing lambda expressions in depth in this article, this blog post does a good job of walking through how they work in both R and Python.
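For readers who want a quick taste, here is a minimal sketch of the same calculation with the function written inline. The result name `pct_anon` is made up for this example, and the block recomputes max_circ so that it runs on its own.

```r
data("Orange")
max_circ <- max(Orange$circumference)  # maximum circumference

# Same calculation as pct_func, but defined inline as an anonymous function
pct_anon <- apply(Orange[, 3, drop = FALSE], 1, function(x) x / max_circ * 100)

head(pct_anon)
```

This is convenient for one-off calculations; a named function such as pct_func is easier to reuse and test.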

apply() vs. sapply()

Building off of the apply() function, sapply() takes in more flexible input types and is ideal for vector operations. sapply() takes a list, vector, or data frame as input, applies the function to each element, and simplifies the result to a vector or matrix where possible. For a function that returns a single value, the output is a vector of the same length as the input. The general syntax of the sapply() function is:

sapply(data, function)

Notice that there is no margin argument. This is because sapply() always applies the function to each element of the data (each value of a vector, each item of a list, or each column of a data frame), so there is nothing for a margin to select.
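As a small sketch of this element-by-element behavior, consider the two calls below. The inputs and the names `squares` and `lens` are hypothetical and chosen only to keep the example short.

```r
# Each value of a vector is passed to the function in turn
squares <- sapply(c(1, 2, 3), function(x) x^2)

# Each item of a list is passed to the function in turn;
# the names of the list carry over to the result
lens <- sapply(list(a = 1:3, b = letters), length)

squares
lens
```

In both cases the function returns one value per element, so sapply() simplifies the result to a plain vector.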

Following the same example as above, let’s compute circumference as a percentage of the largest circumference in our data set using sapply(). Note that since the input does not have to be a matrix with usable dimensions, we can just supply the “circumference” column by name and will get a numeric vector of the same length as output.

sapply(Orange$circumference, pct_func)

# Output:
# [1]  14.01869  27.10280  40.65421  53.73832  56.07477  66.35514  67.75701  15.42056  32.24299
# [10]  51.86916  72.89720  80.37383  94.85981  94.85981  14.01869  23.83178  35.04673  50.46729
# [19]  53.73832  64.95327  65.42056  14.95327  28.97196  52.33645  78.03738  83.64486  97.66355
# [28] 100.00000  14.01869  22.89720  37.85047  58.41121  66.35514  81.30841  82.71028
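The shape of this result can be checked directly. The sketch below redefines max_circ and pct_func so that it runs on its own; `pct_sapply` and `pct_direct` are names introduced only for this example.

```r
data("Orange")
max_circ <- max(Orange$circumference)  # maximum circumference

pct_func <- function(x) {
  result <- x / max_circ   # x is a vector/matrix
  result <- result * 100   # convert from decimal to %
  return(result)
}

pct_sapply <- sapply(Orange$circumference, pct_func)

is.numeric(pct_sapply)              # a numeric vector, not a list
length(pct_sapply) == nrow(Orange)  # one value per input element

# Because division and multiplication in R are vectorized,
# pct_func can also be called on the whole column at once
pct_direct <- pct_func(Orange$circumference)
all.equal(pct_sapply, pct_direct)
```

For simple arithmetic like this, the direct call is the shortest option; sapply() becomes more useful when the function only works on one element at a time.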

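If we wanted to, we could apply pct_func to the age column in our dataset as well. Note that pct_func divides by max_circ, so the age values are also scaled by the largest circumference rather than by the largest age, which is why most of them exceed 100: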

sapply(Orange[,c('circumference', 'age')], pct_func)

# Output:
# circumference       age
# [1,]      14.01869  55.14019
# [2,]      27.10280 226.16822
# [3,]      40.65421 310.28037
# [4,]      53.73832 469.15888
# [5,]      56.07477 575.23364
# [6,]      66.35514 641.12150
# [7,]      67.75701 739.25234
# [8,]      15.42056  55.14019
# [9,]      32.24299 226.16822
# [10,]      51.86916 310.28037
# [11,]      72.89720 469.15888
# [12,]      80.37383 575.23364
# [13,]      94.85981 641.12150
# [14,]      94.85981 739.25234
# [15,]      14.01869  55.14019
# [16,]      23.83178 226.16822
# [17,]      35.04673 310.28037
# [18,]      50.46729 469.15888
# [19,]      53.73832 575.23364
# [20,]      64.95327 641.12150
# [21,]      65.42056 739.25234
# [22,]      14.95327  55.14019
# [23,]      28.97196 226.16822
# [24,]      52.33645 310.28037
# [25,]      78.03738 469.15888
# [26,]      83.64486 575.23364
# [27,]      97.66355 641.12150
# [28,]     100.00000 739.25234
# [29,]      14.01869  55.14019
# [30,]      22.89720 226.16822
# [31,]      37.85047 310.28037
# [32,]      58.41121 469.15888
# [33,]      66.35514 575.23364
# [34,]      81.30841 641.12150
# [35,]      82.71028 739.25234

As a result, we get a matrix with one row for each row of our input data frame and one column for each column we passed in. Super convenient! I tend to use this function daily to perform data manipulations. sapply() and apply() are two of the functions in the larger apply family of functions. I find that these tend to cover most of the use cases I come across on a day-to-day basis, but if you are interested in learning more, this post covers the entire apply family.
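Returning to the two-column example, one possible variation is to scale each column by its own maximum instead of by the largest circumference. This is a sketch under that assumption, not part of the original walkthrough; `pct_of_col_max` and `pct_mat` are hypothetical names.

```r
data("Orange")

# Hypothetical helper: express each value as a percentage of its own column's maximum
pct_of_col_max <- function(col) {
  col / max(col) * 100
}

# sapply() passes each selected column to the helper in turn
pct_mat <- sapply(Orange[, c("circumference", "age")], pct_of_col_max)

is.matrix(pct_mat)       # the per-column results are combined into a matrix
dim(pct_mat)             # one row per input row, one column per input column
apply(pct_mat, 2, max)   # each column now tops out at 100
```

Here sapply() and apply() work together: sapply() builds the matrix column by column, and apply() with margin 2 summarizes it column by column.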

Want to learn more about Statistics Without Borders? Follow us on Twitter and LinkedIn, and check out our website.
Want to volunteer on projects or contribute to this blog? Send us an email at statisticswithoutborders@gmail.com.

Meet the Author

Neha Anwer

Neha is a data science professional with domain experience in the financial advisory field. She has been a volunteer at Statistics Without Borders for over a year and most recently joined the SWB Marketing & Communications team. In her spare time, Neha enjoys traveling, curling up with a fiction read, and spending time outdoors.
