R Tutorial — Apply Function Family in R

Learn how to use the apply() function family in R

Kunal Ajay Kulkarni
Nerd For Tech
May 29, 2021


In this article, we will discuss the apply() function family in R, its types, and how they work on different data structures. The apply() functions are part of base R, so we don't have to install anything separately. Their main advantage is that they let us replace many explicit loops with a single, concise function call.

In this post, we'll learn how to use the apply() function and its relatives lapply(), sapply(), vapply(), mapply(), and tapply(), along with the related replicate() function. So let's get started!

The apply() Family

The apply() family ships with R as part of the base package. Its functions apply another function repeatedly to the rows, columns, or elements of arrays, matrices, lists, and data frames, so these operations take very few lines of code. Each of these functions takes two main inputs:

  1. A data structure such as a vector, list, matrix, or array.
  2. A function to apply. This can be an aggregate function like mean() or sum(), or a user-defined function.
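The sketch below shows why this helps. It does the same computation, squaring the numbers 1 to 5, first with an explicit for loop and then with sapply(), which is covered later in this tutorial. The vector and the squaring function are illustrative choices only.

```r
# Squares of 1 to 5 with an explicit for loop
squares_loop <- numeric(5)
for (i in 1:5) {
  squares_loop[i] <- i^2
}

# The same result with sapply(): one call instead of a loop body
squares_apply <- sapply(1:5, function(i) i^2)

print(squares_loop)   # 1 4 9 16 25
print(squares_apply)  # 1 4 9 16 25
```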

How and when should we use these?

Which of these functions to use depends on the structure of the data we want to operate on and the format of the output we need. Let's go through them one by one.

The apply() Function

Let's start with the first function of the family, apply(). It operates on arrays, and a matrix is simply a 2D array. To keep things simple, we will only use 2D arrays (matrices) in this tutorial. We can use RStudio's Help pane, or run ?apply in the console, to get a description of this function.

The R documentation gives the syntax of the function as apply(X, MARGIN, FUN, ...), where:

  1. X is an array. In this tutorial, it is a matrix (a 2D array).
  2. MARGIN defines the dimension over which the function is applied. With MARGIN = 1, it is applied to each row. With MARGIN = 2, it is applied to each column. With MARGIN = c(1, 2), it is applied to each individual element (every row-column combination).
  3. FUN is the function that we apply to the data. It can be any R function, including a user-defined function (UDF).
  4. ... (the ellipsis) holds any other arguments to be passed on to FUN.

Let's construct a 4 x 4 matrix and use apply() to calculate the sum of the values of each row (MARGIN = 1) and of each column (MARGIN = 2).
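Here is a minimal sketch of these steps. Filling the matrix with the numbers 1 to 16 is an illustrative choice.

```r
# A 4 x 4 matrix holding 1 to 16, filled column by column
m <- matrix(1:16, nrow = 4, ncol = 4)
print(m)

# MARGIN = 1: apply sum() to each row
row_sums <- apply(m, 1, sum)
print(row_sums)  # 28 32 36 40

# MARGIN = 2: apply sum() to each column
col_sums <- apply(m, 2, sum)
print(col_sums)  # 10 26 42 58
```

For plain sums, base R's rowSums() and colSums() return the same values and are usually faster. apply() becomes useful when FUN has no specialized counterpart like these.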

Now let's introduce an NA value into the matrix and see what happens when we apply the function. sum() returns NA whenever its input contains a missing value, so the row or column that holds the NA sums to NA. Passing the na.rm = TRUE argument through apply() tells sum() to ignore missing values. Finally, we can also remove the NA from the data and apply the function again.
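Here is a minimal sketch that reuses the illustrative 1 to 16 matrix. The NA is placed in row 2, column 3, which is an arbitrary position.

```r
m <- matrix(1:16, nrow = 4, ncol = 4)
m[2, 3] <- NA  # introduce a missing value in row 2, column 3

# Without na.rm, the row that contains the NA sums to NA
na_sums <- apply(m, 1, sum)
print(na_sums)     # 28 NA 36 40

# na.rm = TRUE is forwarded to sum() through apply()'s ... argument
clean_sums <- apply(m, 1, sum, na.rm = TRUE)
print(clean_sums)  # 28 22 36 40
```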

In the above examples, we used the apply() function to calculate the sum of a row or a column. Next, we will use apply() to transform the values of the matrix. Please pay attention to the MARGIN argument. We will define a function that multiplies its input by 5 and set MARGIN to 1:2 (equivalent to c(1, 2)), so that the function is applied to every individual element of the matrix.
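Here is a minimal sketch of this transformation. times_five() is a hypothetical name for the user-defined function.

```r
m <- matrix(1:16, nrow = 4, ncol = 4)

# User-defined function that multiplies its input by 5
times_five <- function(x) x * 5

# MARGIN = 1:2 calls times_five() once for every element of the matrix
m_times_five <- apply(m, 1:2, times_five)
print(m_times_five)
```

R's arithmetic is already vectorized, so m * 5 gives the same matrix directly. MARGIN = 1:2 pays off when FUN only works on single values.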

In the previous examples, we used the apply() function on a matrix. But what happens if we use it on a vector instead? Let's create a vector and pass it to apply(). If you run this, R stops with the error "dim(X) must have a positive length".

It fails because apply() requires X to have a dim attribute, as matrices and arrays do, and a plain vector has none. For vectors, we need to use other functions such as lapply(), sapply(), or vapply() instead.
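Here is a minimal sketch that reproduces the error. It uses tryCatch() so the script keeps running, then processes the same illustrative vector with sapply().

```r
v <- c(1, 2, 3, 4, 5)

# apply() needs an object with a dim attribute; a plain vector has none
result <- tryCatch(
  apply(v, 1, sum),
  error = function(e) conditionMessage(e)
)
print(result)  # "dim(X) must have a positive length"

# sapply() loops over the elements of the vector directly
doubled <- sapply(v, function(x) x * 2)
print(doubled)  # 2 4 6 8 10
```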


The lapply() Function

We use this function when we want to apply a given function to every element of a list or a vector. We can use RStudio's Help pane, or run ?lapply, to get a description of this function.

The R documentation gives the syntax of the function as lapply(X, FUN, ...). Please note that there is no MARGIN argument in this function.

We can use this function on vectors, lists, and data frames. For a data frame, it applies the function to each column. The output is always a list (hence the name starts with "l"), with the same number of elements as the object passed to it. To see how this works, we will create three matrices named A, B, and C, store them in a list, and extract the values of one column from each.
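Here is a minimal sketch of this setup. The values inside A, B, and C are illustrative, and the sketch extracts the second column of each matrix.

```r
# Three example matrices (values chosen for illustration)
A <- matrix(1:9, nrow = 3)
B <- matrix(10:18, nrow = 3)
C <- matrix(19:27, nrow = 3)
my_list <- list(A = A, B = B, C = C)

# Extract the second column of every matrix; the result is a list of 3 vectors
second_cols <- lapply(my_list, function(mat) mat[, 2])
print(second_cols)  # $A: 4 5 6, $B: 13 14 15, $C: 22 23 24
```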

Let's calculate the sum of each element of the list. The results are returned as a list. If we want the result as a vector instead, we can wrap the lapply() call in the unlist() function.
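Here is a minimal sketch using the same illustrative my_list. It sums each matrix and then flattens the result with unlist().

```r
my_list <- list(A = matrix(1:9, nrow = 3),
                B = matrix(10:18, nrow = 3),
                C = matrix(19:27, nrow = 3))

# lapply() returns a list with one sum per matrix
sums_list <- lapply(my_list, sum)
str(sums_list)

# unlist() flattens that list into a named vector
sums_vec <- unlist(sums_list)
print(sums_vec)  # named vector: A = 45, B = 126, C = 207
```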

We can also create our own function and pass it to lapply(). For example, we can write a function that adds 10 to each element.
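Here is a minimal sketch. add_ten() is a hypothetical helper name. Because x + 10 is vectorized, it adds 10 to every value inside each matrix.

```r
my_list <- list(A = matrix(1:9, nrow = 3),
                B = matrix(10:18, nrow = 3),
                C = matrix(19:27, nrow = 3))

# User-defined function; x + 10 adds 10 to every value of x
add_ten <- function(x) x + 10

plus_ten <- lapply(my_list, add_ten)
print(plus_ten$A)  # the 3 x 3 matrix with values 11 to 19
```

An anonymous function works the same way: lapply(my_list, function(x) x + 10).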

The sapply() Function

The sapply() function works just like lapply(), but it tries to simplify the output when possible. In fact, sapply() is a wrapper around lapply(). We can use RStudio's Help pane, or run ?sapply, to get a description of this function.

The R documentation gives the syntax of the function as sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE). Please note that there is no MARGIN argument in this function either.

This means that instead of returning a list like lapply(), it returns a vector (or a matrix) when the results can be simplified. Let's reuse the my_list example, first with the default settings and then with the simplify = FALSE argument.
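Here is a minimal sketch with the illustrative my_list, comparing the default call with simplify = FALSE.

```r
my_list <- list(A = matrix(1:9, nrow = 3),
                B = matrix(10:18, nrow = 3),
                C = matrix(19:27, nrow = 3))

# Default: results of length 1 are simplified into a named vector
sums_vec <- sapply(my_list, sum)
print(sums_vec)  # A = 45, B = 126, C = 207

# simplify = FALSE keeps the list, exactly like lapply()
sums_list <- sapply(my_list, sum, simplify = FALSE)
str(sums_list)
```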

Here, we can see that with simplify = FALSE the output is returned as a list instead of a vector. Like the apply() function, we can also use our own function to transform the data.
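Here is a minimal sketch. The transformation, doubling the second column of each matrix, is an illustrative choice. Every call returns a vector of length 3, so sapply() simplifies the results into a 3 x 3 matrix with one column per list element.

```r
my_list <- list(A = matrix(1:9, nrow = 3),
                B = matrix(10:18, nrow = 3),
                C = matrix(19:27, nrow = 3))

# Anonymous function: take column 2 of each matrix and double it
doubled_cols <- sapply(my_list, function(mat) mat[, 2] * 2)
print(doubled_cols)  # column A: 8 10 12, column B: 26 28 30, column C: 44 46 48
```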

The vapply() Function

The vapply() function is similar to sapply(), but it requires us to specify the type and length of the value that FUN returns for each element. This makes its output more predictable. We can use RStudio's Help pane, or run ?vapply, to get a description of this function.

The R documentation gives the syntax of the function as vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE). Please note that there is no MARGIN argument in this function either.

FUN.VALUE is a template that describes the value FUN must return. For example, if each element of the list should produce a single integer, we use FUN.VALUE = integer(1). For a single double-precision number, we would use numeric(1). Another thing to remember is that vapply() always simplifies its result. It checks that every value returned by FUN matches FUN.VALUE in length and has a compatible type, and it stops with an error otherwise.
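Here is a minimal sketch with the illustrative my_list. sum() of an integer matrix returns a single integer, so integer(1) is the matching template. character(1) is a deliberately wrong one.

```r
my_list <- list(A = matrix(1:9, nrow = 3),
                B = matrix(10:18, nrow = 3),
                C = matrix(19:27, nrow = 3))

# Each call must return exactly one integer
sums_int <- vapply(my_list, sum, FUN.VALUE = integer(1))
print(sums_int)  # A = 45, B = 126, C = 207

# A template that does not match what FUN returns makes vapply() stop
msg <- tryCatch(
  vapply(my_list, sum, FUN.VALUE = character(1)),
  error = function(e) conditionMessage(e)
)
print(msg)
```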

The replicate() Function

This function is closely related to the apply() family; in fact, replicate() is a wrapper around sapply(). It evaluates an expression a specified number of times and collects the results, which is especially useful for repeated random simulations. We can use RStudio's Help pane, or run ?replicate, to get a description of this function.

The R documentation gives the syntax of the function as replicate(n, expr, simplify = "array"). Please note that there is no MARGIN argument in this function either.

Where —

  1. n is an integer giving the number of replications.
  2. expr is the expression to evaluate repeatedly.

Let's look at an example.
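Here is a minimal sketch. The seed and the dice-roll expression are illustrative choices.

```r
# Evaluate the same expression 3 times and collect the results in a vector
rep_const <- replicate(3, "hello")
print(rep_const)  # "hello" "hello" "hello"

# The expression is re-evaluated each time, so random draws differ
set.seed(42)  # fixed seed for reproducibility
rolls <- replicate(5, sample(1:6, 1))
print(rolls)

# If expr returns a vector, results are stacked into a matrix (one column per replication)
draws <- replicate(4, runif(2))
print(dim(draws))  # 2 4
```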


The mapply() Function

The "m" in mapply() stands for "multivariate": it is a multivariate version of sapply(). It applies FUN to the first elements of each argument, then to the second elements, and so on. This lets us vectorize a function over arguments that it would not normally accept as vectors, and we can use it with multiple lists or multiple vectors. We can use RStudio's Help pane, or run ?mapply, to get a description of this function.

The R documentation gives the syntax of the function as mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE). Please note that there is no MARGIN argument in this function either.

Where —

  1. FUN is the function to apply.
  2. ... holds the arguments to vectorize over (vectors or lists).
  3. MoreArgs is a list of other arguments to FUN that are not vectorized over.
  4. SIMPLIFY is logical or a character string. It controls whether mapply() tries to reduce the result to a vector, matrix, or higher-dimensional array.
  5. USE.NAMES is logical. If TRUE, the result takes the names of the first ... argument. If that argument is an unnamed character vector, the vector itself is used as the names.

Let's consider the following example. Suppose we want to repeat 1 four times, 2 three times, 3 twice, and 4 once.
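Here is a minimal sketch of this example. mapply() calls rep() on matching elements of the two vectors.

```r
# rep() is called as rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1)
reps <- mapply(rep, 1:4, 4:1)
print(reps)  # a list: 1 1 1 1 / 2 2 2 / 3 3 / 4
```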

There is another way to write this call. Either way, the output is a list, because the results have different lengths. To get the output as a single vector, we can wrap the call in unlist().
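The sketch below shows one possible alternative form, which passes the vectors to rep() by name. Wrapping the call in unlist() then flattens the list into one vector.

```r
# Naming the arguments makes it explicit which vector feeds which argument of rep()
reps_named <- mapply(rep, x = 1:4, times = 4:1)

# unlist() concatenates the list elements into a single vector
flat <- unlist(reps_named)
print(flat)  # 1 1 1 1 2 2 2 3 3 4
```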

Now, let's create our own function and pass it to mapply(). Suppose we have two vectors and want to add them element by element, then multiply each sum by 2. So let's create the function first and pass the arguments to it.
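Here is a minimal sketch. add_then_double() and add_then_scale() are hypothetical helper names, and the two vectors are illustrative. The second call shows how MoreArgs supplies a fixed argument.

```r
# Hypothetical helper: add two numbers, then multiply the sum by 2
add_then_double <- function(x, y) (x + y) * 2

v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)

# mapply() pairs the vectors element by element: (1 + 4) * 2, (2 + 5) * 2, (3 + 6) * 2
result <- mapply(add_then_double, v1, v2)
print(result)  # 10 14 18

# MoreArgs supplies a fixed argument that is not vectorized over
add_then_scale <- function(x, y, k) (x + y) * k
result_k <- mapply(add_then_scale, v1, v2, MoreArgs = list(k = 2))
print(result_k)  # 10 14 18
```

For simple arithmetic like this, (v1 + v2) * 2 is already vectorized. mapply() is most useful when FUN cannot take vectors directly.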

The tapply() Function

This function applies a function to each cell of a ragged array, that is, to each (non-empty) group of values given by a unique combination of the levels of certain factors. In practice, we use it to compute a summary statistic for each group in our data. We can use RStudio's Help pane, or run ?tapply, to get a description of this function.

The R documentation gives the syntax of the function as tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE). Please note that there is no MARGIN argument in this function either.

Let's load the built-in iris dataset and use the tapply() function on it.
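Here is a minimal sketch. iris ships with R in the datasets package, so data() simply makes a copy of it in the workspace.

```r
data(iris)
str(iris)   # 150 observations of 5 variables; Species is a factor with 3 levels
head(iris)  # first six rows
```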

Suppose we want to calculate the mean Sepal.Length for each species.
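Here is a minimal sketch. X is the Sepal.Length column, INDEX is the Species factor that defines the groups, and FUN is mean().

```r
# One mean per level of the Species factor
mean_sepal <- tapply(iris$Sepal.Length, iris$Species, mean)
print(mean_sepal)  # setosa 5.006, versicolor 5.936, virginica 6.588
```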

Similarly, we can calculate the median Sepal.Length for each of the three species.
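Here is a minimal sketch. It is the same call as before, with median() passed as FUN.

```r
# One median per level of the Species factor
median_sepal <- tapply(iris$Sepal.Length, iris$Species, median)
print(median_sepal)  # setosa 5.0, versicolor 5.9, virginica 6.5
```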

Conclusion

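In this article, we learned about the apply() family of functions in R:

- apply() for matrices and arrays
- lapply(), sapply(), and vapply() for lists and vectors
- mapply() for looping over several arguments in parallel
- tapply() for grouped data
- the related replicate() function

If you liked this article or have any suggestions for me, please let me know by commenting below.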

Thank You!
