Creating functions and using lapply in R

A function packages a series of calculations under one name so that they can be reused with different inputs.

For instance, suppose we have a vector of numbers, each of which we wish to add to another variable. Instead of writing out a separate calculation for each number, it is much easier to write a function once and apply it to every element.

A function in R generally works by:

a. Defining the variables to include in the function and the calculation. e.g. to add two numbers together, our function is:

number1addnumber2 <- function(number1,number2) {(number1+number2)}

b. Using sapply to apply the function to each element of a vector, passing any further arguments (here a sequence) through to the function:

sapply(c(20,40,60), number1addnumber2, seq(2,20,by=2))
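Before bringing sapply into the picture, it can help to call the function directly. The following is a minimal sketch that reuses the number1addnumber2 name from step b; because + is vectorised in R, the second argument can be a single number or a whole sequence. The names single and vec are illustrative only.

```r
# Give the function a name so that it can be reused
number1addnumber2 <- function(number1, number2) {
  number1 + number2
}

# Two single numbers
single <- number1addnumber2(20, 2)

# One number and a sequence: 20 is added to every element of the sequence
vec <- number1addnumber2(20, seq(2, 20, by = 2))

print(single)
print(vec)
```

sapply simply repeats the second kind of call once for each element of its first argument.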

To see how this works in practice, let us take a look at the examples below:

1. Add an array of numbers to a sequence

> #Function1
> number1addnumber2<-function(number1,number2) {(number1+number2)}
> result1<-sapply(c(20,40,60),number1addnumber2,seq(2,20,by=2))
> result1df<-data.frame(result1)

> result1df
X1 X2 X3
1 22 42 62
2 24 44 64
3 26 46 66
4 28 48 68
5 30 50 70
6 32 52 72
7 34 54 74
8 36 56 76
9 38 58 78
10 40 60 80
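One way to read the matrix behind result1df: sapply calls the function once per element of c(20,40,60), passes the sequence as the second argument each time, and binds the three resulting vectors together as columns. The sketch below rebuilds the same matrix by hand with cbind to make that explicit; the names s, by_hand, and same are illustrative only.

```r
number1addnumber2 <- function(number1, number2) {(number1 + number2)}
s <- seq(2, 20, by = 2)

# What the article does: one call to sapply
result1 <- sapply(c(20, 40, 60), number1addnumber2, s)

# The same work spelled out: one function call per number, bound as columns
by_hand <- cbind(
  number1addnumber2(20, s),
  number1addnumber2(40, s),
  number1addnumber2(60, s)
)

# Compare values only, ignoring any dimension names
same <- isTRUE(all.equal(unname(result1), unname(by_hand)))
print(dim(result1))
print(same)
```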

2. Subtract a sequence from an array of numbers

> #Function2
> number1minusnumber2<-function(number1,number2) {(number1-number2)}
> result2<-sapply(c(20,40,60),number1minusnumber2,seq(2,20,by=2))
> result2df<-data.frame(result2)

> result2df
X1 X2 X3
1 18 38 58
2 16 36 56
3 14 34 54
4 12 32 52
5 10 30 50
6 8 28 48
7 6 26 46
8 4 24 44
9 2 22 42
10 0 20 40

3. Multiply an array of numbers by a sequence

> #Function3
> multiplynumber1bynumber2 <-function(number1,number2) {(number1*number2)}
> result3<-sapply(c(20,40,60), multiplynumber1bynumber2, seq(2,20, by=2))
> result3df<-data.frame(result3)

> result3df
X1 X2 X3
1 40 80 120
2 80 160 240
3 120 240 360
4 160 320 480
5 200 400 600
6 240 480 720
7 280 560 840
8 320 640 960
9 360 720 1080
10 400 800 1200

4. Divide an array of numbers by a sequence

> #Function4
> dividenumber1bynumber2<-function(number1,number2) {(number1/number2)}
> result4<-sapply(c(20,40,60), dividenumber1bynumber2, seq(2, 20, by=2))
> result4df<-data.frame(result4)

> result4df
X1 X2 X3
1 10.000000 20.000000 30.000000
2 5.000000 10.000000 15.000000
3 3.333333 6.666667 10.000000
4 2.500000 5.000000 7.500000
5 2.000000 4.000000 6.000000
6 1.666667 3.333333 5.000000
7 1.428571 2.857143 4.285714
8 1.250000 2.500000 3.750000
9 1.111111 2.222222 3.333333
10 1.000000 2.000000 3.000000
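As an aside, base R's outer() can build the same kind of grid without sapply. This is an alternative technique, not the approach used in this article, and the names numbers, divisors, grid, and matches are illustrative only.

```r
numbers  <- c(20, 40, 60)
divisors <- seq(2, 20, by = 2)

# outer() evaluates the function for every pairing of
# divisors (rows) and numbers (columns)
grid <- outer(divisors, numbers, function(d, n) n / d)

# The sapply version from the example above, for comparison
dividenumber1bynumber2 <- function(number1, number2) {(number1 / number2)}
result4 <- sapply(numbers, dividenumber1bynumber2, divisors)

matches <- isTRUE(all.equal(unname(grid), unname(result4)))
print(matches)
```

Which form reads better is largely a matter of taste; sapply keeps the named function in view, while outer makes the row and column roles explicit.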

5. Raise number to a power

> #Function5
> raisenumber1bypower<-function(number1,power) {(number1^power)}
> result5<-sapply(c(20,40,60),raisenumber1bypower,seq(0.5,2.5,by=0.5))
> result5df<-data.frame(result5)

> result5df
X1 X2 X3
1 4.472136 6.324555 7.745967
2 20.000000 40.000000 60.000000
3 89.442719 252.982213 464.758002
4 400.000000 1600.000000 3600.000000
5 1788.854382 10119.288513 27885.480093

6. Create a probability function

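This function calculates a value, mu, from lambda and powers as 1 - (1 - lambda)^powers. The expression has the same form as the probability of at least one event in a number of independent trials (powers), each with event probability lambda.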

> #Functionprob
> muCalculation <- function(lambda, powers) {1 - ((1 - lambda)^powers)}
> probability_at_lambda <- sapply(c(0.02, 0.04, 0.06), muCalculation, seq(0, 100, 10))
> probability_at_lambdadf<-data.frame(probability_at_lambda)

> probability_at_lambdadf
X1 X2 X3
1 0.0000000 0.0000000 0.0000000
2 0.1829272 0.3351674 0.4613849
3 0.3323920 0.5579976 0.7098938
4 0.4545157 0.7061424 0.8437444
5 0.5542996 0.8046338 0.9158384
6 0.6358303 0.8701142 0.9546693
7 0.7024469 0.9136477 0.9755842
8 0.7568774 0.9425902 0.9868493
9 0.8013511 0.9618321 0.9929168
10 0.8376894 0.9746247 0.9961849
11 0.8673804 0.9831297 0.9979451
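The default column names X1, X2, and X3 do not show which lambda each column belongs to, and the row numbers do not show the power. A minimal sketch of one way to label the table follows; the names lambdas, prob, labelled, and check are illustrative only, and the dbinom cross-check assumes the binomial reading of the formula, which the article itself does not state.

```r
muCalculation <- function(lambda, powers) {1 - ((1 - lambda)^powers)}

lambdas <- c(0.02, 0.04, 0.06)
powers  <- seq(0, 100, 10)

prob <- sapply(lambdas, muCalculation, powers)

# Label each column with its lambda and add the powers as their own column
colnames(prob) <- paste0("lambda_", lambdas)
labelled <- data.frame(powers = powers, prob)

# Cross-check one cell: 1 - P(zero events in 10 trials) under a binomial model
check <- 1 - dbinom(0, size = 10, prob = 0.02)

print(head(labelled, 3))
```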

Run Functions Across Lists Using lapply

Functions are very useful when we need to run the same command on several groups of data. While we could use a for loop for this purpose, combining a pre-defined function with lapply is a concise way to do the same thing in R. Let’s see how it works.

Suppose that we have two groups of time series (TS1 and TS2). The objective is to split these two time series and then run an ARIMA forecast on both of them. Instead of running the ARIMA forecast twice, we wish to use a function to run it repeatedly in a similar manner to a loop.

First, we define our data frame and then split it by group:

group<-c("TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2")
# Values are entered as numbers (not quoted strings) so that auto.arima can model them
value<-c(23,19,22,31,33,25,32,29,32,34,41,45,47,49,42,44,43,50,55,57,410,395,402,403,401,390,420,415,417,410,425,427,423,430,432,428,410,405,410,414)
df1<-data.frame(group,value)
newlist <- split(df1, df1$group)
newlist
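To make the result of split concrete, here is a small sketch on a shortened, hypothetical data frame (toy) rather than the full df1. split returns a named list with one data frame per group, which is the shape lapply expects.

```r
# A shortened, made-up data frame with the same two columns as df1
toy <- data.frame(
  group = c("TS1", "TS1", "TS1", "TS2", "TS2"),
  value = c(23, 19, 22, 410, 395)
)

# One list element per distinct value of toy$group
toylist <- split(toy, toy$group)

print(names(toylist))
print(sapply(toylist, nrow))
```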

Next, we define our forecast function using auto.arima from the forecast package:

library(forecast)  # provides auto.arima() and forecast()
arm <- function(x) plot(forecast(auto.arima(x$value), h = 10))
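Because arm wraps its work in plot, what lapply collects is whatever plot returns rather than the forecast values themselves. If the numbers are needed, one option is a function that returns them. The sketch below is a stand-in under stated assumptions: it uses base R's arima with a fixed order of c(0, 1, 0) (a random walk) and predict in place of auto.arima and forecast, so that it runs without extra packages. The names arm_values, toylist, and values are hypothetical, and the data are shortened.

```r
# Assumption: a fixed ARIMA(0,1,0) model stands in for auto.arima's selection
arm_values <- function(x) {
  fit <- arima(x$value, order = c(0, 1, 0))
  # Return the 10-step-ahead point forecasts as a plain numeric vector
  as.numeric(predict(fit, n.ahead = 10)$pred)
}

# Shortened, made-up stand-in for the split list
toylist <- list(
  TS1 = data.frame(value = c(23, 19, 22, 31, 33, 25, 32, 29, 32, 34)),
  TS2 = data.frame(value = c(410, 395, 402, 403, 401, 390, 420, 415, 417, 410))
)

values <- lapply(toylist, arm_values)
print(names(values))
print(lengths(values))
```

With the forecast package loaded, the same pattern should apply by returning forecast(auto.arima(x$value), h = 10) from the function instead of plotting it.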

We can now use lapply to run the function on the list that has now been split by group:

armforecast <- lapply(newlist,arm)

Upon doing this, R returns ARIMA plots for both of our time series.
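To see why lapply can stand in for a for loop, the sketch below runs the same per-group calculation both ways. It uses mean as a simple stand-in for the forecast function so that it runs without the forecast package; toylist, group_mean, loop_result, and lapply_result are illustrative names.

```r
# Shortened, made-up stand-in for the split list
toylist <- list(
  TS1 = data.frame(value = c(23, 19, 22)),
  TS2 = data.frame(value = c(410, 395, 402))
)
group_mean <- function(x) mean(x$value)

# for-loop version: create the result list, then fill it group by group
loop_result <- vector("list", length(toylist))
names(loop_result) <- names(toylist)
for (name in names(toylist)) {
  loop_result[[name]] <- group_mean(toylist[[name]])
}

# lapply version: one line, and the list names are carried over automatically
lapply_result <- lapply(toylist, group_mean)

print(identical(loop_result, lapply_result))
```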

To summarise, we have taken a look at how to create functions in R, and how we can use lapply to replicate a for loop by running a function numerous times on data that has been split according to certain criteria.

Thank you for reading!

Michael Grogan (MGCodesandStats)
