# Creating functions and using lapply in R

May 31 · 4 min read

Functions let us package a series of calculations so that they can be reused.

For instance, suppose we have an array of numbers, each of which we wish to add to another variable. Instead of carrying out a separate calculation for each number in the array, it is much easier to create a function that does this for us automatically.

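Using a function in R generally involves two steps:

a. Defining the function, i.e. its arguments and the calculation it performs, and assigning it a name. For example, a function that adds two numbers together is:

`number1addnumber2 <- function(number1,number2) {(number1+number2)}`

b. Using sapply to apply the function to each element of a vector of numbers, with any additional argument (typically a sequence) passed after the function name: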

`sapply(c(20,40,60), number1addnumber2, seq(2,20,by=2))`
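To make the mechanics concrete, here is a minimal sketch of what that `sapply` call does. Each element of `c(20,40,60)` is passed in turn as `number1`, while the sequence is forwarded unchanged as `number2`. The names `numbers`, `steps` and `result` are chosen for illustration only.

```r
# Assign the function to a name so that sapply can call it
number1addnumber2 <- function(number1, number2) {
  number1 + number2
}

numbers <- c(20, 40, 60)        # each element becomes number1 in turn
steps   <- seq(2, 20, by = 2)   # forwarded unchanged as number2

result <- sapply(numbers, number1addnumber2, steps)

# One row per element of `steps`, one column per element of `numbers`
dim(result)

# The first column is the same as calling the function directly with 20
identical(result[, 1], number1addnumber2(20, steps))
```

Because every call returns a vector of the same length, `sapply` simplifies the results into a matrix rather than a list.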

To see how this works in practice, let us take a look at the examples below:

# 1. Add an array of numbers to a sequence

```r
> #Function1
> number1addnumber2<-function(number1,number2) {(number1+number2)}
> result1<-sapply(c(20,40,60),number1addnumber2,seq(2,20,by=2))
> result1df<-data.frame(result1)
> result1df
   X1 X2 X3
1  22 42 62
2  24 44 64
3  26 46 66
4  28 48 68
5  30 50 70
6  32 52 72
7  34 54 74
8  36 56 76
9  38 58 78
10 40 60 80
```

# 2. Subtract a sequence from an array of numbers

```r
> #Function2
> number1minusnumber2<-function(number1,number2) {(number1-number2)}
> result2<-sapply(c(20,40,60),number1minusnumber2,seq(2,20,by=2))
> result2df<-data.frame(result2)
> result2df
   X1 X2 X3
1  18 38 58
2  16 36 56
3  14 34 54
4  12 32 52
5  10 30 50
6   8 28 48
7   6 26 46
8   4 24 44
9   2 22 42
10  0 20 40
```

# 3. Multiply an array of numbers by a sequence

```r
> #Function3
> multiplynumber1bynumber2 <-function(number1,number2) {(number1*number2)}
> result3<-sapply(c(20,40,60), multiplynumber1bynumber2, seq(2,20, by=2))
> result3df=data.frame(result3)
> result3df
    X1  X2   X3
1   40  80  120
2   80 160  240
3  120 240  360
4  160 320  480
5  200 400  600
6  240 480  720
7  280 560  840
8  320 640  960
9  360 720 1080
10 400 800 1200
```

# 4. Divide an array of numbers by a sequence

```r
> #Function4
> dividenumber1bynumber2<-function(number1,number2) {(number1/number2)}
> result4<-sapply(c(20,40,60), dividenumber1bynumber2, seq(2, 20, by=2))
> result4df<-data.frame(result4)
> result4df
          X1        X2        X3
1  10.000000 20.000000 30.000000
2   5.000000 10.000000 15.000000
3   3.333333  6.666667 10.000000
4   2.500000  5.000000  7.500000
5   2.000000  4.000000  6.000000
6   1.666667  3.333333  5.000000
7   1.428571  2.857143  4.285714
8   1.250000  2.500000  3.750000
9   1.111111  2.222222  3.333333
10  1.000000  2.000000  3.000000
```

# 5. Raise number to a power

```r
> #Function5
> raisenumber1bypower<-function(number1,power) {(number1^power)}
> result5<-sapply(c(20,40,60),raisenumber1bypower,seq(0.5,2.5,by=0.5))
> result5df<-data.frame(result5)
> result5df
           X1           X2           X3
1    4.472136     6.324555     7.745967
2   20.000000    40.000000    60.000000
3   89.442719   252.982213   464.758002
4  400.000000  1600.000000  3600.000000
5 1788.854382 10119.288513 27885.480093
```
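The five examples above all build the same kind of grid: one row per sequence value and one column per number. As a cross-check, base R's `outer()` produces the same matrix in a single call. This is shown only for comparison and is not the approach used in this article; the variable names are illustrative.

```r
numbers <- c(20, 40, 60)
steps   <- seq(2, 20, by = 2)

# sapply version, as in the addition example
add_sapply <- sapply(numbers, function(number1, number2) number1 + number2, steps)

# outer() applies the function to every (step, number) pair;
# rows follow its first argument, columns its second
add_outer <- outer(steps, numbers, function(number2, number1) number1 + number2)

all.equal(add_sapply, add_outer)

# For operations where order matters, keep number1 as the numerator
div_outer <- outer(steps, numbers, function(number2, number1) number1 / number2)
div_outer[1, 1]   # 20 divided by 2
```

Note that the argument order is swapped inside the anonymous function so that rows still correspond to the sequence.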

# 6. Create a probability function

This function calculates mu = 1 - (1 - lambda)^powers, which is the probability that an event with per-trial probability lambda occurs at least once in `powers` independent trials.

```r
> #Functionprob
> muCalculation <- function(lambda, powers) {1 - ((1 - lambda)^powers)}
> probability_at_lambda <- sapply(c(0.02, 0.04, 0.06), muCalculation, seq(0, 100, 10))
> probability_at_lambdadf=data.frame(probability_at_lambda)
> probability_at_lambdadf
          X1        X2        X3
1  0.0000000 0.0000000 0.0000000
2  0.1829272 0.3351674 0.4613849
3  0.3323920 0.5579976 0.7098938
4  0.4545157 0.7061424 0.8437444
5  0.5542996 0.8046338 0.9158384
6  0.6358303 0.8701142 0.9546693
7  0.7024469 0.9136477 0.9755842
8  0.7568774 0.9425902 0.9868493
9  0.8013511 0.9618321 0.9929168
10 0.8376894 0.9746247 0.9961849
11 0.8673804 0.9831297 0.9979451
```
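One way to sanity-check these values is against the binomial distribution in base R: `dbinom(0, size, prob)` is the chance of no successes in `size` independent trials, so one minus it should reproduce the first column of the output above. A minimal sketch; `trials`, `from_formula` and `from_binomial` are illustrative names.

```r
muCalculation <- function(lambda, powers) {
  1 - ((1 - lambda)^powers)
}

trials <- seq(0, 100, 10)

# Probability of at least one success, computed two ways
from_formula  <- muCalculation(0.02, trials)
from_binomial <- 1 - dbinom(0, size = trials, prob = 0.02)

all.equal(from_formula, from_binomial)

# Value for 10 trials at lambda = 0.02 (second row of the first column)
round(from_formula[2], 7)
```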

# Run Functions Across Lists Using lapply

Functions are very useful when it comes to running a command more than once on particular groups of data. While we could use a for loop for this purpose, combining a pre-defined function with lapply is a concise and idiomatic alternative in R. Let’s see how it works.
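As a small illustration of that equivalence, the sketch below computes the same per-group result with a `for` loop and with `lapply`. The list `groups` and the function `range_width` are made up for this example.

```r
# Two illustrative groups of numbers
groups <- list(TS1 = c(23, 19, 22, 31), TS2 = c(410, 395, 402, 403))

# A pre-defined function to run on each group
range_width <- function(x) max(x) - min(x)

# for loop: pre-allocate a named list, then fill it element by element
loop_result <- vector("list", length(groups))
names(loop_result) <- names(groups)
for (name in names(groups)) {
  loop_result[[name]] <- range_width(groups[[name]])
}

# lapply: the same work in one call; names are carried over automatically
lapply_result <- lapply(groups, range_width)

identical(loop_result, lapply_result)
```

The `lapply` version avoids the bookkeeping of creating and naming the output list by hand.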

Suppose that we have two groups of time series (TS1 and TS2). The objective is to split these two time series and then run an ARIMA forecast on both of them. Instead of running the ARIMA forecast twice, we wish to use a function to run it repeatedly in a similar manner to a loop.

First, we define our data frame and then split it by group:

```r
group <- rep(c("TS1", "TS2"), each = 20)
# Values are stored as numbers (not quoted strings) so that auto.arima can model them
value <- c(23, 19, 22, 31, 33, 25, 32, 29, 32, 34, 41, 45, 47, 49, 42, 44, 43, 50, 55, 57,
           410, 395, 402, 403, 401, 390, 420, 415, 417, 410, 425, 427, 423, 430, 432, 428, 410, 405, 410, 414)
df1 <- data.frame(group, value)
newlist <- split(df1, df1$group)
newlist
```
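To see what `split` returns, here is a minimal sketch on a shortened version of the data (three values per group, chosen for brevity). The result is a named list with one data frame per group, which is the shape `lapply` expects. The names `df_small` and `pieces` are illustrative.

```r
# Shortened illustrative data: three observations per group
group <- rep(c("TS1", "TS2"), each = 3)
value <- c(23, 19, 22, 410, 395, 402)
df_small <- data.frame(group, value)

pieces <- split(df_small, df_small$group)

names(pieces)          # one list element per group
sapply(pieces, nrow)   # number of rows in each piece
pieces$TS2$value       # values belonging to TS2 only
```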

Next, we define our forecast function using auto.arima from the forecast package, which must be loaded first with `library(forecast)`:

`arm <- function(x) plot(forecast(auto.arima(x$value), h=10))`

We can now use lapply to run the function on the list that has now been split by group:

`armforecast <- lapply(newlist,arm)`

Upon doing this, R returns ARIMA plots for both of our time series.

To summarise, we have taken a look at how to create functions in R, and how we can use lapply to replicate a for loop by running a function numerous times on data that has been split according to certain criteria.