Writing R Functions — 3

Vivekanandan Srinivasan
Published in Analytics Vidhya
6 min read, Nov 29, 2019

If you have not read part 2 of the R basics series, kindly go through that article first, where we discussed advanced data structures in R. This series covers the fundamentals of R, including data types, control structures, loops, functions, and advanced data structures.

If you are already familiar with these topics and are looking for a comprehensive introduction to the important topics in statistics and machine learning using R, kindly start with the following series, which discusses the topics needed for data science.

Many Ways of Reading Data Into R — 1

The contents of this article are the gist of a couple of books that I was introduced to during my IIM-B days.

R for Everyone — Jared P. Lander

Practical Data Science with R — Nina Zumel & John Mount

All the code blocks discussed in the article are available as R Markdown at the GitHub link.

To see all the articles written by me kindly use the link, Vivek Srinivasan.

If we find ourselves running the same code repeatedly, it is probably a good idea to turn it into a function. In programming, it is best to reduce redundancy whenever possible. There are several reasons for doing so, including maintainability and ease of reuse. R has a convenient way to make functions, but it is very different from other languages, so some expectation adjustment might be necessary.

Hello World

This would not be a serious blog about a programming language if we did not include a “Hello, World!” example, so we will start with that. Let’s build a function that simply prints “Hello, World!” to the console.

say.hello <- function()
{
    print("Hello, World!")
}
say.hello()

First, note that in R the period (.) is just another character and has no special meaning, unlike in other languages. This allows us to call this function say.hello.

Next, we see that functions are assigned to objects just like any other variable, using the <- operator. This is the strangest part of writing functions for people coming from other languages.
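
Because a function is an ordinary object, it can be inspected, copied under a new name, or handed to another function. The following is a minimal sketch of that idea; greet and greet.copy are hypothetical names used only for illustration.

```r
# a function is stored in a variable like any other value
greet <- function()
{
    "Hello, World!"
}

# the object has a class, just like a number or a string
class(greet)

# assigning the object to a new name gives a second handle on the same function
greet.copy <- greet
greet.copy()

# a function can also be passed as an argument to another function
sapply(c(1, 4, 9), sqrt)
```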

Following the function keyword is a set of parentheses that can either be empty (no arguments) or contain any number of arguments.

The body of the function is enclosed in curly braces ({ and }). This is not necessary if the function contains only one line, but that is rare. Notice the indenting for the commands inside the function. While not required, it is good practice to properly indent code to ensure readability. It is here in the body that we put the lines of code we want the function to perform. A semicolon (;) can be used to indicate the end of the line but is not necessary, and its use is actually frowned upon.
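
To make the point about braces and semicolons concrete, here is a small sketch. The names square and square.braced are hypothetical and exist only for this illustration.

```r
# braces are optional when the body is a single expression
square <- function(x) x^2
square(4)

# the same body written with braces and indentation, the more common style
square.braced <- function(x)
{
    x^2
}
square.braced(4)

# a semicolon can separate two statements on one line, but this is discouraged
a <- 1; b <- 2
a + b
```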

Function Arguments

More often than not we want to pass arguments to our function. These are easily added inside the parentheses of the function declaration. We will use an argument to print “Hello Jared.”

Before we do that, however, we need to briefly learn about the sprintf function. Its first argument is a string containing special placeholder characters (such as %s), and the subsequent arguments are the values that will be substituted into those placeholders, in order.

sprintf("Hello %s, today is %s", "Jared", "Sunday")
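
As a further sketch of how sprintf fills its placeholders, the calls below use a few common format codes (%s for strings, %d for integers, %.2f for a number rounded to two decimals) and show that sprintf works on whole vectors. The values are made up for illustration.

```r
# %s takes a string, %d an integer, %.2f a number shown with two decimals
sprintf("%s scored %d points, average %.2f", "Jared", 42L, 3.14159)

# sprintf is vectorized: one result per element of the input vector
sprintf("Hello %s", c("Jared", "Vivek"))
```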

hello.person <- function(name)
{
    print(sprintf("Hello %s", name))
}

hello.person("Jared")

The argument name can be used as a variable inside the function (it does not exist outside the function). It can also be used like any other variable and as an argument to further function calls.

We can add a second argument to be printed as well. When calling functions with more than one argument, there are two ways to specify which argument goes with which value, either positionally or by name.

hello.person <- function(first, last)
{
    print(sprintf("Hello %s %s", first, last))
}
# by position
hello.person("vivek", "srinivasan")
# or by name
hello.person(first="vivek", last="srinivasan")

Being able to specify the arguments by name adds a lot of flexibility to calling functions. Even partial argument names can be supplied, but this should be done with care.

hello.person(fir="Jared", l="Lander")
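
The reason partial names need care is that a prefix must identify exactly one argument. The sketch below uses a hypothetical function, full.name, whose two argument names deliberately share the prefix "fi"; tryCatch is used only so that the failing call does not stop the script.

```r
# hypothetical function whose argument names share the prefix "fi"
full.name <- function(first, final)
{
    sprintf("%s %s", first, final)
}

# named arguments can be supplied in any order
full.name(final="Lander", first="Jared")

# "fir" matches only first and "fin" matches only final, so this works
full.name(fir="Jared", fin="Lander")

# "fi" matches both argument names, so R stops with an error
result <- tryCatch(full.name(fi="Jared", "Lander"),
                   error=function(e) "ambiguous partial match")
result
```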

Default Arguments

When using multiple arguments it is sometimes desirable to not have to enter a value for each. In other languages, functions can be overloaded by defining the function multiple times, each with a different number of arguments. R instead provides the capability to specify default arguments. These can be NULL, characters, numbers or any valid R object. Let’s rewrite hello.person to provide “Srinivasan” as the default last name.

hello.person <- function(first, last="Srinivasan")
{
    print(sprintf("Hello %s %s", first, last))
}
# call without specifying last
hello.person("Vivek")
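
A default can also be NULL, which is a convenient way to mean "nothing was supplied". The following is a sketch under that assumption; hello.optional is a hypothetical variant, not part of the original example, and it returns the string rather than printing it.

```r
# hypothetical variant: last defaults to NULL, meaning no last name was given
hello.optional <- function(first, last=NULL)
{
    if (is.null(last))
    {
        return(sprintf("Hello %s", first))
    }
    sprintf("Hello %s %s", first, last)
}

# the default is used
hello.optional("Vivek")
# the default is overridden, by position and then by name
hello.optional("Vivek", "Srinivasan")
hello.optional("Vivek", last="Doe")
```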

Extra Arguments

R offers a special argument that enables functions to take an arbitrary number of arguments that do not need to be specified in the function definition. This is the dot-dot-dot argument (...). This should be used very carefully, although it can provide great flexibility. For now, we will just see how it can absorb extra arguments; later we will find a use for it when passing arguments between functions.

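# call hello.person with an extra argument; with the previous definition,
# which has no ..., this call fails with an unused argument error
hello.person("Jared", extra="Goodbye")

# redefine hello.person so that ... absorbs the extra argument
hello.person <- function(first, last="Srinivasan", ...)
{
    print(sprintf("Hello %s %s and %s", first, last, ...))
}
# call hello.person with an extra argument
hello.person("Vivek", extra="Goodbye")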
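
Inside a function, whatever ... absorbed can be collected with list(...) and then inspected. The sketch below illustrates this; count.extras and the argument names greeting and mood are hypothetical.

```r
# hypothetical helper that reports what ... absorbed
count.extras <- function(first, ...)
{
    extras <- list(...)   # collect the extra arguments into a list
    sprintf("%s came with %d extra argument(s): %s",
            first, length(extras), paste(names(extras), collapse=", "))
}

count.extras("Vivek", greeting="Goodbye", mood="happy")
count.extras("Vivek")
```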

Return Values

Functions are generally used for computing some value, so they need a mechanism to supply that value back to the caller. This is called returning and is done quite easily. There are two ways to accomplish this with R. The value of the last line of code in a function is automatically returned, although this can be bad practice. To illustrate, we will build a function that doubles its only argument and returns that value.

# first build it without an explicit return
double.num <- function(x)
{
    x * 2
}
double.num(5)

The return command more explicitly specifies that a value should be returned and the function should be exited.

# build it again, this time with more code after the explicit return
double.num <- function(x)
{
    return(x * 2)
    # below here is not executed because the function already exited
    print("Hello!")
    return(17)
}
double.num(5)
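
Both styles can appear in the same function: an explicit return for an early exit and the last expression for the normal result. A minimal sketch, assuming a hypothetical function safe.double that gives back NA for non-numeric input:

```r
# hypothetical function with an early exit for non-numeric input
safe.double <- function(x)
{
    if (!is.numeric(x))
    {
        return(NA)   # exit early; nothing below this line runs
    }
    x * 2            # the value of the last expression is returned
}

# the returned value can be stored like any other value
result <- safe.double(5)
result
safe.double("five")
```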

do.call

A particularly underused trick is the do.call function. This allows us to specify a function either by name as a character string or as the function object itself, and to provide its arguments as a list.

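# the latest hello.person fills three placeholders, so extra is supplied too
do.call("hello.person", args=list(first="Jared", last="Lander", extra="Goodbye"))
# or pass the function object instead of its name
do.call(hello.person, args=list(first="Jared", last="Lander", extra="Goodbye"))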

This is particularly useful when building a function that allows the user to specify an action. In the following example, the user supplies a vector and a function to be run.

run.this <- function(x, func=mean)
{
    do.call(func, args=list(x))
}
# call run.this with a vector and a function
run.this(c(1, 2, 3, 4), mean)
run.this(c(1, 2, 3, 4), sum)
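
This is also a natural place to use the dot-dot-dot argument for passing arguments between functions. The sketch below is a hypothetical extension of run.this, named run.with here, that forwards any extra arguments to the chosen function; the na.rm argument belongs to mean and sum themselves.

```r
# hypothetical extension of run.this: ... is forwarded to func
run.with <- function(x, func=mean, ...)
{
    do.call(func, args=list(x, ...))
}

values <- c(1, 2, NA, 4)
run.with(values, mean)                 # NA, because of the missing value
run.with(values, mean, na.rm=TRUE)     # na.rm is passed on to mean
run.with(values, "sum", na.rm=TRUE)    # the function can also be named by a string
```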

Functions allow us to create reusable code that avoids repetition and allows easy modification. Important points to remember are function arguments, default values and returned values. In the upcoming articles, we will see functions that get far more complicated than the ones we have seen so far. In the next article, we will discuss control structures and loops in R.

Control Statements And Loops In R — 4

Do share your thoughts and support by commenting and sharing the article among your peer groups.

Vivekanandan Srinivasan
Analytics Vidhya

An analytics professional with over six years of experience spanning across predictive modelling, statistical analysis and big data technologies.