Control Statement and Loops In R — 4

Vivekanandan Srinivasan
Published in Analytics Vidhya
9 min read · Nov 29, 2019

If you have not read part 3 of the R basics series kindly go through the following article where we discussed Writing Functions in R — 2. This series covers fundamentals of R that include data types, control structures, loops, functions, and advanced data structures.

If you are already familiar with these topics and are looking for a comprehensive introduction to all important topics in statistics and machine learning using R, kindly start off with the following series, which discusses all necessary topics related to data science.

Many Ways of Reading Data Into R — 1

The contents of this article are a gist from a couple of books that I was introduced to during my IIM-B days.

R for Everyone — Jared P. Lander

Practical Data Science with R — Nina Zumel & John Mount

All the code blocks discussed in the article are available as R Markdown in the GitHub link.

To see all the articles written by me kindly use the link, Vivek Srinivasan.

Control statements allow us to control the flow of our programs and cause different things to happen, depending on the values of tests. Tests result in a logical, TRUE or FALSE, which is used in if-like statements. The main control statements are if, else, ifelse and switch.

if and else

The most common test is the if command. It essentially says: if something is TRUE, then perform some action; otherwise, do not perform that action. The thing we are testing goes inside parentheses following the if command.

The most basic checks are equal to (==), less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=) and not equal (!=). If these tests pass they result in TRUE, and if they fail they result in FALSE. As explained in previous articles, TRUE is numerically equivalent to 1 and FALSE is equivalent to 0.
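As a small sketch of the checks described above (the variable names `checks` and `true_count` are only illustrative), each comparison returns a logical value, and logical values can be used in arithmetic as 1 and 0:

```r
# each comparison returns a single logical value
checks <- c(1 == 1, 1 < 0, 2 >= 2, 1 != 1)
print(checks)
# [1]  TRUE FALSE  TRUE FALSE

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
true_count <- sum(checks)
print(true_count)
# [1] 2
```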

# set up a variable to hold 1
toCheck <- 1
# if toCheck is equal to 1, print hello
if(toCheck == 1)
{
    print("hello")
}

Notice that if statements are similar to functions, in that all statements (there can be one or multiple) go inside curly braces.

Life is not always so simple that we want action only if some relationship is TRUE. We often want a different action if that relationship is FALSE. In the following example, we put an if statement followed by an else statement inside a function, so that it can be used repeatedly.

# first create the function
check.bool <- function(x)
{
    if(x == 1)
    {
        # if the input is equal to 1, print hello
        print("hello")
    }else
    {
        # otherwise print goodbye
        print("goodbye")
    }
}
check.bool(0)

Anything other than 1 caused the function to print “goodbye.” That is exactly what we wanted. Passing TRUE will print “hello” because TRUE is numerically the same as 1.

Perhaps we want to successively test a few cases. That is where we can use else if. We first test a single statement, then make another test, and then perhaps fall through to a catch-all. We will modify check.bool to test for one condition and then another.

check.bool <- function(x)
{
    if(x == 1)
    {
        # if the input is equal to 1, print hello
        print("hello")
    }else if(x == 0)
    {
        # if the input is equal to 0, print goodbye
        print("goodbye")
    }else
    {
        # otherwise print confused
        print("confused")
    }
}
check.bool(2)
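To see every branch in action, the sketch below repeats the same function so that it runs on its own, then calls it with one input per branch. The inputs FALSE and "k" are arbitrary choices for illustration.

```r
# same logic as check.bool in the article, repeated so this snippet is self-contained
check.bool <- function(x) {
  if (x == 1) {
    print("hello")
  } else if (x == 0) {
    print("goodbye")
  } else {
    print("confused")
  }
}

check.bool(1)      # first test passes
# [1] "hello"
check.bool(FALSE)  # FALSE is numerically 0, so the second test passes
# [1] "goodbye"
check.bool("k")    # neither test passes, so the catch-all runs
# [1] "confused"
```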

Switch Statements

If we have multiple cases to check, writing else if repeatedly can be cumbersome and inefficient. This is where switch is most useful. The first argument is the value we are testing. Subsequent arguments pair a particular value with the result it should produce. The last argument, if left unnamed, is the default result. To illustrate, we build a function that takes in a value and returns a corresponding result.

use.switch <- function(x)
{
    switch(x,
           "a"="first",
           "b"="second",
           "z"="last",
           "c"="third",
           "other")
}
use.switch("a")

If the first argument is numeric, it is matched positionally to the following arguments, regardless of the names of the subsequent arguments. If the numeric argument is greater than the number of subsequent arguments, NULL is returned.

# NULL is returned because there are only five alternatives
is.null(use.switch(6))
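The difference between character and numeric matching can be sketched as follows. The function is repeated from above so the snippet runs on its own, and the inputs "q", 3 and 5 are chosen only for illustration.

```r
# same function as in the article, repeated so this snippet is self-contained
use.switch <- function(x) {
  switch(x,
         "a" = "first",
         "b" = "second",
         "z" = "last",
         "c" = "third",
         "other")
}

use.switch("b")   # matched by name
# [1] "second"
use.switch("q")   # no name matches, so the unnamed default is used
# [1] "other"
use.switch(3)     # numeric input: third alternative, names are ignored
# [1] "last"
use.switch(5)     # numeric input: fifth alternative happens to be the default
# [1] "other"
```

Note that `use.switch(3)` returns "last" rather than "third", because position, not name, decides the result for numeric input.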

ifelse

While if is like the if statement in traditional languages, ifelse is more like the IF function in Excel. The first argument is the condition to be tested (much like in a traditional if statement), the second argument is the return value if the test is TRUE, and the third argument is the return value if the test is FALSE. The beauty here, unlike with the traditional if, is that this works with vectorized arguments.

As is often the case in R, using vectorization avoids for loops and speeds up our code. The nuances of ifelse can be tricky, so we show numerous examples. We start with a very simple example, testing whether 1 is equal to 1 and printing “Yes” if that is TRUE and “No” if it is FALSE.

# see if 1 == 1
ifelse(1 == 1, "Yes", "No")
# see if 1 == 0
ifelse(1 == 0, "Yes", "No")

This clearly gives us the results we want. ifelse uses all the regular equality tests seen before and any other logical test. It is worth noting, however, that if testing just a single element (a vector of length 1 or a simple is.na), it is more efficient to use if than ifelse. This can result in a nontrivial speedup of our code. Next, we will illustrate a vectorized first argument.

toTest <- c(1, 1, 0, 1, 0, 1)
ifelse(toTest == 1, "Yes", "No")

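The second and third arguments can themselves be vectors, in which case each result is taken from the matching position. Now let’s say that toTest has NA elements. In that case, the corresponding result from ifelse is NA.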

toTest <- c(1, 1, 0, 1, 0, 1)
ifelse(toTest == 1, toTest*3, toTest)
## vectors with NA values
toTest <- c(1, NA, 0, 1, 0, 1)
ifelse(toTest == 1, toTest*3, toTest)
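One possible way to handle the NA case explicitly is to test is.na first and nest a second ifelse inside it. This is only a sketch; the label "Missing" and the variable names `plain` and `labelled` are assumptions made for illustration.

```r
toTest <- c(1, NA, 0, 1, 0, 1)

# NA == 1 evaluates to NA, so the second result is NA
plain <- ifelse(toTest == 1, "Yes", "No")
print(plain)
# [1] "Yes" NA    "No"  "Yes" "No"  "Yes"

# test for missing values first, then fall back to the original test
labelled <- ifelse(is.na(toTest), "Missing",
                   ifelse(toTest == 1, "Yes", "No"))
print(labelled)
# [1] "Yes"     "Missing" "No"      "Yes"     "No"      "Yes"
```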

Compound Tests

The statement being tested with if, ifelse and switch can be any argument that results in a logical TRUE or FALSE. This can be an equality check or even the result of is.numeric or is.na. Sometimes we want to test more than one relationship at a time. This is done using logical and and or operators. These are & and && for and and | and || for or.

The differences are subtle but can impact our code’s speed. The double form (&& or ||) is best used in if and the single form (& or |) is necessary for ifelse. The single form compares each element of each side, while the double form expects a single element on each side. Older versions of R silently used only the first element of longer vectors; since R 4.3.0, passing a vector longer than one to && or || is an error.

a <- c(1, 1, 0, 1)
b <- c(1, 0, 0, 1)
# this checks each element of a and each element of b
ifelse(a == 1 & b == 1, "Yes", "No")
# the double form works on single elements, so we pick the first element of a and of b
# it only returns one result
# (since R 4.3.0, a == 1 && b == 1 on the full vectors is an error)
ifelse(a[1] == 1 && b[1] == 1, "Yes", "No")
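Because if expects a single TRUE or FALSE, an element-wise result from & usually has to be collapsed before it can be used there. A minimal sketch, using the base R functions all() and any() (which the article itself does not cover), might look like this:

```r
a <- c(1, 1, 0, 1)
b <- c(1, 0, 0, 1)

# element-wise test: one result per pair of elements
both <- a == 1 & b == 1
print(both)
# [1]  TRUE FALSE FALSE  TRUE

# collapse to a single logical before using it in if
every_pair <- all(both)   # TRUE only if every element is TRUE
some_pair  <- any(both)   # TRUE if at least one element is TRUE

if (every_pair) {
  print("every pair equals 1")
} else if (some_pair) {
  print("only some pairs equal 1")
}
# [1] "only some pairs equal 1"
```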

Another difference between the double and single forms is how they are processed. When using a single form, both sides of the operator are always checked. With the double form, sometimes only the left side needs to be checked. For instance, if testing 1 == 0 && 2 == 2, the left side fails, so there is no reason to check the right-hand side. Similarly, when testing 3 == 3 || 0 == 0, the left side passes, so there is no need to check the right side. This can be particularly helpful when the right side would throw an error if the left side had failed.
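The short-circuit behaviour described above can be sketched with a guard against a missing value. The names `value`, `is_positive` and `skipped` are illustrative only.

```r
value <- NULL

# the left side is FALSE, so value > 0 is never evaluated;
# evaluated on its own, NULL > 0 would give a zero-length result that if cannot use
is_positive <- !is.null(value) && value > 0
print(is_positive)
# [1] FALSE

# the left side is TRUE, so the call to stop() on the right is never run
skipped <- TRUE || stop("this is never reached")
print(skipped)
# [1] TRUE
```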

There can be more than just two conditions tested. Many conditions can be strung together using multiple and or or operators. The different clauses can be grouped by parentheses just like mathematical operations. Without parentheses, the order of operations is similar to PEMDAS, where and is equivalent to multiplication and or is equivalent to addition, so and takes precedence over or.
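A short sketch of this precedence rule, with illustrative variable names, shows how parentheses change the outcome:

```r
# and binds tighter than or, so this is read as TRUE || (FALSE && FALSE)
no_parens <- TRUE || FALSE && FALSE
print(no_parens)
# [1] TRUE

# parentheses force the or to be evaluated first
with_parens <- (TRUE || FALSE) && FALSE
print(with_parens)
# [1] FALSE

# the single forms follow the same precedence
single_form <- TRUE | FALSE & FALSE
print(single_form)
# [1] TRUE
```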

Loops, the Un-R Way to Iterate

When starting to use R, most people use loops whenever they need to iterate over elements of a vector, list or data.frame. While it is natural to do this in other languages, with R we generally want to use vectorization. That said, sometimes loops are unavoidable, so R offers both for and while loops.

for Loops

The most commonly used loop is the for loop. It iterates over an index, provided as a vector, and performs some operations. The loop is declared using for, which takes one English-seeming argument in three parts.

The first part is the variable that is iteratively assigned the values in the vector from the third part. The middle part is simply the word in, indicating that the variable (the first part) is in the vector (the third part). The third part is any vector of values of any kind, most commonly numeric or character.

# build a vector holding fruit names
fruit <- c("apple", "banana", "pomegranate")
# print the number of characters in each name
for(a in fruit)
{
    print(sprintf("Length of %s is %d", a, nchar(a)))
}

Again, R’s built-in vectorization could have made all of this much easier.

nchar(fruit)
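The formatted messages from the loop can also be produced without a loop, since sprintf is vectorized as well. The following is a sketch; `fruit_lengths` and `messages` are illustrative names.

```r
fruit <- c("apple", "banana", "pomegranate")

# nchar works on the whole vector at once
fruit_lengths <- nchar(fruit)
print(fruit_lengths)
# [1]  5  6 11

# sprintf is vectorized too, so it builds one message per fruit
messages <- sprintf("Length of %s is %d", fruit, fruit_lengths)
print(messages)
# [1] "Length of apple is 5"        "Length of banana is 6"
# [3] "Length of pomegranate is 11"
```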

while Loop

Although used far less frequently in R than the for loop, the while loop is just as simple to implement. It simply runs the code inside the braces repeatedly as long as the tested condition is TRUE. In the following example, we print the value of x and increment it, stopping once x exceeds 5. This is a highly trivial example but shows the functionality nonetheless.

x <- 1
while(x <= 5)
{
    print(x)
    x <- x + 1
}

Controlling Loops

Sometimes we have to skip to the next iteration of the loop or completely break out of it. This is accomplished with next and break. We use a for loop to demonstrate.

## Example for skipping an iteration
for(i in 1:10)
{
    if(i == 3)
    {
        next
    }
    print(i)
}
## Example for breaking the loop
for(i in 1:10)
{
    if(i == 3)
    {
        break
    }
    print(i)
}

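In the first loop, next skips the print for 3, so every number from 1 to 10 except 3 is printed. In the second loop, even though we told R to iterate over the first ten integers, it stopped after 2 because we broke out of the loop at 3.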

The two primary loops are for, which iterates over a fixed sequence of elements, and while, which continues a loop as long as some condition holds true. As stated earlier, if a solution can be done without loops, via vectorization or matrix algebra, then avoid the loop. It is particularly important to avoid nested loops. Loops inside other loops can be extremely slow in R.
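As a minimal sketch of replacing a loop with vectorization, the example below squares the numbers 1 to 10 both ways and checks that the results agree. The task and the variable names are chosen only for illustration.

```r
values <- 1:10

# loop version: fill a pre-allocated vector one element at a time
squares_loop <- numeric(length(values))
for (i in seq_along(values)) {
  squares_loop[i] <- values[i]^2
}

# vectorized version: the operator is applied to every element at once
squares_vec <- values^2

# both approaches give the same result
same_result <- all(squares_loop == squares_vec)
print(same_result)
# [1] TRUE
```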

So we have comprehensively covered the very basics of R programming. If you are interested in learning some advanced programming and machine learning in R start off with the advanced R series in the link given below.

Many Ways Of Reading Data Into R — 1

Do share your thoughts and support by commenting and sharing the article among your peer groups.

Vivekanandan Srinivasan
Analytics Vidhya

An analytics professional with over six years of experience spanning across predictive modelling, statistical analysis and big data technologies.