Control Statement and Loops In R — 4
If you have not read part 3 of the R basics series kindly go through the following article where we discussed Writing Functions in R — 2. This series covers fundamentals of R that include data types, control structures, loops, functions, and advanced data structures.
If you are already familiar with these topics and are looking for a comprehensive introduction to the important topics in statistics and machine learning using R, kindly start with the following series, which discusses all the necessary topics related to data science.
Many Ways of Reading Data Into R — 1
The content of this article is distilled from a couple of books that I was introduced to during my IIM-B days.
R for Everyone — Jared P. Lander
Practical Data Science with R — Nina Zumel & John Mount
All the code blocks discussed in the article are available as R Markdown at the GitHub link.
To see all the articles written by me kindly use the link, Vivek Srinivasan.
Control statements allow us to control the flow of our program and cause different things to happen, depending on the values of tests. Tests result in a logical, TRUE or FALSE, which is used in if-like statements. The main control statements are if, else, ifelse and switch.
if and else
The most common test is the if command. It essentially says: if something is TRUE, then perform some action; otherwise, do not perform that action. The thing we are testing goes inside parentheses following the if command.
The most basic checks are: equal to (==), less than (<), greater than or equal to (>=) and not equal (!=). If these tests pass they result in TRUE, and if they fail they result in FALSE. As explained in previous articles, TRUE is numerically equivalent to 1 and FALSE is equivalent to 0.
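As a quick sketch of the checks listed above, each comparison returns a logical value, and those logicals behave like 1 and 0 in arithmetic. The variable names x, y and passed are ours, chosen only for illustration.

```r
# illustrative values, not taken from the article's own examples
x <- 5
y <- 7

x == y   # equal to
x < y    # less than
x >= y   # greater than or equal to
x != y   # not equal

# TRUE counts as 1 and FALSE as 0, so a logical vector can be summed
passed <- c(x == y, x < y, x >= y, x != y)
sum(passed)
```

Summing a logical vector like this is a common way to count how many tests passed.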
# set up a variable to hold 1
toCheck <- 1

# if toCheck is equal to 1, print hello
if(toCheck == 1)
{
    print("hello")
}
Notice that if statements are similar to functions, in that all statements (there can be one or multiple) go inside curly braces.
Life is not always so simple that we want action only if some relationship is TRUE. We often want a different action if that relationship is FALSE. In the following example, we put an if statement followed by an else statement inside a function, so that it can be used repeatedly.
# first create the function
check.bool <- function(x)
{
    if(x == 1)
    {
        # if the input is equal to 1, print hello
        print("hello")
    }else
    {
        # otherwise print goodbye
        print("goodbye")
    }
}

check.bool(0)
Anything other than 1 caused the function to print “goodbye.” That is exactly what we wanted. Passing TRUE will print “hello” because TRUE is numerically the same as 1. Perhaps we want to successively test a few cases. That is where we can use else if. We first test a single statement, then make another test, and then perhaps fall through to a catch-all. We will modify check.bool to test for one condition and then another.
check.bool <- function(x)
{
    if(x == 1)
    {
        # if the input is equal to 1, print hello
        print("hello")
    }else if(x == 0)
    {
        # if the input is equal to 0, print goodbye
        print("goodbye")
    }else
    {
        # otherwise print confused
        print("confused")
    }
}

check.bool(2)
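One variation worth considering is to return the message instead of printing it, so the result can be stored or tested. The sketch below is our own adaptation (the name check.value is hypothetical), and it relies on the fact that an if/else chain in R evaluates to the value of the branch that ran.

```r
# hypothetical variant of check.bool that returns a value instead of printing
check.value <- function(x)
{
    if(x == 1)
    {
        "hello"
    }else if(x == 0)
    {
        "goodbye"
    }else
    {
        "confused"
    }
}

# TRUE is numerically 1, so it takes the first branch
check.value(TRUE)
check.value(0)
check.value(2)

# if() expects a single value, so we apply the function element by element
results <- sapply(c(1, 0, 2), check.value)
results
```

Because if only looks at one value at a time, sapply is used here to run the function over a vector.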
Switch Statements
If we have multiple cases to check, writing else if repeatedly can be cumbersome and inefficient. This is where switch is most useful. The first argument is the value we are testing. Subsequent arguments pair a particular value with the result to return for it. The last argument, if not given a name, is the default result. To illustrate, we build a function that takes in a value and returns a corresponding result.
use.switch <- function(x)
{
    switch(x,
           "a"="first",
           "b"="second",
           "z"="last",
           "c"="third",
           "other")
}

use.switch("a")
If the first argument is numeric, it is matched positionally to the following arguments, regardless of the names of the subsequent arguments. If the numeric argument is greater than the number of subsequent arguments, NULL is returned.
# nothing is returned
is.null(use.switch(6))
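To make the matching rules concrete, here is a small sketch that exercises each case. The function demo.switch is a hypothetical copy of the mapping above, repeated only so that the block runs on its own.

```r
# same mapping as use.switch, repeated so this block is self-contained
demo.switch <- function(x)
{
    switch(x,
           "a"="first",
           "b"="second",
           "z"="last",
           "c"="third",
           "other")
}

demo.switch("z")          # matched by name
demo.switch("d")          # no name matches, so the unnamed default is used
demo.switch(2)            # numeric: second alternative, names are ignored
demo.switch(5)            # numeric: fifth alternative, which is the default value
is.null(demo.switch(6))   # numeric beyond the last alternative
```

Note that with a numeric first argument the default is only reached when the number happens to point at its position.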
ifelse
While if is like the if statement in traditional languages, ifelse is more like the if function in Excel. The first argument is the condition to be tested (much like in a traditional if statement), the second argument is the return value if the test is TRUE and the third argument is the return value if the test is FALSE. The beauty here, unlike with the traditional if, is that this works with vectorized arguments.
As is often the case in R, using vectorization avoids for loops and speeds up our code. The nuances of ifelse can be tricky, so we show numerous examples. We start with a very simple example, testing whether 1 is equal to 1 and printing “Yes” if that is TRUE and “No” if it is FALSE.
# see if 1 == 1
ifelse(1 == 1, "Yes", "No")

# see if 1 == 0
ifelse(1 == 0, "Yes", "No")
This clearly gives us the results we want. ifelse uses all the regular equality tests seen before and any other logical test. It is worth noting, however, that if testing just a single element (a vector of length 1 or a simple is.na), it is more efficient to use if than ifelse. This can result in a nontrivial speedup of our code. Next, we will illustrate a vectorized first argument.
toTest <- c(1, 1, 0, 1, 0, 1)
ifelse(toTest == 1, "Yes", "No")
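The second and third arguments can also refer to the vector being tested, as in the first example below, which triples the elements equal to 1 and leaves the rest unchanged. Now let’s say that toTest has NA elements. In that case, the corresponding result from ifelse is NA.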
toTest <- c(1, 1, 0, 1, 0, 1)
ifelse(toTest == 1, toTest*3, toTest)

## vectors with NA values
toTest <- c(1, NA, 0, 1, 0, 1)
ifelse(toTest == 1, toTest*3, toTest)
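If the NA results are not what we want, one possible approach is to test for missing values first with is.na. The sketch below is only one way to do this. The label "Missing" and the names plain and handled are our own illustrative choices.

```r
toTest <- c(1, NA, 0, 1, 0, 1)

# the NA propagates because NA == 1 is itself NA
plain <- ifelse(toTest == 1, "Yes", "No")
plain

# test for NA first, then apply the original check to the remaining elements
handled <- ifelse(is.na(toTest), "Missing", ifelse(toTest == 1, "Yes", "No"))
handled
```

Nesting ifelse calls like this works for a handful of cases, though it gets hard to read beyond two or three levels.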
Compound Tests
The statement being tested with if and ifelse can be any expression that results in a logical TRUE or FALSE. This can be an equality check or even the result of is.numeric or is.na. Sometimes we want to test more than one relationship at a time. This is done using the logical and and or operators. These are & and && for and, and | and || for or.
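The differences are subtle but matter for both correctness and speed. The double form (&& or ||) is best used in if and the single form (& or |) is necessary for ifelse. The double form expects a single logical value on each side, while the single form compares each element of each side. In older versions of R the double form silently used only the first element of a longer vector; since R 4.3.0 that is an error, so select the element explicitly (for example with a[1]).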
a <- c(1, 1, 0, 1)
b <- c(1, 0, 0, 1)

# this checks each element of a and each element of b
ifelse(a == 1 & b == 1, "Yes", "No")

# the double form needs one value per side (whole vectors are an error since R 4.3.0)
# so we check only the first elements; it only returns one result
ifelse(a[1] == 1 && b[1] == 1, "Yes", "No")
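When an if statement needs one answer about a whole vector, a common approach is to compute the element-wise result with the single form and then collapse it with all or any. This is a minimal sketch of that idea. The names both and verdict are ours.

```r
a <- c(1, 1, 0, 1)
b <- c(1, 0, 0, 1)

# element-wise comparison, one logical per pair
both <- a == 1 & b == 1

# all() and any() collapse a logical vector to a single TRUE or FALSE
if(all(both))
{
    verdict <- "every pair matches"
}else if(any(both))
{
    verdict <- "some pairs match"
}else
{
    verdict <- "no pair matches"
}
verdict
```

This keeps the condition inside if at length one, which is what if expects.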
Another difference between the double and single forms is how they are processed. When using a single form, both sides of the operator are always checked. With the double form, sometimes only the left side needs to be checked. For instance, if testing 1 == 0 && 2 == 2, the left side fails, so there is no reason to check the right hand side. Similarly, when testing 3 == 3 || 0 == 0, the left side passes, so there is no need to check the right side. This can be particularly helpful when the right side would throw an error if the left side had failed.
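A typical use of this short-circuit behavior is to guard a comparison with cheaper checks that must pass first. The helper below, is.positive, is a hypothetical function written only to illustrate the idea.

```r
# hypothetical helper: is x a single positive number?
is.positive <- function(x)
{
    # is.numeric(x) and length(x) == 1 are checked first;
    # x > 0 is only evaluated when both of them pass
    is.numeric(x) && length(x) == 1 && x > 0
}

is.positive(3)
is.positive(-3)
is.positive("three")   # stops at is.numeric
is.positive(NULL)      # stops at is.numeric, so NULL > 0 is never evaluated

# the single form evaluates both sides and returns an empty logical here,
# which if() cannot use as a condition
length(is.numeric(NULL) & NULL > 0)
```

Ordering the checks from cheapest and safest to most specific is what makes the guard work.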
There can be more than just two conditions tested. Many conditions can be strung together using multiple and or or operators. The different clauses can be grouped by parentheses just like mathematical operations. Without parentheses, the order of operations is similar to PEMDAS, where and is equivalent to multiplication and or is equivalent to addition, so and takes precedence over or.
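The precedence rule is easiest to see with a case where the grouping changes the answer. The following is a small sketch. The vectors x, y and z are illustrative values of our own.

```r
# && binds more tightly than ||, so this reads as TRUE || (FALSE && FALSE)
noParens <- TRUE || FALSE && FALSE
noParens

# parentheses change the grouping and the result
withParens <- (TRUE || FALSE) && FALSE
withParens

# the single forms follow the same rule, element by element
x <- c(TRUE, FALSE)
y <- c(FALSE, FALSE)
z <- c(FALSE, TRUE)
x | y & z
(x | y) & z
```

When in doubt, adding parentheses costs nothing and makes the intended grouping explicit.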
Loops, the Un-R Way to Iterate
When starting to use R, most people use loops whenever they need to iterate over elements of a vector, list or data.frame. While it is natural to do this in other languages, with R we generally want to use vectorization. That said, sometimes loops are unavoidable, so R offers both for and while loops.
for Loops
The most commonly used loop is the for loop. It iterates over an index, provided as a vector, and performs some operations. The loop is declared using for, which takes one English-seeming argument in three parts.
The third part is any vector of values of any kind, most commonly numeric or character. The first part is the variable that is iteratively assigned the values in the vector from the third part. The middle part is simply the word in, indicating that the variable (the first part) is in the vector (the third part).
# build a vector holding fruit names
fruit <- c("apple", "banana", "pomegranate")

for(a in fruit)
{
    print(sprintf("Length of %s is %s",a,as.character(nchar(a))))
}
Again, R’s built-in vectorization could have made all of this much easier.
nchar(fruit)
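The same messages printed by the loop can also be built without one, since sprintf is vectorized as well. The sketch below shows that, plus a seq_along loop for the cases where the position of each element is needed. The name messages and the "Fruit %d" wording are ours.

```r
fruit <- c("apple", "banana", "pomegranate")

# sprintf is vectorized, so one call builds all three messages
messages <- sprintf("Length of %s is %d", fruit, nchar(fruit))
messages

# when a loop is still needed, seq_along gives the positions to iterate over
for(i in seq_along(fruit))
{
    print(sprintf("Fruit %d is %s", i, fruit[i]))
}
```

Iterating over seq_along(fruit) rather than 1:length(fruit) also behaves sensibly when the vector is empty.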
while Loop
Although used far less frequently in R than the for loop, the while loop is just as simple to implement. It simply runs the code inside the braces repeatedly as long as the tested condition proves true. In the following example, we print the value of x and increment it until it exceeds 5. This is a highly trivial example but shows the functionality nonetheless.
x <- 1
while(x <= 5)
{
print(x)
x <- x + 1
}
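A while loop is a more natural fit when the number of iterations is not known in advance. As a small sketch of that situation, the code below counts how many doublings it takes to pass a threshold. The names limit, value and doublings, and the threshold itself, are illustrative assumptions.

```r
# illustrative threshold, chosen only for this sketch
limit <- 1000

value <- 1
doublings <- 0
while(value <= limit)
{
    value <- value * 2
    doublings <- doublings + 1
}

doublings
value
```

The loop stops as soon as value exceeds limit, so the final value is the first power of two above the threshold.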
Controlling Loops
Sometimes we have to skip to the next iteration of the loop or completely break out of it. This is accomplished with next and break. We use a for loop to demonstrate.
## Example for skipping an iteration
for(i in 1:10)
{
if(i == 3)
{
next
}
print(i)
}
## Example for breaking the loop
for(i in 1:10)
{
if(i == 3)
{
break
}
print(i)
}
Here, even though we told R to iterate over the first ten integers, it stopped after 2 because we broke the loop at 3.
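Beyond printing, next and break are often used to filter values or to stop a search early. The following sketch shows one example of each. The names odds and firstHit and the ranges used are our own illustrative choices.

```r
# collect the odd numbers from 1 to 10 by skipping the even ones
odds <- c()
for(i in 1:10)
{
    if(i %% 2 == 0)
    {
        next
    }
    odds <- c(odds, i)
}
odds

# stop at the first number from 51 to 100 that is divisible by 7
firstHit <- NA
for(i in 51:100)
{
    if(i %% 7 == 0)
    {
        firstHit <- i
        break
    }
}
firstHit
```

Growing a vector inside a loop is fine for a sketch this small, but for larger data a vectorized filter such as (1:10)[1:10 %% 2 == 1] is the more idiomatic choice.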
The two primary loops are for, which iterates over a fixed sequence of elements, and while, which continues a loop as long as some condition holds true. As stated earlier, if a solution can be done without loops, via vectorization or matrix algebra, then avoid the loop. It is particularly important to avoid nested loops. Loops inside other loops are extremely slow in R.
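As an illustration of replacing nested loops, the sketch below builds a small multiplication table twice: once cell by cell, and once with the built-in outer function. The size n and the names loopTable and vecTable are assumptions made for the example.

```r
# illustrative table size
n <- 4

# nested loops: fill a multiplication table one cell at a time
loopTable <- matrix(0, nrow=n, ncol=n)
for(i in 1:n)
{
    for(j in 1:n)
    {
        loopTable[i, j] <- i * j
    }
}

# vectorized: outer() builds the same table in one call
vecTable <- outer(1:n, 1:n)

# both approaches give the same values
all(loopTable == vecTable)
```

The vectorized version is shorter and leaves the iteration to R's internals, which is where the speed advantage comes from.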
So we have comprehensively covered the very basics of R programming. If you are interested in learning some advanced programming and machine learning in R start off with the advanced R series in the link given below.
Many Ways Of Reading Data Into R — 1
Do share your thoughts and support by commenting and sharing the article among your peer groups.