R Function Of The Week: ifelse() vs. if_else()

Statistics Without Borders
Apr 6, 2022

Author: Neha Anwer, Statistics Without Borders

For the longest time, I thought both of these were exactly the same until I started seeing errors in my results.

This Article Contains

  • An overview of both functions and general use case
  • Important differences between the two approaches with examples
  • Other key differences to consider
  • Additional Resources on ifelse() and if_else() as well as other conditional programming functions

This week at Statistics Without Borders, we discuss the differences between ifelse() and if_else(). For the longest time, I thought both of these were exactly the same until I started seeing errors in my results. I tried to understand the difference between both functions, but most of the posts out there just describe dplyr’s if_else() as being a stricter version of base R’s ifelse(). While this is true, it doesn’t necessarily give users a good understanding of when to use which one. In this article, I will attempt to make these distinctions clearer.

Overview of Both Functions

If you’ve used R before, you probably know that you can use if, else if, and else to control the flow of your program. Below is an example of how I would assign a value to a new variable x based on the sum of a vector using the if(){}else{} code block.

## Create vector vec
vec <- c(1, 0, 2, 5, 3)
## The sum of vec is equal to 11
sum(vec)
if (sum(vec) > 10) { # Value if true goes in first curly brackets
  x <- 'Greater than 10'
} else { # else must sit on the same line as the closing bracket; value if false goes here
  x <- 'Less than 10'
} # End if else block
print(x)

This certainly gets the job done, but it is a lot of code for a simple operation.
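When there are more than two outcomes, the same block can be chained with else if. The snippet below is only a sketch: the extra "Exactly 10" branch is a hypothetical addition for illustration and is not part of the original example.

```r
vec <- c(1, 0, 2, 5, 3)
total <- sum(vec)

# Each `else` shares a line with the preceding closing bracket,
# which R requires when the block runs at the top level
if (total > 10) {
  x <- "Greater than 10"
} else if (total == 10) {
  x <- "Exactly 10"       # hypothetical extra branch
} else {
  x <- "Less than 10"
}
print(x)   # "Greater than 10"
```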

At a high level, both ifelse() and if_else() can be used as shorthand for the longer if(){}else{} code block. Using either of these, we can cut the code down to one line like so:

## Using ifelse()
x <- ifelse(sum(vec)>10, 'Greater than 10', 'Less than 10')
print(x)
## Using dplyr's if_else()
library(dplyr)
x <- if_else(sum(vec)>10, 'Greater than 10', 'Less than 10')
print(x)

Both of these will work and give you the correct result. I find the shorthand to be especially helpful for creating new DataFrame columns or variables. However, the dplyr version will incorporate a few checks before returning a value.
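To illustrate the column use case, here is a minimal sketch. The scores data frame and the "High"/"Low" labels are made up for this example. Unlike the if(){}else{} block, both functions test every element of a vector and return one result per element.

```r
library(dplyr)

# Hypothetical data: one score per row
scores <- data.frame(score = c(4, 12, 9, 15))

# Each function checks every row and returns one label per row
scores$base_label  <- ifelse(scores$score > 10, "High", "Low")
scores$dplyr_label <- if_else(scores$score > 10, "High", "Low")

print(scores)
```

With a well-formed logical condition like this one, the two columns come out identical.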

Important Differences Between the Two

As I mentioned earlier in the article, if_else() is stricter than ifelse(). What exactly does this mean?

For one thing, if_else() will throw an error:

  1. If the test condition does not yield a logical (TRUE/FALSE) value, or
  2. If the value supplied for the true or false case is neither length 1 nor the same length as the condition

This can be confusing to wrap your head around. We can explore what this means using the same use case from above.

First Point of Contrast: Expecting True/False

Let’s break down the True/False conditional expectation. We know that a line of code can result in varying data types such as numbers, characters, or Booleans (which R calls logicals, written TRUE and FALSE). The if_else() statement expects a Boolean (True/False) output from the line of code that is entered in the first argument. If the result is anything else, such as an integer or a text string, it errors out.

The ifelse() statement on the other hand is very loose with its definition of a test condition. It does not check to see if the result of the first argument is a boolean value before continuing on to the remaining arguments. Instead, it coerces whatever it receives to a logical, so 0 is treated as False and any other number as True. This can lead to incorrect True/False dependencies being executed.

As an example, using the same vec from above, assume we want to:

  • Assign a value to variable alert based on the value of the sum of vec. If vec adds up to more than 10, we want our alert variable to say “RED FLAG.” If not, we want the variable to say “Green Flag.”
  • Examine the value of variable alert if we alter the condition slightly like so sum(vec>10) #instead of sum(vec)>10.
  • Walk through what happens if we change the values in vec

Recall that the individual elements in vec add up to 11. Let’s construct an if_else() statement to assign a value to alert using the example condition from above.

alert <- if_else(sum(vec > 10), 'RED FLAG', 'Green Flag')

We immediately get an error. This is because sum(vec >10) evaluates to 0 because there are 0 individual elements of vec which are greater than 10. Since 0 is an integer and not a boolean value, our condition’s result does not supply the function with the data type it is expecting to continue. Thus we get the following error:

Error message displayed in the R console after running the code above: “condition must be a logical vector, not an integer vector”
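The error comes down to the type of the condition, which is easy to check before calling either function. The sketch below reuses the original vec and compares the two ways of writing the condition.

```r
library(dplyr)

vec <- c(1, 0, 2, 5, 3)

# sum(vec > 10) counts the elements above 10, so it returns an integer
class(sum(vec > 10))   # "integer"

# sum(vec) > 10 compares the total to 10, so it returns a logical
class(sum(vec) > 10)   # "logical"

# With a logical condition, if_else() runs without complaint
alert <- if_else(sum(vec) > 10, "RED FLAG", "Green Flag")
print(alert)           # "RED FLAG"
```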

If we do this same thing using the ifelse() variation instead, our alert variable will have the value “Green Flag,” which in this case is incorrect since the sum of vec is greater than 10. This is happening because the conditional statement itself is incorrect. The statement is counting the number of individual elements that are greater than 10 rather than calculating the sum of all elements. Since the conditional results in a 0, the function assumes our statement is False and executes the False block.

alert <- ifelse(sum(vec > 10), 'RED FLAG', 'Green Flag')
print(alert)
Printed value of alert: “Green Flag”

If we reassign the first value in vec to be 11, the function will assume our statement is true and execute the True block because sum(vec>10) will result in the number 1.

vec <- c(11, 0, 2, 5, 3)
alert <- ifelse(sum(vec > 10), 'RED FLAG', 'Green Flag')
print(alert)
Printed value of alert after reassigning vec and running ifelse(): “RED FLAG”

Technically, both examples are flawed because our conditional is incorrect. Neither evaluates to a True or False as it should. The difference is that the if_else() function catches that and errors out. While this used to annoy me at first, it is actually a lifesaver. I would have assigned an incorrect value to my variable, which could have had undesired consequences on my code down the road.
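The silent behavior of ifelse() follows from how R converts numbers to logicals. The sketch below uses a made-up counts vector to show the conversion. The is_strict check at the end is one possible way to add an if_else()-style guard in base R, not something ifelse() does for you.

```r
# Hypothetical counts, standing in for results like sum(vec > 10)
counts <- c(0, 1, 5)

# as.logical() turns 0 into FALSE and every other number into TRUE
as.logical(counts)                  # FALSE TRUE TRUE

# ifelse() applies the same conversion to its test without any warning
flags <- ifelse(counts, "RED FLAG", "Green Flag")
print(flags)                        # "Green Flag" "RED FLAG" "RED FLAG"

# Optional guard: confirm the condition is logical before trusting the result
is_strict <- is.logical(counts)
print(is_strict)                    # FALSE
```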

That being said, there are cases where we may want our function to be less strict. For instance, if our workflow needs to check for any value at all as opposed to a specific value, ifelse() may come in handy. I have found this function to be helpful when creating complex R Shiny apps with error catching. Sometimes it is difficult to anticipate all error types and ifelse() offers a quick solution for cataloging any kind of error that may come up. This can help create more elaborate tryCatch blocks. I’m sure there are more sophisticated solutions out there, but I find that ifelse() comes in handy in cases like these.

Second Point of Contrast: Condition and Result Length

This one is a little bit harder to demonstrate but stay with me. The general idea is that if_else() will check to make sure that the result of the test condition and the value assigned in each case are equal in length. In the example below, I will create a DataFrame with column test_col. I will then add a second column that will indicate whether all of the values in test_col are unique or not. If the values are unique, I want every row in the result column to have a list with the following 2 elements:

  1. Count of the unique rows in the dataframe
  2. A text string saying “Unique Values”

## Create a df with one column
df <- data.frame(test_col = c(1, 5, 7))
## df looks like:
print(df)
Printed df showing test_col with values (1, 5, 7)

As you can see, all the values are unique. Let’s add another column using dplyr’s if_else() function:

df$dplyr_rslt <- 
if_else(n_distinct(df$test_col) == nrow(df), list(nrow(df),' unique values' ), 'Dupes')

Since we used quite a few functions in this line, here is a description of what we are expecting each component to accomplish:

  • We are using the n_distinct() function to count the number of distinct values in column test_col.
  • We are then comparing these results to nrow(df). The nrow() function counts the number of rows in our dataframe.
  • If the two values are equal, we want each row in the column to contain a list object with the two elements described above. To accomplish this, in our True execution block, we use the list() function.

What happens when we execute the code?

R will instantly throw an error and a new column will not be created. This is because the value we want to use when our condition is True contains 2 elements and is therefore of length 2. Our conditional returns a result of length 1 (True/False) which does not match our replacement (list of 2 elements). Essentially, the function is confused and does not know how to place this replacement in the output column so it does nothing.

if_else() length mismatch error shown in the R console: `true` must be length 1 (length of `condition`), not 2.
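The same length rule is easier to see with a longer condition. In the sketch below, cond and the letter values are made up for illustration. tryCatch() is used only so the script keeps running after the error, and the message it returns is my own text, not dplyr’s.

```r
library(dplyr)

cond <- c(TRUE, FALSE, TRUE)

# Allowed: `true` matches the condition's length, `false` is length 1 and is recycled
ok <- if_else(cond, c("a", "b", "c"), "z")
print(ok)    # "a" "z" "c"

# Not allowed: `true` has length 2 while the condition has length 3
res <- tryCatch(
  if_else(cond, c("a", "b"), "z"),
  error = function(e) "length mismatch caught"
)
print(res)   # "length mismatch caught"
```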

Now let’s see what happens if we try using ifelse() instead.

df$base_rslt <- 
ifelse(n_distinct(df$test_col) == nrow(df), list(nrow(df),' unique values' ), 'Dupes')
#Let's see what df looks like now
print(df)
Printed df: the base_rslt column displays (3, 3, 3)

The code runs without any issues but the result is incorrect. We have a new column called base_rslt but each row consists of just a row count of (3). This is easy to spot here but you can see how this might get out of hand as data sets get larger.
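The reason is that ifelse() shapes its output like the test, which here has length 1, so only the first element of the list survives. One way to get the intended column, sketched here in base R, is a plain if/else that returns the whole list, followed by rep() to place a copy in each row. The names all_unique, value, and rslt are my own and only for illustration.

```r
df <- data.frame(test_col = c(1, 5, 7))
all_unique <- length(unique(df$test_col)) == nrow(df)

# ifelse() keeps one element because the test has length 1
truncated <- ifelse(all_unique, list(nrow(df), "unique values"), "Dupes")
length(truncated)   # 1

# A plain if/else returns the whole two-element list
value <- if (all_unique) list(nrow(df), "unique values") else "Dupes"

# rep() makes one copy per row, giving a list column
df$rslt <- rep(list(value), nrow(df))
str(df$rslt[[1]])
```

This is a different tool from either function in this article. When the condition is a single True/False and the result is a multi-element object, the ordinary if/else block is often the better fit.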

Other Key Differences Between if_else() and ifelse()

I found that the differences in the sections above were hardest to wrap my head around and warranted more explanation. Below are a few more differences that are worth highlighting:

  1. if_else() has a built-in handler for NA values. The function takes a fourth, optional argument, missing, whose value is used wherever the condition is NA: if_else(condition, true, false, missing = NULL).
  2. if_else() preserves data types. It requires the true and false values to be compatible types and returns that type, so if both are factors then the resulting assignment will be a factor as well. However with ifelse(), this is not guaranteed. It strips attributes, so a factor comes back as its underlying integer codes instead of a factor.
  3. Base R’s ifelse() is slower. I’ll admit I haven’t been able to prove this one out myself, but it has come up in a few posts. This post indicates that the dplyr if_else() was 70% faster for their use case. This guide on Efficient R Programming also mentions that dplyr’s version is faster.
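The first two points can be checked with a short sketch. The vectors x, f, and fallback are hypothetical values chosen for illustration. The two factors are given identical levels so that the example does not depend on how a particular dplyr version combines factors with different levels.

```r
library(dplyr)

# 1. `missing` supplies a value wherever the condition is NA
x <- c(5, NA, 20)
dplyr_na <- if_else(x > 10, "High", "Low", missing = "Unknown")
base_na  <- ifelse(x > 10, "High", "Low")
print(dplyr_na)   # "Low" "Unknown" "High"
print(base_na)    # "Low" NA "High"

# 2. Type preservation with factors in the true and false values
f <- factor(c("a", "b", "c"))
fallback <- factor(rep("c", 3), levels = levels(f))
base_res  <- ifelse(f == "a", f, fallback)
dplyr_res <- if_else(f == "a", f, fallback)
class(base_res)   # "integer": the factor's underlying codes
class(dplyr_res)  # "factor"
```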

In conclusion, the dplyr variation is safer to use in most cases because it is stricter, but as mentioned above, there are instances where base R’s leniency comes in handy.

Additional Resources

  • Just Getting Started with ifelse()? Check out the official documentation.
  • Learn more about the dplyr version of if_else() here.
  • Interested in other conditional approaches? This blog post does a great job at comparing a few more advanced alternatives.

Meet the Author

Neha Anwer

Neha is a data science professional with domain experience in the financial advisory field. She has been a volunteer at Statistics Without Borders for over a year and most recently joined the SWB Marketing & Communications team. In her spare time, Neha enjoys traveling, curling up with a fiction read, and spending time outdoors.

Statistics Without Borders

Statistics Without Borders (SWB) is an apolitical pro bono organization under the auspices of the American Statistical Association.