R Function Of The Week: ifelse() vs. if_else()

9 min read · Apr 6, 2022

For the longest time, I thought both of these were exactly the same until I started seeing errors in my results.

This Article Contains

• An overview of both functions and general use case
• Important differences between the two approaches with examples
• Other key differences to consider
• Additional Resources on `ifelse()` and `if_else()` as well as other conditional programming functions

This week at Statistics Without Borders, we discuss the differences between `ifelse()` and `if_else()`. For the longest time, I thought these two were exactly the same until I started seeing errors in my results. I tried to understand the difference between the two functions, but most of the posts out there just describe dplyr’s `if_else()` as being a stricter version of base R’s `ifelse()`. While this is true, it doesn’t necessarily give users a good understanding of when to use which one. In this article, I will attempt to make these distinctions clearer.

Overview of Both Functions

If you’ve used R before, you probably know that you can use `if`, `else if`, and `else` to control the flow of your program. Below is an example of how I would assign a value to a new variable `x` based on the sum of a vector using the `if(){}else{}` code block.

```r
## Create vector vec
vec <- c(1, 0, 2, 5, 3)

## The sum of vec is equal to 11
sum(vec)

if (sum(vec) > 10) {
  # Value if true goes in first curly brackets
  x <- 'Greater than 10'
} else {
  # Value if false goes in second curly brackets
  x <- 'Less than 10'
} # End if else block

print(x)
```

This certainly gets the job done, but it is a lot of code for a simple operation.

At a high level, both `ifelse()` and `if_else()` can be used as shorthand for the longer `if(){}else{}` code block. Using either of these, we can cut the code down to one line like so:

```r
## Using ifelse()
x <- ifelse(sum(vec) > 10, 'Greater than 10', 'Less than 10')
print(x)

## Using dplyr's if_else()
library(dplyr)
x <- if_else(sum(vec) > 10, 'Greater than 10', 'Less than 10')
print(x)
```

Both of these will work and give you the correct result. I find the shorthand to be especially helpful for creating new data frame columns or variables. However, the dplyr version runs a few checks before returning a value.
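What makes the shorthand convenient for columns is that both functions test a vector element by element, whereas a plain `if` expects a single True/False. A minimal base R sketch, using a made-up data frame `scores` that is not part of the example above:

```r
## Hypothetical data, used only for illustration
scores <- data.frame(points = c(4, 12, 10, 25))

## ifelse() tests every element and returns one label per row
scores$label <- ifelse(scores$points > 10, 'Greater than 10', '10 or less')
print(scores)
```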

Important Differences Between the Two

As I mentioned earlier in the article, `if_else()` is stricter than `ifelse()`. What exactly does this mean?

For one thing, `if_else()` will throw an error:

1. If the test condition does not yield True/False values (a logical vector), or
2. If the length of a result value does not match the length of the condition (a single value is the exception, since it is reused for every element)

This can be confusing to wrap your head around. We can explore what this means using the same use case from above.

First Point of Contrast: Expecting True/False

Let’s break down the True/False conditional expectation. We know that a line of code can result in varying data types such as numbers, characters, booleans, etc. The `if_else()` function expects a Boolean (True/False, which R calls a logical) output from the expression entered as the first argument. If the result is anything else, such as an integer or a text string, it errors out.

The `ifelse()` function, on the other hand, is very loose with its definition of a test condition. It does not check whether the result of the first argument is a boolean value before continuing on to the remaining arguments; it simply converts whatever it receives to True/False. This can lead to the wrong branch being executed.
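A quick base R sketch of that leniency: a numeric test is silently converted, so 0 is treated as False and any other number as True.

```r
## Numeric tests are silently coerced: 0 becomes FALSE, non-zero becomes TRUE
ifelse(0, 'yes', 'no')    # "no"
ifelse(5, 'yes', 'no')    # "yes"
ifelse(-2, 'yes', 'no')   # "yes"

## The same coercion applies element by element; NA stays NA
res <- ifelse(c(0, 2, NA), 'yes', 'no')
print(res)                # "no" "yes" NA
```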

As an example, using the same `vec` from above, assume we want to:

• Assign a value to variable `alert` based on the value of the sum of `vec`. If `vec` adds up to more than 10, we want our `alert` variable to say “RED FLAG.” If not, we want the variable to say “Green Flag.”
• Examine the value of variable `alert` if we alter the condition slightly like so `sum(vec>10) #instead of sum(vec)>10`.
• Walk through what happens if we change the values in `vec`

Recall that the individual elements in `vec` add up to 11. Let’s construct an `if_else()` statement to assign a value to `alert` using the example condition from above.

`alert <- if_else(sum(vec > 10), 'RED FLAG', 'Green Flag')`

We immediately get an error. `sum(vec > 10)` evaluates to 0 because none of the individual elements of `vec` is greater than 10. Since 0 is an integer and not a boolean value, our condition does not supply the function with the data type it expects, so `if_else()` stops with an error saying that the condition must be a logical vector (the exact wording varies by dplyr version).
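To see the message your installed dplyr version produces without halting a script, the call can be wrapped in `tryCatch()`. This is only a sketch; the name `result` is my own, and the message text is not reproduced here because it differs between dplyr releases.

```r
library(dplyr)

vec <- c(1, 0, 2, 5, 3)

## sum(vec > 10) is the integer 0, not TRUE/FALSE, so if_else() refuses it
result <- tryCatch(
  if_else(sum(vec > 10), 'RED FLAG', 'Green Flag'),
  error = function(e) e   # return the error object instead of stopping
)

## Print whatever message this dplyr version raised
print(conditionMessage(result))
```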

If we do this same thing using the `ifelse()` variation instead, our alert variable will have the value “Green Flag,” which in this case is incorrect since the sum of `vec` is greater than 10. This is happening because the conditional statement itself is incorrect. The statement is counting the number of individual elements that are greater than 10 rather than calculating the sum of all elements. Since the conditional results in a 0, the function assumes our statement is False and executes the False block.

```r
alert <- ifelse(sum(vec > 10), 'RED FLAG', 'Green Flag')
print(alert)
```

If we reassign the first value in `vec` to be 11, the function will assume our statement is true and execute the True block because `sum(vec>10)` will result in the number 1.

```r
vec <- c(11, 0, 2, 5, 3)
alert <- ifelse(sum(vec > 10), 'RED FLAG', 'Green Flag')
print(alert)
```

Technically, both examples are flawed because our conditional is incorrect. Neither evaluates to True or False as it should. The difference is that the `if_else()` function catches that and errors out. While this used to annoy me, it is actually a life saver. Without it, I would have assigned an incorrect value to my variable, which could have had undesired consequences for my code down the road.
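With the parenthesis in the right place, the condition is a single True/False and `ifelse()` returns the expected value. The guard below is a sketch of how one might get `if_else()`-style checking of the condition in base R; `strict_ifelse()` is a hypothetical helper written for this example, not part of any package.

```r
vec <- c(1, 0, 2, 5, 3)

## Corrected condition: compare the sum to 10
alert <- ifelse(sum(vec) > 10, 'RED FLAG', 'Green Flag')
print(alert)   # "RED FLAG"

## Hypothetical helper: refuse any test that is not logical
strict_ifelse <- function(test, yes, no) {
  stopifnot(is.logical(test))
  ifelse(test, yes, no)
}

## The misplaced parenthesis is now caught instead of silently accepted
caught <- tryCatch(
  strict_ifelse(sum(vec > 10), 'RED FLAG', 'Green Flag'),
  error = function(e) 'condition was not logical'
)
print(caught)  # "condition was not logical"
```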

That being said, there are cases where we may want our function to be less strict. For instance, if our workflow needs to check for any value at all as opposed to a specific value, `ifelse()` may come in handy. I have found this function to be helpful when creating complex R Shiny apps with error catching. Sometimes it is difficult to anticipate all error types, and `ifelse()` offers a quick solution for cataloging any kind of error that may come up. This can help create more elaborate tryCatch blocks. I’m sure there are more sophisticated solutions out there, but I find that `ifelse()` comes in handy in cases like these.

Second Point of Contrast: Condition and Result Length

This one is a little bit harder to demonstrate but stay with me. The general idea is that `if_else()` will check to make sure that the result of the test condition and the value assigned in each case are equal in length. In the example below, I will create a DataFrame with column `test_col`. I will then add a second column that will indicate whether all of the values in `test_col` are unique or not. If the values are unique, I want every row in the result column to have a list with the following 2 elements:

1. Count of the unique rows in the dataframe
2. A text string saying “Unique Values”

```r
## Create a df with one column
df <- data.frame(test_col = c(1, 5, 7))

## df looks like:
print(df)
```

As you can see, all the values are unique. Let’s add another column using dplyr’s `if_else()` function:

```r
df$dplyr_rslt <-
  if_else(n_distinct(df$test_col) == nrow(df), list(nrow(df), ' unique values'), 'Dupes')
```

Since we used quite a few functions in this line, here is a description of what we are expecting each component to accomplish

• We are using the `n_distinct()` function to count the number of distinct values in column `test_col`.
• We are then comparing these results to `nrow(df)`. The `nrow()` function counts the number of rows in our dataframe.
• If the two values are equal, we want each row in the column to contain a list object with two elements described above. To accomplish this, in our True execution block, we use the `list()` function.

What happens when we execute the code?

R will instantly throw an error and a new column will not be created. This is because the value we want to use when our condition is True contains 2 elements and is therefore of length 2. Our conditional returns a result of length 1 (True/False), which does not match our replacement (a list of 2 elements). Essentially, the function does not know how to place this replacement in the output column, so it stops rather than guessing.
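The length check can be seen in isolation with a simpler call. In this sketch (my own reduction of the example above, with made-up names `size_check` and `ok`), the condition has length 1 while the True value has length 2, so `if_else()` stops.

```r
library(dplyr)

## Condition has length 1, but the value for TRUE has length 2
size_check <- tryCatch(
  if_else(TRUE, c(1, 2), 0),
  error = function(e) 'error: lengths do not match'
)
print(size_check)

## Lengths that line up (or a single value to reuse) are accepted
ok <- if_else(c(TRUE, FALSE, TRUE), c(1, 2, 3), 0)
print(ok)   # 1 0 3
```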

Now let’s see what happens if we try using `ifelse()` instead.

```r
df$base_rslt <-
  ifelse(n_distinct(df$test_col) == nrow(df), list(nrow(df), ' unique values'), 'Dupes')

# Let's see what df looks like now
print(df)
```

The code runs without any issues but the result is incorrect. We have a new column called `base_rslt`, but each row contains only the row count (3); the text string has been silently dropped. This is easy to spot here, but you can see how this might get out of hand as data sets get larger.
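If a list in every row really is the goal, one option is to decide once with a plain `if`/`else` and then repeat the value for each row. This is a sketch of one possible approach in base R rather than the only solution; `all_unique`, `truncated`, `value`, and `rslt` are names I made up.

```r
df <- data.frame(test_col = c(1, 5, 7))

## A single TRUE/FALSE: are all values in test_col distinct?
all_unique <- length(unique(df$test_col)) == nrow(df)

## ifelse() keeps only the first element of the list (the row count)
truncated <- ifelse(all_unique, list(nrow(df), 'unique values'), 'Dupes')
str(truncated)

## Plain if/else returns the whole two-element list
value <- if (all_unique) list(nrow(df), 'unique values') else 'Dupes'

## Repeat it once per row to build a list column
df$rslt <- rep(list(value), nrow(df))
str(df$rslt[[1]])
```

Because the condition here is a single value for the whole data frame, a plain `if`/`else` is a natural fit; the vectorized functions are better suited to row-by-row conditions.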

Other Key Differences Between if_else() and ifelse()

I found that the differences in the sections above were hardest to wrap my head around and warranted more explanation. Below are a few more differences that are worth highlighting:

1. if_else() has a built-in handler for NA values. The function takes an optional fourth argument, `missing`, which supplies the value to use wherever the condition is NA: `if_else(condition, true, false, missing = NULL)`.
2. if_else() preserves data types. For example, if the `true` and `false` values are factors or dates, the result will be a factor or date as well. With `ifelse()`, this is not guaranteed because it strips attributes: a factor may come back as its underlying integer codes and a date as a plain number.
3. Base R’s ifelse() is reportedly slower. I’ll admit I haven’t been able to prove this one out; however, it has come up in a few posts. This post indicates that the dplyr `if_else()` was 70% faster for their use case. This guide on Efficient R Programming also mentions that dplyr’s version is faster.
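The first two points can be sketched in a few lines. The variable names below are mine, and the dates are arbitrary values chosen for illustration.

```r
library(dplyr)

## 1. The `missing` argument supplies a value where the condition is NA
flags <- if_else(c(TRUE, NA, FALSE), 'yes', 'no', missing = 'unknown')
print(flags)   # "yes" "unknown" "no"

## 2. Classes: ifelse() drops the Date class, if_else() keeps it
d1 <- as.Date('2022-04-06')
d2 <- as.Date('2022-01-01')

base_res  <- ifelse(TRUE, d1, d2)    # comes back as a plain number
dplyr_res <- if_else(TRUE, d1, d2)   # still a Date

print(class(base_res))    # "numeric"
print(class(dplyr_res))   # "Date"
```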

In conclusion, the dplyr variation is safer to use in most cases because it is stricter, but as mentioned above, there are instances where base R’s leniency comes in handy.

Additional Resources

• Just Getting Started with `ifelse()`? Check out the official documentation.
• Learn more about the dplyr version of `if_else()` here.
• Interested in other conditional approaches? This blog post does a great job at comparing a few more advanced alternatives.

Meet the Author

Neha Anwer

Neha is a data science professional with domain experience in the financial advisory field. She has been a volunteer at Statistics Without Borders for over a year and most recently joined the SWB Marketing & Communications team. In her spare time, Neha enjoys traveling, curling up with a fiction read, and spending time outdoors.


Statistics Without Borders (SWB) is an apolitical pro bono organization under the auspices of the American Statistical Association.