How to Convert LEFT(), MID() and RIGHT() into R

Michael T Vu
5 min readNov 3, 2022

--

To begin with the topic today, let’s have a look on this scenario. Imagine when you have a dataset with multiple fields and there might be time you need to extract part of the string. It could be the beginning, the middle, or the end of the string. How would you do that?

In Excel, it is very simple to do in a second as the following:

If you are used to working with Excel, you know that we have LEFT(), MID() and RIGHT() function to extract part of the string. However, when working with R, you don’t have similar function like its counterpart. There are many ways to extract string in R, but it is useful for those who get used to Excel or those are new to R if those functions are available in R. It’s time to get your hand dirty.

Convert LEFT(), MID(), and RIGHT() from Excel to R

Converting LEFT() from Excel to R

To do that we need to understand how str_sub() function works. Basically, we need to pass the string, the position of the first character and the position of the last character. If you don’t pass the position of the string, it will get the default.

library(dplyr)library(stringr)df <- data.frame(CODE = c("100M885", "250K460", "871L420", "430T210"))my_left <- function(data,num){str_sub(data, end = num)}

data: pass your data to the function

num: the number of characters you want to extract from the left

end = num: passing num to end as the last position to extract in the str_sub(). This function we don’t need to set start = 1 because it has already been set by default as the first character.

Let’s try the newly function we have just created. We create a new column named LEFT and apply my_left() function, pass the data and the number of characters into it and see how it works.

df2 <- df %>%
mutate(LEFT = my_left(CODE, 3))

Here is how it looks. It works as our expectation. We get the first three character from the left of the string.

Next time, you can reuse this function without coding it again. Generally, you write an R file and put all your function there, when you start a new project, you just need to refer to the file source as an example below in your R script. I am not going off the main focus today. Let’s talk in another article.

source(“C:\\MyFolder\\left_mid_right.R”)

Convert MID() from Excel to R

In my_left() function, we only specify the end argument, because the system understand the beginning is 1 by default. However, MID() function we need to specify the start position to the end position where we want to extract the string. Let’s say

my_mid <- function(data, start_num, end_num){str_sub(data, start = start_num, end = end_num)}

This function is similar to my_left() but we specify the start position as we wish. Let’s try

df2 <- df %>%mutate(LEFT = my_left(CODE, 3),MID = my_mid(CODE, 4,4))

It extracts only the character in the position 4 in the middle of the string. You can try it yourself by changing the start_num and end_num. Now you have successfully converted LEFT() and MID() function from Excel into R. Let’s move on to complete the RIGHT() function.

Convert RIGHT() from Excel to R

Ideally, users only need to pass the data and the number of characters from the right of the string. How could we do it? This time is a little bit different with the other two when we build the function behind the scene.

Look at this string “100M885”, how does the system understand if we only pass the number 3. Whether 3 characters from the left or 3 characters from the right. The logic is to extract only the characters from the right of the string, we already know the end position of the string (we don’t specify the end position, then the system will take the last character as the end position). What is the start position of “885” in the “100M885”? By our naked eyes, we know the position is of 885 is 5, 6 and 7 respectively.

For example, we only need 3 characters from the right of the “100M885”. The only thing we need to specify is the start position. The length of “100M885” is 7. Here is the logic to get there. Take the length of the string minus the number of characters in the right plus 1 will return the start position. 7 — 3 + 1 = 5

start = 5

my_right <- function(data, num){str_sub(data, start = str_length(data)-num+1)}df2 <- df %>%mutate(LEFT = my_left(CODE, 3),MID = my_mid(CODE, 4,4),RIGHT = my_right(CODE,3))

It works like a charm. From now then on, you can reuse these handy functions in a similar way to LEFT(), MID() and RIGHT() in Excel. As time goes by, it saves you tons of time when using these handy functions. You do not need to write it again and again or sometimes you think about regular expression to split strings. These functions are very handy in many cases. If you find it useful, don’t forget to hit the clap button and follow me.

See you in the next one.

--

--