Introduction to R for Data Science (Part Five)

This is the fifth part of my introduction to R. It covers the apply functions, math functions, regular expressions, and dplyr and its features.

Ivan Huang
3 min read · Mar 30, 2023

*Originally published on my Substack. This is just a part of the article.

PS: Please read ‘Introduction to R for Data Science (Part Four)’ before reading this one. This is a continued version of part four.

Part four: Introduction to R for Data Science (Part Four)

Apply

The sample() function picks a number at random. It will usually give us a different number every time you run the code.
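As a minimal sketch of what this looks like, here is sample() drawing one number. The range 1:100 and the variable names are my own choices for illustration, not necessarily what the original example used.

```r
# Draw one random integer between 1 and 100 (range chosen for illustration)
random_number <- sample(1:100, 1)
print(random_number)

# Running the same call again will usually print a different value
print(sample(1:100, 1))

# set.seed() fixes the random number generator, so the same draw repeats
set.seed(42)
first_draw <- sample(1:100, 1)
set.seed(42)
second_draw <- sample(1:100, 1)
print(first_draw == second_draw)  # TRUE
```

If you need the same "random" numbers on every run, for example to reproduce a result, call set.seed() first.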

lapply() takes a vector (or a list) and a function, and applies that function to every element. In a nutshell, it's saying: go to every number and add a random number to it. The result is stored as a list.
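Here is a small sketch of that idea. The vector v and the helper add_random() are hypothetical names I made up for this example. Note that lapply() takes the vector first and the function second.

```r
# A vector of five numbers
v <- 1:5

# Hypothetical helper: adds a random integer between 1 and 100 to x
add_random <- function(x) {
  random_number <- sample(1:100, 1)
  x + random_number
}

# Apply add_random() to every element of v; the result is a list
result_list <- lapply(v, add_random)
print(result_list)
```

Each element of result_list is the original number plus its own random draw, so the output changes from run to run.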

But sometimes we don't want a list, so we can use sapply() instead. This will give us a vector of five values (since we have five numbers), each with a random number added.
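The same sketch with sapply() looks like this. Again, v and add_random() are hypothetical names used only for illustration.

```r
# A vector of five numbers
v <- 1:5

# Hypothetical helper: adds a random integer between 1 and 100 to x
add_random <- function(x) {
  x + sample(1:100, 1)
}

# sapply() simplifies the result to a plain vector instead of a list
result_vector <- sapply(v, add_random)
print(result_vector)
print(is.list(result_vector))  # FALSE
```

sapply() tries to simplify its result, so when every call returns a single number you get a vector back.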

Math Functions

Basic math functions in R:

  • abs()
  • sum()
  • mean()
  • round()

abs() will give you the absolute value.

sum() will return the sum of all the values.

mean() will give you the mean.

round() will round a decimal number. In this case, I want it to round to two decimal places, so I pass 2 as the second argument after the number. You can adjust it to however many decimal places you want.
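To make these four functions concrete, here is a short sketch. The input values are made up for illustration.

```r
# Example values (chosen for illustration)
values <- c(1, 2, 3, 4, 5)

abs_value <- abs(-2)           # absolute value: 2
total     <- sum(values)       # sum of all values: 15
average   <- mean(values)      # mean of all values: 3
rounded   <- round(3.14159, 2) # round to two decimal places: 3.14

print(abs_value)
print(total)
print(average)
print(rounded)
```

The second argument of round() is the number of decimal places to keep. If you leave it out, round() rounds to a whole number.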

Here is a reference card to have when programming with R: https://cran.r-project.org/doc/contrib/Short-refcard.pdf

Regular Expressions

We’re going to focus on:

  • grepl()
  • grep()

grepl() takes the term you're searching for as its first argument. In this case, I want to search for “there”. The second argument is the thing you want to search in, which is text in my case. This returns TRUE because ‘there’ is inside the text. If I search for something that isn't in the text, it returns FALSE.
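A minimal sketch of this search follows. The sentence stored in text is a made-up example, not necessarily the one used in the original.

```r
# Example sentence to search in (made up for illustration)
text <- "Hi there, how are you today?"

found_there   <- grepl("there", text)    # "there" is in the text
found_goodbye <- grepl("goodbye", text)  # "goodbye" is not in the text

print(found_there)    # TRUE
print(found_goodbye)  # FALSE
```

The "l" in grepl() stands for logical: it answers the yes/no question of whether the pattern was found.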

grep() will return the index location of each match. In this case, I want to know where ‘b’ is, so I use grep(), and the result shows that it is at the second position. If there were multiple ‘b’s, it would show multiple locations, as it does for ‘c’.
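Here is a small sketch of grep() on a character vector. The vector v is my own example, set up so that ‘b’ appears once and ‘c’ appears twice.

```r
# Example vector (made up for illustration): 'b' once, 'c' twice
v <- c("a", "b", "c", "d", "c")

b_positions <- grep("b", v)  # index of the element containing 'b'
c_positions <- grep("c", v)  # indices of the elements containing 'c'

print(b_positions)  # 2
print(c_positions)  # 3 5
```

Unlike grepl(), which returns TRUE or FALSE, grep() returns the positions of the matching elements.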

Data Manipulation
