Basics in R Programming

About to begin a project in R? Before you watch any tutorial, read these basic standards.

Alexandre Sapet
Sicara's blog
Apr 29, 2019


Read the original article on Sicara’s blog here.

I spent the last 8 weeks working in R, and I must admit that, after many months of Python and JavaScript, R's ways of working almost knocked me out.

As a consultant, I work on various projects as part of a team. On my last one, we developed a tailor-made R solution for what was essentially a scraping project of the kind usually written in Python. Sadly, the client's environment forced us to use R. Are you used to developing in a state-of-the-art IDE? Are you familiar with basic programming principles? Then read on!

Customize RStudio

Begin by setting up the best possible programming environment, so that the many hours you are about to spend in it are as comfortable as possible. The last thing you want is to lose hours on a trivial “bug” such as this one:

Yes, it took us hours to realize that this O is not a 0.
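When a font makes two characters look alike, one quick way to tell them apart is to inspect their code points. The snippet below is a minimal sketch; `looks_like_o_zero_mix()` is a hypothetical helper (not from any package) that flags strings where a capital letter O sits right next to a digit, a common sign that one was typed for the other.

```r
# Code points differ even when the glyphs look identical on screen.
utf8ToInt("O")       # 79: capital letter O
utf8ToInt("0")       # 48: digit zero

# Hypothetical helper: TRUE when a capital O touches a digit.
looks_like_o_zero_mix <- function(x) grepl("[0-9]O|O[0-9]", x)

looks_like_o_zero_mix(c("page_1O", "page_10"))  # TRUE FALSE
```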

So, start by configuring RStudio:

  • Choose a font that clearly distinguishes look-alike characters such as O and 0 (Global Options > Appearance)
  • Choose a theme that fits your eyes and your taste (Global Options > Appearance)

Moreover, if you use git, these settings keep whitespace noise out of your diffs:

  • Strip trailing whitespace on save (Global Options > Code > Saving)
  • Make sure your files end with a newline (Global Options > Code > Saving)
  • Encode your files in UTF-8 (Global Options > Code > Saving). Note that this setting may change how files saved in other encodings are displayed when you open them.
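If you prefer to script these settings rather than click through the dialog, recent RStudio versions (1.3 and later) store them in `rstudio-prefs.json`, and the rstudioapi package can write them with `writeRStudioPreference()`. The sketch below assumes the preference names `strip_trailing_whitespace`, `ensure_trailing_newline` and `default_encoding`; check them against your RStudio version before relying on them. Outside RStudio it does nothing.

```r
# Assumed rstudio-prefs.json names for the Code > Saving options above.
git_friendly_prefs <- list(
  strip_trailing_whitespace = TRUE,   # strip trailing whitespace on save
  ensure_trailing_newline   = TRUE,   # end every file with a newline
  default_encoding          = "UTF-8" # default text encoding
)

# Only write preferences when rstudioapi is installed and RStudio is running.
if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
  for (pref in names(git_friendly_prefs)) {
    rstudioapi::writeRStudioPreference(pref, git_friendly_prefs[[pref]])
  }
}
```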

Naming standards

Only in R have I seen so many naming conventions coexist:

dot.case, camelCase, you name it.

Choose one naming convention and stick to it throughout your project. Fixing it afterward with multi-cursor editing almost never works. Do not let R's own inconsistency influence you.
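To see why R is no model here, note that base R mixes conventions in its own function names. The sketch below lists a few real examples, then defines `is_snake_case()`, a hypothetical helper assuming your team picked snake_case. For a real project, a linter such as lintr's `object_name_linter()` does this job more thoroughly.

```r
# Real base R functions, each following a different convention.
base_examples <- c(
  "as.numeric",  # dot.case
  "seq_along",   # snake_case
  "setNames",    # camelCase (stats package)
  "UseMethod",   # PascalCase
  "Sys.time"     # a mix of both
)
all(vapply(base_examples, exists, logical(1)))  # TRUE: all exist by default

# Hypothetical check for your own names, assuming snake_case was chosen.
is_snake_case <- function(x) grepl("^[a-z][a-z0-9]*(_[a-z0-9]+)*$", x)

is_snake_case(c("scrape_page", "scrapePage", "scrape.page"))  # TRUE FALSE FALSE
```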

Not convinced by the naming conventions? Check out this article.

Unit-tests

As long as you develop your code as a package, R offers an easy testing environment. Keep in mind that the more you unit-test your code, the more confident you can be in what it actually does.

If you are new to unit testing or don’t see the point, spend 5 minutes on this StackOverflow thread.
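Assuming you use the testthat package (the usual choice for R packages, set up with `usethis::use_testthat()`), a test could look like the sketch below. `parse_price()` is a hypothetical scraping helper invented for the example. In a package it would live in `R/`, and the `test_that()` calls in a file such as `tests/testthat/test-parse_price.R`, run with `devtools::test()`; here both sit in one script so it runs on its own.

```r
library(testthat)

# Hypothetical helper: extract the first number from scraped text like "12.50 EUR".
parse_price <- function(text) {
  has_number <- grepl("[0-9]", text)
  out <- rep(NA_real_, length(text))  # NA where no number was found
  out[has_number] <- as.numeric(
    sub("^[^0-9]*([0-9]+(\\.[0-9]+)?).*$", "\\1", text[has_number])
  )
  out
}

test_that("parse_price extracts the numeric amount", {
  expect_equal(parse_price("12.50 EUR"), 12.5)
  expect_equal(parse_price("Price: 3 EUR"), 3)
})

test_that("parse_price returns NA when there is no number", {
  expect_true(is.na(parse_price("free")))
})
```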

Hidden behaviors

A lot happens behind the scenes when R runs your code, and some of it is far from obvious. Here are a few behaviors we discovered over those 8 weeks.

Checking if a variable is NA

One of our regular pain points was checking whether a variable is NA. There are several ways to do it, and they do not behave the same:

  • variable == NA: the double equals operator compares the value of your variable to NA. In R this comparison is meaningless: any comparison with NA returns NA rather than TRUE or FALSE (if you want more details, you can refer to this StackOverflow thread), so it won’t work.
  • is.na(variable): this function is vectorized. It works element-wise and returns a logical mask with the same shape as your variable.
  • identical(variable, NA): this function reliably tests whether your variable is a logical vector holding a single NA. However, it won’t recognize the other NA types in R.

Indeed, R has several types of NA: the logical NA, but also NA_integer_, NA_real_, NA_character_ and NA_complex_. On top of that, functions can return their own NA-like values, as in the package “rvest”.

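If you run into this kind of issue with NA values, use anyNA(variable): it returns a single TRUE or FALSE whatever the NA type. Keep in mind that it tells you whether the variable contains any NA, which is a different question from whether it is NA when it holds several values.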
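The following sketch shows how each check behaves on different NA types. `is_single_na()` is a hypothetical helper, not from the article, for the narrower question “is this one value NA?”.

```r
x <- c(1, NA, 3)

# Any comparison with NA yields NA, never TRUE or FALSE.
x == NA                       # NA NA NA
# if (x[2] == NA) ...         # error: missing value where TRUE/FALSE needed

# is.na() is vectorized: one logical per element.
is.na(x)                      # FALSE TRUE FALSE

# identical() also compares types, so only the logical NA matches.
identical(NA, NA)             # TRUE
identical(NA_character_, NA)  # FALSE (character NA vs logical NA)
identical(NA_real_, NA)       # FALSE (double NA vs logical NA)

# anyNA() gives a single TRUE/FALSE whatever the NA type...
anyNA(NA_character_)          # TRUE
anyNA(NA_real_)               # TRUE
# ...but it asks "does x contain an NA?", not "is x NA?".
anyNA(x)                      # TRUE

# Hypothetical helper: TRUE only for a length-one NA of any atomic type.
is_single_na <- function(v) length(v) == 1L && is.na(v)
is_single_na(NA_character_)   # TRUE
is_single_na(x)               # FALSE (length 3)
is_single_na(NULL)            # FALSE (length 0)
```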

Unexpected Autocompletion

R has a tendency to silently autocomplete, or partially match, a few key elements:

  • function arguments passed by name (keyword arguments);
  • column names accessed with $ on data frames.

It didn’t cause us any trouble, but I can imagine plenty of situations where it would.
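A minimal sketch of both cases; `describe()` and the column names are made up for the example. Note that `[[` matches names exactly by default, which makes it a safer way to pull out a column.

```r
# Hypothetical function with a long argument name.
describe <- function(verbose = FALSE) {
  if (verbose) "detailed output" else "short output"
}
describe(verb = TRUE)   # "detailed output": 'verb' partially matched 'verbose'

df <- data.frame(value_total = 1:3)
df$value                # 1 2 3, with a warning about the partial match
df[["value"]]           # NULL: [[ requires an exact name by default

lst <- list(value_total = 10)
lst$value               # 10, silently: $ on a plain list does not warn
```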

EDIT: You can make R, and therefore RStudio, display warnings whenever such partial matching happens by editing your .Rprofile:
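Assuming the warnings in question are R’s built-in partial-matching options, a `.Rprofile` along these lines would enable them. The three options below exist in base R; each makes R warn instead of matching silently.

```r
# ~/.Rprofile (or a project-level .Rprofile)
options(
  warnPartialMatchArgs   = TRUE, # e.g. describe(verb = TRUE) matching 'verbose'
  warnPartialMatchAttr   = TRUE, # partial matches of attribute names in attr()
  warnPartialMatchDollar = TRUE  # e.g. lst$value matching lst$value_total
)
```

Expect some of these warnings to come from third-party code that relies on partial matching, not only from your own.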

R is a functional programming language

Coming from Python, which I used in a more procedural/imperative style, I was surprised by the following behavior. R is a functional programming language, meaning that every call and every expression evaluates to a value.
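A short sketch of what “every expression is a value” means in practice. It illustrates the general idea and is not necessarily the specific behavior the full article goes on to describe; `classify()` is a made-up example function.

```r
# `if` is an expression, so its result can be assigned directly.
status <- if (nchar("scraped page") > 0) "non-empty" else "empty"

# A function returns the value of its last evaluated expression.
classify <- function(n) {
  if (n > 0) "positive"
}
classify(5)    # "positive"
classify(-5)   # invisible NULL: a false `if` without `else` evaluates to NULL

# Assignment is an expression too: it invisibly returns the assigned value.
total <- (count <- 3) + 1   # count is 3, total is 4

# Functions are values, so they can be passed to other functions.
squares <- vapply(1:3, function(i) i^2, numeric(1))   # 1 4 9
```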

Read the full article on Sicara’s blog here.
