Tips for organizing your R code

Embrace every tip that will help make your R code more readable, better organized, and cleaner

Tomaz Kastrun
R-evolution
6 min read · Jan 26, 2023


Keeping your R code organized is not as straightforward as one might think. Just think about the libraries, variables, functions, and much more. All these objects can be defined and later overwritten, and some may become obsolete along the way.

Organization becomes even more crucial when you are part of a larger group of engineers and scientists who collaborate with you.

Photo by Raymond Rasmusson on Unsplash

Motivation

The most important step toward code reproducibility is to keep code organized and atomic, to store it by layers or components, and to keep the documentation up to date. Because R is a scripting language, organizing files into directories and subdirectories is important for later reuse and for collaboration with different departments in an organization.

Ideally, the names of files and subdirectories are self-explanatory, so that one can tell at a glance what data files contain, what scripts do, and what came from what.

The following R tips are based on frequent problems that many organizations face. All R samples and themes are fictional and can be adapted to your organizational environment. The iris dataset is used to show the custom theme and the use of functions. All images, unless otherwise noted, are by the author.

1. Organising R files

You can always call R files, libraries, functions, and settings from a different file. This leads naturally to a folder structure that each developer can clone or access to get all the necessary files, themes, and functions that the organization provides.

Organising R folders and R scripts

Structuring R files, functions, data, and more is an essential step toward reproducibility.

Using Projects (available in Posit's RStudio) is a great place to start, but you can always create your own structure that helps you with code and file organization.
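As a minimal sketch of such a structure, the snippet below creates a few folders and sources every R script found in one of them. The folder names, the helpers.R file, and the add_numbers() function are hypothetical and only serve as an illustration; a temporary directory is used so the example runs anywhere.

```r
# Hypothetical project root; replace tempdir() with your own project path
project_dir <- file.path(tempdir(), "my_project")
folders <- c("data", "functions", "themes", "output")

# Create the folder structure (no warning if a folder already exists)
for (folder in folders) {
  dir.create(file.path(project_dir, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# Illustrative helper file that a colleague might have committed
writeLines(
  "add_numbers <- function(a, b) a + b",
  file.path(project_dir, "functions", "helpers.R")
)

# Source every R script stored in the functions folder
function_files <- list.files(
  file.path(project_dir, "functions"),
  pattern = "\\.R$", full.names = TRUE
)
for (function_file in function_files) {
  source(function_file)
}

total <- add_numbers(2, 3)
```

Sourcing a whole folder keeps the main script short, and any new helper file dropped into the folder is picked up without further changes.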

2. Installing and attaching R libraries

Installing and attaching R libraries is part of almost every R script. Whenever you write R code, there will come a point where you reference an external library.

You don’t want to install and attach libraries one at a time, with a separate install.packages() and library() call for each package.

Instead, you can create a character vector of library names, install the ones that are missing, and attach all of them with much shorter code.
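One possible way to write this is sketched below. The helper name install_and_attach() is hypothetical, and the package names are base packages chosen only so that the example runs without downloading anything. In a non-interactive session, install.packages() may also need a repos argument.

```r
# Hypothetical helper: install whatever is missing, then attach everything
install_and_attach <- function(packages) {
  # TRUE for every package that can already be loaded
  is_installed <- vapply(packages, requireNamespace, logical(1),
                         quietly = TRUE)
  missing_packages <- packages[!is_installed]

  if (length(missing_packages) > 0) {
    install.packages(missing_packages)
  }

  # library() stops with an error if a package still cannot be attached
  for (pkg in packages) {
    library(pkg, character.only = TRUE)
  }
  invisible(packages)
}

# Example package list (replace with the packages your project needs)
packages <- c("stats", "utils", "tools")
install_and_attach(packages)
```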

In many enterprise environments, you might have issues with installing some packages on your local disk. These packages may contain *.zip or *.exe files, and the security policy will block them. In this case, the best solution is to install packages into dedicated folder(s), as introduced in “Organising R files”. You will have to specify both the installation path and the loading path for the packages.
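A minimal sketch of working with a dedicated library folder follows. The folder location is an assumption (a temporary directory is used here so the example is harmless), and the package name in the commented lines is only an example.

```r
# Hypothetical dedicated library folder; point this at your approved location
lib_dir <- file.path(tempdir(), "r_libs")
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# Put the folder first on the library search path for this session
.libPaths(c(lib_dir, .libPaths()))

# Install into, and load from, the dedicated folder (example package name):
# install.packages("data.table", lib = lib_dir)
# library(data.table, lib.loc = lib_dir)
```

Once the folder is on .libPaths(), plain install.packages() and library() calls will use it by default, so the lib and lib.loc arguments become optional.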

The next step is to use a TXT file and write down all the packages needed for an R project or script. Let’s create a requirements.txt file (just like with YAML, Python,…) and put the package names inside it.

Consider two R packages that help you install from the requirements.txt file: requiRements and versions. Both work similarly when your package list is stored in requirements.txt, but the versions package also accepts the package version as input, which brings a whole new capability. On the other hand, the base function install.packages() lets you specify arguments such as repos, lib (path), and destdir in detail, along with many system variables. These are essential for storing packages at the desired location.
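If you prefer to stay with base R, the idea can be sketched as below. This is not how requiRements or versions work internally; it is only an illustration, and the helper name read_requirements() and the file contents are assumptions.

```r
# Write an illustrative requirements.txt (one package name per line)
req_file <- file.path(tempdir(), "requirements.txt")
writeLines(c("# project packages", "stats", "utils", "", "tools"), req_file)

# Hypothetical helper: read package names, skipping blanks and comments
read_requirements <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines[nzchar(lines) & !startsWith(lines, "#")]
}

packages <- read_requirements(req_file)

# Find the packages that cannot be loaded yet and install only those
is_installed <- vapply(packages, requireNamespace, logical(1),
                       quietly = TRUE)
missing_packages <- packages[!is_installed]
if (length(missing_packages) > 0) {
  install.packages(missing_packages)  # add lib = and repos = as needed
}
```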

If you want to simplify the process, you can always create a ZIP file of all working packages and restore and install it at any given time. In this case, it is advised to add the R version in the ZIP file as well.

Protip: use library() instead of require(). If a package is missing, library() stops with an error right away, whereas require() only returns FALSE with a warning and lets the script continue, causing failures later in the code.
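The difference can be seen in a small sketch. The package name below is made up and assumed not to be installed.

```r
missing_pkg <- "notARealPackage123"  # hypothetical, assumed not installed

# require() returns FALSE (with a warning) and the script keeps running
require_result <- suppressWarnings(
  require(missing_pkg, character.only = TRUE, quietly = TRUE)
)

# library() raises an error, which stops the script unless it is caught
library_result <- tryCatch(
  {
    library(missing_pkg, character.only = TRUE)
    "attached"
  },
  error = function(e) "error"
)
```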

3. Use the corporate themes

Every corporate environment should follow a theme with predefined colours, table design, pixel-perfect diagrams, and positions. With the ggplot2 package, you can create a theme that follows your design guidelines.

Furthermore, the theme setup can ensure that the same colour always represents the same KPI.
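As a rough sketch, assuming ggplot2 is installed, a corporate theme and a fixed colour mapping could look like this. The function name theme_corporate(), the colour codes, and the styling choices are all hypothetical.

```r
library(ggplot2)

# Hypothetical corporate palette: each category always gets the same colour
corporate_colours <- c(
  setosa     = "#003f5c",
  versicolor = "#bc5090",
  virginica  = "#ffa600"
)

# Hypothetical corporate theme built on top of theme_minimal()
theme_corporate <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = "bold", colour = "#003f5c"),
      panel.grid.minor = element_blank(),
      legend.position = "bottom"
    )
}

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point() +
  scale_colour_manual(values = corporate_colours) +
  labs(title = "Iris sepal dimensions") +
  theme_corporate()
```

Because the palette is a named vector, the mapping from category to colour stays the same in every chart, regardless of which categories appear in the data.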

Besides graphs, tables can also follow a similar theme. The R package flextable offers a great framework for creating tables with rich formats, layouts, cell formatting, and plotting capabilities.

With both packages, you will be able to create corporate reports and export them to different tools or formats (Word, PDF, PowerPoint, HTML, and others).
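A minimal table sketch, assuming flextable is installed, might look as follows. The helper name corporate_table() and the colours are hypothetical.

```r
library(flextable)

# Hypothetical helper that applies corporate header styling to a data frame
corporate_table <- function(data) {
  ft <- flextable(data)
  ft <- bg(ft, bg = "#003f5c", part = "header")      # header background
  ft <- color(ft, color = "white", part = "header")  # header text colour
  ft <- bold(ft, part = "header")
  autofit(ft)                                        # fit column widths
}

iris_table <- corporate_table(head(iris))

# Export to Word if needed (file name is an example):
# save_as_docx(iris_table, path = "iris_table.docx")
```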

4. Use coding practices and never forget to document

There are many practices that will improve your code's readability and reusability. I have grouped them into areas, each of which contributes to better code.

Documenting code

  • Start your code with an annotated description of what it does; this will help you when you have to read or change it in the future. Include the author name, date, and change log.
  • Load all dependencies, packages, and files in accordance with your file structure. Also note the global environment settings and the R version, and indicate which packages are necessary to run your code.
  • Use setwd() to set the location of the files (script, project, or packages), unless standards in your organisation make this obvious.
  • Use comments to mark off sections of code.
  • Comment your code with care. Comments should explain the why, not the what. Document each function with a description of all input arguments and the returned result.
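The points above could be sketched as follows. The script name, header fields, and the function calculate_group_mean() are hypothetical placeholders to adapt to your own standards.

```r
# ------------------------------------------------------------------
# Script:     iris_summary.R (hypothetical example)
# Purpose:    Summarise sepal length per species in the iris dataset
# Author:     <author name>
# Date:       <date>
# Change log: <date> initial version
# Requires:   base R only
# ------------------------------------------------------------------

# ---- Functions ----

#' Calculate the mean of a numeric column per group
#'
#' @param data      A data frame.
#' @param value_col Name of the numeric column to average (string).
#' @param group_col Name of the grouping column (string).
#' @return A data frame with one row per group and the group mean.
calculate_group_mean <- function(data, value_col, group_col) {
  # aggregate() keeps this example free of extra dependencies
  result <- aggregate(data[[value_col]],
                      by = list(data[[group_col]]),
                      FUN = mean)
  names(result) <- c(group_col, paste0("mean_", value_col))
  result
}

# ---- Analysis ----

iris_means <- calculate_group_mean(iris, "Sepal.Length", "Species")
```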

Syntax practices

  • Place spaces around all operators (=, +, -, <-, boolean operators, etc.).
  • Use <-, not =, for assignment.
  • When packages have similar function names, prefix the function with its package name, for example dplyr::filter(), which masks stats::filter() and is easily confused with Filter() from base R.
  • To improve readability, indent the code inside curly braces. The formatR package can help you reformat and indent your code.
  • Factor out common operations rather than repeating them, and keep your code in small chunks. If a single function or loop gets too long, look for ways to break it into smaller pieces.
  • Keep lines to 80 characters or fewer so that the code fits comfortably on a printed page at a reasonable size. If you find yourself running out of room, consider encapsulating some of the work in a separate function.
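A short sketch of several of these practices, using only functions that ship with R so that it runs without extra packages (the variable names are illustrative):

```r
values <- c(1, 2, 3, 4, 5)

# Explicit namespaces make it clear which "filter" is meant:
# stats::filter() applies a linear filter (here a 3-point moving average),
# while base::Filter() keeps the elements that satisfy a predicate.
moving_average <- stats::filter(values, rep(1 / 3, 3))
even_values <- base::Filter(function(value) value %% 2 == 0, values)

# Spaces around operators, <- for assignment, indented braces
if (length(even_values) > 0) {
  even_total <- sum(even_values)
}
```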

Naming convention

There are many naming conventions to choose from and all are ok, as long as you are using the selected one consistently. I will just list a few:

  • alllowercase: e.g. irisdataset
  • period.separated: e.g. iris.dataset
  • underscore_separated: e.g. iris_dataset
  • lowerCamelCase: e.g. addIrisDataset
  • UpperCamelCase: e.g. AddIrisDataset

Keep names concise and meaningful, and use verbs for functions and nouns for variables. Give a function a verb, e.g. add, calculate, reduce, and give a variable a noun, e.g. calculatedNumbers, vectorOfValues.

In general, you can mark helper functions with a “.” prefix. Also distinguish between local and global variables, data objects, and functions.
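For illustration, a sketch using the underscore_separated convention with a dot-prefixed helper; all names here are hypothetical.

```r
# Helper function: the "." prefix marks it as internal
# (ls() does not list dot-prefixed names by default)
.scale_to_unit <- function(values) {
  (values - min(values)) / (max(values) - min(values))
}

# Public function: named with a verb
calculate_scaled_lengths <- function(data) {
  .scale_to_unit(data$Sepal.Length)
}

# Variable: named with a noun
scaled_lengths <- calculate_scaled_lengths(iris)
```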

Also, give your files meaningful names and always save them with the .R extension.

Posit — RStudio tips

  • Choose your IDE. Consider using RStudio by Posit. My second favourite for writing R code is Visual Studio Code.
  • There is no need to save the current workspace if you are writing reproducible code. You should be able to reproduce the workspace by re-running your script.
  • Keep track of the versions of data, variables, and functions, and use the integrated facilities to access SVN or GitHub.
  • R projects are a great way to organize your script files and your outputs. Consider using Markdown to prepare finalised reports of your analysis.
  • Check the memory used and run the garbage collector (gc()) when needed. It also always helps to keep the session information in your project.
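The last tip can be sketched in a few lines. The object and the output file name are examples only, and a temporary directory is used so the snippet does not write into your project.

```r
# Create and then remove a large object (illustrative only)
large_object <- numeric(1e6)
size_before <- object.size(large_object)
rm(large_object)

# gc() triggers garbage collection and returns a memory usage summary
memory_after <- gc()

# Store the session information (R version, attached packages) in a file
session_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), session_file)
```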

The complete code is available on GitHub.

Tomaž Kaštrun is a data geek who works in data mining and data science and enjoys working with data. Community is core to technology development. Microsoft data platform MVP. GitHub: http://www.github.com/tomaztk
