Why You Should Build That R Package Already (Especially for Grad Students)

Scores of people use R in their professional or academic work, and many more join the club each day. R is increasingly the best choice for getting the job done quickly and thoroughly.

This post is not a rant about R. It is written to urge non-beginner R users (say, anyone who uses R as a regular part of their day) to start working on their own packages. A rite of passage, one might dare say.

You should turn that spaghetti code of yours into an R package ASAP.

Without further ado, here is the pitch. It is a good pitch. It is the best pitch, believe me. Those who do not follow my words are pathetic. SAD. (I probably shouldn’t have written that, but whatever I’m keeping it.)

“OK, I am already sold. Show me the way.” The reasons I am about to describe assume that you build your package with the devtools and roxygen2 packages and host it on GitHub. Sure, there are other ways, but this was the most convenient for me. Start from here: [1], [2], [3], [4].
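For orientation, here is a rough sketch of what that devtools and roxygen2 workflow tends to look like from the R console. It assumes both packages are installed and reuses the placeholder package name from this post; treat it as an outline rather than a full recipe, since the linked guides cover the details.

```r
# One-time setup (uncomment if the packages are not installed yet):
# install.packages(c("devtools", "roxygen2"))

# "myabstrusepackagename" is a placeholder; pick your own name and path.
devtools::create("myabstrusepackagename")    # set up the package skeleton (DESCRIPTION, R/, ...)

# ... move your functions into .R files under myabstrusepackagename/R/ ...

devtools::document("myabstrusepackagename")  # turn roxygen2 comments into help files
devtools::load_all("myabstrusepackagename")  # load the package to try it out interactively
devtools::check("myabstrusepackagename")     # run R CMD check on the package
devtools::install("myabstrusepackagename")   # install it into your local library
```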


Reason 1: Portability

All your code, at your service, with a single line: library(myabstrusepackagename). You can take it to any machine easily, and if you use GitHub as a repository, just install it using devtools::install_github. It works like a charm; your research goes with you everywhere.
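On a fresh machine, that could look roughly like the following. The GitHub user name and repository here are made up for illustration; replace them with your own.

```r
# install.packages("devtools")  # one-time setup on the new machine

# "yourusername/myabstrusepackagename" is a hypothetical GitHub repository.
devtools::install_github("yourusername/myabstrusepackagename")

# From then on, a single line brings all your code back.
library(myabstrusepackagename)
```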

Reason 2: Reproducibility

OK, despite the recent surge of interest, reproducibility is still a hipster thing. Reproducibility can be defined as “If I repeat your experiment with your methodology and with your data, I should come to your exact conclusions. If I repeat your experiment with your methodology and with my data, I should come up with similar conclusions.” It is a fundamental scientific property.

Today’s academic world is filled with noise (i.e. irreproducible studies). Perhaps the publish-or-perish rule is to blame. This is not a piece about reproducibility in science, so I will just leave it here.

Dealing with somebody else’s data and code is an utter nightmare. But your own study should also be reproducible, so that others can confirm your results and build on them (and the citations will come). You should make it easy for others to use your research. Your R package is the way to establish reproducibility.

Reason 3: Convenience & Order

No more different versions of the same .R files in the same directory (you can use branches instead if you know your way around Git). No more code scattered across different directories. No more code written in a burst of productivity and then lost on your hard drive or flash drive. A package also pushes you to write better and more careful code, with more planning ahead. These are extremely good qualities that are often sacrificed for speed or experimentation.

Just code for the package, follow simple rules, easily build your pipeline.

Reason 4: Documentation & Presentation

One of the best features of R is the instant help you get by typing ? followed by a function name, e.g. ?thefunctionididntknowabout. With the help of the roxygen2 package, you can write the documentation just above your functions and everything will be sorted out for you!

You don’t need to write documentation immediately, but you can add it at any time, and it will make life easier for everyone (including you).
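To give a feel for it, here is a minimal sketch of a roxygen2-documented function. The function rescale01 is a made-up example, not part of any package mentioned in this post. The #' lines are ordinary comments to R, so the code runs as usual; roxygen2 reads them when you document the package and generates the help page.

```r
#' Rescale a numeric vector to the range 0 to 1
#'
#' @param x A numeric vector.
#' @return A numeric vector of the same length as \code{x}, with the
#'   smallest value mapped to 0 and the largest mapped to 1.
#' @examples
#' rescale01(c(2, 4, 6))
#' @export
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)      # smallest and largest value
  (x - rng[1]) / (rng[2] - rng[1])   # shift to 0, then scale by the width
}
```

Once the package is documented and installed, ?rescale01 should bring up the help page built from these comments.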

R also has great storytelling tools such as R Markdown and Shiny. You can embed document templates or user interfaces in your package. That way, you can demonstrate the power of your work with a single line of code, even to those who have no idea about it.
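One possible way to get that single line is a small launcher function shipped with the package. The sketch below assumes a Shiny app stored under inst/shiny/demo in the package source and the shiny package listed as a dependency; the function name run_demo, the folder names, and the package name are all placeholders.

```r
#' Launch the demo app bundled with the package
#'
#' @param pkg Name of the installed package that contains the app.
#' @export
run_demo <- function(pkg = "myabstrusepackagename") {
  # Files under inst/ in the source end up at the top level of the installed
  # package, so inst/shiny/demo is found as shiny/demo here.
  app_dir <- system.file("shiny", "demo", package = pkg)
  if (!nzchar(app_dir)) {
    stop("Could not find the demo app; is '", pkg, "' installed?", call. = FALSE)
  }
  shiny::runApp(app_dir)
}
```

With something like this in place, the demonstration comes down to myabstrusepackagename::run_demo().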

R has the power of demonstration at the lowest labor cost.

Reason 5: A product (at no marginal cost)

This is the biggest reward of all. You are going to have a product which you will use extensively and repeatedly. No more lost code after you graduate. You can use it elsewhere, improve it, build on it and always have a working product (old code has a habit of malfunctioning). Never lose your code again.

You can then use your product as a foundation stone for your commercial project and even build a startup around it. Integrate your product with other tools. Make it a part of something bigger. You will never ever have to start from scratch again.

For instance, an educational package I created earlier helped me (and Nezih) build another product: a scenario analysis screen for the Turkish Referendum, which predicted the outcome with high accuracy. See the story here.

Final Say

  • If you are a grad student, start an R package. Demonstrate your work to your thesis supervisor and your peers easily.
  • If you are a researcher, start an R package, so that you do not lose track of your research project and always have a reliable tool.
  • If you are a professor, ask your students to build an R package, so that their progress can be tracked and their work can be reviewed and made reusable.

Also, don’t hesitate to contact me if you have any questions about R and packages. I will be glad to help out.