Good practices when starting your statistical analysis project in R.

Maureen Waitherero
Feb 5, 2017

Before coding starts on any project, it is important to keep in mind some good practices that, in the long run, save you time and make your code easier to follow for a collaborator, a client or the future-you. I’ll share my thoughts on some practices I have made a ritual before starting any statistical analysis project in R.

We will cover:
i). Setting up working directory.
ii). Commenting.
iii). Labeling of scripts.
iv). Apply good coding style.
v). Version control.

i). Setting up working directory.

A working directory is the default location where R looks for files to read in and where it saves files as outputs. Setting a working directory is good practice and makes it easy to know where all your files are located.

R is always pointed at a directory on your computer. You can find out which one by running the getwd() (get working directory) function, which takes no arguments.

getwd()  #Get current working directory

To change your working directory, use setwd() and specify the path to the desired folder.

setwd("/home/maureen/Documents/R-Analytics") # set working directory
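As a minimal sketch of how the two functions work together, the example below creates a project folder, switches to it, reads and writes a file using a relative path, and then switches back. The folder and file names are hypothetical, and tempdir() is used only so the example runs on any machine.

```r
# Hypothetical project folder inside R's temporary directory
project_dir <- file.path(tempdir(), "R-Analytics")

# Create the folder if it does not exist yet
if (!dir.exists(project_dir)) {
  dir.create(project_dir, recursive = TRUE)
}

old_dir <- setwd(project_dir)  # setwd() invisibly returns the previous directory
getwd()                        # confirm the change

# File names are now resolved relative to the working directory
write.csv(head(mtcars), "cars_sample.csv", row.names = FALSE)
cars_sample <- read.csv("cars_sample.csv")

setwd(old_dir)                 # restore the original working directory
```

Building paths with file.path() instead of pasting strings keeps the separators correct on every operating system.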

ii). Comment your code generously!

Comments are small notes you leave yourself inside your code.
Often we write a piece of code and don’t come back to it for some time. Commenting documents what your code is doing, both for yourself and for others.

Use “#” at the beginning of a line to make a single-line comment. R has no separate multi-line comment syntax, so each line needs its own “#”; in RStudio, highlight the lines and press CTRL+SHIFT+C to comment them all at once.

Commenting makes your code “friendly”, and writing comments takes far less time than the future-you or your coauthors and colleagues would spend deciphering unfriendly code.
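To illustrate, here is a short sketch of comments that explain why each step is there rather than restating the code. The scores and the pass mark are made-up values used only for this example.

```r
# Made-up exam scores for illustration; NA marks a student who was absent
scores <- c(72, 85, NA, 90, 64)

# Drop missing values before averaging; mean() would return NA otherwise
mean_score <- mean(scores, na.rm = TRUE)

# Flag scores at or above a hypothetical pass mark of 70
passed <- scores >= 70
```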

iii). Labeling of scripts.

I find it really useful to name a script according to what it does. For example, if your script imports data, it is only appropriate to name it import_data.r
I have made it a habit to include the author’s name, the date and the purpose of the script as the first three lines of any script I write.

# Author: Maureen Waitherero Wachira
# For: R basics SERIES
# Date: 3rd October 2016
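Putting the name and the header together, a script saved as import_data.r might look like the sketch below. The header values are placeholders, and the import_data() helper and the throwaway CSV file are hypothetical, added only so the example runs on its own.

```r
# Author:  Your Name                       (placeholder)
# Date:    the day the script was written  (placeholder)
# Purpose: Import the raw data             (hypothetical purpose)

# Hypothetical helper: read a CSV file and report its size
import_data <- function(path) {
  imported <- read.csv(path, stringsAsFactors = FALSE)
  message("Imported ", nrow(imported), " rows and ", ncol(imported), " columns")
  imported
}

# Usage on a throwaway file so the sketch runs anywhere
example_file <- tempfile(fileext = ".csv")
write.csv(head(iris), example_file, row.names = FALSE)
flowers <- import_data(example_file)
```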

iv). Apply good coding style.

Good coding style makes your code easier to read through consistent indentation and spacing, among other factors.
Hadley Wickham explains his preferred coding style in detail, and I have been trying to adopt it. It has made my code considerably easier to read, and I hope you adopt it too.
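A small before-and-after sketch shows the difference that spacing and indentation make. Both functions are hypothetical and behave identically; the second follows a few widely used conventions (<- for assignment, spaces around operators, a descriptive snake_case name and two-space indentation).

```r
# Harder to read: cramped spacing, `=` for assignment, no indentation
f=function(x,y){if(x>y){x-y}else{y-x}}

# Easier to read: same logic, laid out consistently
absolute_difference <- function(x, y) {
  if (x > y) {
    x - y
  } else {
    y - x
  }
}
```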

v). Use version control.

A few years ago, whenever I started a project, my flash drive became my most precious possession. I saved a different version of my project after every milestone in case I messed up and needed to retrieve an earlier one. This meant I had multiple copies of my project file on my flash drive. It was not an ideal situation.

What is “version control”, and why should you care?
Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later.

I was introduced to GitHub, a code sharing platform based on a popular version control system named Git. Now, whenever I start a project, I create a repository on GitHub for it. This creates an initial, empty project file on the platform, which is considered version one of the project.
This is the file I will work on on my local computer.

Now, once I reach an important milestone in the project file, instead of just saving the file I perform a save-like operation (a “commit”) that tells the version control system to store this particular version, along with a message stating the reason for the change from the previous version or what it accomplishes. We now have version two of the project.

When another commit is made, it acts as version three of the project. The previous versions remain in the history, where their changes can be examined at a later time. This means I can review versions one, two and three. If need be, I can still revert to version one or two from version three.

GitHub allows me to work on only one file on my personal computer while keeping the different versions of my project in my GitHub repository. This acts as an important backup, makes it possible to review and retrieve previous changes, and lets me use a single file rather than duplicates.
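The commit cycle described above can be sketched with Git’s command-line tool, called here from R through system2(). This is only an illustration under assumptions: Git must be installed, the folder, file name, identity and commit messages are made up, and the step of pushing the commits to GitHub is left out.

```r
# Minimal sketch of two commits and retrieving an earlier version.
# Assumes Git is installed; all names below are hypothetical.
git <- Sys.which("git")
versions <- character(0)

if (nzchar(git)) {
  repo <- tempfile("demo-project-")   # throwaway project folder
  dir.create(repo)
  old_dir <- setwd(repo)

  # Helper: run one Git command and capture its output
  run_git <- function(...) {
    system2(git, c(...), stdout = TRUE, stderr = TRUE)
  }
  # Identity passed per command so global Git settings are not touched
  who <- c("-c", "user.name=Demo", "-c", "user.email=demo@example.com")

  run_git("init")

  writeLines("x <- 1", "analysis.R")  # first milestone
  run_git("add", "analysis.R")
  run_git(who, "commit", "-m", shQuote("Add first draft of analysis"))

  writeLines("x <- 2", "analysis.R")  # second milestone
  run_git("add", "analysis.R")
  run_git(who, "commit", "-m", shQuote("Update value of x"))

  # One line per commit, newest first
  versions <- run_git("log", "--oneline")

  # Retrieve the file as it was one commit earlier
  run_git("checkout", "HEAD~1", "--", "analysis.R")
  restored <- readLines("analysis.R")

  setwd(old_dir)
}
```

Each commit message records why the change was made, which is what makes the history useful when you review it later.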

Learn more on version control from this online tutorial.

Final word

I hope these become a habit for you too!
