Making R learning simple (for everyone)

For us doctors, beginning to learn coding is a steep curve, like that of a gorgeous girl way out of our league. Learning R takes it up several levels. It is indeed a difficult path.

But fear not. Troubled by my own programming endeavours, I have devised a strategy to emerge triumphant amidst the rubble of code. This is not a retrospectively contrived plan (where I'm remembering, with bias, what worked) but rather a prospective set of rules I will be employing to make the best of my (and, hopefully, your) time and effort. Should you choose to join me on this journey, please bear in mind that I don't promise the world; perhaps, at least, some satisfaction in knowing how things work behind the scenes, and encouragement that we, too, can join the fray.

To begin with, I will follow three (general) rules to learn this language:

  1. Always seek to know best practices and their rationale.
  2. Pseudo-code before the actual code, i.e., write down what you want to do (the algorithm) in plain English first.
  3. Keep things simple (perhaps the most difficult).
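To make rule 2 concrete, here is a minimal sketch of a hypothetical task with made-up blood-pressure readings. The plain-English steps are written first as comments, and each one is then translated into a line of R. The variable names and values are illustrative only.

```r
# Task (hypothetical): what proportion of patients have a systolic BP above 140?
# Pseudo-code, written first in plain English:
#   1. Store the systolic readings.
#   2. Mark which readings are above 140.
#   3. Divide the number marked by the total number of readings.

# Step 1: store the readings (made-up values for illustration)
systolic <- c(128, 145, 152, 118, 139, 160)

# Step 2: a logical vector, TRUE where the reading exceeds 140
above_140 <- systolic > 140

# Step 3: mean() of a logical vector is the proportion of TRUE values
prop_above_140 <- mean(above_140)
prop_above_140
```

Once the comments read sensibly on their own, the code almost writes itself, and the comments stay behind as documentation.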

Let's start with Day #1 (today):

1. R is a vector-based language

What the hell does that mean? I was then told it's an array programming language. Go figure: I still hadn't the faintest clue. So I set out to understand it from the very basics. Imagine this matrix (a data structure in which all the elements are of the same class, such as numeric or integer):

[Image: a 3 x 3 matrix]

Let's say you wanted to perform some calculation on each number (of 'integer' class, as above) in each cell of this matrix (otherwise known as its 'elements'); for instance, you want to divide each of them by 5. In a traditional scalar-based language, you'd have to loop over the elements and apply the division to each one separately, which is unpleasantly long. In R, it's a single-line command, applied once and executed on every element without each one being explicitly specified. This is what R being vector-based really means. We'll get to the nitty-gritty as we go along, but understanding this is fundamental.
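Here is a small sketch of that idea. It assumes a 3 x 3 matrix filled with the integers 1 to 9 (stand-ins for the values in the image) and compares the one-line vectorized division with the explicit element-by-element loop you would write in a scalar style.

```r
# A 3 x 3 integer matrix (values 1 to 9, filled column by column)
m <- matrix(1:9, nrow = 3)

# Vectorized: one command divides every element by 5
m_vec <- m / 5

# The scalar-style equivalent: visit each element explicitly in a loop
m_loop <- matrix(0, nrow = 3, ncol = 3)
for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(m))) {
    m_loop[i, j] <- m[i, j] / 5
  }
}

# Both approaches give the same result
identical(m_vec, m_loop)
```

Note that dividing integers in R gives doubles (class 'numeric'), so `m_vec` is no longer an integer matrix.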

2. Work with dataframes for an easier life

There are many different types of data structures in R, but two emerge as ubiquitous: the matrix and the dataframe. The latter can hold variables (columns) of different classes and is hence the most widely used in analysis. Matrices can be more memory-efficient, but I've got plenty of memory. So henceforth, I will be reading my datasets into R as dataframes (with the read.csv function).
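A minimal sketch of the difference, using made-up patient data. The `text =` argument of `read.csv()` reads CSV from a string, which keeps the example self-contained; in a real analysis you would pass a file path instead. In R 4.0 and later, text columns stay as character by default rather than becoming factors.

```r
# A small dataframe whose columns have different classes (made-up data)
patients <- data.frame(
  id     = 1:3,
  sex    = c("F", "M", "F"),
  age    = c(54, 67, 71),
  smoker = c(TRUE, FALSE, TRUE)
)
sapply(patients, class)   # one class per column

# read.csv() returns a dataframe; text = reads from a string instead of a file
csv_text <- "id,sex,age\n1,F,54\n2,M,67\n3,F,71"
from_csv <- read.csv(text = csv_text)
str(from_csv)

# A matrix forces one class on everything: mixing numbers and text gives all character
mixed <- as.matrix(patients[, c("id", "sex")])
mixed
```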

3. Operators are k!=ng

Since we store our dataframes in named objects, most of our analysis will centre on these objects. The conventional way to 'objectify' a dataframe, i.e., assign it to a name, is with <- (the assignment operator). You could also use =, but most R style guides discourage it. A list of common operators can be found here.
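A short sketch of assignment and a few everyday operators, using a hypothetical `vitals` dataframe with made-up values. Because R is vectorized, each operator works on a whole column at once. One practical reason to keep `=` out of assignments is that inside a function call, `=` names an argument (as in `data.frame(hr = ...)` below).

```r
# Assign a dataframe to a named object with <-
vitals <- data.frame(hr = c(72, 88, 101), sbp = c(118, 135, 142))

# A few common operators, each applied to a whole column at once
vitals$hr * 2                      # arithmetic
vitals$sbp > 140                   # comparison
vitals$hr != 88                    # "not equal", the k!=ng of the heading
vitals$hr > 80 & vitals$sbp > 130  # logical AND
vitals$hr %in% c(72, 101)          # membership

# Keep only the rows where the condition is TRUE
tachycardic <- vitals[vitals$hr > 100, ]
tachycardic
```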

4. Use RStudio, not base R… again, for an easier life.

One of the many advantages, I realized today, is the relative lack of mismatched-quote (“”) errors in RStudio, since its editor automatically closes quotes and brackets as you type. Start using RStudio as your default R platform and you'll know why. Plus, the four-pane layout is more visually appealing than wrangling four pop-up-like windows in base R.

5. R is made of functions

Functions are rules that execute (often long) pieces of code to accomplish a task. R has built-in functions on which it runs everything else. We can add other functions by downloading packages (install.packages) and then loading them into our session (library). We can also write our own functions, and this prospect excites me more than anything else. Stata does let you write your own commands (ado-files), but in R, writing functions is part of everyday work rather than an advanced extra, and I see this as a key feature that sets R apart from the rest.
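A minimal sketch of both ideas. The package lines are commented out because installing needs an internet connection, and "ggplot2" is only an example name. The `bmi()` function is a hypothetical illustration of writing your own function, not something built into R.

```r
# Packages: install once (needs internet), then load in each session.
# "ggplot2" is only an example package name.
# install.packages("ggplot2")
# library(ggplot2)

# A hypothetical function of our own: body mass index from weight (kg) and height (m)
bmi <- function(weight_kg, height_m) {
  # Guard against impossible heights before dividing
  if (any(height_m <= 0, na.rm = TRUE)) stop("height_m must be positive")
  weight_kg / height_m^2
}

bmi(70, 1.75)

# The arithmetic inside is vectorized, so the same function works on whole columns
bmi(c(70, 85), c(1.75, 1.80))
```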

That's all for today. For beginners (like me), intermediates and advanced programmers alike, a word of caution: the above lessons are my own and not set in stone. I'm quite certain there are better ways to optimize our work, or alternative rationales, for example, for why we should work in base R only. Whatever the case, please do share your experiences, thoughts, ideas and criticisms.

A Google search for learning R will return an array of resources. One site I found particularly pertinent to our "keep things simple" approach is the Nice R Code blog. My next challenge will be to create my own functions relevant to my data, and I will share my experience in the next article.