I have been an R user for more than 6 years. My introduction to R was in a PhD course on Monte Carlo Simulation in Finance taught by Wolfgang Hörmann. I also have experience with other languages, most recently C++ (which I love, too). But none of them is as convenient as R.
Before I start listing why R is the best language for the rest of us, let me clarify the “rest of us” part. We are the Excel users and SPSS users, not computer science (CS) people*; in other words, “customers of programming” who strive for simplicity yet look for power and flexibility. R is the best programming language for us.
I will make some comparisons with Python because it is the closest peer of R. Sorry in advance, Python users. I love you, too.
*Actually, I am a bit of a CS person (but only a bit). Nothing wrong with that :)
1. Ready for Work
Just download R and install it. You can start programming immediately. Of course, an editor like Atom or RStudio makes life easier, but even without one you are up and running.
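To make “up and running” concrete, here is a minimal sketch of a first session typed straight into the R console, no editor involved. The variable name and the simulated data are illustrative choices of mine, nothing more.

```r
# Which R am I running?
R.version.string

# Simulate 100 draws from a standard normal distribution
set.seed(42)            # makes the random draws reproducible
x <- rnorm(100)

# Descriptive statistics and a quick histogram, no setup required
summary(x)
hist(x, main = "My first R plot")
```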
If or when you work with Python, you will see that getting started is not as straightforward. You can run scripts from the shell, use IDLE, or install a distribution like Anaconda. Just ask a newbie to install Python and run a few commands.
Python also has two major versions: 2.x and 3.x. Even though 3.x is (naturally) taking over, loads of code still depends on 2.x. The confusion starts from the very beginning.
2. Simple Design
R was not originally designed by CS people but by statisticians. So some conventions that are very convenient for the rest of us can seem preposterous to CS people. Let me give you a simple example.
Suppose we have an “object” that holds the letters A-B-C-D-E, one letter per element. Let’s name it myobject.
In R, myobject[1] returns “A”. In Python it returns “B”, because indexing starts from 0 in Python; there, myobject[0] returns “A”. When I tell CS people that R indices start from 1, they literally gasp. Let’s say we can get over it. The next one is better.
In R, myobject[1:3] returns “A”,“B”,“C”; it literally says index 1 to 3. In Python it returns “B”,“C”, because the end point is not included. To get exactly ABC, you need myobject[0:3], which gets indices 0, 1 and 2, therefore “A”,“B”,“C”. So, literally WTF?!
In R, myobject[-1] returns all elements except the first one: “B”,“C”,“D”,“E”. In Python, it returns the last item, “E”. So myobject[1] returns the second item, but myobject[-1] returns the first item from the end? That’s a bit too much.
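The indexing rules above can be checked in a few lines. This sketch assumes myobject is a plain character vector (the post only calls it an “object”); the Python behaviour described above is noted in the comments for comparison.

```r
# A character vector holding the letters A to E
myobject <- c("A", "B", "C", "D", "E")

myobject[1]     # "A"  (Python's myobject[1] gives "B")
myobject[1:3]   # "A" "B" "C"  (Python needs myobject[0:3])
myobject[-1]    # "B" "C" "D" "E"  (Python's myobject[-1] gives "E")

# R's way to get the last element
myobject[length(myobject)]   # "E"
```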
I will not even get into the indentation vs. brackets debate. In my opinion, brackets are educational and useful. Python forces you to use indentation, but it is not a big deal.
In a similar fashion, everyday R code has no method-call syntax (e.g. object.method()), only functions called as function(object). Yes, you can store a function inside an R object and call it as object$method(), but who does this? You don’t need to.
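To illustrate, the sketch below contrasts the usual function-call style with the rarely needed object$method() style. The list named greeter and its greet function are hypothetical, made up just for this example.

```r
myobject <- c("A", "B", "C", "D", "E")

# The everyday R style: functions take the object as an argument
length(myobject)      # 5
tolower(myobject)     # "a" "b" "c" "d" "e"
rev(myobject)         # "E" "D" "C" "B" "A"

# Possible, but rarely needed: a list that stores a function,
# called with object$method() syntax (hypothetical example)
greeter <- list(
  greet = function(name) paste("Hello,", name)
)
greeter$greet("R")    # "Hello, R"
```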
3. Packages
You can literally do thousands of things in R. For starters, just check the CRAN Task Views. And it is not limited to the Task Views: CRAN (the App Store of R) has more than 10,000 packages right now.
Whatever you are doing or need to code, there is probably an R package for that. There are many statistical models and other tools out there for a variety of needs.
Many of the latest advancements in the statistics literature are released as R packages, because installing and loading packages in R is a no-brainer. Plus, you can also publish your packages on GitHub.
CRAN is a high-quality, curated repository, so getting a package accepted there can be painful. That is why many R developers put their packages on GitHub first, then submit them to CRAN.
To install a package, write install.packages("packagename") once (it downloads the package and its dependencies), then write library(packagename) in each session to start working with it.
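A common idiom wraps these two steps so that a script installs a package only when it is missing. The helper name use_package() is hypothetical; install.packages(), requireNamespace() and library() are standard R functions. The GitHub line is left commented out because "user/repo" is a placeholder and it needs the remotes package.

```r
# Hypothetical helper: install a CRAN package only if it is missing,
# then attach it for the current session
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads the package and its dependencies
  }
  library(pkg, character.only = TRUE)  # attach; needed in every new session
}

use_package("stats")   # ships with R, so nothing is downloaded here

# Packages that live only on GitHub can be installed with the remotes
# package; "user/repo" is a placeholder, not a real repository
# remotes::install_github("user/repo")
```

With this helper, use_package("dplyr") would download dplyr on first use and simply load it afterwards.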
4. Documentation and Community
R might not have the largest community out there, but it definitely is one of the best. For almost every problem you encounter with R, you can find someone who had the same problem on Stack Exchange or similar platforms. If not, there are more than enough people out there to help you.
Documentation is also amazing. If you don’t remember how to use a function, just type ? before its name at the console (e.g. ?functionname). This opens a help page describing what the function does, its arguments and its return value. Most help pages also end with an Examples section where you can see the code in action and replicate the results yourself.
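Here is what that looks like in practice, using mean() as an arbitrary example topic. The interactive() guard is only there so the snippet also runs quietly as a script.

```r
# Typing ?mean at the console opens the help page for mean().
# help() is the function form of ?, handy inside scripts.
if (interactive()) help("mean")

# Stored without printing, the result records where the page lives,
# so a non-empty result means the topic is documented
mean_help <- help("mean")

# example() runs the Examples section of a help page,
# so you can reproduce the documented results in your own session
example("mean")

# Don't know the function name? ??"trimmed mean" searches all help pages
```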
5. Data Structures
Data structures in R (vectors, lists, data frames) are pretty easy to understand, and their properties are easy to keep in mind. They also work quite nicely with each other. Sure, vectorized calculations can be a bit confusing at first, but R lets you skip them and write plain loops instead (at the expense of speed, though).
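A minimal sketch of the three everyday structures, plus the same calculation written both vectorized and as a loop. The names and scores are made up for illustration.

```r
# Three everyday data structures
scores <- c(90, 75, 88)                        # atomic vector: one type
person <- list(name = "Ann", scores = scores)  # list: mixed contents
people <- data.frame(                          # data frame: columns are vectors
  name = c("Ann", "Bob", "Cem"),
  score = scores
)

# Vectorized: one expression handles the whole column
people$curved <- people$score + 5

# The loop version gives the same result, just more slowly on big data
curved_loop <- numeric(nrow(people))
for (i in seq_len(nrow(people))) {
  curved_loop[i] <- people$score[i] + 5
}

identical(people$curved, curved_loop)   # TRUE
```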
6. tidyverse (especially dplyr & ggplot2)
In my opinion, the tidyverse is the sweet spot of R. It is actually a meta-package that bundles several packages such as haven, forcats, stringr, tidyr, readr, dplyr and ggplot2. Each of them has great functionality, but two stand out: dplyr for data manipulation and ggplot2 for plotting. They are elegant, unbelievably easy to understand, and their already great capabilities can be extended with some proficiency or with extension packages.
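A small taste of both, using the mtcars data set that ships with R. This assumes dplyr and ggplot2 are installed (for example via install.packages("tidyverse")); the grouping and plot choices are only illustrative.

```r
library(dplyr)
library(ggplot2)

# dplyr: chain verbs with the pipe to summarise data by group
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())
print(mpg_by_cyl)

# ggplot2: map columns to aesthetics, then add layers
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")
print(p)
```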
7. Sick Reporting Features
Imagine you have a magic wand that turns code into full-featured documents, reports and even books. In R, that wand has several names: rmarkdown, knitr / Sweave and bookdown. With these packages and the help of pandoc, you can turn R code and its results into PDF, DOCX, HTML or whatever format you like.
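A minimal sketch of that pipeline, assuming the knitr and rmarkdown packages are installed. The file name report.Rmd and its contents are made up for illustration. knitr alone turns the source into Markdown with the code results filled in; the rmarkdown::render() step needs pandoc, so here it only runs when pandoc is found.

```r
# Write a tiny R Markdown source file (hypothetical name and content)
rmd <- c(
  "---",
  "title: \"Fuel economy\"",
  "output: html_document",
  "---",
  "",
  "Average mpg in `mtcars` is `r round(mean(mtcars$mpg), 1)`.",
  "",
  "```{r}",
  "summary(mtcars$mpg)",
  "```"
)
writeLines(rmd, "report.Rmd")

# Step 1: knitr runs the R code and weaves the results into Markdown
knitr::knit("report.Rmd", output = "report.md", quiet = TRUE)

# Step 2: rmarkdown hands the document to pandoc for the final format;
# swap in "pdf_document" or "word_document" as needed
if (rmarkdown::pandoc_available()) {
  rmarkdown::render("report.Rmd", output_format = "html_document", quiet = TRUE)
}
```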
8. R as an Interface Tool
I should admit that R is still in its infancy here. But it has a package that gets the job done, at least for prototyping: Shiny. It turns your R code into an interactive web page.
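To show how little code that takes, here is a minimal Shiny app sketch, assuming the shiny package is installed. The slider-plus-histogram layout is just an illustrative choice; the app only launches when R runs interactively.

```r
library(shiny)

# UI: a slider and a placeholder for a plot
ui <- fluidPage(
  titlePanel("Normal sample explorer"),
  sliderInput("n", "Sample size", min = 10, max = 1000, value = 100),
  plotOutput("hist")
)

# Server: redraw the histogram whenever the slider moves
server <- function(input, output, session) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = paste("n =", input$n), xlab = "value")
  })
}

app <- shinyApp(ui = ui, server = server)

# Open the app in a browser when running interactively
if (interactive()) runApp(app)
```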
9. Weaknesses
Even the weaknesses of R are not a reason not to use it.
- R is not well suited to big data projects, because it works in memory: data must fit into RAM, so handling data larger than memory (reading from and writing to disk in chunks) takes extra work. That’s OK, because in most cases the available memory will be enough. If not, you can always set up a VPS with enough resources.
- Parallel programming is not R’s strong suit: base R ships only the fairly low-level parallel package, and the options beyond it are limited. That’s also OK, because parallel programming requires a special “style”, and since I’m advocating R to non-CS people, they usually don’t care about such stuff.
- R has odd syntax. This one is actually not true; it may only seem odd to CS people. When I tell CS people that indexing starts from 1, not 0, they cringe. Yet outside programming, it is generally accepted that the first element is number 1.
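For the occasional job that does need parallelism, the parallel package that comes with R covers the basics. This is a minimal sketch; slow_square() is a hypothetical stand-in for real work, and two workers is an arbitrary choice.

```r
library(parallel)   # bundled with R, no installation needed

# Hypothetical stand-in for an expensive computation
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# Start two background R processes (a socket cluster works on all OSes)
cl <- makeCluster(2)
res <- parLapply(cl, 1:8, slow_square)   # split the work across workers
stopCluster(cl)                          # always release the workers

unlist(res)   # 1 4 9 16 25 36 49 64
```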
10. Final Words
Start using R. Start from here.