The 5 Most Effective Ways to Learn R
Brian Back

I think the notion that there is a single “effective” way of “learning” R (or, indeed, anything else) is a bit misleading.

My background is in academic research. I came to “learn” R as a more flexible alternative to stats-only programs like SAS or Stata, without any serious programming experience: my last CS class had been more than 10 years before I came across R. I thought I had become fairly decent at coding in R, in the sense of getting it to do the things I needed: simulating particular situations, manipulating data as I found necessary, and running the statistical algorithms I needed, both ones of my own making and those in others’ packages. But I was not terribly good at “programming” in the sense that people who are primarily programmers would recognize. As for how “good” an R user I was, well, your mileage could easily vary.

This became something of a serious problem when I left academia. I found that a lot of R users were vastly better programmers than I was, but rather naive on the research design and statistical side of things, even when it came to using R and its various packages. They could scrape data, manipulate it, and run algorithms much more efficiently than I could, whereas I still floundered with rather basic functions and took a long time to write code because of it. But in some cases they did not really understand why what they were doing was potentially problematic for the problems they were trying to solve. For example, when the problem is fundamentally rooted in systematic sampling bias, no amount of cross-validation will address it; indeed, since a lot of data-sciency techniques are designed to suppress subtle patterns to guard against overfitting, they may actually destroy potential insights rather than blow up some subset of the data for closer inspection. The situation was simultaneously a great opportunity and a potential disaster (and I’ve been in both). With good communication, it could make for effective collaboration, with each side able to contribute what the other is short on; without good communication, it could end in disaster.
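To make the cross-validation point concrete, here is a minimal simulation, sketched in Python rather than R so that it runs with only the standard library. It assumes a hypothetical population whose outcome is Normal(50, 10) and a selection process that only ever records units whose outcome is above 45. The “model” is simply the sample mean, and the function names are illustrative. Cross-validation on the observed data reports a comfortably small error, because every held-out fold comes from the same biased sample. The error measured against the full population is much larger, and the estimated mean sits well above the true one.

```python
import random
import statistics


def kfold_rmse_of_mean(values, k=5, seed=0):
    """K-fold cross-validated RMSE of a model that predicts the training-fold mean."""
    rng = random.Random(seed)
    shuffled = values[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    squared_errors = []
    for i, held_out in enumerate(folds):
        # Fit on every fold except the held-out one
        train = [v for j, fold in enumerate(folds) if j != i for v in fold]
        prediction = statistics.fmean(train)
        squared_errors.extend((v - prediction) ** 2 for v in held_out)
    return statistics.fmean(squared_errors) ** 0.5


def simulate(seed=42, n=20_000, cutoff=45.0):
    rng = random.Random(seed)
    # Hypothetical population: outcome ~ Normal(50, 10)
    population = [rng.gauss(50, 10) for _ in range(n)]
    # Systematic sampling bias: only units with outcome above the cutoff are observed
    observed = [y for y in population if y > cutoff]
    estimate = statistics.fmean(observed)
    cv_rmse = kfold_rmse_of_mean(observed)
    # Error of the same estimate on the population we actually care about
    true_rmse = statistics.fmean((y - estimate) ** 2 for y in population) ** 0.5
    return statistics.fmean(population), estimate, cv_rmse, true_rmse


if __name__ == "__main__":
    true_mean, estimate, cv_rmse, true_rmse = simulate()
    print(f"population mean {true_mean:.1f}, estimate from biased sample {estimate:.1f}")
    print(f"cross-validated RMSE {cv_rmse:.1f} vs RMSE on full population {true_rmse:.1f}")
```

Adding folds or data would not close the gap, because the bias lies in which units were observed, not in how the model was fit.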

The bottom line: being a “successful” R user is a multidimensional thing. R is not primarily a general-purpose programming language but an environment designed for statistical analysis, and to be a good R user, one needs to be cognizant of the statistical side of the ledger as well as the programming side. For people like me, who are not natural programmers, practice at the programming side of the equation is valuable. But one does not become an effective R user through programming chops alone either. Training good R users should mean grounding them in statistics as well.

As for picking up a good statistical grounding for using a package like R, I’d recommend Phillip Good’s Introduction to Statistics Through Resampling Methods and R. It is somewhat unconventional: back in the old days, one could not teach basic statistics using the resampling approaches that packages like R now make a breeze. I think it presents the intuitions far more effectively than more traditional stats texts do.
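As a small illustration of what resampling buys you, here is a percentile bootstrap confidence interval for a mean, again sketched in Python using only the standard library. The data are made-up numbers and the function name is hypothetical. The point is that the interval comes from redrawing the observed sample with replacement many times rather than from a textbook formula, which is the style of reasoning that resampling-based teaching builds on.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=10_000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Draw n observations with replacement, many times, recording each resample's mean
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Trim alpha/2 of the resampled means from each tail
    cut = int(n_resamples * alpha / 2)
    return means[cut], means[n_resamples - cut - 1]


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only
    sample = [12.1, 9.8, 11.4, 10.2, 13.0, 8.7, 10.9, 11.8, 9.5, 12.4]
    low, high = bootstrap_mean_ci(sample)
    print(f"mean = {statistics.fmean(sample):.2f}, 95% CI ~ ({low:.2f}, {high:.2f})")
```

The percentile method is the simplest bootstrap interval. In R, the `boot` package offers refinements such as BCa intervals through `boot.ci`.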
