The CaReTakeR of Data Science and Statistics

Why is ‘R’ so underrated?

Rinu Gour
DataFlair
5 min read · Mar 23, 2020

STARTING OFF

I have been a professional as well as a hobbyist Data Scientist for a good while now. It is not uncommon in any field that a beginner might choose a tool that does not suit them very well.

Therefore, I believe that the reason I chose R when I started as a Data Scientist is not that important. I think the reason I stuck to using R for Statistics and Data Science holds much more value.

So, today I am going to enlighten you about the reasons I still prefer R as my go-to tool for Data Science.

But first, let’s get the basics out of the way.

What is R?

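R is a programming language widely used for statistical computing and data analysis. It was created in the early ’90s by Ross Ihaka and Robert Gentleman at the University of Auckland, in part as a teaching tool for Statistics students. The year 2000 saw the release of R 1.0.0, the first version considered stable.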

Since then, R has been one of the most popular programming languages for Data Science. Due to its open-source license, R sees significant contributions from users all over the world.

Hence, there are more than 15,000 packages and libraries for R that make it even more powerful and comprehensive than it already is.

Now with that done, let’s get into the fun stuff.

WHY DO I PREFER ‘R’ FOR STATISTICS AND DATA SCIENCE?

R is Open-source

R is open-source, which means that it is entirely free. It also means that if I want a feature that it does not already have, I can make a package that gives R said feature and share it with others as well.

I can make the package open-source and give it away for free or charge money for it as well if I wish to. R’s open-source license also leads to the next reason, which is:

Massive Community Support

R is very popular. More than 2 million people use it worldwide. Many of those users share their experiences and answer each other’s questions on online forums, and it does not end there.

The global community organizes conferences and meetups all over the world where R programmers gather to share their knowledge, hold competitions, introduce new packages, have fun, and much more. Being an R programmer means that you are never alone on a project.

Vector Arithmetic

Unlike other programming languages used for data science, such as Python, R was not designed around single values. R has no separate scalar type at all: even a single number is a vector of length one. Its arithmetic operators work element by element on whole vectors, so most operations on sets of values need no explicit loop. The looping still happens, but in fast compiled code rather than in your R script.
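To make this concrete, here is a minimal sketch in base R. The variable names and numbers are made up for illustration; only the behaviour of the operators is the point.

```r
# Hypothetical example data: unit prices and quantities sold.
prices <- c(10, 20, 30)
quantities <- c(3, 1, 2)

# Element-wise multiplication of two vectors, with no explicit loop.
revenue <- prices * quantities

# A single number is itself a vector of length one.
single_value <- 5
single_length <- length(single_value)

# The shorter operand is recycled across the longer one.
half_price <- prices * 0.5

# The same calculation written as an explicit loop, for comparison.
revenue_loop <- numeric(length(prices))
for (i in seq_along(prices)) {
  revenue_loop[i] <- prices[i] * quantities[i]
}

total <- sum(revenue)
print(revenue)
print(total)
```

Both versions give the same result, but the vectorized one is shorter and usually faster on large inputs.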

R is Best at Data Visualization

R’s built-in graphics system can render publication-quality plots with simple commands, and it covers most common chart types out of the box. Packages like ggplot2 and plotly go further: they make static graphics far more customizable and can also produce interactive and animated graphics.
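As a rough illustration of how little code a base R plot needs, the sketch below draws a scatter plot from the built-in mtcars dataset and saves it as a PDF. The output file name, colours, and labels are my own choices for the example.

```r
# Write the plot to a temporary PDF file (the file name is arbitrary).
out_file <- file.path(tempdir(), "mpg_vs_weight.pdf")
pdf(out_file, width = 6, height = 4)

# Scatter plot of fuel economy against weight from the built-in mtcars data.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Fuel economy vs. weight",
     pch = 19, col = "steelblue")

# Overlay a least-squares trend line.
abline(lm(mpg ~ wt, data = mtcars), lty = 2)

# Close the device so the file is written to disk.
dev.off()
```

In an interactive session you can drop the pdf() and dev.off() calls and the plot appears directly in the plotting window.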

Highly Flexible

R was created for students studying statistics, not computer science. This means that many of the conventions that shape the syntax of other programming languages were not a priority when R was designed. As a result, R has a very flexible syntax with very few restrictions to enforce uniformity.

On the surface, this may look like a drawback, but if you take care to follow a few good programming practices, it gives you a much higher degree of freedom. That flexibility carries over to how well R works with other tools, which is the next reason.
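A few small base R examples of what this flexibility looks like in practice. These are my own illustrations, and the variable names are arbitrary.

```r
# Three assignment forms that all bind the value 5.
a <- 5
5 -> b
c_val = 5

# Operators are ordinary functions, so infix and prefix calls are equivalent.
sum_infix <- 2 + 3
sum_prefix <- `+`(2, 3)

# Functions are values: pass an anonymous function straight to sapply().
squares <- sapply(1:4, function(i) i^2)

# Arguments can be matched by position or by name, in any order.
seq_by_position <- seq(1, 10, 3)
seq_by_name <- seq(by = 3, to = 10, from = 1)

print(squares)
print(seq_by_name)
```

This is also where the good practices come in: picking one assignment style and naming arguments consistently keeps code readable.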

Compatible with other Programming Languages and Technologies

R is very compatible with many other programming languages like C, C++, Java, Python, FORTRAN, etc. Through interface packages such as Rcpp (C++), rJava (Java), and reticulate (Python), R programs can call code written in those languages, and R objects can be passed to and manipulated by it. The same can be done the other way around as well.

This also means that R can be easily paired with other technologies and software. It can be paired with Hadoop to perform complex statistical computations on massive datasets in parallel. It can drive a remote Spark cluster for distributed computing. There are R packages that can write HTML, CSS, and JavaScript code as well, which brings us to the next reason.

Report Generation is Straightforward and Highly Flexible

R Markdown (the rmarkdown package) can be used to generate HTML pages with R code and its output embedded in them. The pages can be customized entirely in the desired way.

By changing a single line in the header of the document, the same source can be rendered as a Word document, a PowerPoint presentation, and many other formats as well. You can also make templates with rmarkdown that can be reused in the future to generate reports of a similar format.
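Here is a minimal sketch of that single-line change. The report title and body text are placeholders I made up; html_document and word_document are standard rmarkdown output formats. The rendering step assumes the rmarkdown package and pandoc are installed, so it is guarded.

```r
# A tiny, hypothetical R Markdown source held as lines of text.
report_lines <- c(
  "---",
  'title: "Quarterly summary"',
  "output: html_document",
  "---",
  "",
  "The mean speed in the built-in cars data is `r mean(cars$speed)`."
)

# Save it as an .Rmd file in a temporary directory.
report_path <- file.path(tempdir(), "report.Rmd")
writeLines(report_lines, report_path)

# Switching the format is a one-line edit to the `output:` field.
word_lines <- sub("^output: html_document$", "output: word_document",
                  report_lines)

# Rendering needs rmarkdown and pandoc (an assumption about your setup).
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(report_path, quiet = TRUE)
}
```

In day-to-day use you would simply edit the output line in the .Rmd file by hand and knit the document again.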

R can also make Interactive Web Apps

R’s Shiny package allows you to build interactive web apps with your R code, your data, and your analysis results embedded in them. These apps can have dynamic and animated graphics that visualize your data analysis. Finished Shiny apps can also be hosted online on services such as shinyapps.io.
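For a feel of what a Shiny app looks like, here is a minimal sketch that assumes the shiny package is installed. The input and output IDs ("bins" and "hist") and the choice of the built-in faithful dataset are my own for the example.

```r
library(shiny)

# User interface: a slider and a plot area.
ui <- fluidPage(
  titlePanel("Old Faithful eruption durations"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

# Server logic: redraw the histogram whenever the slider changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(faithful$eruptions,
         breaks = input$bins,
         main = "Eruption duration",
         xlab = "Minutes",
         col = "steelblue")
  })
}

# Bundle the two parts into an app object.
app <- shinyApp(ui = ui, server = server)

# Launch the app only in an interactive session.
if (interactive()) {
  runApp(app)
}
```

Moving the slider changes the number of histogram bins, with no JavaScript written by hand.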

WRAPPING IT UP…

Well, there you have it! These are the reasons why I, like many other data scientists, prefer R as my primary tool for statistics, analytics, and data science. R has stayed one of the most important programming languages for data science for the past two decades, and I expect it to stay that way for several more to come. Learning and mastering R could be a sure-shot way of securing your career as a data scientist.

Rinu Gour
DataFlair

Data Science Enthusiast | Research writer | Blogger | Entrepreneur