
Convergence of Data Science Languages?

Berk Orbay
Aug 4, 2022


RStudio’s recent announcement that it is changing its name to Posit definitely means a lot. The highlight of the announcement is the commitment to take further steps into Python (without giving up on R). As expected, they have already showcased some major developments, such as Shiny for Python and Quarto.

So, what does this change mean for R and other data science languages?

I teach R with a focus on data analysis and communication. I also develop with both R and Python (with occasional attempts at Julia). Lately, I have noticed that I use Python more than R, because I write production code for SaaS products and my fellow developers prefer Python for better integration.

Is it the end of R? 😱 No, I don’t think so. Even though declaring the demise of R is quite premature, it is definitely a reason for contemplation.

Needless to say, these are solely my opinions. I will continue to teach and develop in R. This post is written from my own perspective and reflects my tooling needs from these languages, so it is not the whole story.

This will be a post about both R and RStudio/Posit.

Data Science Languages

It would not be entirely wrong to say that the three programming languages data scientists use most are Python, R and Julia. In particular, people without a background in computer science (CS) or a similar major tend to pick up these languages instead of Java, C++, etc.

I also used to say that people who come from Excel find R (with tidyverse) better and those who come from Java find Python better.

About R

R is simply an amazing language created for (pardon my French) non-CS people. Combined with the tidyverse (esp. dplyr + ggplot2), Shiny and RMarkdown, it is both an analytical powerhouse and a seamless communication tool. You can analyze your data and generate reports (Word, PDF, PowerPoint, dashboards, etc.) for the relevant stakeholders. Its learning curve is next to nothing. Currently, no other language or system comes close to R in this context as a whole.

R’s License Issue

Perhaps this topic deserves its own post, so feel free to skip this part. Although it is not discussed much, R has a “license” issue. R’s open source software (OSS) licenses are a combination of GPL-2 and GPL-3. Here is the relevant clause, paraphrased: “If you used any code or libraries under the GPL-3 license, you need to give your source code to your end users if they demand it.”

There is a loophole in the license. If you provide your software as an online service (i.e., SaaS), you do not have to provide your source code to your end users, because you never actually distribute the software to them. Therefore, some software is released under the AGPL-3 (or simply AGPL) license, which extends the GPL-3 source-sharing obligation to software offered as a network service. For instance, although it is not exactly R code, Shiny Server Open Source is AGPL-3 licensed.

There might be debate about the conditions under which you must disclose your source code and how enforceable this is. Ultimately, though, it is somewhat troubling for those who want to keep their source code private. Google’s position is fairly strict: “Also, any use of source code under licenses of this type in a Google product will ‘taint’ Google source code with the restricted license.”

Ultimately, it is a choice to be respected. One implication, though, might be that R never finds native support (as an included language) in the major cloud providers (i.e., AWS, Azure and GCP). Python, on the other hand, is distributed under the PSF License. Since Python 3.8.6, code examples in its documentation have also been dual-licensed under the PSF License and the Zero-Clause BSD license. Julia is simply MIT licensed.

Bonus: If you are interested, read about the tidyverse’s MIT-ification in this post and their notes.

Effects of RStudio Becoming Posit

R is currently in a phase where its unique and beneficial properties are slowly becoming available in other languages too. RStudio’s pivot to Posit only accelerates this process.

I have always missed something like Shiny in Python. Django is a significant investment, and Streamlit (albeit simple and powerful) becomes troublesome beyond simple apps. Now they also say it will be possible to run Shiny for Python serverless, thanks to WebAssembly (WASM). In Julia, there is Genie, whose team announced a no-code builder just a few weeks ago. Also, Shiny for Python is MIT licensed, while R Shiny is still under GPL-3.

Quarto is the replacement for RMarkdown. RMarkdown has supported other languages (esp. Python, thanks to the reticulate package) for quite some time. Now, with Quarto, that support is “more official”.

I think it is fair to say that the seeds of Posit were sown with the reticulate package. It is not hard to imagine that RStudio’s customers pushed for more Python support (let me remind you that RStudio is a Public Benefit Corporation).

Here are my perceptions of the recent developments and some predictions for the mid-term:

  • Posit is aiming to unify data science languages. This means that code written in R, Python and Julia may interact seamlessly across a wide range of applications. So far, the most successful language in this unification is R, especially with its awesome Rcpp and reticulate packages. Python’s support for R is usually not palatable, and Julia’s is quite nascent. The convergence of data science languages is a hard task but a noble one, and I think they will be quite successful.
  • Posit’s IDE will not explode in popularity, at least compared to VS Code. R is increasingly usable in VS Code (which is also why there is a Quarto extension for it). But Posit will have quite an advantage with its commercial products.
  • R’s community is a populous one and “nicer” (hearsay 🙂) than some other hardcore language communities. R had an academic community and a bio(conductor) community before the tidyverse and RStudio came along. Now the R community is larger than ever, with people from all kinds of backgrounds. However, the acceleration of its growth (not the growth itself) will slow in the foreseeable future.
  • We are going to have most of the nice things from each language in the others (trilaterally). That alone makes these developments a huge win overall!
  • I will need to do a major revision of my lecture notes next year. Thanks, Posit 😒

As a final word, millions of thanks to all the open source developers who make it possible for us to build upon great foundations. Whatever our individual preferences, we are very lucky to have such collaboration.

Wrap-up or TL;DR

  • RStudio is becoming Posit. They will work more on Python (and Julia).
  • This may lead to a “convergence of data science languages”, as Posit’s efforts let these languages work with each other seamlessly.
  • It might have a slight effect on R’s popularity, but overall it is a huge win for all languages.
  • Future progress will create incredible opportunities for developers.


Current main interests are #OR and #RL. You may reach me on LinkedIn.