R vs Python, why not both?

Eugene Olkhov
CompassRed Data Blog
Mar 11, 2020

The question of whether to choose R or Python for data science work has been debated ad nauseam. There are already countless articles discussing the pros and cons of each, so I won’t go into too much detail in this post. While R and Python users still disagree about which one to use, many agree that R is superior for data cleaning, exploration, and general statistical analysis, while Python is more capable when it comes to developing and implementing models.

Given these strengths of R and Python, is there a way to use both languages in a single project? Definitely!

One way to combine the two languages in your workflow is to do the cleaning, exploration, and transformation in R, write the results back to a database (or save the data some other way), and then import the data into Python to build models.
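As a rough sketch of the Python half of that handoff, the snippet below reads a table back out of SQLite using only the standard library. It assumes the R side saved its cleaned data frame with something like DBI::dbWriteTable(con, "clean_sales", df) via the RSQLite package; the table name clean_sales, its columns, and the file name cleaned.sqlite are all hypothetical, and make_demo_db() only builds an in-memory stand-in so the example runs on its own.

```python
import sqlite3


def load_table(conn, table):
    """Return every row of `table` as a list of dicts keyed by column name."""
    # Table names cannot be bound as SQL parameters, so the (trusted) name is
    # quoted as an identifier rather than interpolated bare.
    cur = conn.execute(f'SELECT * FROM "{table}"')
    names = [col[0] for col in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]


def make_demo_db():
    """Hypothetical stand-in for the table the R step would have written."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clean_sales (region TEXT, units INTEGER)")
    conn.executemany(
        "INSERT INTO clean_sales VALUES (?, ?)",
        [("east", 10), ("west", 7)],
    )
    conn.commit()
    return conn


# In a real project this would be sqlite3.connect("cleaned.sqlite"),
# pointing at the file the R step wrote (hypothetical name).
conn = make_demo_db()
rows = load_table(conn, "clean_sales")
print(rows)
```

If pandas is installed, pandas.read_sql_query("SELECT * FROM clean_sales", conn) returns the same data as a DataFrame, which is usually the more convenient starting point for modeling.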

This method should work, but what if you want to do all of this in the same notebook so you can go back and forth between R and Python? For this, you can use the reticulate package in R, which, while not fully robust, does a good job of converting R objects to Python and vice versa.

Running Python within RStudio

RStudio can run Python code inside R Markdown without needing external libraries. It’s really simple to do, assuming you already have a version of Python installed on the same machine: just create a new chunk, change the r in its {r} header to python so that it reads {python}, and you’re ready to run!

The benefit of using the reticulate package, though, is that it converts R objects to Python and vice versa where possible. For example, if you’re cleaning some data in R and it lives in an R data frame, reticulate can convert that data frame to a pandas DataFrame when you switch to a Python chunk, so you can keep working with it in Python.

To access an object from the R environment inside a Python chunk, reference it through the r object, as in r.object_name (r['object_name'] also works). To go the other way and access an object from the Python environment in R, use py$object_name.
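To make the round trip concrete, here is a sketch of what a {python} chunk might contain: a hand-written least-squares line fit standing in for a real model (in practice you would more likely reach for scikit-learn or statsmodels). The fit_line helper and the x/y data are hypothetical. In a notebook the inputs would come from R, for example x = r.df["x"], where df and its column names are assumptions; here they are literal lists so the snippet also runs outside RStudio. A later R chunk could then read the results with py$slope and py$intercept.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    # float() lets this accept plain lists or a pandas Series passed in from R.
    xs = [float(v) for v in xs]
    ys = [float(v) for v in ys]
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Stand-in data; inside R Markdown these could be r.df["x"] and r.df["y"]
# (hypothetical data frame and column names).
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
slope, intercept = fit_line(x, y)
print(slope, intercept)
```

Because slope and intercept are plain Python floats, reticulate hands them to R as ordinary numeric values when an R chunk reads py$slope.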

That’s essentially it! You can now use this workflow to (mostly) seamlessly switch between R and Python to your heart’s content. However, there are some cons to using this workflow.

Cons

While I was VERY excited (ask my co-workers) when I got this working, after using this workflow for a few days, I realized that it’s not quite there yet.

First, it takes a little bit of tinkering for RStudio to accept the version of Python you want to use. I was using an Anaconda installation, and it was not immediately recognized.
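One quick way to check which interpreter RStudio actually picked up is to run a few lines of standard-library Python in a {python} chunk, as sketched below. The interpreter_info helper is hypothetical, and its conda check is only a heuristic (conda environments normally contain a conda-meta directory under sys.prefix). If the output does not point at the Anaconda installation you expect, reticulate’s use_python() or use_condaenv() functions, called in R before any Python code runs, are the usual way to select a different interpreter; reticulate::py_config() reports similar information from the R side.

```python
import os
import sys


def interpreter_info():
    """Summarize the Python interpreter that is executing this code."""
    return {
        "executable": sys.executable,
        # sys.prefix is the root of the environment (conda env, venv, or base install).
        "prefix": sys.prefix,
        "version": sys.version.split()[0],
        # Heuristic only: conda environments ship a conda-meta directory.
        "looks_like_conda": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
    }


for key, value in interpreter_info().items():
    print(f"{key}: {value}")
```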

The bigger issue is that while reticulate usually does a good job of converting objects, there are times when it does not work properly. For example, I found that date objects in a pandas DataFrame do not convert cleanly to an R data frame.

The second con essentially defeats the purpose of using reticulate for this particular workflow. If the whole point is being able to seamlessly go back and forth between R and Python for cleaning and modeling, then all objects have to convert properly.
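If dates are the sticking point, one possible workaround, offered here as an untested suggestion rather than part of the workflow above, is to turn them into ISO-8601 strings on the Python side, since plain strings tend to convert cleanly, and parse them back in R with as.Date(). The sketch below does this for a column-oriented table (a dict of lists); dates_to_iso is a hypothetical helper. With pandas, a rough equivalent for a datetime64 column would be df["col"] = df["col"].dt.strftime("%Y-%m-%d").

```python
from datetime import date


def dates_to_iso(columns):
    """Copy a dict-of-lists table, replacing date/datetime values with ISO-8601 strings.

    Non-date values (numbers, strings, None) are passed through unchanged.
    """
    converted = {}
    for name, values in columns.items():
        # datetime is a subclass of date, so this check covers both types:
        # dates become "2020-03-11", datetimes a timestamp like "2020-03-11T09:30:00".
        converted[name] = [v.isoformat() if isinstance(v, date) else v for v in values]
    return converted


# Hypothetical example table.
table = {
    "day": [date(2020, 3, 11), date(2020, 3, 12)],
    "units": [10, 7],
}
print(dates_to_iso(table))
```

On the R side, calling as.Date() on the resulting character vector restores proper Date values.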

Should you use it?

Probably not. But you should TRY it!

I wanted to write this article to make people aware that this is possible. While it may not yet be mature enough to serve as a robust workflow, I do believe that someday it will be.

Regardless, I had a good time testing out this feature, and I think everyone should give it a shot. It may not replace a Python IDE, but it’s still nice to have the option to inject some Python code into an R workflow when needed.
