Choosing between R and Python: A Digital Analyst’s Guide
“R or Python? That would be an ecumenical matter!”
That was the amusing title of a past data meetup in Dublin where the topic was debated.
Apparently, choosing between R and Python is not a straightforward decision. A web search will return numerous articles trying to answer which one is better or which one to learn first. After weighing facts and figures about the two, however, these articles typically reach one of the following conclusions:
- It doesn’t matter which one you learn: both languages are great
- Why not learn both? Knowing two is always better than knowing one
- Decide for yourself, based on your own field and interests
In other words, there is no clear-cut, one-size-fits-all answer. While all the recommendations above are reasonable, they are not much help when it comes to actually making the decision. That is to be expected: the practical applications range from classifying medical images, to developing software for self-driving cars, to forecasting key business metrics.
To make things simpler, in this blog post we will exclusively look at the question from the perspective of a digital analyst. We will consider the workflows and types of tasks that are typically involved in this field. Of course, digital analysts can serve different roles, so we will look at a couple of different scenarios.
R and Python in digital analytics
Disclosure: I learnt programming with Python. When I started working in digital analytics, I switched to R, which has been my primary programming language ever since. I still enjoy using Python and I make sure to keep up to date with developments in the language.
Initial tip: Let’s stay calm
First of all, let’s remove any unnecessary stress about failing to choose the “right” language. In the context of digital analytics, the two languages have far more similarities than differences. Whichever you choose, you should not expect to be at a significant advantage or disadvantage.
- The similarities. I have done no statistical analysis to support this, but in my experience R and Python offer equivalent functionality and capabilities for over 90% of the analytical tasks in digital analytics. For the common tasks of importing, transforming and exploring data, for example, a side-by-side comparison of equivalent R and Python code makes the similarities in logic and expression fairly obvious.
- The differences. It’s true that there are some conceptual differences between the two languages, e.g. Python is primarily object-oriented, whereas R is primarily a functional programming language. However, these differences are hardly noticeable in the most common digital analytics tasks.
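To make the similarity concrete, here is a minimal sketch of importing, transforming and exploring a small table in Python with pandas, with the roughly equivalent readr/dplyr calls noted in the comments. The data, column names and the `summarise_by_channel` function are hypothetical, invented for illustration, and pandas is assumed to be installed.

```python
import io

import pandas as pd

# Hypothetical sessions export; in practice this might be a CSV download.
CSV_DATA = """date,channel,sessions,conversions
2018-09-01,organic,1200,36
2018-09-01,paid,800,40
2018-09-02,organic,1100,33
2018-09-02,paid,900,27
"""


def summarise_by_channel(csv_text: str) -> pd.DataFrame:
    # Import (R: df <- readr::read_csv(...))
    df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
    # Transform (R: df %>% group_by(channel) %>%
    #   summarise(sessions = sum(sessions), conversions = sum(conversions)) %>%
    #   mutate(conv_rate = conversions / sessions) %>% arrange(desc(conv_rate)))
    summary = (
        df.groupby("channel", as_index=False)[["sessions", "conversions"]]
        .sum()
        .assign(conv_rate=lambda d: d["conversions"] / d["sessions"])
        .sort_values("conv_rate", ascending=False)
        .reset_index(drop=True)
    )
    return summary


if __name__ == "__main__":
    # Explore (R: print(summary))
    print(summarise_by_channel(CSV_DATA))
```

Apart from syntax, the steps map almost one to one: group, sum, add a derived column, sort.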
Advantages of R
1. The human interpretability factor
In digital analytics, much of the analysis is “consumed” by humans, so there is a strong emphasis on communicating, interpreting, visualising and reporting the results, which plays to R’s strengths. R was developed by statisticians who, just like digital analysts, have a natural interest in answering the what, how and why behind the processes that generate data, with an emphasis on interpretability. This is reflected in the way the R language and its libraries approach problems and communicate solutions. R’s visualisation capability, for example, is a favourite among digital and business analysts. It allows users to create elegant visualisations following the principles of tidy data and the grammar of graphics.
2. The digital analytics community and R
Another advantage is simply that, as a digital analyst who uses R, you can find support, resources and answers faster. I am speaking from my own experience, but I have always found more digital analytics code and content written for R, including packages developed specifically for marketing analytics.
Is there a reason why the digital analytics community seems to be more geared towards using R?
I think this is partly because many digital analysts come from non-technical, non-computer-science backgrounds. If all they want to do is analyse data, these analysts look for a programming environment in which they can get up and running quickly without first having to acquire software development skills.
3. A domain-specific language that is easy to get started with
In this respect R, as a domain-specific language for statistics and data analysis, can offer a smoother transition. It lets a digital analyst go from zero to a first completed data analysis faster, and with fewer dependencies, than other environments.
Advantages of Python
1. Production ready, cloud friendly applications
Python has a growing number of advantages on its side. It is the primary language for working with cloud services, data and systems at scale, distributed environments and production environments. Even though these advantages might not directly affect digital analytics right now, they are still very relevant. In fact, they are likely to become even more so in the near future as data systems, including those used in digital analytics, become less siloed.
2. Python: the multi-paradigm glue language
Python also has an “unfair” advantage over R by virtue of being a so-called “glue” language. Python is used not just by data analysts and data scientists but also by database engineers, web developers, system administrators and many others. It has the reputation of being the second-best language for… almost anything. This has led many organisations and teams to adopt Python as a common framework that minimises friction and avoids having to translate code from one language to another.
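As a small illustration of the “glue” idea, the sketch below queries a database with SQL, reshapes the rows in Python and hands the result off as JSON that a web app or another service could consume, all in one short script using only the standard library. The in-memory SQLite database stands in for an internal data store; the `pageviews` table, its columns and both function names are hypothetical.

```python
import json
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    # In-memory database standing in for an internal data store (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pageviews (page TEXT, day TEXT, views INTEGER)")
    conn.executemany(
        "INSERT INTO pageviews VALUES (?, ?, ?)",
        [
            ("/home", "2018-09-01", 500),
            ("/pricing", "2018-09-01", 120),
            ("/home", "2018-09-02", 450),
            ("/pricing", "2018-09-02", 180),
        ],
    )
    return conn


def page_report(conn: sqlite3.Connection) -> str:
    """Aggregate with SQL, reshape in Python, serialise to JSON."""
    rows = conn.execute(
        "SELECT page, SUM(views) AS total_views FROM pageviews "
        "GROUP BY page ORDER BY total_views DESC"
    ).fetchall()
    # Turn the (page, total) tuples into records another system can consume.
    payload = [{"page": page, "views": views} for page, views in rows]
    return json.dumps(payload)


if __name__ == "__main__":
    print(page_report(build_demo_db()))
```

The same script could just as easily write to a cloud storage bucket or call a web API, which is where the “glue” reputation comes from.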
3. More Python expected soon
How relevant are these points to the day-to-day work of a digital analyst today? Probably not very (for most of us, anyway). But I think few would disagree that they are likely to become much more important in the near future, as analysts increasingly need to interact with cloud services, manage larger datasets and work with more interdisciplinary data. These are all areas where Python excels.
So, which language should a digital analyst choose?
Let’s take a step back first
To answer the question, let’s first assume that everything else is equal. If that is not the case, for example if you have colleagues, partners or even a local community that can support you in learning language “x”, then you already have a very strong reason to choose that one, regardless of what you’ll read below.
So, with that assumption in mind, let’s now attempt to address the question. Even though choosing between R and Python is obviously… an ecumenical matter, I would argue that for the majority of digital analysts today, R is the more suitable language to learn.
So, why R?
As a digital analyst, your standard workflow probably involves working with structured, tabular data. Typically you first access the data, e.g. via an internal database or an external web UI or API, then transform it, visualise it, potentially model it, and finally report and present the results to your team.
Does this sound like you?
If so, you probably already know that most of those tasks can be accomplished using a combination of tools like Excel, SQL and others (including Python, of course). However, it’s hard to think of a more efficient way to perform this type of analysis and reporting than R, especially with the help of R libraries such as dplyr for data manipulation, ggplot2 for visualisation, rmarkdown for reporting and shiny for interactive web applications. Together, these libraries let you work with data in an easy, streamlined way by bringing all of these steps into one place.
When to opt for Python?
Of course not every analyst and team has the same needs and there is no doubt that there are many cases where Python would be more appropriate or useful.
- Data analytics at scale. An ad hoc analysis, using R as described above for example, is often perfectly suitable, but it might not be the best option if you later need to automate and scale it. For example, your organisation might decide to develop infrastructure to run A/B tests at scale, or to use the results of an ongoing analysis to improve the customer experience in real time. Python is typically the preferred language for this kind of project.
- The Swiss Army knife language. There is also the type of analytics professional who prefers to move beyond the realm of data analysis and use programming skills for a variety of other tasks such as web crawling, natural language processing, developing web apps or automating various other tasks. Python is a powerful general-purpose language, which some programmers in fact call their “Swiss Army knife”. As such, it is the recommended choice for the use cases above, many of which fall within the broader area of data science.
- Machine learning. Machine learning and AI in the digital analytics world currently happen mainly behind the scenes, on the side of platform providers such as Google and Adobe, rather than in-house. But if there is scope in your organisation for machine learning to become a significant part of your role, then Python, with scikit-learn, is a premier language in this field. scikit-learn offers a very solid and consistent API for machine learning work and has evolved into an industry-standard toolkit.
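The “data analytics at scale” point mentions running A/B tests at scale. At its core, the analysis of a simple conversion A/B test can be a two-proportion z-test, which is easy to wrap in a function that an automated pipeline calls for every experiment. This is a minimal sketch using only the standard library; the function name and the sample figures are hypothetical, and it assumes samples large enough for the normal approximation to hold. A production testing platform would also have to handle issues such as repeatedly peeking at results.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: variants A and B share one conversion rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical experiment: 200/5000 conversions for A vs 250/5000 for B.
    z, p = two_proportion_z_test(200, 5000, 250, 5000)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

Because the test is a plain function, it can be scheduled, parallelised or exposed as a service alongside the rest of an organisation’s Python tooling.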
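The consistency of scikit-learn’s API mentioned in the machine learning point comes largely from every estimator exposing the same `fit` and `predict` methods, so swapping one model for another usually means changing a single line. A minimal sketch, assuming scikit-learn is installed; the toy features (visits, pages per visit), the conversion labels and the `fit_and_predict` helper are invented for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: [visits in last 30 days, pages per visit] -> converted (1) or not (0)
X = [[1, 2], [2, 1], [1, 1], [8, 6], [9, 7], [7, 8]]
y = [0, 0, 0, 1, 1, 1]


def fit_and_predict(model, X_train, y_train, X_new):
    # Every scikit-learn estimator follows the same fit/predict pattern.
    model.fit(X_train, y_train)
    return [int(label) for label in model.predict(X_new)]


if __name__ == "__main__":
    new_users = [[2, 2], [8, 7]]
    for model in (LogisticRegression(), DecisionTreeClassifier(random_state=0)):
        print(type(model).__name__, fit_and_predict(model, X, y, new_users))
```

Both models are trained and queried through identical calls; only the constructor differs.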
How about learning both languages?
Even though I wouldn’t recommend learning the two languages simultaneously (unless you are in college, of course), I do believe that being able to navigate code in both R and Python is a useful skill. If you choose R, then becoming familiar with Python, and being able to read and use Python code, could help you solve a broader range of problems faster.
Open platforms like the RStudio IDE and JupyterLab allow users to combine R, Python and in fact other languages within a single environment. In the long term, being able to simply use the right tool for the task at hand every time could be the winning strategy.
It is fascinating how open source and open knowledge have allowed many individuals, regardless of where they are located or where they work, to access powerful tools like Python and R and to create great impact within their teams and organisations. Let’s remember, though, that this openness wasn’t always there: until recently, advanced analytics was the privilege of large enterprises that could afford the high costs of proprietary technology.
So, whether you choose R or Python, now is a great time to embark on this journey: the tools have matured considerably and there is no shortage of opportunities to learn. Last but not least, there are very active local and global communities for both R and Python, such as #pydata and #rstats, which can be great sources of support and inspiration. Similarly, the #data-science channel on Measure Slack is home to many interesting discussions between digital analysts about R, Python and beyond.
Originally published at www.london.measurecamp.org on September 10, 2018.
Cover image by Orestis Papageorgiou