R vs Python — Which is best?
If you are reading this article, I imagine you, like many other data scientists, are wondering which programming language to start learning.
Whether or not you have experience with other coding tools, the individual features of these two languages, including their vast arrays of libraries and packages, may initially seem daunting. But don’t worry: this article is going to help you decide which has the right tools to get you going.
To no one’s surprise, both R and Python boast their respective advantages for a multitude of applications and are widely used by professionals in their global communities.
In order to begin, it’s likely a good idea to revisit what exactly you want to use the programming language for. For example, a data scientist working predominantly on genetics research may find themselves using R (as it’s popular with bioinformaticians), whereas someone working on models for image analysis, say an employee at Tesla creating self-driving car technology, might find themselves working with Python, due to its sophisticated image manipulation tools.
Ultimately, it’s still your choice, and while blindly doing what everyone else is doing is rarely a good philosophy, do take the time to discover why these professionals prefer certain languages. It’s going to be so important to be able to speak the same language as your future peers.
Who uses R and what’s its purpose?
R was created initially as a platform for statistical computing, hosting all the classical tests, time-series analysis, clustering, and more. It has a large community of data miners, which means lots of accessible packages, both from R developers and users. In terms of graphics, there is a multitude of packages and layers for plotting and analysing graphs, such as ggplot2.
Importantly, R has also entered the modern AI scene, providing tools for neural networks, machine learning, and Bayesian inference. It also has interfaces to deep learning frameworks such as MXNet and TensorFlow. If this sounds like jargon to you (or you’re curious about these unfamiliar terms), you can read more about them here.
It would seem R has a solid following, not only of data scientists but largely of statisticians and those in associated fields requiring data manipulation (for instance those in medicine, finance and the social sciences). For any data scientist, finding a widely used program is important; you’ll want to be able to speak to as many disciplines within one language as possible, making your work easily translatable.
Who uses Python and what’s its purpose?
On the other side of the court, Python is an excellent tool for programmers and developers across the board. Whether developing algorithms for simulating biomolecules or delivering anti-spam software, you’ll find yourself at home using its interface and array of functions.
First released in 1991 (its development began in late 1989), it is often described as one of the most significant general-purpose, object-oriented programming languages (which sounds super fancy). Python has an ever-growing popularity among new programmers (data scientists among them), which of course means a rich community of users and trusty trouble-shooters for those annoying problems you’ll inevitably encounter.
On the hot topic of AI, Python is arguably the most popular choice; it has tools for machine learning and neural networks, including TensorFlow. Additionally, for more general purposes, its users benefit from libraries such as NumPy for numerical computing, pandas for data preparation, and seaborn for generating plots. Check out this article on the Top 20 Python libraries for Data Science for more info.
R vs Python: Limitations
To the more interesting part: how do they each match up?
Uncovering limitations early is possibly one of the most important pieces of advice you could get. I speak from experience: I jumped from MATLAB, where there is an enormous amount of online support (and usually some wonderful person who has written exactly the code you need), to LabVIEW, where there was little to no online presence. I know all too well the sensation of panicking and being unable to solve that bug.
You’ll most definitely become frustrated at not having considered the obvious potential limitations your new language may hold.
Some of the main things to consider for a data science application are:
- Processing speed — Will you be using large amounts of data?
- Online community — It really is invaluable and has saved me many times.
- Steep learning curve — How much time and patience do you have to specialise? Have you learnt programming before, leaving you better equipped to pick up a new language?
- User-friendly interface — Are you familiar with programming or do you prefer something easy to visualise and pretty?
- Widely spoken — Have you considered future connections across fields and their languages?
Let’s have a look at how each fares on these topics…
Processing speed:
R is often considered slow. By default it holds its objects in physical memory (RAM), so it is not always a great option when trying to harness Big Data. That being said, improving hardware is reducing this limitation, and there are various packages focused on tackling it. Python, by comparison, is generally regarded as better suited to large datasets and as able to load large files faster.
For more information, check out the Quora thread “Which is better for data analysis: R or Python?”
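To make the memory point concrete, here is a minimal sketch of processing a file one row at a time in Python, so that the whole dataset never has to sit in memory at once. The function name `streaming_mean`, the column name `value`, and the tiny in-memory sample are all made up for illustration. This is not a benchmark and says nothing about relative speed; R has comparable chunked-reading approaches of its own.

```python
import csv
import io


def streaming_mean(lines, column):
    """Average one numeric CSV column while holding a single row in memory."""
    reader = csv.DictReader(lines)
    total = 0.0
    count = 0
    for row in reader:  # rows are parsed lazily, one at a time
        total += float(row[column])
        count += 1
    if count == 0:
        raise ValueError("no data rows found")
    return total / count


# Hypothetical stand-in for a large file on disk.
sample = io.StringIO("id,value\n1,10.0\n2,20.0\n3,60.0\n")
mean_value = streaming_mean(sample, "value")
print(mean_value)
```

With a real file you would pass an open file handle in place of the `io.StringIO` object; the loop itself would stay the same.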
Online community:
As mentioned above, both R and Python have a widely backed support network for you to reach out to. This is an invaluable source of help for those bugs you just can’t seem to troubleshoot readily.
Steep learning curve:
This may or may not be considered a limitation of R, but its steep learning curve is a consequence of how much power it gives statisticians. Having been developed by experts in the field, R is an incredible tool, but you pay the price for this with your initial investment of time.
On the other hand, Python is very attractive to new programmers for its ease of use and its relative accessibility.
Both programs will require you to get familiar with terminology which may seem off-putting at first (like the difference between a “package” and a “library”), with the set-up for Python having the edge on R in terms of the user-friendly experience.
That said, Python is unrelentingly strict with users on syntax and will refuse to run if you haven’t followed easily missed rules, such as consistent indentation (though these do pay off in the long run, as they make us better, neater code writers).
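As a small illustration of that strictness, the sketch below asks Python to compile two snippets that differ only in the indentation of one line. The snippet contents and the helper name `compiles` are invented for this example.

```python
# Two hypothetical snippets that differ only in the indentation of one line.
well_indented = "for i in range(3):\n    print(i)\n"
badly_indented = "for i in range(3):\nprint(i)\n"


def compiles(source):
    """Return True if Python accepts the source text, False if it rejects it."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


print(compiles(well_indented))
print(compiles(badly_indented))
```

The second snippet is rejected before a single line runs, which is the "refuse to run" behaviour in practice.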
R, which has many academic users, has the lovely attribute of giving you much more control over the design of your graphics, with a variety of export options and display formats.
Importantly, both are interpreted languages, which, compared with compiled languages such as C++, can make spotting bugs much easier.
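One way to see this in Python: when something fails at run time, the interpreter produces a traceback naming the function and line where it went wrong. The sketch below triggers a deliberate bug in a made-up `average` function and captures that report as text. It is only an illustration, not a claim about how debugging in C++ works.

```python
import traceback


def average(values):
    """Naive mean that fails on an empty list."""
    return sum(values) / len(values)


try:
    average([])  # deliberate bug: empty input
except ZeroDivisionError:
    # Capture the interpreter's report of where the failure happened.
    report = traceback.format_exc()

print(report)
```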
User-friendly interface:
RStudio is widely considered the favourite platform for interfacing in R and once you begin familiarising yourself with it, you’ll understand why that is the case.
It’s classified as an integrated development environment (IDE) and comprises a console for direct code execution, along with tools for plotting, interactive graphics, debugging, and workspace management; see RStudio IDE Features for a more detailed guide.
Python offers numerous IDEs to choose from. The benefit is that you can pick one that feels familiar given your background. For instance, those coming from a computer science background often favour Spyder, whereas beginners in the field tend to find PyCharm accessible and intuitive.
Top 5 Python IDEs for Data Science is a helpful, comprehensive article on this topic.
Widely spoken:
We’ve touched on this topic, and I would stress that the answer depends on your chosen field. If you are leaning towards academia, finance, or healthcare (to name a few), R will most likely be much more widely spoken, and you’ll want to take advantage of that.
Those of you interested in software development, automation, or robotics, on the other hand, may find yourselves immersed in the Python community.
R vs Python: Advantages
Let’s get straight to it…
R:
- An excellent choice if you want to manipulate data. CRAN (the Comprehensive R Archive Network) hosts over 10,000 packages, many of them for data wrangling.
- You can make beautiful, publication-quality graphs very easily; R allows users to alter the aesthetics of graphics and customise them with minimal coding, a huge advantage over its competitors.
- Perhaps its most powerful feature is statistical modelling: R was built by statisticians, who remain the forerunners in this field, and its tools are preferred by many experienced programmers.
- Users benefit from its integration with GitHub, a large platform for discovering and sharing software.
Python:
- It’s very easy and intuitive to learn for beginners (unlike R, Python was developed by programmers, and its ease of use makes it a favourite for universities across the board).
- It is appealing to a wide range of users, creating an ever-growing community in more disciplines and increased communication between open-source languages.
- The strict syntax will force you to become a better coder, writing more condensed, legible code.
- Python is faster at dealing with large datasets and can load files with ease, making it more appropriate for Big Data handlers.
Conclusion…
With all this in mind, choosing a language to begin with depends heavily on what you want from it. If you are the kind of data scientist who specialises in statistical analysis or you work in research, you may find R works best for you.
However, if you are someone who sees themselves branching across multiple disciplines, you could make use of Python’s generality and diverse network.
You may also agree that it would benefit you to eventually learn both (at least enough to be able to read the other’s syntax) as you get to know each for their respective strengths. This will undoubtedly open more doors for you in terms of landing jobs, and more importantly, give you that clarity to decide what career path you want to take.
But don’t be overwhelmed; learning the second language will be easier than the first! You will no doubt also find yourself excited about opening up a whole new community to immerse yourself in as you grow as a data scientist.
Good luck and happy coding!
P.S. Please forgive the sketches — they looked a lot cuter pre-digitisation.
Resources:
https://www.r-project.org/about.html
https://www.python.org/about/
https://codeinstitute.net/blog/what-is-python-used-for/
https://www.superdatascience.com/blogs/top-6-data-visualization-libraries-for-python
https://www.superdatascience.com/blogs/r-and-python-in-the-workplace
https://www.quora.com/Which-is-better-for-data-analysis-R-or-Python-Is-R-still-a-better-data-analysis-language-than-Python-Has-anyone-else-used-Python-with-Pandas-to-a-large-extent-in-data-analysis-projects
https://data-flair.training/blogs/r-vs-python/
https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages
https://bigdata-madesimple.com/top-20-python-libraries-for-data-science/
https://rstudio.com/products/rstudio/features/
https://www.datacamp.com/community/tutorials/data-science-python-ide