Machine Learning Programming Languages — which is the best and why?
By Rachit Kumar Agrawal
Machine learning's theoretical foundations date back to 1763, when Bayes' theorem was published. Alan Turing proposed the idea of a learning machine as early as 1950. But this decade is truly the decade of machine learning and artificial intelligence, because machine learning techniques are now being widely adopted to make machines intelligent (artificially, for now). This adoption is possible for three reasons: 1. the huge amount of data available; 2. sufficient computational power; 3. new techniques such as deep neural networks, which let machines perform as well as or better than humans on some tasks, such as image classification and audio synthesis.
With machine learning being adopted more and more widely, almost every popular language is adding support to ease ML tasks. Four popular languages come to my mind when I think about doing machine learning: R, Python, Matlab and Octave.
Several metrics can help decide on the right programming language among these:
- Learning Curve: R is more functional, whereas Python is more object-oriented. So if you have more exposure to object-oriented programming, you may find Python easier to learn than R, and vice versa if you come from a functional programming background. Writing Matlab and Octave code is more like writing mathematical equations, which is also very easy to pick up. So this metric is a tie that depends on the user's programming experience.
- Speed: R was built as a statistical language, so it has more statistical and data-analysis support built in, whereas Python relies on packages. Because of this built-in support, R can be slightly faster at statistics-related tasks.
- Community Support: Octave is less popular than the other languages, so it has somewhat less community support than R, Python and Matlab. On this metric, Octave loses to the other three languages.
- Cost Effective: Matlab is proprietary software that needs a license for its use, whereas the other three are free, open-source software with no cost involved in their use. This is where Matlab loses a little in comparison to the other three languages.
- DNN Frameworks support: Caffe and TensorFlow (probably more on these in another blog post) are two popular frameworks as of today.
a. Caffe supports Python and Matlab.
b. TensorFlow supports Python and R.
Also, among the other, less popular DNN frameworks (like Theano), Python is the only language with universal support. This gives Python a clear edge over the other languages.
- Production Ready: R is better suited to statistical analysis. Matlab and Octave are better suited to computer-vision tasks, but Python is also well suited to generic tasks such as data pre-processing and results post-processing. Being a general-purpose language also makes Python a better fit when ML needs to be integrated with other software.
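To make the DNN framework point in the list above concrete, here is a minimal Python sketch that intersects the language support of each framework. The mapping only restates what this post lists (with Theano entered as a Python library), and the names `FRAMEWORK_LANGUAGES` and `universally_supported` are made up for illustration; treat it as a snapshot of the claims here, not a live compatibility matrix.

```python
# Language support per framework, as stated in this post.
# Assumption: Theano is listed with Python only.
FRAMEWORK_LANGUAGES = {
    "Caffe": {"Python", "Matlab"},
    "TensorFlow": {"Python", "R"},
    "Theano": {"Python"},
}


def universally_supported(frameworks):
    """Return the languages supported by every framework in the mapping."""
    language_sets = list(frameworks.values())
    if not language_sets:
        return set()
    return set.intersection(*language_sets)


common = universally_supported(FRAMEWORK_LANGUAGES)
print(sorted(common))
```

With the mapping above, the only language left in the intersection is Python.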
Taking the above metrics into consideration, if we give 1 point to each language that has an edge over the others on a metric, the summary looks like the table below:
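As a rough sketch of how such a tally could be computed, the per-metric verdicts from the list above can be encoded and summed in a few lines of Python. The point assignment here is my reading of the text and rests on two assumptions: a tie (learning curve) awards no points, and when one language loses a metric, each of the other three gets a point. The names `EDGE_BY_METRIC` and `tally` are hypothetical.

```python
LANGUAGES = ["Python", "R", "Matlab", "Octave"]

# Languages credited with an edge on each metric, read from the list above.
# Assumption: a tie awards no points; when one language "loses" a metric,
# the remaining three each receive a point.
EDGE_BY_METRIC = {
    "Learning Curve": [],                           # tie
    "Speed": ["R"],
    "Community Support": ["Python", "R", "Matlab"],  # Octave loses
    "Cost Effective": ["Python", "R", "Octave"],     # Matlab loses
    "DNN Frameworks support": ["Python"],
    "Production Ready": ["Python"],
}


def tally(edge_by_metric, languages):
    """Give 1 point per metric to every language that has an edge on it."""
    scores = {lang: 0 for lang in languages}
    for winners in edge_by_metric.values():
        for lang in winners:
            scores[lang] += 1
    return scores


scores = tally(EDGE_BY_METRIC, LANGUAGES)
# Print languages from highest to lowest score.
for lang, points in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{lang}: {points}")
```

Under these assumptions Python ends up with 4 points, R with 3, and Matlab and Octave with 1 each. A different way of handling ties would change the exact numbers.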
Based on the summary, it is evident that Python is a better choice than the other languages, mainly because it is general-purpose: it is good not just for statistical and machine-learning tasks but for other generic tasks too, and it has better support across DNN frameworks like TensorFlow and Caffe. But R can be very handy for a quick prototype that doesn't need a DNN framework. So, to summarize, R is the language of choice for a quick prototype, but for the long term Python is the preferred language.
About the Author | Rachit Agrawal
A Deep Learning Researcher by profession who loves to “train” his mind by learning new things, and who loves to watch Formula 1 and do social work in his free time.