Beyond Python and R for Data Science — Part 1
Try Scala, Julia in the new year for additional fun in data science
R and Python are the two languages most people associate with data science, and machine learning can be done in either. They are probably the two most common first languages for a foray into the field. Plenty of software engineers are either transitioning into data science as data scientists, data engineers, or machine learning engineers, or are working on AI software projects.
If you are a programmer or a software engineer on this path, then this article is for you.
Before you go down the path of R and Python, here are a few things you should know about two other languages: Scala and Julia. If you are already a skilled data scientist or data engineer who has competed in Kaggle competitions and aced a first job, you may want to give them a try as we enter a new year.
Scala is a statically typed language that runs on the JVM and is interoperable with Java. It is designed to be both object-oriented and functional. The name is short for “scalable language.”
Scala is often cited as being up to ten times faster than Python, although the actual gap depends on the workload. It’s often used in machine learning and large-scale data science projects. Many data scientists use it in conjunction with Apache Spark.
Many organizations, such as Twitter and LinkedIn, use it to increase programmer productivity. Functional programming in Scala lets programmers write less code, and more readable code, for data science projects. In organizations with a large number of Java programmers, picking up Scala is often easier than foraying into Python, so these programmers work on projects that use Scala for functional programming.
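To make the “less code, more readable code” claim concrete for readers coming from Python, here is a minimal sketch of the functional style in Python itself. It is not Scala code; the `Trip` class and the sample data are hypothetical, invented for this illustration. Scala collections offer the same pipeline through their `filter`, `map`, and `foldLeft` methods.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)  # frozen=True makes instances immutable
class Trip:
    city: str
    distance_km: float


# Hypothetical sample data, invented for this illustration.
trips = (
    Trip("Oslo", 12.5),
    Trip("Lima", 3.0),
    Trip("Oslo", 7.5),
    Trip("Pune", 20.0),
)

# Filter, then map, then fold: no loop counter and no mutated accumulator.
oslo_trips = filter(lambda t: t.city == "Oslo", trips)
oslo_distances = map(lambda t: t.distance_km, oslo_trips)
total_oslo_km = reduce(lambda acc, d: acc + d, oslo_distances, 0.0)

print(total_oslo_km)
```

Each step takes a value and returns a new one, so the pipeline reads top to bottom as a description of the result rather than as a sequence of state changes.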
Here are some resources for Scala:
Julia is a fairly new programming language (development began in 2009, and it was first released publicly in 2012) that was designed for high-performance scientific computing. It is dynamically typed, and programming in it feels more like programming in R. Julia is fast (reportedly ten to 30 times faster than Python, depending on the task) because it is just-in-time compiled to native code rather than interpreted. Programmers often use Julia in conjunction with Python. Julia’s PyCall package makes it easy to call Python from Julia, and the PyJulia package goes the other way, letting Python projects call Julia. Python programmers also choose Julia for cleaner implementations of parallelism.
In recent years, Julia has come to the forefront of discussion as a language for large-scale data science projects.
The major drawback most programmers cite is that Julia’s ecosystem, in both support and development, is still not as robust as Python’s. The language itself needs more time to mature.
Here’s a great article explaining the main features of Julia vs. Python.
The major selling point of Julia these days is crafting differentiable algorithms (code whose gradients can be computed automatically, which is how neural networks are trained) in Flux, a machine learning library for Julia.
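If “differentiable algorithms” sounds abstract, the idea can be sketched in a few lines of Python. This is not Flux, and it is not how Flux computes gradients. It is a toy forward-mode example in which the `Dual` class, the `derivative` helper, and the one-parameter `loss` function are all hypothetical names made up for illustration. The point is only that ordinary code can carry its own derivative along with it, which is what lets a library train a model by gradient descent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one input."""
    value: float
    deriv: float

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(
            self.value * other.value,
            self.deriv * other.value + self.value * other.deriv,
        )

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants, whose derivative is zero."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def derivative(f, x):
    """Evaluate f'(x) by seeding the input's derivative with 1."""
    return f(Dual(x, 1.0)).deriv


def loss(w):
    # Hypothetical one-parameter loss with its minimum at w = 3.
    return (w - 3) * (w - 3)


w = 0.0
for _ in range(50):
    w = w - 0.1 * derivative(loss, w)  # one gradient descent step

print(round(w, 3))
```

Nothing in `loss` mentions derivatives, yet the loop can still follow its gradient down to a value of `w` close to 3. Libraries such as Flux apply the same principle to models with many parameters.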
Here are some resources for Julia:
Are you ready to learn a new programming language in the new year?
Picking up either of these two languages in the new year can add to your stack of data science skills. It can also prime you for data science jobs that are different from the ones you’ve had.
What are you waiting for?