Initial impressions of Scala from a Java and Python data engineer

Matt Hagy
Feb 18, 2019

I’ve been learning Scala over the last two months for a future role. Previously, I only had a passing understanding of Scala code from reading the Spark source code. Instead, I have historically developed data engineering workflows using Java Spark and MapReduce as well as Python PySpark.

Going into Scala, I had high expectations based on the great strengths highlighted on scala-lang.org. At present, I’m a big initial fan of Scala and hope to see it replacing Java and Python in my future data engineering work.

Here are highlights of my initial impressions…

Love the concise syntax

I find Scala to be the most concise language I’ve used professionally. I find small amounts of Scala code can contain a lot of information. I’m particularly a fan of the concise case class definitions.
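
As a small illustration of that brevity, here is a minimal sketch of a case class. The `Event` type and its fields are hypothetical, made up only for this example.

```scala
object CaseClassExample {
  // Hypothetical record type. This one line provides a constructor, field
  // accessors, equals/hashCode, toString, and copy.
  final case class Event(userId: Long, action: String, durationMs: Int)

  val click = Event(42L, "click", 120)

  // copy builds a new value with one field changed; `click` is left untouched.
  val slowClick = click.copy(durationMs = 900)
}
```

The equivalent Java class would need explicit fields, a constructor, getters, and hand-written `equals`, `hashCode`, and `toString` methods.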

I find Scala to be a readable language

Similar to Python, I find Scala to be a highly literate language that can be read in a structure similar to English. Python is a little stronger here, with a more basic vocabulary, but I still consider Scala to be a highly readable language in contrast to, say, Java and C/C++.

Type inference is great

I like the combination of requiring compile-time types to catch bugs and having the compiler automatically figure out the types for many variables and functions. It also adds to code brevity.
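
A rough sketch of what this looks like in practice. The names are invented for illustration, and the comments note the types the compiler infers.

```scala
object TypeInferenceExample {
  // No type annotations needed; the compiler infers Map[String, Int].
  val counts = Map("a" -> 1, "b" -> 2)

  // The result is still inferred as Map[String, Int].
  val doubled = counts.map { case (key, n) => (key, n * 2) }

  // The return type is inferred as Int.
  def total(m: Map[String, Int]) = m.values.sum

  // The line below would be rejected at compile time, since an Int is not a String:
  // val wrong: String = total(counts)
}
```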

Immutable, persistent data structures are cool

In general, I’m a big fan of making all data immutable to cut down on the complexities of state modification. I first encountered persistent data structures in Clojure and I always thought they were interesting. I like the idea of minimizing memory usage and accelerating the creation of modified versions through persistent data structures.
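
Here is a minimal sketch using the standard immutable `List` and `Vector`. The list case shows structural sharing directly: prepending an element reuses the existing list as the tail rather than copying it.

```scala
object PersistentExample {
  val shared = List(2, 3)

  // Prepending allocates a single new cell that points at `shared`.
  val extended = 1 :: shared

  val base = Vector(1, 2, 3)

  // updated returns a new Vector; `base` itself is never modified.
  val changed = base.updated(1, 20)
}
```

Here `extended.tail eq shared` is true, meaning both lists point at the same cells in memory.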

Scala types seem powerful

I’m used to the concepts of covariance and contravariance from Java’s type system, but Scala’s types seem to be more powerful than Java’s. I’m still learning the type system and look forward to mastering it.
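
One difference I can already sketch: Java expresses variance at the use site with wildcards such as `? extends T`, while Scala lets a type declare its variance once. The `Animal` and `Dog` types below are hypothetical.

```scala
object VarianceExample {
  sealed trait Animal { def name: String }
  final case class Dog(name: String) extends Animal

  // List is declared covariant (List[+A]), so a List[Dog] is a List[Animal].
  val dogs: List[Dog] = List(Dog("Rex"))
  val animals: List[Animal] = dogs

  // Functions are contravariant in their argument type, so a function that
  // accepts any Animal can be used where a function on Dog is expected.
  val describe: Animal => String = a => s"animal ${a.name}"
  val describeDog: Dog => String = describe
}
```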

Pattern matching is awesome

I’ve used pattern matching in my limited work with Common Lisp and Clojure and always wanted to use it in other languages. I’m so happy that Scala includes this feature and I’ve enjoyed using it to make my code simpler. I like how this behavior can be extended in companion objects to allow for custom pattern matching.
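
A minimal sketch of custom pattern matching through an `unapply` method on a companion object. The `Email` type and its deliberately naive parsing rule are assumptions made for this example.

```scala
object PatternMatchExample {
  final class Email(val user: String, val domain: String)

  object Email {
    // Custom extractor: lets a plain String be matched as Email(user, domain).
    def unapply(s: String): Option[(String, String)] =
      s.split("@") match {
        case Array(user, domain) if user.nonEmpty && domain.nonEmpty =>
          Some((user, domain))
        case _ => None
      }
  }

  def describe(s: String): String = s match {
    case Email(user, "example.com") => s"internal user $user"
    case Email(user, domain)        => s"$user at $domain"
    case _                          => "not an email"
  }
}
```

Case classes get an `unapply` generated automatically, which is why they can be used in `match` expressions without any extra code.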

Objects are an interesting alternative to static members

Putting what Java would declare as static members into a singleton or companion object seems to work well from my perspective.
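
For illustration, a small sketch where a companion object holds a constant and a factory method. The `Temperature` class is hypothetical.

```scala
object ObjectExample {
  // The constructor is private, so instances are created via the companion.
  final class Temperature private (val celsius: Double)

  // Companion object: the home for what Java would declare as static members.
  object Temperature {
    val AbsoluteZero: Double = -273.15

    def fromFahrenheit(f: Double): Temperature =
      new Temperature((f - 32) * 5 / 9)
  }
}
```

Callers write `Temperature.fromFahrenheit(212)`, which reads just like a static call in Java.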

Implicit parameters and conversions look like magic

I think I simply don’t understand these powerful concepts yet. I plan to focus on learning them properly soon.
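
As far as I understand them so far, the two mechanisms look roughly like the sketch below. The `Config` and `Meters` types are hypothetical, and this covers only the most basic usage.

```scala
import scala.language.implicitConversions

object ImplicitExample {
  final case class Config(prefix: String)

  // Implicit parameter: if the caller omits the second argument list,
  // the compiler looks for an implicit Config in scope.
  def label(name: String)(implicit config: Config): String =
    s"${config.prefix}-$name"

  implicit val defaultConfig: Config = Config("prod")

  val viaImplicit = label("etl")                // compiler passes defaultConfig
  val viaExplicit = label("etl")(Config("dev")) // passing one by hand still works

  // Implicit conversion: applied automatically where a Meters is expected
  // but an Int was supplied.
  final case class Meters(value: Double)
  implicit def intToMeters(n: Int): Meters = Meters(n.toDouble)

  val distance: Meters = 5
}
```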

Seems like some developers want it to be more functional

In learning about Scala from the community, I regularly encounter developers who criticize Scala for not being functional enough. Personally, I don’t yet have an opinion here. I am hoping Scala can further my proficiency with functional programming so I’m ready to move to Haskell or Eta if those languages ever become relevant for my work.

Glad it runs on the JVM

I love the JVM and think it’s solid engineering infrastructure. Further, I’m glad that I can access the massive Java ecosystem from Scala.
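
A quick sketch of calling Java standard library classes directly from Scala, with no wrappers or bindings. The variable names and values are illustrative only.

```scala
import java.time.LocalDate
import java.util.concurrent.ConcurrentHashMap

object JavaInteropExample {
  // A Java collection used from Scala as-is.
  val rowCounts = new ConcurrentHashMap[String, Int]()
  rowCounts.put("events", 10)

  // Java's date API works the same way it does in Java.
  val start = LocalDate.of(2019, 2, 18)
  val later = start.plusDays(10)
}
```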

Love how everything is an expression

I like this Lispy idea of having everything in the language be an expression that returns a value. In practice, it seems to simplify my code.
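
A small sketch of the idea: `if`/`else` and `try`/`catch` both produce values, so there is no need for a mutable temporary variable. The thresholds and the fallback port are made-up values.

```scala
object ExpressionExample {
  // The if/else chain is itself the value of the method.
  def bucket(latencyMs: Int): String =
    if (latencyMs < 100) "fast"
    else if (latencyMs < 1000) "ok"
    else "slow"

  // try/catch is an expression too; 8080 is a hypothetical default.
  def parsePort(raw: String): Int =
    try {
      raw.trim.toInt
    } catch {
      case _: NumberFormatException => 8080
    }
}
```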

Like the syntax for working with tuples

As someone who’s had to use Pair numerous times in Java, I’m glad Scala has built-in support for working with typed tuples.
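
For example, here is a minimal sketch of creating, destructuring, and mapping over tuples, using illustrative values.

```scala
object TupleExample {
  // A typed pair with no helper class required.
  val pair: (String, Int) = ("spark", 3)

  // Destructuring binds both elements at once.
  val (name, version) = pair

  // swap returns an (Int, String).
  val swapped = pair.swap

  // Pattern matching inside map unpacks each tuple.
  val scaled = List(("a", 1), ("b", 2)).map { case (key, n) => (key, n * 10) }
}
```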

Scala is well supported in IntelliJ

Over the years I’ve come to appreciate using a powerful IDE for development and I’m glad to see IntelliJ has robust Scala support.

Overall I’m a fan

I’ve enjoyed learning Scala and look forward to using it in my future work. I will update this post as I learn more and develop more opinions.

Prediction

I predict that data engineers and data scientists will increasingly use Scala in place of Java and Python for complex ETL, analysis, and applying machine learning at scale.

Matt Hagy

Software Engineer and fmr. Data Scientist and Manager. Ph.D. in Computational Statistical Chemistry. (matthagy.com)