Adventures in Scala-land — Worth the Investment

As part of my job at ACL, I decided it was a good time to investigate the Scala programming language. I’d heard about it several times in recent years, but knew very little about it. I wasn’t even sure I was pronouncing the name correctly. To keep up with recent advances in the analytics industry, I challenged myself to learn Scala.

My own software background is quite varied. I started with BASIC and Assembly Language (that’s all we had in the 1980s), but have spent most of my professional career with a range of imperative and object-oriented languages, such as C, Java, Perl, Python, and Ruby. Back in university, I also gained experience with lesser-known functional languages such as Scheme and SML.

I’ll share my perspectives on learning Scala (pronounced “Skahlah”). I’d previously heard that Scala is hard to learn, but so far I’ve enjoyed the experience. However, I can certainly appreciate why parts of the language can be challenging to comprehend at first. I’ll focus on my first impressions: both the things I particularly liked and the features that took a while to grasp.

Not to spoil the surprise, but my conclusion is that your effort to learn Scala will definitely pay off, especially if you’re focused on developing efficient and high-performance software, such as for data analytics. However, your personal ramp-up time will depend on the range of programming languages you’ve already used.

It’s like Java, but Not Really

On my very first day of learning Scala, my impression was that it’s very similar to Java. However, this impression didn’t last very long. That first impression came largely from two key aspects of the language.

First, Scala executes on the same virtual machine as Java, which over the last 20 years has been ported to a wide range of host platforms and heavily optimized for run-time performance. This shared foundation helps developers get up and running, and helps explain why both Java and Scala are heavily used for “Big Data”.

To target the JVM (Java Virtual Machine), the Scala compiler generates the familiar *.class files, which can be packaged into *.jar files. It’s a well-advertised fact that Scala programs can use existing Java classes directly, allowing legacy code to be reused. This is a big win for companies with a large Java investment.
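As a small illustration of that interoperability, the sketch below uses standard Java library classes directly from Scala, with no wrapper or binding layer. The choice of `java.time.LocalDate` and `java.util.ArrayList` is only an example; any Java class on the classpath can be used the same way.

```scala
import java.time.LocalDate
import java.util.ArrayList

// java.time.LocalDate is a plain Java class: Scala calls its static
// factory method and its instance methods just as Java code would.
val releaseDate = LocalDate.of(2020, 1, 31)
val nextDay = releaseDate.plusDays(1)

// Java collection classes can be instantiated and used directly too.
val names = new ArrayList[String]()
names.add("Apple")
names.add("Banana")

println(nextDay)      // printed via LocalDate's own Java toString
println(names.size()) // an ordinary Java method call
```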

The second reason I felt Scala and Java were quite similar is the use of static typing of variables. Having worked with Ruby and Python, I’d got into the habit of using methods and variables without declaring data types. In those languages, types are instead associated with the values that are assigned or passed at run-time.

Here’s an example in Ruby:

def square_area(side_length)
  side_length * side_length
end
square_area(5)       # returns 25
square_area("Hello") # raises TypeError at run-time

Dynamic typing is a convenient feature, but I’m often left worrying the program might crash with a TypeError at some point in the distant future (perhaps after running for several hours). However, with Scala’s static typing, I get a warm fuzzy feeling that my code won’t fail with a type error at run-time:

def squareArea(sideLength: Int): Int = sideLength * sideLength
squareArea("Hello")    // Won't compile, so can't fail at run-time.

Of course, static checking comes at the expense of an upfront compile phase, and purists say it doesn’t completely avoid run-time failures (there are certainly edge cases), but for high-performance computing, static types make a lot of sense: type errors are caught before a long-running job even starts.

As I learned more about Scala, the similarities with Java became less apparent. Let’s see why…

Type Inference != Dynamic Typing

Experienced Java programmers admit that Java can be verbose, requiring what seems like an excessive amount of keyboard work. Here’s an example of creating a “hash” in Java, mapping string keys to integer values:

Map<String, Integer> map = new HashMap<String, Integer>();
map.put("Apple", 10);

and here’s the equivalent in Scala:

val map = Map[String, Int]("Apple" -> 10)

Note that in Scala we didn’t specify the data type of map, because the compiler can work it out from the right-hand side of the statement. This can save a lot of keystrokes, especially for long declarations.

In fact, given that the Map has been initialized with "Apple" and 10, we can make this code even shorter by not declaring the “key” and “value” types:

val map = Map("Apple" -> 10)

With both these simplifications, the Scala compiler has used “Type Inference” to determine what the types should be, rather than requiring the programmer to be explicit.
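To see that the inferred type is a real static type (not a placeholder), here is a minimal sketch: the unannotated map can be assigned to an explicitly typed val, and looked-up values can be used in Int arithmetic, all checked at compile time. The variable names are illustrative.

```scala
// No type annotations at all: the compiler infers Map[String, Int].
val map = Map("Apple" -> 10)

// Because the inferred type is Map[String, Int], this annotated val
// compiles without any cast.
val sameMap: Map[String, Int] = map

// The value type is known to be Int, so arithmetic type-checks.
val apples: Int = map("Apple") + 5

// Adding a pair with matching types keeps the same static type.
val moreFruit: Map[String, Int] = map + ("Banana" -> 3)
```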

Note that this is NOT the same as “Dynamic Typing”, where a variable can hold values of different types at run-time. For example, the following is legal in Ruby:

a = 10
a = "Hello"

But in Scala, once the compiler has inferred a variable’s type, that type cannot change:

var a = 10
a = "hello" // compile error: type mismatch

In the end, we still get the warm fuzzy feeling that our code won’t report type errors at run time.

Object Oriented, with More Features Than Java

In general, object-oriented programming in Scala is very similar to other languages. There are some advanced features, but overall it’s conceptually similar to Java, C++, Ruby, or Python.

Perhaps the biggest challenge was remembering the language-specific details, such as the order of type declarations, how the primary constructor is defined, and the exact details of the scoping rules. Nothing unfamiliar, but just enough difference to reduce my coding speed as I got started.

Here’s an example of a simple class definition:

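// Minimal Pet definition (assumed) so that the example compiles on its own.
abstract class Pet {
  def makeNoise(): String
}

class Cat(name: String, age: Int) extends Pet {

  private var toysCount: Int = 3

  def makeNoise(): String = {
    "Meow"
  }

  private def acquireToy(newToy: String): Int = {
    println(s"Thanks for the new $newToy")
    toysCount += 1
    toysCount // the last expression is the result; no `return` needed
  }
}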

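In this example, perhaps the oddest thing is that the primary constructor’s parameters (name and age) are declared on the first line, as part of the class header, rather than in a separate constructor method. The class body itself acts as the constructor body, so toysCount is initialized whenever a Cat is created. Everything else should be quite familiar from other languages.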

Beyond the basics of object-oriented development, Scala has some interesting features, such as “traits”. These are similar to interfaces in Java, since a class can mix in more than one of them, but like abstract classes they can also contain fully implemented methods. In essence, traits provide a controlled form of “multiple inheritance”, which many object-oriented languages have chosen not to support.
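A minimal sketch of a class mixing in two traits follows; the names Pet, Greeter, and Cat are illustrative only. One trait declares only an abstract method (interface-like), while the other supplies a concrete implementation (abstract-class-like).

```scala
// Interface-like trait: only abstract members.
trait Pet {
  def name: String
  def makeNoise(): String
}

// Trait with a fully implemented method.
trait Greeter {
  def name: String
  def greet(): String = s"Hello, I'm $name"
}

// A class can mix in more than one trait with `with`.
// The constructor's `val name` satisfies the abstract `name` in both traits.
class Cat(val name: String) extends Pet with Greeter {
  def makeNoise(): String = "Meow"
}

val tom = new Cat("Tom")
println(tom.greet())     // implementation inherited from Greeter
println(tom.makeNoise()) // implemented in Cat itself
```

When two mixed-in traits define the same concrete method, Scala resolves the call using a deterministic ordering of the traits called linearization, which avoids the classic “diamond” ambiguity.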

Moving on, let’s focus on functional programming, which is one of Scala’s strong points.

It’s Functional Too, but Mutants Are Still Allowed!

Scala has strong support for functional programming, which is likely one reason it’s common in the “Big Data” world, and it is the language the Apache Spark analytics framework is written in. The functional style allows algorithms to be expressed concisely, without being concerned with low-level implementation details.

For example, here’s a very simple algorithm for doubling the numbers in a list:

val doubleNumbers = myNumbers.map(n => n * 2)

What isn’t apparent from this code is whether myNumbers is a small list of 100 numbers, or a distributed dataset of 100 billion numbers (as in Spark), with the computation spread across a large cluster of compute nodes. The code doesn’t need to specify those implementation details.
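As a small local sketch of that idea (it doesn’t use Spark, although Spark’s distributed collections also offer a map operation), the same doubling function is applied below to a List, a Vector, and a lazy Iterator over a very large range. The names and data are illustrative.

```scala
// The doubling function, defined once.
val double: Int => Int = n => n * 2

// A small, strict, in-memory List.
val smallList = List(1, 2, 3)

// A Vector: a different immutable sequence implementation.
val vector = Vector(1, 2, 3)

// A lazy Iterator over a huge range: elements are produced on demand,
// so the full range is never held in memory.
val hugeRange = Iterator.range(1, 1000000000)

val doubledList = smallList.map(double)
val doubledVector = vector.map(double)
val firstThreeDoubled = hugeRange.map(double).take(3).toList
```

The calling code looks the same in each case; only the collection type decides whether the work is done eagerly, lazily, or (with a framework like Spark) across many machines.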

One misconception about Scala is that you’re forced to write code in a functional style. In reality, Scala allows both functional and imperative programming, or a mix of the two. I won’t discuss the pros and cons of functional versus imperative programming (there are whole books about that topic), but many seasoned Scala developers have strong opinions about the “correct” way to write code.

To illustrate the difference, here’s an “imperative” version of the famous factorial function, using variables and sequencing/looping to compute the result.

def factorial(n: Int): Int = {
  var total = 1

  for (i <- 1 to n) {
    total *= i
  }

  total
}

On the other hand, here’s the “functional” version of the same algorithm, which looks more like a pure mathematical function.

def factorial(n: Int): Int = if (n == 0) 1 else n * factorial(n - 1)
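Both factorial versions return an Int, which overflows for any n above 12, and the recursive version uses one stack frame per call. A common functional refinement, sketched below as an alternative rather than the original code, passes an accumulator so the recursive call is in tail position; the `@tailrec` annotation asks the compiler to confirm it can turn that call into a loop. Switching to `BigInt` avoids the overflow, and the helper name `go` is just a convention.

```scala
import scala.annotation.tailrec

// Tail-recursive factorial: the call to go is the last operation in each
// step, so the compiler can compile it into a loop (@tailrec enforces this).
def factorial(n: Int): BigInt = {
  require(n >= 0, "factorial is only defined for non-negative n")

  @tailrec
  def go(remaining: Int, acc: BigInt): BigInt =
    if (remaining <= 1) acc                         // 0! and 1! are both 1
    else go(remaining - 1, acc * BigInt(remaining)) // multiply into the accumulator

  go(n, BigInt(1))
}

println(factorial(5))  // 120
println(factorial(20)) // 2432902008176640000, well beyond Int.MaxValue
```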

I personally had no problem adopting Scala’s functional style, but that’s largely due to my background with Scheme, Ruby, and SML. If you’ve never seen functional languages before, and you prefer to use variables, sequencing, and other “side-effects”, you might struggle with Scala at first.

Syntactic Sugar Makes It Short and Sweet

To make life easier, Scala provides syntactic shortcuts (or “sugar”), allowing experienced developers to reduce their overall code size. These shortcuts are equivalent to their longer counterparts, but reduce the number of characters you need to read or write.

As an example, the following code iterates over a list, and prints out the values on the console:

list.foreach(n => println(n))
// Output:
// 1
// 2
// 3
// ...

Note that n => println(n) is an anonymous function that takes a single parameter n and prints the value on the console. The foreach method then applies that anonymous function to each element in the list (1, 2, 3, etc.).

As a simplification, we can eliminate the n => part by using _ as a placeholder for the n parameter.

list.foreach(println(_))

We can then go one step further and recognize that println is itself a method taking a single parameter, so we can pass it directly and let the compiler convert it into a function (a conversion known as eta-expansion):

list.foreach(println)

These shortcuts are certainly a benefit, but only once you have enough experience to mentally map them back to their longer versions. Unfortunately, my first reaction was to fear these shortcuts, especially given my flashbacks to Perl’s “default” variable $_, which is notorious for causing subtle bugs.
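Since println returns nothing useful, it’s hard to confirm the three forms agree just by printing. The sketch below swaps in map with a hypothetical `double` helper so the results can be compared directly.

```scala
val numbers = List(1, 2, 3)

// Hypothetical named method, standing in for println.
def double(n: Int): Int = n * 2

val explicitForm = numbers.map(n => double(n)) // full anonymous function
val placeholderForm = numbers.map(double(_))   // _ stands in for the parameter
val methodForm = numbers.map(double)           // method passed directly (eta-expansion)

println(explicitForm) // List(2, 4, 6)
```

One rule worth knowing before relying on `_`: it expands only to the smallest enclosing expression, so `numbers.foreach(println(_ + 1))` means `println(x => x + 1)` rather than `n => println(n + 1)`, and the compiler rejects it.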

Collections, but with Different Names

Collections in Scala (such as Lists, Maps, and Arrays) are similar to those in other languages. The main challenge for me was learning the exact syntax and semantics of each operation (such as ++ versus +:). As a result, I frequently have the Scala Collections API documentation open in my web browser.
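Here is a quick sketch of the three operators that caused me the most confusion, on a small illustrative List. Operators ending in a colon are right-associative in Scala, which is why `+:` takes the element on its left.

```scala
val middle = List(2, 3)

val concatenated = middle ++ List(4, 5) // ++ joins two collections
val prepended = 1 +: middle             // +: adds one element at the front
val appended = middle :+ 4              // :+ adds one element at the end

// Mnemonic: the colon sits on the side of the collection.
println(concatenated) // List(2, 3, 4, 5)
```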

Perhaps the most significant difference is that Scala supports both mutable and immutable collections. The mutable variant is the “traditional” collection allowing data to be modified in place.

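import scala.collection.mutable

val mutableMap = mutable.Map(1 -> "One")
println(mutableMap) // Output: Map(1 -> One)
mutableMap += (2 -> "Two")
println(mutableMap) // Output: Map(2 -> Two, 1 -> One)
// Note: a mutable Map doesn't guarantee element order, and the printed form can differ between Scala versions.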

In contrast, immutable collections are used more with functional programming, where instead of modifying the existing collection, an entirely new collection is created.

import scala.collection.immutable

val immutableMap1 = immutable.Map(1 -> "One")
immutableMap1 += (2 -> "Two") // ERROR: won't compile (an immutable Map has no += method, and a val can't be reassigned)
val immutableMap2 = immutableMap1 + (2 -> "Two")
println(immutableMap1) // Output: Map(1 -> One)
println(immutableMap2) // Output: Map(1 -> One, 2 -> Two)
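A detail that can be confusing: the += error above comes from combining an immutable map with a val. If the reference is a var instead, Scala rewrites `m += kv` as `m = m + kv`, so the variable is rebound to a new map while the original map object stays unchanged. A minimal sketch with illustrative names:

```scala
import scala.collection.immutable

val original = immutable.Map(1 -> "One")

// With a var, `current += ...` is rewritten to `current = current + ...`:
// a new Map is built and assigned to current; no map is modified in place.
var current = original
current += (2 -> "Two")

println(original) // the first map is untouched
println(current)  // current now refers to a new, larger map
```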

It’s also interesting that Scala collections aren’t interchangeable with their Java namesakes (a Scala List is not a java.util.List). However, Scala does provide conversion methods:

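import scala.jdk.CollectionConverters._ // Scala 2.13+; on 2.12, use scala.collection.JavaConverters._
val jList = List(1, 2, 3).asJava        // a java.util.List[Int]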
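A slightly fuller sketch of a round trip, assuming Scala 2.13 or later (where the converters live in `scala.jdk.CollectionConverters`). The converted collections wrap the originals rather than copying them.

```scala
import scala.jdk.CollectionConverters._

// Scala List -> java.util.List (a read-only wrapper around the Scala List)
val jList: java.util.List[Int] = List(1, 2, 3).asJava

// Java code would see an ordinary java.util.List
println(jList.get(0))

// ...and back again: asScala gives a mutable.Buffer view; toList copies it
// into an immutable Scala List.
val backToScala: List[Int] = jList.asScala.toList
```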

Implicitly Shortening Your Code

Another roadblock I encountered was learning about implicit values in Scala. When a method parameter is marked implicit, the caller doesn’t need to pass it explicitly: the compiler fills it in with a matching implicit value that’s in scope, reducing the amount of boilerplate code.

For example, if I declare a value in the following way:

implicit val database = new DbConnection()

and if I have methods that accept an object of type DbConnection:

def readData(colName: String)(implicit dbConn: DbConnection) = {
  ...
}

Then I can call a method without explicitly passing in that parameter:

readData("myColumn")   // compiles correctly!

The benefit here is that we avoid the excessive boilerplate code associated with passing the same parameters to a large number of method calls. Imagine having many different calls to readData(), as well as calls to writeData() and flushData() that all require the same DbConnection object. The code is much shorter with implicits!
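To make the example above self-contained, here is a minimal sketch. The DbConnection class, its url and query members, the writeData method, and the table name are all hypothetical stand-ins (they just build strings rather than talking to a real database).

```scala
// Hypothetical connection class standing in for a real database client.
// query only builds a string; it is not real SQL handling.
class DbConnection(val url: String) {
  def query(sql: String): String = s"[$url] $sql"
}

// Each method takes the connection as an implicit parameter.
def readData(colName: String)(implicit dbConn: DbConnection): String =
  dbConn.query(s"SELECT $colName FROM my_table")

def writeData(colName: String, value: String)(implicit dbConn: DbConnection): String =
  dbConn.query(s"UPDATE my_table SET $colName = '$value'")

// Declared once; the compiler supplies it wherever an implicit DbConnection is needed.
implicit val database: DbConnection = new DbConnection("db://example")

val read = readData("myColumn")           // same as readData("myColumn")(database)
val written = writeData("myColumn", "42") // same as writeData("myColumn", "42")(database)

// The implicit parameter can still be passed explicitly when needed.
val other = readData("myColumn")(new DbConnection("db://other"))
```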

Scala also has implicit classes, which effectively let you add new methods to existing classes, somewhat like Ruby’s “monkey patching” (although the original class isn’t actually modified).
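A minimal sketch of an implicit class, with hypothetical names (StringExtensions, RichShout, shout). Unlike Ruby’s monkey patching, String itself is untouched: the compiler wraps the value in RichShout only where the implicit class has been imported. Scala 3 also offers dedicated extension method syntax for the same purpose.

```scala
object StringExtensions {
  // When a String is used with a method String doesn't have, the compiler
  // looks for an implicit class like this one and inserts the wrapper.
  implicit class RichShout(val s: String) {
    def shout: String = s.toUpperCase + "!"
  }
}

// The extra method is only visible where the implicit class is imported.
import StringExtensions._

val loud = "hello".shout // compiled as new RichShout("hello").shout
```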

And There’s More…

Even after six months of programming in Scala, I’m continuing to learn more great things about the language. Each time I wrap my mind around the Scala way of doing things, I’m glad I made the effort.

Here are some other concepts I’ve learned:

  • Case Classes - A fast and simple way to create compound immutable values. For example: Person("John", "Brown")
  • Case Statements (match expressions) - An improvement over traditional switch statements, allowing pattern matching on data types as well as on case class values. For example: case p: Person => println("Matched Person")
  • Lazy Evaluation - Evaluates a potentially complex expression only when the value is actually needed.
  • Streaming Data - Allows data to be processed as it’s being generated, or as it arrives from the network or a database, rather than waiting until it’s fully in memory.
  • Some and None - A replacement for “null pointers”, allowing the absence of data to be indicated in a type-safe way through the Option type. Whereas invoking .map on a null value causes an exception, None.map(...) safely returns None.
  • The Either class - Allows a choice of two different return types in a single return value. Often used to indicate a success value or, alternatively, an error message (by convention, Right holds the success value and Left holds the error).
  • Futures - Allows segments of code to execute asynchronously. Somewhat like creating a new thread, but with a more elegant programming style.
  • The Play Framework - A fully fledged web application framework, similar to Ruby on Rails or Django.
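Several of the concepts above fit together naturally. The sketch below shows case classes, match expressions, Option, Either, and a lazy val in one place; Person follows the example in the list, while describe, directory, parseAge, and expensive are hypothetical names chosen for illustration.

```scala
import scala.util.{Failure, Success, Try}

// Case class: immutable fields, structural equality, readable toString.
case class Person(firstName: String, lastName: String)
val john = Person("John", "Brown")

// Match expression: matching on case class contents and on types.
def describe(x: Any): String = x match {
  case Person("John", last) => s"A John, surname $last"
  case p: Person            => s"Some other person: ${p.firstName}"
  case n: Int               => s"An Int: $n"
  case _                    => "Something else"
}

// Option: Map.get returns Some(value) or None instead of null.
val directory = Map("john" -> john)
val foundName = directory.get("john").map(_.lastName)   // a Some containing "Brown"
val missingName = directory.get("jane").map(_.lastName) // None, with no exception

// Either: Right for success, Left for an error message (by convention).
def parseAge(text: String): Either[String, Int] =
  Try(text.trim.toInt) match {
    case Success(age) => Right(age)
    case Failure(_)   => Left(s"'$text' is not a number")
  }

// Lazy evaluation: the body runs on first access only, then is cached.
var evaluations = 0
lazy val expensive: Int = { evaluations += 1; 6 * 7 }
val first = expensive
val second = expensive // not re-evaluated
```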

And the list goes on. Rather than repeating all the literature, I highly recommend you read Learning Scala for a gentle introduction, and then Scala Cookbook for more detail. Finally, once you’re up to speed, Scala’s own documentation is an excellent resource.

Summary

As I mentioned at the start of this blog post, Scala can be an intimidating language at first, and your ramp-up time is definitely affected by your previous language experience. In the end, I’m glad I took the time to read the books and take on Scala projects, as I’m now able to benefit from everything Scala has to offer. This is especially true in the world of high-performance computing, including data analytics.