Why Scala is Coursera’s primary serving language
By Brennan Saeta
Coursera is a rapidly evolving platform for education at scale. Our Massive Open Online Courses, or MOOCs, enable millions of students to take quality academic courses from top universities around the globe. In the two years since the company’s founding, we’ve grown our user base and engineering team exponentially, experimenting with several languages and frameworks along the way. We’ve come to the conclusion that Scala should be our primary–if not exclusive–serving language. Going forward, all new services in our service-oriented architecture will be written in Scala using the Play framework.
We love Scala because it provides a type-safe language with powerful concurrency primitives on top of a mature technology platform. A handful of Stanford students wrote the first lines of Coursera in PHP. As the engineering team grew, we began searching for our future technology platform. After experimenting with several options, including Python and Go, we settled on Scala and the Play Framework because they meet our needs best.
Coursera is committed to supporting innovation in pedagogy and to providing a consistently outstanding user experience. As a result, our platform evolves rapidly: code is written, rewritten, and rewritten again. Refactoring code in a statically typed language is easier than refactoring it in a dynamic one, because the compiler flags every place a change breaks. Modifying existing PHP, and even Python, is a difficult chore that engineers shy away from because modifications are likely to create more bugs than they fix. Refactoring Scala is not only possible but something we do regularly, and it keeps our code base healthy.
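To make the refactoring point concrete, here is a minimal sketch built around a hypothetical `EnrollmentStatus` type (not taken from Coursera's code base). Because the trait is `sealed`, scalac knows every subtype, so if someone later adds a new status without updating the `match`, the compiler warns that the match may not be exhaustive and points at the exact call site. In PHP or Python, the same omission would only surface when that code path ran.

```scala
// Hypothetical domain type, used only to illustrate compiler-checked refactoring.
sealed trait EnrollmentStatus

object EnrollmentStatus {
  case object Active extends EnrollmentStatus
  case object Completed extends EnrollmentStatus
  case object Withdrawn extends EnrollmentStatus
}

object EnrollmentLabels {
  import EnrollmentStatus._

  // EnrollmentStatus is sealed, so scalac checks this match for exhaustiveness.
  // Adding a new case object above without handling it here yields a
  // "match may not be exhaustive" warning (an error under -Xfatal-warnings).
  def label(status: EnrollmentStatus): String = status match {
    case Active    => "In progress"
    case Completed => "Finished"
    case Withdrawn => "Dropped"
  }
}
```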
Although Python (running under uWSGI) and PHP support heavyweight process forking, neither supports lightweight concurrency such as true multithreading. As our infrastructure becomes more diverse and complex (e.g., talking to multiple databases per request), we need to aggregate data across multiple databases and services concurrently. Play’s reactive core and asynchronous libraries (e.g., WS) integrate seamlessly with other powerful concurrency primitives in the ecosystem, such as Akka’s actors. Combining for-comprehensions with composable futures makes asynchronous code read like straightforward synchronous code. Instead of an indecipherable cobweb of callbacks, developers write readable, maintainable, and efficient services.
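As an illustration of the for-comprehension style, the sketch below aggregates two independent lookups. It uses only the standard library's `scala.concurrent.Future` rather than Play's WS client (whose requests also return Futures), and `fetchUser`, `fetchCourses`, and `dashboard` are hypothetical stand-ins for a database query and a service call.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object DashboardSketch {
  // Hypothetical record types and backend calls; each call returns a Future immediately.
  final case class User(id: Long, name: String)
  final case class Course(title: String)

  def fetchUser(userId: Long): Future[User] =
    Future(User(userId, "Ada"))

  def fetchCourses(userId: Long): Future[List[Course]] =
    Future(List(Course("Functional Programming Principles in Scala")))

  def dashboard(userId: Long): Future[String] = {
    // Start both calls before composing them so they run concurrently.
    val userF    = fetchUser(userId)
    val coursesF = fetchCourses(userId)
    // The for-comprehension desugars to flatMap/map: no nested callbacks, no blocking.
    for {
      user    <- userF
      courses <- coursesF
    } yield s"${user.name} is enrolled in ${courses.map(_.title).mkString(", ")}"
  }
}

object DashboardDemo {
  def main(args: Array[String]): Unit = {
    // Blocking is acceptable only at the edge of a demo; a Play action
    // (Action.async) would hand the Future back to the framework instead.
    println(Await.result(DashboardSketch.dashboard(42L), 5.seconds))
  }
}
```

One design note: if the two calls were written directly inside the `for` block, the second would not start until the first completed, because each generator is a `flatMap` on the previous one. Creating the Futures first keeps the requests concurrent while the composition still reads top to bottom.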
The JVM ecosystem is more mature than any other available today. Scala builds on top of the JVM and the Java ecosystem, taking advantage of the platform’s robust tooling and libraries. Many modern “big data” tools, such as the Hadoop ecosystem, Apache Spark, and Cassandra, are built on the JVM, so we can use their primary client libraries directly. Furthermore, in sharp contrast to PHP (and even Python to some degree), Scala and Play projects require only a JVM on the serving cluster, vastly simplifying deployment and administration of our fleet of servers. Finally, the JVM is a fast and feature-complete virtual machine. From heap instrumentation and powerful garbage collection algorithms to the just-in-time compiler, no other runtime comes close in terms of features.
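Using a Java client library from Scala requires no bindings or wrappers. To keep the sketch self-contained, it uses standard `java.util` and `java.time` classes instead of a real Spark or Cassandra driver; `JavaInteropSketch` and its in-memory session store are hypothetical.

```scala
import java.time.{Duration, Instant}
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

object JavaInteropSketch {
  // Hypothetical session store built directly on a Java concurrent map.
  private val sessions = new ConcurrentHashMap[String, Instant]()

  def startSession(now: Instant): String = {
    val id = UUID.randomUUID().toString
    sessions.put(id, now)
    id
  }

  // Java APIs signal "missing" with null; Option(...) converts that into None.
  def sessionAgeMillis(id: String, now: Instant): Option[Long] =
    Option(sessions.get(id)).map(start => Duration.between(start, now).toMillis)
}
```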
“as a front end developer i’ve come to appreciate the value of a healthy developer community like nodejs. our team is already seeing many of those advantages as we are plugging into the scala + play community.” — eleith, ascii architect, learning experience engineering lead
Concerns with Scala
No platform, framework or language is perfect. We had many concerns before we began using Scala (explored below), but fortunately, only a couple of these have proven to be serious issues for our developers.
“I personally found compilation and reload times pretty acceptable (not as tight as PHP’s edit-test loop, but acceptable given the type-checking and other niceties we get with Scala).” — Frank, Coursera Infrastructure Engineer
Scala’s compiler is very sophisticated (it runs over 25 phases) and is known to be slow. Fortunately, compile time has not been a significant issue for us because SBT’s incremental compiler works efficiently. Even though we have more lines of Scala than of either PHP or Python, compilation typically takes only a couple of seconds. SBT’s incremental compilation, combined with Play’s hot reloading, means developers maintain a rapid edit-refresh rhythm. Although compile time remains a concern as our code base grows, our worst fears of long compile times slowing us down have proven unfounded.
“scalac is slow. On the other hand, dynamic languages require you to incessantly re-run or test your code until you work out all the type errors, syntax errors and null dereferencing. I'd rather have a sip of coffee while scalac does all this work for me.” — Nick, Coursera Infrastructure Lead
It is not hard to find examples of Scala that look like line noise. Worse, some libraries abuse Scala’s ability to define arbitrary symbolic operators, making it difficult for the uninitiated to understand Scala code. We evaluate libraries carefully, with an eye towards readability, when deciding which to include in our code base.
“Python probably takes the cake for being regular. PHP gets cake in the face… Scala is somewhere in-between.” — Daniel, Coursera Infrastructure Engineer
Although Scala has a flexible syntax, it is a fairly regular language with far fewer special cases or gotchas than other languages we use. In fact, given some familiarity with the rules, Scala can be easier to understand than languages with fewer advanced features. We have been pleased to discover that, in general, our developers do not have trouble reading existing Scala code. Finally, operator overloading is not a knock against the language itself. In many cases, operators actually make code more readable. For example, a + b is much easier to read than an equivalent named method call.
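A small comparison using real library types: Java’s `java.math.BigInteger` exposes arithmetic only through named methods, while Scala’s `BigInt` defines `+` as an ordinary method (in Scala, `a + b` is just sugar for `a.+(b)`). The object name `OperatorReadability` is illustrative.

```scala
import java.math.BigInteger

object OperatorReadability {
  // Java style: every arithmetic step is a named method call.
  val javaStyle: BigInteger = BigInteger.valueOf(2).pow(64).add(BigInteger.ONE)

  // Scala style: BigInt defines a method named +, so the same computation reads like math.
  val scalaStyle: BigInt = BigInt(2).pow(64) + 1
}
```

Both values equal 2^64 + 1; `scalaStyle.bigInteger` exposes the underlying Java value, so the two remain interchangeable.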
“scala allows you to use unicode for function/variable names like <|***|>, ⊛, or ☆. It’s like this big gun, and you can blow down the walls of scalability or you can blow off your own foot. I trust my team with this power, but can you?” — eleith, ascii architect, learning experience engineering lead
Scala is a deep and powerful language with many advanced features such as macros, implicits and existential types. We were concerned that in experimenting with these features, we would weave a tangled web of code undecipherable to even its authors.
“You can write unreadable code in any language. Scala’s concise and powerful language features let our developers write clear, simple code; style guides and peer code reviews make sure someone else can read that code, too.” — Josh, Coursera Product Engineer
We’re very careful in our use of these advanced features in our code. For example, the business logic of our services is written without defining new implicits or macros. That said, we do take advantage of these powerful features in important libraries and frameworks. Without them, Scala would be less useful, and we would have to resort to code generation or suffer significant amounts of boilerplate.
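The split between library code and business logic can be sketched with a type class. Play’s own JSON library follows a similar pattern (`Json.toJson` takes an implicit `Writes`), but the `JsonEncoder` type class and `GradebookService` below are hypothetical and greatly simplified. The implicits live in the library layer; the service code only calls `toJson` and defines none of its own.

```scala
// Library layer: a tiny, hypothetical type class for rendering values as JSON text.
trait JsonEncoder[A] {
  def encode(value: A): String
}

object JsonEncoder {
  implicit val intEncoder: JsonEncoder[Int] = new JsonEncoder[Int] {
    def encode(value: Int): String = value.toString
  }

  implicit val stringEncoder: JsonEncoder[String] = new JsonEncoder[String] {
    // Simplified: does not escape quotes or control characters.
    def encode(value: String): String = "\"" + value + "\""
  }

  // Derives an encoder for List[A] from an encoder for A, avoiding per-type boilerplate.
  implicit def listEncoder[A](implicit inner: JsonEncoder[A]): JsonEncoder[List[A]] =
    new JsonEncoder[List[A]] {
      def encode(value: List[A]): String = value.map(inner.encode).mkString("[", ",", "]")
    }

  def toJson[A](value: A)(implicit encoder: JsonEncoder[A]): String = encoder.encode(value)
}

// Business-logic layer: uses the library without defining any implicits.
object GradebookService {
  def scoresJson(scores: List[Int]): String = JsonEncoder.toJson(scores)
  def namesJson(names: List[String]): String = JsonEncoder.toJson(names)
}
```

Without implicit resolution, every call site would have to pass the right encoder (including the nested list encoder) by hand, which is exactly the boilerplate the paragraph above refers to.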
A compiled language’s development overhead can be offset by supporting tools, most importantly the IDE. In comparison to Java IDEs, which are some of the most feature-complete IDEs available for any language, Scala’s IDE support is quite lacking. Although we see consistent improvements in the Scala IDE for Eclipse, many features, such as “type hierarchy” and “call hierarchy,” are still missing, while others (such as “move”) are buggy.
“The IDE is miles behind compared to Java, but it’s actively being improved (I actually fixed two issues so far).” — Daniel, Coursera Infrastructure Engineer
We expect this feature gap to close over time, but IDE support is currently a weakness of the Scala ecosystem. Fortunately, Coursera engineers have been able to contribute to the ecosystem by fixing some bugs.
Compared to the Python, Java, and Ruby communities, the Scala community is modest in size. There are far fewer developers available to hire, which has meant that almost every Scala developer at Coursera learned Scala at the company.
Scala is a sophisticated language that takes time to learn. Learning Scala does not happen overnight, but Coursera’s culture deeply values learning and education, both for our students and for our employees. We enjoy teaching each other new concepts and stretching the boundaries of our knowledge. While we picked Scala because it is intrinsically a productive and robust programming language, we also picked it because it teaches us new programming paradigms. Thus, by learning Scala, we invest in both our code base and our team. In fact, many Coursera engineers have learned Scala together (and dogfooded our own product) by taking the Functional Programming and Reactive Programming courses on our platform.
Originally published on February 18, 2014.