Machine Learning with Scala in Google Colaboratory

Last week was the TensorFlow Dev Summit, where the latest updates across the TensorFlow libraries were announced. In these talks, almost all the demos ran in Google Colaboratory (Colab), a free product from Google that gives you access to a Jupyter notebook running in the cloud, with the option to connect to powerful GPUs to accelerate machine learning operations!

Inspired by the power of these interactive demos, I set out to make it possible to run Scala code inside Colab. In this blog post, we’ll see how to set up Scala to run inside Google Colab, and then take a look at a few examples of machine learning with Scala inside Colab notebooks, including with GPU acceleration.

Getting Started

If you want to follow along with the examples in this post, head over to the notebook, pick the runtime you want to use for the later notebooks (CPU or GPU, under Runtime > Change runtime type), and hit Runtime > Run All.

The installer loads Almond, a Jupyter kernel that enables Scala support. In addition, it slightly tweaks the kernel definition to preload the Python native libraries, which makes it possible to use NumPy and TensorFlow through ScalaPy as we’ll see in a bit!

Hello World in Colab

Because the Scala kernel is installed, Colab automatically knows to run all the code blocks in the template with it, so when you hit the run button next to the first code block you should see “Hello, world!” printed below!
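The template's first cell is not reproduced in this post. As a minimal sketch, a cell along these lines would print that greeting (the name greeting is made up for illustration):

```scala
// A hypothetical first cell: bind the message to a value, then print it.
val greeting = "Hello, world!"
println(greeting)
```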

Before we jump into machine learning, let’s try out the different features offered by Almond. First, we can import a library. Almond internally uses Ammonite, so we get a nice import syntax for loading libraries in our notebook.

For example, we can import Circe, a library for manipulating JSON in Scala, with the Ammonite import syntax

import $ivy.`io.circe::circe-core:0.10.0`, $ivy.`io.circe::circe-generic:0.10.0`, $ivy.`io.circe::circe-parser:0.10.0`
import io.circe._, io.circe.generic.auto._, io.circe.parser._, io.circe.syntax._

With this loaded, we can then convert a Scala object to JSON. As you’re typing this in, you’ll see that the notebook has full code-completion support. For example, hit Ctrl-Space after typing Qux(13, Some(14.0)). and you’ll see code completions that include the asJson method.

case class Qux(i: Int, d: Option[Double])
val json = Qux(13, Some(14.0)).asJson.spaces2

Here’s a complete notebook that uses Circe to convert a Scala object to JSON.
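The imports above also pull in io.circe.parser._, which the encoding snippet never uses. As a rough sketch of the reverse direction, the cell below round-trips a Qux through a JSON string with decode. It repeats the dependency imports so it can run on its own in an Almond or Ammonite session, and it relies on io.circe.generic.auto._ to derive the encoder and decoder. The value names compact, roundTrip, and partial are made up for this illustration.

```scala
// Ammonite-style dependency imports, using the same Circe version as earlier in the post.
import $ivy.`io.circe::circe-core:0.10.0`, $ivy.`io.circe::circe-generic:0.10.0`, $ivy.`io.circe::circe-parser:0.10.0`
// generic.auto._ derives Encoder/Decoder instances for case classes automatically.
import io.circe._, io.circe.generic.auto._, io.circe.parser._, io.circe.syntax._

case class Qux(i: Int, d: Option[Double])

// Encode to a compact, single-line JSON string.
val compact: String = Qux(13, Some(14.0)).asJson.noSpaces

// decode parses the string and decodes it in one step,
// returning Either[io.circe.Error, Qux] instead of throwing.
val roundTrip = decode[Qux](compact)

// A missing optional field decodes to None.
val partial = decode[Qux]("""{"i": 7}""")

roundTrip match {
  case Right(qux) => println(s"Decoded: $qux")
  case Left(err)  => println(s"Failed to decode: $err")
}
```

Because decode returns an Either, malformed input shows up as a Left value that the notebook can inspect rather than as an exception.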

Thanks to Almond, we get a natural experience for writing Scala code inside the notebook. Almond also offers more advanced features, such as displaying custom HTML, which can be used to plot charts. Take a look at the Almond docs for more information.
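To make the custom HTML idea a little more concrete, here is a small sketch in plain Scala that builds a bar chart as an HTML string. The barChart helper and its sample data are hypothetical, not part of Almond. The final step of handing the string to the kernel is left as a comment, because the exact display call is an assumption that should be checked against the Almond docs.

```scala
// Hypothetical sample data: label -> value.
val data: Seq[(String, Int)] = Seq("a" -> 3, "b" -> 7, "c" -> 5)

// Render each value as a horizontal bar whose width is proportional to the value.
// Assumes non-empty data with positive values; labels are not HTML-escaped.
def barChart(data: Seq[(String, Int)], maxWidthPx: Int = 200): String = {
  val maxValue = data.map(_._2).max
  val rows = data.map { case (label, value) =>
    val width = value * maxWidthPx / maxValue // integer pixels, largest bar = maxWidthPx
    s"""<div>$label <span style="display:inline-block;background:#4a90d9;height:12px;width:${width}px"></span> $value</div>"""
  }
  rows.mkString("<div>", "", "</div>")
}

val chartHtml: String = barChart(data)

// Assumption: inside an Almond notebook the string could then be rendered with the
// kernel's display API, for example something like kernel.publish.html(chartHtml).
```

Keeping the chart as an ordinary string means the rendering logic can be checked outside a notebook, with only the last display step depending on the kernel.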

Using TensorFlow

Using TensorFlow from Scala is easy with ScalaPy, a library that enables seamless interop between Scala code and Python libraries. ScalaPy is designed so that any Python library can be used, even one with native dependencies. Libraries like TensorFlow and NumPy are no exception: we can use them from ScalaPy while still having access to high-performance native computations. To learn more about ScalaPy, check out this blog post.

One of the primary features of Scala is its rich type system, and ScalaPy makes it possible to write type-safe code even when working with Python libraries. When using TensorFlow, we can simply add .as[TensorFlow] and all later usages will be checked against the type definitions in the scalapy-tensorflow library.
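As a hedged sketch of what that looks like in a cell: the snippet below assumes ScalaPy and scalapy-tensorflow are already on the classpath (as set up by the installer notebook) and that a Python environment with TensorFlow is available. The package names and the py.module call reflect my understanding of those libraries and may differ between versions, so treat them as assumptions rather than a reference.

```scala
// Assumed package layout for ScalaPy and scalapy-tensorflow; verify against your version.
import me.shadaj.scalapy.py
import me.shadaj.scalapy.tensorflow.TensorFlow

// Dynamically typed handle to the Python module: any member access compiles,
// and mistakes only surface at runtime.
val tfDynamic = py.module("tensorflow")

// Statically typed view of the same module: later calls on tf are checked
// against the type definitions in scalapy-tensorflow.
val tf = tfDynamic.as[TensorFlow]
```

The conversion does not copy anything; it only changes which Scala type the underlying Python object is viewed through.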

First, let’s start with a simple example of performing linear regression with TensorFlow. The following notebook starts by installing ScalaPy, then loads up TensorFlow, and finally performs some linear regression.

Linear regression isn’t exactly the most complex machine learning application, especially when we’re using a library as powerful as TensorFlow, but it does a good job of demonstrating what the experience of writing Scala code for TensorFlow is like!
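For readers who want a feel for what such a notebook computes, here is a minimal sketch in plain Scala, with no TensorFlow or ScalaPy involved. It fits y = w * x + b by gradient descent on mean squared error. The data, learning rate, and step count are made up for illustration and are not taken from the notebook.

```scala
// Synthetic data on the line y = 2x + 1 (illustrative values, not from the notebook).
val xs: Vector[Double] = Vector(0.0, 1.0, 2.0, 3.0, 4.0)
val ys: Vector[Double] = xs.map(x => 2.0 * x + 1.0)

// One gradient-descent step on mean squared error for the model y = w * x + b.
def step(w: Double, b: Double, lr: Double): (Double, Double) = {
  val n = xs.size.toDouble
  val errors = xs.zip(ys).map { case (x, y) => (w * x + b) - y }
  val gradW = 2.0 / n * xs.zip(errors).map { case (x, e) => x * e }.sum
  val gradB = 2.0 / n * errors.sum
  (w - lr * gradW, b - lr * gradB)
}

// Start from w = 0, b = 0 and apply 1000 steps with a learning rate of 0.05.
val (wFit, bFit) = (1 to 1000).foldLeft((0.0, 0.0)) {
  case ((w, b), _) => step(w, b, lr = 0.05)
}

println(f"w = $wFit%.3f, b = $bFit%.3f")
```

On this noise-free data the fit should land very close to w = 2 and b = 1. In the TensorFlow version, the gradients are computed by the library rather than written out by hand.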

Learning Faster with GPUs

In this example, we train a CNN-based classifier on the MNIST dataset (which contains images of handwritten digits), achieving a test accuracy of over 99% with just 5 epochs of training! All the code to load the dataset, set up the model, and perform training is written in Scala, but through ScalaPy we are able to use TensorFlow to perform the training on a GPU.

With the power of GPUs, training this relatively complex model is super fast: training with GPUs took 1 minute and 23 seconds, while with CPUs it took over 6 minutes, a speedup of more than 4x. Pretty impressive!

And More!

To learn more about the tools used in this blog, check out:

  • Almond — the kernel implementation that lets us write Scala code in Colab
  • ScalaPy — the bridge that made it possible to use TensorFlow in our Scala code
  • ScalaPy Tensorflow — static typings that make using TensorFlow natural in Scala

Exploring the boundaries of programming, usually with #scala. EE/CS student at UC Berkeley. Intern at {@apollographql x 2, PayaLabs, @khanacademy, @coursera}
