Setting up a Spark machine learning project with Scala, sbt and MLlib

Pedro Costa
Jan 8, 2019 · 2 min read

In this tutorial, we will set up a Spark Machine Learning project with Scala, Spark MLlib and sbt.

sbt is an open-source build tool for Scala and Java projects, similar to Java’s Maven and Ant.

sbt requires the Java Development Kit 8 (JDK 8), so if you don’t have it installed, install it before continuing.

Installing sbt:

On Ubuntu:

$ echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2EE0EA64E40A89B84B2DF73499E82A75642AC823
$ sudo apt-get update
$ sudo apt-get install sbt

For other Linux distributions: https://www.scala-sbt.org/download.html

On Mac using Homebrew:

$ brew install sbt

Creating the project

To quickly start your project, we will use a Giter8 bootstrap template. This will create the necessary folder structure and project files.

$ sbt new sbt/scala-seed.g8

You can quickly check if everything is working by changing directory into your newly created project and running sbt:

$ cd [my-project]
$ sbt

Inside the sbt shell, use the command run to run the template project:

sbt:myproject> run

This should return a simple hello message.
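For orientation, here is a sketch of what a seed-style entry point producing that message may look like. The package name `example` and the names `Hello` and `Greeting` are assumptions for illustration; the file the template actually generates under src/main/scala may differ.

```scala
package example

// Hypothetical entry point in the style of a seed template.
// `App` turns the object body into the program's main method.
object Hello extends Greeting with App {
  println(greeting)
}

// The message lives in a trait so it can be checked from a test.
trait Greeting {
  lazy val greeting: String = "hello"
}
```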

Adding Spark and Spark MLlib

The default template already includes a scalaTest dependency. Now we will add Spark core and Spark MLlib.

In your project folder root you can find your build.sbt configuration file.

Add the following two lines at the end of the settings block, making sure the line before them also ends with a comma, to include the Spark core and Spark MLlib dependencies:

libraryDependencies += sparkCore,
libraryDependencies += sparkMLlib
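To show where those lines sit, here is a sketch of a complete build.sbt after the change. The Scala version, project version, organization and project name are assumed values, not taken from the template; keep whatever your generated file contains. One thing to check is that scalaVersion is a 2.11 or 2.12 release, since Spark 2.4.0 is only published for those Scala versions.

```scala
import Dependencies._

// Assumed values: keep the ones your generated build.sbt already has.
ThisBuild / scalaVersion := "2.12.8"
ThisBuild / version      := "0.1.0-SNAPSHOT"
ThisBuild / organization := "com.example"

lazy val root = (project in file("."))
  .settings(
    name := "myproject",
    // Already present in the template: the test dependency.
    libraryDependencies += scalaTest % Test,
    // The two added lines; the names are defined in project/Dependencies.scala.
    libraryDependencies += sparkCore,
    libraryDependencies += sparkMLlib
  )
```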

Then we need to specify what these dependencies are in ./project/Dependencies.scala

// https://mvnrepository.com/artifact/org.apache.spark/spark-core
lazy val sparkCore = "org.apache.spark" %% "spark-core" % "2.4.0"
// https://mvnrepository.com/artifact/org.apache.spark/spark-mllib
lazy val sparkMLlib = "org.apache.spark" %% "spark-mllib" % "2.4.0"
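Put together, project/Dependencies.scala might look like the sketch below. The ScalaTest line stands in for whatever the template generated, and its version number is an assumption; only the two Spark lines come from this tutorial.

```scala
import sbt._

object Dependencies {
  // Generated by the template; the version shown here is an assumed value.
  lazy val scalaTest = "org.scalatest" %% "scalatest" % "3.0.5"

  // %% appends the project's Scala binary version to the artifact name,
  // e.g. spark-core_2.12 when scalaVersion is a 2.12.x release.
  lazy val sparkCore  = "org.apache.spark" %% "spark-core"  % "2.4.0"
  lazy val sparkMLlib = "org.apache.spark" %% "spark-mllib" % "2.4.0"
}
```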

That’s it!

Now you can code your machine learning project with Spark and MLlib in the source folder and run with sbt.

$ sbt

Inside the sbt shell, use the command run.

sbt:myproject> run
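As a quick way to confirm that both dependencies resolve, here is a minimal sketch of a Spark MLlib program you could place in src/main/scala. The object name `KMeansSketch`, the helper `fitCenters` and the toy data are made up for illustration. It assumes Spark runs in local mode inside the sbt JVM, so no cluster is needed.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession

// Hypothetical example object: clusters four toy points into two groups.
object KMeansSketch {

  // Fits a k-means model with k = 2 and returns the cluster centers.
  def fitCenters(spark: SparkSession): Array[Vector] = {
    import spark.implicits._ // enables .toDF on local collections

    // Two points near the origin and two near (9, 9).
    val points = Seq(
      Vectors.dense(0.0, 0.0),
      Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0),
      Vectors.dense(9.1, 9.1)
    )
    // KMeans reads its input from a column named "features" by default.
    val df = points.map(Tuple1.apply).toDF("features")

    val model = new KMeans().setK(2).setSeed(1L).fit(df)
    model.clusterCenters
  }

  def main(args: Array[String]): Unit = {
    // local[*] uses all local cores instead of connecting to a cluster.
    val spark = SparkSession.builder()
      .appName("kmeans-sketch")
      .master("local[*]")
      .getOrCreate()

    try fitCenters(spark).foreach(println)
    finally spark.stop()
  }
}
```

If the template's own main class is still in the project, sbt will ask which main class to run. Spark also writes a fair amount of log output to the console, so the printed cluster centers appear among those messages.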

👋 Thanks for reading!
