Running a local Jupyter (and JupyterLab) environment with Docker

Tomer Levi
Sep 3, 2018

Docker is a great choice for hosting a development runtime. It makes it easier to manage development components such as Spark, Python, and Scala, and it can offer data science libraries out of the box.

Fortunately, the Jupyter Project offers various Docker images in its GitHub account. In this short guide I will walk you through the process of running Jupyter/JupyterLab locally.

Note: these steps describe the process on a Mac. Windows/Linux users will have a slightly different process.

Steps:

1. First, we'll need to create a Docker Hub account (free). Just load https://hub.docker.com/ and register.
[Image: Docker Hub registration]

2. Download Docker from the Docker Store and install it on your machine.

3. Once you've finished the installation, log in using your Docker Hub credentials.
*Use your Docker Hub ID, NOT your email.
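
If you prefer the command line, you can also log in from the terminal; Docker will prompt you for your Docker Hub ID and password:

docker login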

4. Open a terminal and execute one of the following commands (it will take some time to download the Docker image).

Jupyter:

docker run -it --rm -p 8888:8888 -p 4040:4040 -v ~:/home/jovyan/workspace jupyter/all-spark-notebook

Or JupyterLab:

docker run --rm -p 8888:8888 -p 4040:4040 -e JUPYTER_ENABLE_LAB=yes -v ~:/home/jovyan/work jupyter/all-spark-notebook
[Image: Terminal output]
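
A quick note on the flags: -p maps container ports to your machine (8888 for Jupyter, 4040 for the Spark UI), -v mounts your home directory into the container so your notebooks survive container restarts, --rm removes the container when it stops, and -e JUPYTER_ENABLE_LAB=yes switches the image into JupyterLab mode. If you'd rather keep the container running in the background, here is a minimal variation (the container name jupyter-spark is just an example):

docker run -d --name jupyter-spark -p 8888:8888 -p 4040:4040 -v ~:/home/jovyan/work jupyter/all-spark-notebook
docker logs jupyter-spark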

5. An HTTP address (including a login token) will appear in your terminal output. Copy this address into your browser and you're ready to code!

[Image: Jupyter UI]
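
If you lose the terminal output, the tokenized URL can be recovered from the container's logs. For example (replace <container_id> with the ID shown by docker ps):

docker ps
docker logs <container_id>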

6. Let’s create a simple PySpark notebook.

Click on New -> Python 3

*As you can see, Scala and R are also available.

7. Inside the notebook, paste the following code into the first cell. It creates a simple Spark DataFrame, filters it, and shows the filter results:

from pyspark.sql import SparkSession

# Create a local Spark session
spark = SparkSession.builder.master("local").appName("Hello World").getOrCreate()

# Build a small DataFrame, filter it, and collect the matching rows
l = [('Alice', 1), ('Bob', 3)]
df = spark.createDataFrame(l, ['name', 'age'])
df.filter(df.age > 1).collect()

Click inside the cell and press Shift + Enter to execute it.

[Image: PySpark notebook output]
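
The cell should return [Row(name='Bob', age=3)], since only Bob's age passes the filter. If you prefer a formatted table over a list of Row objects, .show() is a handy alternative:

# Print the filtered DataFrame as a table instead of collecting rows
df.filter(df.age > 1).show()

# +----+---+
# |name|age|
# +----+---+
# | Bob|  3|
# +----+---+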

The Spark UI is also available, since we mapped port 4040 (Spark's default UI port) to our machine. To load it, just open http://localhost:4040.
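
You can also confirm the UI address from inside the notebook. In recent Spark versions, SparkContext exposes a uiWebUrl property; it reports the URL as seen from inside the container, but thanks to the port mapping, http://localhost:4040 reaches the same server:

# Ask Spark where its UI is being served (hostname is the container's, port is 4040)
spark.sparkContext.uiWebUrl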

More to know:

The Jupyter Project offers several Docker images, from a basic image and a data science image to the all-spark-notebook image we used in this tutorial.
Each is composed of a different set of libraries and Jupyter kernels.
Take a look at the image selection page for more details.
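
For example, if you only need Python without Spark, the lighter scipy image may be enough; the same port and volume flags apply:

docker run --rm -p 8888:8888 -v ~:/home/jovyan/work jupyter/scipy-notebook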

Thanks for reading!

Written by Tomer Levi

Data engineer @Fundbox, https://twitter.com/Tomer_Levi
