Running a local Jupyter (and JupyterLab) environment with Docker
Docker is a great choice for hosting a development runtime. It keeps development components such as Spark, Python and Scala together in one place, and an image can ship data science libraries out of the box.
Fortunately, Project Jupyter publishes a range of Docker images on its GitHub account. In this short guide I will walk you through running Jupyter or JupyterLab locally.
Note: the steps describe the process on a Mac. Windows and Linux users will follow a slightly different process.
Steps:
1. First, create a Docker Hub account (free). Just go to https://hub.docker.com/ and register.

2. Download Docker from the Docker Store and install it on your machine.
3. Once installation has finished, log in using your Docker Hub credentials.
*Use your Docker Hub ID and NOT your email.
4. Open a terminal and run one of the following commands (downloading the Docker image will take some time).
Jupyter:
docker run -it --rm -p 8888:8888 -p 4040:4040 -v ~:/home/jovyan/workspace jupyter/all-spark-notebook
Or JupyterLab:
docker run --rm -p 8888:8888 -p 4040:4040 -e JUPYTER_ENABLE_LAB=yes -v ~:/home/jovyan/work jupyter/all-spark-notebook
5. An HTTP address will appear in your terminal output. Copy this address into your browser and you're ready to code!

6. Let’s create a simple PySpark notebook.
Click on New -> Python 3
*As you can see, Scala and R are also available.
7. Inside the notebook, paste the following code into the first cell. It creates a simple Spark DataFrame, filters it and returns the filtered rows:
from pyspark.sql import SparkSession
# Start (or reuse) a local Spark session
spark = SparkSession.builder.master("local").appName("Hello World").getOrCreate()
# Two rows with the columns name and age
l = [('Alice', 1),('Bob', 3)]
df = spark.createDataFrame(l, ['name', 'age'])
# Keep only the rows where age is greater than 1
df.filter(df.age > 1).collect()
Place your cursor in the cell and press Shift + Enter to execute it.
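As a quick sanity check on what the cell should return, here is a sketch of the same filter in plain Python, with no Spark involved. The `Row` namedtuple below is a stand-in made up for illustration, not `pyspark.sql.Row`; it only mirrors the `name` and `age` columns from the example.

```python
from collections import namedtuple

# Hypothetical stand-in for a Spark row; this is not pyspark.sql.Row.
Row = namedtuple("Row", ["name", "age"])

# Same data as the notebook cell.
people = [Row("Alice", 1), Row("Bob", 3)]

# Plain-Python equivalent of df.filter(df.age > 1).collect():
# keep only the rows whose age is greater than 1.
older = [p for p in people if p.age > 1]

print(older)  # [Row(name='Bob', age=3)]
```

In the notebook, the Spark cell should likewise return a single row, the one for Bob.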

The Spark UI is available because we mapped port 4040 (the default Spark UI port) to our machine. To load it, just open http://localhost:4040
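If the page does not load, it can help to check whether anything answers on that address. The sketch below uses only Python's standard library; `ui_is_up` is a helper name chosen for this example, and the URL is the one used in this guide. Keep in mind that the Spark UI is served only while a Spark application is running, so a `False` result before you execute the notebook cell is expected.

```python
import http.client
import urllib.error
import urllib.request


def ui_is_up(url="http://localhost:4040", timeout=2.0):
    """Return True if an HTTP server answers successfully at url, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # urlopen follows redirects, so a healthy UI ends up in this range.
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, reset, timed out, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Spark UI reachable:", ui_is_up())
```

The same helper can be pointed at the Jupyter address by passing a different `url`.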
More to know:
Project Jupyter offers several Docker images, from a basic image and a data science image to the one we used in this tutorial.
Each is composed of different libraries and Jupyter kernels.
Take a look at the image selection page for more details.
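Switching images only changes the last argument of the command from step 4. The sketch below assembles that command in Python so each flag is spelled out. `build_run_command` is a helper name invented for this example, and the alternative image names in the comments (`jupyter/base-notebook`, `jupyter/datascience-notebook`) are ones I believe Project Jupyter publishes, so check the image selection page before relying on them.

```python
import os
import shlex


def build_run_command(image="jupyter/all-spark-notebook", lab=False,
                      host_dir=None, container_dir="/home/jovyan/work"):
    """Build the docker run command from this guide as a list of arguments."""
    if host_dir is None:
        # Expand the home directory here, since no shell will do it for us.
        host_dir = os.path.expanduser("~")
    cmd = [
        "docker", "run",
        "--rm",             # remove the container when it exits
        "-p", "8888:8888",  # Jupyter port
        "-p", "4040:4040",  # Spark UI port
    ]
    if lab:
        cmd += ["-e", "JUPYTER_ENABLE_LAB=yes"]  # start JupyterLab instead
    cmd += ["-v", f"{host_dir}:{container_dir}", image]
    return cmd


# Other images (assumed names): jupyter/base-notebook, jupyter/datascience-notebook
print(shlex.join(build_run_command(lab=True)))
```

The printed line can be pasted into a terminal, or the list can be handed to `subprocess.run` as-is.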
Thanks for reading!
