How to start Jupyter in Google Cloud — the Python way

Michal Mrázek
Google Cloud - Community
5 min read · Apr 18, 2022

Did you know that you can easily start a Jupyter notebook in Google Cloud with Python SDK? And automatically mount GCS buckets, add GPUs or use your own containers?


This is the first article in a series on Manipulating GCP Vertex AI with Python SDK. Subscribe to get the next blog post.

Google launched its new machine learning platform, Vertex AI, in May 2021, succeeding the previous AI Platform. It also released SDKs for multiple languages. Let us focus on Python, as it is the first choice for many data scientists.

Install Python SDK

Google currently maintains two Python libraries for Vertex AI services:

  • google-cloud-notebooks — for manipulating Vertex AI Workbench (aka Jupyter Notebook on GCP)
  • google-cloud-aiplatform — for manipulating everything else

So for now, we just install the notebook lib:

pip install google-cloud-notebooks

Documentation on Python libraries for GCP can sometimes be hard to find; the notebooks SDK is described at https://googleapis.dev/python/notebooks/latest/index.html.

Authorization with Google Cloud

I will assume you have already created a GCP project and enabled Vertex AI. If not, follow the official documentation.

There are multiple ways to authenticate to Google Cloud. For our purposes, we will leverage the gcloud CLI tool. We will log in in the terminal, and the Notebook Service client in Python will then be able to pick up the credentials automatically.

  1. Install the gcloud CLI tool
  2. Login to gcloud with your user account
gcloud auth login

Alternatively, you could use a service account:

gcloud auth activate-service-account --key-file=$KEY_FILE
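Once you are logged in, the client libraries pick up these Application Default Credentials on their own. A minimal sketch to verify the pickup — the helper name is mine, not part of any official API:

```python
import google.auth
from google.auth.exceptions import DefaultCredentialsError


def current_adc_project():
    """Return the default project from Application Default Credentials, or None if not logged in."""
    try:
        _credentials, project = google.auth.default()
        return project
    except DefaultCredentialsError:
        return None
```

If this returns None, rerun gcloud auth login (or activate your service account) before continuing.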

GCP permissions

To start notebooks, you will need roles/notebooks.runner. For notebook deletion, roles/notebooks.admin is required.

Python Notebook Service

Google offers two ways to manipulate notebooks — an Async Client and a Sync Client. It is entirely up to you which one to choose. For the sake of simplicity, I will use the Sync Client for the rest of the tutorial. Both are documented here.

Now we can initiate the Notebook Service Client:

from google.cloud import notebooks_v1

client = notebooks_v1.NotebookServiceClient()

Define the notebook

Now the fun part. You have many options for defining your notebook environment and can add as much computing power as your budget allows.

Choosing environment and machine type

You can either base your notebook on a virtual machine image that Google offers or use your own Docker image. I will cover building images for notebooks in one of the next articles, so let's first focus on the VM images Google offers.

There are multiple Deep Learning VM images available in GCP, and for our convenience, many of them come with Jupyter preinstalled and port settings configured. There are images for TensorFlow, PyTorch, R, and other frameworks. The full list can be found here. In one of the next articles, I will also cover how to easily set up a Julia notebook.

Assuming we want to use TensorFlow 2.8 with GPU support, I will choose the VM image family tf-ent-2-8-cu113-notebooks.
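The family name encodes the framework, version, and CUDA toolkit. As an illustration only — this helper and the naming pattern it assumes are inferred from the example above, not an official API:

```python
def tf_notebook_image_family(tf_version, cuda="113"):
    """Build a Deep Learning VM image-family name for TensorFlow Enterprise notebooks.

    Assumes the pattern tf-ent-<major>-<minor>[-cu<ver>]-notebooks seen in the
    example above; pass cuda=None for a CPU-only image family.
    """
    version = tf_version.replace(".", "-")
    gpu_suffix = f"-cu{cuda}" if cuda else ""
    return f"tf-ent-{version}{gpu_suffix}-notebooks"
```

For example, tf_notebook_image_family("2.8") gives the family used in this tutorial.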

Vertex AI Workbench is just a Compute Engine instance with Jupyter running. You can find more information on available Compute Engine machine types in the documentation. For our tutorial purposes, I will choose n1-standard-8 — a standard machine type with 8 vCPUs and 30 GB memory. However, you can go as crazy as you wish.
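As a quick sanity check of machine-type names, n1-standard machine types pair each vCPU with 3.75 GB of memory. A small sketch (the helper name is mine):

```python
def n1_standard_memory_gb(machine_type):
    """Memory of an n1-standard machine type, at the documented 3.75 GB per vCPU."""
    family, tier, vcpus = machine_type.split("-")
    if (family, tier) != ("n1", "standard"):
        raise ValueError(f"not an n1-standard machine type: {machine_type}")
    return int(vcpus) * 3.75
```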

GPUs and disks

There are many restrictions on GPU usage regarding machine types and locations; read Google's page on choosing GPUs. I will choose an NVIDIA Tesla P100. It is also possible to attach more GPUs (usually 1, 2, 4, or 8), but I will go with only one.

When you create a Notebook in GCP, Google creates two disks for you.

  1. Boot disk — where the OS/libraries/initialization scripts live
  2. Data disk — which is mapped to the /home/jupyter folder

For each, you can choose between a standard, balanced, or SSD Persistent Disk, and each can vary in size from 100 GB to 64 TB. For the sake of simplicity, we will choose balanced disks of 200 GB for both.
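The size limits are easy to trip over, so a small guard before building the request can save a round trip to the API. A sketch under the 100 GB–64 TB bounds mentioned above (the helper is hypothetical):

```python
MIN_DISK_GB = 100
MAX_DISK_GB = 64 * 1024  # 64 TB


def validated_disk_size_gb(size_gb):
    """Reject disk sizes outside the 100 GB - 64 TB range before the API does."""
    if not MIN_DISK_GB <= size_gb <= MAX_DISK_GB:
        raise ValueError(
            f"disk size must be between {MIN_DISK_GB} and {MAX_DISK_GB} GB, got {size_gb}"
        )
    return size_gb
```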

Creating the notebook request

With the information we already have, we can build our notebook request like this:

from google.cloud.notebooks_v1.types import Instance, VmImage

notebook_instance = Instance(
    vm_image=VmImage(
        project="deeplearning-platform-release",
        image_family="tf-ent-2-8-cu113-notebooks",
    ),
    machine_type="n1-standard-8",
    accelerator_config=Instance.AcceleratorConfig(
        type_=Instance.AcceleratorType.NVIDIA_TESLA_P100, core_count=1
    ),
    install_gpu_driver=True,
    boot_disk_type=Instance.DiskType.PD_BALANCED,
    boot_disk_size_gb=200,
    data_disk_type=Instance.DiskType.PD_BALANCED,
    data_disk_size_gb=200,
)

That is the basic stuff. You can add more parameters such as labels, tags, metadata, or instance owners. Go here to read all parameters.

Sending the request to GCP

The Python SDK often uses something called a parent. The parent is just a string containing your GCP project ID and the location you want to use.
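Note that for user-managed notebooks the location is a zone, not a region. A tiny helper (the name is mine) makes the format explicit:

```python
def notebook_parent(project_id, zone):
    """Build the parent string; user-managed notebooks are zonal (e.g. 'europe-west1-a')."""
    return f"projects/{project_id}/locations/{zone}"
```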

Now we are ready to send the request to the GCP endpoint.

project_id = "PROJECT_ID"  # Put your own project id here
location = "europe-west1-a"  # Put your own location here
parent = f"projects/{project_id}/locations/{location}"

request = notebooks_v1.CreateInstanceRequest(
    parent=parent,
    instance_id="my-first-notebook",
    instance=notebook_instance,
)
op = client.create_instance(request=request)
op.result()

The result of client.create_instance is one of Google's long-running operations — read the docs. The simplest way to wait for the operation to finish is the op.result() method, which also gives you information about the created notebook or any errors during creation.

Here is the complete code:

TIP #1: Mount GCS buckets using startup script

As we said before, Jupyter on GCP is just a configured Compute Engine (CE) virtual machine. Every CE virtual machine can have a startup script — a bash or non-bash file executed during the machine startup process. This gives you endless possibilities to boost your Jupyter notebook. I will only show you how to mount GCS buckets automatically with gcsfuse.

Be aware that the startup script runs as the root user, but when you connect to Jupyter, you will be the jupyter user.

gcsfuse is already preinstalled in Google's provided images. In other images, you will have to install it yourself.

#!/bin/bash
LOCAL_PATH=/home/jupyter/mounted/gcs
BUCKET_NAME=my-super-bucket # Change this to your bucket
BUCKET_DIR=notebook_data

sudo su -c "mkdir -p $LOCAL_PATH"
sudo su -c "gcsfuse --implicit-dirs --only-dir=$BUCKET_DIR $BUCKET_NAME $LOCAL_PATH"

I am using two tricks here.

  • --implicit-dirs makes directories exist implicitly, which lets you see all objects in the bucket, but it comes with several drawbacks — mainly, it is more costly
  • --only-dir mounts only one “directory” of the bucket

This script must be saved either in GCS or at a publicly available URL (e.g., GitHub). I will show you how to upload a string directly to GCS and save it as a text file, without saving anything locally.

pip install google-cloud-storage

Now just add one more argument to the notebook_instance before sending it to GCP:

notebook_instance.post_startup_script = f"gs://{blob.bucket.name}/{blob.name}"

request = notebooks_v1.CreateInstanceRequest(
    parent=parent,
    instance_id="my-first-notebook",
    instance=notebook_instance,
)
op = client.create_instance(request=request)
op.result()

TIP #2: Managed notebooks

Google offers two solutions for Jupyter notebooks. So far, we have used a user-managed notebook, which allows a lot of customization. The second option is a managed notebook, where you can actually write code first and choose hardware later. The API is very similar, and you can find the documentation here.

TIP #3: Read the article by Lak Lakshmanan

How to use Jupyter on a Google Cloud VM is an excellent article by Lak Lakshmanan (ex-Google) on starting Jupyter on GCP with the gcloud CLI tool. It is a bit older but contains some great tips.

Please subscribe to get the next blog post on how to use the Vertex AI Python SDK. Apart from covering other Vertex AI services, I will soon publish a post showing how to easily start Jupyter with Julia using Vertex AI Workbench and Cloud Build.


Machine Learning Engineer, technology enthusiast, and passionate learner, currently focused mainly on Apache Airflow on Google Cloud Platform.