How to Deploy and Schedule Jupyter Notebook on Google Cloud Platform

Welly Dwi Putra
Blibli.com Tech Blog
4 min read · Jul 27, 2020
Photo by StartupStockPhotos on Pixabay

Once upon a time in the XYZ123 Company,

Anton: “Hi boss, I’ve finished this churn prediction model. We’ve exceeded our accuracy target!”

Budi: “Wow, cool, Anton! We could generate a list of customers that have a high churn probability and send it to our CRM systems on a daily basis. Let’s start this next week!”

Anton: “Uhm, I’m not sure we can do that next week.”

Budi: “Why? You said that you’ve already finished it.”

Anton: “Yes, I’ve finished it, but that’s only the modeling part. I need to talk with our ML Engineers to convert my Jupyter Notebook code into Production-ready code. Not only that, we still need to set up the scheduling, monitoring, and …”

Budi: “Okay okay, so when will all of that be ready?”

Anton: “If we’re lucky, maybe a month from now”

Budi: “… Could we just run that notebook thing periodically?”

Is the conversation above familiar to you?

Are you still in a place where it’s hard to deploy Machine Learning code to Production?

Photo by Nicolas Hoizey on Unsplash

As Data Scientists, we often lose momentum when our code gets stuck in a long deployment process. The situation or business context might change in the future, which means our model needs to be tuned again. For example, in the Fraud Detection field, fraudsters could adapt and change their behavior later, which would make our production model unable to catch those fraudsters (with their new behavior).

Because of that, we need a pipeline that enables us to deploy, schedule, and run Machine Learning code easily (and fast!).

The simplest option is to schedule our notebook directly. Netflix has already made this approach possible with an open-source tool called Papermill, which parameterizes and executes Jupyter Notebooks.
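For illustration, here is a minimal sketch of how Papermill can be used from Python (the notebook names and parameters are made-up placeholders):

```python
import papermill as pm

# Execute a parameterized notebook: Papermill injects the given parameters
# into the cell tagged "parameters" and writes the executed copy to a new file.
pm.execute_notebook(
    "churn_prediction.ipynb",             # hypothetical input notebook
    "churn_prediction_2020-07-27.ipynb",  # executed output notebook
    parameters={"run_date": "2020-07-27", "churn_threshold": 0.8},
)
```

The same run can also be triggered from the command line with the papermill CLI, which is what a scheduled script would typically do.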

This also becomes much easier if we are in a Cloud environment. We could even have a Serverless architecture for this case.

Inspired by this Google Cloud blog post: https://cloud.google.com/blog/products/ai-machine-learning/let-deep-learning-vms-and-jupyter-notebooks-to-burn-the-midnight-oil-for-you-robust-and-automated-training-with-papermill, here is what we have at Blibli:

Cloud Scheduler

We can set up a cron job using Google Cloud Scheduler. The job will send a message to a Pub/Sub topic on the defined schedule.
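The job can be created from the Cloud Console, via gcloud, or programmatically. Here is a minimal sketch using the google-cloud-scheduler Python client (the project, region, topic, job name, and schedule are placeholders):

```python
from google.cloud import scheduler_v1

# Create a Cloud Scheduler job that publishes a Pub/Sub message every day at 02:00.
client = scheduler_v1.CloudSchedulerClient()
parent = "projects/my-project/locations/asia-southeast2"

job = {
    "name": f"{parent}/jobs/run-churn-notebook",
    "schedule": "0 2 * * *",               # standard cron syntax
    "time_zone": "Asia/Jakarta",
    "pubsub_target": {
        "topic_name": "projects/my-project/topics/run-notebook",
        "data": b"churn_prediction",       # payload that the Cloud Function will receive
    },
}

client.create_job(parent=parent, job=job)
```

Because Cloud Scheduler is fully managed, we don’t need to keep a VM running just to host cron.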

Cloud Pub/Sub

Messages from Cloud Scheduler are stored here. They will be picked up by a Cloud Function that listens to this topic.
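If the topic does not exist yet, it only needs to be created once, for example with a recent version of the google-cloud-pubsub client (project and topic names are placeholders):

```python
from google.cloud import pubsub_v1

# One-off setup: create the topic that Cloud Scheduler publishes to and that
# the Cloud Function uses as its trigger.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "run-notebook")
publisher.create_topic(name=topic_path)
```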

Cloud Function

We can set up a function in Google Cloud Functions. The function will be triggered whenever there is a new event/message in the Google Cloud Pub/Sub topic.

This function will create a Compute Engine instance (based on our configuration, which consists of the Deep Learning VM image name, zone, machine type, GPU type, etc.).

Here is an example of what the code in the Cloud Function can look like:

Note: you should add the google-api-python-client dependency to requirements.txt.
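The sketch below assumes a Pub/Sub-triggered background function; the entry-point name, environment variables, image family, zone, and machine type are illustrative placeholders rather than our exact production code:

```python
import base64
import os

import googleapiclient.discovery

# Configuration is read from the function's environment variables
# (these names and default values are illustrative).
PROJECT = os.environ["GCP_PROJECT"]
ZONE = os.environ.get("ZONE", "asia-southeast2-a")
MACHINE_TYPE = os.environ.get("MACHINE_TYPE", "n1-standard-4")
IMAGE_FAMILY = os.environ.get("IMAGE_FAMILY", "tf2-latest-cpu")
STARTUP_SCRIPT_URL = os.environ["STARTUP_SCRIPT_URL"]


def run_notebook(event, context):
    """Background Cloud Function triggered by a new Pub/Sub message."""
    # The Pub/Sub payload (e.g. which notebook to run) arrives base64-encoded.
    payload = base64.b64decode(event["data"]).decode("utf-8") if "data" in event else ""

    compute = googleapiclient.discovery.build("compute", "v1")

    config = {
        "name": f"notebook-executor-{context.event_id}",
        "machineType": f"zones/{ZONE}/machineTypes/{MACHINE_TYPE}",
        "disks": [{
            "boot": True,
            "autoDelete": True,
            "initializeParams": {
                # Boot from the latest image in a Deep Learning VM image family.
                "sourceImage": (
                    "projects/deeplearning-platform-release/"
                    f"global/images/family/{IMAGE_FAMILY}"
                ),
            },
        }],
        "networkInterfaces": [{
            "network": "global/networks/default",
            "accessConfigs": [{"type": "ONE_TO_ONE_NAT", "name": "External NAT"}],
        }],
        # Allow the instance to call other GCP APIs (e.g. upload results to GCS).
        "serviceAccounts": [{
            "email": "default",
            "scopes": ["https://www.googleapis.com/auth/cloud-platform"],
        }],
        "metadata": {"items": [
            {"key": "startup-script-url", "value": STARTUP_SCRIPT_URL},
            {"key": "notebook", "value": payload},
        ]},
        # A GPU could be attached via "guestAccelerators" if the model needs one.
    }

    compute.instances().insert(project=PROJECT, zone=ZONE, body=config).execute()
```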

Deep Learning VM (on Google Compute Engine)

Google provides preconfigured VM images for Machine Learning / Deep Learning workloads, and Papermill is already installed on them. For more information about Deep Learning VM, you can check its documentation: https://cloud.google.com/ai-platform/deep-learning-vm/docs/introduction.
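As a quick check, we can look up the newest image in a Deep Learning VM image family using the same google-api-python-client dependency (the family name below is just one example):

```python
import googleapiclient.discovery

# Fetch the latest image from a Deep Learning VM image family hosted in the
# public "deeplearning-platform-release" project.
compute = googleapiclient.discovery.build("compute", "v1")
image = compute.images().getFromFamily(
    project="deeplearning-platform-release",
    family="tf2-latest-gpu",
).execute()
print(image["name"], image["creationTimestamp"])
```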

After the Compute Engine instance is created, the startup script will be executed (we defined where to get the startup script via the STARTUP_SCRIPT_URL configuration in the Cloud Function).

For starters, you could use the startup script example provided by Google: https://raw.githubusercontent.com/GoogleCloudPlatform/ml-on-gcp/master/dlvm/tools/scripts/notebook_executor.sh. Note: we found that Google’s startup script was not enough for our cases, so we modified some parts for internal use.

Basically, the startup script installs all the required libraries, runs the notebook there with Papermill, uploads the results to Google Cloud Storage, and at the end shuts down the Compute Engine instance itself.

Conclusion

We are aware that this pipeline is not perfect. There are a lot of things that still need to be built around it, such as Workflow Management, CI/CD tools, proper ML Monitoring tools, etc. However, having this pipeline has helped us speed up our process of doing POCs, deploying simple ML tasks, and even doing hyperparameter tuning in parallel.
