Scale ML experiments from JupyterLab to the cloud

Optumi
3 min read · May 25, 2022


Starting in JupyterLab

Data professionals love Jupyter notebooks. This is no secret and no surprise — we understand why! They are tried-and-true tools for interactive data exploration, visualization and analysis.

Jupyter notebooks have also become the de facto standard for machine learning. Data scientists have found them very useful for testing different algorithms and hyperparameter values. In a recent survey we conducted in the LinkedIn group Machine Learning Community, over 50% of ML practitioners picked JupyterLab as their go-to IDE for ML experimentation.

[Image: LinkedIn poll on preferred IDEs for ML experimentation]

Outgrowing your laptop

However, the story does not end there. It used to be common to run entire ML workflows locally, but that is starting to change. A growing number of data scientists are exceeding the computational limits of their laptops.

Why? Because state-of-the-art machine learning is becoming accessible to the average data scientist.

  • Frameworks like Keras and Fast.ai are making it easy to use deep learning algorithms
  • Services like Hugging Face are offering free access to pre-trained models (~50k at the time of this writing)
  • Techniques like transfer learning are making it easy to tailor pre-trained models to new tasks (see the sketch after this list)
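
To make the transfer learning point concrete, here is a minimal sketch in Keras: it reuses an ImageNet-pretrained MobileNetV2 as a frozen feature extractor and trains only a small new head. The input size, class count, and dataset are placeholders for your own task.

```python
import tensorflow as tf

# Transfer learning sketch: reuse a pre-trained image model as a
# frozen feature extractor and train a new classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,   # drop the original ImageNet head
    weights="imagenet",  # start from pre-trained weights
)
base.trainable = False   # freeze the pre-trained layers

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),  # placeholder: 10 classes
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_ds, epochs=3)  # train only the new head on your own data
```

Training just the new head is cheap, but fine-tuning the whole network or training on a large dataset is exactly where a laptop starts to struggle.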

The democratization of machine learning is very exciting but creates new problems to solve. Data scientists are now more likely to run out of RAM, require one or more GPUs, and wait hours or days for model training to finish.

If your laptop doesn’t cut it, where do you turn?

Leveraging the cloud

A natural answer is — you guessed it — cloud computing. Public clouds like AWS, Azure, and GCP have seemingly unlimited computational power that you can rent on-demand. This includes machines with TBs of RAM, the latest GPU models, and even custom accelerators purpose-built for ML.

So, problem solved? Not quite.

As it turns out, cloud infrastructure was not tailor-made for data science professionals. The first issue is managing the infrastructure itself: ask any data scientist who has been forced to provision and maintain their own EC2 instances. It’s not fun, and it’s often the reason expensive GPUs are left sitting idle for days at a time.
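
For a sense of what "managing your own instances" means, here is roughly the manual route with boto3. The AMI ID, instance type, and key name below are placeholders, not recommendations:

```python
import boto3

# Manual provisioning sketch: launch a GPU instance yourself.
ec2 = boto3.resource("ec2")
instances = ec2.create_instances(
    ImageId="ami-...",          # placeholder: find a deep learning AMI
    InstanceType="p3.2xlarge",  # placeholder: one NVIDIA V100 GPU
    MinCount=1,
    MaxCount=1,
    KeyName="my-key",           # placeholder: an SSH key pair you manage
)

# From here it's on you: SSH in, set up drivers and packages, copy your
# data and notebook over, run it, pull the results back down... and
# remember to terminate, or the meter keeps running:
# instances[0].terminate()
```

Forgetting that last step is exactly how GPUs end up idle (and billed) for days.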

The second issue is the fragmentation of workspaces. Typical data science workflows include more than just compute-intensive modeling, and many tasks remain well suited to local JupyterLab. A lift-and-shift to a permanent cloud environment is often unnecessary, expensive, and undesirable (many data scientists like their local setup!). At the same time, splitting the work between different environments can get messy and hard to maintain.

Leveraging the cloud without leaving JupyterLab

We set out to create a better solution and built Optumi: a JupyterLab extension for unified laptop-to-cloud experimentation.

The core experience is simple:

  • Start an ML project in local JupyterLab
  • Click launch when you need more powerful resources
  • Optumi sets up a cloud instance to mimic your local environment and executes your notebook
  • Optumi lets you know when your notebook finishes, stores the results and cleans up the instance

That’s it! Lightweight and unobtrusive. We want to empower you to focus on data science, and we step in only when you need a hand with infrastructure. We want you to have your cake and eat it too.

We are also, of course, continuing to iterate on the Optumi solution and greatly encourage feedback. Please feel free to sign up and give it a try at optumi.com or reach out to us at cs@optumi.com.

In the meantime — happy experimenting 🧪
