Hyperparameter Optimization on Kubernetes With Argo

David Rosin
Feb 25

Leverage your Kubernetes orchestration tool to tune machine learning models.

Authors: Sebastian Wanner, David Rosin


At idealo, we use Argo Workflows & Pipelines to orchestrate our Kubernetes deployments. As machine learning engineers, we found that, beyond this main purpose, Argo is also useful for running compute-intensive machine learning jobs. This allows us to leverage our cluster’s spare capacity for hyperparameter optimization.

Hyperparameter optimization is a way of improving machine learning models in which many modified versions of a model are trained and then evaluated to identify the best-performing candidate. For example, an artificial neural network could be trained with different numbers of neurons to identify the ideal network size. Because hyperparameter optimization usually requires many repetitive training cycles, it makes sense to parallelize the individual training jobs.
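As a minimal sketch of the idea, a naive version of this search is just a loop over candidate network sizes. The dataset and model here (scikit-learn’s digits dataset and MLPClassifier) are chosen purely for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

best_size, best_score = None, float("-inf")
for n_neurons in [16, 32, 64, 128]:  # candidate network sizes
    model = MLPClassifier(hidden_layer_sizes=(n_neurons,), max_iter=300, random_state=0)
    score = cross_val_score(model, X, y, cv=3).mean()  # evaluate each candidate
    if score > best_score:
        best_size, best_score = n_neurons, score

print(f"Best hidden layer size: {best_size} (accuracy: {best_score:.3f})")
```

Because each iteration of such a loop is independent, the individual training runs can be distributed across the cluster, which is exactly what the setup below does.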

How We Use Argo for Hyperparameter Optimization

We start our parallel training jobs through the orchestration tool Argo and manage the hyperparameter optimization via the Python package Optuna. The Argo workflow first starts containers that download the training data and create a database to log results. Then, it starts a study in Optuna with several parallel workers that perform the hyperparameter optimization. Each of these parallel workflow steps repeatedly trains and evaluates the model for different hyperparameters. Optuna picks the hyperparameters for the training in an intelligent way by considering promising past runs. In a final step, the best parameter combination is printed.
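In code, this setup boils down to every worker attaching to the same Optuna study backed by a shared database. The sketch below uses a placeholder objective and an illustrative connection string and study name; the actual values live in the workflow definition:

```python
import optuna

# Illustrative connection string; in the workflow, an earlier step creates
# this database and passes its address to each worker pod.
STORAGE = "postgresql://optuna:optuna@optuna-db:5432/optuna"

def objective(trial):
    # Placeholder objective; a real worker trains and evaluates the model here.
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

# Every parallel worker attaches to the same named study, so Optuna's sampler
# can use the shared trial history to propose promising hyperparameters.
study = optuna.create_study(
    study_name="hyperparameter-tuner",
    storage=STORAGE,
    load_if_exists=True,
    direction="minimize",
)
study.optimize(objective, n_trials=20)

print("Best parameters so far:", study.best_params)
```

Because all trial results are logged to the shared database, any worker (or a final workflow step) can read off the best parameter combination.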

The Argo workflow for hyperparameter optimization

The Argo workflow steps that perform the hyperparameter optimization repeat until a stopping condition is met, such as reaching the maximum training time. The stopping condition could also be based on the load on the cluster, which would make it possible to use the cluster’s spare capacity optimally.
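Optuna supports a wall-clock budget directly via the timeout argument of study.optimize, and custom conditions can be expressed as callbacks. In the sketch below, cluster_load() is a hypothetical helper that would query your cluster’s metrics:

```python
import optuna

def objective(trial):
    # Placeholder objective; a real worker trains and evaluates the model here.
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

study = optuna.create_study()

# Stop after a fixed wall-clock budget; timeout is given in seconds.
study.optimize(objective, timeout=60 * 60)  # run trials for at most one hour

# A load-based condition can be expressed as a callback that Optuna invokes
# after every trial; cluster_load() is a hypothetical metrics helper.
def stop_on_high_load(study, trial):
    if cluster_load() > 0.8:
        study.stop()  # finish the current trial, then stop the study

study.optimize(objective, n_trials=100, callbacks=[stop_on_high_load])
```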

Note that most Kubernetes clusters are not equipped with GPU or TPU instances. Therefore, running machine learning jobs on Kubernetes makes the most sense for models that perform well on CPUs.

The Argo workflow definition for hyperparameter optimization of an XGBoost model can be found here: https://github.com/drosin/argo-hyperparameter-tuner.
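For a flavor of what the trainable objective for an XGBoost model looks like, here is a minimal standalone sketch. The search space and dataset are illustrative; the repository defines its own:

```python
import optuna
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

def objective(trial):
    # Illustrative search space; the repository defines its own parameters.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "max_depth": trial.suggest_int("max_depth", 2, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
    }
    model = xgb.XGBClassifier(**params)
    return cross_val_score(model, X, y, cv=3).mean()  # mean validation accuracy

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print("Best parameters:", study.best_params)
```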

Two Ways of Starting Training Jobs

In the repo, we include two Argo workflows that differ in how training jobs are scheduled:

  1. The workflow hyperparameter-tuner-argo-level.yaml starts a new pod for each training job. This is preferred if you want a clean environment for each run or if you have problems with memory leaks.
  2. The workflow hyperparameter-tuner-optuna-level.yaml runs all training jobs of one parallel branch within a single pod. Here, Optuna handles the starting of training jobs. This is preferred when individual training jobs finish quickly compared to the pod-creation time. (See the sketch after this list for the difference in Optuna terms.)
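In Optuna terms, the difference comes down to how many trials each pod runs against the shared study. The connection string, study name, and objective below are again illustrative:

```python
import optuna

def objective(trial):
    # Placeholder objective; each trial would train and evaluate the model.
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

# Both variants attach to the same shared study.
study = optuna.create_study(
    study_name="hyperparameter-tuner",
    storage="postgresql://optuna:optuna@optuna-db:5432/optuna",
    load_if_exists=True,
)

# Argo-level scheduling: each pod runs exactly one trial in a fresh
# environment, and the Argo loop decides how many pods to start.
study.optimize(objective, n_trials=1)

# Optuna-level scheduling: one long-lived pod runs many trials back to back,
# avoiding the pod-creation overhead for short training jobs.
study.optimize(objective, n_trials=100)
```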

The Bottom Line

You can combine a Kubernetes orchestration tool, such as Argo, with Optuna to run hyperparameter optimization jobs on your cluster. This lets you take advantage of a cluster’s spare capacity to optimize CPU-based machine learning models.

Do you love machine learning? Have a look at our vacancies.
