Kubeflow 0.4 Release: Enhancements for Machine Learning productivity

Published in kubeflow, Jan 30, 2019 (6 min read)

An update from the Product Management team on what shipped with Kubeflow 0.4, including more details on Pipelines and examples of building, training, and deploying models from Jupyter notebooks.

Kubeflow is building the leading Kubernetes-based open source community for Machine Learning (ML) software application development. Today, we’re pleased to announce that Kubeflow 0.4 is available and includes many important enhancements to speed the development of ML applications. Here we’ll review the new Pipelines functionality, take a look at Kubeflow 0.5 planning, and introduce two new community initiatives: the Kubeflow User Survey and the upcoming Kubeflow Day in Los Angeles.

Over the last year, the Kubeflow community has delivered software releases on a 90-day cadence. These releases have significantly increased AI researcher productivity while improving the platform’s stability, fit, and finish. Thanks to the rapid innovation made possible by our community’s 100+ contributors across 20+ organizations, Kubeflow users are gaining strategic advantages in the race to deliver production-quality AI applications.

Kubeflow 0.4 Overview, Benefits and Details

As an overview, Kubeflow 0.4 delivers these key features and updates:

An updated JupyterHub UI that makes it easy to spawn notebooks with PVCs.
An alpha release of fairing, a library that simplifies the process for data scientists to build and start training jobs directly from a notebook.
An initial release of a CRD for managing Jupyter notebooks, which enables creating notebook containers with more control over CPU, memory, and GPU resources.
Kubeflow Pipelines for orchestrating ML workflows, which speeds the process of productizing models by reusing pipelines with different datasets or updated data.
Katib support for TFJob, which makes it easier to tune models and compare performance with different hyper-parameters.
Beta versions of the TFJob and PyTorch operators, which enable data scientists to program their training jobs against a more stable API and to more easily switch between training frameworks.

Per the Kubeflow community’s software release process, Kubeflow’s open issues are tracked in GitHub. In addition, the community uses a GitHub-based Kanban Board to group features into themes and to track the development progress of each theme’s release work stream.

The 0.4 Kanban themes are:

PyTorch/TFJob beta
Train Deploy From Notebook
More robust inference management
Make ML easier to manage — Model Tracking + HyperParameter (HP) tuning
Kubeflow infra improvement — Doc/stabilization

New: Kubeflow Pipelines

Kubeflow 0.4 ships several new components, including Pipelines. Kubeflow Pipelines provides tooling to compose, deploy, and manage end-to-end machine learning workflows. Pipelines speed up experimentation by letting researchers reuse components and by simplifying end-to-end orchestration, which helps make ML experiments reproducible.

Pipelines and their containerized components can be defined and managed using the Python Pipelines SDK in notebooks or a local IDE. This simplifies the steps and parameters required for pipeline submission and monitoring. Additionally, Pipelines provides a single console to manage all your machine learning experiments, as well as an intuitive UI to compare experiments.
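To make the idea of reusable components concrete, here is a toy sketch that uses only the Python standard library. It is not the Kubeflow Pipelines SDK: `Step`, `ToyPipeline`, and the three step functions are hypothetical names invented for illustration, and real pipeline components run as containers rather than as in-process functions. The sketch only shows how one pipeline definition can be executed again with a different dataset.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Artifacts = Dict[str, Any]


@dataclass
class Step:
    """One component (hypothetical): a name plus a function over artifacts."""
    name: str
    run: Callable[[Artifacts], Artifacts]


@dataclass
class ToyPipeline:
    """Runs steps in order, passing a shared dict of artifacts between them."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def execute(self, **params: Any) -> Artifacts:
        artifacts: Artifacts = dict(params)
        for step in self.steps:
            # Each step reads what it needs and contributes new artifacts.
            artifacts.update(step.run(artifacts))
        return artifacts


def preprocess(artifacts: Artifacts) -> Artifacts:
    # Drop missing values from the raw dataset.
    return {"clean": [v for v in artifacts["dataset"] if v is not None]}


def train(artifacts: Artifacts) -> Artifacts:
    # The "model" is simply the mean of the cleaned data.
    clean = artifacts["clean"]
    return {"model": sum(clean) / len(clean)}


def evaluate(artifacts: Artifacts) -> Artifacts:
    # Mean squared error of the constant model on the cleaned data.
    clean, model = artifacts["clean"], artifacts["model"]
    return {"mse": sum((v - model) ** 2 for v in clean) / len(clean)}


pipeline = ToyPipeline(
    name="toy-train-and-evaluate",
    steps=[
        Step("preprocess", preprocess),
        Step("train", train),
        Step("evaluate", evaluate),
    ],
)

# The same pipeline definition is reused with two different datasets.
first_run = pipeline.execute(dataset=[1.0, 2.0, None, 3.0])
second_run = pipeline.execute(dataset=[10.0, 10.0])
print(first_run["model"], first_run["mse"])
print(second_run["model"], second_run["mse"])
```

Because the pipeline is defined once and parameterized by its inputs, rerunning it on updated data requires no change to the steps themselves.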

The Pipelines code is available on GitHub.

Kubeflow in Action

With Kubeflow’s new Jupyter notebook spawner, creating notebooks is easier than ever. Choose from a curated selection of our images with the most popular machine learning libraries already installed, or bring your own custom image.

Kubeflow makes it easy to build, train, and deploy directly from the notebook. You can write your training functions and launch them in a variety of ways — from one-off experiments to full Kubeflow Pipelines. Here’s a simple function that prints out environment variables using TensorFlow.
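A minimal sketch of such a function follows. It assumes only the Python standard library rather than TensorFlow, so that it runs in any notebook. The names `describe_environment` and `train` and the prefix filter are illustrative choices, not taken from the original notebook. The sketch also reads `TF_CONFIG`, the JSON environment variable that distributed TensorFlow uses to describe the cluster and the current task.

```python
import json
import os


def describe_environment(prefixes=("TF_", "KUBERNETES_")):
    """Return environment variables whose names start with one of `prefixes`.

    The default prefixes are an assumption chosen for illustration.
    """
    return {
        name: value
        for name, value in sorted(os.environ.items())
        if name.startswith(tuple(prefixes))
    }


def train():
    """Stand-in training function: report the environment it runs in."""
    for name, value in describe_environment().items():
        print(f"{name}={value}")

    # TF_CONFIG, when present, is JSON with "cluster" and "task" entries.
    # Outside a distributed job it is usually unset, so fall back to "{}".
    tf_config = json.loads(os.environ.get("TF_CONFIG") or "{}")
    task = tf_config.get("task", {})
    print("task type:", task.get("type", "local"),
          "index:", task.get("index", 0))
    return task


if __name__ == "__main__":
    train()
```

Run locally, the function reports a `local` task. Inside a distributed training job, each replica would report its own task type and index.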

Next, we can take that same function, add it to a Kubeflow pipeline, and launch the pipeline. The Kubeflow Pipelines SDK comes pre-configured in our curated notebook images.

Using the Kubeflow Pipelines UI, we can see detailed information about the pipeline execution status and output.

For one-off jobs, we can use the new fairing library to launch TensorFlow training jobs (TFJobs) directly from the notebook. The output is streamed straight to the output cell in our Jupyter notebook, even though the job may be distributed across multiple workers in the Kubeflow cluster.

What’s Coming

Provide input on Kubeflow 0.5 Planning

The Kubeflow community has started work on Kubeflow 0.5, and the 0.5 themes have been loaded into the 0.5 Kanban Board. Each theme has an owner (or Hero) who project-manages the development work associated with that theme. In addition, we are defining advanced functionality in Customer User Journeys (CUJs). Two example CUJs are listed below; please review them and provide input before February 5.

Build/train/deploy from a notebook

Multi-User Kubeflow

Participate in the Kubeflow User Survey

The 2019 Q1 Kubeflow User Survey is open! We are asking for feedback from the community to help us prioritize our development efforts and planning. Please take a few minutes to provide your valuable input.

Take the Kubeflow Survey here!

Save the Date — Kubeflow Day LA on March 7

The Kubeflow community will host a Kubeflow Day on Thursday, March 7 as part of the Southern California Linux Expo (SCaLE) in Pasadena.

At Kubeflow Day LA, attendees will enjoy a series of talks from machine learning experts. Attendees will learn about the latest efforts to simplify and scale machine learning solutions and engage in deep dives on the Kubeflow solutions and their components. Kubeflow Day is a great opportunity for local Los Angeles area industry leaders to review the latest in machine learning developments and to network with Kubeflow community members, vendors, and end-users.

Learn More and Join the Kubeflow community

Our community continues to grow quickly because it adds value to individuals, research projects, and corporations alike, and all are welcome to participate in building, testing, and documenting Kubeflow. The links below explain how to get started and how to contribute.

Kubeflow Getting Started

Contributing to Kubeflow

For those eager to join in, we’re listening. Please tell us about the feature (or features) you’d really like to see that aren’t there yet. Some options for making your voice heard include:

Kubeflow Slack channel
The Kubeflow-discuss Mailing list
Kubeflow Twitter
Our weekly community meeting
Please download and run Kubeflow, and submit bugs!

In closing, we’d like to offer a wholehearted thank you to everyone who’s contributed so far! We are getting closer to realizing our vision: Making it easy for everyone to develop, deploy and manage portable distributed ML on Kubernetes. With your support, we believe we can build a powerful and open ML platform to democratize AI for everyone.

Thanks to Matt Rickard (Google), Josh Bottum (Canonical), Abhishek Gupta (Google), and Thea Lamkin (Google).