Kubeflow in 2018: A year in perspective

Published in kubeflow, Jan 2, 2019

The Kubeflow Product Management Team

Just a year ago, we released Kubeflow 0.1 at KubeCon Austin. Since then, the project and its community have grown significantly, both in members and contributions. As of December 18th, there are 100+ active Kubeflow members, supported by more than 30 organizations. In the last six months alone, the number of unique PR authors per month has nearly doubled, from 40 to 80. (You can page through all these stats yourself here.) Even better, many supporting projects have collaborated with the Kubeflow community to extend the value of the ecosystem. We could not be happier to help the machine learning community grow!

*Fig 1. Momentum around Kubeflow has grown both in terms of commits and contributing organizations. Credits: David Aronchick and Jay Smith*

Here’s a summary of those efforts (both native Kubeflow projects and related Kubernetes integrations with Kubeflow):

Argo for managing ML workflows
Arrikto’s product integration with Kubeflow, for collaboration and workflows that span hybrid and multi-cloud environments
Caffe2 Operator for running Caffe2 jobs
Horovod & MPI Operator for improved distributed training performance of TensorFlow
Istio Integration for TFServing
Kubeflow Pipelines for composing, deploying and managing end-to-end machine learning workflows
Katib for hyperparameter tuning
Kubebench for benchmarking of HW and ML stacks
Kubeflow on MicroK8s, enabling Mac users to easily install default Kubeflow components on a single-node Kubernetes cluster running in a native macOS hypervisor
MXNet Operator for running MXNet Jobs
NVIDIA RAPIDS to accelerate Kubeflow pipelines with GPUs on Kubernetes
PyTorch Operator for running PyTorch jobs
Pachyderm for managing complex data pipelines
Seldon Core for running complex model deployments and non-TensorFlow serving

Major Releases

In the past year alone, there have been three major Kubeflow releases: 0.2, 0.3 and, most recently, the 0.4 release candidate, keeping our cadence of one major release every quarter.

These are the key features introduced in the 0.4 RC:

Click-to-Deploy UI
Kubeflow Pipelines (details below) for orchestrating ML workflows
An updated JupyterHub UI that makes it easy to spawn notebooks with PVCs
Katib support for using TFJob
An alpha release of fairing, a library that makes it easy for data scientists to build and start training jobs directly from a notebook
An initial release of a CRD for managing Jupyter notebooks
TFJob and PyTorch operators reaching beta

Towards a Kubeflow Roadmap to 1.0

We are working diligently to get Kubeflow to its first major version, release 1.0, and plan to have it ready in the first half of 2019. This will be a significant milestone for the project. Here are some critical areas for the release:

Stabilized APIs for training (TFJob/PyTorch operators) and serving.
PyTorch issues
TFJob issues
Robust support for monitoring and logging
Scale and load testing
Integration of hyperparameter tuning with Katib
Advanced data management

On a high level, the major themes we’ll be focusing on include:

Enterprise Readiness
Deployment and Development Experience
Data Science UI
Advanced ML Platform
Test and Release Infrastructure

A detailed roadmap with information about these features can be found here.

2018 Contributor Summit

In September 2018, the first Kubeflow contributor summit was held. We were joined by over 60 participants in person along with 10 remote contributors from four continents.

*Fig 2. Attendees at the first Kubeflow contributor summit, September 2018*

To all our contributors, we want to share our deep appreciation for your support! Planning for the next contributor summit is already in the works, and it is scheduled for the first quarter of 2019.

KubeCon Seattle 2018 and Other Conferences

Kubeflow community members have been actively presenting Kubeflow's components, their features, and their use cases at several conferences in 2018. A complete list of conferences where the community has represented Kubeflow is available here. Most recently, at KubeCon Seattle (December 11–13, 2018), the following talks were delivered:

Deep Dive: Kubeflow BoF: Jeremy Lewi, David Aronchick (Slides | Video)
Eco-Friendly ML: How the Kubeflow Ecosystem Bootstrapped Itself: Peter MacKinnon (Slides | Video)
Machine Learning as Code: Jay Smith, David Aronchick (Slides | Video)
Natural Language Code Search for GitHub Using Kubeflow: Jeremy Lewi, Hamel Husain (Slides | Video)
Workshop: Kubeflow End-to-End: GitHub Issue Summarization: Amy Unruh, Michelle Casbon (Codelab | Slides | Video)

Learn More

If you’d like to try out Kubeflow, we have a number of options for you:

Use sample walkthroughs hosted on Katacoda
Follow a guided tutorial with existing models from the examples repository. We recommend the GitHub Issue Summarization for a complete E2E example.
Start a cluster on your own and try your own model. Any Kubernetes-conformant cluster will support Kubeflow, including those from contributors: Alibaba Cloud, Caicloud, Canonical, Cisco, Dell, Google, Heptio, Intel, Mesosphere, Microsoft, IBM, Red Hat/OpenShift and Weaveworks.
Several companies are incorporating Kubeflow into their value-adding solutions, including Arrikto and others. Please engage with these community members to give feedback on their use of Kubeflow.
Keep up-to-date on project news on our blog and Twitter.

Join the Leading Community for Kubernetes-based ML

As always, we’re listening! Please share the features you’d really like to see that aren’t in Kubeflow today. Some options for making your voice heard include:

Kubeflow GitHub
Kubeflow Slack channel
kubeflow-discuss mailing list
Our weekly community meeting
Please download and run Kubeflow, and submit bugs!

A wholehearted thank you to everyone who’s contributed so far! We are getting closer to realizing our vision: making it easy for everyone to develop, deploy and manage portable, distributed ML on Kubernetes. With your support, we believe we can build a powerful and open ML platform to democratize AI for everyone.

Thanks to Ankit Bahuguna for authoring this post, and to Jeremy Lewi, David Aronchick, Thea Lamkin, Josh Bottum, Pete MacKinnon, and Constantinos Venetsanopoulos for contributing.