Kubeflow in 2018: A year in perspective

Thea Lamkin
Jan 2

The Kubeflow Product Management Team

Just a year ago, we announced Kubeflow at KubeCon Austin. Since then, the project and its community have grown significantly, in both members and contributions. As of December 18th, there are 100+ active Kubeflow members with the support of more than 30 organizations. In the last 6 months alone, the number of unique PR authors per month has roughly doubled, from 40 to 80. (You can page through all these stats yourself here.) Even better, many supporting projects have collaborated with the Kubeflow community to extend and expand the value of the ecosystem. We could not be happier to help the machine learning community grow!

Fig 1. Momentum around Kubeflow has grown both in terms of commits and contributing organizations. Credits: David Aronchick and Jay Smith

Here’s a summary of those efforts (both native Kubeflow projects and related Kubernetes integrations with Kubeflow):

  • Argo for managing ML workflows
  • Arrikto’s product integration with Kubeflow, for collaboration and workflows that span hybrid and multi-cloud environments
  • Caffe2 Operator for running Caffe2 jobs
  • Horovod & MPI Operator for improved distributed training performance with TensorFlow
  • Istio integration for TFServing
  • Kubeflow Pipelines, tooling to compose, deploy, and manage end-to-end machine learning workflows
  • Katib for hyperparameter tuning
  • Kubebench for benchmarking hardware and ML stacks
  • Kubeflow on MicroK8s, enabling Mac users to easily install default Kubeflow components on a single-node Kubernetes cluster running in a native macOS hypervisor
  • MXNet Operator for running MXNet jobs
  • NVIDIA RAPIDS to accelerate Kubeflow pipelines with GPUs on Kubernetes
  • PyTorch operator for running PyTorch jobs
  • Pachyderm for managing complex data pipelines
  • Seldon Core for running complex model deployments and non-TensorFlow serving

Major Releases

In the past year alone, there have been three major Kubeflow releases: 0.2, 0.3 and, most recently, the 0.4 release candidate, in keeping with our cadence of one major release every quarter.

These are the key features introduced in the 0.4 RC:

  • Click-to-Deploy UI
  • Kubeflow Pipelines for orchestrating ML workflows
  • An updated JupyterHub UI that makes it easy to spawn notebooks with persistent volume claims (PVCs)
  • Katib support for TFJob
  • An alpha release of fairing, a library that makes it easy for data scientists to build and launch training jobs directly from a notebook
  • An initial release of a custom resource definition (CRD) for managing Jupyter notebooks
  • The TFJob and PyTorch operators reaching beta

Towards a Kubeflow Roadmap to 1.0

We are working diligently to get Kubeflow to its first major version, release 1.0, and plan to have it ready in the first half of 2019. This will be a significant milestone for the project. Here are some critical areas for the release:

  • Stabilized APIs for training (TFJob/PyTorch operators) and serving
      • PyTorch issues
      • TFJob issues
  • Robust support for monitoring and logging
  • Scale and load testing
  • Integration with Katib for hyperparameter tuning
  • Advanced data management

At a high level, the major themes we’ll focus on include:

  • Enterprise Readiness
  • Deployment and Development Experience
  • Data Science UI
  • Advanced ML Platform
  • Test and Release Infrastructure

A detailed roadmap with information about these features can be found here.

2018 Contributor Summit

In September 2018, we held the first Kubeflow contributor summit. We were joined by more than 60 participants in person, along with 10 remote contributors, from four continents.

Fig 2. Attendees at the first Kubeflow contributor summit, September 2018

To all our contributors, we want to share our deep appreciation for your support! Planning for the next contributor summit is already under way, and it is scheduled for the first quarter of 2019.

KubeCon Seattle 2018 and Other Conferences

Kubeflow community members have actively presented Kubeflow’s components, features, and use cases at several conferences in 2018. A complete list of conferences where the community has represented Kubeflow is available here. Most recently, at KubeCon Seattle (December 11–13, 2018), the following talks were delivered:

Learn More

If you’d like to try out Kubeflow, we have a number of options for you:

  • Use sample walkthroughs hosted on Katacoda
  • Follow a guided tutorial with existing models from the examples repository. We recommend the GitHub Issue Summarization example for a complete end-to-end walkthrough.
  • Start a cluster on your own and try your own model. Any Kubernetes-conformant cluster will support Kubeflow, including those from contributors: Alibaba Cloud, Caicloud, Canonical, Cisco, Dell, Google, Heptio, Intel, Mesosphere, Microsoft, IBM, Red Hat/OpenShift and Weaveworks.
  • Several companies, including Arrikto, are incorporating Kubeflow into their own solutions. Please engage with these community members to give feedback on their use of Kubeflow.
  • Keep up-to-date on project news on our blog and Twitter.

Join the Leading Community for Kubernetes-based ML

As always, we’re listening! Please share the features you’d really like to see that aren’t in Kubeflow today. Some options for making your voice heard include:

A wholehearted thank you to everyone who’s contributed so far! We are getting closer to realizing our vision: making it easy for everyone to develop, deploy and manage portable, distributed ML on Kubernetes. With your support, we believe we can build a powerful and open ML platform to democratize AI for everyone.

Thanks to Ankit Bahuguna for authoring this post, and to Jeremy Lewi, David Aronchick, Thea Lamkin, Josh Bottum, Pete MacKinnon, and Constantinos Venetsanopoulos for contributing.

kubeflow

Official Kubeflow Blog.
