The Making of Kubeflow 1.0: Designing for stability and adoption

Josh Bottum
Published in kubeflow
Mar 2, 2020

An update from the Kubeflow Community Product Management Team

The broad market adoption of Kubeflow 1.0 demonstrates that an open source machine learning community can effectively develop, iterate on and sustain the operating structure and processes required for stable code and documentation delivery. Kubeflow gives researchers the flexibility of a portable and composable software stack that builds, trains and deploys ML models efficiently. With Kubeflow 1.0, researchers can use optimized workflows that speed model iteration and time-to-market. In addition, Kubeflow’s Kubernetes foundation provides industry-validated operational and infrastructure efficiencies, which yield a best-in-class Total Cost of Ownership (TCO).

As part of the Kubeflow Product Management Team, I have the good fortune to learn from and work with ML leaders who are changing the world. Supporting this group is both challenging and rewarding. Technically, Kubeflow is a complex and rapidly improving project: new releases and improvements in Kubeflow and Kubernetes are integrated into the software stack every 90 days. From a product management standpoint, the community’s 1.0 Application Guidelines template and process have simplified our priority calls on direction. The 1.0 processes have delivered a foundation for application stability and quality. Kubeflow 1.0 is a significant milestone that fulfills the initial 2017 vision created by Jeremy Lewi (Google) and David Aronchick (Microsoft).

Kubeflow 1.0: A journey in open source processes

Kubeflow is a collection of applications that automate end-to-end machine learning workflows. These applications are maturing on different timelines. The community has defined the process to graduate an application to a “stable” / 1.0 version. This process includes a Versioning Policy (stable, beta and alpha) and the Application Requirements template, which was originally proposed by Google’s Michelle Casbon. To reach stability, an application must satisfy the requirements defined in the template.

Kubeflow’s 1.0 Project Board

To track Kubeflow applications’ progress against these requirements, we introduced the Kubeflow 1.0 Kanban Board, a GitHub project, along with 1.0 boards for individual components (see all the Kubeflow boards here). This tool enables users and engineering leaders like Google’s Abhishek Gupta, Jessie Zhu, and Ellis Bigelow to organize the work and remind the community’s contributors of the important deliverables for each release.

The community also publishes a roadmap and conducts regular User Surveys, which help users provide input and direction. Google’s Sarah Maddox, who has solved difficult documentation issues and organized Doc Sprints, has been instrumental in leading the community to mature and refine the Kubeflow docs. Vangelis Koukis (Arrikto), Debo Dutta (Cisco), Kam Kasravi (Intel), Animesh Singh (IBM), Jiaxin Shan (AWS), Pete MacKinnon (RedHat), Dan Sun (Bloomberg) and many members of their teams have contributed thought leadership, documentation and extensive code deliveries.

These contributors, along with many, many others, have matured critical components such as multi-user isolation and authorization, kfctl, Jupyter Notebooks, KFServing, and Hyperparameter Tuning (Katib). We’d also like to highlight Jeff Fogarty’s (USBank) efforts to test new Kubeflow 1.0 versions and triage new GitHub issues. More details on Kubeflow components and the community’s processes follow below.

Important details

Here are the components being released as stable and beta as part of the Kubeflow 1.0 release (see our versioning policy docs for a regularly updated list of components and their versions).

  • Stable Components: Central Dashboard, Notebooks, kfctl, Profile Controller, Docs, TFServing, Training Operators (Tensorflow, PyTorch)
  • Beta Components: Kubeflow Pipelines, Metadata Store, HP Tuning, KFServing, Fairing, Kale, Training Operators (XGBoost, MxNet)

To graduate to a stable version, the community validates that an application satisfies requirements around the following component attributes:

  • Configuration and deployment
  • Custom Resources
  • Logging and monitoring
  • Docker Images
  • CI/CD
  • Docs
  • Ownership/Maintenance
  • User Adoption

Each application has a 1.0 Kanban Board, which tracks its progress. The community may define additional requirements for individual Kubeflow applications on an as-needed basis.
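
As a rough illustration of the graduation checklist above, here is a minimal Python sketch that records which requirement areas an application has satisfied and reports what is still missing. The area names come from the list in this post; the `ApplicationReview` class, its methods, and the `example-app` name are hypothetical and are not part of any Kubeflow tooling or of the actual Application Requirements template.

```python
from dataclasses import dataclass, field

# The eight requirement areas named in this post. Treating them as a
# flat pass/fail checklist is a simplifying assumption for illustration.
REQUIREMENT_AREAS = (
    "Configuration and deployment",
    "Custom Resources",
    "Logging and monitoring",
    "Docker Images",
    "CI/CD",
    "Docs",
    "Ownership/Maintenance",
    "User Adoption",
)


@dataclass
class ApplicationReview:
    """Hypothetical record of one application's progress toward stable."""

    name: str
    satisfied: set = field(default_factory=set)

    def missing(self):
        # Areas not yet satisfied, in the order listed above.
        return [area for area in REQUIREMENT_AREAS if area not in self.satisfied]

    def is_stable_candidate(self):
        # An application qualifies only when no area is missing.
        return not self.missing()


if __name__ == "__main__":
    review = ApplicationReview("example-app", {"Docs", "CI/CD"})
    print(review.name, "still needs:", ", ".join(review.missing()))
    print("Ready to graduate:", review.is_stable_candidate())
```

In practice the community tracks this on the Kanban Boards rather than in code, and individual applications may carry additional requirements that a fixed list like this would not capture.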

Here are the key documents that define Kubeflow Project’s 1.0 processes:

Supporting broad adoption

The collaboration of Kubeflow code contributors and users is critical to the community’s development of end-to-end machine learning workflow automations and best practices. Kubeflow’s machine learning benefits are materially enabling companies to achieve their goals. See our Kubeflow 1.0 announcement for user testimonials from Chase, Volvo, US Bank, and others.

The Kubeflow Community is growing at a rapid rate. There are now hundreds of contributors from over 30 participating organizations. Per the recent Kubeflow Survey, 40% of respondents are new to Kubeflow. This growth is driving new ideas and deliveries, which continue to fuel advantages for users and contributors alike. We look forward to welcoming you into the community and collaborating on your goals.

Here’s how to get involved:

Try Kubeflow today:

If you have questions or run into issues, please use the Slack channel and/or submit bugs on GitHub. As always, we truly appreciate the support of the many users and contributors who delivered on Kubeflow 1.0, and we look forward to building Kubeflow 1.1 with you!

Josh Bottum, Kubeflow Community Product Manager & VP of Arrikto