Kubeflow 0.7: Beta functionality in advance of v1.0

Josh Bottum
Nov 11, 2019

We’re excited to announce Kubeflow 0.7, which is enabling industry leaders to build and scale their machine learning workflows. This release delivers beta functionality for many components as we move towards Kubeflow 1.0. With 0.7, we are also delivering an initial release of KFServing to handle inference and complete the build, train, deploy user journey.

We’re happy to see users already benefiting from Kubeflow 0.7:

“Kubeflow v0.7 provides valuable enhancements that simplify our ML operations, especially on hyperparameter tuning with the new suggestion CR.” Jeremie Vallee, AI Infrastructure Engineer, BabylonHealth.

“We appreciate the Kubeflow Community’s on-going efforts to mature the project, and v0.7 enables us to deliver a wide range of ML models to our diverse customers and their analytical use cases.” Tim Kelton, Co-Founder, Descartes Labs, Inc.

Read on to find out what’s new, what’s changed, and what’s coming.

What’s new in v0.7

Kubeflow 0.7 adds significant enhancements in KFServing 0.2, Kubeflow’s model serving component, now officially in alpha.

Model serving and management via KFServing

KFServing enables serverless inferencing on Kubernetes and provides performant, high-abstraction interfaces for common ML frameworks like TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX to solve production model serving use cases.

KFServing:

  • Provides a Kubernetes Custom Resource Definition (CRD) for serving ML models on arbitrary frameworks.
  • Encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting-edge serving features like GPU autoscaling, scale to zero, and canary rollouts to your ML deployments.
  • Enables a simple, pluggable, and complete story for your production ML inference server by providing prediction, pre-processing, post-processing and explainability out of the box.
  • Is evolving with strong community contributions, and is led by a diverse maintainer group represented by Google, IBM, Microsoft, Seldon, and Bloomberg.

Please browse through the KFServing GitHub repo and give us feedback!

Get Started

Once you have Kubeflow 0.7 deployed, you can easily create a KFServing resource to deploy a model. First, create a YAML file model.yaml with the following spec:
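A minimal sketch adapted from the upstream KFServing scikit-learn sample (the name and storageUri are illustrative):

    apiVersion: "serving.kubeflow.org/v1alpha2"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
    spec:
      default:
        predictor:
          sklearn:
            storageUri: "gs://kfserving-samples/models/sklearn/iris"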

And then deploy it:
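    # creates the InferenceService defined above
    kubectl apply -f model.yaml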

KFServing also offers a breadth of examples showing usage across different frameworks and use cases; check them out in the KFServing GitHub repo.

What’s changed in v0.7

Kubeflow 0.7 features several improvements to major components, including kfctl, monitoring, notebooks, operators, metadata, Katib, pipelines, role-based access control (RBAC), Kubeflow SDKs, and documentation.

Kfctl

v0.7 includes several improvements to kfctl, Kubeflow’s command line interface for building and deploying Kubeflow clusters:

  • A simplified syntax to build and deploy Kubeflow configurations; for example, to install on an existing Kubernetes cluster (see the sketch after this list).
  • A v1beta1 KFDef API in preparation for Kubeflow 1.0. This API allows users to fully describe their Kubeflow deployments declaratively using a Kubernetes-style resource.
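For example, a sketch of the simplified install flow against an existing cluster (the config URI shown is the v0.7 kfctl_k8s_istio configuration; substitute the config for your platform):

    export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v0.7-branch/kfdef/kfctl_k8s_istio.0.7.0.yaml"
    kfctl apply -V -f ${CONFIG_URI}

And an abbreviated v1beta1 KFDef sketch (only one application shown; real configs list many):

    apiVersion: kfdef.apps.kubeflow.org/v1beta1
    kind: KfDef
    metadata:
      name: kubeflow
      namespace: kubeflow
    spec:
      applications:
      - name: notebook-controller
        kustomizeConfig:
          repoRef:
            name: manifests
            path: jupyter/notebook-controller
      repos:
      - name: manifests
        uri: https://github.com/kubeflow/manifests/archive/v0.7-branch.tar.gz
      version: v0.7-branch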

Kubeflow Applications and Monitoring

All Kubeflow applications now have application resources defined, making it easy to see which version of each application is installed:
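For example (assuming Kubeflow is installed in the kubeflow namespace):

    kubectl -n kubeflow get applications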

See application details:
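Substituting one of the application names from the listing above:

    kubectl -n kubeflow describe application <application-name>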

Notebooks

Kubeflow 0.7 adds the foundations for improved operations, scalability, and isolation for Kubeflow notebooks, one of our most popular features. The notebook custom resource has graduated to v1beta1 in preparation for Kubeflow 1.0. We’ve also added and improved notebook images, including TensorFlow 2.0 images with GPU support.

ML Operators

Kubeflow is built on custom resources, which are handled in Kubernetes by automation software known as operators. These operators simplify cluster configuration, performance tuning, and the lifecycles of distributed training jobs. In v0.7, several Kubeflow operators are graduating towards v1.0 status and have been updated to support kustomize manifests; a minimal example follows.
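For example, a sketch of a distributed training job (assuming the TFJob API has graduated to kubeflow.org/v1 in your deployment; the job name and image are illustrative):

    apiVersion: kubeflow.org/v1
    kind: TFJob
    metadata:
      name: mnist-train
      namespace: kubeflow
    spec:
      tfReplicaSpecs:
        Worker:
          replicas: 2
          restartPolicy: OnFailure
          template:
            spec:
              containers:
              - name: tensorflow          # the operator expects this container name
                image: gcr.io/your-project/mnist-train:latest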

Hyperparameter Tuning and Metadata

Kubeflow improves MLOps workflows and reduces development costs by simplifying analysis and iteration, especially for parameter configuration. Comparing metadata and hyperparameters across Katib (our native hyperparameter tuning component) and Pipelines speeds up the analysis of model parameters and their dependencies, as demonstrated in a recent Kubeflow Summit presentation (video to be released).

The Katib v1alpha3 API includes a new Suggestion CR, which tracks the suggestions generated for each trial of an experiment by the corresponding algorithm service. The system architecture has been redesigned to be more Kubernetes-native, with better support for events. The v1alpha3 API also comes with a new metric collector implementation: users can specify the metric collector type in the experiment config, and the currently supported types are StdOutCollector (the default), FileCollector, and TFEventCollector. The internal database implementation has also been refactored to support multiple database backends. This release also adds Prometheus metrics, including controller runtime metrics and creation/deletion/success/failure counters for experiments and trials.
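To show where these options live, here is an abbreviated v1alpha3 Experiment (a sketch; the name, objective, and algorithm are illustrative, and parameters plus the trial template are omitted):

    apiVersion: kubeflow.org/v1alpha3
    kind: Experiment
    metadata:
      name: random-example
      namespace: kubeflow
    spec:
      objective:
        type: maximize
        objectiveMetricName: accuracy
      algorithm:
        algorithmName: random
      metricsCollectorSpec:
        collector:
          kind: StdOut                    # or File, TfEvent
      # parameters, trial counts, and trialTemplate omitted for brevity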

Kubeflow 0.7 also delivers these improvements to Metadata:

  • A logger for automatic logging of resources to metadata.
  • An ML-Metadata (MLMD) gRPC server in addition to the current backend server for Kubeflow 0.7. (In the long run, we want to use the MLMD server directly instead of maintaining our own backend server.)
  • A watcher for Kubernetes resources as a way to extract and log metadata. (This is an initial proof of concept (POC) and won’t be installed with Kubeflow v0.7 by default. We want to further integrate with other Kubeflow CRDs in the future.)

Kubeflow Pipelines

Per the Fall 2019 Kubeflow User Survey, Kubeflow Pipelines is the most popular Kubeflow component, and the community has been busy adding new scale, performance, and functional improvements. Kubeflow 0.7 adds several metadata enhancements, samples, and integrations with other ML components to Pipelines.

Multiuser Support

The profile resource in Kubeflow allows users to easily manage the namespaces in which they consume Kubeflow. In v0.7, the profile resource is beta, and we have made the improvements described below.
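As a refresher, a profile is declared like any other Kubernetes resource. A minimal sketch (the profile name and owner are illustrative; the API version should match your deployment):

    apiVersion: kubeflow.org/v1beta1
    kind: Profile
    metadata:
      name: data-team                     # also becomes the namespace name
    spec:
      owner:
        kind: User
        name: jane@example.com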

Aggregated RBAC

Kubeflow 0.7 now defines three aggregated cluster roles that can be used to grant admin, edit, and view permissions across all Kubeflow applications.

For example, the kubeflow-viewer role combines permissions across resources corresponding to different applications, like notebooks and tfjobs.
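You can inspect these roles on your own cluster (the role name here follows this post; check the listing for the exact names in your deployment):

    kubectl get clusterroles | grep kubeflow
    kubectl describe clusterrole kubeflow-viewer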

Granting Permission to Use Kubeflow

The new aggregated roles make it easy to grant permissions to consume all the different applications in Kubeflow.

Using kubectl:
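A sketch (the binding name, user, and namespace are illustrative):

    kubectl create rolebinding jane-kubeflow-edit \
      --clusterrole=kubeflow-edit \
      --user=jane@example.com \
      --namespace=data-team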

Kubeflow also includes a UI which makes it easy to share access to an existing namespace.

The UI will create role bindings to the kubeflow-edit role for all contributors, granting those users the ability to consume all Kubeflow resources in that namespace.

SDKs

To provide a more consistent experience, the fairing and metadata python SDKs are now subpackages in the kubeflow python namespace. This means they can now be imported as follows:
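A sketch (see each SDK’s docs for the exact submodules):

    from kubeflow import fairing
    from kubeflow.metadata import metadata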

The fairing and metadata packages still need to be installed separately.
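For example (the PyPI package names are kubeflow-fairing and kubeflow-metadata):

    pip install kubeflow-fairing kubeflow-metadata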

Workload Identity

On Google Cloud Platform (GCP), we’ve switched to using workload identity to make GCP credentials available to pods. Workload identity allows Google service accounts (GSA) to be associated with Kubernetes service accounts (KSA). Pods running with the KSA can then authenticate to GCP APIs using that GSA. This eliminates the need to manually download and store credentials for GSAs as Kubernetes secrets. The Kubeflow profile controller takes care of automatically binding the Kubeflow service accounts to suitable Google service accounts. More information is available in the docs.
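The profile controller performs this binding for you; for illustration, the equivalent manual step is the standard workload identity IAM binding (angle-bracketed names are placeholders):

    gcloud iam service-accounts add-iam-policy-binding \
      <gsa-name>@<project-id>.iam.gserviceaccount.com \
      --role roles/iam.workloadIdentityUser \
      --member "serviceAccount:<project-id>.svc.id.goog[<namespace>/<ksa-name>]"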

Documentation

The docs on kubeflow.org have been updated for Kubeflow 0.7.

Doc contributions are welcome!

What’s coming

The Kubeflow community’s primary focus will now be defining and delivering Kubeflow 1.0, especially for kfctl, notebooks, pipelines, operators, and metadata. We also expect further maturing of the other components, along with the release of upgrade functionality, described below.

Installation and Configuration Improvements

We’ve worked to simplify Kubeflow installation and configuration for both users and maintainers. For a near-term release, i.e. 0.7.x, we’ve identified the need for a config along the lines of the existing_arrikto_config, and decided to refactor it into a community-supported config. This configuration is “batteries included”, which simplifies installation.

The proposed new name is kfctl_istio_dex.yaml.

What is included:

  • Kubeflow applications: the intention is for this config to be a batteries-included config, with all the Kubeflow applications (Notebooks, Pipelines, TFJobs, etc.).
  • Exposing Kubeflow: Kubeflow is exposed on HTTPS, with a LoadBalancer with a self-signed cert by default.
  • Restrictions: Must not depend on any cloud-specific feature. This is targeted at existing, on-prem clusters with no access to cloud-specific services.
  • Auth: Istio with Dex as the OIDC provider and oidc-authservice as the OIDC client. Dex provides basic auth by default, and kfctl will use the environment variables KUBEFLOW_USERNAME and KUBEFLOW_PASSWORD to configure it (see the sketch after this list).
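For example, a sketch of deploying the proposed config (the file name follows the proposal above; the credentials are placeholders):

    export KUBEFLOW_USERNAME=<your-username>
    export KUBEFLOW_PASSWORD=<your-password>
    kfctl apply -V -f kfctl_istio_dex.yaml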

Spark operator

Future versions of Kubeflow will include the Apache Spark operator. This will make it easy for users to run Spark jobs on Kubernetes for preprocessing, batch inference, and other ML-related tasks.

Kubeflow upgrades

The community has worked to design and lay the foundation for Kubeflow upgrades. The requirements and functionality are defined in this Upgrade Design document. We plan to implement an initial version of the upgrade functionality in Kubeflow v1.0, though a preview of the upgrade functionality may be available in an upcoming v0.7.x release as well.

Community

Finally, thanks to all who contributed to Kubeflow 0.7! Community is at the heart of our project, and we are very thankful to 150+ contributors from 25+ organizations working together to build a Kubernetes-native, portable and scalable ML stack.

Contributors on day 2 of our recent Kubeflow Summit

In the lead-up to Kubeflow 1.0, we will need even more help, especially around testing beta functionality, reporting bugs, and updating documentation.

Here’s how to get involved: join the kubeflow-discuss mailing list, chat with us on the Kubeflow Slack, attend a community call, or pick up an issue on GitHub.

Thanks to Animesh Singh (IBM), Johnu George & Krishna Durai (Cisco), Constantinos Venetsanopoulos (Arrikto), Jeremy Lewi, Richard Lui, Lun-Kai Hsu, Ellis Bigelow, Kunming Qu, Pavel Dournov, Sarah Maddox, Abhishek Gupta, Holden Karau, Jessie Zhu, and Thea Lamkin (all of Google) for contributing to this post.

Josh Bottum, Kubeflow Community Product Manager & VP of Arrikto

Published in kubeflow, the official Kubeflow blog.