Enabling Kubeflow with Enterprise-Grade Auth for On-Premise Deployments

Krishna Durai
Published in kubeflow
Feb 27, 2020

Thanks to Kunming Qu, Lun-Kai Hsu (Google), Kam Kasravi (Intel), Yannis Zarkadas (Arrikto) and Krishna Durai (Cisco) for contributing these features.

Kubeflow started in late 2017 as a single Kubernetes operator for TensorFlow and has grown into a suite of ML applications built on top of Kubernetes. Today, Kubeflow is actively used by Babylon Health, JPMorgan Chase, Snap Inc., Spotify, and Lyft, to name a few. Kubeflow’s widespread adoption is also reflected in the latest Kubeflow user survey.
With the upcoming Kubeflow 1.0 release, we expect even more enterprises to adopt Kubeflow for their ML needs.

Almost 50% of our users are enterprise users and 60% of them have deployed Kubeflow in on-premise Kubernetes clusters. Over the last year, we have seen an increase in demand for an authentication and authorization solution within Kubeflow.
This allows multiple users to access the same Kubeflow cluster in a secure and isolated manner.
In this blog post, we present how our community designed and implemented Authentication and Authorization in Kubeflow.

Kubeflow Components Overview

Kubeflow is composable: it allows users to incorporate their own ML applications alongside the standard Kubeflow application suite. For the purpose of this blog, we’ve divided the applications into two broad categories:

  • High-Level Applications
  • Low-Level Applications

High-Level Applications

These are the applications in Kubeflow which have a web UI exposed to the user for operation. Example applications include:

  • Jupyter Notebooks: these are normally used by data scientists to experiment and build ML models.
  • Kubeflow Pipelines: an orchestrator for ML workflows.

High-level applications in turn use low-level applications to execute their workloads. An instance of Jupyter Notebook might invoke a TensorFlow Job by requesting the TensorFlow Operator, which is a low-level service.

Low-Level Applications

These applications are defined through Kubernetes Custom Resources, which are the basic building blocks of the Kubeflow application suite. Examples include the training operators, such as the TensorFlow Operator that runs TensorFlow Jobs.

These resources are accessed through kubectl, since they are custom resources.

Figure: Components of Kubeflow

Multi-User Isolation in Kubeflow

One of the core aspects of Kubeflow is to provide its Data Scientists with an isolated view of Kubeflow, called a Profile, based on Kubernetes namespaces. This allows a Data Scientist to have a self-service environment within Kubeflow. Profile owners have the option to share their profiles with other users.
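As a rough sketch, a Profile is declared as a custom resource whose owner gets access to the corresponding namespace. The API version below reflects the Profile CRD around Kubeflow 1.0, and the profile name and owner email are illustrative:

apiVersion: kubeflow.org/v1
kind: Profile
metadata:
  name: alice            # also the name of the namespace created for this profile
spec:
  owner:
    kind: User
    name: alice@example.com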

To enable isolation and access control we had to ask ourselves:
How do we authenticate and authorize users?

Design Decisions

To implement authentication and authorization on these applications and support it in an on-premise environment, we took the following decisions:

  1. Isolate resources using Kubernetes namespaces, as the namespace is the isolation primitive of Kubernetes. The Kubeflow Profile CRD is a wrapper around a Kubernetes namespace.
  2. Leverage the Istio Service Mesh in order to authenticate requests, multiplex HTTP paths into different services, and authorize access to network services.
  3. Build authentication using the OIDC standard: this allows a standard, interoperable way to interpret OIDC claims from multiple Identity Providers.
  4. Use Dex as an OIDC Provider. Dex presents an OIDC-compliant interface to OAuth2 providers and LDAP/AD databases (a minimal configuration sketch follows this list).
  5. Design low-level applications as Custom Resources in Kubernetes.
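A minimal sketch of what a Dex configuration for an enterprise directory might look like. Field names follow Dex's standard LDAP connector and static client configuration; the issuer address, directory host, bind credentials, client id, and secret are placeholders:

issuer: http://dex.auth.svc.cluster.local:5556/dex
storage:
  type: kubernetes
  config:
    inCluster: true
web:
  http: 0.0.0.0:5556
connectors:
  - type: ldap
    id: ldap
    name: Corporate LDAP
    config:
      host: ldap.example.com:636              # placeholder directory host
      bindDN: cn=readonly,dc=example,dc=com   # read-only bind account
      bindPW: <bind-password>
      userSearch:
        baseDN: ou=people,dc=example,dc=com
        filter: "(objectClass=person)"
        username: mail
        idAttr: DN
        emailAttr: mail
        nameAttr: cn
staticClients:
  - id: kubeflow-oidc-authservice             # OIDC client used by the gateway's auth service
    redirectURIs: ["/login/oidc"]
    name: Kubeflow
    secret: <client-secret>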

High-Level Applications

Let’s take a look at the journey of a user accessing the Kubeflow Web UIs with their browser.

  1. The user opens their browser and visits the Kubeflow address. If the user is not logged in, they are redirected through Dex to the configured identity provider. This can be an LDAP/AD database, an OAuth2 IdP like GitHub or LinkedIn, or just a static user file.
  2. After the user logs in, a session cookie is saved in their browser. Once the request enters the cluster, the Istio Gateway authenticates the request and adds a special KUBEFLOW_USERID header to it. This HTTP header reveals the identity of the end user making the request. Authentication is done once at the Gateway, and every Kubeflow application can then learn the user’s identity just by reading that header.
  3. The request arrives at the backend application. To find out if the user has permission to execute the requested action, the backend application performs a SubjectAccessReview against Kubernetes (see the sketch after this list). That way, Kubernetes is used as the authorization database.
  4. For network services, such as a user’s Notebook, Istio RBAC is used for authorization.
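As a rough sketch, the review a backend might submit for a user listing notebooks in a profile namespace could look like this (the user, namespace, and resource are illustrative):

apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: alice@example.com          # identity taken from the user id header
  resourceAttributes:
    namespace: alice               # the profile namespace being accessed
    verb: list
    group: kubeflow.org
    resource: notebooks
# The API server answers in .status.allowed based on the RBAC rules
# bound to that user, so Kubernetes acts as the authorization database.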

To enable the Istio Gateway to authenticate users with OIDC, we had to explore, design and implement an interesting solution. You can read more in this blog post.

We provide each Profile with Jupyter Notebooks which are isolated by namespaces. Access to a notebook is then exposed by a service in the profile’s namespace. The HTTP routes to access the notebooks’ web UI are subject to rules in Istio’s RBAC definitions. In order to achieve isolation, we create Istio ServiceRoles and ServiceRoleBindings which restrict a user based on the profile namespaces they are allowed to access.

A profile admin’s Istio ServiceRole and ServiceRoleBinding definitions would look like the following:

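A minimal sketch of those definitions, assuming Istio's v1alpha1 RBAC API that Kubeflow used at the time; the profile namespace, user email, header key, and object names are illustrative:

apiVersion: rbac.istio.io/v1alpha1
kind: ServiceRole
metadata:
  name: ns-access-istio
  namespace: alice                # the profile's namespace
spec:
  rules:
    - services: ["*"]             # every service in the profile namespace
---
apiVersion: rbac.istio.io/v1alpha1
kind: ServiceRoleBinding
metadata:
  name: owner-binding-istio
  namespace: alice
spec:
  subjects:
    - properties:
        # match the user identity header injected at the gateway
        request.headers[kubeflow-userid]: alice@example.com
  roleRef:
    kind: ServiceRole
    name: ns-access-istio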

Low-Level Applications

Being designed as Custom Resources immediately gives us the ability to use the Kubernetes API layer to operate on these applications. If one wants to access, create, or delete a TensorFlow Job, for example, one can do so with a Kubernetes verb: get, create, or delete using kubectl.

Apart from the ease of use factor, the Custom Resource model enables Authentication and Authorization using the Kubernetes API.

To enable authentication on the Kubernetes API layer, any of the following methods can be used:

  1. Using the Kubernetes OIDC Authentication Plugin
  2. Kube-OIDC-Proxy
  3. ServiceAccount JWT to RBAC

Methods 2 and 3 allow authentication to be configured in managed Kubernetes clusters, where access to modify the Kubernetes API server configuration is restricted.

Defining Access Control to Low-Level Applications

Starting from v0.7.0, we introduced three new ClusterRoles for Kubeflow:

  1. kubeflow-admin
  2. kubeflow-edit
  3. kubeflow-view

This allows us to manage access control for users effectively. You will see why soon.

We decided that each low-level service should have an RBAC definition for each role. As an example, here’s kubeflow-tfjob-view:

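A rough sketch of such a view role; the aggregation label key follows the convention used in the Kubeflow manifests, and exact names should be treated as illustrative:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubeflow-tfjob-view
  labels:
    # picked up by the aggregated kubeflow-view ClusterRole
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-view: "true"
rules:
  - apiGroups: ["kubeflow.org"]
    resources: ["tfjobs", "tfjobs/status"]
    verbs: ["get", "list", "watch"]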

And by using Aggregated ClusterRoles, we combine each such definition into our Kubeflow ClusterRoles.
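For example, the view role might aggregate every ClusterRole carrying the matching label; a sketch, assuming the label convention shown above:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubeflow-view
aggregationRule:
  clusterRoleSelectors:
    - matchLabels:
        rbac.authorization.kubeflow.org/aggregate-to-kubeflow-view: "true"
rules: []   # filled in automatically by the controller from the selected roles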

These Kubeflow ClusterRoles can then be assigned to a user.

Applying Kubeflow ClusterRoles in Practice

For each Aggregated ClusterRole (admin, edit, and view), a corresponding service account is defined in a profile’s namespace. When a user accesses low-level resources through operations in a Jupyter Notebook, the notebook mounts and uses the service account associated with the user’s role.

A user’s Jupyter Notebook is bound to that user’s role’s service account using a RoleBinding: if a user has edit privileges, the notebook is associated with the service account that has edit permissions, ensuring all operations from the notebook are consistent with the user’s privilege level.
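A sketch of such a binding, assuming the conventional default-editor service account in the profile namespace (names and namespace are illustrative):

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: default-editor
  namespace: alice                 # the profile's namespace
subjects:
  - kind: ServiceAccount
    name: default-editor           # mounted by the user's notebooks
    namespace: alice
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubeflow-edit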

For access to Kubeflow operations through kubectl we do something similar. The Profiles controller in Kubeflow creates a RoleBinding for each user of the profile. For example, a user with edit privileges will have the following RoleBinding created:

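A sketch of that binding; the user email, namespace, and binding name are illustrative:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: user-alice-example-com-clusterrole-edit
  namespace: alice                 # the profile's namespace
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: alice@example.com        # the identity asserted at the gateway
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubeflow-edit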

Using this RoleBinding, a user accessing Kubeflow resources through kubectl is restricted to the permissions granted by their Kubeflow role.

Final Notes

Over the past few months, we built an end-to-end Auth system for Enterprise deployments in a complex stack with many components. This implementation of Auth is generic enough to be leveraged by any platform built on top of Kubernetes.

Kubeflow’s Auth system is still young and requires continuous contribution from the community. A few of our open challenges include supporting user groups for profile isolation, programmatic access to APIs, and further extensibility so that other Kubernetes applications can leverage our Auth system.

Further Reading

  1. Enabling Kubeflow with Enterprise-Grade Auth for Kubeflow On-Premise, Kubecon San Diego 2019 — Yannis Zarkadas, Krishna Durai (video, presentation)
  2. Kubeflow: Multi-Tenant, Self-Serve, Accelerated Platform for Practitioners — Kam Kasravi, Kunming Qu (video)
  3. Kubeflow v0.7.0 release blog
  4. Kubeflow User Surveys (Spring Survey, Fall Survey)

Authors

Krishna Durai, Yannis Zarkadas
