JupyterHub setup on multitenant Kubernetes cluster

Multitenancy on Kubernetes with Istio, External Authentication Server and OpenID Connect (Part 2: Authorization)

Maksym Lushpenko | brokee.io
HAL24K TechBlog


In the previous blog post, I discussed how an External Authentication Server (EAS) is used at HAL24K, together with OIDC and some parts of Istio, to authenticate users and direct them to a shared application within a specific Kubernetes namespace. That post ended with the receipt of an id_token from our Identity Provider. Today, I will look at how the id_token lets us decide whether a given user has access to a specific application instance. The token contains the current_tenant, the name of the user and the permissions to access some platform modules. The EAS service passes the token to our application via the Authorization header.

The id_token that we get from the Authorization header is still encoded as a JSON Web Token (JWT), so we need some component to parse it. Istio has a concept of End User Authentication, which works by extracting the JWT from the Authorization header (other custom headers are possible as well), validating it with the Identity Provider (OIDC) and parsing it into the request.auth object that other Istio components can use.
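To make the token contents concrete, here is a minimal Python sketch that reads the claims out of a JWT taken from an Authorization header. It only decodes the payload and does not verify the signature, which in our setup is Istio's job. The `Bearer` scheme, the `permissions` claim name and all example values are assumptions for illustration; only `current_tenant` and `name` are claim names mentioned in this post.

```python
import base64
import json


def bearer_token(authorization: str) -> str:
    """Extract the token from an 'Authorization: Bearer <token>' header value."""
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise ValueError("expected 'Bearer <token>'")
    return token.strip()


def decode_jwt_claims(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # base64url segments drop their '=' padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64url(data: dict) -> str:
    """Helper used only to build the example token below."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical claims and a dummy signature, for illustration only.
example_claims = {
    "name": "maksym",
    "current_tenant": "tenant-a",
    "permissions": ["datalab"],  # assumed claim name
}
example_token = ".".join(
    [_b64url({"alg": "none"}), _b64url(example_claims), "signature"]
)

claims = decode_jwt_claims(bearer_token("Bearer " + example_token))
print(claims["name"], claims["current_tenant"])
```

Never use an unverified decode like this to make access decisions; it is only handy for inspecting what a token carries.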

Now that we have all the information about the user in the proper format, we have to decide whether to allow or deny the request. Istio is quite a complex and powerful piece of software, and it can make such decisions with its Authorization functionality, which works for both HTTP and TCP services. It is impressive as well as flexible: a ServiceRole specifies which service inside the cluster you want to protect and exactly how (methods and paths), while a ServiceRoleBinding specifies who can use a given ServiceRole. The binding is where we actually allow or deny the request, based on the request.auth claims described above. So, essentially, for every module inside the tenant namespace we define a ServiceRole, and in the ServiceRoleBinding we check that the current user is part of the tenant and has the appropriate permissions.
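As a rough illustration of that pairing, the Python sketch below builds a ServiceRole and a ServiceRoleBinding for one module and prints them as JSON, which kubectl accepts just like YAML. It is not our production configuration: the resource names, the tenant and module values, the `permissions` claim name and the choice to allow all methods are assumptions, and it assumes the namespace is named after the tenant.

```python
import json


def module_rbac(module: str, tenant: str, permission: str) -> list:
    """Build a ServiceRole and ServiceRoleBinding for one module of a tenant."""
    name = f"{module}-access"  # hypothetical naming scheme
    role = {
        "apiVersion": "rbac.istio.io/v1alpha1",
        "kind": "ServiceRole",
        "metadata": {"name": name, "namespace": tenant},
        "spec": {
            "rules": [
                {
                    # Which service is protected, and how.
                    "services": [f"{module}.{tenant}.svc.cluster.local"],
                    "methods": ["*"],
                }
            ]
        },
    }
    binding = {
        "apiVersion": "rbac.istio.io/v1alpha1",
        "kind": "ServiceRoleBinding",
        "metadata": {"name": name, "namespace": tenant},
        "spec": {
            "subjects": [
                {
                    # All properties of one subject must match the JWT claims.
                    "properties": {
                        "request.auth.claims[current_tenant]": tenant,
                        # Assumed claim name for module permissions.
                        "request.auth.claims[permissions]": permission,
                    }
                }
            ],
            "roleRef": {"kind": "ServiceRole", "name": name},
        },
    }
    return [role, binding]


# Hypothetical module and tenant names.
resources = module_rbac("datalab", "tenant-a", "datalab")
print(json.dumps(resources, indent=2))
```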

Multi-user configuration

So far, I have covered how the user logs in and how we decide whether the user, as part of a given tenant, is allowed to access a specific application. Even though we know the user has access to JupyterHub in general, we still have to make sure that access is limited to that user’s own Jupyter notebook. That’s where the real difference in setup between our example application (JupyterHub) and a generic solution becomes visible. I will first consider JupyterHub, since it is our main example, and then take a look at the generic setup for any application.

JupyterHub

I mentioned in my first blog post (see “High-level solution overview”) that JupyterHub has some integration with Kubernetes. Kubernetes-specific tasks in JupyterHub are handled by KubeSpawner, which manages the life cycle of single-user Jupyter notebooks. To make KubeSpawner work in conjunction with EAS (basically, to make JupyterHub aware of our authenticated users), my colleague used this project. With the following two lines of code in the jupyterhub_config.py file:

[Code snippet: JupyterHub authentication config]

we are able to propagate the current user's info to JupyterHub via the X-User-Id header. The only question is: how can we pass this custom header, and how can we populate it with the actual username? Istio to the rescue, again. We use an Istio policy rule to append the X-User-Id header based on the request.auth.claims["name"] value. This becomes possible only after enabling Istio Policy Enforcement, which is handled by the component called Mixer. This completes multi-user separation within a single tenant for JupyterHub. The complete diagram is shown below:

[Figure: JupyterHub setup with Istio]
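Conceptually, a header-based authenticator does very little: it trusts that the proxy in front of it has already authenticated the user and simply reads the username from the agreed header. The sketch below illustrates that idea in plain Python. It is not the code of the project mentioned above, and `user_from_headers` is a hypothetical helper name.

```python
from typing import Mapping, Optional


def user_from_headers(
    headers: Mapping[str, str], header_name: str = "X-User-Id"
) -> Optional[str]:
    """Return the username carried by the trusted header, or None if missing."""
    wanted = header_name.lower()
    for key, value in headers.items():
        # HTTP header names are case-insensitive.
        if key.lower() == wanted:
            return value.strip() or None
    return None


# A request that passed through the mesh carries the header set by Istio.
print(user_from_headers({"x-user-id": "maksym"}))
# A request without the header is treated as unauthenticated.
print(user_from_headers({"Accept": "text/html"}))
```

This only works safely if the header can be set by the mesh alone, so any X-User-Id value sent by the client itself must not reach the application.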

Generic solution

To make new applications multitenant on Kubernetes without writing your own authentication/authorization, you need a way to manage access and routing to each application instance outside of the application itself (in the previous example, JupyterHub manages access and routing for single-user notebooks). This kind of access management can be done via path-based or even header-based request routing. That’s actually what JupyterHub does under the hood, but I will show how to do it with Istio.

Header-based routing matches a specific header against the username and then routes the request to a user-specific instance like jupyter-maksym. The drawback is that you still need Istio policy functionality to populate that header, which involves extra work and complexity.
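A header-based route could look roughly like the VirtualService built by this Python sketch, with one HTTP route per user. The external host, gateway name, tenant and user list are hypothetical, and I assume the header is x-user-id (Istio expects lowercase header keys in match rules).

```python
import json


def header_routed_virtual_service(
    module: str, tenant: str, host: str, gateway: str, users: list
) -> dict:
    """Build one VirtualService that routes each user to their own instance."""
    routes = []
    for user in users:
        routes.append(
            {
                # Match on the username header populated inside the mesh.
                "match": [{"headers": {"x-user-id": {"exact": user}}}],
                # Send the request to the user-specific service.
                "route": [{"destination": {"host": f"{module}-{user}"}}],
            }
        )
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": f"{module}-by-header", "namespace": tenant},
        "spec": {"hosts": [host], "gateways": [gateway], "http": routes},
    }


# Hypothetical values throughout.
vs = header_routed_virtual_service(
    "jupyter", "tenant-a", "jupyter.tenant-a.example.com", "tenant-gateway",
    ["maksym", "tim"],
)
print(json.dumps(vs, indent=2))
```

A request whose header matches none of the users has no route in this sketch, so it is not forwarded to any instance.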

Path-based routing can be done by creating a VirtualService that rewrites a request from http://module_name.tenant_name.example.com/maksym to http://module_name-maksym that will reach the module_name-maksym pod in the tenant_name namespace.
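Here is a hedged Python sketch of such a VirtualService: it matches a per-user path prefix, rewrites the path to the root and sends the request to the user's service. All names are hypothetical (Kubernetes names cannot contain underscores, so I use datalab and tenant-a instead of module_name and tenant_name), and a real setup may need extra handling for a path without the trailing slash.

```python
import json


def path_routed_virtual_service(
    module: str, tenant: str, host: str, gateway: str, users: list
) -> dict:
    """Build one VirtualService that maps /<user>/ to the <module>-<user> service."""
    routes = []
    for user in users:
        routes.append(
            {
                # The trailing slash keeps /maksym/ from also matching /maksym2/.
                "match": [{"uri": {"prefix": f"/{user}/"}}],
                # Strip the user prefix before the request reaches the pod.
                "rewrite": {"uri": "/"},
                "route": [{"destination": {"host": f"{module}-{user}"}}],
            }
        )
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": f"{module}-by-path", "namespace": tenant},
        "spec": {"hosts": [host], "gateways": [gateway], "http": routes},
    }


# Hypothetical values throughout.
vs = path_routed_virtual_service(
    "datalab", "tenant-a", "datalab.tenant-a.example.com", "tenant-gateway",
    ["maksym"],
)
print(json.dumps(vs, indent=2))
```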

You may say that anyone can change the URL path to another user’s and reach an application instance they have no permission to access. That’s where Istio ServiceRole and ServiceRoleBinding come in: they make sure that only the user maksym has access to the module_name-maksym service.

The drawback of both approaches (header-based and path-based routing) is that you have to create pods and roles for each user upfront. But that’s something you have to do anyway if your application doesn’t have a management layer and Kubernetes integration as JupyterHub does. To clarify: when users go to JupyterHub and launch their notebook, JupyterHub takes care of pod creation and routing. For an application without such a management layer, several Istio components have to be in place before the user opens the application URL in the browser. One solution could be a webhook that creates those components when the user is granted specific permissions in your identity provider database.
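The core of such a webhook could be a function like the one in this Python sketch, which generates the user-specific ServiceRole and ServiceRoleBinding for each user who was granted access. It is an untested idea rather than something we run: the function names, the naming scheme and the example values are assumptions, and actually applying the resources to the cluster is left out.

```python
import json


def per_user_rbac(module: str, tenant: str, user: str) -> list:
    """Build the ServiceRole/ServiceRoleBinding pair protecting one user's instance."""
    name = f"{module}-{user}"  # also assumed to be the service name
    role = {
        "apiVersion": "rbac.istio.io/v1alpha1",
        "kind": "ServiceRole",
        "metadata": {"name": name, "namespace": tenant},
        "spec": {
            "rules": [
                {
                    "services": [f"{name}.{tenant}.svc.cluster.local"],
                    "methods": ["*"],
                }
            ]
        },
    }
    binding = {
        "apiVersion": "rbac.istio.io/v1alpha1",
        "kind": "ServiceRoleBinding",
        "metadata": {"name": name, "namespace": tenant},
        "spec": {
            # Only a token whose "name" claim equals this user is bound to the role.
            "subjects": [{"properties": {"request.auth.claims[name]": user}}],
            "roleRef": {"kind": "ServiceRole", "name": name},
        },
    }
    return [role, binding]


def resources_for_users(module: str, tenant: str, users: list) -> list:
    """What a webhook might generate when users are granted access to a module."""
    resources = []
    for user in users:
        resources.extend(per_user_rbac(module, tenant, user))
    return resources


# Hypothetical module, tenant and users.
generated = resources_for_users("datalab", "tenant-a", ["maksym", "tim"])
print(json.dumps(generated, indent=2))
```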

This is what a generic setup could look like for the path-based routing approach:

[Figure: Jupyter Notebooks with Istio]

As you can see, it is very similar to the previous diagram, but without the Istio Rule. Another difference is that Istio roles, bindings and auth policies are user-specific, rather than a single policy/role/binding per platform module.

I assume that by applying user-specific labels to all application instances that belong to a user (for example, pods datalab-maksym and dataflow-maksym labelled with user=maksym), you could reduce the number of ServiceRole and AuthPolicy resources. But I haven't tested it myself.

Final remarks

I included a very information-intensive diagram in the article banner while describing simplified versions of it throughout the blog. Given Istio’s complexity, I would be happy to spend more time walking through that diagram and posting full code snippets of the Istio configuration. Please let me know in the comments if you would like to see a more detailed JupyterHub setup, and I can follow up on that in the next blog post.

Special thanks to Samuel Hessel, Tim Stokman and Travis Hansen for their collaboration in making this work.
