Multitenancy on Kubernetes with Istio, External Authentication Server and OpenID Connect (Part 1 — Authentication)
Before we dive into any technical details, it makes sense to note that multitenancy is a complex topic and often understood differently, depending on the task you are trying to achieve and the person you are talking to. To set the stage for this article, let me explain what I mean by multitenancy.
At HAL24K we provide our clients with AI-based decision support on a daily basis. We combine data science services and solutions with Dimension, our SaaS-based data science platform. When you subscribe to our platform, you can choose which platform modules are relevant to your business: data processing (dataflow), model training (datalab), dashboards and more. You can give all your users access to those modules or only specific modules. This kind of multitenancy can be represented in a standard matrix permissions structure as depicted by the diagram below.
I will focus on application-level multitenancy on Kubernetes (how people access applications in the browser), not on Kubernetes-level multitenancy (how people perform tasks on the Kubernetes cluster).
High-level solution overview
To base our story on a concrete example, we will show how JupyterHub can be used in a multitenant way on Kubernetes. We chose JupyterHub because our data scientists use it a lot and because it integrates with Kubernetes to spin up Jupyter notebooks as pods, so we don’t need to write our own management software for that. If you manage some other application as independent application instances (without a management layer), I will show how multitenancy can be achieved with the same tooling as well. Take a look at the following diagram:
We have Alice and Bob working at different companies and both using our platform. When either of them tries to access JupyterHub, they have to go through authentication, after which their request is routed to their own instance of Jupyter notebook (or to another application that they request and have access to; JupyterHub is just an example). This is possible due to three core components:
- OpenID Connect (OIDC): used to verify the identity of the end-user and obtain the user permissions described above
- External Authentication Server (EAS): a service that will perform the actual authentication and can use various authentication schemes (OIDC in our case but it has many more)
- Istio: a service mesh used for routing user requests to appropriate backends, securing traffic inside Kubernetes and enforcing policies. Put another way: it checks which application the user wants to access, checks their permissions and allows or denies the request.
You can also see that each tenant has a separate namespace which is needed for resource management, billing, security and routing (discussed later in more detail).
Let’s talk a bit more about how the actual separation between tenants, their users and platform modules happens. First, we separate tenants and platform modules via DNS with the convention module_name.tenant_name.example.com, because a tenant may have only some modules enabled and we don't want to provision resources that are not in use.
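To make the naming convention concrete, here is a minimal sketch of how a hostname could be split into its module and tenant parts. The helper name parseTenantHost and the BASE_DOMAIN constant are hypothetical and only serve as illustration; this is not code from our platform.

```javascript
// Hypothetical helper (not from the platform): split a hostname that follows
// the module_name.tenant_name.example.com convention into its two parts.
const BASE_DOMAIN = "example.com"; // placeholder domain used in this article

function parseTenantHost(hostname) {
  // Normalize: lower-case and drop an optional trailing dot.
  const host = hostname.toLowerCase().replace(/\.$/, "");
  const suffix = "." + BASE_DOMAIN;
  if (!host.endsWith(suffix)) {
    return null; // not one of the platform hostnames
  }
  // Everything in front of the base domain must be exactly <module>.<tenant>.
  const labels = host.slice(0, -suffix.length).split(".");
  if (labels.length !== 2 || labels.some((label) => label.length === 0)) {
    return null;
  }
  const [moduleName, tenantName] = labels;
  return { module: moduleName, tenant: tenantName };
}

console.log(parseTenantHost("datalab.tenanta.example.com"));
console.log(parseTenantHost("example.com"));
```

A hostname that does not match the convention yields null, so a caller can decide to reject the request instead of guessing a tenant.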
We are using Istio Gateway and VirtualService resources to route traffic to the specific module. The gateway allows the specification of hostnames, ports and TLS certificates for incoming requests, while VirtualService handles URL paths, request methods and destination backends.
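As a rough illustration of that split of responsibilities, the sketch below builds a Gateway and a VirtualService manifest for one tenant module as plain JavaScript objects. The function name buildRouting, the resource names, the TLS secret name, the backend service name and the choice of the tenant name as namespace are all assumptions made for the example; our real manifests are not shown in this article.

```javascript
// Hypothetical generator: produces Istio manifests for one module of one tenant.
// All names below (resource names, secret name, service host) are illustrative.
function buildRouting({ module, tenant, servicePort }) {
  const host = `${module}.${tenant}.example.com`;
  const name = `${module}-${tenant}`;

  // Gateway: hostnames, ports and TLS settings for incoming requests.
  const gateway = {
    apiVersion: "networking.istio.io/v1beta1",
    kind: "Gateway",
    metadata: { name, namespace: tenant }, // assumes namespace == tenant name
    spec: {
      selector: { istio: "ingressgateway" }, // label of the default ingress gateway
      servers: [
        {
          port: { number: 443, name: "https", protocol: "HTTPS" },
          hosts: [host],
          // Assumed secret name; the secret holds the TLS certificate.
          tls: { mode: "SIMPLE", credentialName: `${tenant}-tls` },
        },
      ],
    },
  };

  // VirtualService: URL paths and destination backend for the same host.
  const virtualService = {
    apiVersion: "networking.istio.io/v1beta1",
    kind: "VirtualService",
    metadata: { name, namespace: tenant },
    spec: {
      hosts: [host],
      gateways: [name], // refers to the Gateway defined above
      http: [
        {
          match: [{ uri: { prefix: "/" } }],
          route: [
            {
              destination: {
                host: `${module}.${tenant}.svc.cluster.local`, // assumed service name
                port: { number: servicePort },
              },
            },
          ],
        },
      ],
    },
  };

  return { gateway, virtualService };
}

const resources = buildRouting({ module: "datalab", tenant: "tenanta", servicePort: 8000 });
console.log(JSON.stringify(resources, null, 2));
```

Generating both resources from the same input keeps the hostname in the Gateway and the VirtualService in sync, which matters because a module is only provisioned for tenants that have it enabled.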
Before requests can reach any endpoint inside the cluster, we have to make sure they are authenticated, so we added an authz envoy filter to the Istio gateway. Envoy filters allow extending proxy functionality with custom logic. In our case, incoming requests are redirected from the gateway to the EAS service, which in turn talks to our identity provider using tenant-specific client credentials.
One great thing about EAS is that with a single instance we can configure multiple OIDC client connections (one for each tenant). In EAS, all information about the connection to the identity provider is embedded in the config_token, which has to be generated in advance and then provided to your reverse proxy (via EnvoyFilter in our case). Embedding those tokens in the URL makes the config ugly, and you may hit URL length limits. Luckily for us, EAS has a notion of server_side tokens, which stores the tokens on the backend and puts only a reference to the token in the proxy configuration. Nevertheless, we didn't want to configure that reference manually for each tenant, so my colleague edited the EAS code a bit to fetch token references dynamically based on a domain name regex, and we are now discussing how this feature can be added to the main repository. More specifically, if the domain name is *.tenantA.example.com, we will get the config_token from the backend and use it to communicate with our identity provider.
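The domain-based lookup can be pictured roughly as follows. This is only a sketch of the idea, not the actual patch to EAS and not an EAS API: the TOKEN_REFERENCES table, the reference ids and the resolveTokenReference function are hypothetical names introduced for illustration.

```javascript
// Hypothetical table mapping domain patterns to server-side token references.
// The ids are made-up placeholders; real references live in the EAS backend store.
const TOKEN_REFERENCES = [
  { pattern: /^[^.]+\.tenanta\.example\.com$/i, tokenId: "tenant-a" },
  { pattern: /^[^.]+\.tenantb\.example\.com$/i, tokenId: "tenant-b" },
];

// Return the token reference for the first pattern that matches the hostname,
// or null when no tenant is configured for that domain.
function resolveTokenReference(hostname, references = TOKEN_REFERENCES) {
  const match = references.find(({ pattern }) => pattern.test(hostname));
  return match ? match.tokenId : null;
}

console.log(resolveTokenReference("datalab.tenantA.example.com"));
console.log(resolveTokenReference("datalab.unknown.example.com"));
```

With a lookup like this, onboarding a tenant means adding one pattern and one stored token rather than editing the proxy configuration for every module.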
EAS is a great project and its sole maintainer, Travis Hansen, is super responsive, so I encourage you to check it out.
Now, after the authentication process is done, we have to check if a given user is allowed to access the requested module. Information about the user is provided via claims in the OIDC id_token, which is encoded as a JSON Web Token (JWT). If I decode such a token for my user, it will look similar to this stripped-down version:
The important bits are the information about permissions, and this info should be enough to make a decision to allow or deny a user request.
Our EAS token configuration in generate-config-token.js for each tenant looks as follows:
For those familiar with OIDC: we use the authorization code flow. Things to notice:
- login_info is our custom scope, which signals our single sign-on (SSO) server to provide information about the login session
- redirect_uri has a specific host and port
- The EAS service should not be protected with authentication itself, and we already have an EnvoyFilter that enforces authentication for all https traffic on port 443
- authorization_token is set to id_token, so we get all relevant info in the Authorization header, which will be used later for request access decision making
- nbf: false is set because of a time difference with our Identity Provider service: the not before validation fails if we set it to true, and authentication doesn't work. It's not ideal, but I've encountered the identical issue at multiple companies already, so there is a good chance you may run into that as well.
- The cookie domain is set to example.com, so we can have SSO for all applications on its subdomains
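To illustrate two of the points above, here is a minimal sketch of how a backend could read the claims from the Authorization header, and why a not before check fails when clocks differ. It assumes the header carries the id_token as a Bearer token and that Node.js is recent enough to support the base64url encoding; the function names, the sample claims and the leewaySeconds parameter are hypothetical and are not EAS options. The sketch only decodes the token and does not verify its signature.

```javascript
// Decode the claims of a compact JWT. This does NOT verify the signature;
// it only shows what information travels in the token.
function decodeJwtPayload(token) {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("not a compact JWT");
  }
  return JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
}

// Assumption: the id_token arrives as "Authorization: Bearer <token>".
function claimsFromAuthorizationHeader(headerValue) {
  const match = /^Bearer\s+(.+)$/i.exec(headerValue || "");
  return match ? decodeJwtPayload(match[1].trim()) : null;
}

// "Not before" check: the token is rejected while nbf is still in the future
// according to the local clock. leewaySeconds is an illustrative tolerance.
function isNotBeforeSatisfied(claims, nowSeconds, leewaySeconds = 0) {
  if (typeof claims.nbf !== "number") {
    return true; // no nbf claim, nothing to check
  }
  return nowSeconds + leewaySeconds >= claims.nbf;
}

// Demo with a made-up, unsigned token (the claims are sample data).
const encode = (obj) => Buffer.from(JSON.stringify(obj)).toString("base64url");
const demoToken = [encode({ alg: "none" }), encode({ sub: "alice", nbf: 1005 }), "sig"].join(".");
const claims = claimsFromAuthorizationHeader(`Bearer ${demoToken}`);

console.log(claims);
// The local clock (1000) is 5 seconds behind the issuer, so the strict check fails.
console.log(isNotBeforeSatisfied(claims, 1000));
console.log(isNotBeforeSatisfied(claims, 1000, 10));
```

The last two calls show the trade-off behind nbf: false: with a strict check, even a few seconds of clock drift between the issuer and the validator rejects an otherwise valid token.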
This setup completes the authentication flow and allows us to deploy simple applications per tenant that are shared between tenant users. An example architecture is shown below:
Next week, we will publish the second part of this blog, in which I will explore how to limit user access within a single tenant to ensure users can only access their copy of the application.