A Modern Approach

API Authorization with Kubernetes, Traefik and Open Policy Agent

Protecting your City Walls in the Cloud Native Era

Santiago Ignacio Poli
etermax technology

--

APIs are everywhere: lots of companies use them to facilitate the development of new technologies, both for internal use and for third parties. While there are many types of APIs, in this article we will focus exclusively on Web APIs. These are the kind of APIs that are made available through a Web Protocol (mainly HTTP), using a common data format (usually JSON) and a well-known architecture (for example, REST).

As you may imagine, exposing functionality over a wire can be both a good and a bad idea. It’s a great idea because the web is a well understood concept, but a very bad one if you don’t take security seriously. Without security, anyone can potentially exploit your system in unprecedented ways.

In this article we are going to talk about how the Platform Team at Etermax decided to tackle this issue by using Open Source Technology.

A little context before starting

At Etermax, we develop Mobile Games. As we developed more and more games, we came to the conclusion that there are pieces of functionality which are shared among all of them. This includes things like user management, matchmaking, rankings or friends, to name a few. That’s why we developed the Etermax Platform, which serves all these through an API.

Authentication vs Authorization

This article will focus mainly on Authorization, but first we have to explain the difference between Authentication and Authorization, as lots of people tend to confuse the two.

Authentication

Authentication is the act of confirming the truth of something claimed as true by some entity. In layman's terms, it means confirming that a person is who they claim to be. In APIs, this can be confirmed by passing a set of credentials (usually in the form of a token) on every request.

A passport is a means of authentication when traveling

The process of generating that token is out of scope for this article, but it usually involves a login process in which the user exchanges a set of credentials (username and password, for example) for a token. In our case, the login process generates a JWT (JSON Web Token). This kind of token is great because it carries a Base64url-encoded JSON payload with all kinds of information about the user. It is signed with a Private Key to prevent forging, and the matching Public Key can be used to verify it.

Authorization

Authentication can validate the identity of a user, but it does nothing to prevent that user from accessing sensitive or private data. A user can be authenticated, but being an authenticated regular user is not the same as being an authenticated admin. In fact, being authenticated as User A is not the same as being authenticated as User B, as both users can have access to different pieces of data.

You may not be authorized to visit some countries without a VISA, even if you are authenticated correctly

You can think of Authorization as the set of processes involved in answering the following question:

Can Entity X perform Action Y on the Resource Z?

Where to put Authorization Logic?

Data can be spread among a lot of services. Since authorization is directly tied to data, and each service has its own data, it seems logical to put authorization logic in every service.

In the Monolith Era, this wasn’t very problematic since a single codebase meant we could centralize Authorization in a single place. But in the Microservice era, having each service handle its own authorization can be pretty difficult to pull off. Imagine you add a new Role to the system: now you potentially need to modify (and redeploy) all your services.

So, is there a better way to achieve this? To answer this, let’s talk about architecture.

The API Gateway Pattern

If you have a bunch of Services, each exposing its own API, you should think about how to unify the API Experience. Monoliths can be difficult to maintain, but they provide a great API Experience, given there is a single entrypoint for all the available functionality.

One way to achieve the same kind of experience with Microservices is the API Gateway Pattern. It consists of placing a facade in front of all your services with the purpose of dispatching incoming requests. This is usually implemented in the form of a Reverse Proxy (HAProxy, NGINX, Kong, Traefik, to name a few). You get a single point of failure, true, but also a unified API.

Our team went with Traefik as an API Gateway, because of its speed, reliability, configurability, extensibility and its great integration with Kubernetes. An overview of our architecture and pattern can be seen below.

Requests are first intercepted by the Gateway and then forwarded to a Service. Routing can be done by matching any property of the request (Path, Hostname, Headers, Source IP) with a defined set of rules.

As you may imagine, an API Gateway is a great place to put authorization logic. As its name implies, it acts as the gateway between the outside world and your internal network of services.

Implementing Authorization in an API Gateway

A common way to introduce authorization without too much effort is by using Middlewares: pieces of code that can alter or drop requests before reaching a service. This means we can intercept all requests, and either drop them or forward them to a service, depending on whether the user is authorized or not.

In Traefik, Middlewares are implemented as Webhooks, meaning that requests will be first forwarded to the Middleware and then to the underlying service. This adds an extra hop to the system, which can be a deal breaker for some people. We decided to implement our Middleware in Golang and as a Sidecar to Traefik to reduce latency. This added a few microseconds at best, and 10 milliseconds at worst (99th Percentile), so it was pretty acceptable for us.

This animation shows how a request can be dropped or forwarded to a service depending on the result of the Authorization process.

Powering our middleware with Open Policy Agent

As stated in the previous section, we’ve coded our middleware in Golang. But what does the middleware actually do?

In our quest for implementing a reliable Authorization solution, we discovered Open Policy Agent (OPA for short). It’s a CNCF Backed Open Source tool used to validate policies, given an input and (optionally) some pieces of data. This may sound confusing, so let’s explain:

A Policy is a set of rules (or a single rule) you want to enforce
An Input is the Context in which you are evaluating the Policy (i.e., the Request)
Data is a set of predefined values

You can use OPA for almost anything: from Authorizing HTTP requests, to SSH Authorization, to Admission Controllers in Kubernetes. Someone even used OPA to validate the rules of a role playing game.

The Rego Language

To define Policies, OPA uses a Language called Rego. While its purpose is simple, it’s a very powerful language. An example Policy can be seen below:

In the first example, the policy takes the (split) path of the request, the HTTP method and the user as parameters. Statements inside the “allow block” are boolean expressions. If ALL the expressions are true, then the entire block is true.

If you look at lines 10 and 11, you will notice something: in line 10, the variable employee_id doesn’t exist, so Rego matches it with the second element in input.path. In the next line, the variable exists, so an equality match is performed.

The second example uses data. In this case, OPA was pre-filled with the data of who is the manager of whom.

Rego will evaluate all blocks until one evaluates to true (in reality, blocks don’t need to be boolean, but let’s keep it simple). If none matches, it will return the default value, which is false in the example.

The return value of the Rule is an object containing the evaluation of all the blocks in the Rule. In this case, allow is the only block defined. You can name the blocks any way you want.
An example with two blocks with different names
The output of the evaluation of the previous example

To learn more about Rego, I personally recommend reading its documentation and playing with the Rego Playground.

Combining the Authorization Middleware with OPA

As you may imagine, we can rely on OPA to do the heavy lifting. The actual middleware implementation is as easy as generating the input, passing it to OPA and then returning 200 if the evaluation succeeded, or 401 if not. Rego has native support for validating and inspecting JWTs, so that came in handy.

An input can be any arbitrary JSON. In our case, it contains the HTTP Method, the Path and all the Request Headers, including the JWT in the Authorization Header.

You can use OPA as a standalone service (via an HTTP API) or as a library. As both our middleware and OPA are implemented in Golang, the latter was a no-brainer. Using OPA as a library let us avoid an extra hop in our system. In terms of speed, it is pretty fast, with evaluations taking a few microseconds to complete.

Where do Policies and Data come from?

This is a very valid question: When you ask OPA to evaluate a policy, you need to tell it the name of the one you want to evaluate. This means OPA needs to know about the policies beforehand.

When OPA is initialized, you provide a bundle, which is simply a .tar.gz file including all the policies and the data. Luckily for us, you can point it to a remote URL and OPA will fetch the bundle again and again at a configurable interval.

OPA Bundle Service to the Rescue

Now that we can tell OPA to periodically fetch a bundle from a URL, we need a way to dynamically generate bundles.
That’s why we developed OPA Bundle Service (OBS for short). OBS is simply a tool that fetches information from remote sources and bundles it in a single .tar.gz. That means that when we add or remove a Policy in a Git repository, OBS can spot the change and regenerate the bundle accordingly. The same can be done with data. For example, some of our resources are protected with an IP whitelist. IPs can change over time, so we can store them in a repository and OBS will keep the bundle in sync. OBS also exposes a /download endpoint that serves the bundle; this is the URL OPA fetches from.

Key Takeaways

  • Use OPA as a Library when you can: using OPA as a Service is the easy way to get started, but the speed of local execution cannot be matched.
  • Use the Sidecar Pattern: this applies to both your middleware and OPA as a Service. Running them next to the gateway removes most of the network latency.
  • Use Rego’s partial evaluation: I cannot stress this enough. Rego has the ability to precompile some evaluations. In our tests, the difference in speed between standard and partial evaluation was substantial. This is as simple as setting a boolean flag in the API.

Conclusion

Using the API Gateway pattern in conjunction with OPA allowed us to reliably implement Authorization throughout our Services. While OPA is still a very young project, its potential is huge. There are a lot of big companies using OPA in production right now, Netflix being the most notable one.

Although we are pretty happy with the final result, there are some things we would want for the future:

  • Native OPA Integration in Traefik: including OPA Support in Traefik could potentially remove the need for the Middleware, thus removing a network hop and reducing complexity overall. Giving a thumbs up to the GitHub issue I’ve created for this would be much appreciated: #4894
  • Lua Middlewares in Traefik: proxies like NGINX or Envoy allow attaching Lua scripts to every request, which is great because they are executed locally. The best of both worlds would be having both native support for OPA and the option to code middlewares in Lua. #1336
  • Kubernetes-Native Implementation of OPA Bundle Service: it is in our plans to allow the creation of policies, static data and dynamic datasources via CRDs in Kubernetes.
  • Open Sourcing OPA Bundle Service: currently, OBS is tightly coupled with some of our systems. There are plans to improve this and release it to the community.

Recommended Videos

Open Policy Agent Intro @ KubeCon 2019
Open Policy Agent Deep Dive @ KubeCon 2019
How Netflix is Solving Authorization Across Their Cloud @ KubeCon 2017

Thanks for reading! 🌈

~ If you liked this article, give some ❤ and recommend it! That way, more people will be able to read it. Thanks!

~ This is a publication from the Engineering Team at Etermax, the leading mobile gaming company in Latin America.

Follow us on Medium to receive our latest publications! 🦄
You can also follow me on Twitter at @santiagopoli_ 🙋‍♂
