API Gateway Scope
Before getting to the main subject of this article, I want you to consider two definitions of completeness:
“Something is considered to be complete if there’s nothing left to be added.”
“Something is considered to be complete if there’s nothing left to be removed without breaking it.”
I am in favor of the second school of thought. However, most API gateways that I am aware of seem to belong to the first. I think Thoughtworks is with me on this when they warn against overambitious API gateways:
“Overambitious API gateway products […] encourages designs that continue to be difficult to test and deploy.
API gateways do provide utility in dealing with some specific concerns — such as authentication and rate limiting — but any domain smarts should live in applications or services.”
So the subject of this article is not how to build an API Gateway that features every possible function currently supported by the gateways that are available. Instead, it’s about an API Gateway that is capable of doing just the things for which I consider a gateway to be useful:
- Authentication (but just a small portion of it)
- Rate limiting (in some cases)
- Connection handling and SSL/TLS termination (absolutely)
- Caching (sure, why not)
- Routing (implicitly)
What I aim to prove
So here are the things that I will try to get across in this article:
- You only need a fraction of what is normally offered through an API gateway.
- Most of these things can be addressed by your endpoint itself, and it helps to understand how to do it.
- Nonetheless, being able to offload some of these concerns from your service into a dedicated “gateway” can be a good idea. It does not need to be hard, and the gateway can be just as lightweight, simple and testable as any other ordinary service. That is, if you allow me to rely on Cloudflare workers.
In this article, I will focus on authentication only. Authentication and API access control are a fairly large topic, so I will cover one particular approach: clients rely on OpenID Connect to obtain a JWT-based access token and present that token to the API.
So, just to refresh your memory: in this scenario, a user authenticates against an identity provider such as Auth0. Auth0 presents the user with a login screen, and the user enters a username and password known to Auth0. As a result, Auth0 returns an ID token and an access token.
In OpenID Connect, that access token can be an opaque string, but it can also be a JWT (JSON Web Token) carrying a set of claims, among them the permissions of the user, encoded as so-called scopes.
I’m in favor of that type of access token. The beauty of it is that you can present the access token to the service, and all the service has to do is compare the scopes in the access token with the scopes associated with the endpoint. And since it is such a common pattern, most web frameworks have extensions that take care of it.
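To make that comparison concrete, here is a minimal TypeScript sketch. It assumes the token’s claims have already been decoded and that the scopes arrive as a space-separated string in a scope claim, which is a common convention but not the only one. The names AccessTokenClaims and hasRequiredScopes are made up for this example.

```typescript
// Hypothetical shape of the decoded claims; only `scope` matters for this check.
interface AccessTokenClaims {
  scope?: string; // e.g. "read:orders write:orders"
}

// Returns true when every scope the endpoint requires was granted to the token.
function hasRequiredScopes(claims: AccessTokenClaims, required: string[]): boolean {
  const granted = new Set(
    (claims.scope ?? "").split(" ").filter((scope) => scope.length > 0),
  );
  return required.every((scope) => granted.has(scope));
}
```

An endpoint that needs read:orders would call hasRequiredScopes(claims, ["read:orders"]) and reject the request when it returns false.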
As you can tell, I did not need an API Gateway so far. I moved the responsibility of checking the scope inside the access token into the endpoint, which is exactly where it belongs.
However, checking the scope is not the only thing that needs to happen when an access token is presented. If that were all, anyone could modify the access token and slip in some additional scopes. To prevent that, JWTs tend to be signed, so that you can no longer tamper with them undetected. With a signed JWT, a roundtrip to the identity provider is no longer necessary: the JWT can be trusted once you have checked that it has not expired and that its signature is valid.
Now, signature validation takes time. You would probably want to avoid having your service spend too much of it on checking digital signatures, and ideally have it spend its precious computational resources on delivering the service itself.
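To show what there is to check, here is a small TypeScript sketch that splits a compact JWT into its three dot-separated parts and tests the exp claim. The function names are made up for this example, and treating a token without an exp claim as expired is my own choice, not something the JWT format demands. The signature check itself is left out here.

```typescript
// Claims this sketch cares about; real tokens carry more.
interface TimedClaims {
  exp?: number; // expiry, in seconds since the Unix epoch
}

// Decodes one base64url-encoded JWT segment into a parsed JSON value.
function decodeSegment(segment: string): unknown {
  const base64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  const bytes = Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

// Splits a compact JWT into the parts a verifier needs.
function parseJwt(token: string) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("expected header.payload.signature");
  const [header, payload, signature] = parts;
  return {
    header: decodeSegment(header),
    claims: decodeSegment(payload) as TimedClaims,
    signedText: `${header}.${payload}`, // the signature covers exactly these characters
    signature,
  };
}

// True when the token must be rejected on time grounds alone.
function isExpired(claims: TimedClaims, nowMs: number = Date.now()): boolean {
  // Treating a missing exp as expired is a choice made for this sketch.
  if (typeof claims.exp !== "number") return true;
  return nowMs / 1000 >= claims.exp;
}
```

Because the signature covers the encoded header and payload, changing a single character in the scopes invalidates it.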
So, if signature validation is something we would want to move out of the service itself, and into something else sitting in front of the actual service, then we need to drill a little deeper to understand what that usually means.
JWTs can be signed either with a shared secret (based on HMAC) or with a public/private key pair (mostly relying on RSA). I’m leaning towards RSA-based signatures, since they mean you only need to share the public key, which significantly reduces the chance of anyone tampering with a JWT. (You would need the private key to sign the JWT, and that key never leaves the identity provider. Win!)
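To illustrate that the verifying side needs nothing but the public key, here is a sketch of RS256 verification (RSASSA-PKCS1-v1_5 with SHA-256) using the Web Crypto API, which is available in Cloudflare workers, browsers and recent Node.js versions. The function names are made up for this example, and the sketch only ever accepts RS256, whatever algorithm the token header claims. It does not check expiry or scopes.

```typescript
// Converts a base64url string (as used in JWTs) to raw bytes.
function base64UrlToBytes(value: string) {
  const base64 = value.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  return Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0));
}

// Verifies the signature of a compact JWT against an RSA public key in JWK form.
// The algorithm is fixed to RS256 here instead of being read from the token.
async function verifyRs256(token: string, publicJwk: JsonWebKey): Promise<boolean> {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [header, payload, signature] = parts;

  // Only the public half of the key pair is needed to import a verification key.
  const key = await crypto.subtle.importKey(
    "jwk",
    publicJwk,
    { name: "RSASSA-PKCS1-v1_5", hash: "SHA-256" },
    false,
    ["verify"],
  );

  // The signature covers the encoded header and payload, joined by a dot.
  return crypto.subtle.verify(
    "RSASSA-PKCS1-v1_5",
    key,
    base64UrlToBytes(signature),
    new TextEncoder().encode(`${header}.${payload}`),
  );
}
```

With an HMAC-based token, the same party that verifies could also sign, because both operations use the same secret. That is exactly what the key pair avoids.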
However, even in the case of the RSA-based JWTs, the thing in charge of signature validation needs to have access to the public key. That public key is not guaranteed to stay the same during the lifetime of your service. Every time the private key changes, the thing doing the signature validation also needs an updated public key.
The good news is: there is a standard for publishing the public keys. It’s called the JSON Web Key (JWK) specification, and it defines a JSON format for a set of keys that you can pull down through an HTTP request, like this one: https://staged.eu.auth0.com/.well-known/jwks.json
So any mechanism that needs to validate the signature of a JWT can just grab that resource, find the corresponding public key by comparing the kid (key identifier) in the JWT with the kid in that JSON file, and use that public key to validate the signature.
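The lookup itself is tiny. The sketch below assumes the key set has already been downloaded and parsed; the type and function names are made up for this example, and only the fields needed for the lookup are modelled.

```typescript
// A single key as it appears in a JWK Set; further fields (n, e, alg, use, ...)
// are allowed but not modelled here.
interface PublishedKey {
  kty: string;
  kid?: string;
  [param: string]: unknown;
}

// The document served at the jwks.json URL: an object with a "keys" array.
interface PublishedKeySet {
  keys: PublishedKey[];
}

// Returns the key whose kid matches the kid from the JWT header, if any.
function findKeyByKid(keySet: PublishedKeySet, kid: string): PublishedKey | undefined {
  return keySet.keys.find((key) => key.kid === kid);
}
```

If no key matches, the token was either signed with a key that has since been rotated out, or the cached copy of the key set is stale.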
Cloudflare sits on the edges of the network. It has servers scattered across the globe to handle incoming HTTP traffic, and take care of DoS attacks, caching and the like. Sounds a bit like what you would like to have in an API Gateway, doesn’t it? So would it not be cool if it could also do signature validation?
Currently, Cloudflare does not offer the ability to verify JWT signatures. They do offer something else, though: a mechanism for running code at the edges of the network, which they managed to make really, really cheap as a FaaS type of solution. (I will not get into the details, but I think what they managed to pull off is really clever: read more about it here.)
With Cloudflare workers, we are able to implement the API Gateway logic that we’re looking for: validating the signatures of JWT-based Bearer tokens before they hit your API. And, surprise surprise, it only requires a little over 70 lines of code.
In those 70 lines of code, the worker will check if it has the latest version of the public keys cached locally, by checking for its presence in a key-value store associated with the worker. If it’s there, then it will find the public key by searching for it using the kid inside the JWT. If it finds the public key, then it will validate the JWT’s signature with it. In case it’s invalid, it will return a 401. If it is valid, then it will forward the request to the API.
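The flow above can be sketched as follows. This is not the actual worker, just an outline of the decision logic with every external piece (key-value store, key download, signature check, forwarding) passed in as a hypothetical dependency, so that it can be tested without a Cloudflare account. Downloading the keys on a cache miss is an assumption of mine, and the sketch returns plain status codes rather than real responses.

```typescript
// Hypothetical stand-ins for what the worker depends on.
interface KeyCache {
  get(name: string): Promise<string | null>;
  put(name: string, value: string): Promise<void>;
}
interface GatewayDeps {
  keyCache: KeyCache; // e.g. a key-value namespace bound to the worker
  fetchKeySet: () => Promise<string>; // downloads the jwks.json document as text
  verify: (token: string, key: object) => Promise<boolean>; // signature check
  forward: (token: string) => Promise<number>; // calls the real API, returns its status
}

// Reads the kid from the (not yet verified) JWT header.
function readKid(token: string): string | undefined {
  try {
    const base64 = token.split(".")[0].replace(/-/g, "+").replace(/_/g, "/");
    const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
    const header = JSON.parse(atob(padded));
    return typeof header.kid === "string" ? header.kid : undefined;
  } catch {
    return undefined;
  }
}

// Returns the HTTP status the gateway would answer with.
async function handleRequest(authorization: string | null, deps: GatewayDeps): Promise<number> {
  const match = /^Bearer (.+)$/.exec(authorization ?? "");
  if (!match) return 401;
  const token = match[1];
  const kid = readKid(token);
  if (kid === undefined) return 401;

  // Use the cached key set when present; otherwise download and cache it.
  let keySetText = await deps.keyCache.get("jwks");
  if (keySetText === null) {
    keySetText = await deps.fetchKeySet();
    await deps.keyCache.put("jwks", keySetText);
  }

  const keys: Array<{ kid?: string }> = JSON.parse(keySetText).keys ?? [];
  const key = keys.find((candidate) => candidate.kid === kid);
  // A real worker might refresh the cache here, in case the keys were rotated.
  if (key === undefined) return 401;

  return (await deps.verify(token, key)) ? deps.forward(token) : 401;
}
```

Keeping the dependencies explicit is what makes the gateway as testable as any other small service: each of them can be replaced by a stub.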
Wait, is that all?!
It’s almost all. Ideally, you would want access to your APIs to be restricted to Cloudflare only. The robust way of achieving that is to set up a trusted connection using client certificates. Now, even though Cloudflare fully supports that, many of the current serverless and cloud solutions do not. (I’m looking at you, AWS and others. Alibaba Cloud does seem to support it.) That means you’re ultimately left with hiding the actual API servers behind obscure names and whitelisting only Cloudflare IPs, which is less than ideal. If you go down that path, you probably still want signature validation built into your services as well, which is not a bad idea anyway, since it simplifies testing.
I have tested the entire setup with an Auth0-based identity provider, a Nuxt-based client and a Hapi-based service, and it works like a charm. Getting a worker running on Cloudflare is surprisingly easy.
I am pretty excited about this approach. True, the current version is far from perfect, but it does show that you don’t need a complex service to take care of the responsibilities of an API gateway. Your API itself can be a simple set of functions running on a FaaS platform, and the same goes for your API gateway.
And if you rely on Cloudflare workers, you unlock a whole slew of goodness already baked into Cloudflare itself. Anything calling out from your “gateway” to the actual API would already benefit from the caching capabilities of Cloudflare itself. With Cloudflare, you are already protected against DoS attacks, and if you want to implement rate-limiting, it could be easily implemented in your gateway worker by relying on Cloudflare’s KV store.
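To give an idea of how little the rate-limiting part needs, here is a sketch of a fixed-window counter written as a pure function. The idea, and it is only an assumption about how you might wire it up, is that the worker reads the previous state for a client from the KV store, calls this function, and writes the returned state back. All names are made up for this example. Since a globally distributed key-value store is typically eventually consistent, I would expect the resulting limit to be approximate rather than exact.

```typescript
// Counter state kept per client (for instance, serialised as JSON in the KV store).
interface WindowState {
  windowStart: number; // start of the current window, in milliseconds
  count: number; // requests admitted in this window so far
}

interface RateDecision {
  allowed: boolean;
  state: WindowState; // the state to store for the next request
}

// Admits up to `limit` requests per window of `windowMs` milliseconds.
function applyFixedWindow(
  previous: WindowState | null,
  nowMs: number,
  limit: number,
  windowMs: number,
): RateDecision {
  // Keep counting in the existing window, or open a new one once it has elapsed.
  const current: WindowState =
    previous !== null && nowMs - previous.windowStart < windowMs
      ? previous
      : { windowStart: nowMs, count: 0 };

  if (current.count >= limit) {
    return { allowed: false, state: current };
  }
  return { allowed: true, state: { ...current, count: current.count + 1 } };
}
```

When allowed is false, the worker would answer with a 429 instead of forwarding the request.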
I’m curious to hear from others how they feel about the approach. If you have some thoughts on the subject, please leave them in the comments section below.