Stateless authentication for Microservices
As an API owner, you must consider not only the good design of your APIs, but the non-functional aspects as well.
Think about a bank: it could be the easiest one for making a transfer or paying a bill, but if it is not secure and safe, you’re not going to use it anymore. The same rule applies to API providers.
Safety in this context means that your API (Microservice) is protected from misuse and security means that your data is protected from malicious activities.
One of the ways to bring safety to your API ecosystem is to use throttling policies. To build a platform that is secure not only for API providers but also for API consumers, you should also consider implementing an authentication process.
Throttling is a process used to control the usage of APIs by consumers during a given period.
Imagine a scenario where Service A consumes data from Service B. With a throttling mechanism in place, you as the service owner define the limits of this relationship, in other words, the number of calls (requests) Service A can make to Service B in a given interval, an hour for example.
To give you a real example: what happens if Service A starts calling Service B endlessly because a new bug was introduced in the code, like a service call inside a loop without an exit condition?
You never know how often your APIs will be called since it depends on many aspects, from business rules to fallback mechanisms like circuit-breakers.
You cannot control what your consumers do, but you can control how your system reacts to it.
Also, a throttling mechanism gives you data points that are useful when deciding whether to scale up your API.
To throttle your consumers efficiently, you need to know who they are. You don't want to penalize good consumers; you just need to control the possible offenders.
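To make the idea concrete, here is a minimal sketch of a per-consumer throttle. It assumes a simple fixed-window counter, which is only one of several possible strategies; the class name `FixedWindowThrottle`, the consumer id `service-a` and the limits are all hypothetical and not taken from any real platform.

```java
import java.util.HashMap;
import java.util.Map;

/** Fixed-window throttle: at most `limit` calls per consumer in each time window. */
final class FixedWindowThrottle {
    /** Counter state kept for one consumer. */
    private static final class Window {
        long startMillis;
        int calls;
    }

    private final int limit;
    private final long windowMillis;
    private final Map<String, Window> windows = new HashMap<>();

    FixedWindowThrottle(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the call is allowed, false if the consumer is over its limit. */
    synchronized boolean tryAcquire(String consumerId, long nowMillis) {
        Window w = windows.computeIfAbsent(consumerId, id -> {
            Window fresh = new Window();
            fresh.startMillis = nowMillis;
            return fresh;
        });
        // Start a new window once the current one has fully elapsed.
        if (nowMillis - w.startMillis >= windowMillis) {
            w.startMillis = nowMillis;
            w.calls = 0;
        }
        if (w.calls >= limit) {
            return false; // only this consumer is rejected, others are unaffected
        }
        w.calls++;
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical policy: 3 calls per hour for each consumer.
        FixedWindowThrottle throttle = new FixedWindowThrottle(3, 60 * 60 * 1000L);
        for (int i = 1; i <= 5; i++) {
            boolean allowed = throttle.tryAcquire("service-a", System.currentTimeMillis());
            System.out.println("call " + i + " from service-a allowed: " + allowed);
        }
    }
}
```

Because the counter is keyed by consumer id, a buggy consumer exhausts only its own quota. In a real deployment the counters would probably live in the gateway or in a shared store rather than in process memory.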
Design your system for failure, and consider the safety aspects as well.
Resource access restriction
Microservices are autonomous blocks of code that do one thing well.
Consumers do things through operations, which are the API's interface to the external world. It's pretty common for a single API to support different kinds of operations. For example, some consumers could have permission to CREATE resources but not to DELETE them.
A good case is Amazon's marketplace API to manage orders. In this API, a given seller can manage only their own orders, not the orders of another seller.
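The two rules above (per-operation permissions and per-owner isolation) can be sketched as a single check. This is an illustrative sketch only: `OrderAccessPolicy`, `Consumer` and the seller ids are hypothetical names, and it does not describe how any real marketplace API implements access control.

```java
import java.util.EnumSet;
import java.util.Set;

final class OrderAccessPolicy {
    enum Operation { CREATE, READ, UPDATE, DELETE }

    /** Hypothetical view of an authenticated consumer. */
    static final class Consumer {
        final String sellerId;
        final Set<Operation> allowed;

        Consumer(String sellerId, Set<Operation> allowed) {
            this.sellerId = sellerId;
            this.allowed = allowed;
        }
    }

    /** Allowed only if the operation is granted AND the order belongs to the caller. */
    static boolean canAccess(Consumer consumer, Operation op, String orderSellerId) {
        return consumer.allowed.contains(op) && consumer.sellerId.equals(orderSellerId);
    }

    public static void main(String[] args) {
        Consumer seller = new Consumer("seller-1", EnumSet.of(Operation.CREATE, Operation.READ));
        // Own order, granted operation.
        System.out.println(canAccess(seller, Operation.READ, "seller-1"));
        // Own order, but DELETE was never granted.
        System.out.println(canAccess(seller, Operation.DELETE, "seller-1"));
        // Granted operation, but the order belongs to another seller.
        System.out.println(canAccess(seller, Operation.READ, "seller-2"));
    }
}
```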
As an API provider, you need to cover these use cases while ensuring the security of your APIs, and the way to achieve that is through an authentication and authorization layer between your APIs and their consumers.
A blockchain is a good case that illustrates the importance of authenticating and authorizing users before they perform operations, or even transactions, in your systems.
Designing an authentication layer
The sidecar proxy is a pattern for cloud architectures.
A sidecar proxy is a proxy running on the same host where your Microservice, or container, is deployed. It runs as a separate process from your Microservice. This proxy has some built-in functionality such as authentication/authorization, retry policies, routing, error handling and circuit-breakers.
This approach is interesting, because your application is still a Microservice. All the non-functional aspects are not implemented in the service itself, but in the sidecar proxy.
One of the downsides of this approach is that you have more components in your architecture to monitor and deploy. To reduce this complexity, you can use a service mesh solution like Istio and deploy it with Kubernetes.
Another alternative is to use an API Gateway.
An API Gateway runs in front of your APIs. The advantage is that you delegate the authentication and authorization capabilities to this layer, leaving your Microservice responsible only for the business logic.
One of the differences between this approach and the sidecar proxy is that you have just one component instead of one proxy per service. It could be easier to deploy and monitor, but it is potentially less resilient.
An API Gateway is a single entry point for all of your APIs, so it naturally becomes a potential bottleneck or a single point of failure in your architecture.
This approach is totally independent of your cloud-infrastructure model, so you can run an API Gateway on-premises with solutions such as Kong, or use a managed API Gateway like AWS API Gateway.
A third alternative is a shared library loaded by each Microservice. For some use cases, mostly when you don't need a common authoritative entity for all of your APIs, I personally prefer this approach.
For example, when you need to authenticate a subset of APIs related to a specific domain like pricing.
The advantage of this approach is that you can have the best of both worlds: the authentication/authorization rules at a certain level of isolation, without a single point of failure in your architecture.
The disadvantage of this approach is that you partially lose the technology agnosticism inherent to an architecture based on Microservices. As it is a shared library, all of your Microservices have to be written in the language the library uses, or at least run on the same runtime, as with JVM-based applications.
Breaking down the shared library approach
The pricing platform is the platform used by the commercial team of B2W to price all the products on their websites. There are several ways (models) to price a product, and this leads to a lot of business logic.
Microservices are a good alternative for cases like that, because you can split all the business rules into well-defined boundaries through a good API contract definition.
In the pricing platform, we have around 20 Microservices, all of them implemented in Java. Those Microservices manage many resources that demand different access patterns.
JWT and stateless authentications
JSON Web Tokens (JWT) are an open industry standard method for representing claims securely between two parties.
It's a way to implement stateless authentication, which means you don't need to persist the state of authenticated users (sessions) in any data store.
Using JWT, the session state is carried as part of the token, and you can also include other relevant data to not only authenticate, but authorize API consumers as well. That means you don't need to retrieve this data from a database, which makes your service faster and consumes fewer computational resources.
A JSON Web Token (JWT) looks like this:
I'm not going to dive deep into the details of JWT. All you need to know for this post is:
- A JWT carries a JSON payload signed with a secret key (the payload is only encrypted if you use the JWE variant);
- The header, payload and signature are each encoded using base64url and joined with dots;
- One token represents a user session;
- It's tamper-evident because the signature is created and verified using a secret key;
- This token is part of each request (Header) from one API to another, like in Basic Auth.
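As an illustration of these points, the sketch below builds and verifies a token using only the JDK. It assumes the common HS256 (HMAC-SHA256) signed form of a JWT; the class name `Hs256Jwt`, the payload and the secret are hypothetical. It is not the platform's code and leaves out checks a real verifier needs, such as validating the header's `alg` and the expiration claim. In practice you would normally rely on a maintained JWT library instead.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class Hs256Jwt {
    private static final Base64.Encoder B64 = Base64.getUrlEncoder().withoutPadding();

    /** Computes HMAC-SHA256 over "header.payload" with the shared secret. */
    private static byte[] hmac(String signingInput, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(signingInput.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds base64url(header) + "." + base64url(payload) + "." + base64url(signature). */
    static String sign(String payloadJson, byte[] secret) {
        String header = B64.encodeToString(
                "{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = B64.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        String signingInput = header + "." + payload;
        return signingInput + "." + B64.encodeToString(hmac(signingInput, secret));
    }

    /** Returns the payload JSON if the signature matches, otherwise throws. */
    static String verify(String token, byte[] secret) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed token");
        }
        byte[] expected = hmac(parts[0] + "." + parts[1], secret);
        byte[] actual = Base64.getUrlDecoder().decode(parts[2]);
        // Constant-time comparison of the two signatures.
        if (!MessageDigest.isEqual(expected, actual)) {
            throw new SecurityException("invalid signature");
        }
        return new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] secret = "demo-secret-do-not-use".getBytes(StandardCharsets.UTF_8);
        String token = sign("{\"sub\":\"service-a\"}", secret);
        System.out.println(token);
        System.out.println(verify(token, secret));
    }
}
```

Note that anyone holding the token can read the payload by decoding it, so a signed token should not carry data that must stay confidential.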
In the diagram below, the server is one API/Microservice.
Shared library, also known as the security module
This library is loaded by each Microservice and abstracts all the business logic to authenticate and authorize API consumers.
We are using spring-security to implement business rules related to the authorization process. With spring-security, you can write less complex code benefiting from the use of annotations and POJOs.
The only requirement of the security module is to intercept all HTTP requests. We're using interceptors since we build our services with Spring Boot, but you can also use Servlet Filters for that.
When a new request arrives at a Microservice, the typical flow in the security module is:
- intercept the request;
- get the token value (string) through the HTTP header X-PRC-TOKEN;
- decode the base64url parts of the token;
- verify the token with the secret key, resulting in a trusted JSON string with all the consumer session data;
- apply the authorization rules using the data represented by the JSON string.
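The flow above can be sketched without any framework. In this sketch the token handling is hidden behind a hypothetical `tokenVerifier` function, and `SecurityInterceptorSketch`, `Session` and the permission string `price:read` are illustrative names, not the real security module. Only the header name `X-PRC-TOKEN` comes from the post. In a Spring Boot service the same logic would sit inside an interceptor or a Servlet Filter.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

final class SecurityInterceptorSketch {
    static final String TOKEN_HEADER = "X-PRC-TOKEN";

    /** Hypothetical session data carried by the token. */
    static final class Session {
        final String consumerId;
        final Set<String> permissions;

        Session(String consumerId, Set<String> permissions) {
            this.consumerId = consumerId;
            this.permissions = permissions;
        }
    }

    // Decodes and validates the raw token; empty means "not a valid token".
    private final Function<String, Optional<Session>> tokenVerifier;

    SecurityInterceptorSketch(Function<String, Optional<Session>> tokenVerifier) {
        this.tokenVerifier = tokenVerifier;
    }

    /**
     * Returns the HTTP status the interceptor would answer with:
     * 401 when the token is missing or invalid, 403 when the consumer lacks
     * the permission, 200 when the request may proceed.
     * Note: real HTTP header names are case-insensitive; this plain Map lookup is not.
     */
    int preHandle(Map<String, String> headers, String requiredPermission) {
        String token = headers.get(TOKEN_HEADER);
        if (token == null || token.isEmpty()) {
            return 401;
        }
        Optional<Session> session = tokenVerifier.apply(token);
        if (!session.isPresent()) {
            return 401;
        }
        return session.get().permissions.contains(requiredPermission) ? 200 : 403;
    }

    public static void main(String[] args) {
        // Fake verifier for the demo: accepts exactly one hard-coded token.
        SecurityInterceptorSketch interceptor = new SecurityInterceptorSketch(raw ->
                raw.equals("valid-token")
                        ? Optional.of(new Session("service-a", Collections.singleton("price:read")))
                        : Optional.<Session>empty());
        Map<String, String> headers = new HashMap<>();
        headers.put(TOKEN_HEADER, "valid-token");
        System.out.println(interceptor.preHandle(headers, "price:read"));
        System.out.println(interceptor.preHandle(headers, "price:write"));
    }
}
```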
Renewing the validity of the token
The security module is also responsible for renewing tokens.
One piece of information embedded in a token is its expiration time. If a consumer is constantly interacting with the system through API calls, the security module renews the validity of the token automatically.
The renewal runs asynchronously on another thread. As it is a best-effort process, the security module doesn't wait long for the result of this operation.
If the renewal thread finishes before the main thread completes the request, the new token is returned in the response headers.
We have an agreement with our API consumers to use the returned token for the following calls.
A new token has to be generated because the validity is one of the token's attributes, so it cannot be changed in place.
With this feature, consumers rarely need to call the renew operation explicitly, which drastically reduces chattiness in our system.
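A best-effort renewal of this kind can be sketched with `CompletableFuture`. Everything here is an assumption made for illustration: the class `BestEffortRenewal`, the 50 ms wait, and the reuse of `X-PRC-TOKEN` as the response header name. The post does not say how the real module implements this.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class BestEffortRenewal {
    /** Starts the renewal on another thread (the common ForkJoinPool by default). */
    static CompletableFuture<String> start(Supplier<String> renewer) {
        return CompletableFuture.supplyAsync(renewer);
    }

    /** Waits at most waitMillis for the new token; empty if it is late or failed. */
    static Optional<String> renewedTokenIfReady(CompletableFuture<String> renewal, long waitMillis) {
        try {
            return Optional.ofNullable(renewal.get(waitMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException | ExecutionException e) {
            return Optional.empty(); // best effort: the response goes out without a new token
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // 1. Kick off the renewal as soon as the request has been authenticated.
        CompletableFuture<String> renewal = start(() -> "renewed-token"); // hypothetical renewer
        // 2. The main thread handles the request in the meantime.
        String body = "business response";
        // 3. Just before responding, attach the new token only if it is already available.
        Map<String, String> responseHeaders = new HashMap<>();
        renewedTokenIfReady(renewal, 50L).ifPresent(t -> responseHeaders.put("X-PRC-TOKEN", t));
        System.out.println(body + " " + responseHeaders);
    }
}
```

The short timeout is the design choice that matters: a slow or failing renewal never delays the business response, and the consumer simply keeps using its current token until a later call returns a fresh one.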
Protect your domain layer
In the first versions of this module, we protected the domain layer only by intercepting the Controller methods. A scheduled task (worker), however, could still reach the domain layer unchecked by injecting Service classes directly.
We changed our application model to avoid situations like that by introducing a proxy layer.
This new layer is the only one that can be injected into Controllers and Tasks, and the only one with access to the Domain layer as well. All the interceptors now intercept the methods of this layer.
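The idea of a single guarded entry point can be sketched with a JDK dynamic proxy. This swaps in `java.lang.reflect.Proxy` as a stand-in for the Spring interceptors the post mentions, and `GuardedProxy`, `PriceService` and the `isAuthorized` check are hypothetical names used only for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.function.BooleanSupplier;

/** Hypothetical domain-facing interface that Controllers and Tasks would inject. */
interface PriceService {
    String priceOf(String sku);
}

final class GuardedProxy {
    /** Wraps `target` so that every call made through `type` passes the check first. */
    static <T> T guard(Class<T> type, T target, BooleanSupplier isAuthorized) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (!isAuthorized.getAsBoolean()) {
                throw new SecurityException("access denied: " + method.getName());
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the domain exception unchanged
            }
        };
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler));
    }

    public static void main(String[] args) {
        PriceService real = sku -> "price-of-" + sku;
        // Both Controllers and scheduled Tasks would receive only the guarded instance.
        PriceService guarded = guard(PriceService.class, real, () -> true);
        System.out.println(guarded.priceOf("sku-1"));
    }
}
```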
Don't include your JWT secret key in the library code
Another important consideration is the secret key used by the security module to sign and verify a JWT token.
This secret key must not be part of the library code. It must be a configurable value, supplied through an environment variable or a .env file.
You could store the value in a data vault solution and replace the value during continuous integration builds.
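A minimal sketch of loading the key from the environment could look like the following. The variable name `JWT_SECRET` and the class `JwtSecretLoader` are hypothetical, and the fail-fast behaviour is a suggestion rather than a description of the real module.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class JwtSecretLoader {
    static final String ENV_NAME = "JWT_SECRET"; // hypothetical variable name

    /** Reads the secret from the given environment and fails fast if it is missing. */
    static byte[] load(Map<String, String> env) {
        String value = env.get(ENV_NAME);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException(
                    ENV_NAME + " is not set; refusing to start without a signing key");
        }
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        try {
            byte[] secret = load(System.getenv());
            // Never log the secret itself, only non-sensitive facts about it.
            System.out.println("loaded a secret of " + secret.length + " bytes");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Taking the environment as a `Map` keeps the loader easy to test and makes it simple to swap `System.getenv()` for values fetched from a vault.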
Stateless authentication also means you don't have control to revoke tokens
Stateless authentication is a good alternative when scalability and availability are major requirements for your application.
You don't have a strong dependency on a session store, you don't add a new network hop, and you don't have to query a database to retrieve the session state. But there are trade-offs. One of them is losing control of the active sessions.
As the expiration time is embedded in the token, you can't revoke an active session before its token expires. One strategy to deal with this trade-off is to define a short expiration time for each token.
That is what we did.
As the security module renews the token on each request, this shouldn't be a problem for API consumers.
The more information your token has, the larger it will be
This is another situation you must be aware of. The more session data you include in the token, the larger in bytes it will be.
One of the downsides is that some proxies and load balancers limit the size of HTTP headers.
We also faced this situation and what we did was define the token as a first-class citizen in the message body of each request.
For HTTP methods like GET, where a message body is technically allowed but has no defined meaning, we changed the operation to a POST instead.
Don’t pass the token value as a query string parameter in GET operations. This is a security violation, since the token could then be read from access.log files.
This is definitely a decision that compromises the semantics of the API and concepts like idempotency, so choose what works best in your context.