Stateless authentication for Microservices

As an API owner, you must consider not only the good design of your APIs, but the non-functional aspects as well.

Think about a bank: it could be the easiest one for making a transfer or paying a bill, but if it is not safe and secure, you’re not going to use it anymore. The same rule applies to API providers.

Safety in this context means that your API (Microservice) is protected from misuse, and security means that your data is protected from malicious activity.

One way to bring safety to your API ecosystem is to use throttling policies. To make the platform secure for both API providers and API consumers, you must also implement an authentication process.

Throttling

Throttling is a process used to control the usage of APIs by consumers during a given period.

Imagine a scenario where Service A consumes data from Service B. With a throttling mechanism in place, you as the service owner define the limits of this relationship: in other words, the number of calls (requests) Service A can make to Service B in an interval of one hour, for example.

Here is a real example: what happens if Service A starts calling Service B endlessly because a new bug was introduced in the code, such as a service call inside a loop without an exit condition?
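To make the idea of "a number of calls per interval" concrete, here is a minimal sketch of a fixed-window throttle keyed by consumer. The class name FixedWindowThrottle and its method are hypothetical, and the current time is passed in as a parameter only to keep the example easy to test; a real service would more likely rely on the throttling features of its gateway or proxy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical fixed-window throttle: each consumer gets `limit` calls per window.
final class FixedWindowThrottle {

    // Immutable snapshot of one consumer's current window.
    private static final class Window {
        final long startMillis;
        final long count;

        Window(long startMillis, long count) {
            this.startMillis = startMillis;
            this.count = count;
        }
    }

    private final long limit;
    private final long windowMillis;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    FixedWindowThrottle(long limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    // Returns true if the call is allowed, false if the consumer exceeded its quota.
    boolean tryAcquire(String consumerId, long nowMillis) {
        Window updated = windows.compute(consumerId, (id, w) -> {
            if (w == null || nowMillis - w.startMillis >= windowMillis) {
                return new Window(nowMillis, 1);           // first call of a new window
            }
            return new Window(w.startMillis, w.count + 1); // same window: count the call
        });
        return updated.count <= limit;
    }
}
```

With a limit of 1000 and a window of 3,600,000 milliseconds, Service A could make 1000 calls to Service B per hour, and every further call in that hour would be rejected without affecting other consumers.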

You never know how often your APIs will be called since it depends on many aspects, from business rules to fallback mechanisms like circuit-breakers.

You cannot control what your consumers do, but you can control how your system reacts to it.

A throttling mechanism also gives you data points that can inform decisions about scaling your API.

To throttle your consumers efficiently, you need to know who they are. You don't want to penalize good consumers; you just need to control the possible offenders.

Design your system for failure, and consider the safety aspects as well.

Resource access restriction

Microservices are autonomous blocks of code that do one thing well.

Consumers interact with a service through operations, which are the API's interface to the outside world. It's common for a single API to support different kinds of operations. For example, some consumers may have permission to CREATE resources but not to DELETE them.

A good example is Amazon's marketplace API for managing orders. In this API, a given seller can manage only their own orders, not the orders of another seller.

As an API provider, you need to cover these use cases to keep your APIs secure, and the only way to achieve that is through an authentication and authorization layer between your APIs and their consumers.
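The two kinds of restriction described above (which operations a consumer may perform, and whose resources it may touch) can be sketched as a single check. The types Operation, ApiConsumer and OrderAccessPolicy are hypothetical names chosen for illustration; they do not describe how Amazon or any other provider implements it.

```java
import java.util.EnumSet;
import java.util.Set;

// Operations an API may expose on a resource.
enum Operation { CREATE, READ, DELETE }

// Hypothetical view of an authenticated consumer.
final class ApiConsumer {
    final String sellerId;
    final Set<Operation> allowedOperations;

    ApiConsumer(String sellerId, Set<Operation> allowedOperations) {
        this.sellerId = sellerId;
        this.allowedOperations = EnumSet.copyOf(allowedOperations);
    }
}

final class OrderAccessPolicy {
    // Allowed only if the consumer holds the permission AND owns the order.
    static boolean canAccess(ApiConsumer consumer, Operation operation, String orderSellerId) {
        return consumer.allowedOperations.contains(operation)
                && consumer.sellerId.equals(orderSellerId);
    }
}
```

A consumer created with EnumSet.of(Operation.CREATE, Operation.READ) for seller-1 could create and read seller-1's orders, but could neither delete them nor read the orders of seller-2.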

Designing an authentication layer

Sidecar proxy

This is a pattern for cloud architectures.

A sidecar proxy is a proxy running on the same host where your Microservice, or container, is deployed, as a process separate from the Microservice's own. This proxy has built-in functionality such as authentication/authorization, retry policies, routing, error handling and circuit breakers.

This approach is interesting because your application is still a Microservice. The non-functional aspects are implemented not in the service itself, but in the sidecar proxy.

One downside of this approach is that you have more components in your architecture to monitor and deploy. To reduce this complexity, you can use a service mesh solution like Istio and deploy it with Kubernetes.

Sidecar proxy pattern (source: Microsoft)

API Gateway

Another alternative is to use an API Gateway.

An API Gateway runs in front of your APIs. The advantage is that authentication and authorization are delegated to this layer, leaving your Microservice responsible only for the business logic.

One difference between this approach and the sidecar proxy is that you have just one component instead of many. It could be easier to deploy and monitor, but it is potentially less resilient.

An API Gateway is a single entry point for all of your APIs, so it naturally becomes a potential bottleneck or a single point of failure in your architecture.

This approach is independent of your cloud infrastructure model: you can run a solution such as Kong on-premises, or use a managed API Gateway like AWS API Gateway.

API Gateway pattern (source: Nginx)

Shared library

For some use cases, mostly when you don't need a common authoritative entity for all of your APIs, I personally prefer this approach.

For example, when you need to authenticate a subset of APIs related to a specific domain like pricing.

The advantage of this approach is that you get the best of both worlds: authentication/authorization rules with a certain level of isolation, without a single point of failure in your architecture.

The disadvantage of this approach is that you partially lose the technology agnosticism inherent to a Microservices architecture. Because it is a shared library, all of your Microservices have to be written in the language the library uses, or at least run on the same runtime, as with JVM-based applications.

Breaking down the shared library approach

The pricing platform is used by the commercial team at B2W to price all the products on their websites. There are several ways (models) to price a product, and this leads to a lot of business logic.

Microservices are a good fit for cases like this, because you can split the business rules along well-defined boundaries through good API contract definitions.

The Pricing platform has around 20 Microservices, all of them implemented in Java. These Microservices manage many resources that demand different access patterns.

JWT and stateless authentication

JSON Web Tokens (JWT) are an open industry standard method for representing claims securely between two parties.

It's a way to implement stateless authentication, which means you don't need to persist the state of authenticated users (sessions) in any data store.

With JWT, the session state is carried in the token itself, and you can include other relevant data to authorize, not only authenticate, API consumers. That means you don't need to fetch this data from a database, which makes your service faster and less demanding on computational resources.

A JSON Web Token (JWT) looks like this:

Anatomy of a token: encoded and decode (source: jwt.io)

I'm not going to dive deep into the details of JWT. All you need to know for this post is:

  • A JWT carries a JSON payload (the claims) that is signed with a secret key; the standard also allows the token to be encrypted (JWE);
  • The header, payload and signature are each base64url-encoded, so a token that is only signed can be read by anyone who holds it;
  • One token represents a user session;
  • It's secure because the token cannot be forged or altered without the secret key;
  • The token travels in a header of each request from one API to another, much like Basic Auth credentials.
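To illustrate the structure of a token, the sketch below splits a JWT into its three dot-separated segments and decodes the header or the payload. The class JwtAnatomy is a hypothetical helper written with the standard library only. It deliberately does not check the signature, so it shows how a token is laid out and not how to trust one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper that exposes the layout header.payload.signature.
final class JwtAnatomy {

    // Returns the decoded JSON of segment 0 (header) or segment 1 (payload).
    static String decodeSegment(String token, int index) {
        String[] segments = token.split("\\.");
        if (segments.length != 3) {
            throw new IllegalArgumentException("expected header.payload.signature");
        }
        if (index < 0 || index > 1) {
            throw new IllegalArgumentException("only the header (0) and payload (1) are JSON");
        }
        // JWT segments use the URL-safe base64 alphabet without padding.
        byte[] json = Base64.getUrlDecoder().decode(segments[index]);
        return new String(json, StandardCharsets.UTF_8);
    }
}
```

Note that no secret key is needed to read the payload of a signed token, which is why sensitive data should not be placed in it unless the token is also encrypted.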

In the diagram below, the server is one API/Microservice.

Flowchart for creating a new user session with JWT. (Source: Toptal)

Shared library, also known as the security module

This library is loaded by each Microservice and encapsulates all the logic needed to authenticate and authorize API consumers.

We use spring-security to implement the business rules of the authorization process. With spring-security, you can write less complex code by relying on annotations and POJOs.

The only requirement of the security module is to intercept all HTTP requests. We use interceptors because our services are built with Spring Boot, but you can also use Servlet Filters for that.

Intercepting an API request made by a consumer

When a new request arrives at a Microservice, the typical flow in the security module is:

  1. intercept the request;
  2. get the token value (string) from the HTTP header X-PRC-TOKEN;
  3. decode the base64 string;
  4. decrypt the JWT, resulting in a JSON string with all the consumer session data;
  5. apply the authorization rules using the data represented by the JSON string.
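The flow above can be sketched with the standard library alone. This is not the actual security module: the class TokenInterceptorSketch is hypothetical, it assumes a token signed with HMAC-SHA256 and verifies that signature in place of the decrypt step, and it assumes the claims are named exp and role. Claims are extracted with naive string matching only to keep the example short; real code should use a JWT library and a JSON parser.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of the request flow for an HS256-signed token.
final class TokenInterceptorSketch {
    static final String TOKEN_HEADER = "X-PRC-TOKEN";
    private static final Pattern EXP = Pattern.compile("\"exp\"\\s*:\\s*(\\d{1,18})");
    private final byte[] secret;

    TokenInterceptorSketch(byte[] secret) {
        this.secret = secret.clone();
    }

    // Issuer-side helper: builds header.payload.signature for a JSON payload.
    String issue(String payloadJson) {
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        String header = enc.encodeToString(
                "{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = enc.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        return header + "." + payload + "." + sign(header + "." + payload);
    }

    // HMAC-SHA256 over "header.payload", base64url-encoded without padding.
    private String sign(String signingInput) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] raw = mac.doFinal(signingInput.getBytes(StandardCharsets.US_ASCII));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // True only if the token is authentic, unexpired and carries the required role.
    boolean isAllowed(Map<String, String> headers, String requiredRole, long nowEpochSeconds) {
        String token = headers.get(TOKEN_HEADER);            // read the token header
        if (token == null) return false;
        String[] parts = token.split("\\.");
        if (parts.length != 3) return false;

        // Verify the signature in constant time before trusting the payload.
        byte[] expected = sign(parts[0] + "." + parts[1]).getBytes(StandardCharsets.US_ASCII);
        byte[] actual = parts[2].getBytes(StandardCharsets.US_ASCII);
        if (!MessageDigest.isEqual(expected, actual)) return false;

        String json;
        try {                                                // decode the base64url payload
            json = new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return false;
        }

        // Apply the rules: reject expired tokens, then check the role claim.
        Matcher exp = EXP.matcher(json);
        if (!exp.find() || Long.parseLong(exp.group(1)) <= nowEpochSeconds) return false;
        return json.contains("\"role\":\"" + requiredRole + "\""); // naive, whitespace-sensitive
    }
}
```

In a Spring Boot service, a check like isAllowed would typically be called from an interceptor or a Servlet Filter, which would reject the request when it returns false.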

Renewing the validity of the token

The security module is also responsible for token renewal.

One piece of information embedded in a token is its expiration time. If a consumer is constantly interacting with the system through API calls, the security module renews the validity of the token automatically.

Token renewal is performed asynchronously by another thread. Because it is a best-effort process, the security module doesn't wait long for the result of this operation.

If the renewal thread finishes before the main thread does, the new token is returned in the response headers.
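A minimal sketch of this best-effort renewal, assuming a CompletableFuture is used for the background work: the renewal runs on another thread while the request is handled, and the new token is attached only if it is already available when the handler returns. The class BestEffortRenewal is hypothetical, and reusing X-PRC-TOKEN as the name of the response header is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

// Hypothetical sketch: renew the token in the background, never block on it.
final class BestEffortRenewal {

    // Returns the response headers; they carry a new token only if renewal finished in time.
    static Map<String, String> handle(Supplier<String> renew, Runnable handler, Executor executor) {
        // Start the renewal on the given executor; a failed renewal simply yields no token.
        CompletableFuture<String> renewal =
                CompletableFuture.supplyAsync(renew, executor).exceptionally(error -> null);

        handler.run(); // main request processing on the caller's thread

        Map<String, String> responseHeaders = new HashMap<>();
        String renewed = renewal.getNow(null); // do not wait: take the result only if ready
        if (renewed != null) {
            responseHeaders.put("X-PRC-TOKEN", renewed); // assumed response header name
        }
        return responseHeaders;
    }
}
```

When the renewal is not ready in time, the response simply carries no new token, and the consumer keeps using its current one until a later call succeeds in renewing it.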

We have an agreement with our API consumers to use the returned token for the following calls.

A new token has to be generated because the expiration time is one of the token's attributes.

With this feature, consumers no longer need to call a separate renew operation, which drastically reduces chattiness in our system.

Lessons learned

Protect your domain layer

In the first versions of this module, we protected the domain layer only by intercepting the Controller methods. A scheduled task (worker), however, could still reach the domain layer by injecting Service classes directly.

We changed our application model to avoid situations like that by introducing a proxy layer.

This new layer is the only one that can be injected into Controllers and Tasks, and the only one with access to the Domain layer. All the interceptors now intercept the methods of this layer.
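The layering can be illustrated with a plain proxy class, without any framework. The names PriceService, DomainPriceService and SecuredPriceService are hypothetical, and the authorization check is reduced to a BooleanSupplier; in the real module this role is played by interceptors around the proxy layer's methods.

```java
import java.util.function.BooleanSupplier;

// Contract shared by the proxy layer and the domain layer.
interface PriceService {
    long priceInCents(String sku);
}

// Domain layer: business rules only, no security code.
final class DomainPriceService implements PriceService {
    @Override
    public long priceInCents(String sku) {
        return 1999L; // placeholder for a real pricing rule
    }
}

// Proxy layer: the only implementation Controllers and Tasks should receive.
final class SecuredPriceService implements PriceService {
    private final PriceService delegate;
    private final BooleanSupplier callerIsAuthorized;

    SecuredPriceService(PriceService delegate, BooleanSupplier callerIsAuthorized) {
        this.delegate = delegate;
        this.callerIsAuthorized = callerIsAuthorized;
    }

    @Override
    public long priceInCents(String sku) {
        // Every caller, whether Controller or scheduled task, passes through this check.
        if (!callerIsAuthorized.getAsBoolean()) {
            throw new SecurityException("access denied");
        }
        return delegate.priceInCents(sku);
    }
}
```

Because the domain class is reachable only through the proxy, a scheduled task cannot bypass the authorization rules the way it could when only Controller methods were intercepted.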

Application layers: business rules protected by the proxy pattern

Don't include your JWT secret key in the library code

Another important consideration is the secret key used by the security module to encrypt/decrypt a JWT.

This secret key must not be part of the library code. It must be configurable and supplied through an environment variable or a .env file.

You could store the value in a vault solution and inject it during continuous integration builds.
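As a sketch of keeping the key out of the code, the loader below reads the secret from a map of environment variables and fails fast when it is missing. The class JwtSecretLoader and the variable name JWT_SECRET are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical loader: the secret comes from the environment, never from source code.
final class JwtSecretLoader {
    static final String ENV_NAME = "JWT_SECRET"; // assumed variable name

    // Fails at startup if the secret is missing, instead of falling back to a default.
    static byte[] load(Map<String, String> environment) {
        String value = environment.get(ENV_NAME);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException(ENV_NAME + " is not set");
        }
        return value.getBytes(StandardCharsets.UTF_8);
    }
}
```

At startup, a service would call JwtSecretLoader.load(System.getenv()). Taking the map as a parameter keeps the loader testable without touching the real environment.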

Stateless authentication also means you don't have control to revoke tokens

Stateless authentication is a good alternative when scalability and availability are major requirements for your application.

You don't have a strong dependency, you don't add a network hop, and you don't have to query a database to retrieve the session state. But there are trade-offs. One of them is losing control over the active sessions.

Because the expiration time is embedded in the token, you can't revoke active sessions before they expire. One strategy for dealing with this trade-off is to define a short expiration time for each token.

That is what we did.

Because the security module renews the token on each request, this shouldn't be a problem for API consumers.

The more information your token has, the larger it will be

This is another situation you must be aware of. The more session data you include in the token, the larger it will be in bytes.

One downside is that some proxies and load balancers impose a limit on the size of HTTP headers.

We also faced this situation, and our solution was to make the token a first-class citizen in the message body of each request.

For HTTP methods like GET, which have no useful message body (a body is allowed, but it has no defined meaning), we changed the operation to use POST instead.

Don’t use the token value as a query string parameter in GET operations. This is a security risk, since the token could end up in access.log files.

This decision definitely compromises the semantics of the API and concepts like idempotency, so choose what works best in your context.