Authorization Solutions for Microservices Architecture

Olga Kogan
AppsFlyer Engineering
7 min read · Sep 1, 2021


Data security and privacy are among the main concerns for any company today, particularly for companies such as AppsFlyer, which process huge amounts of data daily and serve hundreds of thousands of customers and partners.

AppsFlyer provides advanced analytics solutions and data insights via a SaaS application and public APIs. All data accessible in the application UI or via the APIs must go through an authorization process before being shared externally. Authorization decisions are driven by multiple rules that take into account privacy regulations and policies, partners’ data-sharing policies, and customers’ data consents, on top of the usual user action permissions.

There are a few widespread approaches to handling authorization in a microservices architecture. One is to make the authorization decisions in an API Gateway, since incoming requests usually pass through it before being routed to product-specific microservices. A possible downside is that the gateway can typically make authorization decisions only at the REST endpoint level. If the same endpoint serves different flows, the gateway needs more intimate knowledge of the endpoint’s payload and parameters, which creates code dependencies between the API Gateway and all the microservices.
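To make the limitation concrete, here is a minimal Python sketch of an endpoint-level check as a gateway might perform it. The route table, the permission names, and the `gateway_allows` function are hypothetical and only illustrate the idea; they are not AppsFlyer's gateway code.

```python
from __future__ import annotations

# Hypothetical route table: the gateway only knows the HTTP method and path.
ROUTE_PERMISSIONS = {
    ("GET", "/reports"): "reports:read",
    ("POST", "/reports/export"): "reports:export",
}


def gateway_allows(user_permissions: set[str], method: str, path: str) -> bool:
    """Endpoint-level check: the request payload is never inspected."""
    required = ROUTE_PERMISSIONS.get((method, path))
    if required is None:
        return False  # unknown routes are denied by default
    return required in user_permissions


# Two different flows share one endpoint. The gateway cannot tell them apart
# unless it learns the payload format of the service behind the route.
aggregate_export = {"report_type": "aggregate"}
raw_export = {"report_type": "raw"}
print(gateway_allows({"reports:export"}, "POST", "/reports/export"))
```

Distinguishing the two payloads at the gateway would mean teaching it each service's request schema, which is exactly the coupling described above.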

Another approach, which has been used at AppsFlyer, is a centralized service that provides up-to-date permission settings to each microservice on demand. Each microservice then authorizes its requests by coding its own authorization decision logic on top of the acquired permission definitions. However, this approach also has a downside:
AppsFlyer runs a microservices architecture with hundreds of daily deployments, constantly evolving the existing microservices and adding new ones in order to provide the most innovative products to our customers and partners.

The challenge in this setup is making consistent and correct authorization decisions when hundreds of microservices consume and share the data while the codebase of every such microservice is modified daily to include new functionality. On top of that, as data privacy regulations evolve and new regulations appear, the authorization logic has to be adapted from time to time to stay compliant. When hundreds of microservices must each modify their authorization logic, every change becomes very expensive and consumes a lot of time across all the development teams. It is also very difficult to verify that all the services make correct and consistent authorization decisions when the business logic of the decision is coded separately in every microservice.

An old saying comes to mind: “When the wrong man uses the right means, the right means work in the wrong way.” Yes, all microservices need to authorize requests before sharing data, but putting the authorization logic in each and every microservice is a perfect example of using the right means in the wrong way.

Before we dive into our new authorization solution, let’s talk about policy-based access control (PBAC), which can be viewed as an evolution of attribute-based access control (ABAC). PBAC is a method of managing resource access and authorization decisions using policies that rely on different pieces of information: resource attributes, but also environment conditions and possibly the set of circumstances at the time of the authorization request. This provides more accurate context and more flexibility when making the authorization decision compared to ABAC. Because PBAC is a method and not a protocol or a language, any format can be used for managing the policies.
Open Policy Agent (OPA) is a unified framework for managing policies and is a graduated project of the Cloud Native Computing Foundation (CNCF), having met the foundation’s criteria for community growth and project adoption. It provides a flexible declarative policy language, Rego, for defining policies and a general-purpose rules engine for evaluating them. The evaluation engine uses both policies and data (aka facts) to reach a decision.

OPA can be used to manage different kinds of policies, for example policies related to the operation of cloud services, management of CI/CD pipelines, service routing, and various other use cases. When OPA is used as an authorization decision engine, the Data contains the pre-loaded permission definitions and the Policy contains the business logic that uses the Data together with additional pieces of information controlled by the policy rules.
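As an illustration of how a service asks OPA for a decision, the Python sketch below builds a query against OPA's REST Data API (`POST /v1/data/<path>` with an `input` document). The policy path `authz/allow`, the input fields, and the local URL are assumptions made for the example, not AppsFlyer's actual policy layout.

```python
import json
import urllib.request


def build_decision_request(opa_url: str, policy_path: str, input_doc: dict) -> urllib.request.Request:
    """Build a POST to OPA's Data API: /v1/data/<policy path> with an "input" document."""
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url=f"{opa_url}/v1/data/{policy_path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_decision(opa_url: str, policy_path: str, input_doc: dict):
    """Send the request and return the "result" field (None if the rule is undefined)."""
    request = build_decision_request(opa_url, policy_path, input_doc)
    with urllib.request.urlopen(request, timeout=2) as response:
        return json.load(response).get("result")


# Hypothetical policy path and input; nothing is sent until query_decision is called.
request = build_decision_request(
    "http://localhost:8181",
    "authz/allow",
    {"user": "alice", "action": "read", "resource": "app-123"},
)
print(request.get_method(), request.full_url)
```

The policy evaluates the `input` document against the permission data already loaded into OPA, so the caller only supplies the facts of the current request.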

Now we are ready to discuss our new solution that controls the authorization logic in one place and uses flexible policy-based rules to provide an authorization decision for every action that is performed in all AppsFlyer products.
When thinking about the authorization process, it is important to distinguish between the authorization decision about what can or cannot be done (in policy language, “what is allowed”) and the enforcement of that decision, which may include restricting access to certain data, masking or transforming the data, or any other manipulation required to prepare the data before sharing it with the requester.

This simple separation of concerns allows the microservices to delegate the authorization decision to the decision engine by querying whether a specific identity is allowed to perform an action and, if so, which data restrictions must be applied. The microservice’s responsibility is then to enforce the authorization decision by preparing the data and sharing only what is allowed.
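The split between decision and enforcement can be sketched in a few lines of Python. The `Decision` shape (a verdict plus a set of fields to mask) and the `enforce` function are hypothetical; the real decision payload is not described in this article.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Decision:
    """What the decision engine returns: a verdict plus data restrictions."""
    allowed: bool
    masked_fields: frozenset = field(default_factory=frozenset)


def enforce(decision: Decision, rows: list) -> list:
    """Enforcement lives in the microservice: apply the decision to the data."""
    if not decision.allowed:
        raise PermissionError("action is not allowed")
    return [
        {key: ("***" if key in decision.masked_fields else value) for key, value in row.items()}
        for row in rows
    ]


# In a real service the decision would come from the decision engine.
decision = Decision(allowed=True, masked_fields=frozenset({"device_id"}))
rows = [{"campaign": "summer", "device_id": "abc-123", "installs": 42}]
print(enforce(decision, rows))
```

The microservice never evaluates permission rules itself; it only applies whatever restrictions the decision carries.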

Our authorization solution is split into Permission Management, which is responsible for the permissions configuration, and the Decision Engine, which makes the authorization decision at runtime. We used Open Policy Agent (OPA) as the rule engine for all our authorization policies.

The split between Permission Management and the Decision Engine follows the pattern of separating the responsibilities of a Control Plane and a Data Plane. Permission Management is the Control Plane: it is the heart of the solution and contains the definitions of all the permissions and data consents. The Decision Engine is the Data Plane that actually processes the authorization requests according to the defined permissions and policies.

The permissions configuration is managed in a Permission Management repository (the control plane) and then exported and loaded into OPA (the data plane) for quick and efficient runtime authorization decisions, which use flexible policies defined in OPA’s policy language, Rego.

Since a user may modify the permissions configuration at any time and the change has to be applied immediately, we manage the permissions configuration separately for each customer and partner account and update it dynamically in OPA whenever a change occurs and new authorization rules have to be applied.
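One way to update a single account's permissions without reloading everything is OPA's REST Data API, where `PUT /v1/data/<path>` creates or overwrites the document at that path. The Python sketch below assumes a hypothetical per-account path layout (`permissions/accounts/<account id>`) and a made-up permissions document; the article does not specify how the export is actually wired.

```python
import json
import urllib.request


def build_account_update(opa_url: str, account_id: str, permissions: dict) -> urllib.request.Request:
    """Build a PUT that overwrites one account's permissions document in OPA."""
    body = json.dumps(permissions).encode("utf-8")
    return urllib.request.Request(
        # Hypothetical layout: one data document per account.
        url=f"{opa_url}/v1/data/permissions/accounts/{account_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def push_account_update(opa_url: str, account_id: str, permissions: dict) -> int:
    """Send the update and return the HTTP status code."""
    update = build_account_update(opa_url, account_id, permissions)
    with urllib.request.urlopen(update, timeout=2) as response:
        return response.status


# Nothing is sent here; the request is only constructed.
update = build_account_update(
    "http://localhost:8181",
    "acc-42",
    {"users": {"alice": ["reports:read"]}},
)
print(update.get_method(), update.full_url)
```

Keeping one document per account means a change touches only that account's data, so other accounts' decisions are unaffected by the update.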

Note also that the permissions configuration is tightly connected to the resources or entities that are managed by your application. When a new entity is created, the existing authorization policies should be smart enough to apply to the newly created entity as well. We manage our entities in the graph database Neo4j and chose to store the linked permission definitions as part of the extended graph in the same repository. This is very helpful when managing the dependencies between the lifecycle of the entities and the permission settings on those entities.
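The benefit of keeping permissions next to the entities in one graph can be illustrated with a small in-memory Python sketch. It is not Neo4j code, and the inheritance rule it uses (a grant on a parent entity applies to its descendants) is an assumption chosen for the example rather than a description of AppsFlyer's model.

```python
class EntityGraph:
    """Toy entity graph with permission grants attached to nodes."""

    def __init__(self):
        self._parent = {}  # entity id -> parent entity id (or None)
        self._grants = {}  # entity id -> {user: set of actions}

    def add_entity(self, entity_id, parent_id=None):
        self._parent[entity_id] = parent_id

    def grant(self, entity_id, user, action):
        self._grants.setdefault(entity_id, {}).setdefault(user, set()).add(action)

    def is_allowed(self, user, action, entity_id):
        """Walk up the graph until a matching grant is found."""
        current = entity_id
        while current is not None:
            if action in self._grants.get(current, {}).get(user, set()):
                return True
            current = self._parent.get(current)
        return False


graph = EntityGraph()
graph.add_entity("account-1")
graph.grant("account-1", "alice", "read")

# An entity created later is covered by the existing grant without any extra step.
graph.add_entity("app-new", parent_id="account-1")
print(graph.is_allowed("alice", "read", "app-new"))
```

Because the grant and the entity live in the same structure, creating or deleting an entity cannot leave the permissions pointing at something that no longer exists.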

So, when the permissions change as a result of user configuration changes or entity lifecycle changes, the new permission definitions are exported immediately and propagated to all the nodes of the Decision Engine cluster. Currently we deploy the Decision Engine cluster as a centralized service that all other microservices talk to. The other option we considered is deploying it as a sidecar for each microservice. We chose the centralized service because it supports our capacity requirements without the overhead of managing sidecars and propagating the policies and permissions data to all of them. But we keep our options open, considering the constant rapid growth of AppsFlyer’s customer and partner base, and may rethink this in the future.

One notable challenge of the centralized approach is memory consumption at scale. OPA is an in-memory rule engine, and since we have hundreds of thousands of customers, loading all of the permission settings into the memory of a single service can become expensive and inefficient. However, to make quick decisions for runtime authorization requests, OPA does not actually need every customer’s permissions data, only the data of the accounts that are active at a given time. We therefore built a lazy loading mechanism that loads permissions data on demand and caches it in OPA with a defined TTL. Eviction from OPA is based on LRU, keeping only the active accounts in memory.
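The caching policy described above (load on first use, expire after a TTL, evict the least recently used account when full) can be sketched in Python as follows. This is an in-process illustration only: the class name, the `loader` callback, and the limits are hypothetical, and the actual mechanism manages data inside OPA rather than in a Python dictionary.

```python
import time
from collections import OrderedDict


class AccountPermissionsCache:
    """Lazy-loading cache with per-entry TTL and LRU eviction."""

    def __init__(self, loader, max_accounts, ttl_seconds, clock=time.monotonic):
        self._loader = loader          # fetches permissions for one account
        self._max_accounts = max_accounts
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # account id -> (expires_at, permissions)

    def get(self, account_id):
        now = self._clock()
        entry = self._entries.get(account_id)
        if entry is not None and entry[0] > now:
            self._entries.move_to_end(account_id)  # mark as most recently used
            return entry[1]
        # Miss or expired entry: load on demand and (re)insert as most recent.
        permissions = self._loader(account_id)
        self._entries[account_id] = (now + self._ttl, permissions)
        self._entries.move_to_end(account_id)
        while len(self._entries) > self._max_accounts:
            self._entries.popitem(last=False)  # evict the least recently used
        return permissions

    def __contains__(self, account_id):
        return account_id in self._entries


loads = []


def load_from_store(account_id):
    loads.append(account_id)
    return {"account": account_id}


cache = AccountPermissionsCache(load_from_store, max_accounts=2, ttl_seconds=60)
for account in ["a", "b", "a", "c"]:
    cache.get(account)
print(loads, "b" in cache)
```

In the usage example, account `a` is read again before `c` arrives, so `b` is the least recently used entry and is the one evicted.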

One of the biggest engineering challenges lies not only in building a new service but also in ensuring its smooth adoption, especially when the service is used by almost all other microservices. The new service is not just a replacement of APIs but a change in paradigm: instead of retrieving the permission definitions and making the decision itself, each microservice delegates the authorization decision to the new service.

We are still in the midst of gradual adoption, but we are already seeing the benefits of this approach: adoption accelerates when new authorization requirements are introduced and microservice owners must choose between continuing the tedious, error-prone implementation of authorization logic and investing once in delegating the authorization decision.
