Fast Authorization with DynamoDB

Peter Smith
Jun 14 · 14 min read

This blog post describes a high-performance micro-service we created at Galvanize to meet the authorization needs of our SaaS platform. This new micro-service, code-named “Authy” (not related to the commercial product of the same name), provides platform-wide management of users and groups, while tracking permissions assigned to all the platform’s domain objects. Amazon’s DynamoDB is used to manage the authorization data in our platform, typically allowing authorization decisions to be made within 20–30ms.

In the past, authorization was handled separately by each service, so different parts of our SaaS platform solved the problem in different ways. With our new Authy service, we’ve unified the approach, providing a consistent model for controlling access to resources. This was especially important as we added new micro-services, and didn’t want all of them to solve the same challenges in their own unique way.

The following diagram shows the authorization workflow. First, a user requests some data from our platform, typically asking for an HTML page, or a JSON response to an API call. The backend services each call upon Authy to validate that the current user has access to the resource they’re requesting. Based on Authy’s response, the backend service either approves or rejects the request.

Note that Authy provides Authorization services, but doesn’t handle Authentication. All HTTP requests from the user are accompanied by a JWT token confirming the user’s identity. Authy then determines whether the resource being accessed (e.g. /projects/1234) is available to the user.

The Requirements

To understand how Authy works, let’s start by learning more about our SaaS product. Galvanize operates in the GRC (Governance, Risk management, and Compliance) space, helping companies be fiscally and socially responsible. This includes fighting accounting fraud, and detecting corporate waste.

From a software perspective, our product is similar to an ERP or CRM solution. It has a beautiful user experience via the web interface, with well-defined REST APIs for programmatic access to data. Multiple tenants access the same platform, but can only see their own organization’s data. Finally, each of our customers potentially has hundreds of users, so we have a powerful user-management system with complex user-defined workflows.

Let’s now learn about the GRC domain model, as well as how users and groups are granted permission to access the domain objects (aka “resources”).

Our SaaS product has a large number of Domain Objects, representing items of interest to the customer. The most important top-level domain objects include:

  • Project — Represents a body of work performed by a user. This includes a schedule for undertaking the project work, Risks identified within the project, Controls to help manage the Risks, Issues identified, and Actions to resolve the Issues.
  • Asset — Represents something of interest to the company, such as a third-party vendor, an IT server, or a piece of software.
  • Collection — A collection of data tables, each containing many rows of data. For example, the access log from a security system showing when people entered/exited a building.
  • Toolkit — A template of product content, such as a pre-defined project with typical risks and controls already defined for common scenarios.

Inside each of these top-level domain objects is a large number of fine-grained domain objects. For example, within a Project, a user would add Risks, Controls, and Issues, each with their own ID number.

The other top-level domain objects, such as Assets, Collections, or Toolkits, are also container-like, holding their own fine-grained domain objects.

Our SaaS platform supports multiple tenants (aka Organizations), where each customer’s data is fully isolated from that of other customers, even though they reside in the same database. All database queries include an orgId field to ensure the correct data is returned.

Within each organization there are multiple Users, which for large customers will number in the thousands. Most customers make use of Groups to help manage those users.

Users and groups can both be assigned permissions to access various domain objects in the system. For example, User John may be assigned write access to a specific Project, whereas the Finance Group may only have read access. Note that Administrators are a special class of Users who automatically have access to everything.

Permissions can only be granted to users or groups on top-level domain objects (such as Projects), which are containers for fine-grained objects such as Risks or Controls. It is not possible, nor is it desirable, to grant permission on fine-grained objects.

This is enforced for several reasons:

  1. Experience shows that when customers can grant user permissions on every domain object, no matter how small, they end up with a lot of support challenges when permissions become confusing to manage.
  2. We have billions of fine-grained domain objects in the system, which would explode the number of permissions that Authy must manage.
  3. Likewise, the number of calls from a backend service to Authy would be excessive if every domain object must be authorized.

Therefore, we chose to only allow permissions on top-level domain objects, while requiring fine-grained objects to inherit permissions from their parent container. To make this work in a practical way, the permissions provided on the top-level objects are very expressive.

For example, let’s imagine that John has the following permissions, set on a specific Project:

  • CAN_CREATE_RISKS — This allows John to create a new (fine-grained) Risk object, but only in the context of this specific Project.
  • CAN_CREATE_ISSUES — Similar to the previous example, John can also create Issue objects inside this Project.

Note that John cannot create other fine-grained objects, such as Controls or Actions, since he doesn’t have those permissions on the parent Project.

Next, John may also have permissions to read the existing domain objects in the Project:

  • CAN_READ_RISKS — This permits John to read the content of any Risk object in the Project.
  • CAN_READ_ISSUE_IF_OWNER — This allows John to read Issues in the Project, but only if his name appears in the owner field of the Issue. He doesn’t have permission to read any other Issues that he’s not the owner of.

This second example illustrates how permissions become more fine-grained, while still only being set on the Project itself. Once we’ve moved fully over to Authy, we anticipate having hundreds of possible permissions. At a minimum, this includes create, read, update, delete, and list permissions for each of the possible fine-grained objects.

Now that we understand the domain objects we’re authorizing access to, let’s learn about the queries that Authy must respond to.

The Queries

An important goal for Authy is to respond to queries within 20–30ms, so as to avoid negative impact on the end-user response time. Authy’s algorithms were optimized to reduce the run time of a single query, but also to reduce the number of calls necessary to validate an incoming customer request.

The following are typical queries to Authy, based on the type of information a backend service needs when authorizing an incoming request. We did a comprehensive review of our platform’s authorization requirements to ensure that Authy would cover them all.

The common queries are:

  1. Give me the personal details for a user — This is needed throughout the platform when displaying a user’s name or sending them email.
  2. Give me the list of groups in this organization, and their members — This is used for display purposes only, but is not used for authorization.
  3. Give me the list of groups that this user belongs to — Also used for display purposes only, but not for authorization.
  4. Can this user perform this operation on this resource (aka domain object)? — This is a very common question, accounting for most of the queries to Authy. The user’s ID comes from the JWT of the original HTTP request, and the resource ID comes from the URL of the request (such as /projects/1234). Finally, the necessary permission (for example, CAN_READ_PROJECT) depends on the semantics of the HTTP operation, such as GET versus PUT.
  5. What operations can this user perform on this resource? — This is similar to the previous query, but rather than providing a single yes/no answer, Authy provides the complete set of permissions a user has for that resource (such as [ CAN_READ_PROJECT, CAN_UPDATE_PROJECT, CAN_READ_RISKS, … ]). This reduces the need to send multiple requests to inquire about multiple permissions.
  6. Which users in the organization can perform this operation on this resource? — This is not used as much as the previous queries, but is very useful to identify users in the organization with a base set of permissions. For example, to populate a drop-down list of users who could be asked to review an Issue object, we’ll ask Authy for all users who have the CAN_REVIEW_ISSUE permission. There’s no point in assigning a user to review something if they don’t have permission to do so.
  7. Which resources can this user perform this operation on? — This query is very important for listing the objects a user can act upon. For example, when the user requests a list of Projects, they should only see the Projects they have CAN_READ_PROJECT permission for. Other Projects should not be listed.
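To make query 4 concrete, here is a minimal sketch of how a backend service might assemble the question it sends to Authy. The `buildQuery` helper, the `AuthorizationQuery` shape, and the method-to-permission table are assumptions made for illustration; only the permission names and the /projects/1234 URL style come from this post.

```typescript
// Hypothetical sketch: derive the question for Authy from an HTTP request.
type HttpMethod = "GET" | "PUT" | "DELETE";

// Assumed mapping from HTTP semantics to Project permissions.
const PROJECT_PERMISSIONS: Record<HttpMethod, string> = {
  GET: "CAN_READ_PROJECT",
  PUT: "CAN_UPDATE_PROJECT",
  DELETE: "CAN_DELETE_PROJECT",
};

interface AuthorizationQuery {
  userId: string;     // taken from the JWT of the original request
  resourceId: string; // taken from the URL, e.g. "1234"
  permission: string; // derived from the HTTP method
}

function buildQuery(
  userId: string,
  method: HttpMethod,
  path: string,
): AuthorizationQuery | null {
  // Only Project URLs of the form /projects/<id> are handled in this sketch.
  const match = /^\/projects\/(\d+)$/.exec(path);
  const resourceId = match?.[1];
  if (resourceId === undefined) {
    return null;
  }
  return { userId, resourceId, permission: PROJECT_PERMISSIONS[method] };
}
```

A service would then pass the resulting query to Authy and allow the request only if the answer is yes.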

When designing Authy, we identified several queries that were commonly asked, but ended up being anti-patterns.

  1. Is the user an administrator? — In our platform, administrators have the power to perform any operation on any resource in the system. However, instead of explicitly asking Authy whether the user is an administrator, the query should only ask about the specific permission, such as CAN_DELETE_PROJECT. If Authy responds with “Yes”, then it doesn’t matter whether the user is an administrator, or whether they got the permission some other way. All that matters is that they have the permission.
  2. Does this group have a specific permission for this resource? — Initially this seemed like an important variation on asking whether a user has permission, but in reality, both users and groups get folded into the same query anyway. For example, although the Sales group might have CAN_READ_ISSUE permission, the request to Authy will actually ask about Frank, the logged-in user who is a member of the Sales group. Therefore, Frank may either get the CAN_READ_ISSUE permission by being part of Sales, or by explicitly being assigned that permission on the parent Project. In either case, he has the permission.
  3. Can this user access this Risk? — As mentioned above, a Risk is a fine-grained domain object that resides inside a Project, and Authy doesn’t track permissions on fine-grained objects. To answer this query, the backend service first determines the ID of the Project containing the Risk, then asks whether the CAN_READ_RISK permission is set on the Project.
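The third anti-pattern can be sketched in a few lines. Both callbacks below are hypothetical stand-ins: `findParentProject` represents whatever lookup the backend service uses to find a Risk’s Project, and `hasPermission` represents a call to Authy. Neither is Authy’s real interface.

```typescript
// Hypothetical sketch: authorize a fine-grained Risk via its parent Project.
type FindParentProject = (riskId: string) => Promise<string | undefined>;
type HasPermission = (
  userId: string,
  projectId: string,
  permission: string,
) => Promise<boolean>;

async function canReadRisk(
  userId: string,
  riskId: string,
  findParentProject: FindParentProject, // assumed backend-service lookup
  hasPermission: HasPermission,         // assumed wrapper around Authy
): Promise<boolean> {
  const projectId = await findParentProject(riskId);
  if (projectId === undefined) {
    return false; // unknown Risk: deny by default in this sketch
  }
  // The question is asked about the Project, never about the Risk itself.
  return hasPermission(userId, projectId, "CAN_READ_RISK");
}
```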

Now that we understand the domain objects, their permissions, and the common authorization queries, it’s time to dive into the implementation, using the DynamoDB database.

The DynamoDB Solution

When describing how Authy stores permission assignments in DynamoDB, we’ll focus exclusively on these common queries:

  1. What operations can the user perform on a specific resource?
  2. Who are all the users in the organization who can perform this operation on this resource?
  3. Which resources can this user perform this operation on?

Our DynamoDB table has four main attributes of interest, allowing these queries to be answered:

The first attribute is the table’s Partition Key, used to separate one organization’s data from another. Its value is simply an organization number, such as 47.

The second attribute, the DynamoDB Sort Key, identifies the user or group the permission assignment is for.

  • For a user, the value will be u|<user-id>|<resource>.
  • For a group, the value will be g|<group-id>|<resource>.
  • To assign the permission to everybody in the organization: *|<resource>.

Note the <resource> suffix is only required because DynamoDB needs each record to have a unique Partition Key / Sort Key combination, so <resource> makes them unique. The next section explains the format for <resource>.

The third attribute specifies which resource (aka domain object) the permissions are for. We have multiple top-level domain objects, so we combine the resource’s type with the resource’s ID. The format is <resource-type>|<resource-id>, where <resource-type> is a digit identifying the type of domain object that <resource-id> refers to.

  • 0 = The whole organization.
  • 1|<project-id> = A Project domain object.
  • 2|<collection-id> = A Collection domain object.
  • 3|<asset-type-id> = An Asset Type domain object.
  • 4|<toolkit-id> = A Toolkit domain object.

This list can easily be expanded to include new top-level domain objects.
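The Sort Key and resource formats described above can be sketched as two small helpers. The names `resourceKey`, `sortKey`, `ResourceType`, and `Principal` are invented for this illustration; only the string formats and the type digits come from this post.

```typescript
// Hypothetical helpers that build the key strings described above.
enum ResourceType {
  Organization = 0,
  Project = 1,
  Collection = 2,
  AssetType = 3,
  Toolkit = 4,
}

// Builds "<resource-type>|<resource-id>", or just "0" for the whole organization.
function resourceKey(type: ResourceType, id?: string): string {
  if (type === ResourceType.Organization) {
    return "0";
  }
  if (id === undefined) {
    throw new Error("a resource id is required for this resource type");
  }
  return `${type}|${id}`;
}

type Principal =
  | { kind: "user"; id: string }
  | { kind: "group"; id: string }
  | { kind: "everyone" };

// Builds the Sort Key: who the assignment is for, followed by the resource.
function sortKey(principal: Principal, resource: string): string {
  switch (principal.kind) {
    case "user":
      return `u|${principal.id}|${resource}`;
    case "group":
      return `g|${principal.id}|${resource}`;
    case "everyone":
      return `*|${resource}`;
  }
}
```

For example, a group with the (made-up) ID `sales` that is assigned permissions on Project 234 would get the Sort Key `g|sales|1|234`.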

The fourth attribute, GrantedSet, is the field in which all the permissions are stored. In other words, the user or group named in the Sort Key is granted the set of permissions listed in this field. For the sake of efficiency, we encode all possible permissions into an enumeration:

export enum Permission {
  // Each value is the bit position of the permission within a Permission Set.
  CAN_CREATE_PROJECT = 0,
  CAN_READ_PROJECT = 1,
  CAN_UPDATE_PROJECT = 2,
  CAN_DELETE_PROJECT = 3,
  CAN_CREATE_RISK = 4,
  CAN_READ_RISK = 5,
  CAN_UPDATE_RISK = 6,
  CAN_DELETE_RISK = 7,
  CAN_READ_RISK_IF_OWNER = 8,
  // ...
}

There could in theory be hundreds or thousands of permissions, with any combination of permissions being valid. We therefore introduce the concept of Permission Set allowing multiple permissions to be specified.

In essence, a Permission Set is implemented as a set of binary bits. For example, if a user is allowed to read and update projects, but not to create or delete them, their permission set will be CAN_READ_PROJECT (bit 1) + CAN_UPDATE_PROJECT (bit 2) = 0b0110, which is 6 in decimal.

To support potentially thousands of permissions in this Permission Set, we use a DynamoDB List field type, with each entry being a 32-bit number. List index 0 contains bits 0–31, with list index 1 containing bits 32–63, and so on. For practical purposes, we only have a few hundred permissions, so this GrantedSet field is typically quite short.
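As a rough sketch of this layout, the helpers below set and test a permission bit in a list of 32-bit words. The names `PermissionSet`, `grant`, and `has` are made up for this example and are not taken from Authy’s code.

```typescript
// A Permission Set as a list of 32-bit words: index 0 holds bits 0-31,
// index 1 holds bits 32-63, and so on.
type PermissionSet = number[];

// Returns a copy of the set with the given permission bit switched on.
function grant(set: PermissionSet, permission: number): PermissionSet {
  const result = set.slice();
  const index = Math.floor(permission / 32);
  while (result.length <= index) {
    result.push(0); // pad with empty words up to the one we need
  }
  // ">>> 0" keeps the word unsigned even when bit 31 is set.
  result[index] = ((result[index] ?? 0) | (1 << (permission % 32))) >>> 0;
  return result;
}

// Tests whether the given permission bit is switched on.
function has(set: PermissionSet, permission: number): boolean {
  const word = set[Math.floor(permission / 32)] ?? 0;
  return ((word >>> (permission % 32)) & 1) === 1;
}
```

Granting CAN_READ_PROJECT (bit 1) and CAN_UPDATE_PROJECT (bit 2) to an empty set yields `[6]`, matching the arithmetic above. A permission numbered 40 would land in list index 1, at bit 8 of that word.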

An Example

Now that we know the format of the DynamoDB table, let’s see an example of how it’s queried. To start with, here’s a subset of the permissions we’d expect to see for a small customer:

  1. Everybody in Organization 47 has permission to read all Projects.
  2. Members of the Sales group can update Project 234.
  3. John can create new Projects and delete all existing Projects.
  4. Mary is an administrator for Organization 47, and can do anything.

Here’s the DynamoDB table, with these rules encoded using the attributes we described above. Before reading further, take a minute to study each line of this table, comparing them to our four permission rules.
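For readers following along in text, the four rules can be written out as records. This is an illustrative reconstruction rather than the original table: the attribute names `orgId`, `target`, and `resource`, the IDs `sales`, `john`, and `mary`, and the use of a single all-ones word for the administrator are all assumptions. The numeric values for the first three rows follow from the enumeration above.

```typescript
// Illustrative reconstruction of the four rules as table rows.
// Attribute names and IDs are assumptions, not the real schema.
interface PermissionRow {
  orgId: number;        // Partition Key
  target: string;       // Sort Key: who the assignment is for, plus the resource
  resource: string;     // "<resource-type>|<resource-id>", or "0" for the organization
  grantedSet: number[]; // Permission Set, 32 bits per list entry
}

const rows: PermissionRow[] = [
  // Line 1: everybody in Organization 47 can read all Projects (bit 1).
  { orgId: 47, target: "*|0", resource: "0", grantedSet: [0b0010] },
  // Line 2: members of the Sales group can update Project 234 (bit 2).
  { orgId: 47, target: "g|sales|1|234", resource: "1|234", grantedSet: [0b0100] },
  // Line 3: John can create and delete Projects (bits 0 and 3).
  { orgId: 47, target: "u|john|0", resource: "0", grantedSet: [0b1001] },
  // Line 4: Mary is an administrator; every bit of the first word is set.
  { orgId: 47, target: "u|mary|0", resource: "0", grantedSet: [0xffffffff] },
];
```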

Now, let’s run some of our common queries to see how the data is accessed. We won’t discuss the full Authy algorithm, but these examples should give you an idea of how it works.

Our first query asks which operations Frank can perform on Project 567. To answer it, we need to consider the different ways that Frank could be given access to Project 567.

  1. He might be directly assigned permissions to Project 567 — In this example, no he isn’t.
  2. He might be a member of a group that is directly assigned permissions to Project 567 — He’s in Sales, but that group is not explicitly assigned permissions to this Project.
  3. He might be directly assigned permissions on the whole Organization, which implies he has those permissions for all Projects in the Organization — No, not in this example.
  4. Similarly, he might be in a group that has Organization-level permissions — No, not in this case either.
  5. All members of the Organization might be given permissions to Project 567 — No, not in this example.
  6. All members of the Organization might have the permissions for the whole Organization — Yes, all users have CAN_READ_PROJECT set, as shown on Line 1 of the DynamoDB table.

That’s certainly a lot of things to check in order to determine that Frank has CAN_READ_PROJECT permission for Project 567. Let’s see how we’d do this in DynamoDB. Clearly we want to minimize the number of queries we perform.

We start by performing three DynamoDB queries in parallel:

  1. We fetch the list of groups that Frank belongs to. We didn’t show the DynamoDB schema for group membership, but it’s fairly straightforward.
  2. In parallel, we request all the records related to Project 567, using a DynamoDB Secondary Index. This returns the list of permission assignments, regardless of whether they’re for Frank (starting with u|frank|), groups (starting with g|), or for everybody in the organization (the * wildcard value). We then use DynamoDB’s query filter to discard user records that aren’t for Frank, but we still need to return all the group records because we don’t yet know which groups Frank belongs to. That is only known once Step 1 (above) completes.
  3. In parallel, we request all records for the organization-level resource. This is where we specify permissions that apply to all Projects, not just for a single Project.

Once these three DynamoDB queries have completed, we perform an in-memory filter on the two resource lists (steps 2 and 3 above) by keeping only the records for Frank, or for any of the groups that Frank belongs to. All other records are discarded.

For all records that we didn’t throw out, we perform a bitwise OR operation to merge the individual permission sets into a single permission set, which is then returned from Authy. In this particular example, only Line 1 of our table is relevant to Frank, so the final answer is [ 2 ] (CAN_READ_PROJECT).

Note that we’re assuming here that each resource will have a limited number of rows in the table, otherwise this operation would be very intensive. This is typically true, since our smaller customers have only a few users, and our larger customers tend to place their users into a small number of groups. The merge algorithm will therefore not be too complex.
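The flow above can be sketched as follows, with the three DynamoDB queries replaced by a hypothetical `Store` interface so that the example runs without a database. The names `Store`, `Row`, `union`, and `permissionsFor` are assumptions, and unlike the real service this sketch does all of its filtering in memory rather than partly through a DynamoDB query filter.

```typescript
// Sketch of the lookup: three parallel reads, an in-memory filter, then an OR merge.
interface Row {
  target: string;       // Sort Key, e.g. "u|frank|1|567", "g|sales|1|234" or "*|0"
  grantedSet: number[]; // Permission Set, 32 bits per list entry
}

interface Store {
  groupsOf(userId: string): Promise<string[]>;       // stands in for query 1
  rowsForResource(resource: string): Promise<Row[]>; // stands in for queries 2 and 3
}

// Bitwise OR of two Permission Sets, word by word.
function union(a: number[], b: number[]): number[] {
  const out: number[] = [];
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    out.push(((a[i] ?? 0) | (b[i] ?? 0)) >>> 0);
  }
  return out;
}

async function permissionsFor(
  store: Store,
  userId: string,
  resource: string,
): Promise<number[]> {
  // All three lookups are started together rather than one after another.
  const [groups, resourceRows, orgRows] = await Promise.all([
    store.groupsOf(userId),
    store.rowsForResource(resource),
    store.rowsForResource("0"), // organization-level assignments
  ]);
  // Keep rows for this user, for everybody ("*"), or for the user's groups.
  const prefixes = [`u|${userId}|`, "*|", ...groups.map((g) => `g|${g}|`)];
  return [...resourceRows, ...orgRows]
    .filter((row) => prefixes.some((p) => row.target.startsWith(p)))
    .reduce((acc, row) => union(acc, row.grantedSet), [] as number[]);
}
```

Because the merge is a bitwise OR, the order in which the surviving rows are combined does not matter.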

The next query, whether Jenny can update Project 234, is a similar example, but since Jenny is a member of the Sales group, Line 2 of our DynamoDB table will also be used when calculating the final permission set.

  • From Line 1 — Jenny gets CAN_READ_PROJECT [2].
  • From Line 2 — Jenny gets CAN_UPDATE_PROJECT [4].

The bitwise OR of [4] and [2] is [6], which gives Jenny the permission set [CAN_READ_PROJECT, CAN_UPDATE_PROJECT]. So yes, she can update this Project.

For John, there are two lines in the DynamoDB table that provide permissions:

  • Line 1 — John gets CAN_READ_PROJECT [2] on all Projects.
  • Line 3 — John gets CAN_CREATE_PROJECT and CAN_DELETE_PROJECT [9] on all Projects.

Therefore, the bitwise OR of [2] and [9] is [11], so John can do [CAN_READ_PROJECT, CAN_CREATE_PROJECT, CAN_DELETE_PROJECT]. If you’re observant, you’ll realize that CAN_CREATE_PROJECT can only be assigned at the organization level, since you can never assign create permissions directly on a resource that doesn’t yet exist.
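To double-check this arithmetic, a merged word can be decoded back into permission names. The `decode` helper below is a hypothetical illustration that relies on TypeScript’s reverse mapping for numeric enums; only the four Project permissions from the enumeration are repeated here.

```typescript
// The Project permissions, numbered as in the enumeration shown earlier.
enum Permission {
  CAN_CREATE_PROJECT = 0,
  CAN_READ_PROJECT = 1,
  CAN_UPDATE_PROJECT = 2,
  CAN_DELETE_PROJECT = 3,
}

// Lists the names of the permissions whose bits are set in one 32-bit word.
function decode(word: number): string[] {
  const names: string[] = [];
  for (let bit = 0; bit < 32; bit++) {
    const name = Permission[bit]; // undefined for bits not in this enum
    if (name !== undefined && ((word >>> bit) & 1) === 1) {
      names.push(name);
    }
  }
  return names;
}

const john = 2 | 9;  // Line 1 OR Line 3, which is 11
const jenny = 2 | 4; // Line 1 OR Line 2, which is 6
const johnPermissions = decode(john);
const jennyPermissions = decode(jenny);
```

Here `johnPermissions` contains the create, read, and delete permissions, and `jennyPermissions` contains read and update, in bit order.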

Yes, Mary is an administrator, and according to line 4 of the DynamoDB table, she has all the permission bits set. However, our algorithm must still perform all the other queries (including finding Mary’s groups). This is because the DynamoDB queries are performed in parallel so are already in progress (or may have completed) by the time we learn that Mary is an administrator.

Summary

So that’s how Authy works! This blog post has covered the high-level requirements, and the solution using DynamoDB. We saw the optimized DynamoDB schema for storing permission assignments, and learned how to query the database to answer permission-related questions about our SaaS platform’s authorization rules.

Although we’ve covered a lot of material, we’ve barely scratched the surface on all that Authy is capable of doing. We’re constantly adding new features, and enabling new ways for our customers to control access to their domain objects.

If you like what you’ve read, and you’d like to learn more about Authy, then come join us at Galvanize!
