User authorization in less than 10 milliseconds

Published in Building Carta · 10 min read · Sep 3, 2021

The inner workings of Carta’s authorization system

This is part two of a series of blog posts about Carta’s scalable, generic, and fast authorization system, AuthZ, based on Google’s Zanzibar. In part one, I explained why we built AuthZ and discussed some of its core features.

This time, I’m taking you on a technical deep dive using a practical example. You’ll learn how the index algorithm works and how its architecture lets the system scale.

Let’s build a product!

AuthZ uses RelationTuples to create a permissions graph. The permissions graph contains all permissions data in AuthZ. Consumers combine RelationTuples using a domain-specific language and query the graph to check system access.

**RelationTuples create direct access between actors and objects. RelationTuples also automatically grant additional access through implicit connections (indirect paths).**

Carta uses a custom index to make fast queries on the permissions graph. An application creates RelationTuples using a domain-specific language, building in AuthZ what we call a permissions model. To understand how the index works, let’s build a permissions model for a simple product: a grade management tool for a school district.

The school has a few different types of users. Students view their own grades. Parents can view the grades of their children. Teachers can view and edit all grades for their classes.

Let’s create a permission model for teachers since they have the most complex use case.

**Teacher of Class:A can edit Grade:X and Grade:Y**

Our hypothetical permission model has three different types of entities: employees, classes, and grades. Employee 1 is the teacher of Class A. Teachers of Class A have edit access to Grade X and Grade Y.

Note: If we want, we can assign many teachers to Class A. We could also make Employee 1 a teacher of many classes.

There are two permissions that make the model slightly more complicated. Grade editors must be able to view the grades they are editing, so editors of Grade X can also view Grade X and editors of Grade Y can also view Grade Y.

Teachers of Class A have direct connections (paths) to edit Grade X and Grade Y. Teachers of Class A also have indirect paths to view Grade X and Grade Y. These extra permissions will help explain how the index works later in the article.
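As a rough sketch, the model so far can be written down as five edges. The `RelationTuple` class and its field names below are hypothetical and only illustrate the idea; they are not AuthZ's actual schema. A string such as `Class:A#Teacher` stands for "the teachers of Class A".

```python
from typing import NamedTuple


class RelationTuple(NamedTuple):
    """One edge in the permissions graph (hypothetical field names)."""
    subject: str   # an actor, or a group such as "Class:A#Teacher"
    relation: str  # the relation granted on the object
    obj: str       # the entity the relation applies to


# The five edges of the school example.
SCHOOL_MODEL = [
    RelationTuple("Employee:1", "Teacher", "Class:A"),
    RelationTuple("Class:A#Teacher", "Edit", "Grade:X"),
    RelationTuple("Class:A#Teacher", "Edit", "Grade:Y"),
    RelationTuple("Grade:X#Edit", "View", "Grade:X"),
    RelationTuple("Grade:Y#Edit", "View", "Grade:Y"),
]

for t in SCHOOL_MODEL:
    print(f"{t.subject} -> {t.obj}#{t.relation}")
```

Only the first three tuples mention teachers. The last two say that whoever can edit a grade can also view it, which is what creates the indirect paths.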

The grade management tool checks a user’s edit access to a grade by calling AuthZ to query the permissions graph with the following arguments:

AuthZ.Check("Employee:1", "Edit", "Grade:X")

**The *indirect path* from Employee 1 to Grade X Edit in the graph makes the permissions check return *true*.**

The custom index

To help us build the school’s permission model, let’s dig into how AuthZ manages RelationTuple data.

AuthZ stores consumer-created RelationTuples in a datastore. RelationTuples are the source of truth for edges in the permissions graph. They are very easy to update, but slow to query. Queries become increasingly expensive when you have more and more connections, since you have to traverse the edges in the graph. At worst, you might need to traverse all the edges of the graph.
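To see why querying the raw edges gets expensive, here is a minimal sketch of a check that walks the graph one edge at a time. It assumes each edge is stored as a pair pointing from a subject to a string of the form `object#relation`; `EDGES` and `check_by_traversal` are hypothetical names, not part of AuthZ.

```python
from collections import defaultdict, deque

# Hypothetical edge list for the school example: (subject, "object#relation").
EDGES = [
    ("Employee:1", "Class:A#Teacher"),
    ("Class:A#Teacher", "Grade:X#Edit"),
    ("Class:A#Teacher", "Grade:Y#Edit"),
    ("Grade:X#Edit", "Grade:X#View"),
    ("Grade:Y#Edit", "Grade:Y#View"),
]


def check_by_traversal(actor, relation, obj, edges=EDGES):
    """Breadth-first search from the actor to the object#relation node."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    target = f"{obj}#{relation}"
    seen, queue = {actor}, deque([actor])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph[node]:
            if nxt not in seen:  # visit each node at most once
                seen.add(nxt)
                queue.append(nxt)
    return False


print(check_by_traversal("Employee:1", "View", "Grade:X"))
```

In the worst case the search touches every edge before it can answer, so the cost of a single check grows with the size of the graph.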

A custom index overcomes this problem. It works by precomputing subpaths and storing this smaller dataset for later lookups. Set operations efficiently combine the results of the subpaths to determine if there is a connection between any two nodes in the graph.

AuthZ currently maintains two sets of subpaths in Carta’s custom index, called Actor Index Set and Object Index Set (Google’s Zanzibar paper calls the corresponding sets Member2Group and Group2Group, respectively).

Actor Index Set: This set defines the direct connections from actor leaf nodes. Actor leaf nodes are nodes in the graph with no incoming edges. Each element in the Actor Index Set is a tuple consisting of a subject and an entity:

Subject: the actor leaf node
Entity: a node directly connected to the subject

**Nodes contained within the Actor Index Set**

Object Index Set: This set defines the direct and indirect connections to each object node. Each element in the Object Index Set is, like the Actor Index Set, a tuple consisting of a subject and an entity, but containing different data:

Subject: the object node
Entity: a node directly or indirectly connected to the subject (excluding actor leaf nodes)

**Nodes contained within the Object Index Set**

To efficiently store object nodes in the index, AuthZ concatenates the object and its incoming relation into a single string called an Object-Relation Pair. The two parts are separated by a “#” delimiter so they’re easier to parse, for example Class:A#Teacher.
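A pair like this is easy to build and to split again. The helpers below are a small sketch with hypothetical names (`to_pair`, `from_pair`); they assume the relation name itself never contains a “#”.

```python
from typing import Tuple


def to_pair(obj: str, relation: str) -> str:
    """Join an object and a relation into an Object-Relation Pair."""
    return f"{obj}#{relation}"


def from_pair(pair: str) -> Tuple[str, str]:
    """Split an Object-Relation Pair at its last '#' delimiter."""
    obj, relation = pair.rsplit("#", 1)
    return obj, relation


print(to_pair("Class:A", "Teacher"))
print(from_pair("Grade:X#Edit"))
```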

Let’s go back to our theoretical school permission model. Here’s a representation of its custom index:

Actor Index Set and Object Index Set for our permission model. Note that Class:A#Teacher is not a subject in the Object Index Set because it only has one direct connection from Employee:1, which is an actor leaf node.
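One way to derive both sets from the edges is sketched below. It follows the definitions above: an actor leaf node has no incoming edges, the Actor Index Set keeps its direct connections, and the Object Index Set keeps everything that can reach an object node, leaving out actor leaf nodes. The function and variable names are hypothetical, and AuthZ's real build runs in workers against a datastore rather than in memory.

```python
from collections import defaultdict

# Hypothetical edge list for the school example: (subject, "object#relation").
EDGES = [
    ("Employee:1", "Class:A#Teacher"),
    ("Class:A#Teacher", "Grade:X#Edit"),
    ("Class:A#Teacher", "Grade:Y#Edit"),
    ("Grade:X#Edit", "Grade:X#View"),
    ("Grade:Y#Edit", "Grade:Y#View"),
]


def build_index(edges):
    incoming = defaultdict(set)  # node -> nodes with an edge into it
    for src, dst in edges:
        incoming[dst].add(src)

    nodes = {node for edge in edges for node in edge}
    leaves = {node for node in nodes if not incoming[node]}

    # Actor Index Set: direct connections from actor leaf nodes.
    actor_index = defaultdict(set)
    for src, dst in edges:
        if src in leaves:
            actor_index[src].add(dst)

    # Object Index Set: direct and indirect connections into each object
    # node, excluding actor leaf nodes.
    object_index = {}
    for node in nodes - leaves:
        reachable, stack = set(), list(incoming[node])
        while stack:
            current = stack.pop()
            if current in leaves or current in reachable:
                continue
            reachable.add(current)
            stack.extend(incoming[current])
        if reachable:  # nodes reached only from leaves get no entry
            object_index[node] = reachable

    return dict(actor_index), object_index


actor_index, object_index = build_index(EDGES)
print(actor_index)
print(object_index)
```

`Class:A#Teacher` ends up with no entry in the object index because its only incoming edge comes from an actor leaf node.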

Here’s where we’re at

We’ve designed a permissions system the school can use to authorize teacher access to grades. Our permission model has two abstractions that make it easier to manage.

One abstraction enables us to assign multiple teachers to the same class. This feature would be useful if we had student teachers or co-teachers.

Our other abstraction builds grade view permissions on top of grade edit permissions: all grade editors can view the grades they are editing. This reduces the number of connections we need to give a teacher access to a class, which makes it easier to add grades to the class.

AuthZ has built a representation of the permissions graph in the custom index. The custom index contains two permission path sets which AuthZ queries for access checks.

Putting it all together

Let’s check access through the permissions graph. We’re going to determine if an employee has access to view a grade.

AuthZ.Check("Employee:1", "View", "Grade:X")

Remember, view grade permissions are given through indirect access with the class teacher abstraction, meaning there is no RelationTuple which explicitly defines this connection. Therefore, we must use the index to perform a quick permissions check.

AuthZ makes two queries on the index tables to answer the permission check:

AuthZ queries the subject of the Actor Index Set using the Actor from the permission check arguments. This returns the set of entities the actor is directly connected to.
AuthZ queries the subject of the Object Index Set using the Object-Relation Pair from the permission check arguments. This returns the set of entities that have a path to the object.

**The highlighted rows include the two sets of index entries that are returned from the index for our AuthZ.Check("Employee:1", "View", "Grade:X") permissions check.**

AuthZ finds the intersection between entities in each set. If an intersection exists, there is a path from the Actor to the Object.

    actor_entity_set = {"Class:A#Teacher"}
    object_entity_set = {"Class:A#Teacher", "Grade:X#Edit"}
    intersection_set = actor_entity_set.intersection(object_entity_set)
    assert len(intersection_set) > 0  # There is a connection!

In this case, the intersection set is non-empty, so the check returns true!

We store index sets in our database by subject key. If a permission exists in the index, a query always matches a key and only requires two separate lookups. Including network latency, the median permission check takes about 7 milliseconds. At the 95th percentile, we check permissions in 12 milliseconds.
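The two lookups and the intersection fit into one small function. This is a minimal sketch that assumes the index sets sit in plain dictionaries keyed by subject; `ACTOR_INDEX`, `OBJECT_INDEX`, and `check` are hypothetical names, and the real sets live in a database. It covers only the two-lookup path described above.

```python
# Hypothetical in-memory copies of the two index sets, keyed by subject.
ACTOR_INDEX = {
    "Employee:1": {"Class:A#Teacher"},
}
OBJECT_INDEX = {
    "Grade:X#Edit": {"Class:A#Teacher"},
    "Grade:Y#Edit": {"Class:A#Teacher"},
    "Grade:X#View": {"Class:A#Teacher", "Grade:X#Edit"},
    "Grade:Y#View": {"Class:A#Teacher", "Grade:Y#Edit"},
}


def check(actor, relation, obj):
    """Two key lookups followed by a set intersection test."""
    actor_entities = ACTOR_INDEX.get(actor, set())
    object_entities = OBJECT_INDEX.get(f"{obj}#{relation}", set())
    # A shared entity means there is a path from the actor to the object.
    return not actor_entities.isdisjoint(object_entities)


print(check("Employee:1", "View", "Grade:X"))
```

The cost no longer depends on how long the path through the graph is: each check is two keyed reads and one intersection test.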

Concurrent workers

AuthZ traverses the graph to create index sets. Since graph traversal is slow, so are index build times.

Several concurrent workers build the custom index asynchronously to keep API calls fast for our consumers. This means that index updates are not applied immediately. They take a few hundred milliseconds to propagate through the system.

**AuthZ updates the index with offline workers**

You can think of AuthZ’s index like Git commit history. Git replays changes (commits) in a fixed order so that everyone ends up with the same repository state. AuthZ applies changes (added or deleted RelationTuples) in order by commit timestamp to avoid conflicts in the index.

AuthZ stores a list of all the RelationTuples applied by our consumers and their commit timestamps. The commit history is append-only, so AuthZ differentiates added and deleted RelationTuples with a “deleted” column. AuthZ assigns each worker a commit timestamp that corresponds to a RelationTuple change. Workers traverse the graph by querying the RelationTuple commit history. Workers filter out changes after the commit timestamp that AuthZ assigned them to prevent future changes from altering an older version of the index.

In this example, Workers A, B, and C process timestamps t2, t3, and t4 respectively. They query the RelationTuple commit history only up to their assigned timestamp. This prevents newer updates from changing a worker’s computed path results.
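The filtering step can be sketched as a replay of the append-only history up to a worker's assigned timestamp. Everything here is an assumption made for illustration: the `Commit` shape, integer timestamps standing in for t0 through t4, and the order in which the five school tuples were committed.

```python
from typing import NamedTuple


class Commit(NamedTuple):
    """One row of a hypothetical append-only commit history."""
    timestamp: int         # stand-in for a real commit timestamp
    subject: str
    entity: str            # "object#relation"
    deleted: bool = False  # True when the row records a deletion


# Assumed commit order for the school example (t0..t4).
HISTORY = [
    Commit(0, "Employee:1", "Class:A#Teacher"),
    Commit(1, "Class:A#Teacher", "Grade:X#Edit"),
    Commit(2, "Grade:X#Edit", "Grade:X#View"),
    Commit(3, "Class:A#Teacher", "Grade:Y#Edit"),
    Commit(4, "Grade:Y#Edit", "Grade:Y#View"),
]


def edges_as_of(history, assigned):
    """Return the edges that exist at `assigned`, ignoring later commits."""
    live = set()
    for commit in sorted(history, key=lambda c: c.timestamp):
        if commit.timestamp > assigned:
            break  # newer changes must not affect this worker's view
        edge = (commit.subject, commit.entity)
        if commit.deleted:
            live.discard(edge)
        else:
            live.add(edge)
    return live


# The worker assigned t2 sees only the first three edges.
print(sorted(edges_as_of(HISTORY, 2)))
```

Because each worker reads a fixed prefix of the history, its result is the same no matter when it runs or what other workers are doing.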

AuthZ stores index build results in intermediate storage when workers finish processing. AuthZ associates all sets of index build results with their respective commit timestamps. Index builds might generate hundreds of index set entries, so there are usually several entries in intermediate storage associated with the same commit timestamp.

AuthZ merges build results from intermediate storage into the index when all commits before a timestamp have finished processing.
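The merge rule can be sketched with a small in-memory model: finished results wait in intermediate storage until no earlier commit is still being processed. `IndexMerger` and its methods are hypothetical names, and a list stands in for the real index.

```python
class IndexMerger:
    """Holds finished build results until every earlier commit is done."""

    def __init__(self):
        self.pending = set()     # commit timestamps still being processed
        self.intermediate = {}   # timestamp -> finished index entries
        self.index = []          # entries merged so far, in commit order

    def start(self, timestamp):
        self.pending.add(timestamp)

    def finish(self, timestamp, entries):
        self.pending.discard(timestamp)
        self.intermediate[timestamp] = list(entries)
        self._merge_ready()

    def _merge_ready(self):
        # Results older than the oldest unfinished commit are safe to merge.
        barrier = min(self.pending) if self.pending else float("inf")
        ready = sorted(ts for ts in self.intermediate if ts < barrier)
        for ts in ready:
            self.index.extend(self.intermediate.pop(ts))


merger = IndexMerger()
for ts in (2, 3, 4):
    merger.start(ts)

merger.finish(3, ["entries@t3"])
merger.finish(4, ["entries@t4"])
held_back = list(merger.index)   # nothing merged yet: t2 is unfinished
merger.finish(2, ["entries@t2"])
print(merger.index)
```

Workers may finish in any order, but entries reach the index in commit order.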

The importance of timestamps

I’ll demonstrate how AuthZ’s async workers build the custom index by using the grade management permissions model. Let’s say our school administrator has already added two permissions and AuthZ has merged those two RelationTuples into the index with timestamps t0 and t1.

**After processing the initial Actor and Object index sets, AuthZ then has two entries in its commit history, with timestamps t0 and t1.**

Our school administrator now adds three more permissions. AuthZ initiates a bulk update to add those three RelationTuples, giving them timestamps t2, t3, and t4.

**The school administrator adds RelationTuples at t2, t3, and t4. Their changes are appended to the RelationTuple commit history.**

AuthZ spawns three workers to build index changes for each commit. Let’s suppose the worker for the RelationTuple committed at t3 finishes processing first. The worker merges its set entries into intermediate storage since a prior RelationTuple (committed at t2) has not finished yet.

The table below shows what our commit history table looks like right now, with color coding to show the status of each row. The two green rows are our original RelationTuples. The worker handling the t3 RelationTuple just finished and its row is highlighted in orange. Our two unfinished RelationTuples with timestamps t2 and t4 are still being worked on by their respective workers, and I’ve left them white.

**Worker for the commit at t3 finishes and merges changes into intermediate storage**

Let’s say that the worker for the RelationTuple committed at t4 finishes next. The worker merges its set entries into intermediate storage since a prior RelationTuple (committed at t2) has not finished yet.

Here’s what our table looks like now. The RelationTuple with timestamp t3 is now green and the RelationTuple at t4 is now waiting for t2 to finish.

**Worker for the commit at t4 finishes and merges changes into intermediate storage**

Finally, the worker for the RelationTuple committed at t2 finishes. All commits through t4 have finished, so AuthZ merges intermediate build results with timestamps less than or equal to t4 into the index.

**Worker for the commit at t2 finishes. AuthZ then merges intermediate results into the index.**

Now that all workers have finished processing, the index reflects changes up until t4 and our school’s teachers have the permissions they need to grade their students.

Maintaining security with tickets

Tickets are opaque tokens that ensure authorization checks made on AuthZ are at least as fresh as the change associated with the ticket. It is important for consumers to call AuthZ with tickets. Stale permissions could cause system vulnerabilities.

During an update, AuthZ issues a Ticket (synonymous with a zookie in Zanzibar) to the consumer. The Ticket represents the time at which the change was committed to AuthZ.

A consumer calls AuthZ with a ticket to ensure that their change was applied to the index. If not yet applied, AuthZ will reject the request and the consumer can try again later.

AuthZ maintains a record of unprocessed RelationTuples. AuthZ rejects a request when any unprocessed RelationTuple has a commit timestamp less than or equal to the timestamp of the ticket provided by the consumer.
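The rejection rule can be sketched as follows. Real tickets are opaque tokens; this sketch assumes the ticket has already been decoded into an integer commit timestamp, and `StaleIndexError` and `check_with_ticket` are hypothetical names.

```python
class StaleIndexError(Exception):
    """Raised when the index has not caught up to the caller's ticket."""


def check_with_ticket(ticket_timestamp, unprocessed_timestamps, run_check):
    """Run a permission check only if the index is fresh enough.

    ticket_timestamp: commit time encoded in the ticket (assumed decoded).
    unprocessed_timestamps: commit times not yet merged into the index.
    run_check: zero-argument callable that performs the actual check.
    """
    if any(ts <= ticket_timestamp for ts in unprocessed_timestamps):
        raise StaleIndexError("index is behind this ticket; retry later")
    return run_check()


# The commit at t2 is still unprocessed, so a ticket for t3 is rejected.
try:
    check_with_ticket(3, {2}, lambda: True)
except StaleIndexError as err:
    print("rejected:", err)

# Only a later commit (t5) is unprocessed, so the check goes ahead.
print(check_with_ticket(3, {5}, lambda: True))
```

Rejecting instead of answering from a stale index means the consumer never acts on permissions that are older than its own change.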

Changelog

Commits for deleted RelationTuples can accumulate over time, so AuthZ periodically removes them.

AuthZ maintains the full changelog of commit events in secondary storage. Lookups on the changelog are slow, but it holds a record of all the permissions at any given time. The changelog gives us two significant benefits: user auditing and index health checks.

User Auditing: AuthZ orders the changelog by commit timestamp, so it can determine which permissions consumers had granted to a user at a specific datetime. To build the graph as it stood at that moment, AuthZ traverses the changelog from a user actor and discards any commits after the selected datetime.

Index Health Checks: AuthZ occasionally runs a full rebuild of the index to check the accuracy of the index system. Rebuilding a full index also lets us smoke test new features in our pre-production environments.

Lessons learned

AuthZ’s index system has helped Carta build better products. Prior to AuthZ, developers managed permissions using a legacy system that often forced painful tradeoffs. For example, permission checks could be fast but they lacked fine-grained access controls. Enabling fine-grained permissions brought slowness and made maintenance difficult.

AuthZ’s index system has given us the best of both worlds. Access checks using the index are fast and consistent. Yet, the index doesn’t limit the number of fine-grained controls that developers want to build into their models.

The result: our developers don’t need to make a tradeoff. At Carta, we build permission models that work for our products.

Our next article will show you how internal users interact with AuthZ through a web application we call Concord. Concord uses AuthZ’s API to make it easy for consumers to model and visualize permissions. Stay tuned to our engineering blog to read more about Concord and other great things we’re working on.

Thanks for reading! Leave a comment if you have questions about AuthZ’s index system or architecture. And if you’d like to help us build the next version of Carta, we’re hiring.