Ensuring Only Grace Can View Her Files: An Intro to Access Control Frameworks

Joy Ebertz
Published in Box Tech Blog
8 min read · Sep 11, 2017

At Box, one of our major points of differentiation is our ability to do custom and fine-grained permissions. By permissions, I mean access control: does the currently logged-in user have permission to do <action> to <object>? Does Grace have permission to upload a file to her Pictures folder? Does Ada have permission to edit the file that Grace just uploaded? Does the administrator of Grace’s enterprise have permission to delete that file? And so on. Our existing permissions framework is a homegrown framework that grew up along with Box. It started out fairly simple, but with each new setting and feature it has grown more and more complex.

We recently started a major effort at Box to split our larger monolithic web application into smaller pieces. As one of the first steps of that, we needed our permissions to be available in our new microservices architecture (since you can do basically nothing in Box without checking permissions). Given that we were building a new permissions service anyway, we decided to evaluate what solutions were out there and ultimately decide whether it made sense to leverage one of those or to write our own.

Before looking at individual solutions, we started by evaluating some of the existing frameworks and standards for access control. While there are many types of access control including many homegrown solutions, there are three main types that cover most things: ACLs (Access Control Lists), RBAC (Role Based Access Control) and ABAC (Attribute Based Access Control).

Access Control Lists (ACLs)

ACLs are probably the most common access control mechanism. Virtually every file system uses ACLs and since Box is sort of like a file system in the cloud, it’s easy to think that maybe this makes the most sense for us too. ACLs determine permissions by having a pre-written list of who gets what permission per object. So for file1, I’ll have a list somewhere that says something like ‘{Grace: read, write; Ada: read;}’ where that list will have an entry for every user (or perhaps every user that has anything other than default permissions). This makes a ton of sense for most filesystems where there is often at most a handful of users who are assigned a distinct and small set of permissions. ACLs have a very fast lookup time — assuming the lists are stored in something like a hash table, the lookup should be constant time.
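As a rough sketch of the list described above (an illustration only, not Box's implementation), an ACL store can be modeled as a hash table keyed by object, where each entry maps users to the permissions they hold. The names `acls` and `acl_allows` are hypothetical.

```python
# Hypothetical in-memory ACL store: object id -> {user: set of permissions}.
acls = {
    "file1": {"Grace": {"read", "write"}, "Ada": {"read"}},
}


def acl_allows(acls, user, action, obj):
    """Return True if the list stored for `obj` grants `action` to `user`."""
    # Two dictionary lookups plus a set membership test, so the check is
    # constant time on average regardless of how many objects exist.
    # Unknown objects and unlisted users fall through to "no access".
    return action in acls.get(obj, {}).get(user, set())


print(acl_allows(acls, "Grace", "write", "file1"))  # True
print(acl_allows(acls, "Ada", "write", "file1"))    # False
```

The fast lookup comes at the price of storing one entry per user per object, which is where the problems discussed next come from.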

This all seems great, but unfortunately, there are two main problems for Box’s use case. First, for a given folder, we may have hundreds, if not thousands, of users who have access to it; just think about an HR folder for a company like Procter & Gamble. On top of that, we may easily have thousands, if not millions, of files for a given account. The space required to store all of these lists could be enormous. We could maybe do something clever where we don’t always store an entry for each file or re-use entries where things are the same, but this vastly increases complexity and may or may not actually save us much.

The other big problem is updating permissions. Now if those permissions were only set by defaults or by someone updating the permissions on a specific file, this likely wouldn’t be a problem. But now consider if I turned on a sharing restriction setting for my company. We would need to go through and update the permissions for possibly every single item owned by my company. Similarly, if I collaborate a large folder with a group, we would need to update the permissions for every single item in that folder for every single user in that group. The time needed to do some fairly simple things could easily become astronomical with the real possibility of not completing — at least not quickly. In the case of permissions, this is really not acceptable.
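To make the update cost concrete, here is a minimal sketch that counts the ACL writes needed to share one folder with one group. The folder and group sizes are made-up numbers chosen for illustration, and `share_folder_with_group` is a hypothetical helper.

```python
def share_folder_with_group(acls, folder_files, group_members, permission):
    """Grant `permission` on every file in the folder to every group member.

    Returns the number of ACL entries that had to be written.
    """
    writes = 0
    for file_id in folder_files:
        entries = acls.setdefault(file_id, {})
        for user in group_members:
            entries.setdefault(user, set()).add(permission)
            writes += 1
    return writes


# Assumed sizes, for illustration only.
acls = {}
files = [f"file{i}" for i in range(1000)]
members = [f"user{i}" for i in range(200)]

writes = share_folder_with_group(acls, files, members, "read")
print(writes)  # 200000, i.e. 1000 files * 200 users
```

A single sharing action turns into files × users writes, and until the last write lands, some users see stale permissions.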

Role Based Access Control (RBAC)

Now that we know ACLs are unlikely to work well, let’s look at the next most commonly used system, RBAC. At a high level, RBAC allows you to create roles, assign sets of permissions to each role, and then assign users to one or more roles. For example, you might have a ‘works in HQ’ role or an ‘Engineering’ role. If Ada is an engineer working in HQ, she would be assigned both of these roles. The roles would then be associated with permissions, so they might be something like anyone who works in HQ should have access to the HQ floor plans online or all engineers should have access to Jira. The great thing about RBAC is that it’s very fast to update permissions. For example, if we realize that we accidentally gave engineers access to Salesforce, we can quickly revoke that access by updating a single record. Similarly, if Grace moves from the marketing department to engineering, we can make sure she has access to all of the right things by simply removing her marketing role and adding the engineering role. Additionally, lookup time is very fast for RBAC: we can just check whether any of the user’s roles grants the permission we’re looking for.
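The following sketch models the example above with two small tables, one mapping roles to permissions and one mapping users to roles. All role and permission names are hypothetical, and this is an illustration of the general RBAC idea rather than any particular product's API.

```python
# Hypothetical role and permission names, mirroring the example in the text.
role_permissions = {
    "works_in_hq": {"view_hq_floor_plans"},
    "engineering": {"use_jira", "use_salesforce"},  # Salesforce granted by mistake
    "marketing": {"use_salesforce"},
}
user_roles = {
    "Ada": {"works_in_hq", "engineering"},
    "Grace": {"marketing"},
}


def rbac_allows(user, permission):
    """Return True if any of the user's roles grants `permission`."""
    return any(
        permission in role_permissions.get(role, set())
        for role in user_roles.get(user, set())
    )


# Revoking the mistaken grant touches one record, however many engineers exist.
role_permissions["engineering"].discard("use_salesforce")

# Moving Grace from marketing to engineering is two small updates to her roles.
user_roles["Grace"].discard("marketing")
user_roles["Grace"].add("engineering")

print(rbac_allows("Ada", "use_salesforce"))  # False
print(rbac_allows("Grace", "use_jira"))      # True
```

The lookup cost grows with the number of roles a user holds, which is why the role explosion discussed next matters.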

However, the big problem with RBAC is the danger of role explosion. Lookup time is going to suffer if a user has thousands of roles, not to mention the complexity of managing that many roles. Additionally, we’ve lost all of the benefits of RBAC if we’re essentially storing a separate role for each permutation. Unfortunately, although we tried, we struggled to find a way to use RBAC for Box without role explosion. The real problem with our use-case is that we have a very large number of objects that can all have differing levels of permissions across differing sets of users for each item. This makes it difficult to have a clearly defined set of roles that maps cleanly to either sets of permissions or to sets of objects.

Let’s say we’re trying to decide if Grace can edit file1. One option is that we could have one role per permission type for Grace. So we’d have one role for all of the files Grace can view and another for all the files she can edit and so on. In this case, we’d look up the edit role, then see if that role contains file1. The lookup time is probably going to be okay, but we would need to have one role per user per permission type that we have. Additionally, if Ada were to suddenly un-collaborate Grace from a folder or Grace’s admin were to turn on a setting that restricted something, we could easily find ourselves in a position where we would need to update the role membership of hundreds, if not thousands, of items.

Let’s say that instead we do one role per file, so we have a role for all users who can view file1 and another for all users who can edit file1. This has largely the same problems as the previous option. We now have one role per file per permission type (which is actually going to be more roles, since most users have many files). Similarly, if we now remove a user from a group collaboration or if we un-collaborate a group from an item, we might need to remove a large number of users or remove a user from a large number of files all at once.
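A quick back-of-the-envelope comparison shows how the two schemes scale. The user, file, and permission-type counts below are invented for illustration and are not Box figures.

```python
def roles_per_user_scheme(num_users, num_permission_types):
    """One role per user per permission type (e.g. 'files Grace can edit')."""
    return num_users * num_permission_types


def roles_per_file_scheme(num_files, num_permission_types):
    """One role per file per permission type (e.g. 'users who can edit file1')."""
    return num_files * num_permission_types


# Assumed sizes for a single hypothetical enterprise.
users = 10_000
files = 5_000_000
permission_types = 8

per_user = roles_per_user_scheme(users, permission_types)
per_file = roles_per_file_scheme(files, permission_types)

print(per_user)  # 80000
print(per_file)  # 40000000
```

Either way the role count tracks the number of users or files rather than the number of job functions, which is the role explosion the text describes.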

To address the extremely large number of roles, we could do something more clever where we combine overlapping sets of permissions, so if we see two users have membership in the exact same roles, we combine them. In this case, we might have a role that is for anyone who can both view and edit file1 or one for users who can view files 1–5. However, this likely wouldn’t materially lessen the number of roles and it would vastly increase the complexity since we would have to do some fancy role manipulation if we were to do something like remove view access to file4 from one user.

To be honest, we likely could make some version of RBAC work, assuming we were okay with a huge number of roles. However, we wouldn’t be using RBAC in a way that allowed its advantages to shine. Additionally, the big sticking point for us is that we had a requirement that if a permission is updated for a user, it should have almost immediate effect. If I remove Ada’s access to a file, that should happen immediately, not an hour from now or 10 minutes from now or even 2 minutes from now. This just wasn’t feasible with the large number of operations that might need to happen for some use cases.

Attribute Based Access Control (ABAC)

The other main framework we investigated was ABAC. ABAC stores a bunch of policies such that when the request comes in, it tries to match a policy. The policy may contain a number of pieces of information that it tries to fill in from the request. For any information that it’s still missing, it will go out and fetch that information using any of the information that it does have. So in this case, when we ask if Grace can edit file1, it would try to match the request against policies it has. There might be a policy governing editing files, so it would match that policy. Then it would try to fill in any missing fields in the rest of the policy, so the policy might say something like approve the request if the requesting user is the owner of the file or if the user has a collaboration with the file. In this case, we know that Grace is the requesting user and we know that the file is file1, so we can call a separate service to find out who the owner of file1 is. If the owner is Grace, we can stop processing. If not, we would call another service to see if Grace has a collaboration with file1.
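The evaluation flow described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the architecture Box adopted: the two "services" are plain in-memory stand-ins, the policy table and all function names are hypothetical, and requests with no matching policy are denied by default (an assumption the text does not specify).

```python
# In-memory stand-ins for what would be separate services (hypothetical data).
FILE_OWNERS = {"file1": "Ada"}
COLLABORATIONS = {("Grace", "file1")}

calls = []  # records which "services" were consulted, for demonstration


def fetch_owner(file_id):
    calls.append("owner_service")
    return FILE_OWNERS.get(file_id)


def has_collaboration(user, file_id):
    calls.append("collaboration_service")
    return (user, file_id) in COLLABORATIONS


def edit_file_policy(user, file_id):
    """Approve if the user owns the file or has a collaboration on it."""
    # `or` short-circuits: if the user is the owner, processing stops and
    # the collaboration service is never called.
    return fetch_owner(file_id) == user or has_collaboration(user, file_id)


# Policies are matched on the (action, object type) pair of the request.
POLICIES = {("edit", "file"): edit_file_policy}


def is_allowed(user, action, object_type, object_id):
    policy = POLICIES.get((action, object_type))
    if policy is None:
        return False  # assumption: deny when no policy matches
    return policy(user, object_id)


print(is_allowed("Grace", "edit", "file", "file1"))  # True
print(calls)  # ['owner_service', 'collaboration_service']
```

In this data Grace is not the owner, so the second lookup is needed; for Ada, only the owner service would be consulted.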

ABAC has two main advantages. The first is that it allows for very complex policies with a number of different pieces of information being taken into account without much complexity in the system itself. This information could include anything and could be fetched from basically anywhere: maybe we only allow access if the user is currently on a company-owned IP address or if the outside air temperature is below 85 degrees. The second is that because it calculates the permission on the fly, if any piece of this information changes, the result for the policy will change immediately. This means that if Grace’s admin adds a setting preventing deletion, she will immediately be prevented from deleting anything; we don’t have to worry about propagation time.
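The second advantage can be shown with a small sketch in which the policy reads the admin setting at decision time. The enterprise name `acme`, the setting name `prevent_deletion`, and the function `can_delete` are all hypothetical.

```python
# Hypothetical attribute sources, consulted at decision time.
enterprise_settings = {"acme": {"prevent_deletion": False}}
user_enterprise = {"Grace": "acme"}
file_owners = {"file1": "Grace"}


def can_delete(user, file_id):
    """Owners may delete their files unless their enterprise forbids deletion."""
    settings = enterprise_settings[user_enterprise[user]]
    if settings["prevent_deletion"]:
        return False
    return file_owners.get(file_id) == user


before = can_delete("Grace", "file1")

# Grace's admin flips one setting; no per-file or per-user records change.
enterprise_settings["acme"]["prevent_deletion"] = True

after = can_delete("Grace", "file1")

print(before, after)  # True False
```

Because nothing is precomputed, there is no stored permission to update and therefore nothing to propagate.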

The primary downsides to ABAC are that it’s much less widely used than the previous two systems, so it has less support, and that lookup time for a permission is slower. We’re fully calculating a permission on each call and we’re operating in a distributed architecture where we need to call separate services to get much of the missing information, so lookup time is potentially much slower than the other options.

We decided that the downsides to ABAC were worth the tradeoff given our use-cases and priorities, so we ended up picking an ABAC approach. In a future post, I’ll go into more detail about the architecture of an ABAC system and what we evaluated before finally picking a specific solution. Like many things in engineering, there is no single optimal solution. Instead, different solutions are better for different problem spaces and only by evaluating both the options and our problem space were we able to pick the solution that best fit our needs.

Joy Ebertz
Box Tech Blog

Principal Software Engineer & ultra runner @SplitSoftware