Frontend Engineer’s Checklist for Implementing RBAC in an Enterprise SaaS Application

Matt Dylan G
Kustomer Engineering
14 min read · Nov 17, 2020

If you have worked on a frontend web application, you have likely had to restrict a user’s access to specific routes and components based on their permissions. This concept is commonly known as Role-Based Access Control (RBAC). There are plenty of great resources available that will guide your team through designing and implementing RBAC functionality in your web application.

Perhaps you already have checks like the one below to differentiate between a user and administrator, restricting access to some core administration pages. In this article, we’ll address the more complicated frontend requirements that come with the additional flexibility we were building for our customers.


if (user.role === 'administrator') {
// render top secret admin only settings
}
return null; // or return an alternative UI for users without access

In our early implementation of RBAC for Kustomer, we provided a list of predefined Role Groups that administrators could pick from to assign to their users. We had an Administrator role which pretty much had access to everything — name a feature, and admins would have access to it. The role with the least privileged access that was commonly assigned to agents handling customer inquiries was the User role. Users could read and update all of the data associated with a Customer profile in our app and send outbound emails. In between Administrator and User, we had other tiers of predefined roles. For example, the Content Administrator role group had access to editing the Knowledge Base and interacting with customers, but did not have all the access that Administrators have.

Our initial implementation of RBAC satisfied many of the use cases for our early-adopter customers. However, as we onboarded larger enterprise businesses onto our platform, they rightfully demanded more flexibility in the access that individual users should have to the platform. Some initial requirements that we gathered included:

  • Ensure backwards compatibility with our existing roles system.
  • Flexible control of permissions by resource. An admin probably does not want the content writers for their company’s Knowledge Base articles to be able to reply to customers.
  • Granularity of actions on a resource. I may want to grant a user permissions to create and edit Knowledge Base articles, but restrict article deletion.
  • Allow admins to assign a set of permissions either to a single user or to a team of many users. If a support team has 1000+ agents, then it’s not scalable to manage permissions on a per-user basis.

These requirements pushed us to strike a balance between usability, customization and trust in our platform. We embarked on this project, known internally as Object Level Permissions, which spanned six months from planning to GA release. If your engineering team is looking to level up your RBAC implementation, then perhaps you have similar requirements. The frontend work for this project could be split into three categories, and we’ll go into the tools we used to accomplish these tasks:

  • Admin experience. For creating, updating and assigning Custom Role Groups.
  • End user experience. How does the UI react to the user’s assigned Role Groups?
  • Developer experience. With only a small subset of our engineering team working on this project, adoption of our new RBAC patterns by the broader engineering org at Kustomer was essential for this project’s success.

The Admin Experience

Defining our new Role Groups model and ensuring backwards compatibility

As mentioned earlier, our existing system has default Role Groups such as Administrator, Content Administrator and User. Let’s take a look at how our backend API returned the Role Groups.
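As a rough illustration of the shape being discussed, here is a hypothetical Role Group object. Only the `roles` array of role strings comes from the article; the other field names (`id`, `name`, `type`) and the specific roles listed are assumptions made for this sketch, not the actual API response.

```typescript
// Hypothetical shape of a Role Group returned by the API.
// Only `roles` is described in the article; the other fields are assumed.
interface RoleGroup {
  id: string;
  name: string;
  // 'system' for predefined groups, 'custom' for admin-created ones (assumed field)
  type: 'system' | 'custom';
  // Role strings that gate access to API resources
  roles: string[];
}

// Illustrative values only; the real Content Administrator group may differ.
const contentAdministrator: RoleGroup = {
  id: 'roleGroup2',
  name: 'Content Administrator',
  type: 'system',
  roles: ['org.user', 'org.admin.kb'],
};
```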

The array of roles is the key here. For our existing RBAC implementation, before allowing a user to open a page or click a button that triggers an API request, we determine if the user’s assigned Role Group contains one of the required role strings. Although the early prototypes of our Role Groups revamp included a new schema, we realized that since both our API and frontend heavily used the existing convention for all resources, it made more sense to build upon what we already had.

For the new Custom Role Groups page we were developing, if an admin were to create a custom set of permissions, it would not create roles with the old org.(admin|user).${resource} pattern. Instead, it would only grant roles with our new convention:
org.permission.${resource}.(create|read|update|delete).

In the example of managing the Knowledge Base, in addition to checking for the old org.admin.kb role, we would also allow users with the new org.permission.kb.update role to edit articles. We started defining constants that list the roles our web application requires to interact with our API:

const API_KB_READ = ['org.permission.kb.read', 'org.admin.kb.read'];
const API_KB_CREATE = ['org.permission.kb.create', 'org.admin.kb'];
const API_KB_UPDATE = ['org.permission.kb.update', 'org.admin.kb'];
const API_KB_DELETE = ['org.permission.kb.delete', 'org.admin.kb'];
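To make the backwards compatibility concrete, here is a minimal sketch of how such a constant might be checked against a user's roles. The helper name `hasRequiredRole` and the sample role lists are hypothetical; the point is only that holding either the legacy role or the new granular role is enough.

```typescript
const API_KB_UPDATE = ['org.permission.kb.update', 'org.admin.kb'];

// True when the user holds at least one of the roles an action requires.
// `hasRequiredRole` is a hypothetical helper name used for illustration.
function hasRequiredRole(
  userRoles: readonly string[],
  requiredRoles: readonly string[],
): boolean {
  return requiredRoles.some((role) => userRoles.indexOf(role) !== -1);
}

// A user on a legacy system Role Group and a user on a new custom Role Group
const legacyAdminRoles = ['org.admin.kb'];
const customWriterRoles = ['org.permission.kb.read', 'org.permission.kb.update'];
const readOnlyRoles = ['org.permission.kb.read'];

hasRequiredRole(legacyAdminRoles, API_KB_UPDATE); // true
hasRequiredRole(customWriterRoles, API_KB_UPDATE); // true
hasRequiredRole(readOnlyRoles, API_KB_UPDATE); // false
```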

You don’t necessarily need to follow the same convention. The array of role strings worked best for us because our system already used it heavily and we wanted a lazy, incremental migration rather than a full-on database migration. The main takeaway is to choose the path that works well with your existing roles structure if possible.

Simplifying Setup

We mentioned that flexible and granular control of permissions was a requirement, but without a deep knowledge of how different UI components map to our API, we shouldn’t expect our customers to just create new custom Role Groups from scratch. To help steer our customers in the right direction and minimize frustration, we used a few different methods:

  • Best practice templates

Clicking on the best practice template would pre-generate a form filled out with configurations similar to our default User Role Group, with a few exceptions. We simply stored the data object representing the template Role Group in a static constant that would load as form state if selected. This approach enabled us to easily add role scopes to the template or add new templates in the future.

The blank template was entirely empty. It is indeed possible to create and assign a blank Role Group to Users. We’ll cover what happens in that scenario in the end-user experience section.

  • Duplicating system roles as a base

Similar to the Best Practice Template, users can duplicate one of our existing Role Groups and use it as a base for new custom roles.

  • Hierarchy prompt

Although resources can be accessed via our API individually, there were plenty of pages in the UI that required access to multiple resources. The most important use case was the Customer Timeline that agents used to view a Customer’s Conversations and Custom Objects. Instead of making our users figure out that they need to enable these individually, we give admins the option to have roles trickle down the hierarchy. For more advanced users, we give them the opportunity to opt out of this behavior.

  • Coupling multiple permissions into one setting

Our Conversation feature renders a few additional resources including its messages, notes and file attachments. Instead of allowing users to configure these individual resources, we decided to have the Conversation roles settings grant access to these resources too. If we decide to split these out in the future, then it’s only a matter of adding new checkboxes to the UI that are mapped to the roles.
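The coupling described above can be sketched as a lookup from one settings entry to the resources it covers. This is a minimal illustration under assumptions: the resource keys (`message`, `note`, `attachment`) and the names `COUPLED_RESOURCES` and `rolesForSetting` are hypothetical, not Kustomer's actual identifiers.

```typescript
type Action = 'create' | 'read' | 'update' | 'delete';

// Hypothetical map from a single settings entry to the API resources it covers.
// Splitting a resource out later would mean removing it from this list and
// giving it its own checkbox.
const COUPLED_RESOURCES: Record<string, string[]> = {
  conversation: ['conversation', 'message', 'note', 'attachment'],
};

// Expand one setting (resource + action) into every role string it grants.
function rolesForSetting(setting: string, action: Action): string[] {
  const resources = COUPLED_RESOURCES[setting] || [setting];
  return resources.map((resource) => `org.permission.${resource}.${action}`);
}

rolesForSetting('kb', 'update'); // ['org.permission.kb.update']
```

Calling `rolesForSetting('conversation', 'read')` would return four role strings, one per coupled resource, so the admin only ever sees a single Conversation checkbox.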

Managing Settings Form State

Our Edit Permission Sets page was going to be used to manage permissions for dozens of different resources, so we had to make sure users would not get bogged down by all the options and help them keep track of their changes.

We knew we would break this settings page out into multiple components. In the past, we had sometimes used Redux to manage the form state for our settings pages. Because that state was mostly used by a single tree of components rather than several, local state would have worked too, but Redux helped us keep each component’s code concise. The downside of Redux was the additional files for actions, reducers and selectors, plus their unit tests, which slowed down our development velocity. For this project, we decided to try something that was new to us at the time: React Hooks. Hooks let us manage form state using only component state while keeping our component files concise.

The parent component PermissionSetsForms.tsx acted as the brains of this operation. It loads the initial state, updates the state after the admin modifies the access for a resource in one of the modals, and makes the POST/PUT requests to create or update the Role Group. A very simplified implementation could look like this:
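The sketch below shows only the state logic such a parent component might own, written as a pure reducer so that it runs without React. In a component it could be passed to React's `useReducer` hook. All names here (`FormState`, `FormAction`, `formReducer`, `toRoles`) and the state shape are assumptions for illustration, not the actual Kustomer code.

```typescript
type Action = 'create' | 'read' | 'update' | 'delete';

interface FormState {
  name: string;
  // resource -> actions currently enabled in the form
  access: Record<string, Action[]>;
  // true once the admin has unsaved changes
  dirty: boolean;
}

type FormAction =
  | { type: 'loaded'; name: string; access: Record<string, Action[]> }
  | { type: 'accessChanged'; resource: string; actions: Action[] }
  | { type: 'saved' };

const emptyForm: FormState = { name: '', access: {}, dirty: false };

function formReducer(state: FormState, action: FormAction): FormState {
  switch (action.type) {
    case 'loaded': // initial state fetched from the API or taken from a template
      return { name: action.name, access: action.access, dirty: false };
    case 'accessChanged': // a modal reported new access for one resource
      return {
        ...state,
        access: { ...state.access, [action.resource]: action.actions },
        dirty: true,
      };
    case 'saved': // the POST/PUT request succeeded
      return { ...state, dirty: false };
    default:
      return state;
  }
}

// Convert form state into the role strings sent in the POST/PUT body.
function toRoles(state: FormState): string[] {
  return Object.keys(state.access).reduce<string[]>((roles, resource) => {
    const actions = state.access[resource] || [];
    return roles.concat(actions.map((a) => `org.permission.${resource}.${a}`));
  }, []);
}
```

Inside the component, `const [state, dispatch] = useReducer(formReducer, emptyForm)` would give child components a single `dispatch` to report changes, which keeps the form logic in one place without any Redux files.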

The benefit of using hooks and only component state was that it was faster to prototype the UI and to add new components for the permission set management features we built later, such as search filtering in the menu and version control.

Assigning Role Groups to Users

In our old implementation, the UI only supported assigning one Role Group to a given user. Not only did we have to add support for assigning Role Groups to a group of users (Teams), but we also needed to support assigning multiple Role Groups to a given user or Team. We then had to guarantee that our application recognized the different Role Groups assigned to the user whether they were assigned directly to that user or via that user’s Teams.

Our Redux store really came in handy here. We already had these entities in our Redux store, so the work that we needed to do here was gluing these pieces together with a memoized selector. Here’s a very abridged snapshot of our app’s Redux state:
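The original snapshot is not reproduced here, so the following is a hypothetical reconstruction consistent with the walkthrough further down (a current user `user2`, `roleGroup3`, `roleGroup4` and a Tier 2 Escalations team). The key names, the team ID and the Role Group names are assumptions.

```typescript
interface RoleGroup { id: string; name: string; roles: string[] }
interface User { id: string; roleGroups: string[] }
interface Team { id: string; name: string; members: string[]; roleGroups: string[] }

interface AppState {
  currentUser: { id: string };
  users: Record<string, User>;
  teams: Record<string, Team>;
  roleGroups: Record<string, RoleGroup>;
}

// Abridged, illustrative state. Only the IDs and role strings mentioned in the
// article's walkthrough are taken from the text; everything else is assumed.
const state: AppState = {
  currentUser: { id: 'user2' },
  users: {
    user2: { id: 'user2', roleGroups: ['roleGroup3'] },
  },
  teams: {
    team1: {
      id: 'team1',
      name: 'Tier 2 Escalations',
      members: ['user2'],
      roleGroups: ['roleGroup4'],
    },
  },
  roleGroups: {
    roleGroup3: { id: 'roleGroup3', name: 'User', roles: ['org.user'] },
    roleGroup4: {
      id: 'roleGroup4',
      name: 'Workflow Editors',
      roles: [
        'org.permission.workflow.create',
        'org.permission.workflow.read',
        'org.permission.workflow.update',
      ],
    },
  },
};
```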

The structure of your state probably won’t look exactly like what we have above. The important part is that the state has all the information you need to ascertain the user’s scope of access:

  • What Role Groups (either default or custom) exist and what features do they grant access to?
  • Who is the current user?
  • Which Role Groups are assigned to the current user?
  • Which Teams have the current user as a member and which Role Groups are assigned to those teams?

In this case, the Redux selector that we want to write should use the state data to return an array of roles granted to the user.
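A minimal sketch of such a selector follows, assuming a state shape with `currentUser`, `users`, `teams` and `roleGroups` keys (the shape is an assumption; the selector name `selectCurrentUserRoles` is the one used later in the article). Memoization is left out to keep the example dependency-free; a library such as Reselect could wrap this function with `createSelector`.

```typescript
interface RoleGroup { roles: string[] }
interface User { roleGroups: string[] }
interface Team { members: string[]; roleGroups: string[] }

interface AppState {
  currentUser: { id: string };
  users: Record<string, User>;
  teams: Record<string, Team>;
  roleGroups: Record<string, RoleGroup>;
}

function selectCurrentUserRoles(state: AppState): string[] {
  const userId = state.currentUser.id;
  const user = state.users[userId];

  // Role Groups assigned directly to the user
  const directGroupIds = user ? user.roleGroups : [];

  // Role Groups inherited from every Team the user belongs to
  const teamGroupIds = Object.keys(state.teams)
    .map((teamId) => state.teams[teamId])
    .filter((team) => team.members.indexOf(userId) !== -1)
    .reduce<string[]>((ids, team) => ids.concat(team.roleGroups), []);

  // Collect the roles granted by each Role Group, skipping unknown IDs
  const roles = directGroupIds
    .concat(teamGroupIds)
    .reduce<string[]>((all, groupId) => {
      const group = state.roleGroups[groupId];
      return group ? all.concat(group.roles) : all;
    }, []);

  // Drop duplicates while keeping first-seen order
  return roles.filter((role, index) => roles.indexOf(role) === index);
}
```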

The current user has an ID of user2. user2 is directly assigned to roleGroup3. user2 is also a member of the Tier 2 Escalations team, which is assigned to roleGroup4. As a result, user2 has the roles granted by roleGroup3 and roleGroup4. Therefore, we’d expect the output of the above selector to be ['org.user', 'org.permission.workflow.create', 'org.permission.workflow.read', 'org.permission.workflow.update'].

Real Time Updates to Permissions

Admins are frequently modifying the permissions settings and assigned Role Groups for different agents and teams. It’s important that these access updates are received by users in real time and the UI reacts accordingly so that agents are not required to refresh the application to fetch the latest assigned roles.

Kustomer is an event-driven system, so our backend publishes socket events to the relevant users, and the client updates the Redux store in the following scenarios:

  • Role Group is created or updated
  • The current user’s Role Group assignments change
  • The current user is added to or removed from a Team
  • The Role Group assignments for a Team the user is a member of change
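One way to handle such events on the client is a reducer that folds each event into the roles slice of the store. The sketch below covers two of the four scenarios; the event names, payloads and state shape are all hypothetical, and the Team-related events would follow the same pattern.

```typescript
interface RoleGroup { id: string; roles: string[] }

interface RolesState {
  roleGroups: Record<string, RoleGroup>;
  currentUserRoleGroups: string[];
}

// Hypothetical socket event payloads; the event names are assumptions.
type SocketEvent =
  | { type: 'roleGroup.updated'; roleGroup: RoleGroup }
  | { type: 'user.roleGroupsChanged'; roleGroups: string[] };

function rolesReducer(state: RolesState, socketEvent: SocketEvent): RolesState {
  switch (socketEvent.type) {
    case 'roleGroup.updated': // a Role Group was created or edited by an admin
      return {
        ...state,
        roleGroups: {
          ...state.roleGroups,
          [socketEvent.roleGroup.id]: socketEvent.roleGroup,
        },
      };
    case 'user.roleGroupsChanged': // the current user's assignments changed
      return { ...state, currentUserRoleGroups: socketEvent.roleGroups };
    default:
      return state;
  }
}
```

Because the reducer returns new objects rather than mutating the old ones, any selector derived from this slice recomputes and the UI can react without a page refresh.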

With the settings page built, our selectors ready to use and our store subscribed to real-time updates, we can look at how we implement the access checks throughout our app.

Frontend Access Verification Architecture

The Kustomer application is huge, with features like Customer Timeline for agents, search filtering, a reporting dashboard for managers and a few dozen sophisticated settings pages for admins. There were thousands of files in our codebase that needed to be updated to check for the new permissions. We had a lot of ground to cover and needed to move fast, but we also needed something that other teams could adopt easily when implementing new features, without hindering their velocity. There were two core ingredients in making this migration painless:

  • The PermissionService — this is a class with a singleton instance that has methods for determining access to a resource based on the user’s roles. We can use the selectCurrentUserRoles selector we wrote above to load the current user’s assigned roles from our Redux store. We can use PermissionService.can either in a non-React context such as route handlers, actions, selectors, and sagas or in component lifecycle methods to determine if we should load certain data when a component mounts. A barebones representation of this class looks something like this:
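Here is a minimal sketch of what a class like this might look like. Only the `PermissionService.can` name comes from the article; the `setRoles` method, the argument shape, the cache, and the rule that an empty requirement list grants access are assumptions made for illustration.

```typescript
class PermissionService {
  private roles: string[] = [];
  private cache: Record<string, boolean> = {};

  // Call whenever the current user's roles change in the store
  // (for example, from a store subscription). Clears memoized results.
  setRoles(roles: readonly string[]): void {
    this.roles = roles.slice();
    this.cache = {};
  }

  // True when the user holds at least one of the required roles.
  // Assumption: an empty list means the resource requires no roles.
  can(requiredRoles: readonly string[]): boolean {
    if (requiredRoles.length === 0) {
      return true;
    }
    const key = requiredRoles.join('|');
    if (Object.prototype.hasOwnProperty.call(this.cache, key)) {
      return this.cache[key] === true;
    }
    const allowed = requiredRoles.some((role) => this.roles.indexOf(role) !== -1);
    this.cache[key] = allowed;
    return allowed;
  }
}

// Single shared instance used by routes, actions, selectors and components
const permissionService = new PermissionService();
```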
You might also consider memoizing PermissionService.can to prevent recalculating access more than you need to. You can bust the memoized cache if the user’s roles are updated during their session.

Example usage with React Router for blocking access to a page:
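To stay dependency-free, this sketch leaves out the React Router wiring and shows only the decision a guarded route would make. The names `GuardedRoute`, `guardRoute` and the `/no-access` fallback path are hypothetical. In a React Router v5 app, a route's render function could return a `Redirect` to `redirectTo` when `allowed` is false.

```typescript
interface GuardedRoute {
  path: string;
  // Roles that grant access to the page; any one of them is enough
  requiredRoles: string[];
}

type RouteDecision =
  | { allowed: true }
  | { allowed: false; redirectTo: string };

// Decide whether the user may open a route, and where to send them if not.
function guardRoute(
  route: GuardedRoute,
  userRoles: readonly string[],
  fallbackPath: string = '/no-access',
): RouteDecision {
  const allowed =
    route.requiredRoles.length === 0 ||
    route.requiredRoles.some((role) => userRoles.indexOf(role) !== -1);
  return allowed ? { allowed: true } : { allowed: false, redirectTo: fallbackPath };
}

// Hypothetical route definition for a Knowledge Base settings page
const kbSettingsRoute: GuardedRoute = {
  path: '/settings/kb',
  requiredRoles: ['org.permission.kb.read', 'org.admin.kb.read'],
};
```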

  • Using a higher-order component (HOC). The Permission HOC uses the aforementioned PermissionService and is useful for quickly wrapping form elements and buttons with access checks. If the access check fails, we can render the component with some sort of disabled or read only state. We use a tooltip to let the user know that they do not have access, but you can render any alternative UI you’d like.

Example usage in a save button:
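The wrapping pattern can be sketched without React by modelling a component as a plain function from props to output, which is the same shape a React function component has. Everything here is an assumption for illustration: the `withPermission` signature, the `disabled` and `tooltip` props, and the stand-in `SaveButton` that simply returns the props it would render with.

```typescript
interface PermissionProps {
  disabled?: boolean;
  tooltip?: string;
}

// A "component" is modelled as a function from props to rendered output.
type Component<P, R> = (props: P) => R;

// Wrap a component so it renders in a disabled state when access is missing.
function withPermission<P extends PermissionProps, R>(
  requiredRoles: readonly string[],
  getUserRoles: () => readonly string[],
  Wrapped: Component<P, R>,
): Component<P, R> {
  return (props: P) => {
    const userRoles = getUserRoles();
    const allowed = requiredRoles.some((role) => userRoles.indexOf(role) !== -1);
    if (allowed) {
      return Wrapped(props);
    }
    // No access: same component, disabled, with an explanation for the user
    return Wrapped({
      ...props,
      disabled: true,
      tooltip: 'You do not have access to this action.',
    });
  };
}

// Stand-in save button that returns the props it would render with
const SaveButton = (props: PermissionProps & { label: string }) => ({ ...props });

const API_KB_UPDATE = ['org.permission.kb.update', 'org.admin.kb'];
const readerRoles = ['org.permission.kb.read'];
const GuardedSaveButton = withPermission(API_KB_UPDATE, () => readerRoles, SaveButton);

GuardedSaveButton({ label: 'Save' }); // rendered with disabled: true and the tooltip
```

Reading the roles through a function rather than capturing them once means the wrapped component picks up role changes that arrive during the session.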

In addition to buttons or links that direct the user to another form, we also try to wrap every editable input in a form with this HOC. This might seem excessive, but we have many pages in our application where a user might only have partial access, so we should not allow them to interact with any inputs that correspond to an API endpoint that they may not have access to.

Caption: This user has access to read and delete the Business Rule settings in our app, but not update.

The App Should Always Work, No Matter What Your Role Group Is

We mentioned that it is possible for a user to be assigned no roles or a blank Role Group. We can expect that these users aren’t able to access any API resources except for those that do not require any roles (there aren’t many in our app). It’s worth testing the application as one of these users. Is the user able to load the app or do they just get a white screen with a bunch of errors in the console? Is the user able to navigate to pages that might be totally unusable for them?

If anything, a splash page advising the user that they have no access to any features is better than a broken view, which would cause confusion. In addition, we ended up removing the role requirements from some API endpoints. For example, we used to require roles to fetch the custom Role Groups assigned to the user. Seems like a catch-22, right?

If your team writes and runs end-to-end (E2E) tests often, then you might also consider writing smoke tests that run for users with different roles.

Developer Experience

Our team structure is akin to the Spotify model where teams are split out into different squads based on their area of ownership in the product. We were only one team of developers so we could not just release these changes and expect the other five teams to adopt the new roles system in their projects on day one. Here are some efforts that helped make rolling out the new paradigm easier:

  • Internal developer documentation with detailed how-to’s is a given. Documenting our new roles system was a requirement prior to launch and we made sure to notify the entire engineering team when documentation was available.
  • At Kustomer, we have monthly engineering town hall meetings where our teammates have the opportunity to give a demo presentation to the entire department. Getting the entire team together to give a quick overview of the roles implementation encouraged engineers from other squads to proactively let our squad know about features they were working on and enabled us to collaborate on the permissions aspect for their feature. Demonstrating our work in person definitely encouraged ongoing collaboration more than if we just dropped a link to the documentation in Slack.
  • If your frontend codebase is transitioning from a legacy RBAC implementation, then try to deprecate any legacy components and files as soon as possible. For a few months after releasing our new RBAC implementation, we still had the helpers for our legacy implementation in our codebase. Naturally, developers from other squads were not always sure which methods they should use, which resulted in us occasionally having to go back and update their code to use the correct methods.
  • We migrated to TypeScript soon after project completion. Typing our roles state and Role Groups settings pages has the benefit of enforcing the data structure that developers need to write when adding new resources and roles to the settings page.
  • Make sure that your authorization-related error logging is feature-specific. Bugs will pop up where we allowed a user to try fetching a resource from our API when they do not have the required access (indicating we missed an access check somewhere in the code). We use Sentry for error monitoring on the frontend, and all engineers check it daily or get notified via e-mail, Slack, PagerDuty, etc. about new regressions. In our initial rollout, we logged thousands of errors related to rejected promises from failed network requests. Each error event was bucketed under a catch-all error, Unauthorized Request. While manually viewing the individual error events in Sentry gave some helpful clues about where in the product the error occurred, everything being bucketed under one issue discouraged developers from fixing permissions issues related to their area of ownership. Since our error handling knew the API endpoint or which action dispatched the request, the solution was to simply namespace the errors. For example, Unauthorized Request to /v1/customers/:id indicated that the access check for fetching customer data is incorrect, and the team that owns that feature can fix the bug.
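The error namespacing from the last point could be sketched as below. The function names and the heuristic for spotting ID segments (all digits or a 24-character hex string) are assumptions for this example; the resulting string could serve as the error message or feed whatever grouping mechanism your error monitoring tool provides.

```typescript
// Replace ID-like path segments with ":id" so errors group by endpoint,
// not by individual record. The ID heuristic is an assumption.
function toEndpointPattern(url: string): string {
  const queryStart = url.indexOf('?');
  const pathname = queryStart === -1 ? url : url.slice(0, queryStart);
  return pathname
    .split('/')
    .map((segment) => (/^(\d+|[0-9a-f]{24})$/i.test(segment) ? ':id' : segment))
    .join('/');
}

// Build a feature-specific error name instead of one catch-all bucket.
function unauthorizedErrorName(url: string): string {
  return `Unauthorized Request to ${toEndpointPattern(url)}`;
}

unauthorizedErrorName('/v1/customers/5fb3c2a1e4b0a1b2c3d4e5f6');
// 'Unauthorized Request to /v1/customers/:id'
```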

Where can we go from here?

We covered a lot of the frontend tasks that went into our advanced RBAC implementation. While this article gives a glimpse of the data model, we have not gone through the layers of work required on the backend including but not limited to: How do we define and organize the required roles for different API endpoints? How would a microservice fetch a user’s role data to perform a required access check? If every user needs to fetch their roles on every request, then how do we scale?

Not to mention, there’s still more work that our team is cooking up to satisfy the needs of our larger enterprise customers. We recently released Field Level Permissions so that admins can be even more granular and configure which fields on an object a user can read or edit. We are constantly iterating to make the product more customizable without sacrificing the user experience.

We hope you gained some insight from this article. If there are any issues you notice with the implementation, something important missing or if you just have a question, please feel free to leave a comment.
