Centralized Cloud Identity and Access Management Architecture at Cermati

Published in Cermati Group Tech Blog, Nov 14, 2022

Some time in April 2022, I stumbled upon a paper by Saurabh Deochake and Vrushali Channapattan about Twitter’s approach to centralizing data resource access management in their hybrid cloud infrastructure. Twitter began with on-premise infrastructure, where they used LDAP for infrastructure access management. Later on, they decided to also use Google Cloud Platform (GCP) for their data platform infrastructure, so they needed to unify identity and access management across their on-premise systems and GCP.

After reading the paper, I saw that what they were doing has some parallels with how we’re centralizing our cloud IAM architecture here at Cermati. We have already published a few articles related to this effort.

But those articles focused specifically on our public key infrastructure and our internal IAM dashboard rather than on the overall design of the centralized IAM architecture we’re implementing. So, inspired by Twitter’s paper, I decided to write this article to describe how we’re centralizing our IAM from a more holistic point of view.

Scope of Work

The first thing to be noted is that the scope of our IAM centralization is a bit different from what Twitter’s team did in the paper.

In Twitter’s case, they’re mainly concerned with fine-grained management of access to the data resources needed for analytics. They already had this implemented using LDAP for their on-premise system, but at some point they adopted Google Cloud Platform (GCP), and their data is now located both on-premise and on GCP.

In our case, we are mainly concerned with role-based access to our infrastructure and other IT resources located on the cloud. Each asset might be tied to a certain software or cloud platform provider, and each provider has its own IAM implementation that’s entirely separate from the others. The access policies our employees need tend to be quite generic and follow their job roles. Thus, a simple role-based access control model should suffice for our needs.
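To make the idea of a simple role-based model concrete, here is a minimal sketch. The role names, platform names, and permission strings are hypothetical examples chosen for illustration; they are not our actual policy definitions.

```python
from dataclasses import dataclass, field

# Hypothetical policy table: job role -> platform -> allowed permissions.
ROLE_POLICIES: dict[str, dict[str, set[str]]] = {
    "backend-engineer": {"github": {"read", "write"}, "jenkins": {"read"}},
    "support-agent": {"helpdesk": {"read", "write"}},
}


@dataclass
class Employee:
    email: str
    roles: set[str] = field(default_factory=set)


def is_allowed(employee: Employee, platform: str, permission: str) -> bool:
    """Return True if any of the employee's roles grants the permission."""
    return any(
        permission in ROLE_POLICIES.get(role, {}).get(platform, set())
        for role in employee.roles
    )


alice = Employee("alice@example.com", {"backend-engineer"})
print(is_allowed(alice, "github", "write"))   # granted by the role
print(is_allowed(alice, "helpdesk", "read"))  # not part of the role
```

Because permissions hang off the role rather than the individual, changing what a job role can do is a single edit to the policy table.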

Our system has never really had any on-premise component aside from our VoIP system, which is managed exclusively by a separate team. Technically, we should be able to integrate our VoIP system with our public key infrastructure and IAM system. But due to differences in system standards and operational requirements, we decided not to pursue the integration yet. Therefore, the scope of this article is limited to the assets we have on the cloud.

General Architecture

The following diagram shows the high-level architecture of our IAM system.

In the architecture diagram above, we can see the system components divided into three core groups:

Identity Management, which contains the source of truth for our employees’ identities and the components derived from it (PKI and TOTP) that are used for authentication and authorization.
Access Management, which receives from the Identity Management components the identities of the employees whose access needs to be provisioned and managed, executes the required procedures on the target platforms, and synchronizes any changes on the platforms back to the IAM database.
Target Platform, which runs the access management procedures issued by the Access Management components.

Identity Management

We use Google Workspace (which is tied to the employee’s company e-mail account) as the source of truth for the whole IAM system. It is used to provision and revoke accounts and access on the resources managed by the IAM system, and everything can be traced back to the Google Workspace account.

The following diagram shows how the identity synchronization flow works.

Employee identity is replicated to each target platform to be managed by each platform’s access policies.

Each employee will have an account provisioned on the target platforms. These accounts are basically replicas of the employee’s identity: they are associated with the employee’s email account but managed by a separate platform outside Google Workspace’s scope. An account is provisioned on a target platform only if the employee specifically requests access to that platform and the request is approved by both the employee’s manager and a representative from the team assigned to manage the platform.
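The two-approval rule described above can be sketched as follows. This is only an illustration; the class, function, and approver names are hypothetical and do not reflect IAMD’s actual data model.

```python
from dataclasses import dataclass, field

# Both approvals must be present before an account may be provisioned.
REQUIRED_APPROVALS = frozenset({"manager", "platform_owner"})


@dataclass
class AccessRequest:
    employee_email: str
    platform: str
    approvals: set[str] = field(default_factory=set)


def approve(request: AccessRequest, approver_kind: str) -> None:
    """Record an approval from one of the two required approver kinds."""
    if approver_kind not in REQUIRED_APPROVALS:
        raise ValueError(f"unknown approver kind: {approver_kind}")
    request.approvals.add(approver_kind)


def can_provision(request: AccessRequest) -> bool:
    """True only when every required approval has been recorded."""
    return REQUIRED_APPROVALS <= request.approvals


request = AccessRequest("alice@example.com", "jenkins")
approve(request, "manager")
print(can_provision(request))  # still waiting for the platform owner
approve(request, "platform_owner")
print(can_provision(request))  # both approvals are in
```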

Our setup has a weakness: if the employee’s email account is deleted during offboarding, the deletion doesn’t automatically cascade to the other accounts provisioned for the employee. This might not be a big security issue for platforms we log into through Google’s auth API, but it is a concern for platforms that use username and password pairs. The access management system addresses this: IAMD (the access management dashboard) can batch-revoke all access associated with an employee, given the employee’s email address as input.
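A minimal sketch of batch revocation keyed by email address might look like the following. The registry class and the `revoke_fn` callback are hypothetical stand-ins for illustration, not IAMD’s real implementation.

```python
from typing import Callable


class AccountRegistry:
    """Tracks which target platforms hold an account for each email."""

    def __init__(self) -> None:
        self._platforms_by_email: dict[str, set[str]] = {}

    def record(self, email: str, platform: str) -> None:
        self._platforms_by_email.setdefault(email, set()).add(platform)

    def revoke_all(
        self, email: str, revoke_fn: Callable[[str, str], None]
    ) -> list[str]:
        """Revoke every known account for the email; return the platforms."""
        revoked: list[str] = []
        platforms = self._platforms_by_email.get(email, set())
        for platform in sorted(platforms):
            revoke_fn(platform, email)    # platform-specific revocation
            platforms.discard(platform)   # forget it only after success
            revoked.append(platform)
        self._platforms_by_email.pop(email, None)
        return revoked


registry = AccountRegistry()
registry.record("alice@example.com", "jenkins")
registry.record("alice@example.com", "github")
print(registry.revoke_all("alice@example.com", lambda platform, email: None))
```

If a platform-specific revocation raises an error, the platforms that have not been processed yet stay in the registry, so the batch can be retried.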

The Google Workspace account identity is also extended to our public key infrastructure (PKI) and auth server platforms. Employees can have certificates, which serve as proof of identity when accessing internal systems with PKI integration, and authenticator devices that generate TOTPs (time-based one-time passwords), which serve as an extra factor when authenticating to internal systems that require an extra layer of security.
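For readers unfamiliar with TOTP, the standard algorithm (RFC 6238, built on the HOTP truncation from RFC 4226) is short enough to sketch with Python’s standard library. This shows the generic algorithm with the common defaults (HMAC-SHA1, 30-second step), not the configuration of our auth server; the function name and parameters are chosen for illustration.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password as described in RFC 6238."""
    # The moving factor is the number of time steps since the Unix epoch.
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte
    # selects a 4-byte window, and the top bit of that window is masked off.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


# Test vector from RFC 6238, Appendix B (SHA-1, 8 digits, T = 59 seconds).
print(totp(b"12345678901234567890", 59, digits=8))  # 94287082
```

The server and the authenticator device share the secret, so both sides can compute the same code for the current time step without any network exchange.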

The identity from Google Workspace is used to create certificates and authenticator devices for employees to access internal systems which require extra layers of security.

Access Management

Unlike Twitter, which configured its LDAP groups with access policies to be mirrored to GCP, we don’t mirror permissions from Google Workspace to the target platforms. This is because Google Workspace isn’t meant to be used as an access control management system that integrates with other platforms, which is the selling point of LDAP.

Thus, we need to move the access policy and permission definitions elsewhere. In our case, it’s handled by IAMD. As mentioned before, we’re using a role-based access management structure to determine which resources can be accessed by a company employee.

The access provisioning workflow for an employee.

When the employee needs more access to the target platform that’s not defined in the role for the employee’s job, let’s say because the employee is promoted, they can just follow the same workflow to ask for more access.

When access needs to be revoked, an IAMD admin can simply provide the employee’s email address and the target platform whose access should be revoked.

For a more detailed explanation of IAMD’s architecture and how it works, we have a separate article talking specifically about IAMD.

Target Platform

The target platforms include a variety of standalone software and systems required for Cermati’s day-to-day business operations. Thus, the target platforms usually already have their own IAM capabilities and workflow defined.

Because the target platforms already support multi-user workflow with their own access management models, we’re just piggybacking on the target platforms’ access management features when integrating them with our IAM ecosystem. We just need to implement an integration module for the target platform that’s compliant with IAMX, a simple identity access management framework we built internally to enable us to easily integrate all of the platforms with our IAM ecosystem.

IAMX itself is basically just a definition of a set of standards we should comply with when developing a connector module to integrate with our target platforms, with an IAMX core executor module that will validate the connector module interfaces when running the connector’s access management procedure implementations.
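The idea of a core executor that validates a connector’s interface before running its procedures can be sketched as follows. This is an illustration only: the method names, the `execute` function, and the in-memory connector are hypothetical and are not IAMX’s actual API, which is documented in the project’s repository.

```python
from typing import Any

# Hypothetical contract: every connector must expose these callables.
REQUIRED_METHODS = ("provision", "revoke")


class ConnectorError(Exception):
    """Raised when a connector or action does not satisfy the contract."""


def validate_connector(connector: Any) -> None:
    """Check that the connector implements every required method."""
    missing = [
        name for name in REQUIRED_METHODS
        if not callable(getattr(connector, name, None))
    ]
    if missing:
        raise ConnectorError("connector is missing: " + ", ".join(missing))


def execute(connector: Any, action: str, context: dict[str, str]) -> Any:
    """Validate the connector, then run one access management procedure."""
    if action not in REQUIRED_METHODS:
        raise ConnectorError(f"unknown action: {action}")
    validate_connector(connector)
    return getattr(connector, action)(context)


class InMemoryConnector:
    """Toy connector standing in for a real target platform."""

    def __init__(self) -> None:
        self.accounts: set[str] = set()

    def provision(self, context: dict[str, str]) -> str:
        self.accounts.add(context["email"])
        return "provisioned"

    def revoke(self, context: dict[str, str]) -> str:
        self.accounts.discard(context["email"])
        return "revoked"


connector = InMemoryConnector()
print(execute(connector, "provision", {"email": "alice@example.com"}))
print(execute(connector, "revoke", {"email": "alice@example.com"}))
```

Keeping the validation in one executor means each new platform only needs a connector that satisfies the contract; the rest of the workflow stays unchanged.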

More information on IAMX can be found on the GitHub repository of the project.

Conclusion

Cermati’s IAM system started to take shape when we first implemented our public key infrastructure (PKI). Initially, it was used only for managing the developers’ access to our software development infrastructure and was connected to GitHub as the source of truth for the employee’s identity. But we saw the potential to expand the coverage to also handle internal resource access for non-engineering divisions, so we decided to rewrite the system to use Google Workspace as the source of truth for the employee’s identity instead of GitHub — since outside engineering, most of our employees don’t have GitHub accounts.

The certificates issued by PKI were initially only used by developers to access internal tools such as Jenkins. But as we expanded the PKI to be usable by divisions other than engineering, we integrated more things with the system by allowing the certificates to be used for authenticating to our company VPN servers, and later also to SSH servers running on our staging and production environments.

Managing access became cumbersome pretty quickly, so we designed IAMX to standardize our access management workflow and developed IAMD to help with our day-to-day access management operations. But some of our infrastructure components needed an intermediary interface before the IAMX connector modules we developed could work with them. So we developed CSSH, Torii, and Krew to give IAMX better interfaces to our SSH, VPN, and Kubernetes cluster access management functionality, while adding TOTP to the authentication workflow for extra security where it’s needed. This is where we introduced PrivacyIDEA to our infrastructure. CSSH, Torii, and Krew aren’t covered in this article; we might write about these services in the future.

Way before we wrote the initial implementation of our PKI, we were looking into LDAP as an identity management solution to be integrated with Google Workspace. We really wanted an IAM system where everything could be integrated with the Google Workspace identity. But at the time, the method we found for integrating LDAP with Google Workspace required LDAP to be the source of truth instead of Google Workspace. That was quite risky: if LDAP wasn’t configured properly, we might wipe out all of our Google Workspace users. So we decided not to pursue it, and since we also had a few more urgent things to work on at the time, we ended up abandoning LDAP.

If we had decided to continue pursuing an LDAP-based solution back then, we might have ended up with a very different setup for our IAM ecosystem. But for now, we consider the setup we ended up with good enough for our use cases, while being easy to extend to many more platforms that need to be integrated with the IAM workflow.