Improving Identity Access Management at Cermati

Published in Cermati Group Tech Blog on Jul 5, 2022

This article is written in collaboration with Muhammad Faiz Al-Hadiid, Software Engineer (Infrastructure) at Cermati

Heimdall, the gatekeeper of Asgard in Marvel’s Thor film

As software engineers on the infrastructure team at Cermati, one of our jobs is to act as gatekeepers who grant or revoke access to all of our infrastructure resources and tools, ranging from the cloud web console, monitoring and logging dashboards, build pipelines, and the artifact registry to many more. This job sounds easy at first: you just open the dashboard of the desired platform, pick the permissions to grant or revoke, and save the changes. But imagine doing these steps across multiple platforms for dozens of people with different needs, several times a week. The easy task becomes time-consuming and introduces toil that slows down other work.

Realizing this emerging issue, our engineering manager, Edwin Tunggawan, looked for an existing open-source solution, but we didn’t find one that fit some of our specific use cases. He then proposed creating a generic framework to manage access to our infrastructure resources. The framework is meant to be platform-agnostic and use-case-agnostic so that it can be embedded in any kind of tool or web application. It is called IAMX (Identity Access Management eXecutor). We have made the code for this framework public, and the repository can be accessed here.

IAMX (Identity Access Management eXecutor)

IAMX is built with JavaScript because of its low barrier to entry, so that anyone can contribute to extending the framework for various use cases and use it for any access management need.

There are two major components in IAMX: the core and the connector implementation.

The core component is further broken down into three components:

Credentials Registry: A key-value store, kept in YAML, that holds the credentials and configuration for the platforms we support. The core also defines how these credentials are loaded.
Executor: The backbone of the IAMX framework. It creates the connector objects and executes the logic implemented inside them. The execution is driven by the logic of the application that embeds IAMX, and the executor is responsible for returning the execution result to that application logic.
Connector Interface: Defines the API contract that each target platform’s connector must implement, so that the executor can carry out the same steps regardless of the target platform. You can think of it as the abstraction layer for interacting with the target platforms.
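To make the division of responsibilities more concrete, here is a minimal JavaScript sketch of how the three core pieces could fit together. It is not IAMX’s actual code: the class names, the `grantAccess`/`revokeAccess`/`listUsers` methods, and the `execute(platform, operation, ...args)` signature are all hypothetical, and the registry is a plain `Map` standing in for the YAML-backed store.

```javascript
// Hypothetical sketch of the IAMX core pieces; not the real IAMX API.

// Connector Interface: the contract every platform connector implements.
class Connector {
  constructor(credentials) {
    this.credentials = credentials; // injected by the executor
  }
  async grantAccess(_user, _permissions) {
    throw new Error(`${this.constructor.name} must implement grantAccess()`);
  }
  async revokeAccess(_user) {
    throw new Error(`${this.constructor.name} must implement revokeAccess()`);
  }
  async listUsers() {
    throw new Error(`${this.constructor.name} must implement listUsers()`);
  }
}

// Credentials Registry: platform name -> credentials/config.
// IAMX loads this from YAML; a Map stands in for it here.
class CredentialsRegistry {
  constructor(entries = {}) {
    this.store = new Map(Object.entries(entries));
  }
  get(platform) {
    if (!this.store.has(platform)) {
      throw new Error(`No credentials registered for platform "${platform}"`);
    }
    return this.store.get(platform);
  }
}

// Executor: builds the connector for a platform, runs one operation on it,
// and returns the result to the calling application (e.g. IAMD).
class Executor {
  constructor(registry, connectorClasses) {
    this.registry = registry;
    this.connectorClasses = connectorClasses; // platform -> Connector subclass
  }
  async execute(platform, operation, ...args) {
    const ConnectorClass = this.connectorClasses[platform];
    if (!ConnectorClass) throw new Error(`Unknown platform "${platform}"`);
    const connector = new ConnectorClass(this.registry.get(platform));
    if (typeof connector[operation] !== "function") {
      throw new Error(`Unsupported operation "${operation}"`);
    }
    return connector[operation](...args);
  }
}
```

A platform connector would extend `Connector` and override its methods; the application then calls something like `executor.execute("grafana", "grantAccess", user, permissions)` without knowing anything platform-specific. In this sketch, keeping connector construction inside the executor means credentials never pass through the application logic.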

The connector implementation must comply with the contract defined in the IAMX core to be usable in the IAMX workflow. It can be implemented separately from the core, so the implementation details can be very flexible as long as it follows the contract.
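Because connectors can live outside the core, one way to enforce the contract is a duck-typed check when a connector is registered. The sketch below assumes a hypothetical contract of three methods (`grantAccess`, `revokeAccess`, `listUsers`); the real IAMX contract may differ, and `ExampleConnector` and `registerConnector` are purely illustrative.

```javascript
// Hypothetical connector contract; the real IAMX method names may differ.
const REQUIRED_METHODS = ["grantAccess", "revokeAccess", "listUsers"];

// Returns the required methods that a connector class does not provide
// (own or inherited), using duck typing rather than class inheritance.
function missingMethods(ConnectorClass) {
  const proto = ConnectorClass.prototype;
  return REQUIRED_METHODS.filter((name) => typeof proto[name] !== "function");
}

// Registers a connector only if it satisfies the contract, so a broken
// connector fails fast at startup instead of in the middle of provisioning.
function registerConnector(registry, platform, ConnectorClass) {
  const missing = missingMethods(ConnectorClass);
  if (missing.length > 0) {
    throw new Error(`Connector for "${platform}" is missing: ${missing.join(", ")}`);
  }
  registry[platform] = ConnectorClass;
  return registry;
}

// Illustrative standalone connector, developed outside the core.
class ExampleConnector {
  constructor(credentials) {
    this.credentials = credentials;
  }
  async grantAccess(user, permissions) {
    // Call the target platform's user-management API here.
    return { user, permissions };
  }
  async revokeAccess(user) {
    return { user, revoked: true };
  }
  async listUsers() {
    return [];
  }
}
```

The check only confirms that the methods exist, not that they behave correctly, so it complements rather than replaces per-connector tests.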

To prove that this framework is useful and generic enough, Edwin kicked off the implementation by working on the core modules and our first few connectors himself. These turned out to be a breeze, so the next step was to develop an application that leverages the IAMX framework. That application is a web-based tool that eases the onboarding and offboarding workload, even for non-technical users. It is called IAMD (Identity Access Management Dashboard).

IAMD (Identity Access Management Dashboard)

The initial development of this application was assigned to Ihsan Muhammad Asnadi, who had recently joined as an intern at the time and has since joined our product engineering team as a full-time engineer. He did good work on the initial version of the application and diligently explored the suggestions given to him, making it a good foundation for an application that alleviates our pain in managing onboarding and offboarding to our infrastructure resources.

Image of the onboarding sequence diagram of IAMD

The onboarding process involves three actors: the requester (anyone in the company), the approver (the department lead or manager), and the provisioner (the administrator of the infrastructure component). It has two levels of approval: the first from the approver, who knows the context of what access the requester needs, and the second from the provisioner, who knows the use case details of the target platform.

The process starts with the requester creating a new access request via IAMD. IAMD sends an email notification about the new request to the approver(s) of the department the requester chose when creating it. The approver can then approve the request, or reject it if the requested access seems inappropriate.

Once the approver approves the access request, IAMD sends two more email notifications: one to inform the requester of the progress and another to notify the provisioner that the request is ready to be provisioned. At this stage, the provisioner can also choose to provision or reject the request if they deem it improper.
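The two-level approval flow described above can be modeled as a small state machine. This is a minimal sketch under assumptions: the state names, actor names, and the `createRequest`/`transition` helpers are hypothetical, and since the article does not say who is notified on rejection, rejection transitions list no notifications.

```javascript
// Hypothetical model of IAMD's two-level approval flow; state, action, and
// actor names are illustrative, not IAMD's actual schema.
const TRANSITIONS = {
  PENDING_APPROVAL: {
    // Approval notifies both the requester and the provisioner.
    approve: { actor: "approver", next: "PENDING_PROVISIONING", notify: ["requester", "provisioner"] },
    reject: { actor: "approver", next: "REJECTED", notify: [] }, // not specified in the article
  },
  PENDING_PROVISIONING: {
    // Provisioning emails the generated credentials to the requester.
    provision: { actor: "provisioner", next: "PROVISIONED", notify: ["requester"] },
    reject: { actor: "provisioner", next: "REJECTED", notify: [] }, // not specified in the article
  },
  PROVISIONED: {}, // terminal
  REJECTED: {}, // terminal
};

// A new request starts by notifying the approver(s) of the chosen department.
function createRequest(requester, department) {
  return { requester, department, state: "PENDING_APPROVAL", notify: ["approver"] };
}

// Applies an action, enforcing both the current state and who may act.
function transition(request, actor, action) {
  const rule = TRANSITIONS[request.state][action];
  if (!rule) {
    throw new Error(`Cannot "${action}" a request in state ${request.state}`);
  }
  if (rule.actor !== actor) {
    throw new Error(`Only the ${rule.actor} can "${action}" at this stage`);
  }
  // Returns a new object; `notify` lists who should be emailed for this step.
  return { ...request, state: rule.next, notify: rule.notify };
}
```

Encoding the actor alongside each transition makes the two approval levels explicit: an approver cannot skip ahead and provision, and a provisioner cannot act before the approver has.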

When the provisioner decides to proceed, IAMD calls the IAMX executor to grant the requested access. Once provisioning succeeds, IAMD forwards the generated credentials (which can take any form) to the requester’s email address. For username-password credentials, we make sure that the target platform forces the requester to change the password on first login whenever possible. This matters because even though the generated password is strong and known only to the system, anyone who compromises the requester’s email account would otherwise also gain access to the platform.
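As an illustration of the credential-generation step, the sketch below draws a password from Node’s cryptographically secure `crypto.randomInt` and marks the account for a forced change on first login. The alphabet, the default length, and the `mustChangePassword` field are assumptions; each platform exposes its own mechanism (if any) for forcing a reset.

```javascript
const { randomInt } = require("node:crypto");

// Hypothetical alphabet: ambiguous characters such as O/0 and l/1 are omitted.
const ALPHABET =
  "ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz23456789!@#$%^&*-_";

// Generates a password from a cryptographically secure RNG.
// randomInt avoids the modulo bias of `byte % ALPHABET.length`.
function generatePassword(length = 24) {
  let password = "";
  for (let i = 0; i < length; i++) {
    password += ALPHABET[randomInt(ALPHABET.length)];
  }
  return password;
}

// Builds the account payload a connector might send to a platform,
// asking it to force a password change on first login where supported.
function buildNewAccount(username, supportsForcedReset) {
  return {
    username,
    password: generatePassword(),
    mustChangePassword: supportsForcedReset, // hypothetical field name
  };
}
```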

Image of the offboarding sequence diagram of IAMD

The offboarding process involves only one actor, the admin, who chooses the existing access to be revoked via IAMD. After the access is selected, IAMD puts the selected entries into a revocation queue table in the database, which is polled asynchronously by a worker that performs the actual revocation. We don’t use a message queue here because we wanted to keep the initial iteration simple, and we don’t currently need near-real-time processing for access revocation.
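The polling worker for the revocation queue table can be sketched as follows. It assumes a hypothetical row shape (`platform`, `username`, `status`), an `executor.execute(platform, "revokeAccess", username)` call that may not match IAMX’s real interface, and an in-memory array in place of the database table.

```javascript
// Hypothetical sketch of the revocation worker. The "table" is an in-memory
// array standing in for the database table; column names are assumptions.

// Processes every pending row once, recording success or failure per row.
async function processRevocationQueue(table, executor) {
  const pending = table.filter((row) => row.status === "pending");
  for (const row of pending) {
    try {
      await executor.execute(row.platform, "revokeAccess", row.username);
      row.status = "done";
    } catch (err) {
      row.status = "failed"; // kept for an admin to inspect or retry
      row.error = err.message;
    }
  }
  return pending.length;
}

// Polls the table on a fixed interval; no message queue is involved.
// The `running` guard prevents overlapping passes on slow runs.
function startRevocationWorker(table, executor, intervalMs = 60_000) {
  let running = false;
  const timer = setInterval(async () => {
    if (running) return;
    running = true;
    try {
      await processRevocationQueue(table, executor);
    } finally {
      running = false;
    }
  }, intervalMs);
  return () => clearInterval(timer); // call to stop the worker
}
```

Marking failed rows instead of deleting them leaves a record that an admin can inspect and retry.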

Ihsan implemented these base processes well, so we were able to start using them in our day-to-day operations as soon as he finished the initial phase of IAMD development.

Current State

Software is meant to evolve, and so are IAMD and IAMX. We are still actively developing IAMD (at least as of this article’s publication), and since Ihsan has moved to another team, development has been continued by our new joiners, Muhammad Faiz Al-Hadiid and Jonathan Selvyn.

Thanks to them, we have now integrated almost all of our infrastructure resource access requests into IAMD. Supported platforms include cloud provider console access, Jenkins, Grafana, Nexus, Sentry, and even our SSH and VPN access. The list is not limited to these, and it is still growing.

We also managed to synchronize the actual access state of our team members on each platform into IAMD, so we can now monitor and audit all existing access from a single dashboard. This capability was mostly developed by Faiz, and here’s how it works.

IAMD Synchronizer is a background daemon that periodically fetches user information from each platform using IAMX and stores it in the database. It doesn’t interact with the IAMD backend directly, but it shares the same database, so it can modify user data directly.
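A minimal sketch of such a daemon’s scheduling loop follows: it computes the delay until the next run slot and re-arms a timer after each run. The run hours of 00:00 and 12:00 are an assumption (the article only says the sync runs twice a day), and `msUntilNextRun` and `scheduleDaily` are hypothetical helpers rather than part of IAMD.

```javascript
// Hypothetical scheduler for the synchronizer daemon. The run hours are an
// assumption: the article only states that the sync runs twice a day.
const RUN_HOURS = [0, 12]; // local time

// Milliseconds from `now` until the next scheduled run slot.
function msUntilNextRun(now, runHours = RUN_HOURS) {
  const delays = [];
  for (const dayOffset of [0, 1]) {
    for (const hour of runHours) {
      const slot = new Date(now);
      slot.setDate(slot.getDate() + dayOffset);
      slot.setHours(hour, 0, 0, 0);
      if (slot > now) delays.push(slot - now);
    }
  }
  return Math.min(...delays);
}

// Waits for the next slot, runs the job, then re-arms itself.
// Returns a function that stops the schedule.
function scheduleDaily(job, runHours = RUN_HOURS) {
  let timer;
  let stopped = false;
  const arm = () => {
    if (stopped) return;
    timer = setTimeout(async () => {
      try {
        await job();
      } catch (err) {
        console.error("sync run failed:", err);
      } finally {
        arm(); // schedule the next slot even if this run failed
      }
    }, msUntilNextRun(new Date(), runHours));
  };
  arm();
  return () => {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Recomputing the delay after every run, rather than using a fixed `setInterval`, keeps the runs aligned to wall-clock slots even when a sync takes a while.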

The mechanism of our synchronizer is pretty simple: it waits until the scheduled time, then fetches user data from each platform and stores it in the database. The detailed flow is explained in the flowchart below.

The synchronization worker runs at scheduled times, which in our case is twice a day. We run the synchronization for each platform in parallel; since we use JavaScript, we simply run each one in an asynchronous function (in a language that supports threads, you could use threads instead). For each platform, we fetch all user data through the platform’s connector. We then generate a unique synchronization ID (a UUID) that identifies this particular fetch, tag the fetched users with it, and bulk insert them. Once the bulk insert succeeds, we can safely remove the user data from the previous synchronization. All of these steps are wrapped in a database transaction, so if any step fails, we roll back to the pre-synchronization state to keep the data consistent. We can also set a maximum number of retries for a failed synchronization.
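The per-platform synchronization steps can be sketched as below. This is a minimal illustration, not IAMD’s code: `InMemoryDb` stands in for the shared database, its synchronous `transaction` helper restores a snapshot on failure to mimic a rollback, and connectors are assumed to expose a hypothetical `listUsers()` method.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical in-memory stand-in for the database shared with IAMD.
class InMemoryDb {
  constructor() {
    this.users = []; // rows: { username, platform, syncId, ... }
  }
  // Runs `work` synchronously; on any error, restores the snapshot taken
  // beforehand, mimicking a rollback. Because the body is synchronous,
  // parallel platform syncs cannot interleave inside it. A real DB
  // transaction would replace this helper.
  transaction(work) {
    const snapshot = this.users.slice();
    try {
      return work();
    } catch (err) {
      this.users = snapshot;
      throw err;
    }
  }
  // Inserts rows one by one, so a bad row fails partway through.
  bulkInsert(rows) {
    for (const row of rows) {
      if (!row.username) throw new Error("user row without username");
      this.users.push(row);
    }
  }
}

// Syncs one platform, making at most `maxRetries` attempts in total.
async function syncPlatform(db, platform, connector, maxRetries = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      const fetched = await connector.listUsers();
      const syncId = randomUUID(); // identifies this particular fetch
      db.transaction(() => {
        // 1. Bulk insert the fresh rows tagged with the new sync ID.
        db.bulkInsert(fetched.map((u) => ({ ...u, platform, syncId })));
        // 2. Only after that succeeds, drop this platform's older rows.
        db.users = db.users.filter(
          (row) => row.platform !== platform || row.syncId === syncId
        );
      });
      return fetched.length;
    } catch (err) {
      if (attempt >= maxRetries) throw err;
    }
  }
}

// Syncs all platforms in parallel; one failure does not stop the others.
async function syncAll(db, connectors, maxRetries = 3) {
  const platforms = Object.keys(connectors);
  const results = await Promise.allSettled(
    platforms.map((p) => syncPlatform(db, p, connectors[p], maxRetries))
  );
  return Object.fromEntries(
    results.map((r, i) => [
      platforms[i],
      r.status === "fulfilled"
        ? { ok: true, count: r.value }
        : { ok: false, error: r.reason.message },
    ])
  );
}
```

Because a platform’s old rows are dropped only after its new batch is fully inserted, a failed run leaves the previous snapshot intact. `Promise.allSettled` lets one failing platform report an error without cancelling the others.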

As the explanation above shows, our synchronization process essentially dumps fresh data and deletes the previous sync data at scheduled times. There are two reasons for this. First, the application is not critical enough to require frequent updates, and we can always run the synchronization script manually if we need to synchronize outside the scheduled times. Second, we don’t mutate user data or permissions in the IAMD database, because every change is stored in each platform’s internal storage. So, instead of updating each record one by one, we dump the new data and delete the previous sync data.

The features we have implemented so far have helped us cut down the toil of onboarding and offboarding to our infrastructure resources (we’ll talk about the numbers in the next section). But development doesn’t stop here; we have some future improvement plans that we’ll discuss in a later section.

Return on Investment

You might be wondering whether implementing IAMD was a good investment. Oftentimes, when we engineers try to automate something, we become so focused on building the automation that we have less time for the original work.

Image from xkcd webcomic depicts what we might often get when implementing automation (https://xkcd.com/1319/)

Fortunately, this is not the case for IAMD. It has been a pretty good investment, cutting down approximately 90% of the time required to onboard a new joiner to our infrastructure resources. We measured the time to provision access to 8 platforms with and without IAMD: it took around 2 minutes with IAMD and at least 30 minutes without it.
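As a quick check on the figures above, the fraction of provisioning time saved is:

```latex
\frac{t_{\text{manual}} - t_{\text{IAMD}}}{t_{\text{manual}}} = \frac{30 - 2}{30} \approx 0.93
```

Since 30 minutes is a lower bound for the manual process, the saving is roughly 93% or more, consistent with the approximately 90% figure.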

Aside from the reduced onboarding and offboarding time, we also gained a centralized dashboard that gives an overview of existing access across multiple platforms, which greatly improves our visibility. We can also revoke stale or suspicious access from this dashboard.

Future Improvement Plans

At this point, IAMD has passed the MVP (minimum viable product) phase, and we already have some improvements in our backlog. Here are some examples:

Integrate more platforms, covering not only infrastructure resources but non-infrastructure ones as well. Some platforms used by other teams are not maintained by the Infrastructure team, such as our back-office dashboards, third-party marketing tools, and third-party helpdesk dashboards. We want access to these platforms to be managed and monitored from IAMD as well, to minimize the risk of accidentally granting unauthorized access.
Capability to send the access request to the requester’s direct supervisor for approval. Currently, when a team member creates an access request, it is sent to the team lead, supervisor, or PIC (person in charge) of their division, based on a list that we configure manually. We initially chose this trade-off because the list is short and rarely changes. But as the company grows rapidly, the list may become long and difficult to maintain. We’ll prepare for that by determining the requester’s direct supervisor from actual HR data.
Capability to grant temporary access. Sometimes a team member needs access to a platform only temporarily, but IAMD currently can’t revoke the access automatically once it is no longer needed. Such temporary access is easy to forget to revoke manually, and it can become an attack vector should the credentials leak. We want to minimize this risk by automating the revocation of temporary access.
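One possible approach to automating the revocation of temporary access is sketched below under assumptions: each grant carries a hypothetical `expiresAt` timestamp, and a periodic job pushes expired grants into a revocation queue like the one the offboarding worker already polls. The field names and `enqueueExpiredGrants` are illustrative; this feature is still a plan.

```javascript
// Hypothetical sketch for the planned temporary-access feature; the
// `expiresAt` and `revocationQueued` fields are assumptions.
// Pushes every expired, not-yet-queued grant into the revocation queue
// (the same kind of table the offboarding worker polls) and returns the count.
function enqueueExpiredGrants(grants, revocationQueue, now = new Date()) {
  let enqueued = 0;
  for (const grant of grants) {
    // Grants without an expiry date are permanent and never auto-revoked.
    const expired = grant.expiresAt instanceof Date && grant.expiresAt <= now;
    if (expired && !grant.revocationQueued) {
      revocationQueue.push({
        platform: grant.platform,
        username: grant.username,
        status: "pending",
        reason: "temporary access expired",
      });
      grant.revocationQueued = true; // avoid queueing the same grant twice
      enqueued++;
    }
  }
  return enqueued;
}
```

Reusing the revocation queue would mean the existing worker performs the actual revocation, so the new logic only decides *when* access should end.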

Conclusion

We are building our own identity access management framework and service because we couldn’t find an existing solution that fit some of our specific use cases. The framework, IAMX (Identity Access Management eXecutor), is platform-agnostic and use-case-agnostic and can be embedded in any kind of tool or web application. It consists of the core and the connector implementations that connect it to each platform.

We use this framework in our identity access management service, IAMD (Identity Access Management Dashboard). The business logic, namely the onboarding and offboarding workflows, is defined in IAMD, so changes to these workflows don’t affect the IAMX framework.

At this point, IAMD has been integrated with most of our infrastructure resources. It cuts down approximately 90% of the time required to onboard a new joiner to several of our infrastructure resources compared to onboarding them to each platform manually.

We also have several improvement plans in our backlog, like integrating more platforms for infra and non-infra resources, the capability to send the access request to the requester’s direct supervisor for approval, the capability to grant temporary access, etc.