Improving database security at FollowAnalytics with AWS IAM database authentication and ConsoleMe
The SRE team at FollowAnalytics has been investing a significant amount of time in improving security. Some of the things we’ve been working on include vulnerability detection integrated with our ticketing system, data encryption, dynamic credentials everywhere via AWS Roles, automated backup replication with the AWS Backup service, centralized account management via AWS Organizations, Service Control Policies (SCPs), and AWS SSO integrated with several services that we use, such as Datadog, Databricks, AWS Client VPN, and Spinnaker. Our goal is to store our clients’ data safely, and eliminating static credentials is crucial to achieving it.
Netflix open-sourced a tool called ConsoleMe at re:Invent 2020. The service caught our attention as it aligns with the changes we’ve been making at FollowAnalytics regarding access management. The AWS Security Pillar advocates for implementing the principle of least privilege, separation of duties for each interaction with your AWS resources, and removing the need for long-term static credentials, and that’s where ConsoleMe has been helping us.
This tool provides an interface where developers can request permissions to access AWS resources in a very simple way. On the SRE side, we review and approve the requested permissions, which removes a lot of toil from our day-to-day work. We’re a small team of 2 SREs, so optimizing our workflow and focusing on what’s important is vital for our success. All requested permissions are attached to AWS Roles, which guarantees that no static keys are moving around. After that, the developer can use their role from their terminal session or click the IAM Role on the ConsoleMe dashboard to open the AWS Console with the role assumed.
AWS provides IAM authentication for several database services (such as RDS and Redshift), so you can access the databases with temporary passwords, and that’s why we decided to use ConsoleMe as a solution for granting and revoking access to our database services. The users authenticate to ConsoleMe with AWS SSO, and they can request permissions via policy templates that come with the tool out of the box or templates that we configure ourselves. As soon as we approve the request, they can use their AWS Role to perform actions on AWS.
The scope of this article is to describe how we use ConsoleMe to provide secure access to our databases, but keep in mind that the tool can completely replace the user authorization process with IAM Roles, whether access happens through the AWS Console or the CLI.
Imagine if you could access your database with a dynamic password easily generated by a terminal command or even via a Bot on your corporate messaging system. That password automatically expires within a few minutes and is shared with encryption in place. The secret is automatically deleted as soon as you read it, and asking for the database credentials can be easily audited. There are no static passwords stored anywhere. How long does it take? A few seconds.
If that looks like magic, or way too complicated, stay tuned. We will demystify how you can achieve something similar to that at your organization.
ConsoleMe gives access to AWS resources by using AWS Roles. It can assume a particular IAM Role if that role has a specific tag. The tags are based on either the user itself or a group. To automatically generate those roles for the developers, we store a configuration file on S3 listing the databases whose access we want to manage with ConsoleMe and the users who should have access to each database. Users are identified by their corporate email address, which is easily trackable on Slack and in AWS SSO.
The configuration file looks like the following:
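A simplified sketch of such a configuration file is shown below. The field names and values are illustrative, not our exact schema; the point is that each database maps to a list of corporate email addresses:

```yaml
# Hypothetical sketch of the access configuration stored on S3.
# Field names are illustrative, not the exact schema.
databases:
  - identifier: analytics-db          # RDS instance identifier
    engine: postgres
    account_id: "12345678910"
    region: eu-west-1
    users:
      - hugo.henley@followanalytics.com   # mapped to hugohenley_ro / hugohenley_rw
      - jane.doe@followanalytics.com
```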
As soon as the file gets updated via the Continuous Integration tool, a Lambda function is triggered to automatically create an AWS IAM Role and a database user based on IAM authentication. The IAM Role can be assumed by the ConsoleMeInstanceProfile and by another lambda function presented later in this article. The lambda function abstracts the logic behind creating users and adding them to database roles or groups, depending on the database engine of choice. At this point, every developer listed in the configuration file has their database username created and an IAM Role displayed on the ConsoleMe dashboard.
The next step for a developer is to request permission to access a specific database, which can be easily done via the ConsoleMe Policy Wizard. We already have a policy template, so developers don’t need to write any policy themselves. They only need to provide the database id, the AWS Account where the database is provisioned, and the username they will use (read-only or read-write). The Dashboard looks like this:
After adding the permission, ConsoleMe will generate the AWS Policy and email the SRE team to review and either approve or deny the request. Once the access is approved, an inline policy is attached to the Role and ready to be used.
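For IAM database authentication, the generated inline policy grants the `rds-db:connect` action on a specific database user. Below is a minimal sketch of what such a policy could look like; the helper function, account id, and resource id are illustrative, not ConsoleMe’s actual code:

```python
import json

def rds_connect_policy(account_id, region, db_resource_id, db_username):
    """Build an inline IAM policy allowing IAM database authentication
    for a single database user. Illustrative helper, not ConsoleMe's code."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "rds-db:connect",
                # Note: the ARN uses the instance's DbiResourceId (db-XXXX),
                # not the database identifier.
                "Resource": f"arn:aws:rds-db:{region}:{account_id}:dbuser:{db_resource_id}/{db_username}",
            }
        ],
    }

policy = rds_connect_policy("12345678910", "eu-west-1", "db-ABCDEFGHIJKL", "hugohenley_ro")
print(json.dumps(policy, indent=2))
```

Attaching a policy like this to the developer’s role is what ultimately allows the temporary password generation shown next.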
One way to request the database credentials is via the terminal with the ConsoleMe CLI tool, weep, and the well-known aws-cli. For example, the developer can use weep to export the AWS keys generated by AWS STS for a given Role and then execute the aws-cli command to generate the password for either RDS or Redshift.
That can be achieved by executing the two commands below (for RDS Connections):
eval $(weep --config weep.yaml export arn:aws:iam::12345678910:role/DatabasesRole_hugohenley)
aws rds generate-db-auth-token --hostname database-identifier.xxxxxxx.eu-west-1.rds.amazonaws.com --port 5432 --username hugohenley_ro --region eu-west-1
The generate-db-auth-token command generates a temporary database password that is valid for 15 minutes, and it’s as simple as that. Once the connection to the database is established, there is no need to generate a new password for that session.
The Slack App
We have some analysts who need read-only access to the database and are not necessarily familiar with setting up tools like the aws-cli and weep. So we decided to provide a second option to retrieve a dynamic database password using a tool that is already part of their daily routine: Slack.
A Slack App was created (we named it Bob, the Bot) with the Slack Block Kit to provide an interface for the analyst to ask for the password for a given database.
The Slack app is deployed on our EKS cluster and uses the Slack API Socket Mode to receive events from Slack. As soon as the user clicks a given button, the app parses the event and stores the necessary information on DynamoDB. The data contains the trigger id, the email of the Slack user requesting access, the action id (used to map the action to the correct database), and the channel id, used by the bot to open a conversation with the user on Slack.
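A hypothetical sketch of how the bot could map a Slack interaction payload to the DynamoDB item described above is shown below. Field names are illustrative, not our exact schema; also note that real Slack block_actions payloads carry only the user id, so resolving the email would require an extra users.info call, which is simplified away here:

```python
def build_request_item(payload):
    """Map a Slack interaction payload to the DynamoDB item the lambda
    consumes. Illustrative sketch; field names are not our exact schema."""
    action = payload["actions"][0]
    return {
        "trigger_id": payload["trigger_id"],      # identifies the interaction
        "user_email": payload["user"]["email"],   # who asked for access (simplified)
        "action_id": action["action_id"],         # maps the button to a database
        "channel_id": payload["channel"]["id"],   # where the bot replies
    }

item = build_request_item({
    "trigger_id": "12345.98765.abcd",
    "user": {"email": "jane.doe@followanalytics.com"},
    "channel": {"id": "C024BE91L"},
    "actions": [{"action_id": "analytics-db-ro"}],
})
```

Storing this item in DynamoDB is what triggers the lambda function described next.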
A lambda function is then triggered by the DynamoDB event and takes all the necessary steps to generate a password valid only for the user requesting access. Even though the database credentials are only valid for a few minutes, we decided to add yet another level of security and encrypt the message, so it’s never stored in plain text anywhere. We achieved that by calling the PrivateBin API (the tool we use internally to share secrets securely between team members) with the database password to be encrypted. The PrivateBin API then returns a link to the secret. This link is only accessible if the developer/analyst is connected to the AWS Client VPN, and it is automatically burned after being opened.
The Lambda function
We deployed the DynamoDB table and the lambda function with AWS Proton, integrated with our GitHub account. AWS Proton is now our default deployment engine for lambda functions and their related resources.
Behind the scenes, the following sequence diagram describes the steps taken by the lambda function when it is triggered:
The whole process is entirely transparent to the user and takes around 5 seconds. Once they click the button for the database they want a temporary password for, the bot immediately replies, informing the user that the request was received. A few seconds later, Bob replies with the PrivateBin URL containing the database password to be used.
Deleting unused policies automatically
If the policy attached to a user is not used within a predefined time window (usually 90 days), we automatically delete it and email the users informing them that their policies will be removed due to prolonged inactivity. We want to guarantee that employees only have access to what they need to perform their jobs. The developer is informed in advance when a specific policy is about to be deleted, so no one gets a bad surprise when trying to access a service they used to have access to. Repokid (https://github.com/Netflix/repokid/) and a lambda function drive this process and send reports to developers and the SRE team with all candidates for policy removal.
AWS IAM Groups are recommended as a best practice for granting least privilege, but combining ConsoleMe with Repokid took us one step further in terms of control and automated expiration of unused policies. The details of this implementation are beyond the scope of this blog post, but we plan to write about it soon.
At this point, you’re probably wondering how tools like Terraform fit into this process. We heavily use Terraform at FollowAnalytics, and pretty much everything on our infrastructure is created by Terraform modules that we either use from the community or develop ourselves, and that’s also true for ConsoleMe. However, we also think that specific tasks need to be coded using a programming language to be fully automated and free our time.
We’re a small team, and we can’t spend the entire day configuring HCL and YAML files if we want to do more. Tools like ConsoleMe and Repokid can take our team to the next level of automation and reduce the number of repetitive tasks we need to perform, so we can focus on the next task that will bring real value to the company. In general, databases, buckets, and IAM Roles used by the apps are not very dynamic, and we want them versioned and following git-flow, so we keep them all in Terraform.
The processes described above made database access much more secure, since we no longer have static passwords linked to database users. In addition, we can easily enforce MFA on any service integrated with our AWS SSO. Another benefit is removing users from the IAM Users list and keeping them in AWS SSO, where the AWS keys are dynamic by default.
Having dynamic database passwords in place is not as convenient as having static passwords saved in your preferred database tool for easy connection. However, static passwords present a risk for the company, so we needed to balance security and productivity. Password generation via either Slack or weep + aws-cli adds only a few seconds to the authentication process. As soon as your connection to the database is established, you don’t have to authenticate again, so the trade-off looks pretty reasonable, especially considering its benefits.
Is it worth the effort? It took us two weeks to design and develop the entire lambda and Slack solution, with one engineer working on the database authentication and Slack bot while the other team member worked on making ConsoleMe production-ready. So I would say the immediate gains were huge compared to the effort that we had to put into the solution. We also took some time to start contributing to the project. Many thanks to the Netflix engineers for opening the source code of such a fantastic tool. We hope we can contribute more to the project and its vision. We see lots of potential in the tool for IAM-related tasks and some others, like copying S3 buckets and databases for our data team.