Automating secret rotation in AWS
At some point when writing code you will have to deal with secrets, such as API keys or access tokens. Coming up with an effective way of storing and accessing these secrets can be difficult, especially if you want to do it in a secure and scalable way. On top of this, you also need an easy way to rotate these secret values, so that a leaked or stale value does not stay usable for long.
At Peak, we wanted to find a way of simplifying this process of managing secrets, but before I go into how we did this, let me explain why this was so important.
The Problem
At Peak, most of our cloud architecture is hosted on AWS, so we use AWS CloudFormation extensively to deploy and manage it. In fact, we have over 1300 CloudFormation stacks across all of our environments. CloudFormation is a great way to deploy and manage your infrastructure stacks. However, trying to manage the access tokens used by all of these stacks can be a nightmare.
In AWS there are currently two options when it comes to storing access tokens. You can use either AWS Secrets Manager or AWS Systems Manager Parameter Store (where you have the option of storing values as secure strings). The problem with both of these options, we found, was that there was no easy way to pass the secrets to the resources being created by the CloudFormation stacks, such as AWS CodeBuild. The only ways to achieve this were:
1. Storing the access token as a stack parameter
2. Writing a custom resource function to retrieve the access token, which could then be read by the CloudFormation stack during deployment
Unfortunately, there were issues with both of these options. With the first option, when it became time to rotate an access token you would have to manually find and update each stack individually. For us, this would be a massive undertaking and just wasn’t scalable.
The second option of using a custom resource did solve the issues around using stack parameters. However, it actually created more problems than it solved from a security perspective. The main issue was that this process takes an access token value, which is stored securely, and then writes it as plain text to a file in S3. This file is then read by the CloudFormation stack during deployment. This means that if you don’t have the correct permissions set up in S3, your access token could be visible to anyone who can read that bucket.
The Saving Grace
Thankfully, AWS has now introduced a third option. In CloudFormation you can now pull values straight from AWS Secrets Manager into your stacks using a dynamic reference of the following form:
{{resolve:secretsmanager:{secret-name}:SecretString:{secret-key}}}
This meant that we could pass our access tokens from AWS Secrets Manager into our CloudFormation stacks without having to enter them as parameters or write them to S3. Perfect! Well…not quite.
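To make the dynamic reference concrete, here is a minimal sketch in Python that builds the reference string and places it in a JSON template fragment. The secret name `github-access-token`, the key `token`, and the choice of an `AWS::CodeBuild::SourceCredential` resource are assumptions made for illustration, not details of our actual stacks.

```python
import json


def secrets_manager_ref(secret_name: str, json_key: str) -> str:
    """Build a CloudFormation dynamic reference to one key of a JSON secret."""
    return "{{resolve:secretsmanager:" + secret_name + ":SecretString:" + json_key + "}}"


# Hypothetical template fragment: the token never appears in the template,
# only the reference that CloudFormation resolves at deployment time.
template = {
    "Resources": {
        "GitHubCredential": {
            "Type": "AWS::CodeBuild::SourceCredential",
            "Properties": {
                "AuthType": "PERSONAL_ACCESS_TOKEN",
                "ServerType": "GITHUB",
                "Token": secrets_manager_ref("github-access-token", "token"),
            },
        }
    }
}

print(json.dumps(template, indent=2))
```

Because the template only ever contains the reference string, the text of the template stays identical when the secret value changes.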
In CloudFormation you can only update a stack if there has been a change to the stack template or to its parameter values. Unfortunately, because the above line of code doesn’t change when the secret does, CloudFormation sees nothing to update, and it isn’t possible to rotate the access token used by your resources simply by performing a stack update.
The second issue is that, ideally, you want the process of rotating an access token to be as quick as possible. For instance, if one of your access tokens becomes compromised, you need to be able to replace it everywhere straight away. You can’t afford to spend all day manually finding and updating stacks. As I mentioned earlier, at Peak we have over 1300 CloudFormation stacks across all of our environments. We therefore needed a way of automating this entire process.
The Solution
To build our solution we decided to use AWS Step Functions. Using a step function allowed us to build a simple workflow which could be reused.
There were two main parts to this problem:
1. Find all the stacks which need to be updated
2. Update all the required stacks so that the new access token would be pulled through to the resources
Firstly, to each CloudFormation stack which would need to use the access token, we added a parameter called RotateToken, which was set to true. Then, using the AWS SDK, we listed all of our CloudFormation stacks and filtered them based on whether they had RotateToken as a parameter.
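The listing and filtering step could look something like the sketch below. It assumes `cfn_client` is a `boto3.client("cloudformation")` for a single account and region; the function names are hypothetical and this is not our production code.

```python
from typing import Any, Dict, List


def has_rotate_token(stack: Dict[str, Any]) -> bool:
    """Return True if the stack description declares a RotateToken parameter."""
    return any(
        param.get("ParameterKey") == "RotateToken"
        for param in stack.get("Parameters", [])
    )


def find_rotatable_stacks(cfn_client: Any) -> List[str]:
    """Page through describe_stacks and keep stacks that have RotateToken.

    cfn_client is assumed to behave like boto3.client("cloudformation").
    """
    paginator = cfn_client.get_paginator("describe_stacks")
    names: List[str] = []
    for page in paginator.paginate():
        for stack in page["Stacks"]:
            if has_rotate_token(stack):
                names.append(stack["StackName"])
    return names
```

With multiple accounts or regions, the same function would need to be called once per client.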
The second task was a bit trickier because, as I mentioned earlier, simply re-running a stack update wouldn’t work when nothing in the template has changed. To get around this problem we added Conditions to any of the resources in the CloudFormation template that would be using the access token value. CloudFormation Conditions allow us to define the circumstances under which resources are created or configured. Basically, if the condition evaluates to true the resource is created; if it evaluates to false the resource is not created. We defined the condition using the value of the RotateToken parameter, which has already been added to the stack, like so:
Conditions:
  RotateToken:
    !Equals [ !Ref RotateToken, "true" ]
You then just need to add the following statement to any resource which will be using the token:
Condition: RotateToken
In the Step Function we would then update each stack twice. The first update sets the `RotateToken` parameter to `false`, which effectively deletes any resource that uses the access token value. The second update sets `RotateToken` back to `true`. This recreates all the previously deleted resources and forces them to pull through the updated access token value from Secrets Manager.
Finally, we configured the Step Function so that it would be triggered by an update to the secret in Secrets Manager. This was easily done using a CloudWatch Events rule with the following event pattern:
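As a rough illustration of the double update, the sketch below flips the parameter to `false` and then back to `true` while reusing the existing template. It assumes `cfn_client` is a `boto3.client("cloudformation")` and `stack` is one entry from `describe_stacks`; the function names and the default capability are assumptions, and in a Step Function the waiting would more likely be its own state.

```python
from typing import Any, Dict, List, Sequence


def build_parameters(stack: Dict[str, Any], rotate_value: str) -> List[Dict[str, Any]]:
    """Keep every existing parameter value and override only RotateToken."""
    params: List[Dict[str, Any]] = []
    for param in stack.get("Parameters", []):
        key = param["ParameterKey"]
        if key == "RotateToken":
            params.append({"ParameterKey": key, "ParameterValue": rotate_value})
        else:
            params.append({"ParameterKey": key, "UsePreviousValue": True})
    return params


def rotate_stack(
    cfn_client: Any,
    stack: Dict[str, Any],
    capabilities: Sequence[str] = ("CAPABILITY_NAMED_IAM",),  # assumption
) -> None:
    """Update the stack twice: RotateToken=false, then RotateToken=true."""
    waiter = cfn_client.get_waiter("stack_update_complete")
    for value in ("false", "true"):
        cfn_client.update_stack(
            StackName=stack["StackName"],
            UsePreviousTemplate=True,  # the template itself does not change
            Parameters=build_parameters(stack, value),
            Capabilities=list(capabilities),
        )
        # Block until this update finishes before starting the next one.
        waiter.wait(StackName=stack["StackName"])
```

This sketch assumes the stack currently has `RotateToken` set to `true`; if it is already `false`, the first call changes nothing and CloudFormation rejects it as having no updates to perform.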
{
  "detail-type": [
    "AWS API Call via CloudTrail"
  ],
  "source": [
    "aws.secretsmanager"
  ],
  "detail": {
    "eventSource": [
      "secretsmanager.amazonaws.com"
    ],
    "eventName": [
      "PutSecretValue",
      "UpdateSecret",
      "CreateSecret"
    ]
  }
}
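For completeness, here is a hedged sketch of wiring that pattern to a state machine with the AWS SDK. It assumes `events_client` is a `boto3.client("events")`; the rule name, target ID, and ARNs are placeholders, and the role must be one that is allowed to start the state machine.

```python
import json
from typing import Any

# The same event pattern as above, expressed as a Python dict.
EVENT_PATTERN = {
    "detail-type": ["AWS API Call via CloudTrail"],
    "source": ["aws.secretsmanager"],
    "detail": {
        "eventSource": ["secretsmanager.amazonaws.com"],
        "eventName": ["PutSecretValue", "UpdateSecret", "CreateSecret"],
    },
}


def create_rotation_rule(
    events_client: Any, rule_name: str, state_machine_arn: str, role_arn: str
) -> None:
    """Create the rule and point it at the Step Functions state machine."""
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(EVENT_PATTERN),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[
            {
                "Id": "rotate-token-workflow",  # hypothetical target ID
                "Arn": state_machine_arn,
                "RoleArn": role_arn,
            }
        ],
    )
```

Events of type "AWS API Call via CloudTrail" are only delivered when CloudTrail is recording these API calls, so a rule like this depends on a trail being enabled in the account.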
All of this meant that by simply updating the access token value in Secrets Manager, all of the CloudFormation stacks which use that token would automatically be updated, saving countless hours of manual work.
This isn’t the whole story though. Once we proved that the solution was viable, we then had to make it ready for production and integrate it into our actual code base. To see how we accomplished this make sure that you follow Peak AI Product for the next article in this series…
Access Key Rotation: Converting a POC to Production Ready Code