How to monitor the age of your AWS credentials, using Terraform, Go, AWS Lambda and Slack

Not so long ago, the world relied on machines that had to be managed and provisioned manually. Luckily for us developers, the past couple of years have seen a new wave of technologies become the new standard. I’m talking about Serverless Architecture and Infrastructure Automation. I was exposed to both technologies 6 months ago and I fell in love.

In this post, I want to share an example that helps explain why these technologies got me so excited. To do this, we’ll take a look at our recipe for building a notification system in AWS (using Terraform, AWS Lambda, CloudWatch and Go). This system is designed to notify us of any outdated user credentials by sending an alert through Slack.

System architecture

Summary

In this example, the user (Demogorgon) has created a credential key. 25 minutes after the key’s creation, there will be an increased risk that a problem may occur. Therefore, to prevent the key from going out of control, an incoming webhook (Eleven) will notify us every 10 minutes of its age by sending a message through Slack.

At the infrastructure level, a CloudWatch event (UpsideDown) fires every 10 minutes. It launches a Lambda function (OperationMirkwood) that, without us provisioning any machines, checks the age of the user’s credential key. The webhook then sends a report that details the key’s age.
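The age check at the heart of this flow can be sketched with nothing but the Go standard library. This is only an illustration: the accessKey type, the isOutdated and report functions and the message wording are hypothetical and not taken from the repository. The 25-minute limit is the one used in this example.

```go
package main

import (
	"fmt"
	"time"
)

// maxKeyAge mirrors the 25-minute limit used in this example.
const maxKeyAge = 25 * time.Minute

// accessKey is a hypothetical, trimmed-down view of an IAM access key.
type accessKey struct {
	UserName string
	Created  time.Time
}

// isOutdated reports whether a key of the given age exceeds the limit.
func isOutdated(age, limit time.Duration) bool {
	return age > limit
}

// report builds the text for one key and flags it once it is too old.
func report(k accessKey, now time.Time, limit time.Duration) string {
	age := now.Sub(k.Created)
	status := "OK"
	if isOutdated(age, limit) {
		status = "OUT OF CONTROL"
	}
	return fmt.Sprintf("%s: key is %d minutes old (%s)", k.UserName, int(age.Minutes()), status)
}

func main() {
	now := time.Now()
	// Pretend the key was created 33 minutes ago.
	key := accessKey{UserName: "Demogorgon", Created: now.Add(-33 * time.Minute)}
	fmt.Println(report(key, now, maxKeyAge))
}
```

Passing the current time in as an argument, instead of calling time.Now inside report, keeps the check easy to test.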

Coding Go

This code receives an IAM user and identifies the age of their credential key with the AWS SDK for Go. This happens over a serverless instance using AWS Lambda. AWS Lambda has multiple runtime options, including Python, Go, Node.js, and Java.

I’ve had a soft spot for Go since the first time I tried it. It’s really fast, easy to learn and has some of the best benchmark results on AWS Lambda. Here are some of my highlights from the repository:

1. This is a smart way to get the environment variables using LookupEnv.

2. This is the function for sending a Slack notification (we need to set up an incoming webhook to get the URL).

3. This is a usage example of the IAM client from the AWS SDK for Go.
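As a rough illustration of the first two highlights, here is a minimal sketch that uses only the standard library. It is not the code from the repository: the getenv, buildPayload and notify helpers and the SLACK_WEBHOOK_URL variable name are assumptions made for this example. The payload relies on the plain "text" field that Slack incoming webhooks accept.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// getenv returns the value of key, or fallback when the variable is unset.
// os.LookupEnv lets us tell "unset" apart from "set to an empty string".
func getenv(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok {
		return v
	}
	return fallback
}

// slackMessage is the smallest JSON body an incoming webhook accepts.
type slackMessage struct {
	Text string `json:"text"`
}

// buildPayload encodes the message text as JSON.
func buildPayload(text string) ([]byte, error) {
	return json.Marshal(slackMessage{Text: text})
}

// notify posts text to the given incoming webhook URL.
func notify(webhookURL, text string) error {
	body, err := buildPayload(text)
	if err != nil {
		return err
	}
	resp, err := http.Post(webhookURL, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("slack webhook returned %s", resp.Status)
	}
	return nil
}

func main() {
	// SLACK_WEBHOOK_URL is a hypothetical variable name.
	url := getenv("SLACK_WEBHOOK_URL", "")
	if url == "" {
		fmt.Println("SLACK_WEBHOOK_URL is not set; skipping notification")
		return
	}
	if err := notify(url, "Demogorgon: key is 33 minutes old"); err != nil {
		fmt.Println("notification failed:", err)
	}
}
```

In a Lambda function, the webhook URL would typically be one of the environment variables configured alongside the function, which keeps it out of the code.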

Once you’re done with your Go code, the next step is to find the fastest, easiest, and most reusable way to deploy it. The next sections explain how to do that in detail!

Setting up Terraform

As you know, managing a simple infrastructure is pretty easy using the AWS Console, but as the infrastructure grows it becomes harder to handle. For this reason, at EmpathyBroker we have started using Terraform to provision our underlying AWS infrastructure.

Wild Terraform appears to EmpathyBroker’s DevOps Team

Although getting Terraform up and running can be as easy as riding a bike, when you are working in a DevOps team where the infrastructure can be changed multiple times, it’s essential that you follow Best Practices when doing it.

For example, Terraform uses local storage to persist its state by default, but this is not the best option for us. Fortunately, the guys from Terraform have provided us with some flexibility. We decided to use the S3 backend instead, as it helps us prevent corruption and keep any sensitive data off disk.

We use DynamoDB to protect the Terraform state with locking and consistency. This is perfect for working as a team as it prevents people from making concurrent or overlapping changes. We also use Amazon S3 with encryption enabled. S3 is the only location where the state is persisted and it allows us to quickly roll back to a previous version if faced with unexpected errors.

When using vanilla Terraform, there are a number of other limitations to keep in mind. These primarily stem from how difficult it is to keep code DRY (Don’t Repeat Yourself), from its single-plan execution, from its remote state management, and from having to hardcode an S3 path when using the S3 backend.

To avoid these limitations, we use a Terraform wrapper known as Terragrunt. It allows us to reuse code more easily, run plans across several modules with plan-all (apply-all and destroy-all are interesting options too, depending on your needs), improve remote state management, and use relative paths. What we love the most about Terraform is how fast you can create, modify and destroy resources in AWS.

The DRY concept is a must when you have to handle a lot of resources (and most of them more than once). To adhere to it, we use Terraform modules that promote reusability wherever possible. For this reason, we set up one folder only for modules and another for the infrastructure that uses those modules and standard Terraform code.

Terraform live and modules

Working with modules is extremely easy, and it will be explained in the next section.

Designing Infrastructure

Our Go code is executed as a Lambda function every 10 minutes. This is triggered by a CloudWatch event in order to prevent the user’s credential key from reaching its out-of-control area. The infrastructure is created using two modules: one to configure the Lambda function and another to configure the CloudWatch event.

The Lambda module creates a Lambda function with the IAM policy attached in the repository. We specify a .zip with our Go code and the environment variables that are used by that code:

Below you can see how we attach a policy that limits the permissions to list the users and the access keys from IAM:

The CloudWatch module creates a new CloudWatch event called UpsideDown, scheduled to run every 10 minutes and associated with the Lambda function:

The CloudWatch module requires data from the Lambda module. Therefore, we need to pass the output from the Lambda module as input to the CloudWatch module.

In the main.tf we configure the provider which needs the region and the role ARN to assume the role through AWS Vault. The .tfvars files are config files for Terragrunt.

The Terragrunt file inside the folder points to where the Terragrunt config is located within the parent folder. In the Terragrunt config, we select the backend type, the bucket where the state is located in S3, and the DynamoDB table:

Once the Terraform code is done, we only need to plan the infrastructure using the plan command:

If the plan’s successful, everything should work fine. If we want to deploy it over AWS, we only have to hit an apply command and Terraform will create the resources planned before. Voila!

The Results

We should now be able to check our messages on Slack to see the status of the Demogorgon’s credentials key.

Demogorgon user specs
Slack messages

As we can see in the pictures above, the Demogorgon’s user credentials key was created at 18:07. At 18:40, Eleven alerted us through Slack that the Demogorgon’s key is now out of control. Everything seems to be working as expected.
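The timing above can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that the scheduled runs fell on ten-minute marks (18:10, 18:20 and so on); the post does not state when the runs actually fired. The firstFlaggedRun and at helpers are hypothetical names, and the date is arbitrary.

```go
package main

import (
	"fmt"
	"time"
)

// at returns a time of day on an arbitrary date; only the hours and
// minutes come from the example in this post.
func at(hour, minute int) time.Time {
	return time.Date(2018, time.January, 1, hour, minute, 0, 0, time.UTC)
}

// firstFlaggedRun returns the first scheduled run at which a key created at
// `created` is older than limit. Runs start at firstRun and repeat every
// interval (assumed to be positive).
func firstFlaggedRun(created, firstRun time.Time, interval, limit time.Duration) time.Time {
	if interval <= 0 {
		return firstRun // guard against an endless loop
	}
	run := firstRun
	for run.Sub(created) <= limit {
		run = run.Add(interval)
	}
	return run
}

func main() {
	created := at(18, 7)
	flagged := firstFlaggedRun(created, at(18, 10), 10*time.Minute, 25*time.Minute)
	// Prints: 18:40 33m0s
	fmt.Println(flagged.Format("15:04"), flagged.Sub(created))
}
```

Under that assumption, the key is 23 minutes old at the 18:30 run, still below the 25-minute limit, and 33 minutes old at the 18:40 run, which matches the alert seen in Slack.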

A Trail of Destruction

Terraform is a pleasure to plan and create resources with, but it can be ruthless too. If we ever want to remove all of the changes we’ve applied, we can hit the destroy command. This causes Terraform to obliterate the resources created before:

The destroy command is important because you can delete many resources using just one command. It’s far simpler than using the AWS console. This means you can create and destroy resources easily and avoid increasing your AWS bill.

If you look carefully at the Terragrunt commands above, you’ll see they are all prefixed with “aws-vault”. AWS-Vault is an open source utility for securely storing and accessing AWS credentials. Explaining our AWS-Vault setup would be worthy of another extensive article. For now, you can take a glimpse at the following workflow to get an idea of how it functions.

AWS Vault workflow

Conclusion

I hope that this taster has gotten you just as excited about what’s possible with serverless architecture and infrastructure automation as I was six months ago. You can download the completed project here. In future posts, we will see how to leverage what we’ve shared here to build more awesome stuff. Stay tuned!

If you want to learn more about EmpathyBroker and how we do things, head over to our blog. If you liked this, I’d also recommend that you read my colleague José Hermosilla’s post on the other monitoring practices we’ve put in place to comply with the GDPR.

Do you have any questions about infrastructure or an experience you’d like to share? Please drop a comment below. I’d love to hear from you!

About Me

I’m a Telecommunications Engineer and an eager beaver in DevOps culture. I started out working on on-premise infrastructure, where the interdependence between teams gets really complicated. Happily for me, for the last 6 months I have been fully focused on automating AWS infrastructure using Terraform at EmpathyBroker. Furthermore, I’m an AWS Certified SysOps Administrator Associate, an AWS Certified Solutions Architect Associate and an AWS Certified Developer Associate.
