The power of CloudWatch Events

How to update Route53 records when a new instance is created, using AutoScaling events, a Lambda function, and Terraform.

Jacek Gruzewski
Yoyo Engineering
5 min read · May 2, 2017

Recently I was looking for a way to automatically update Route53 records for instances created by an AutoScaling Group. Most people reach for an Elastic Load Balancer to do that, but to me it felt unnecessary: the service I was building didn’t need load balancing or high availability at that stage, and I didn’t want to pay for an ELB I didn’t need.

My first approach was to create a simple Python script and trigger it from AWS User Data when an instance is created. Unfortunately, that would mean granting the instance permission to modify public Route53 records even though it only needed that during bootstrap. Then I found a great blog post on the AWS website about using AWS CloudWatch Events and Lambda functions to auto-register new instances. It looked promising: the Lambda function would be the only thing needing Route53 permissions, and I would pay only for the individual executions. I decided to simplify Jeremy’s and Efrain’s solution by dropping DynamoDB. I also decided to update only public DNS records, and only for specific AutoScaling events.

I started with a very simple Lambda function to see how it all works. After a few changes to my code, I got annoyed with uploading it by hand. Since we were already using Terraform at Yoyo Wallet, I decided to add it to the toolset and automate this process.

Below you can find an overview of my approach, as well as the more interesting challenges I had to face. If you would like to see the whole code, you can find it here. Enjoy!

AutoScaling

I started by creating the Terraform code for the AutoScaling Group. My configuration is straightforward: a single instance in a single Availability Zone with a few tags populated on launch. There are two important bits, though: the “CNAME” tag and the “depends_on” setting. The first is needed to find the right Route53 record to update for the newly created instance, and it makes the code reusable for other AutoScaling Groups. “depends_on” guarantees the right order of resource creation: I point it at “aws_cloudwatch_event_target” so that the Lambda function and CloudWatch Rule are ready before AutoScaling spins up the first instance.
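
A minimal sketch of that group, assuming illustrative resource names and variable wiring (the full version lives in the repository):

resource "aws_autoscaling_group" "service" {
  name                 = "${var.environment}-${var.role}"
  launch_configuration = "${aws_launch_configuration.service.name}"
  vpc_zone_identifier  = ["${var.subnet_ids}"]
  min_size             = 1
  max_size             = 1

  # The Lambda function looks this tag up to know which record to update.
  tag {
    key                 = "CNAME"
    value               = "${var.cname}"
    propagate_at_launch = true
  }

  # Make sure the CloudWatch Rule and Lambda are wired up before the first
  # instance (and therefore the first launch event) appears.
  depends_on = ["aws_cloudwatch_event_target.trigger_lambda"]
}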

Lambda code

After creating the AutoScaling Group I was able to start working on the Python script that would update Route53. Amazon has great documentation about how to write a Lambda function in Python and what events AutoScaling sends. Our Lambda function has to accept the JSON event and check whether the event type is the one we are looking for. Next, it needs to extract the instance ID from the event, find the instance’s CNAME tag, and update the matching Route53 entry. The code can be found here. Below you can see an example event that you can use to test your Lambda code.

{
  "detail-type": "EC2 Instance Launch Successful",
  "source": "aws.autoscaling",
  "detail": {
    "AutoScalingGroupName": [ "<your_ASG_name>" ],
    "EC2InstanceId": "<your_running_instance_id>"
  }
}

IAM Role and Policy

In my opinion, this was the most difficult part. It took me a while to understand how a Lambda function gets its credentials from IAM and what the “AssumeRole” action is. Basically, it allows an application or a service to request temporary credentials from IAM for a specified role. It does that by calling the Security Token Service (AWS STS) endpoint with the “AssumeRole” action and the role’s ARN. The returned credentials carry the permissions specified in the policy attached to the requested role.
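
In Terraform this boils down to two resources: a role that the Lambda service is allowed to assume, and a policy granting only what the function needs. A rough sketch, with an illustrative policy name and set of permissions (the role name matches the one referenced by the Lambda resource further down):

resource "aws_iam_role" "lambda_ha_role" {
  name = "${var.environment}-${var.role}-lambda"

  # Allow the Lambda service to request temporary credentials for this role.
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy" "lambda_ha_policy" {
  name = "${var.environment}-${var.role}-lambda"
  role = "${aws_iam_role.lambda_ha_role.id}"

  # Only what the function needs: read instance details and tags,
  # upsert the Route53 record, and write its own CloudWatch logs.
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [ "ec2:DescribeInstances", "ec2:DescribeTags" ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "route53:ChangeResourceRecordSets",
      "Resource": "arn:aws:route53:::hostedzone/${var.zone_id}"
    },
    {
      "Effect": "Allow",
      "Action": [ "logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents" ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}
EOF
}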

CloudWatch

CloudWatch is the glue that connects all the pieces together. It receives events from the AutoScaling Group, filters them, and triggers the Lambda function. In my case, I am only interested in successful launch events from one particular AutoScaling Group, so I specify exactly those fields in my “event_pattern”.

{
  "source": [ "aws.autoscaling" ],
  "detail-type": [ "EC2 Instance Launch Successful" ],
  "detail": {
    "AutoScalingGroupName": [ "${var.environment}-${var.role}" ]
  }
}
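
Wiring that up takes three resources: the rule with the pattern above, the target that points the rule at the function, and a permission that lets CloudWatch Events invoke the Lambda. A sketch with illustrative names (the target is the resource the AutoScaling Group’s “depends_on” refers to):

resource "aws_cloudwatch_event_rule" "instance_launch" {
  name        = "${var.environment}-${var.role}"
  description = "Successful instance launches in ${var.environment}-${var.role}"

  event_pattern = <<EOF
{
  "source": [ "aws.autoscaling" ],
  "detail-type": [ "EC2 Instance Launch Successful" ],
  "detail": {
    "AutoScalingGroupName": [ "${var.environment}-${var.role}" ]
  }
}
EOF
}

resource "aws_cloudwatch_event_target" "trigger_lambda" {
  rule = "${aws_cloudwatch_event_rule.instance_launch.name}"
  arn  = "${aws_lambda_function.attach_lambda_function.arn}"
}

# Without this permission the rule matches but the function is never invoked.
resource "aws_lambda_permission" "allow_cloudwatch" {
  statement_id  = "AllowExecutionFromCloudWatchEvents"
  action        = "lambda:InvokeFunction"
  function_name = "${aws_lambda_function.attach_lambda_function.function_name}"
  principal     = "events.amazonaws.com"
  source_arn    = "${aws_cloudwatch_event_rule.instance_launch.arn}"
}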

Lambda Function

As I said in the beginning, I wanted Terraform to upload my Lambda code. To do that, I am using simple bash commands called from a “null_resource”.

resource "null_resource" "prepare-lambda" {
triggers {
main = "${base64sha256(file("${path.module}/files/update_public_info.py"))}"
lib = "${base64sha256(file("${path.module}/files/aws.py"))}"
requirements = "${base64sha256(file("${path.module}/files/requirements.txt"))}"
temmplate = "${base64sha256(data.template_file.info_file.rendered)}"
}

provisioner "local-exec" {
command = "rm -rf ${path.module}/output || true"
}

provisioner "local-exec" {
command = "mkdir ${path.module}/output"
}

provisioner "local-exec" {
command = "echo $\"${data.template_file.info_file.rendered}\" > ${path.module}/output/settings.yml"
}

provisioner "local-exec" {
command = "pip install -r ${path.module}/files/requirements.txt -t ${path.module}/output"
}

provisioner "local-exec" {
command = "cp ${path.module}/files/* ${path.module}/output"
}

provisioner "local-exec" {
command = "cd ${path.module}/output && zip -r lambda.zip ."
}
}

resource "aws_lambda_function" "attach_lambda_function" {
filename = "${path.module}/output/lambda.zip"
function_name = "${var.environment}-${var.role}"
role = "${aws_iam_role.lambda_ha_role.arn}"
description = "An AWS Lambda function for ${var.environment}-${var.role}"
handler = "update_public_info.handler"
timeout = "10"
runtime = "python2.7"

depends_on = ["null_resource.prepare-lambda"]
}

It installs all Python dependencies into a local directory, renders the settings file, zips everything together, and uploads the archive to AWS. The “triggers” map makes sure this happens again whenever any of the source files change.

Putting everything together

You can check out my code locally or use it directly from GitHub as a module, like below.

data "template_file" "user_data" {
template = <<-EOF
#!/bin/bash

echo "Hello World!"

EOF
}

module "service" {
source = "github.com/gruzewski/terraform"
ami_id = "${var.ami_id}"
availability_zones = "${var.availability_zones}"
cname = "service"
environment = "${var.environment}"
instance_profile = ""
instance_type = "t2.small"
role = "service"
region = "${var.region}"
root_volume_size = 8
security_groups = "${var.security_groups}"
ssh_key_name = "${var.ssh_key_name}"
subnet_ids = "${var.subnet_ids}"
user_data = "${data.template_file.user_data.rendered}"
zone_id = "${var.zone_id}"
}

Troubleshooting

I have found it very difficult to troubleshoot problems related to CloudWatch and Lambda. Things can fail on many levels; for example, a wrong event pattern in the CloudWatch Rule will quietly prevent the Lambda function from ever being triggered. Below are some tips that helped me get everything working.

  • Start with a very broad CloudWatch event pattern and IAM policy, then narrow them down.
  • Print the event in your code and use it to test your Lambda function (by going to “Actions” -> “Configure test event” in the AWS Lambda console).
  • Troubleshoot from top to bottom: check the monitoring tabs of the CloudWatch Rule and the Lambda to see whether your event was matched and the function was executed, then move on to the Lambda logs.
  • Use logging.
  • Don’t give up :)

I hope you have enjoyed this article. There will be more interesting stuff coming soon!
