Trigger AWS Lambda with CloudWatch Alarm: Terraform — Event Source Mapping

Emir Mujic
5 min read · Nov 26, 2019

What is AWS Lambda?

AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume — there is no charge when your code is not running. With Lambda, you can run code for virtually any type of application or backend service — all with zero administration.

AWS Lambda as part of your automation toolkit

If your infrastructure is hosted on AWS, you may want to use AWS Lambda as part of your DevOps/SRE automation toolkit. In this article you will learn how to create simple automation using AWS services — CloudWatch Alarm, SNS, SQS and Lambda.

Lambda — source of automation

Let's create automation that removes an EC2 instance from an ELB, reboots it, and returns it to the ELB after a timeout, all triggered by a CloudWatch alarm. Simple, right?

# my_lambda.py
import json
import logging
import os
import time

import boto3

# Prepare logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Initialize boto3 clients (Classic Load Balancer and EC2)
elb = boto3.client('elb')
ec2 = boto3.client('ec2')

# Load environment
elbID = os.environ['elbID']


def restart_service(event, context):
    logger.info('## SQS message arrived.')
    message = event['Records'][0]['body']

    # Parse message to retrieve instanceID
    message_json = json.loads(message)
    instanceID = message_json["Trigger"]["Dimensions"][0]["value"]

    # Get instance name
    instanceName = get_ec2_name(instanceID)
    logger.info('## Instance: ' + instanceName)

    # Remove ec2 from load balancer
    remove_from_load_balancer(instanceID)
    logger.info('## Instance removed from LB - 5mins draining.')

    # Wait 5mins for the elb draining process
    time.sleep(300)

    # Proceed with restart
    logger.info('## Instance reboot started - 5mins timeout.')
    ec2.reboot_instances(InstanceIds=[instanceID])

    # Sleep 5mins after reboot
    time.sleep(300)

    # Return ec2 to load balancer
    add_to_load_balancer(instanceID)
    logger.info('## Instance added to LB.')

    # Health check, retry every 20s
    while check_instance_health(instanceID) != 'InService':
        time.sleep(20)

    logger.info('## All done, exiting.')


def remove_from_load_balancer(instanceID):
    elb.deregister_instances_from_load_balancer(
        LoadBalancerName=elbID, Instances=[{'InstanceId': instanceID}])


def add_to_load_balancer(instanceID):
    elb.register_instances_with_load_balancer(
        LoadBalancerName=elbID, Instances=[{'InstanceId': instanceID}])


def check_instance_health(instanceID):
    response = elb.describe_instance_health(
        LoadBalancerName=elbID, Instances=[{'InstanceId': instanceID}])
    return response["InstanceStates"][0]["State"]


def get_ec2_name(instanceID):
    response = ec2.describe_instances(InstanceIds=[instanceID])
    # Tags may be absent; fall back to the instance ID so logging never fails
    for tag in response["Reservations"][0]["Instances"][0].get("Tags", []):
        if tag['Key'] == 'Name':
            return tag['Value']
    return instanceID
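The handler reads the instance ID from the first dimension of the alarm message. A slightly more defensive variant, sketched below, looks the dimension up by name instead of by position. The sample message is hypothetical and trimmed to the fields used here, the helper name `extract_instance_id` is made up for illustration, and the lowercase `name` key is an assumption that mirrors the lowercase `value` key the handler already relies on.

```python
import json

# Hypothetical, trimmed CloudWatch alarm notification as it would arrive in the SQS body.
SAMPLE_BODY = json.dumps({
    "AlarmName": "0 - Lambda Trigger",
    "NewStateValue": "ALARM",
    "Trigger": {
        "MetricName": "NetworkOut",
        "Namespace": "AWS/EC2",
        "Dimensions": [{"value": "i-0123456789abcdef0", "name": "InstanceId"}],
    },
})


def extract_instance_id(event):
    """Return the InstanceId dimension from the first SQS record, or None."""
    body = json.loads(event["Records"][0]["body"])
    for dimension in body.get("Trigger", {}).get("Dimensions", []):
        # Assumed key names: lowercase "name" and "value".
        if dimension.get("name") == "InstanceId":
            return dimension.get("value")
    return None


event = {"Records": [{"body": SAMPLE_BODY}]}
print(extract_instance_id(event))
```

Returning `None` instead of raising lets the handler log and skip a message that does not carry an instance dimension.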

Basic setup: Terraform

Terraform is the de facto standard for infrastructure as code, so I won't explain it in depth. The snippets are organized as a single module, with a scripts folder reserved for the Python script and its dependencies.

Create SQS Queue, SNS Topic and Policy

As a first step, let's create an SNS topic and an SQS queue, and attach a policy that allows the topic to send messages to the queue.

# Queue
resource "aws_sqs_queue" "my_lambda_restart_queue" {
  name                       = "my_lambda_restart_queue"
  max_message_size           = 262144
  message_retention_seconds  = 600
  receive_wait_time_seconds  = 0
  visibility_timeout_seconds = 2400
}

# Topic and Subscription
resource "aws_sns_topic" "sns_my_lambda_restart_topic" {
  name = "my_lambda_restart"
}

resource "aws_sns_topic_subscription" "sns_my_lambda_restart_sub" {
  topic_arn            = aws_sns_topic.sns_my_lambda_restart_topic.arn
  protocol             = "sqs"
  endpoint             = aws_sqs_queue.my_lambda_restart_queue.arn
  raw_message_delivery = true
}

# Policy
data "template_file" "sqs-policy" {
  template = file("${path.module}/sqs_policy.json")
  vars = {
    sqs_arn = aws_sqs_queue.my_lambda_restart_queue.arn
    sns_arn = aws_sns_topic.sns_my_lambda_restart_topic.arn
  }
}

resource "aws_sqs_queue_policy" "my-lambda-restart-sqs-policy" {
  queue_url = aws_sqs_queue.my_lambda_restart_queue.id
  policy    = data.template_file.sqs-policy.rendered
}

# sqs_policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Principal": {
        "AWS": "*"
      },
      "Effect": "Allow",
      "Action": [
        "SQS:SendMessage"
      ],
      "Resource": "${sqs_arn}",
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": "${sns_arn}"
        }
      }
    }
  ]
}
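
Because the subscription sets raw_message_delivery = true, the SQS body is the alarm JSON itself, which is what the handler expects. Without raw delivery, SNS wraps the alarm in an envelope whose `Message` field holds the alarm as a string. A minimal sketch that tolerates both shapes follows; the helper name `unwrap_alarm` and the sample payloads are hypothetical.

```python
import json


def unwrap_alarm(body):
    """Return the alarm dict from an SQS body, with or without the SNS envelope."""
    payload = json.loads(body)
    # An SNS envelope carries the original message as a JSON string under "Message".
    if payload.get("Type") == "Notification" and "Message" in payload:
        return json.loads(payload["Message"])
    return payload


alarm = {"AlarmName": "0 - Lambda Trigger", "NewStateValue": "ALARM"}
raw_body = json.dumps(alarm)
wrapped_body = json.dumps({"Type": "Notification", "Message": json.dumps(alarm)})

print(unwrap_alarm(raw_body) == unwrap_alarm(wrapped_body))
```

Keeping raw delivery on, as the article does, avoids the second `json.loads` entirely; the helper only matters if the subscription setting ever changes.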

Create CloudWatch Alarm

Now create a CloudWatch alarm for each instance behind the load balancer. Note that alarm_actions must point to the SNS topic created in the section above.

# Required data
data "aws_instances" "my_instances" {
  filter {
    name   = "tag:group"
    values = ["instances-tag"]
  }
}

# CloudWatch Alarm
resource "aws_cloudwatch_metric_alarm" "high_network_transmit" {
  count               = length(data.aws_instances.my_instances.ids)
  alarm_name          = "${count.index} - Lambda Trigger"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 6
  metric_name         = "NetworkOut"
  namespace           = "AWS/EC2"
  statistic           = "Average"
  period              = 300
  threshold           = 100000000
  treat_missing_data  = "notBreaching"
  alarm_actions       = [aws_sns_topic.sns_my_lambda_restart_topic.arn]

  # dimensions is a map argument, so it needs "="
  dimensions = {
    InstanceId = data.aws_instances.my_instances.ids[count.index]
  }
}
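
With period = 300 and evaluation_periods = 6, the alarm needs six consecutive five-minute averages above the threshold, so roughly 30 minutes of sustained traffic pass before it fires. The sketch below is a simplified model of that rule, not CloudWatch's actual evaluation logic. The function name `alarm_fires` is hypothetical, and `None` stands for a missing datapoint, which `notBreaching` treats as within the threshold.

```python
def alarm_fires(datapoints, threshold=100_000_000, evaluation_periods=6):
    """True if the last `evaluation_periods` datapoints all exceed the threshold."""
    if len(datapoints) < evaluation_periods:
        return False
    recent = datapoints[-evaluation_periods:]
    # Missing data (None) counts as not breaching, mirroring treat_missing_data.
    return all(value is not None and value > threshold for value in recent)


sustained = [150_000_000] * 6
with_gap = [150_000_000] * 5 + [None]

print(alarm_fires(sustained))  # True
print(alarm_fires(with_gap))   # False
print(6 * 300 / 60)            # 30.0 (minutes of sustained breach)
```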

Create AWS Role for Lambda

The Lambda function needs access to SQS, and it must be able to change the state of EC2 instances and manipulate the ELB. So let's create an IAM role for Lambda and attach the right policies.

resource "aws_iam_role" "my-lambda-role" {
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_policy" "my-lambda-sqs-policy" {
  name        = "my-lambda-sqs-policy"
  description = "Allow lambda function to read sqs message."
  policy      = data.template_file.sqs-receive-policy.rendered
}

resource "aws_iam_policy" "my-lambda-ec2-policy" {
  name        = "my-lambda-ec2-policy"
  description = "Allow lambda function to manipulate ec2 instances."
  policy      = data.template_file.ec2-policy.rendered
}

resource "aws_iam_policy" "my-lambda-elb-policy" {
  name        = "my-lambda-elb-policy"
  description = "Allow lambda function to manipulate elb."
  policy      = data.template_file.elb-logs-policy.rendered
}

data "template_file" "sqs-receive-policy" {
  template = file("${path.module}/sqs-receive-policy.json")
  vars = {
    sqs_arn = aws_sqs_queue.my_lambda_restart_queue.arn
  }
}

data "template_file" "ec2-policy" {
  template = file("${path.module}/ec2-policy.json")
}

# Name and file must match the reference above and elb-logs-policy.json below
data "template_file" "elb-logs-policy" {
  template = file("${path.module}/elb-logs-policy.json")
}

resource "aws_iam_role_policy_attachment" "role-policy-attachment-lambda-receive-sqs" {
  role       = aws_iam_role.my-lambda-role.name
  policy_arn = aws_iam_policy.my-lambda-sqs-policy.arn
}

resource "aws_iam_role_policy_attachment" "role-policy-attachment-lambda-ec2" {
  role       = aws_iam_role.my-lambda-role.name
  policy_arn = aws_iam_policy.my-lambda-ec2-policy.arn
}

resource "aws_iam_role_policy_attachment" "role-policy-attachment-lambda-elb" {
  role       = aws_iam_role.my-lambda-role.name
  policy_arn = aws_iam_policy.my-lambda-elb-policy.arn
}

data "aws_elb" "my-elb" {
  name = "my-elb"
}

# ec2-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceStatus",
        "ec2:StopInstances",
        "ec2:StartInstances",
        "ec2:RebootInstances"
      ],
      "Resource": "*"
    }
  ]
}

# sqs-receive-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": [
        "sqs:ReceiveMessage",
        "sqs:ChangeMessageVisibility",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes"
      ],
      "Resource": "${sqs_arn}"
    }
  ]
}

# elb-logs-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": [
        "elasticloadbalancing:DescribeInstanceHealth",
        "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
        "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
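
Each boto3 call in the handler needs a matching IAM action in one of these policies. One quick way to sanity-check policy files before applying is to compare the two, as in the sketch below. The mapping is written by hand from the handler, the helper name `missing_actions` is hypothetical, and the check only does exact string matching (it ignores wildcards, conditions, and resources). The two sample policies are deliberately incomplete to show what a gap looks like.

```python
import json

# boto3 calls made by the handler, mapped by hand to the IAM actions they need.
REQUIRED_ACTIONS = {
    "ec2.describe_instances": "ec2:DescribeInstances",
    "ec2.reboot_instances": "ec2:RebootInstances",
    "elb.describe_instance_health": "elasticloadbalancing:DescribeInstanceHealth",
    "elb.register_instances_with_load_balancer":
        "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
    "elb.deregister_instances_from_load_balancer":
        "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
}


def missing_actions(policy_documents):
    """Return required actions not granted by any Allow statement (exact match only)."""
    granted = set()
    for document in policy_documents:
        for statement in json.loads(document)["Statement"]:
            if statement.get("Effect") != "Allow":
                continue
            actions = statement["Action"]
            granted.update([actions] if isinstance(actions, str) else actions)
    return sorted(set(REQUIRED_ACTIONS.values()) - granted)


# Hypothetical, incomplete policies used only to demonstrate the check.
ec2_policy = json.dumps({"Statement": [{"Effect": "Allow", "Action": [
    "ec2:DescribeInstances", "ec2:RebootInstances"], "Resource": "*"}]})
elb_policy = json.dumps({"Statement": [{"Effect": "Allow", "Action": [
    "elasticloadbalancing:DescribeInstanceHealth"], "Resource": "*"}]})

print(missing_actions([ec2_policy, elb_policy]))
```

Run against the full policy files from this section, the same function should return an empty list.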

Create Lambda

To ensure at least one instance stays in the ELB, set reserved_concurrent_executions to 1 so that Lambda never runs two restarts at the same time. Also keep in mind that the maximum execution timeout for AWS Lambda is 15 minutes (900 seconds).

data "archive_file" "source" {
  type        = "zip"
  source_dir  = "${path.module}/scripts/"
  output_path = "${path.module}/lambda_function.zip"
}

resource "aws_lambda_function" "my-lambda-restart" {
  filename      = data.archive_file.source.output_path
  function_name = "my_lambda"
  description   = "Restarts service when notified by CloudWatch Alarm"
  role          = aws_iam_role.my-lambda-role.arn

  # The module part must match the script file name in scripts/ (my_lambda.py)
  handler          = "my_lambda.restart_service"
  source_code_hash = data.archive_file.source.output_base64sha256

  # Pick a Python runtime that Lambda currently supports
  runtime                        = "python3.12"
  reserved_concurrent_executions = 1
  timeout                        = 900

  environment {
    variables = {
      elbID = data.aws_elb.my-elb.name
    }
  }
}
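
The handler sleeps for 600 seconds in total, which leaves about five minutes of the 900-second timeout for the health-check loop. One way to avoid being cut off in the middle of that loop is to poll against the remaining time that the Lambda context object reports through `get_remaining_time_in_millis()`. The sketch below assumes that approach; `wait_until_in_service`, its parameters, and the `FakeContext` stand-in are hypothetical, and the `check` callable would be `check_instance_health` from the handler.

```python
import time


def wait_until_in_service(check, instance_id, context,
                          poll_seconds=20, safety_margin_ms=30_000, sleep=time.sleep):
    """Poll `check` until it returns 'InService' or the time budget runs out."""
    while context.get_remaining_time_in_millis() > safety_margin_ms:
        if check(instance_id) == "InService":
            return True
        sleep(poll_seconds)
    return False


class FakeContext:
    """Stand-in for the Lambda context object, used only for a local dry run."""

    def __init__(self, remaining_ms):
        self.remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self):
        return self.remaining_ms


ctx = FakeContext(remaining_ms=100_000)
states = iter(["OutOfService", "OutOfService", "InService"])


def fake_sleep(seconds):
    # Instead of really sleeping, burn the fake context's time budget.
    ctx.remaining_ms -= int(seconds * 1000)


result = wait_until_in_service(
    lambda _instance_id: next(states), "i-0123456789abcdef0", ctx, sleep=fake_sleep)
print(result)  # True
```

Returning `False` instead of timing out gives the handler a chance to log the failure before Lambda terminates the invocation.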

Create Lambda event — trigger

The last step is to create an aws_lambda_event_source_mapping resource, which makes Lambda poll the queue and invoke the function for every SQS message.

resource "aws_lambda_event_source_mapping" "my_lambda_restart_mapping" {
  batch_size       = 1
  event_source_arn = aws_sqs_queue.my_lambda_restart_queue.arn
  enabled          = true
  function_name    = aws_lambda_function.my-lambda-restart.arn
}
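
With batch_size = 1, each invocation receives an event containing a single SQS record. To exercise the handler before any alarm fires, you can build such an event yourself and pass it to the function through the Lambda console's test feature or `aws lambda invoke`. A sketch follows; the helper name `make_sqs_event`, the queue ARN, the message ID, and the instance ID are placeholders, and only a subset of the fields Lambda sends is included.

```python
import json


def make_sqs_event(instance_id, queue_arn):
    """Build a minimal SQS-style Lambda event wrapping a CloudWatch alarm message."""
    alarm = {
        "AlarmName": "0 - Lambda Trigger",
        "NewStateValue": "ALARM",
        "Trigger": {
            "MetricName": "NetworkOut",
            "Namespace": "AWS/EC2",
            "Dimensions": [{"value": instance_id, "name": "InstanceId"}],
        },
    }
    return {
        "Records": [{
            "messageId": "00000000-0000-0000-0000-000000000000",  # placeholder
            "body": json.dumps(alarm),
            "eventSource": "aws:sqs",
            "eventSourceARN": queue_arn,
        }]
    }


event = make_sqs_event(
    "i-0123456789abcdef0",
    "arn:aws:sqs:us-east-1:123456789012:my_lambda_restart_queue",
)
print(json.dumps(event, indent=2))
```

Keep in mind that invoking the deployed function with this event really does deregister and reboot the instance named in it, so use an instance you can afford to restart.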

Now we just need to run terraform apply to create the new infrastructure, and the automation is ready to go. If you plan to use the snippets above, you can also run terraform fmt to format the *.tf files.

In this article I used boto3, the AWS SDK for Python, for EC2 and ELB manipulation. Boto3 provides an easy-to-use, object-oriented API as well as low-level access to AWS services, and it's a great tool for automation.
