Trigger AWS Lambda with CloudWatch Alarm: Terraform — Event Source Mapping
What is AWS Lambda?
AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume — there is no charge when your code is not running. With Lambda, you can run code for virtually any type of application or backend service — all with zero administration.
AWS Lambda as part of your automation toolkit
If your infrastructure is hosted on AWS, you may want to use AWS Lambda as part of your DevOps/SRE automation toolkit. In this article you will learn how to create simple automation using AWS services — CloudWatch Alarm, SNS, SQS and Lambda.
Lambda — source of automation
Let’s build an automation that removes an EC2 instance from the ELB, reboots it and, after a timeout, returns it to the ELB. The whole thing is triggered by a CloudWatch alarm. Simple, right?
# my_lambda.py
import boto3
import os
import time
import logging
import json

## Prepare logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)

## Initialize boto3
elb = boto3.client('elb')
ec2 = boto3.client('ec2')

## Load environment
elbID = os.environ['elbID']

def restart_service(event, context):
    logger.info('## SQS message arrived.')
    message = event['Records'][0]['body']
    ## Parse message to retrieve instanceID
    message_json = json.loads(message)
    instanceID = message_json["Trigger"]["Dimensions"][0]["value"]
    ## Get instance name
    instanceName = get_ec2_name(instanceID)
    logger.info('## Instance: ' + instanceName)
    ## Remove ec2 from load balancer
    remove_from_load_balancer(instanceID)
    logger.info('## Instance removed from LB - 5mins draining.')
    ## Wait for elb draining process 5mins
    time.sleep(300)
    ## Proceed with restart
    logger.info('## Instance reboot started - 5mins timeout.')
    ec2.reboot_instances(InstanceIds=[instanceID])
    ## Sleep 5mins after reboot
    time.sleep(300)
    ## Return ec2 to load balancer
    add_to_load_balancer(instanceID)
    logger.info('## Instance added to LB.')
    ## Health check, retry every 20s
    while check_instance_health(instanceID) != 'InService':
        time.sleep(20)
    logger.info('## All done, exiting.')
    return

def remove_from_load_balancer(instanceID):
    elb.deregister_instances_from_load_balancer(
        LoadBalancerName=elbID, Instances=[{'InstanceId': instanceID}])
    return

def add_to_load_balancer(instanceID):
    elb.register_instances_with_load_balancer(
        LoadBalancerName=elbID, Instances=[{'InstanceId': instanceID}])
    return

def check_instance_health(instanceID):
    response = elb.describe_instance_health(
        LoadBalancerName=elbID, Instances=[{'InstanceId': instanceID}])
    return response["InstanceStates"][0]["State"]

def get_ec2_name(instanceID):
    response = ec2.describe_instances(InstanceIds=[instanceID])
    for tag in response["Reservations"][0]["Instances"][0].get("Tags", []):
        if tag['Key'] == 'Name':
            return tag['Value']
    ## Fall back to the ID so logging does not fail on untagged instances
    return instanceID
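The handler reads the instance ID from the first entry of Trigger.Dimensions in the alarm message. As a minimal sketch, assuming an alarm could carry more than one dimension, the lookup can be done by dimension name instead of by position. The helper name extract_instance_id and the sample event are illustrative and not part of the original script; the sample contains only the fields the handler actually reads.

```python
import json


def extract_instance_id(event):
    """Return the InstanceId dimension from the first SQS record, or None."""
    # With raw message delivery enabled, the record body is the alarm JSON itself.
    alarm = json.loads(event["Records"][0]["body"])
    for dimension in alarm["Trigger"]["Dimensions"]:
        if dimension.get("name") == "InstanceId":
            return dimension.get("value")
    return None


# Hypothetical, trimmed-down event for local testing only.
sample_event = {
    "Records": [
        {
            "body": json.dumps(
                {
                    "AlarmName": "0 - Lambda Trigger",
                    "Trigger": {
                        "MetricName": "NetworkOut",
                        "Dimensions": [
                            {"value": "i-0123456789abcdef0", "name": "InstanceId"}
                        ],
                    },
                }
            )
        }
    ]
}

print(extract_instance_id(sample_event))
```

Running this locally before deploying is a cheap way to confirm the parsing logic without touching AWS.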
Basic setup: Terraform
Terraform is the gold standard for IaC, so I won’t explain it in depth. The snippets are organized as a single module, with a scripts folder reserved for the Python script and its dependencies.
Create SQS Queue, SNS Topic and Policy
As a first step, let’s create an SNS topic and an SQS queue, subscribe the queue to the topic, and attach a policy that allows the topic to send messages to the queue.
# Queue
resource "aws_sqs_queue" "my_lambda_restart_queue" {
  name                       = "my_lambda_restart_queue"
  max_message_size           = 262144
  message_retention_seconds  = 600
  receive_wait_time_seconds  = 0
  visibility_timeout_seconds = 2400
}

# Topic and Subscription
resource "aws_sns_topic" "sns_my_lambda_restart_topic" {
  name = "my_lambda_restart"
}

resource "aws_sns_topic_subscription" "sns_my_lambda_restart_sub" {
  topic_arn            = "${aws_sns_topic.sns_my_lambda_restart_topic.arn}"
  protocol             = "sqs"
  endpoint             = "${aws_sqs_queue.my_lambda_restart_queue.arn}"
  raw_message_delivery = true
}

# Policy
data "template_file" "sqs-policy" {
  template = "${file("${path.module}/sqs_policy.json")}"
  vars = {
    sqs_arn = "${aws_sqs_queue.my_lambda_restart_queue.arn}"
    sns_arn = "${aws_sns_topic.sns_my_lambda_restart_topic.arn}"
  }
}

resource "aws_sqs_queue_policy" "my-lambda-restart-sqs-policy" {
  queue_url = "${aws_sqs_queue.my_lambda_restart_queue.id}"
  policy    = "${data.template_file.sqs-policy.rendered}"
}
#sqs_policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Principal": {
        "AWS": "*"
      },
      "Effect": "Allow",
      "Action": [
        "SQS:SendMessage"
      ],
      "Resource": "${sqs_arn}",
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": "${sns_arn}"
        }
      }
    }
  ]
}
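Terraform fills in ${sqs_arn} and ${sns_arn} when it renders this template. As a rough local sanity check (this is not how Terraform itself renders templates), Python's string.Template happens to use the same ${name} placeholder syntax for simple variables, so the policy can be rendered and parsed as JSON before applying. The ARNs, the function name render_policy and the inline copy of the policy are placeholders for illustration.

```python
import json
from string import Template

# Inline copy of the policy template, kept compact for the example.
POLICY_TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Principal": {"AWS": "*"},
      "Effect": "Allow",
      "Action": ["SQS:SendMessage"],
      "Resource": "${sqs_arn}",
      "Condition": {"ArnEquals": {"aws:SourceArn": "${sns_arn}"}}
    }
  ]
}
"""


def render_policy(sqs_arn, sns_arn):
    """Substitute both ARNs and parse the result to prove it is valid JSON."""
    rendered = Template(POLICY_TEMPLATE).substitute(sqs_arn=sqs_arn, sns_arn=sns_arn)
    return json.loads(rendered)


# Made-up account ID and region, for illustration only.
policy = render_policy(
    "arn:aws:sqs:eu-west-1:123456789012:my_lambda_restart_queue",
    "arn:aws:sns:eu-west-1:123456789012:my_lambda_restart",
)
statement = policy["Statement"][0]
print(statement["Resource"])
print(statement["Condition"]["ArnEquals"]["aws:SourceArn"])
```

The ArnEquals condition is what keeps the wildcard principal safe: only messages originating from that one topic are accepted.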
Create CloudWatch Alarm
Now create a CloudWatch alarm for each instance attached to the load balancer. Note that alarm_actions must contain the ARN of the SNS topic created in the previous section.
# Required data
data "aws_instances" "my_instances" {
  filter {
    name   = "tag:group"
    values = ["instances-tag"]
  }
}

# CloudWatch Alarm
resource "aws_cloudwatch_metric_alarm" "high_network_transmit" {
  count               = "${length(data.aws_instances.my_instances.ids)}"
  alarm_name          = "${count.index} - Lambda Trigger"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 6
  metric_name         = "NetworkOut"
  namespace           = "AWS/EC2"
  statistic           = "Average"
  period              = 300
  threshold           = 100000000
  treat_missing_data  = "notBreaching"
  alarm_actions       = ["${aws_sns_topic.sns_my_lambda_restart_topic.arn}"]

  # dimensions is a map attribute, so it is assigned with "="
  dimensions = {
    InstanceId = "${element(data.aws_instances.my_instances.ids, count.index)}"
  }
}
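With evaluation_periods = 6 and period = 300, the average NetworkOut has to stay above the threshold for six consecutive 5-minute periods, i.e. 30 minutes, before the alarm fires. The sketch below is a deliberately simplified model of that rule, assuming every one of the last six datapoints must breach and that a missing datapoint (None) counts as not breaching. Real CloudWatch evaluation of missing data is more involved, and the function names are made up for this example.

```python
PERIOD_SECONDS = 300
EVALUATION_PERIODS = 6
THRESHOLD_BYTES = 100_000_000

# How long the metric must stay above the threshold, in minutes.
SUSTAINED_MINUTES = PERIOD_SECONDS * EVALUATION_PERIODS // 60


def is_breaching(datapoint):
    # None models a missing datapoint; "notBreaching" treats it as within limits.
    # GreaterThanThreshold is a strict comparison.
    return datapoint is not None and datapoint > THRESHOLD_BYTES


def alarm_state(datapoints):
    """Simplified model: ALARM only if the last six datapoints all breach."""
    recent = datapoints[-EVALUATION_PERIODS:]
    if len(recent) < EVALUATION_PERIODS:
        return "OK"
    return "ALARM" if all(is_breaching(d) for d in recent) else "OK"


print(SUSTAINED_MINUTES)
print(alarm_state([200_000_000] * 6))
print(alarm_state([200_000_000] * 5 + [None]))
```

A single quiet or missing period inside the window is enough to keep the alarm from firing, so short traffic spikes will not trigger a reboot.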
Create AWS Role for Lambda
The Lambda function needs to read from SQS, change the state of EC2 instances and manipulate the ELB. So let’s create an IAM role for Lambda and attach the right policies.
resource "aws_iam_role" "my-lambda-role" {
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_policy" "my-lambda-sqs-policy" {
  name        = "my-lambda-sqs-policy"
  description = "Allow lambda function to read sqs message."
  policy      = "${data.template_file.sqs-receive-policy.rendered}"
}

resource "aws_iam_policy" "my-lambda-ec2-policy" {
  name        = "my-lambda-ec2-policy"
  description = "Allow lambda function to manipulate ec2 instances."
  policy      = "${data.template_file.ec2-policy.rendered}"
}

resource "aws_iam_policy" "my-lambda-elb-policy" {
  name        = "my-lambda-elb-policy"
  description = "Allow lambda function to manipulate elb."
  policy      = "${data.template_file.elb-logs-policy.rendered}"
}

data "template_file" "sqs-receive-policy" {
  template = "${file("${path.module}/sqs-receive-policy.json")}"
  vars = {
    sqs_arn = "${aws_sqs_queue.my_lambda_restart_queue.arn}"
  }
}

data "template_file" "ec2-policy" {
  template = "${file("${path.module}/ec2-policy.json")}"
}

# Name and file match the elb-logs-policy.json file and the reference above
data "template_file" "elb-logs-policy" {
  template = "${file("${path.module}/elb-logs-policy.json")}"
}

resource "aws_iam_role_policy_attachment" "role-policy-attachment-lambda-receive-sqs" {
  role       = "${aws_iam_role.my-lambda-role.name}"
  policy_arn = "${aws_iam_policy.my-lambda-sqs-policy.arn}"
}

resource "aws_iam_role_policy_attachment" "role-policy-attachment-lambda-ec2" {
  role       = "${aws_iam_role.my-lambda-role.name}"
  policy_arn = "${aws_iam_policy.my-lambda-ec2-policy.arn}"
}

resource "aws_iam_role_policy_attachment" "role-policy-attachment-lambda-elb" {
  role       = "${aws_iam_role.my-lambda-role.name}"
  policy_arn = "${aws_iam_policy.my-lambda-elb-policy.arn}"
}

data "aws_elb" "my-elb" {
  name = "my-elb"
}
#ec2-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceStatus",
        "ec2:StopInstances",
        "ec2:StartInstances",
        "ec2:RebootInstances"
      ],
      "Resource": "*"
    }
  ]
}
#sqs-receive-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": [
        "sqs:ReceiveMessage",
        "sqs:ChangeMessageVisibility",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes"
      ],
      "Resource": "${sqs_arn}"
    }
  ]
}
#elb-logs-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": [
        "elasticloadbalancing:DescribeInstanceHealth",
        "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
        "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
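Each boto3 call in the script needs a matching IAM action in one of these policies. The sketch below checks that by hand: the mapping from boto3 calls to IAM actions is written out manually, the two policy dictionaries are trimmed copies of the files above, and the helper names are hypothetical. It does not expand wildcards such as ec2:*, so treat it as a quick consistency check rather than a policy simulator.

```python
# boto3 call used in my_lambda.py -> IAM action it requires (written by hand).
REQUIRED_ACTIONS = {
    "ec2.describe_instances": "ec2:DescribeInstances",
    "ec2.reboot_instances": "ec2:RebootInstances",
    "elb.describe_instance_health": "elasticloadbalancing:DescribeInstanceHealth",
    "elb.register_instances_with_load_balancer":
        "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
    "elb.deregister_instances_from_load_balancer":
        "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
}

# Trimmed copies of ec2-policy.json and elb-logs-policy.json.
EC2_POLICY = {"Statement": [{"Effect": "Allow", "Action": [
    "ec2:DescribeInstances", "ec2:DescribeInstanceStatus",
    "ec2:StopInstances", "ec2:StartInstances", "ec2:RebootInstances"]}]}
ELB_POLICY = {"Statement": [{"Effect": "Allow", "Action": [
    "elasticloadbalancing:DescribeInstanceHealth",
    "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
    "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
    "logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"]}]}


def allowed_actions(*policies):
    """Collect every action listed in an Allow statement (no wildcard expansion)."""
    actions = set()
    for policy in policies:
        for statement in policy["Statement"]:
            if statement["Effect"] == "Allow":
                found = statement["Action"]
                actions.update([found] if isinstance(found, str) else found)
    return actions


def missing_actions(required, *policies):
    """Return the boto3 calls whose IAM action is not granted by the policies."""
    # IAM action names are case-insensitive, so compare in lower case.
    granted = {action.lower() for action in allowed_actions(*policies)}
    return sorted(call for call, action in required.items()
                  if action.lower() not in granted)


print(missing_actions(REQUIRED_ACTIONS, EC2_POLICY, ELB_POLICY))
```

An empty list means every call in the script is covered; any entry that shows up names the boto3 call that would fail with an access error at runtime.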
Create Lambda
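To make sure at least one instance always stays in the ELB, set reserved_concurrent_executions to 1 so that only one invocation of the function runs at a time. Also keep in mind that the maximum execution timeout for AWS Lambda is 15 minutes (900 seconds). The two 5-minute sleeps in the script already use 600 of those seconds, which leaves roughly 5 minutes for the health check loop.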
data "archive_file" "source" {
  type        = "zip"
  source_dir  = "${path.module}/scripts/"
  output_path = "${path.module}/lambda_function.zip"
}

resource "aws_lambda_function" "my-lambda-restart" {
  filename      = "${substr(data.archive_file.source.output_path, length(path.cwd) + 1, -1)}"
  function_name = "my_lambda"
  description   = "Restarts service when notified by CloudWatch Alarm"
  role          = "${aws_iam_role.my-lambda-role.arn}"

  # Handler format is <file name without .py>.<function name>,
  # so it must match scripts/my_lambda.py
  handler                        = "my_lambda.restart_service"
  source_code_hash               = "${data.archive_file.source.output_base64sha256}"
  runtime                        = "python3.6"
  reserved_concurrent_executions = 1
  timeout                        = 900

  environment {
    variables = {
      elbID = "${data.aws_elb.my-elb.name}"
    }
  }
}
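The health check loop in the script has no deadline of its own, so if the instance never returns to InService the function simply runs until Lambda kills it at the 900-second timeout. As a minimal sketch of one way to bound it, the Lambda context object exposes get_remaining_time_in_millis(), which can be used to stop polling shortly before the timeout. The helper wait_until_in_service, the 10-second safety margin and the FakeContext class are assumptions made for this example and are not part of the original script.

```python
import time


def wait_until_in_service(check_health, context, poll_seconds=20,
                          safety_margin_ms=10_000, sleep=time.sleep):
    """Poll until check_health() returns 'InService' or time is nearly up."""
    while True:
        if check_health() == "InService":
            return True
        remaining_ms = context.get_remaining_time_in_millis()
        # Give up if another sleep would eat into the safety margin.
        if remaining_ms - poll_seconds * 1000 < safety_margin_ms:
            return False
        sleep(poll_seconds)


class FakeContext:
    """Stand-in for the Lambda context object, for local runs only."""

    def __init__(self, remaining_ms):
        self.remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self):
        return self.remaining_ms

    def spend(self, seconds):
        # Replaces time.sleep in the demo so it finishes instantly.
        self.remaining_ms -= int(seconds * 1000)


# Simulated health states: the instance recovers on the third check.
states = iter(["OutOfService", "OutOfService", "InService"])
context = FakeContext(remaining_ms=300_000)
result = wait_until_in_service(lambda: next(states), context, sleep=context.spend)
print(result, context.remaining_ms)
```

Inside the real handler the call would look something like wait_until_in_service(lambda: check_instance_health(instanceID), context), with a False result logged as an error so the failed recovery is visible in CloudWatch Logs.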
Create Lambda event — trigger
The last step is to create an aws_lambda_event_source_mapping resource, which makes Lambda poll the queue and invoke the function for every SQS message that arrives.
resource "aws_lambda_event_source_mapping" "my_lambda_restart_mapping" {
  batch_size       = 1
  event_source_arn = "${aws_sqs_queue.my_lambda_restart_queue.arn}"
  enabled          = true
  function_name    = "${aws_lambda_function.my-lambda-restart.arn}"
}
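With batch_size = 1, each invocation receives an event whose Records list holds a single message, which is why the handler can safely read Records[0]. The sketch below models how queued message bodies would be grouped into events for a given batch size. It is a simplification: build_events is a hypothetical helper, real SQS records carry more fields than body, and Lambda delivers at most batch_size messages per invocation rather than always a full batch.

```python
def build_events(message_bodies, batch_size=1):
    """Group queued message bodies into Lambda-style SQS events."""
    events = []
    for start in range(0, len(message_bodies), batch_size):
        batch = message_bodies[start:start + batch_size]
        events.append({"Records": [{"body": body} for body in batch]})
    return events


# Placeholder message bodies standing in for three alarm notifications.
queued = ['{"alarm": "a"}', '{"alarm": "b"}', '{"alarm": "c"}']

for event in build_events(queued, batch_size=1):
    # One record per event, so Records[0] is the whole batch.
    print(len(event["Records"]), event["Records"][0]["body"])
```

If the batch size were raised above 1, the handler would need to loop over all entries in Records instead of reading only the first one.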
Now we just need to apply the new infrastructure with terraform apply and the automation is ready to go. If you plan to use the snippets above, you can also run terraform fmt to format the *.tf files.
In this article I used boto3, the AWS SDK for Python, for EC2 and ELB manipulation. Boto3 provides an easy-to-use, object-oriented API as well as low-level access to AWS services, and it’s a great tool for automation.