Auto-Pilot Alarms: Automating CloudWatch Alarms with Lambda and Terraform

Published in Globant, Mar 5, 2024

In the vast and dynamic realm of cloud computing, proactive monitoring is not just a best practice: it’s a cornerstone of operational excellence. Amazon Web Services (AWS) offers an expansive suite of tools designed to empower organizations to build, deploy, and manage applications with agility. Yet, amidst this flexibility, there is a critical need to oversee the performance and health of AWS resources, such as EC2 instances, RDS databases, and Lambda functions. Amazon CloudWatch alarms stand as sentinels, providing near-real-time alerts that can help avert potential disasters before they escalate. By setting up CloudWatch alarms, teams are notified of anomalies in their environment, enabling swift action to maintain service availability, optimize resource utilization, and uphold the user experience.

However, as any seasoned AWS architect would attest, configuring alarms for a myriad of resources can be a tedious and error-prone endeavor. The default processes (using the AWS Console or the AWS CLI) require a granular, manual setup for each alarm, which becomes impractical when dealing with large-scale infrastructures or aiming for consistent alarm configurations across different environments. This is where the power of automation comes into play — specifically through the use of AWS Lambda functions. By leveraging serverless computing to automate the creation and management of CloudWatch alarms, organizations can encapsulate best monitoring practices into reproducible, scalable, and maintainable patterns. In this article, we will delve into the intricacies of using Lambda functions to simplify and streamline the orchestration of CloudWatch alarms, ensuring your AWS landscape remains under a watchful eye with minimal manual overhead.

Building on the innovative approach outlined in “Use tags to create and maintain Amazon CloudWatch alarms for Amazon EC2 instances (Part 1)” by Khurram Nizami, this article diverges from the use of AWS CloudFormation in favor of Terraform. The decision to adopt Terraform is strategic, catering to the preferences and operational standards of clients who manage their AWS resources with this powerful tool.

Prerequisites

cf2tf is a command-line interface (CLI) tool that converts CloudFormation templates into Terraform configuration files. The conversion is not yet fully accurate, largely because it is difficult to map every CloudFormation template section and value to its equivalent in Terraform’s HashiCorp Configuration Language (HCL). As a result, cf2tf gives a useful approximation rather than a perfect one-to-one translation.

With cf2tf, we can transform the YAML files supplied by AWS into Terraform configuration files, then examine and fine-tune the generated files to fit our requirements. Because only a few resources are needed for this environment, translating each YAML file into HCL by hand is feasible, but it is tedious. To save time, I recommend using cf2tf as a starting point for the conversion, so that we can focus on refining and customizing the configuration to suit our deployment strategy.

Important: cf2tf targets the latest version of the Terraform AWS Provider. If you are pinned to an older version, expect to make additional adjustments to the generated code.

Architecture

The following diagram shows an overview of the architecture of the solution. Amazon EventBridge (formerly called CloudWatch Events) rules detect when an instance is started or terminated. These events trigger the Lambda function, which checks whether the instance is tagged with the key we defined (ALARM_TAG) and creates or deletes the corresponding CloudWatch alarms using the threshold values defined in the other environment variables.

The Amazon SNS topic sends a notification when an alarm goes off.

Description of the architecture. Credit: https://github.com/aws-samples/amazon-cloudwatch-auto-alarms

Upon the transition of an EC2 instance into the running state, the Initiate-CloudWatchAutoAlarms Amazon CloudWatch Events rule promptly activates, instigating a call to the CloudWatchAutoAlarms Lambda function. This Lambda function then proceeds to retrieve details about the newly launched instance, specifically scanning for the presence of a tag key dubbed Create_Auto_Alarms. Should this tag key be detected, the function automatically proceeds to establish a set of default alarms, calibrated to the predefined thresholds specified within the environment variables. This streamlined process ensures that monitoring is proactively configured without requiring manual intervention, thus maintaining a robust, responsive infrastructure.

When an EC2 instance configured with automatic alarms is terminated, the solution automatically removes any alarms linked to that instance. This cleanup keeps the monitoring system free of redundant alarms, so the alerting framework covers only active resources.
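The create/delete flow described above can be sketched as a small decision function. This is a minimal illustration, not the code from the aws-samples repository: the names `has_alarm_tag` and `decide_action` are hypothetical, and the sketch assumes the tags arrive in the list-of-dicts shape that the EC2 API returns (`[{"Key": ..., "Value": ...}]`). The actual alarm creation and deletion calls are left out.

```python
# Tag key named in the article. In the real function it would likely be read
# from the ALARM_TAG environment variable (assumption).
DEFAULT_ALARM_TAG = "Create_Auto_Alarms"


def has_alarm_tag(tags, tag_key=DEFAULT_ALARM_TAG):
    """Return True if the EC2 tag list contains the given tag key."""
    return any(tag.get("Key") == tag_key for tag in tags or [])


def decide_action(event, tags, tag_key=DEFAULT_ALARM_TAG):
    """Map an EC2 state-change event to 'create', 'delete', or 'ignore'."""
    state = event.get("detail", {}).get("state")
    if state == "running" and has_alarm_tag(tags, tag_key):
        # Newly running and tagged: default alarms should be created.
        return "create"
    if state == "terminated":
        # Terminated: any alarms linked to the instance should be removed.
        return "delete"
    return "ignore"


# Example event in the shape of an EC2 Instance State-change Notification.
sample_event = {"detail": {"instance-id": "i-0123456789abcdef0", "state": "running"}}
sample_tags = [{"Key": "Create_Auto_Alarms", "Value": ""}]
print(decide_action(sample_event, sample_tags))
```

Keeping the decision separate from the AWS calls makes this part of the logic easy to test locally without credentials.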

Step-by-Step Guide

For this article, we will assume that the Amazon Simple Notification Service (SNS) topic has already been created, and we will not use an S3 bucket in our setup. To do the setup, follow these steps:

1. Clone the GitHub repository amazon-cloudwatch-auto-alarms:

git clone https://github.com/aws-samples/amazon-cloudwatch-auto-alarms
cd amazon-cloudwatch-auto-alarms

2. Optional: Create a Python virtual environment for the cf2tf installation:

python -m venv cf2tf-env
source cf2tf-env/bin/activate

3. Install the cf2tf tool:

pip install cf2tf

4. Convert the file CloudWatchAutoAlarms.yaml into Terraform code:

cf2tf CloudWatchAutoAlarms.yaml > CloudWatchAutoAlarms.tf

As mentioned before, cf2tf does not usually generate code that is usable as-is, so we need to make some fixes. Even so, it saves us time on the translation.

5. We can get rid of all the inserted comments, like:

// CF Property(Targets) = [
//  {
//    Arn = aws_lambda_function.lambda_function.arn
//    Id = "LATEST"
//  }
// ]

6. Inside the aws_cloudwatch_event_rule resources, we need to escape the double quotes in the event_pattern argument:

event_pattern = "{ \"source\": [ \"aws.ec2\" ], \"detail-type\": [ \"EC2 Instance State-change Notification\" ], \"detail\": { \"state\": [ \"running\", \"terminated\" ] } }"
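Escaping every quote by hand is easy to get wrong. One way to avoid that, sketched here in Python as an assumption rather than part of the original workflow, is to write the pattern as a dictionary and let `json.dumps` produce both the JSON document and the escaped string literal. The sketch assumes the pattern contains no `${` or `%{` sequences, which Terraform would treat as template syntax.

```python
import json

# Event pattern from the step above, written as a plain Python dict.
event_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running", "terminated"]},
}

# First dump: the JSON document that EventBridge expects.
pattern_json = json.dumps(event_pattern)

# Second dump: wraps that JSON in double quotes and escapes the inner quotes,
# giving a literal that can be pasted after `event_pattern =` in the .tf file.
hcl_string_literal = json.dumps(pattern_json)

print(f"event_pattern = {hcl_string_literal}")
```

Terraform's built-in `jsonencode` function is another way to avoid manual escaping, since it builds the JSON string from an HCL object directly in the configuration.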

7. As we are not using the S3 bucket to store the Python code, we need to create an archive_file data source. Then set source_code_hash inside our aws_lambda_function from the archive’s hash, and remove the variables s3_deployment_bucket and s3_deployment_key:

// Package the Lambda source directory into a local zip archive
data "archive_file" "lambda-file-zip" {
  type        = "zip"
  source_dir  = "./src"
  output_path = "cw-lambda-alerts.zip"
}

resource "aws_lambda_function" "lambda_function" {
  // ... keep the other arguments generated by cf2tf ...

  // Remove the code_signing_config_arn block and add:
  filename         = data.archive_file.lambda-file-zip.output_path
  source_code_hash = data.archive_file.lambda-file-zip.output_base64sha256
}

8. We need to make some adjustments to the aws_iam_role resources. For clarity, define each policy with an aws_iam_policy_document data source and attach it to the role with an aws_iam_role_policy_attachment.

9. You can now run terraform plan and terraform apply.

10. Create an EC2 instance with the tag key: Create_Auto_Alarms

For my use case, I modified the Python code so that instead of searching for the Create_Auto_Alarms tag key, it considers only a specific tag key and value that all the client’s generated EC2 instances have. And because the scope was to only monitor EC2 instances, I removed the aws_cloudwatch_event_rule for Lambda and RDS resources.
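A change like this can be sketched as follows. This is a hypothetical illustration, not the modified code itself: the function name `matches_tag` and the example key and value (`Project` and `client-app`) are invented for the example, and the tags are assumed to be in the list-of-dicts shape that the EC2 API returns.

```python
def matches_tag(tags, key, value):
    """Return True only if a tag with this exact key AND value is present."""
    return any(
        tag.get("Key") == key and tag.get("Value") == value
        for tag in tags or []
    )


# Hypothetical tag that every client-generated instance carries.
REQUIRED_KEY = "Project"
REQUIRED_VALUE = "client-app"

instance_tags = [
    {"Key": "Name", "Value": "web-01"},
    {"Key": "Project", "Value": "client-app"},
]

if matches_tag(instance_tags, REQUIRED_KEY, REQUIRED_VALUE):
    print("Instance qualifies for automatic alarms")
```

Matching on both key and value is stricter than the default behaviour, which only checks that the tag key exists.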

Conclusion

The cf2tf tool is a valuable asset for developers and infrastructure engineers seeking to bridge the gap between AWS CloudFormation and HashiCorp’s Terraform. While not flawless, cf2tf streamlines the translation of infrastructure code and can speed up the adoption of Terraform’s ecosystem, particularly when moving to or integrating with multi-cloud environments, or when seeking the benefits of Terraform’s state management and modular approach.

The deployment of CloudWatch alarms as part of a comprehensive monitoring strategy is crucial for the early detection of irregular behavior and the prevention of potential system failures. However, it’s important to recognize that while these alarms are essential for maintaining system health, they also incur costs. AWS charges for the operational use of CloudWatch alarms, so businesses must balance the need for detailed monitoring with the associated expenses. Efficient cost management can be achieved by carefully planning alarm conditions and avoiding unnecessary granularity that leads to excessive charges.

References

Nizami, K. (2021) Use tags to create and maintain Amazon CloudWatch alarms for Amazon EC2 instances (Part 1).
GitHub Repo aws-samples/amazon-cloudwatch-auto-alarms.
CloudWatch Pricing.
GitHub Repo DontShaveTheYak/cf2tf.