Tagging EC2 instances created by AutoScaling Group with Lambda and CloudWatch
When using autoscaling, AWS doesn’t apply tags to all resources. Here is a simple solution using Lambda and CloudWatch Events.
Tags on AWS are a very useful tool, commonly used to organize resources into groups. AWS provides some useful examples of tagging strategies; for this article we will consider our tag usage for cost allocation:
- Environment: We define the environment in which the resource is located (development, quality, etc.);
- Automation: We define whether the resource should be turned off at night or during the weekend, when our dev team is not coding and the resource is therefore not needed;
- Role: We define a role, which is a name for the application or service that the resource belongs to (for example, everything related to balancing, such as the AWS load balancer, EC2 machines with NGINX, etc.), to identify all the AWS resources needed for a given application.
Calculating marginal costs: By dividing resources by role, you can calculate the cost of each additional GB of storage, unit of traffic, or new piece of content uploaded to the platform. AWS can provide cost reports broken down by custom tags directly.
In this article, we will focus on the challenge and the benefits of using tags to perform marginal cost calculations.
Marginal cost is a key figure for keeping the growth of costs in line with the growth of revenues, so its calculation should cover the full spectrum of costs that contribute to it. To calculate it we use AWS cost reports enriched with custom tags, which makes tagging as many resources as possible of paramount importance. Fortunately, AWS allows you to tag almost all resources. For static resources it is fairly simple: tags are set at creation and remain almost unchanged throughout the resource’s life cycle.
For dynamic resources, it is slightly more complex. In our architecture, a large part of the resources are used for containers. As explained in the previous article, we make extensive use of AutoScalingGroups to manage EC2 machines on which to run containers.
AWS doesn’t propagate tags to all resources automatically
Fortunately, AWS already has a mechanism to propagate AutoScalingGroup tags to EC2 machines. Unfortunately, not all resources are tagged: EBS volumes, Elastic IPs and network interfaces are excluded, which leaves a “hole” in the calculation of the marginal cost. AWS will likely improve this in the future and automatically propagate tags to these newly created resources, but in the meantime the gap can be closed in a very simple and low-cost way.
Why recommended solutions do not suit our needs
After searching the internet and asking AWS Support, we identified two solutions:
- In this thread, AWS suggests converting our Launch Configurations to EC2 launch templates and then pointing the AutoScalingGroup at the launch template. Launch templates support tagging EBS volumes directly from the AutoScalingGroup, but do not provide a solution for Elastic IPs and network interfaces.
- In the AWS Support answer, they recommended creating a custom init script that tags all resources that are not tagged automatically (the previous thread also recommends it as a workaround). This approach works, but it forces you to grant each EC2 instance permission to tag itself, and you have to maintain the code run by the instances of each AutoScalingGroup, so there is no centralized place to change the configuration.
So we looked for an alternative based on two very versatile tools: Lambda functions and CloudWatch Events.
Each action on AWS is logged, and you can create custom rules in CloudWatch Events based on specific patterns, such as an EC2 instance entering the running state. The event pattern snippet for that case is shown below.
"EC2 Instance State-change Notification"
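As a rough illustration, a complete pattern of this kind could look like the sketch below. The three pattern fields follow the standard EC2 state-change event; the `matches` helper is a hypothetical, heavily simplified local check (exact match against listed values only) added for illustration, not the matching engine AWS uses.

```python
import json

# Event pattern for a rule that fires when any EC2 instance enters "running".
EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running"]},
}


def matches(pattern, event):
    """Simplified local check (hypothetical helper, illustration only).

    Every key in the pattern must exist in the event, and the event value
    must be one of the values listed in the pattern. Nested dicts recurse.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


if __name__ == "__main__":
    # The printed JSON is the form a rule definition expects.
    print(json.dumps(EVENT_PATTERN, indent=2))
```

Filtering on `detail.state` in the rule itself means the function is only invoked for instances that reach the running state, not for every state change.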
The CloudWatch Event Rule then fires whenever any EC2 instance enters the running state and sends the event, JSON-encoded, to a Lambda function. The event carries all the necessary information, including the instance-id of the instance that has just entered the running state. The following is an example of an event.
"detail-type": "EC2 Instance State-change Notification",
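For reference, here is a sketch of what such an event looks like once decoded, and how a function might pull the instance id out of it. All values (account, region, ids, timestamp) are placeholders, and `instance_id_from_event` is a hypothetical helper name, not taken from our code.

```python
# Sample event with placeholder values; the field layout follows the
# "EC2 Instance State-change Notification" format.
SAMPLE_EVENT = {
    "version": "0",
    "id": "00000000-0000-0000-0000-000000000000",
    "detail-type": "EC2 Instance State-change Notification",
    "source": "aws.ec2",
    "account": "123456789012",
    "time": "2020-01-01T00:00:00Z",
    "region": "eu-west-1",
    "resources": [
        "arn:aws:ec2:eu-west-1:123456789012:instance/i-0123456789abcdef0"
    ],
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"},
}


def instance_id_from_event(event):
    """Return the instance id if the event reports a running instance, else None."""
    detail = event.get("detail", {})
    if detail.get("state") != "running":
        return None
    return detail.get("instance-id")
```

Re-checking the state inside the function is a cheap safeguard in case the rule pattern is later widened to other states.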
With this information, Lambda performs the following operations (the AWS actions used are in parentheses):
- Retrieve EC2 machine tags (DescribeInstances);
- Retrieve the volume, network interfaces, and elastic IP information of the EC2 machine (DescribeVolumes, DescribeNetworkInterfaces, DescribeAddresses);
- Check whether each of those resources already carries the instance’s tags;
- If tags are missing, tag the resource (CreateTags).
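The steps above can be sketched roughly as follows. This is a minimal illustration, not our production function: it assumes `ec2` behaves like `boto3.client("ec2")`, that Elastic IPs are VPC addresses (so they have an `AllocationId`), and that results fit in a single page. The helper names `user_tags`, `missing_tags` and `propagate_tags` are hypothetical.

```python
def user_tags(tags):
    """Drop AWS-reserved tags (keys starting with 'aws:'); users cannot create them."""
    return [t for t in (tags or []) if not t["Key"].startswith("aws:")]


def missing_tags(instance_tags, resource_tags):
    """Return the instance tags whose key is not yet present on the resource."""
    existing = {t["Key"] for t in (resource_tags or [])}
    return [t for t in instance_tags if t["Key"] not in existing]


def propagate_tags(ec2, instance_id):
    """Copy the instance's tags to its volumes, network interfaces and Elastic IPs.

    `ec2` is assumed to behave like boto3.client("ec2").
    Returns a dict mapping each tagged resource id to the tags that were added.
    """
    # DescribeInstances: read the tags propagated by the AutoScalingGroup.
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    tags = user_tags(instance.get("Tags"))

    # DescribeVolumes / DescribeNetworkInterfaces / DescribeAddresses:
    # collect (resource id, current tags) pairs for everything attached.
    attached = [{"Name": "attachment.instance-id", "Values": [instance_id]}]
    by_instance = [{"Name": "instance-id", "Values": [instance_id]}]
    resources = []
    for vol in ec2.describe_volumes(Filters=attached)["Volumes"]:
        resources.append((vol["VolumeId"], vol.get("Tags")))
    for eni in ec2.describe_network_interfaces(Filters=attached)["NetworkInterfaces"]:
        # Network interfaces report their tags under "TagSet".
        resources.append((eni["NetworkInterfaceId"], eni.get("TagSet")))
    for addr in ec2.describe_addresses(Filters=by_instance)["Addresses"]:
        resources.append((addr["AllocationId"], addr.get("Tags")))

    # CreateTags: only call it for resources that are actually missing tags.
    applied = {}
    for resource_id, current in resources:
        to_add = missing_tags(tags, current)
        if to_add:
            ec2.create_tags(Resources=[resource_id], Tags=to_add)
            applied[resource_id] = to_add
    return applied
```

Passing the client in as a parameter keeps the logic easy to exercise with a stub, and the function only needs the five EC2 actions listed above.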
The only problem we encountered was that the Lambda sometimes started before the AutoScalingGroup had propagated the tags to the instance. It was a rare occurrence, but when it happened the Lambda had nothing to copy and the run was useless. We therefore had to either wait for the instance to have tags or implement a retry mechanism.
Lambda, however, provides a built-in retry mechanism in case of failure or an unhandled exception, although it is not currently configurable when the function is invoked by a CloudWatch event (with other services you can choose the maximum number of retries). At the time of writing, after the first failure Lambda retries twice more (with a delay of about 1 minute between attempts, in my tests), which is good enough for our purposes.
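One way to lean on that built-in retry is to fail the invocation on purpose when the instance has no tags yet. The sketch below is an assumption about how this could be done, not our exact code; `TagsNotReadyError` and `require_instance_tags` are hypothetical names, and `instance` is a dict shaped like an entry returned by DescribeInstances.

```python
class TagsNotReadyError(Exception):
    """Raised when the instance has no user-defined tags yet."""


def require_instance_tags(instance):
    """Return the instance's user-defined tags, or raise if there are none.

    Letting the exception escape the handler makes the invocation fail,
    so the asynchronous retry described above gets another chance later.
    """
    tags = [
        t for t in instance.get("Tags", [])
        if not t["Key"].startswith("aws:")  # ignore AWS-reserved tags
    ]
    if not tags:
        instance_id = instance.get("InstanceId", "unknown")
        raise TagsNotReadyError(f"no tags on instance {instance_id} yet")
    return tags
```

Raising instead of sleeping inside the function keeps each invocation short, at the price of depending on the platform's retry count and delay.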
With this solution we found an elegant way to tag all EC2 resources that are not tagged automatically, without putting additional logic on the machines and while keeping permissions to a minimum and limited to the Lambda. It is easily extensible to any AWS resource that you can tag through the SDK. It was also a great testbed for the power of CloudWatch Event Patterns, which together with Lambda let you easily extend the functionality of AWS.
We are quite satisfied with the workaround we implemented: it works well and addresses the gap that AWS currently has in its tagging functionality. We plan to remove it as soon as AWS allows tags to be propagated to all resource types created with autoscaling.