Cost Optimization for AWS CloudWatch

Sven Leiß · Published in awsblackbelt · 9 min read · May 31, 2023


AWS CloudWatch is a monitoring service that helps you track, store, and analyze metrics, logs, and events from your AWS resources and applications. It provides real-time insights into the health and performance of your infrastructure and applications, allowing you to quickly identify and troubleshoot issues, and optimize resource utilization.

While CloudWatch offers a wide range of features and capabilities, it is important to optimize its usage to ensure efficient resource allocation and cost optimization. In this article, we will discuss the best practices for optimizing CloudWatch costs, including how to calculate costs, use CloudWatch metrics and logs effectively, and configure alarms and notifications for optimal resource management.

Calculating CloudWatch Costs

To optimize CloudWatch costs, it is important to understand how pricing is calculated. CloudWatch pricing is based on several factors, including the number of metrics, logs, and alarms you create, the frequency of data points collected, and the retention period of logs and metrics.

Metrics

CloudWatch pricing for metrics is based on the number of custom metrics you create and, separately, on the PutMetricData API requests used to publish data points. Each custom metric is billed per metric per month (for example, $0.30 per metric per month for the first 10,000 metrics in us-east-1 at the time of writing), while API requests are billed per 1,000 requests. Collecting data points more frequently does not multiply the per-metric charge, but it does increase the number of API requests: 10 custom metrics publishing one data point per minute generate 10 x 60 x 24 x 30 = 432,000 data points per month, which translates into API request charges unless you batch multiple metrics into each call.
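As a rough sketch of how these charges interact (the prices are assumptions based on us-east-1 list prices at the time of writing; check the current CloudWatch pricing page before relying on them):

# Rough monthly cost estimate for 10 custom metrics, one data point per minute.
# Prices below are assumptions (us-east-1 list prices at the time of writing).
METRIC_PRICE_PER_MONTH = 0.30   # $ per custom metric per month
REQUEST_PRICE_PER_1000 = 0.01   # $ per 1,000 PutMetricData requests

metrics = 10
datapoints_per_month = metrics * 60 * 24 * 30   # 432,000

metric_charge = metrics * METRIC_PRICE_PER_MONTH

# One API call per data point vs. batching all 10 metrics into each call
unbatched_requests = datapoints_per_month
batched_requests = datapoints_per_month // metrics

unbatched_cost = metric_charge + unbatched_requests / 1000 * REQUEST_PRICE_PER_1000
batched_cost = metric_charge + batched_requests / 1000 * REQUEST_PRICE_PER_1000

print(f"Unbatched: ${unbatched_cost:.2f}/month, batched: ${batched_cost:.2f}/month")

Batching alone cuts the request charge by a factor of ten in this example, which is why the batched-write pattern shown later in this article matters.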

Logs

CloudWatch pricing for logs is based on the volume of ingested data, the amount of data scanned by Logs Insights queries, and the volume of log data stored beyond the free tier. CloudWatch charges per GB of data ingested, per GB of data scanned by queries, and per GB-month of storage; there is no separate charge per Log Stream. For example, if you ingest 100 GB of log data per month, scan 50 GB of data with queries, and store logs for 90 days, your monthly charges (at us-east-1 list prices at the time of writing) would be roughly:

Ingestion charges: 100 GB x $0.50/GB = $50.00
Query charges: 50 GB x $0.005/GB scanned = $0.25
Storage charges: 100 GB x $0.03/GB-month x 3 months = $9.00

Total charges: $50.00 + $0.25 + $9.00 = $59.25

(Storage is billed on compressed data, so the actual storage figure is usually lower than this simple estimate.)

Alarms and Notifications

CloudWatch pricing for alarms is based on the number of alarm metrics you create: each standard-resolution alarm is billed per alarm per month (for example, $0.10 per alarm metric per month in us-east-1 at the time of writing), regardless of how often it is evaluated or how often it changes state. High-resolution and composite alarms have higher per-alarm rates. Notifications themselves are billed by the target service rather than by CloudWatch: Amazon SNS charges for publishes and deliveries according to its own pricing. For example, 10 standard-resolution alarms cost 10 x $0.10 = $1.00/month; what can grow over time is the downstream SNS delivery cost if alarms fire frequently to paid endpoints such as SMS.

Best Practices for Cost Optimization

Now that we understand how CloudWatch pricing is calculated, let’s discuss some best practices for optimizing CloudWatch costs.

Use CloudWatch Metrics Effectively

Metrics are the backbone of CloudWatch and provide valuable insights into the health and performance of your infrastructure and applications. To optimize CloudWatch costs, it is important to use metrics effectively and avoid creating unnecessary metrics.

Here are some tips for using CloudWatch metrics effectively:

  • Use AWS-provided metrics: AWS automatically provides metrics for many AWS services, such as EC2, RDS, and Lambda. These standard metrics are pre-configured and free at basic monitoring resolution, so they do not add to your custom metric charges (enabling detailed monitoring, where available, does incur a charge).
  • Use dimensions: Dimensions are key-value pairs that help you categorize and filter metrics. By using dimensions, you can create more granular and specific metrics, which can help you better understand the behavior of your infrastructure and applications. However, it is important to use dimensions judiciously and avoid creating too many dimensions, as this can increase costs.
  • Use high-resolution metrics sparingly: CloudWatch allows you to collect metrics at a higher resolution than the default of 1-minute intervals. However, high-resolution metrics can quickly add up and increase costs. Only use high-resolution metrics when necessary, and consider using CloudWatch Logs Insights for detailed analysis.
  • Use CloudWatch APIs to optimize metric publishing: CloudWatch APIs allow you to publish metrics more efficiently by sending multiple metrics in a single API call. This can help reduce costs by reducing the number of API calls and the associated request charges (see the batched-write example later in this article).

Use CloudWatch Logs Effectively

Logs are an important part of CloudWatch and provide valuable insights into the behavior of your applications and infrastructure. However, it is important to use logs effectively and avoid ingesting unnecessary log data.

Here are some tips for using CloudWatch logs effectively:

  • Use log levels: Log levels allow you to categorize log messages by severity, such as INFO, WARN, and ERROR. By using log levels, you can filter out less important log messages and reduce the volume of data ingested into CloudWatch.
  • Use filter patterns: Filter patterns (as used by metric filters and subscription filters) allow you to match and extract specific information from log messages. By using filter patterns, you can derive more granular and specific data from your logs, which can help you better understand the behavior of your applications and infrastructure.
  • Use CloudWatch Logs Insights for detailed analysis: CloudWatch Logs Insights is a powerful tool that lets you analyze log data with its purpose-built query language. By using Logs Insights, you can perform detailed ad-hoc analysis of your log data, which can help you identify and troubleshoot issues more efficiently (a minimal example follows this list).
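As a minimal sketch of running a Logs Insights query from Python with boto3 (the log group name and the query itself are placeholders for illustration):

import time
import boto3

logs = boto3.client('logs')

# Start a Logs Insights query over the last hour (log group name is a placeholder)
query = logs.start_query(
    logGroupName='/my-app/production',
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString='fields @timestamp, @message | filter @message like /ERROR/ | limit 20'
)

# Poll until the query finishes, then print each result row as a dict
results = logs.get_query_results(queryId=query['queryId'])
while results['status'] in ('Scheduled', 'Running'):
    time.sleep(1)
    results = logs.get_query_results(queryId=query['queryId'])

for row in results['results']:
    print({field['field']: field['value'] for field in row})

Remember that Logs Insights charges per GB of data scanned, so narrowing the time range and the log group keeps query costs down.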

Configure Alarms and Notifications Carefully

Alarms and notifications are important for ensuring the health and performance of your infrastructure and applications. However, it is important to configure alarms and notifications carefully and avoid creating unnecessary alarms.

Here are some tips for configuring alarms and notifications carefully:

  • Use composite alarms: Composite alarms allow you to create alarms that trigger based on the state of multiple underlying alarms. By using composite alarms, you can reduce the number of alarms and notifications you receive, which can help reduce costs (a minimal sketch follows this list).
  • Use Amazon SNS filters: Amazon SNS filters allow you to filter out unnecessary notifications based on the content of the message. By using SNS filters, you can reduce the number of notifications you receive and avoid unnecessary charges.
  • Use CloudWatch Events for automation: CloudWatch Events (now part of Amazon EventBridge) lets you automate tasks based on events from your AWS resources, such as scaling EC2 instances, stopping and starting instances, and triggering Lambda functions. This can help reduce costs by optimizing resource utilization.
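As a minimal sketch of a composite alarm created with boto3 (the two child alarms referenced in the rule are assumed to exist already, and all names and the topic ARN are placeholders):

import boto3

cloudwatch = boto3.client('cloudwatch')

# Create a composite alarm that fires only when BOTH child alarms are in the
# ALARM state, cutting down on redundant notifications. Names are placeholders.
cloudwatch.put_composite_alarm(
    AlarmName='AppDegraded',
    AlarmRule='ALARM("CPUUtilizationHigh") AND ALARM("LatencyHigh")',
    AlarmDescription='Fires only when both the CPU and latency alarms fire',
    ActionsEnabled=True,
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:my-alerts-topic']
)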

Code Optimization Examples

In addition to following best practices for optimizing CloudWatch costs, it is also important to optimize your code to reduce the volume of data ingested into CloudWatch. Here are some tips for optimizing your code:

  • Use the CloudWatch agent for log ingestion: The CloudWatch agent (successor to the older CloudWatch Logs agent) can be installed on your EC2 instances to collect and send log data to CloudWatch. It batches log events before sending them, and its configuration controls exactly which log files are shipped, so you can avoid ingesting data you do not need.
  • Use log levels and log patterns: As mentioned earlier, using log levels and log patterns can help you reduce the volume of data ingested into CloudWatch by filtering out less important log messages and creating more granular log data.
  • Use batched writes for metric ingestion: As mentioned earlier, using CloudWatch APIs to publish metrics more efficiently can help reduce costs. One way to do this is to use batched writes, which allow you to send multiple metrics in a single API call. By using batched writes, you can reduce the number of API calls and the associated charges.

Batched writes to publish metrics in Python

import boto3
from datetime import datetime, timezone

cloudwatch = boto3.client('cloudwatch')

# Build both metric data points up front so they can be sent in one request
metrics_data = [
    {
        'MetricName': 'CPUUtilization',
        'Dimensions': [
            {'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'},
        ],
        'Timestamp': datetime.now(timezone.utc),
        'Value': 60,
        'Unit': 'Percent'
    },
    {
        'MetricName': 'DiskSpaceUtilization',
        'Dimensions': [
            {'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'},
        ],
        'Timestamp': datetime.now(timezone.utc),
        'Value': 70,
        'Unit': 'Percent'
    },
]

# A single PutMetricData call publishes both metrics, halving the request count
cloudwatch.put_metric_data(
    Namespace='Custom',
    MetricData=metrics_data
)

In this example, we are publishing two custom metrics (‘CPUUtilization’ and ‘DiskSpaceUtilization’) for an EC2 instance with the ID ‘i-0123456789abcdef0’. We are using the put_metric_data API to publish the metrics in a single API call.

Filter and Aggregate Metrics using AWS Lambda

import boto3
from datetime import datetime, timezone

# Create the client outside the handler so it is reused across invocations
cloudwatch = boto3.client('cloudwatch')

def lambda_handler(event, context):
    metric_data = []

    # Collect, process, and aggregate raw data into fewer data points
    aggregated_data = process_and_aggregate_data(event)

    for data_point in aggregated_data:
        metric_data.append({
            'MetricName': data_point['metric_name'],
            'Dimensions': data_point['dimensions'],
            'Timestamp': datetime.now(timezone.utc),
            'Value': data_point['value'],
            'Unit': data_point['unit']
        })

    # Send the aggregated data points to CloudWatch in a single batched call
    # (chunk metric_data if it exceeds the per-request limit)
    cloudwatch.put_metric_data(
        Namespace='YourCustomNamespace',
        MetricData=metric_data
    )

def process_and_aggregate_data(event):
    # Your custom data processing and aggregation logic here
    pass

This AWS Lambda function collects, processes, and aggregates data before sending it to CloudWatch. Customize the process_and_aggregate_data function to handle your specific data.

Set Log Retention Periods using AWS CDK

Use the AWS Cloud Development Kit (CDK; the examples in this article use CDK v1 syntax) to create a CloudWatch Log Group with a specified retention period:

from aws_cdk import (
    aws_logs as logs,
    core
)

class LogRetentionExampleStack(core.Stack):

    def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # Create a CloudWatch Log Group with a retention period of 14 days
        logs.LogGroup(self, "MyLogGroup",
            log_group_name="my-log-group",
            retention=logs.RetentionDays.TWO_WEEKS
        )

app = core.App()
LogRetentionExampleStack(app, "LogRetentionExampleStack")
app.synth()

This CDK script creates a CloudWatch Log Group named “my-log-group” with a retention period of 14 days.
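For log groups that already exist (for example, groups created automatically by Lambda), you can apply a retention policy directly with boto3; a minimal sketch, with the log group name as a placeholder:

import boto3

logs = boto3.client('logs')

# Apply a 14-day retention policy to an existing log group (name is a placeholder)
logs.put_retention_policy(
    logGroupName='my-log-group',
    retentionInDays=14
)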

Efficient CloudWatch Alarm Configuration

Create alarms for essential metrics only and avoid overly sensitive thresholds. Here’s an example of a well-configured alarm using the AWS CDK:

from aws_cdk import (
    aws_cloudwatch as cloudwatch,
    aws_cloudwatch_actions as cw_actions,
    aws_ec2 as ec2,
    aws_sns as sns,
    core
)

class CloudWatchAlarmExampleStack(core.Stack):

    def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # Create an SNS topic for alarm notifications
        topic = sns.Topic(self, "MyTopic")

        # Create a VPC and an EC2 instance to monitor
        vpc = ec2.Vpc(self, "MyVpc", max_azs=1)
        instance = ec2.Instance(self, "MyInstance",
            vpc=vpc,
            instance_type=ec2.InstanceType("t2.micro"),
            machine_image=ec2.MachineImage.latest_amazon_linux()
        )

        # CPU utilization metric for the instance, evaluated over 5-minute periods
        cpu_metric = cloudwatch.Metric(
            namespace="AWS/EC2",
            metric_name="CPUUtilization",
            dimensions={"InstanceId": instance.instance_id},
            period=core.Duration.minutes(5),
            statistic="Average"
        )

        # Create a CloudWatch alarm for CPU utilization
        alarm = cloudwatch.Alarm(self, "CPUUtilizationHigh",
            alarm_description="Alarm when CPU utilization is above 80%",
            alarm_name="CPUUtilizationHigh",
            comparison_operator=cloudwatch.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
            evaluation_periods=1,
            metric=cpu_metric,
            threshold=80
        )

        # Send notifications to the SNS topic when the alarm fires
        alarm.add_alarm_action(cw_actions.SnsAction(topic))

app = core.App()
CloudWatchAlarmExampleStack(app, "CloudWatchAlarmExampleStack")
app.synth()

In this example, a CloudWatch alarm is created for an EC2 instance’s CPU utilization, with a threshold of 80%. The alarm triggers only when the average CPU utilization is greater than or equal to 80% for a single 5-minute evaluation period, and notifications are sent to an SNS topic via an alarm action. The AWS CDK is used to create the required resources and configure the alarm efficiently.

Reduce Data Ingestion Costs with Amazon Kinesis Data Firehose

Use Amazon Kinesis Data Firehose to batch and compress high-volume log data and deliver it to low-cost storage such as Amazon S3, instead of ingesting everything into CloudWatch Logs. Reserving CloudWatch Logs for the data you actually need to query interactively can significantly reduce ingestion costs:

import boto3
import gzip
import json

firehose = boto3.client('firehose')

def lambda_handler(event, context):
    # Process the incoming event and serialize it as JSON
    log_data = process_data(event)
    json_data = json.dumps(log_data)

    # Compress before sending to reduce the volume of data transferred
    compressed_data = compress_data(json_data)

    # Send the compressed record to Kinesis Data Firehose, which batches
    # records and delivers them to the stream's destination (e.g. S3)
    firehose.put_record(
        DeliveryStreamName='your-delivery-stream-name',
        Record={
            'Data': compressed_data
        }
    )

def process_data(event):
    # Your custom data processing logic here
    pass

def compress_data(json_data):
    return gzip.compress(json_data.encode('utf-8'))

This AWS Lambda function processes its input, converts it to JSON, compresses it, and sends it to Amazon Kinesis Data Firehose, which batches the records and delivers them to the stream’s configured destination (such as Amazon S3). Customize the process_data function to handle your specific data.

Optimize CloudWatch Event Rules with AWS CDK

Create CloudWatch Event Rules that filter and route events to reduce the number of events processed and lower costs:

from aws_cdk import (
    aws_events as events,
    aws_events_targets as targets,
    aws_lambda as lambda_,
    core
)

class CloudWatchEventRuleExampleStack(core.Stack):

    def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # Create a Lambda function to handle the filtered events
        lambda_function = lambda_.Function(self, "MyLambdaFunction",
            runtime=lambda_.Runtime.PYTHON_3_8,
            handler="lambda_handler.handler",
            code=lambda_.Code.from_asset("path/to/lambda/code")
        )

        # Create a CloudWatch Event Rule that only matches EC2 instances
        # entering the "running" state, so the target sees fewer events
        events.Rule(self, "MyEventRule",
            description="Example rule that filters and routes events",
            event_pattern=events.EventPattern(
                source=["aws.ec2"],
                detail_type=["EC2 Instance State-change Notification"],
                detail={
                    "state": ["running"]
                }
            ),
            targets=[targets.LambdaFunction(lambda_function)]
        )

app = core.App()
CloudWatchEventRuleExampleStack(app, "CloudWatchEventRuleExampleStack")
app.synth()

In this example, a CloudWatch Event Rule is created using the AWS CDK, filtering for specific EC2 instance state-change notifications (only those transitioning to the “running” state). The filtered events are then routed to an AWS Lambda function.

Summary

Optimizing CloudWatch costs is an important part of managing your AWS resources efficiently. By following best practices for using CloudWatch metrics and logs effectively, configuring alarms and notifications carefully, and optimizing your code, you can reduce costs and improve the efficiency of your infrastructure and applications. Remember to regularly review your CloudWatch usage and adjust your configurations as necessary to ensure cost optimization.

About the Author:

My name is Sven Leiss and I am an 5x certified AWS enthusiast and AWS Migration Blackbelt. I have been working in the AWS space for the past 7 years and have extensive knowledge of the AWS platform and its various services. I am passionate about helping customers get the most out of the cloud and have a great track record of successful implementations.

I have extensive experience in designing and implementing cloud architectures using AWS services such as EC2, S3, Lambda and more. I am also well versed in DevOps and AWS cloud migration journeys.

If you are looking for an experienced AWS expert, I would be more than happy to help. Feel free to contact me to discuss your cloud needs and see how I can help you get the most out of the cloud.
