Unlocking the Power of AWS CloudWatch: A Guide to Sending Custom Metrics using Python

AWS CloudWatch Custom Metrics With a Real-world Example for Docker Containers

Mujahed Altahleh
DevOpsars
Jan 21, 2023

Introduction

Effective monitoring is crucial for any infrastructure, and with the growing popularity of containerization and microservices, monitoring these services has become increasingly important. Custom solutions may be required to achieve comprehensive monitoring. This article will demonstrate how to send custom metrics to AWS CloudWatch using boto3 and Python, with a real-world example: monitoring Docker containers.

It’s also worth noting that sending custom metrics to CloudWatch may incur additional costs, so factor this in when implementing this solution in your environment.

Prerequisites

Before diving into the details, we need to set up the necessary prerequisites. First, we must have Python 3, the boto3 and docker libraries, and the AWS CLI installed and configured.

Next, we need to have some Docker containers up and running. If you do not already have these services set up, you can check the code repo for the docker-compose file I used to implement this tutorial.
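If you just need something quick to run against, a minimal compose file along these lines will do. The images and service names here are illustrative, not necessarily what the repo uses; note that the script shown later in this article only watches containers whose name contains "app":

```yaml
version: "3"
services:
  app1:
    image: nginx:alpine
    container_name: app1
  app2:
    image: redis:alpine
    container_name: app2
```

Bring the containers up with docker compose up -d.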

The code is written in Python 3, so basic Python knowledge is a must.

Getting The Metrics from The Docker Environment

Docker provides a rich API for getting metrics on container usage. Using the Docker SDK (a Python module), we can easily retrieve CPU, memory, and network metrics for all our containers. This section will show how to use the docker Python library to get these metrics.

Retrieving metrics from Docker requires calling the stats() method on a container object obtained from the Docker client. The following code snippet serves as an example of this process; the implementation section will feature a more detailed example.

import docker
client = docker.from_env()
container = client.containers.get("container_name")
metrics = container.stats(stream=False)
cpu_usage = metrics["cpu_stats"]["cpu_usage"]["total_usage"]

Similarly, we can use the same approach to get the container’s memory and network usage metrics. Once we have the metrics, we can use boto3’s put_metric_data() method to send the data to CloudWatch. Here is a sample:

import boto3

cloudwatch = boto3.client('cloudwatch')
cloudwatch.put_metric_data(
    Namespace='DockerMetrics',
    MetricData=[
        {
            'MetricName': 'CPUUsage',
            'Dimensions': [
                {
                    'Name': 'ContainerName',
                    'Value': 'container_name'
                },
            ],
            'Value': cpu_usage,
            'Unit': 'Count'
        },
    ]
)
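Memory and network counters live under other keys of the same stats dictionary. The snippet below works on a trimmed, hand-made sample of that dictionary so the key paths are clear; a real stats() result contains many more fields:

```python
# Trimmed, hand-made sample of the dict returned by container.stats(stream=False)
stats = {
    "memory_stats": {"usage": 52428800, "limit": 2147483648},
    "networks": {
        "eth0": {"rx_bytes": 1024, "tx_bytes": 2048},
    },
}

# Memory usage and limit are reported in bytes
mem_usage_mb = stats["memory_stats"]["usage"] / 1024 / 1024
mem_limit_mb = stats["memory_stats"]["limit"] / 1024 / 1024

# Network counters are reported per interface, so sum across interfaces
rx_bytes = sum(iface["rx_bytes"] for iface in stats["networks"].values())
tx_bytes = sum(iface["tx_bytes"] for iface in stats["networks"].values())

print(mem_usage_mb, rx_bytes, tx_bytes)  # 50.0 1024 2048
```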

Now that we understand the basic building blocks of our solution, let’s dig deeper into the implementation details.

Implementation steps

  1. Install the AWS CLI; details on installing and configuring it are available in AWS's official documentation.
  2. Run some containers in our working environment.
  3. Install the Python modules by running pip install docker boto3.
  4. Write and run the code; let’s go through it step by step.

First, as we described above, we need to define our CloudWatch client object as below:

import boto3
# set the AWS profile name; if you don't have one, remove this line or set it to "default."
boto3.setup_default_session(profile_name='devopsars')
cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

In this part of the code, we used the following parameters:

  • For the first parameter, we set the service name to ‘cloudwatch’
  • The second parameter is optional to set the region name

The setup_default_session call is also optional; it sets the AWS profile if you specified a profile name when configuring the AWS CLI. Note that it must run before any client is created for the profile to take effect.

Next, we need to define the docker client to talk to the docker engine and get the metrics.

There are different ways to do this. The following is the easiest one.

# Setup the docker client
client = docker.from_env()

Now, we will define two functions. The first one, called calculate_stats_summary(stats), takes the containers.stats() object as input. It filters out unnecessary metrics and calculates CPU metrics not provided out of the box. The second function, called get_container_status(), iterates through the list of containers the docker client object provides. It retrieves the container’s stats and calls the calculate_stats_summary function to summarize the metrics. It also checks the container’s status (UP or DOWN) and sends the collected metrics to the AWS CloudWatch console using the cloudwatch.put_metric_data(MetricData) function as described earlier.
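To make the CPU math concrete, here is the same delta-based calculation in isolation. Docker reports cumulative counters, so usage is derived from the difference between the current and previous ("precpu") readings. The sample snapshot below is hand-made and contains only the relevant keys:

```python
def cpu_percent_from_stats(stats):
    """Compute CPU usage percent from a Docker stats snapshot."""
    cpu_delta = (stats["cpu_stats"]["cpu_usage"]["total_usage"]
                 - stats["precpu_stats"]["cpu_usage"]["total_usage"])
    system_delta = (stats["cpu_stats"]["system_cpu_usage"]
                    - stats["precpu_stats"]["system_cpu_usage"])
    cpus = stats["cpu_stats"]["online_cpus"]
    if system_delta <= 0:
        return 0.0
    return round(cpu_delta / system_delta * 100.0 * cpus, 2)

# Illustrative snapshot: the container used 50 of 1000 system ticks on 2 CPUs
sample = {
    "cpu_stats": {"cpu_usage": {"total_usage": 150},
                  "system_cpu_usage": 2000, "online_cpus": 2},
    "precpu_stats": {"cpu_usage": {"total_usage": 100},
                     "system_cpu_usage": 1000},
}
print(cpu_percent_from_stats(sample))  # 10.0
```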

And before we jump to the full code, here is a description of the structure of the MetricData object.

  • MetricName is the metric's name; it appears in the CloudWatch console, and we can filter by it.
  • Services Name is the name of the container that we are monitoring.
  • Host Name is the name of the host that the container is running on.
  • Unit is the measurement unit of the value; it determines how the value is displayed in the CloudWatch console.
  • Value is the value of the metric.
  • Namespace is the first thing we look for when reading metric details in CloudWatch; it appears in the CloudWatch console, and all the metrics are grouped under it. We can change it to whatever we want.
  • Dimensions are the column names for the metrics in the CloudWatch console. We can add as many as we want and filter by them; for example, we can filter by the host name and see all of that host's metrics.

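Putting those fields together, a small helper can assemble one MetricData entry with the same unit mapping. The helper name build_metric and the sample values are mine, for illustration only:

```python
MEGABYTE_KEYS = {"mem_current", "mem_total"}
PERCENT_KEYS = {"cpu_percent", "mem_percent"}

def build_metric(key, value, service_name, host_name):
    """Build one MetricData entry in the shape put_metric_data expects."""
    if key in PERCENT_KEYS:
        unit = "Percent"
    elif key in MEGABYTE_KEYS:
        unit = "Megabytes"
    else:
        unit = "None"
    return {
        "MetricName": key,
        "Dimensions": [
            {"Name": "Services Name", "Value": service_name},
            {"Name": "Host Name", "Value": host_name},
        ],
        "Unit": unit,
        "Value": float(value),
    }

entry = build_metric("cpu_percent", 12.5, "app-web", "myhost")
print(entry["Unit"])  # Percent
```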
import docker
import boto3
import os

# set the AWS profile name; if you don't have one, remove this line or set it to "default."
boto3.setup_default_session(profile_name='devopsars')
# Setup the boto3 client
cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

# Setup the docker client
client = docker.from_env()

# Sets of metric keys grouped by the CloudWatch unit they use
megabytes_values = {'mem_current', 'mem_total'}
percentage_values = {'cpu_percent', 'mem_percent'}
numbers_unit = {'status'}


def calculate_stats_summary(stats):
    name = stats['name'].strip('/')  # container name without the leading /
    mem_current = float(stats["memory_stats"]["usage"] / 1024 / 1024)  # current memory usage in megabytes
    mem_total = float(stats["memory_stats"]["limit"] / 1024 / 1024)  # memory limit in megabytes
    mem_percent = round(float((mem_current / mem_total) * 100.0), 2)  # percentage of memory usage
    cpu_count = stats['cpu_stats']['online_cpus']  # number of cpu cores
    cpu_percent = 0.0
    # cpu used by the container since the previous reading
    cpu_delta = float(stats["cpu_stats"]["cpu_usage"]["total_usage"]) - float(stats["precpu_stats"]["cpu_usage"]["total_usage"])
    # total cpu used by the system since the previous reading
    system_delta = float(stats["cpu_stats"]["system_cpu_usage"]) - float(stats["precpu_stats"]["system_cpu_usage"])

    # calculate the cpu usage percentage
    if system_delta > 0.0:
        cpu_percent = round(cpu_delta / system_delta * 100.0 * cpu_count, 2)
    # create a dictionary with the stats summary to be sent to cloudwatch
    summary_stats = {'cpu_percent': cpu_percent, 'mem_current': mem_current,
                     'mem_percent': mem_percent, 'name': name}
    return summary_stats


def get_container_status():
    check_string = 'app'  # monitor only the containers whose name contains this string
    for container in client.containers.list():
        if check_string in container.name:
            # container status (up or down) as a numeric metric
            status = 1 if container.status == 'running' else 0
            try:
                # send the status of the container to cloudwatch
                cloudwatch.put_metric_data(
                    MetricData=[
                        {
                            'MetricName': "Status",
                            'Dimensions': [
                                {
                                    'Name': 'Services Name',
                                    'Value': container.name
                                },
                                {
                                    'Name': 'Host Name',
                                    'Value': os.uname()[1]
                                },
                            ],
                            'Unit': 'None',
                            'Value': float(status)
                        },
                    ],
                    Namespace='Docker/Stats')
            except ValueError:
                continue
            try:
                # get the stats of the container and send them to cloudwatch
                full_stats = container.stats(stream=False)
                summary_stats = calculate_stats_summary(full_stats)
                # detect the unit of each value and send it to cloudwatch
                for key in summary_stats:
                    try:
                        if key in percentage_values:
                            unit = 'Percent'
                        elif key in megabytes_values:
                            unit = 'Megabytes'
                        else:
                            unit = 'None'
                        print("key: {}, value: {}".format(key, summary_stats[key]))  # for debugging

                        cloudwatch.put_metric_data(
                            MetricData=[
                                {
                                    'MetricName': key,
                                    'Dimensions': [
                                        {
                                            'Name': 'Services Name',
                                            'Value': summary_stats['name']
                                        },
                                        {
                                            'Name': 'Host Name',
                                            'Value': os.uname()[1]
                                        },
                                    ],
                                    'Unit': unit,
                                    'Value': float(summary_stats[key])
                                },
                            ],
                            Namespace='Docker/Stats')
                    except ValueError:
                        # skip values that cannot be converted to float (e.g. the container name)
                        continue
            except KeyError:
                continue


if __name__ == '__main__':
    get_container_status()

If we run this code, we will see the metrics in the CloudWatch console under the namespace Docker/Stats. To find the new metrics, go to https://console.aws.amazon.com/cloudwatch/home, then from the left menu select “All metrics,” and you will see a new group called “Custom namespaces.” Under this section, we will find our namespace, as in the screenshot below:

By selecting the Docker/Stats namespace, we will find our dimensions, and selecting a dimension set will show the new metrics. Here is mine:

Scheduling the Metrics Collection

To ensure metrics are collected and sent to CloudWatch regularly, you can use a tool like cron on Linux or Task Scheduler on Windows to schedule the script. This will automate collecting and sending metrics to CloudWatch, allowing you to keep an eye on the important metrics at all times.
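With cron, an entry such as `* * * * * /usr/bin/python3 /path/to/script.py` would run the collector every minute (the path is a placeholder). If you'd rather keep scheduling inside Python, a simple in-process loop works too; the sketch below is illustrative, with the function name and interval chosen by me:

```python
import time

def run_scheduler(collect, interval_seconds=60, iterations=None):
    """Call collect() repeatedly, sleeping between runs.

    iterations=None loops forever (the typical use); a finite
    value is handy for testing.
    """
    count = 0
    while iterations is None or count < iterations:
        collect()
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval_seconds)
    return count

# In the real script, collect would be get_container_status.
calls = []
run_scheduler(lambda: calls.append(1), interval_seconds=0.01, iterations=2)
print(len(calls))  # 2
```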

The Conclusion

This article has provided a guide on sending custom metrics from Docker to AWS CloudWatch using boto3 and Python. While better solutions may be available for monitoring Docker containers in CloudWatch, the main focus of this article is to demonstrate how to send custom metrics to CloudWatch. By automating metric collection and delivery, you can easily monitor your containerized infrastructure and make data-driven decisions to optimize performance and security. Additionally, it’s worth noting that this method can be used to send custom metrics from any system or application; it is not limited to Docker. Furthermore, you can use CloudWatch to set up alarms and notifications, create custom dashboards, and collect and monitor log files using CloudWatch Logs Insights.

Finally, always test and verify any scripts or solutions before implementing them in a production environment, and consider any potential costs associated with sending metrics to CloudWatch.

The repo for the code used in this article: https://github.com/mtahle/CustomCloudWatchMetrics
