How to Export DynamoDB Client Metrics to Graphite

Using Codahale metrics and AwsSdkMetrics to export fine-grained DynamoDB client metrics

Vinod Canumalla
Expedia Group Technology
Aug 20, 2019

Service client metrics are essential for tuning client configuration and for monitoring and observing how a service client performs within your DynamoDB application. This holds even for cloud-native services and the clients your applications use to reach them.

Have you ever wondered if there are any metrics available for AWS service clients? And if so, do you know how to enable/capture AWS service client metrics? And how to send the custom metrics to an internal Graphite server?

There are a number of blogs and articles available online on how to enable sending DynamoDB client metrics to CloudWatch for monitoring purposes, but not to an internal Graphite server.

I’ll share how we capture AWS service client metrics here at Hotels.com (part of Expedia Group) with an example of sending DynamoDB service client metrics to a Graphite server.

AWS CloudWatch provides metrics for AWS service performance, but by default it doesn't give client-side metrics such as the number of retries, the time taken to connect to an AWS service, and request/response times.

The AWS SDK metrics library provides those client-side metrics if they are enabled within your application code. These metrics are exported to CloudWatch by default. There are two categories of metrics available within the AWS SDK metrics library: request metrics and service metrics.

Below are a few of the metrics reported by AWS service clients such as the DynamoDB and S3 service clients.

AWS Request Metrics

AWS Service Metrics

Enabling client metrics

Amazon provides the AwsSdkMetrics library for reporting client metrics. When the feature is enabled, the metrics are sent to CloudWatch by default. Metrics reporting can be enabled using:

  • JVM command-line option to send to CloudWatch
  • application code to send to CloudWatch
  • application code to send to Graphite (using a custom metrics collector)

By default, the metrics are uploaded to the us-east-1 region. You can change the region through the cloudwatchRegion attribute of the com.amazonaws.sdk.enableDefaultMetrics system property.

Note: You must set up IAM roles for your application to export metric data to CloudWatch. We assume here that the application is already running in AWS with the necessary access available.

JVM command-line option — to CloudWatch

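Setting the com.amazonaws.sdk.enableDefaultMetrics system property as a JVM option at startup is all that is needed to enable metrics reporting, for example -Dcom.amazonaws.sdk.enableDefaultMetrics=cloudwatchRegion=us-west-2.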

Application code — to CloudWatch

At the start of your application, call AwsSdkMetrics.enableDefaultMetrics() so that metrics reporting is enabled until the application is stopped.

It sounds really easy to have DynamoDB client-side metrics on CloudWatch alongside the DynamoDB service metrics, but the CloudWatch metrics are reported at 1-minute intervals, which is not fine-grained enough for debugging performance issues or tuning the client configuration. Reporting metrics to CloudWatch also has associated costs based on the number of metrics reported ($0.01 per 1,000 metrics gets/puts). Running more instances of an application at high request throughput increases the costs.
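To get a feel for how the cost scales, here is a rough back-of-the-envelope sketch. It uses the price quoted above and assumes, purely for illustration, that every metric on every instance results in one billed put per minute over a 30-day month. The class name, the fleet size, and the metric count are hypothetical, and actual billing depends on how the SDK batches its uploads.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Rough CloudWatch request-cost estimate for client metrics (illustrative only).
final class CloudWatchCostEstimate {
    // Price quoted in this article: $0.01 per 1,000 metric gets/puts.
    static final BigDecimal USD_PER_1000_REQUESTS = new BigDecimal("0.01");

    // Assumption: one billed put per metric, per instance, per minute, for 30 days.
    static BigDecimal monthlyCostUsd(long instances, long metricsPerInstance) {
        long putsPerMonth = instances * metricsPerInstance * 60L * 24L * 30L;
        return USD_PER_1000_REQUESTS
                .multiply(BigDecimal.valueOf(putsPerMonth))
                .divide(BigDecimal.valueOf(1000));
    }

    public static void main(String[] args) {
        // Hypothetical fleet: 10 instances, each reporting 20 client metrics.
        BigDecimal cost = monthlyCostUsd(10, 20);
        System.out.println("Estimated monthly cost (USD): "
                + cost.setScale(2, RoundingMode.HALF_UP));
    }
}
```

Under these assumptions the hypothetical fleet comes to about $86 a month, and the figure grows linearly with both the instance count and the number of metrics.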

Application code — to Graphite

The AwsSdkMetrics library also enables developers to provide a custom metrics collector to report the metrics to Graphite instead of to CloudWatch. Using the Codahale metrics library and the following AWS SDK Metrics APIs for Java, the client metrics can be sent to Graphite at a much more detailed level.

AwsSdkMetrics APIs

The MetricCollector, RequestMetricCollector, and ServiceMetricCollector APIs are triggered whenever an AWS service client makes a call to an AWS service. By extending these APIs, it is possible to implement metrics reporting to Graphite. Here is example code for reporting DynamoDB Client metrics.

NOTE: This example is only meant to demonstrate the basics of how to export client metrics to Graphite. Beyond this, we recommend using metric reservoirs as explained in the blog Your Latency Metrics Could Be Misleading You.

Step 1:

In your application startup class, initialise the metrics registry and Graphite reporter, and assign the metrics registry to AwsSdkMetrics through a custom metric collector class.
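For orientation, it can help to know what ends up on the wire. Assuming the reporter talks to Graphite over its plaintext protocol, each datapoint is a single line of the form "path value timestamp". The dependency-free sketch below only formats such a line. The class name, the myapp.dynamodb prefix, and the sample value are hypothetical, and a real application would let the Codahale Graphite reporter do this work.

```java
import java.util.Locale;

// Formats a datapoint the way Graphite's plaintext protocol expects it (sketch only).
final class GraphiteLine {
    // Produces "<prefix>.<metric> <value> <epoch seconds>\n".
    static String format(String prefix, String metric, double value, long epochSeconds) {
        // Locale.ROOT keeps the decimal separator a dot regardless of the JVM locale.
        return String.format(Locale.ROOT, "%s.%s %.2f %d\n",
                prefix, metric, value, epochSeconds);
    }

    public static void main(String[] args) {
        // Hypothetical prefix and value; ClientExecuteTime is one of the SDK request metrics.
        long now = System.currentTimeMillis() / 1000L;
        System.out.print(format("myapp.dynamodb", "ClientExecuteTime.mean", 12.5, now));
    }
}
```

The prefix is what determines where the metrics appear in the Graphite tree, so it is worth choosing one that identifies both the application and the AWS client.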

Step 2:

Implement the custom metrics collector, AwsSdkMetricCollector, by extending MetricCollector and overriding two methods: one for collecting request metrics, and the other for collecting service metrics. This class will start metrics collection and use the two methods to report metrics to the METRIC_REGISTRY that has been passed into its constructor.

The above AwsSdkMetricCollector class uses two new classes extending the RequestMetricCollector and ServiceMetricCollector base classes. These extract the metric names and values to be sent to METRIC_REGISTRY as described in the next two steps.
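The shape of such a collector can be sketched without any dependencies. Everything in the block below is a simplified stand-in written for illustration: Registry, RequestHook, ServiceHook, and Collector are hypothetical types, not the AWS SDK or Codahale classes. The sketch only shows the idea of one lifecycle object handing out two per-call hooks that share a registry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Dependency-free sketch; none of these types are the real SDK or Codahale classes.
final class CollectorSketch {
    // Stand-in for a metrics registry that only keeps named counters.
    static final class Registry {
        private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

        void increment(String name) {
            counters.computeIfAbsent(name, k -> new LongAdder()).increment();
        }

        long count(String name) {
            LongAdder adder = counters.get(name);
            return adder == null ? 0L : adder.sum();
        }
    }

    // Stand-ins for the two hooks invoked on each client call.
    interface RequestHook { void collectMetrics(String operation, long latencyMillis); }
    interface ServiceHook { void collectLatency(String metricName, double millis); }

    // Lifecycle object that exposes both hooks and writes into one shared registry.
    static final class Collector {
        private final Registry registry;
        private volatile boolean started;

        Collector(Registry registry) { this.registry = registry; }

        boolean start() { started = true; return true; }
        boolean stop() { started = false; return true; }
        boolean isEnabled() { return started; }

        // Each hook records nothing until the collector has been started.
        RequestHook requestHook() {
            return (operation, ms) -> { if (started) registry.increment("request." + operation); };
        }

        ServiceHook serviceHook() {
            return (name, ms) -> { if (started) registry.increment("service." + name); };
        }
    }
}
```

Keeping the registry as a constructor argument, as the article's collector does with METRIC_REGISTRY, means the same registry can also hold the application's own metrics and be reported through a single Graphite reporter.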

Step 3:

The AwsSdkRequestMetricCollector extends RequestMetricCollector. It evaluates the available request metrics and creates appropriate metric registry objects like timers and histograms for reporting latency type metrics and count type metrics. The method collectMetrics uses a PredefinedMetricTransformer, a CloudWatch API that helps in extracting names and values of metric dimensions.

Please note that the DynamoDBConsumedCapacity metric name and value have been tweaked to capture the decimal value, because the Codahale (v4.0.0) histograms work only with int and long values, not double values. This tweak is needed only if you want to capture fractional consumed capacity units. Remember that consumed capacity units are reported only when .withReturnConsumedCapacity is used in the DynamoDB request builder.
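One way to apply such a tweak is to scale the fractional value into a long before recording it and to encode the scale in the metric name. The sketch below is an assumption about how this could be done, not the exact code used in the example. The scale factor of 100 and the "x100" name suffix are hypothetical choices.

```java
// Scales fractional capacity units into longs that an integer-only histogram can record.
final class ConsumedCapacityScaling {
    // Hypothetical scale factor: 100 preserves two decimal places.
    static final long SCALE = 100L;

    // Hypothetical naming convention so dashboards know the series is scaled.
    static String scaledName(String metricName) {
        return metricName + "x" + SCALE;
    }

    // Value to pass to the histogram, e.g. 0.5 units becomes 50.
    static long toHistogramValue(double consumedCapacityUnits) {
        return Math.round(consumedCapacityUnits * SCALE);
    }

    // Inverse conversion, applied when reading the series back.
    static double fromHistogramValue(long histogramValue) {
        return (double) histogramValue / SCALE;
    }

    public static void main(String[] args) {
        String name = scaledName("DynamoDBConsumedCapacity");
        long recorded = toHistogramValue(0.5);
        System.out.println(name + " = " + recorded
                + " (" + fromHistogramValue(recorded) + " units)");
    }
}
```

With this convention the dashboard divides the series by the same factor to show real capacity units.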

Step 4:

AwsSdkServiceMetricCollector extends ServiceMetricCollector. This class evaluates the available service metrics and creates appropriate metric registry objects like meters, timers, and histograms. ServiceMetricCollector has two methods to implement: collectByteThroughput and collectLatency. For DynamoDB, this gives the HttpClientGetConnectionTime only.

DynamoDB client metrics on Grafana

These are the Grafana dashboards for the DynamoDB client metrics:

By default, the DynamoDBConsumedCapacity metric is reported for each request, and as of now the Codahale histogram metric doesn't provide the sum of the values recorded in the specified interval. Based on the total request rate, however, we can estimate the total consumed capacity units, as shown in the second graph below.
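The estimate itself is simple arithmetic: total units per second is roughly the request rate multiplied by the mean consumed capacity per request. The sketch below spells that out with hypothetical readings. The class and method names are illustrative and the numbers are not taken from our dashboards.

```java
// Estimates total consumed capacity from two series a histogram and meter do provide.
final class ConsumedCapacityEstimate {
    // Total units per second ~= requests per second x mean units per request.
    static double unitsPerSecond(double requestsPerSecond, double meanUnitsPerRequest) {
        return requestsPerSecond * meanUnitsPerRequest;
    }

    public static void main(String[] args) {
        // Hypothetical readings: 200 requests/s with a mean of 0.5 units per request.
        System.out.println("Estimated units/s: " + unitsPerSecond(200.0, 0.5));
    }
}
```

With these hypothetical readings the estimate is about 100 capacity units per second. It is an approximation, since it relies on the mean being representative over the interval.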

This enables the capture of much finer detail and more useful metrics than the CloudWatch metrics, and it also reduces costs. We hope our example helps you implement your own custom metrics collector for AWS service clients.
