Getting Observability Into DynamoDB With OpenTelemetry in Node

Andrew @Scout
12 min read · Jul 10, 2023


In today’s cloud-native world, observability is essential to monitor and troubleshoot complex distributed systems. However, achieving observability can be challenging, especially when dealing with managed services like Amazon DynamoDB. OpenTelemetry is an open-source observability framework that provides a consistent way to collect, process, and export telemetry data from different components of an application.

Adding telemetry to your interactions with DynamoDB helps you understand and optimize your application’s performance and behavior. Here’s why you might want to add OpenTelemetry to your DynamoDB interactions:

  1. Performance Monitoring: Telemetry can help you track the latency of your DynamoDB operations. For instance, you can monitor how long it takes to read or write data. If these operations are slower than expected, they could be causing performance issues in your application.
  2. Debugging: If there’s an issue with your application, telemetry data can provide valuable context to help debug it. For instance, if a certain request is failing, telemetry data can help you determine whether the issue is with your application code or with the read/write operations to DynamoDB.
  3. Usage Patterns: Telemetry can provide insights into how your application is using DynamoDB. For instance, you can see what times of day have the most read/write operations, or which operations are used most frequently. This can help you optimize your application and its interaction with DynamoDB.
  4. Cost Optimization: AWS charges for DynamoDB based on reading, writing, and storing data in your DynamoDB tables. By understanding the usage patterns and the request patterns (like read/write frequency), you can optimize your cost.

By integrating OpenTelemetry with DynamoDB, you can gain observability into your application, which is critical for maintaining and improving the system’s reliability, performance, and functionality.

Adding telemetry to DynamoDB

If you’re here, you’re probably already using DynamoDB, but if not, you can learn more about it here: https://aws.amazon.com/dynamodb/

For this post we’ve created a simple CRUD Node app that reads and writes to a DynamoDB table called Items, which has a partition key, id, and two other attributes, name and content.
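
For reference, the Items table can be sketched as CreateTable parameters. This is an assumption about how the table was created (on-demand billing via PAY_PER_REQUEST is our choice, not the post’s). Only the key attribute id has to be declared up front, because name and content are ordinary attributes that DynamoDB does not need to know about in advance. The example item is hypothetical and uses the attribute-value format the low-level client expects.

```javascript
// Sketch of CreateTable parameters for the Items table (assumes on-demand billing).
// Only key attributes belong in AttributeDefinitions; name and content are schemaless.
const itemsTableParams = {
  TableName: "Items",
  AttributeDefinitions: [{ AttributeName: "id", AttributeType: "S" }],
  KeySchema: [{ AttributeName: "id", KeyType: "HASH" }], // id is the partition key
  BillingMode: "PAY_PER_REQUEST",
};

// Hypothetical example item in DynamoDB's attribute-value format
const exampleItem = {
  id: { S: "1" },
  name: { S: "Item 1" },
  content: { S: "This is an item." },
};
```

With the AWS SDK for JavaScript v2 used later in this post, you could pass these parameters to `new AWS.DynamoDB().createTable(itemsTableParams).promise()`.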

With that done, setting up telemetry for a Node app with DynamoDB requires a little extra configuration. First we need some more dependencies. For general Node observability with OpenTelemetry, we’ll need the following:

  • @opentelemetry/api: The core API for OpenTelemetry. It includes interfaces, types, and descriptions for the main classes like Tracer, Meter, and Propagator.
  • @opentelemetry/auto-instrumentations-node: This package contains automatic instrumentation for Node.js. When you enable automatic instrumentation, the library will automatically create and associate spans with your application’s operations.
  • @opentelemetry/exporter-trace-otlp-proto: This package exports trace data using the OpenTelemetry Protocol (OTLP), sending protobuf-encoded payloads over HTTP. OTLP is a general-purpose telemetry data delivery protocol designed as part of the OpenTelemetry project.
  • @opentelemetry/resources: This package provides classes and functions for working with resources (a service’s immutable attributes).
  • @opentelemetry/sdk-node: This is the main entry point for the OpenTelemetry SDK in Node.js. The SDK provides controls for sampling, processing, and exporting trace data.
  • @opentelemetry/sdk-trace-node: This is the OpenTelemetry Trace SDK for Node.js. It’s used to manually control and instrument tracing for the application.
  • @opentelemetry/semantic-conventions: This package provides the standard naming conventions and semantic attributes that OpenTelemetry recommends.

We’ll install them like this:

npm install --save \
@opentelemetry/api \
@opentelemetry/auto-instrumentations-node \
@opentelemetry/exporter-trace-otlp-proto \
@opentelemetry/resources \
@opentelemetry/sdk-node \
@opentelemetry/sdk-trace-node \
@opentelemetry/semantic-conventions

We don’t import these into our app.js file directly. Instead all our telemetry setup will be in an independent file, which we’ll call instrumentation.js. Here’s what that will look like:

// Require dependencies
const opentelemetry = require("@opentelemetry/sdk-node");
const {
  getNodeAutoInstrumentations,
} = require("@opentelemetry/auto-instrumentations-node");
const { Resource } = require("@opentelemetry/resources");
const {
  OTLPTraceExporter,
} = require("@opentelemetry/exporter-trace-otlp-proto");
// sdk-trace-node re-exports BatchSpanProcessor and ConsoleSpanExporter from
// @opentelemetry/sdk-trace-base, so no extra package is needed for them
const {
  NodeTracerProvider,
  BatchSpanProcessor,
  ConsoleSpanExporter,
} = require("@opentelemetry/sdk-trace-node");
const {
  SemanticResourceAttributes,
} = require("@opentelemetry/semantic-conventions");
// Configure your Exporter for TelemetryHub with an Ingest Key specific to your account
const OTLPoptions = {
  url: "https://otlp.telemetryhub.com/v1/traces",
  headers: {
    "x-telemetryhub-key": "$INGEST_KEY",
  },
};
const otlpExporter = new OTLPTraceExporter(OTLPoptions);
// Set up the Resource that defines this service, attached to all generated spans
// This can be customized with distinguishing attributes as necessary
const provider = new NodeTracerProvider({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: "$YOUR_SERVICE_NAME",
  }),
});
// A BatchSpanProcessor batches spans before export, which suits production environments
// It uses the configured Exporter to send them
provider.addSpanProcessor(new BatchSpanProcessor(otlpExporter));
// For troubleshooting, a ConsoleSpanExporter prints spans locally for inspection
// Multiple Processors can be attached to a given provider
provider.addSpanProcessor(new BatchSpanProcessor(new ConsoleSpanExporter()));
// Register the Provider with the API, so generated spans are actually exported
provider.register();
// Create and start the SDK that will orchestrate span collection and sending
// This loads a large number of instrumentations; you may wish to pare them down later
const sdk = new opentelemetry.NodeSDK({
  traceExporter: otlpExporter,
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();

There is a lot in this file, but fundamentally it sets up the exporters and span processors needed to collect traces and spans from our app and send them on. (All of this is taken from the TelemetryHub documentation; head there to see how to get set up with OpenTelemetry in a range of different languages.)

We’re using TelemetryHub to monitor our telemetry. You can sign up for a free account here. There are two TelemetryHub-specific values you need to fill in:

  1. $INGEST_KEY, which you can get from your TelemetryHub dashboard.
  2. $YOUR_SERVICE_NAME, which you can set as needed to delineate this service from others that you instrument.
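
Rather than hard-coding these values into instrumentation.js, you could read them from the environment. The sketch below is a hypothetical helper: the variable name TELEMETRYHUB_INGEST_KEY is our own assumption, OTEL_SERVICE_NAME borrows the standard OpenTelemetry variable name, and the fallback service name is a placeholder. It builds the same exporter options object used above.

```javascript
// Hypothetical helper: build exporter options and the service name from environment
// variables instead of hard-coding them. Variable names are assumptions for this sketch.
function telemetryConfigFromEnv(env) {
  const ingestKey = env.TELEMETRYHUB_INGEST_KEY;
  if (!ingestKey) {
    // Fail fast rather than starting an exporter without credentials
    throw new Error("TELEMETRYHUB_INGEST_KEY is not set");
  }
  return {
    otlpOptions: {
      url: "https://otlp.telemetryhub.com/v1/traces",
      headers: { "x-telemetryhub-key": ingestKey },
    },
    // Fall back to a placeholder name so spans are still grouped under one service
    serviceName: env.OTEL_SERVICE_NAME || "dynamodb-crud-app",
  };
}
```

In instrumentation.js you would then call `telemetryConfigFromEnv(process.env)` and pass `otlpOptions` to `OTLPTraceExporter` and `serviceName` to the `Resource`.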

At this point, if we ran our application again, instrumentation would work for our Node app. But we also want more insight into our DynamoDB calls, so we’ll add an AWS-specific telemetry dependency and register its instrumentation in instrumentation.js. First, we’ll install the AWS-specific library for OpenTelemetry:

npm install @opentelemetry/instrumentation-aws-sdk

Then we’ll require it in instrumentation.js (add this to the other requires at the top of the file):

const { AwsInstrumentation } = 
require('@opentelemetry/instrumentation-aws-sdk');

Then we will register that instrumentation alongside our Node instrumentation. To do so just add it to the instrumentations array:

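const sdk = new opentelemetry.NodeSDK({
  traceExporter: otlpExporter,
  instrumentations: [
    // The auto bundle already includes the AWS SDK instrumentation; disable that copy so calls are traced once
    getNodeAutoInstrumentations({ "@opentelemetry/instrumentation-aws-sdk": { enabled: false } }),
    new AwsInstrumentation(), // AwsInstrumentation is a class, so construct it with `new`
  ],
});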

We now have:

  • A Node app,
  • that uses a DynamoDB database,
  • with OpenTelemetry instrumentation throughout.

Getting everything running

We need all this instrumentation to load before our app starts. We could require instrumentation.js at the top of app.js, but to guarantee it loads before anything else, we can preload it with Node’s --require flag:

node --require ./instrumentation.js app.js

With that command, we have a fully instrumented Node app up and running. Let’s see what it outputs.

Outputting traces

With instrumentation.js running, let’s add another item to our database:

{
"id": "2",
"name": "Item 2",
"content": "This is another item."
}

Now, if you look in your terminal (because we have the ConsoleSpanExporter() turned on), you’ll see a ton of ‘spans’ from the telemetry for your app. We won’t recreate the entire output here as it’s about 200 lines for a single API call, but basically you get a breakdown of every span along the route from request to database entry:

  • middleware - query
  • middleware - expressInit
  • middleware - jsonParser
  • request handler - /items
  • tcp.connect
  • POST
  • DynamoDB.PutItem
  • POST /items

For each, you’ll get the shared traceId (all of these spans have the same traceId), the timestamp and duration (in microseconds), any attributes for that operation, and the status code. A code of 0 means UNSET (no error was recorded), 1 means OK, and 2 means ERROR. So you can see which operations succeeded (hopefully all of them) and how long each took.
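
If you do want to skim that output programmatically, here is a hypothetical helper that takes span objects shaped like the ones ConsoleSpanExporter prints (name, traceId, duration in microseconds, status.code) and pulls out the failures and the slowest span. It assumes you have already collected those objects into an array; the status value 2 is OpenTelemetry’s SpanStatusCode.ERROR.

```javascript
// OpenTelemetry SpanStatusCode.ERROR
const STATUS_ERROR = 2;

// Hypothetical helper: summarize spans shaped like ConsoleSpanExporter output
function summarizeSpans(spans) {
  // Names of spans whose status was set to ERROR
  const failed = spans
    .filter((s) => s.status && s.status.code === STATUS_ERROR)
    .map((s) => s.name);
  // The span with the largest duration (microseconds), or null for an empty list
  const slowest = spans.reduce(
    (max, s) => (max === null || s.duration > max.duration ? s : max),
    null
  );
  // Spans from a single request should all share one traceId
  const traceIds = new Set(spans.map((s) => s.traceId));
  return { failed, slowest: slowest && slowest.name, traceCount: traceIds.size };
}
```

Bear in mind that a root span such as POST /items encloses its children, so it will usually be the slowest; to find the expensive step, compare the child spans (for example, DynamoDB.PutItem against the middleware spans).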

That’s great, but having to parse 200 lines in your terminal every time someone POSTs something to your site is not going to make for happy developers. That’s where a tool like TelemetryHub can help. The same trace in TelemetryHub shows up like this:

Easier to grok. The middleware and request handler spans take negligible time. The TLS/TCP component of the call is more significant, but most of the time is spent within the DynamoDB.PutItem call, actually interfacing with the database while it inserts the data.

This is just one datapoint, but in a real app this endpoint might be called thousands of times an hour, and you would quickly build up real data on how significant this latency is. Do you need to rearchitect your app to reduce it? Rework how you are storing data? Or is this perfectly reasonable for the type of call? Now you have the data to decide.

We can also look at what happens when a user stumbles along the sad path of your app. Let’s say for some reason you don’t send a partition key to DynamoDB:

{
"name": "Item 3",
"content": "This is a broken item."
}

Now the POST span will show the error code (2):

Then we can pull apart the DynamoDB.PutItem call again to see what the actual problem is:

We’re missing the key id in the item. Handily, the span also includes the statement we submitted, so we can validate the issue. You can also click on ‘Find Similar Spans’ and filter on http.status_code to find other instances of the database operation receiving a bad request.
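
DynamoDB rejects a PutItem whose item lacks the table’s full primary key. If you would rather catch that in the API layer, a hypothetical check like the one below (assuming the Items schema above, with id as the only key attribute and a string value) can reject the request before any DynamoDB call is made. The trade-off is that these requests no longer appear as failed DynamoDB.PutItem spans, so you may want to record them another way.

```javascript
// Hypothetical request validation for the Items table, whose only key attribute is id
const REQUIRED_KEYS = ["id"];

// Returns a list of problems; an empty list means the item can be sent to DynamoDB
function validateItem(item) {
  if (item === null || typeof item !== "object" || Array.isArray(item)) {
    return ["item must be a JSON object"];
  }
  return REQUIRED_KEYS
    .filter((key) => typeof item[key] !== "string" || item[key] === "")
    .map((key) => `missing key attribute: ${key}`);
}
```

In an Express route, you might call `validateItem(req.body)` and respond with a 400 status when the returned list is non-empty.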

Add metrics to your DynamoDB Node application

With traces, you now have a better understanding of what’s happening between your Node application and your DynamoDB database. But let’s take it one step further and add metrics. Here are some key DynamoDB metrics you might monitor with OpenTelemetry:

  1. Request Counts: The total number of requests made to your DynamoDB service. By categorizing these requests (Read, Write, Update, etc.), you can gain insights into the use patterns of your service.
  2. Error Rates: The number of failed requests compared to successful ones. This helps in understanding the reliability of your service.
  3. Throttling: DynamoDB throttles operations to prevent your application from consuming too many resources. Monitoring this can help you optimize resource usage.
  4. Capacity: For provisioned tables, monitoring the read and write capacity units can help you manage costs and ensure you have enough capacity for your needs.
  5. Consumed Read and Write Capacity: You can monitor the amount of read and write capacity your application is using. This helps you to optimize your capacity planning.
  6. System Errors: The number of requests to DynamoDB that fail with a server-side (HTTP 500) error.

In addition to these, you might monitor things like the size of your DynamoDB tables, the number of items in your tables, and other related data depending on your application’s specific needs.

Let’s implement a few of these. Metrics in OpenTelemetry are set up in such a way that once you’ve added one instrument, it’s easy to add others. First, we need a couple of extra dependencies:

npm install --save @opentelemetry/sdk-metrics @opentelemetry/exporter-metrics-otlp-http

sdk-metrics is, unsurprisingly, the OpenTelemetry SDK for recording metrics in Node. exporter-metrics-otlp-http sends metrics over OTLP/HTTP, for example to an OpenTelemetry Collector or another OTLP-compatible backend.

We’ll require these at the top of our instrumentation.js file to pull in the classes we need, along with the metrics API from @opentelemetry/api:

const {
OTLPMetricExporter,
} = require("@opentelemetry/exporter-metrics-otlp-http");
const {
MeterProvider,
PeriodicExportingMetricReader,
ConsoleMetricExporter,
} = require("@opentelemetry/sdk-metrics");
const { metrics } = require("@opentelemetry/api");

Within instrumentation.js we can then set up our meterProvider and create a new meter:

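const meterProvider = new MeterProvider();
metrics.setGlobalMeterProvider(meterProvider);
// Print all recorded metrics to the console every 10 seconds
meterProvider.addMetricReader(
  new PeriodicExportingMetricReader({
    exporter: new ConsoleMetricExporter(),
    exportIntervalMillis: 10000,
  })
);
// Expose the meter as a global so app.js can use it after `--require ./instrumentation.js`
globalThis.meter = meterProvider.getMeter("example-exporter-collector");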

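This prints every metric recorded through this meterProvider to the console every 10 seconds. (To send metrics to a backend instead, you can pass an OTLPMetricExporter, which we imported above, as the reader’s exporter.)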

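Because instrumentation.js is required before app.js loads, and it assigns meter as a global, that meter object is available in app.js and we can use it to record some metrics. (A cleaner alternative is to call metrics.getMeter("example-exporter-collector") from @opentelemetry/api inside app.js, which returns a meter from the globally registered provider.)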

First, we’ll just create a counter that will log the number of requests we’re sending to DynamoDB:

const requestCounter = meter.createCounter("requests", {
description: "DynamoDB requests counter",
});

We can then add that to each of our endpoints and increment for each request:

if (err) res.status(500).send(err);
else {
requestCounter.add(1);
res.send(data);
}

We can then see the number of requests coming into our database (5 at this point):

{
descriptor: {
name: 'requests',
type: 'COUNTER',
description: 'DynamoDB requests counter',
unit: '',
valueType: 1
},
dataPointType: 3,
dataPoints: [
{ attributes: {}, startTime: [Array], endTime: [Array], value: 5 }
]
}
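
The counter above lumps every request together. To categorize requests by operation and outcome, as suggested in the metrics list earlier, you can pass attributes to add(). The sketch below takes the meter as a parameter and only uses meter.createCounter(name, options) and counter.add(value, attributes) from the OpenTelemetry API; the helper name and the attribute keys operation and outcome are our own choices, not an OpenTelemetry convention.

```javascript
// Sketch: one counter for all DynamoDB requests, split by attributes.
// `meter` is any OpenTelemetry Meter, e.g. the one created in instrumentation.js.
function createRequestRecorder(meter) {
  const counter = meter.createCounter("requests", {
    description: "DynamoDB requests counter",
  });
  // operation: e.g. "PutItem"; succeeded: whether DynamoDB returned without error
  return function recordRequest(operation, succeeded) {
    counter.add(1, { operation, outcome: succeeded ? "success" : "error" });
  };
}
```

Inside each endpoint’s callback you would call something like `recordRequest("PutItem", !err)`. Dividing the error-outcome count by the total then gives the error rate from the metrics list.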

But the real value of OpenTelemetry here comes from combining it with the DynamoDB API to record information and metrics about the database itself.

First, let’s instantiate a connection to DynamoDB:

const AWS = require("aws-sdk"); // AWS SDK for JavaScript v2
const ddbClient = new AWS.DynamoDB();

This time we’ll create an Observable Gauge, which lets us report non-additive values such as a table’s size:

const tableSizeObserver = meter.createObservableGauge("DynamoDBTableSize", {
description: "Size in bytes of DynamoDB table",
});

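We’ll then add a function that fetches the table size with DescribeTable every five seconds and reports it through the gauge. DynamoDB only refreshes TableSizeBytes approximately every six hours, so a much longer interval would work just as well: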

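// Cache the most recent table size for the gauge to report
let latestTableSizeBytes;

// Register the callback once; the SDK invokes it every time metrics are collected.
// (Calling addCallback inside the polling function would pile up a new callback every 5 seconds.)
tableSizeObserver.addCallback((result) => {
  if (latestTableSizeBytes !== undefined) {
    result.observe(latestTableSizeBytes);
  }
});

async function updateTableSize() {
  try {
    const params = {
      TableName: "Items",
    };
    const data = await ddbClient.describeTable(params).promise();
    // Record the table size details
    latestTableSizeBytes = data.Table.TableSizeBytes;
  } catch (error) {
    console.error("Failed to fetch table size:", error);
  }
}

// Update table size every 5 seconds
setInterval(updateTableSize, 5000);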

The exported measurement then looks like this:

{
descriptor: {
name: 'DynamoDBTableSize',
type: 'OBSERVABLE_GAUGE',
description: 'Size in bytes of DynamoDB table',
unit: '',
valueType: 1
},
dataPointType: 2,
dataPoints: [
{
attributes: {},
startTime: [Array],
endTime: [Array],
value: 339
}
]
}

We can also inspect the error messages from DynamoDB to understand any issues. This can be helpful with system errors or throttling. To monitor throttling with OpenTelemetry in JavaScript, you can create a counter metric that is incremented each time a throttling event occurs. DynamoDB throttling usually happens when your application exceeds the provisioned throughput for a table:

// Create a counter metric for throttling events
const throttlingCounter = meter.createCounter("DynamoDBThrottlingEvents", {
  description: "Counts the number of throttling events from DynamoDB",
});

async function makeRequest(item) {
  // The low-level client expects DynamoDB JSON, e.g. { id: { S: "2" }, name: { S: "Item 2" } }
  const params = {
    TableName: "Items",
    Item: item,
  };
  try {
    return await ddbClient.putItem(params).promise();
  } catch (error) {
    // ProvisionedThroughputExceededException indicates a throttled request
    if (error.code === "ProvisionedThroughputExceededException") {
      throttlingCounter.add(1);
    }
    // Re-throw so callers still see the failure
    throw error;
  }
}

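Here we catch exceptions from the request to DynamoDB, check whether the error is a ProvisionedThroughputExceededException (which indicates a throttling event), and if so, increment the throttling counter, so we end up with a count of how many throttling events have occurred. Keep in mind that the AWS SDK retries throttled requests automatically, so this counter only sees requests that were still throttled after those retries.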
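
ProvisionedThroughputExceededException is not the only throttling signal. DynamoDB can also return ThrottlingException (for example, when control-plane calls such as CreateTable arrive too quickly) and RequestLimitExceeded (when requests exceed account-level throughput limits). A small hypothetical helper like the one below treats all three as throttling, and checks both error.code (AWS SDK for JavaScript v2) and error.name (v3), so it could replace the single comparison in makeRequest.

```javascript
// Error codes DynamoDB uses to signal throttling
const THROTTLING_CODES = new Set([
  "ProvisionedThroughputExceededException", // table or index throughput exceeded
  "ThrottlingException", // e.g. control-plane calls made too quickly
  "RequestLimitExceeded", // account-level throughput limit exceeded
]);

// SDK v2 errors carry the code in error.code; SDK v3 errors use error.name
function isThrottlingError(error) {
  if (!error) return false;
  return THROTTLING_CODES.has(error.code) || THROTTLING_CODES.has(error.name);
}
```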

Finally, let’s also track our table’s consumed capacity. This is a critical metric for DynamoDB, as (especially in on-demand mode) your bill depends on the capacity consumed. For this, we’ll go beyond the DynamoDB client we’ve been using and call the Amazon CloudWatch API. CloudWatch monitors all your AWS services; here, we’ll use it to get the read and write capacity consumed by our DynamoDB table.

For the OpenTelemetry part, this works as before. We’re going to create two counters:

// Create counter metrics for consumed capacity
const consumedReadCapacityCounter = meter.createCounter(
"DynamoDBConsumedReadCapacity",
{
description: "Counts the consumed read capacity units of a DynamoDB table",
}
);

const consumedWriteCapacityCounter = meter.createCounter(
"DynamoDBConsumedWriteCapacity",
{
description: "Counts the consumed write capacity units of a DynamoDB table",
}
);

Then we need our CloudWatch service object:

const cloudwatch = new AWS.CloudWatch();

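Then we’ll create a function that retrieves the consumed capacity from CloudWatch and adds it to our telemetry:

// Pull the last minute of consumed-capacity totals for the Items table from CloudWatch
async function updateConsumedCapacity() {
  const endTime = new Date();
  const startTime = new Date(endTime.getTime() - 60 * 1000);
  const metricParams = (metricName) => ({
    Namespace: "AWS/DynamoDB",
    MetricName: metricName,
    Dimensions: [{ Name: "TableName", Value: "Items" }],
    StartTime: startTime,
    EndTime: endTime,
    Period: 60,
    Statistics: ["Sum"],
  });
  try {
    const [reads, writes] = await Promise.all([
      cloudwatch.getMetricStatistics(metricParams("ConsumedReadCapacityUnits")).promise(),
      cloudwatch.getMetricStatistics(metricParams("ConsumedWriteCapacityUnits")).promise(),
    ]);
    // Add each one-minute Sum to the matching OpenTelemetry counter
    // (a simple window like this can miss datapoints that CloudWatch publishes late)
    reads.Datapoints.forEach((point) => consumedReadCapacityCounter.add(point.Sum));
    writes.Datapoints.forEach((point) => consumedWriteCapacityCounter.add(point.Sum));
  } catch (error) {
    console.error("Failed to fetch consumed capacity:", error);
  }
}
// These CloudWatch metrics have one-minute granularity, so poll once a minute
setInterval(updateConsumedCapacity, 60 * 1000);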
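
CloudWatch is not the only source for these numbers. Data-plane calls such as PutItem accept ReturnConsumedCapacity: "TOTAL", and the response then includes ConsumedCapacity.CapacityUnits. The sketch below records those units per request, which avoids waiting for CloudWatch’s aggregation. Both helper names are hypothetical, and the counter argument is assumed to be one of the counters created above.

```javascript
// Hypothetical helper: ask DynamoDB to report the capacity a request consumed
function withConsumedCapacity(params) {
  // "TOTAL" returns only the aggregate ConsumedCapacity for the operation
  return { ...params, ReturnConsumedCapacity: "TOTAL" };
}

// Add the reported units to an OpenTelemetry counter; returns the units recorded
function recordConsumedCapacity(response, counter, operation) {
  const units =
    response && response.ConsumedCapacity
      ? response.ConsumedCapacity.CapacityUnits
      : undefined;
  if (typeof units !== "number") return 0; // capacity was not requested or not reported
  counter.add(units, { operation });
  return units;
}
```

For a write, that might look like `const data = await ddbClient.putItem(withConsumedCapacity(params)).promise();` followed by `recordConsumedCapacity(data, consumedWriteCapacityCounter, "PutItem");`.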

Higher performance for lower costs

Combining OpenTelemetry with the AWS SDK gives you access to the entire DynamoDB API. Above, we’ve only touched on a few opportunities for tracing and metrics, but you can reach all the data within your database, and all the data about your database, either directly through DynamoDB or through CloudWatch.

Adding that data and those metrics to your telemetry means you can have complete visibility into your DynamoDB table and Node app in a single place. Here, we’ve mostly output to the console, but you can add a backend such as Prometheus or TelemetryHub to gain real insight.

If you are using DynamoDB, that insight equates to dollars/euros/yen. By adding OpenTelemetry to any DynamoDB and Node app, you are able to start optimizing your performance, finding those long duration calls, and finding those expensive operations. You then have the data and measurement apparatus to experiment and iterate until you can find the ideal DynamoDB setup for your Node application.
