Test Observability for AWS Lambda with Grafana Tempo and OpenTelemetry Layers

Oscar
Kubeshop
8 min read · Jul 18, 2024

I got great feedback on my Pulitzer award-winning blog post, “Testing AWS Lambda & Serverless with OpenTelemetry”. The community wanted a guide on using the official OpenTelemetry Lambda layers instead of a custom TypeScript wrapper. 😄

I decided to write this follow-up and spice it up a little 🥵. Today I’m using Grafana Cloud, which has become one of my favorite tools! We use it extensively at Tracetest for our internal tracing, metrics, profiling, and overall observability.

See the full code for the example app you’ll build in the GitHub repo, here.

OpenTelemetry Lambda Layers

After a decade of development experience, one thing I’ve learned is that no-code solutions save time and delegate maintenance and implementation to a third party. It becomes even better when it’s free 🤑 and from the OpenTelemetry community!

There are two different layers we will use today:

  1. The Node.js auto-instrumentation for AWS Lambda enables tracing for your functions without writing a single line of code, as described in the official OpenTelemetry docs, here and on GitHub, here.
  2. The OpenTelemetry collector AWS Lambda layer enables the setup to be 100% serverless without any need to maintain infrastructure yourself. You still need to pay for it though 👀.

Grafana Cloud

Grafana Cloud has become a staple tool to store everything related to observability under one umbrella. It allows integration with different tools like Prometheus for metrics or Loki for logs.

In this case, I’ll use Tempo, a well-known tracing backend where you store the OpenTelemetry spans generated by the Lambda functions.

Trace-based testing everywhere and for everyone!

Trace-based testing involves running validations against the telemetry data generated by the distributed system’s instrumented services.

Tracetest, an observability-enabled testing tool for cloud-native architectures, uses these distributed traces as test input, so you can assert on what happened inside the system and not only on the response it returned.

The Service under Test

Who said Pokemon? We truly love them at Tracetest, so today we have a new way of playing with the PokeAPI!

Using the Serverless Framework, I’ll guide you through implementing a Lambda function that sends a request to the PokeAPI to grab Pokemon data by id and then stores it in a DynamoDB table.

Nothing fancy, but this will be enough to demonstrate how powerful instrumenting your Serverless functions and adding trace-based testing on top can be! 💥
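
To make the flow concrete, here is a minimal sketch of what such a handler could look like. This is not the code from the example repo: the importPokemon signature, the Deps interface, and the Pokemon shape are assumptions made for illustration. The PokeAPI call and the DynamoDB write are injected as plain functions so the logic can run without AWS credentials.

```typescript
// Hypothetical shapes for illustration; the real handler in the repo may differ.
interface Pokemon {
  id: number;
  name: string;
}

interface Deps {
  // Fetches a Pokemon from the PokeAPI, e.g. GET https://pokeapi.co/api/v2/pokemon/<id>.
  fetchPokemon: (id: number) => Promise<Pokemon>;
  // Persists the Pokemon, e.g. with a DynamoDB PutItem call.
  savePokemon: (pokemon: Pokemon) => Promise<void>;
}

interface HttpResponse {
  statusCode: number;
  body: string;
}

// Step 1: read the id from the request body. Step 2: fetch. Step 3: store.
async function importPokemon(
  rawBody: string | undefined,
  deps: Deps
): Promise<HttpResponse> {
  let id: unknown;
  try {
    id = JSON.parse(rawBody ?? "{}").id;
  } catch {
    return { statusCode: 400, body: JSON.stringify({ error: "invalid JSON body" }) };
  }
  if (typeof id !== "number" || !Number.isInteger(id) || id < 1) {
    return { statusCode: 400, body: JSON.stringify({ error: "id must be a positive integer" }) };
  }

  const pokemon = await deps.fetchPokemon(id);
  await deps.savePokemon(pokemon);
  return { statusCode: 200, body: JSON.stringify(pokemon) };
}
```

In a deployed function, fetchPokemon and savePokemon would wrap the real HTTP and DynamoDB clients. Those two calls are the ones I would expect the auto-instrumentation to turn into spans.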

Requirements

Tracetest Account

  • Sign up at app.tracetest.io or follow the get started docs.
  • Create an environment.
  • Select “Application is publicly accessible” to get access to the environment's Tracetest Cloud Agent endpoint.
  • Select Tempo as the tracing backend.
  • Fill in the details of your Grafana Cloud Tempo instance by using the HTTP integration. Check out the tracing backend resource definition, here.
  • Test the connection and save it to finish the process.

AWS

  • Have access to an AWS Account.
  • Install and configure the AWS CLI.
  • Use a role that is allowed to provision the required resources.

What are the steps to run it myself?

If you want to jump straight ahead and run this example yourself, follow the steps below. ⭐️

First, clone the Tracetest repo.

git clone https://github.com/kubeshop/tracetest.git
cd tracetest/examples/quick-start-serverless-layers

Then, follow the instructions to run the deployment and the trace-based tests:

  1. Copy the .env.template file to .env.
  2. Fill the TRACETEST_API_TOKEN value with the one generated for your Tracetest environment.
  3. Set the Tracetest tracing backend to Tempo. Fill in the details of your Grafana Cloud Tempo instance using the HTTP integration, including a header of the form authorization: Basic <base 64 encoded>. The encoded value is the base64 encoding of username:token. Follow this guide to learn how. Also check out this tracing backend resource definition, which you can apply with the Tracetest CLI: tracetest apply datastore -f ./tracetest-tracing-backend.yaml.
  4. Fill in the authorization header in the collector.yaml file with the credentials from your Grafana Tempo setup. It uses the same base64 encoding of username:token. Follow this guide to learn how.
  5. Run npm i.
  6. Run the Serverless Framework deployment with npm run deploy. Use the API Gateway endpoint from the output in your test below.
  7. Run the trace-based tests with npm test https://<api-gateway-id>.execute-api.us-east-1.amazonaws.com.
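
If you are unsure how to produce the header value for steps 3 and 4, here is a small sketch. The basicAuthHeader function name and the sample credentials are made up for illustration; replace them with the username and token from your own Grafana Cloud Tempo setup.

```typescript
// Builds the value for an HTTP Basic authorization header.
// `username` and `token` stand in for your Grafana Cloud Tempo credentials.
function basicAuthHeader(username: string, token: string): string {
  // btoa is a global in browsers and in Node.js 16+.
  // It only accepts Latin-1 input, which is fine for ASCII credentials.
  return `Basic ${btoa(`${username}:${token}`)}`;
}

// Placeholder credentials, not real ones.
console.log(basicAuthHeader("123456", "my-token"));
```

Paste the printed value into the authorization header of both tracetest-tracing-backend.yaml and collector.yaml.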

Now, let’s dive into the nitty-gritty details. 🤓

The Observability Setup

Instrumenting a Lambda function is easier than ever. Add the ARNs of the OpenTelemetry Collector layer and the Node.js tracer layer that match your AWS region.

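# serverless.yaml
functions:
  api:
    # Handler and events definition
    handler: src/handler.importPokemon
    events:
      - httpApi:
          path: /import
          method: post
    layers:
      # Placeholders: use the region-specific ARNs published by the OpenTelemetry Lambda project
      - <opentelemetry-nodejs-layer-arn>
      - <opentelemetry-collector-layer-arn>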

Next, add a couple of environment variables to configure the start of the handler functions and the configuration for the OpenTelemetry collector.

# serverless.yaml
environment:
  OPENTELEMETRY_COLLECTOR_CONFIG_FILE: /var/task/collector.yaml
  AWS_LAMBDA_EXEC_WRAPPER: /opt/otel-handler

The opentelemetry-nodejs layer will spin up the Node.js tracer, configure the supported auto-instrumentation libraries, and set up the context propagators.

The opentelemetry-collector layer, in turn, spins up a version of the collector that runs in the same context as the Lambda function and is configured by the collector.yaml file.

# collector.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"

Easy peasy lemon squeezy 🍋, right? Well, this is everything you need to do to start your observability journey!

For every trace, there should be a test!

With the observability setup in place, it's time to go to the next level by running some trace-based tests. This is our test case:

  • Execute an HTTP request against the import Pokemon service.
  • This is a two-step process that includes a request to the PokeAPI to grab the Pokemon data.
  • Then, it executes the required database operations to store the Pokemon data in DynamoDB.

What are the key parts we want to validate?

  1. Validate that the external service is called from the Lambda function with the proper POKEMON_ID and returns 200.
  2. Validate that the duration of the DB operations is less than 100ms.
  3. Validate that the response from the initial API Gateway request is 200.
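
To show what these three validations boil down to, here is a plain TypeScript sketch that runs them against a list of spans. It is not the Tracetest API or its test spec format. The Span shape and the validateTrace function are hypothetical. The attribute keys follow common OpenTelemetry semantic convention names, but which attributes your spans actually carry depends on the instrumentation version.

```typescript
// Hypothetical, simplified span shape; real OpenTelemetry spans carry more fields.
interface Span {
  name: string;
  durationMs: number;
  attributes: { [key: string]: string | number | undefined };
}

interface CheckResult {
  check: string;
  passed: boolean;
}

function validateTrace(spans: Span[], pokemonId: number): CheckResult[] {
  // 1. The outgoing PokeAPI call used the expected id and returned 200.
  const pokeApiCalls = spans.filter((s) =>
    String(s.attributes["http.url"] ?? "").includes("pokeapi.co")
  );
  const externalOk =
    pokeApiCalls.length > 0 &&
    pokeApiCalls.every(
      (s) =>
        String(s.attributes["http.url"]).endsWith(`/pokemon/${pokemonId}`) &&
        s.attributes["http.status_code"] === 200
    );

  // 2. Every database span finished in under 100 ms.
  const dbSpans = spans.filter((s) => s.attributes["db.system"] !== undefined);
  const dbFast = dbSpans.length > 0 && dbSpans.every((s) => s.durationMs < 100);

  // 3. The entry-point span for POST /import returned 200.
  const entry = spans.find((s) => s.attributes["http.route"] === "/import");
  const entryOk = entry !== undefined && entry.attributes["http.status_code"] === 200;

  return [
    { check: "PokeAPI called with the expected id and returned 200", passed: externalOk },
    { check: "database operations took less than 100 ms", passed: dbFast },
    { check: "POST /import returned 200", passed: entryOk },
  ];
}

// Made-up sample trace, for illustration only.
const sampleSpans: Span[] = [
  { name: "POST /import", durationMs: 180, attributes: { "http.route": "/import", "http.status_code": 200 } },
  { name: "GET", durationMs: 90, attributes: { "http.url": "https://pokeapi.co/api/v2/pokemon/25", "http.status_code": 200 } },
  { name: "DynamoDB.PutItem", durationMs: 40, attributes: { "db.system": "dynamodb" } },
];
console.log(validateTrace(sampleSpans, 25));
```

Tracetest lets you express the same kind of checks declaratively against the real trace, so you don't have to fetch and filter spans yourself.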

Running the Trace-Based Tests

To run the tests, we are using the @tracetest/client NPM package. It allows teams to enhance existing validation pipelines written in JavaScript or TypeScript by including trace-based tests in their toolset.

The code can be found in the tracetest.ts file.

import Tracetest from '@tracetest/client';
import { TestResource } from '@tracetest/client/dist/modules/openapi-client';
import { config } from 'dotenv';

Get True Test Observability

Make sure to apply the Tempo tracing backend in Tracetest. Create your Basic auth token, and use this resource file for reference. View the tracetest-tracing-backend.yaml resource file on GitHub, here.

type: DataStore
spec:
  id: tempo-cloud
  name: Tempo
  type: tempo
  tempo:
    type: http
    http:
      url: https://tempo-us-central1.grafana.net/tempo
      headers:
        authorization: Basic <base 64 encoded>
      tls: {}

Apply the resource with the Tracetest CLI.

tracetest config -t TRACETEST_API_TOKEN
tracetest apply datastore -f ./tracetest-tracing-backend.yaml

Or, add it manually in the Tracetest Web UI.

With everything set up, we can now run the trace-based tests against the import Pokemon service and view the complete results.

Run the test with the command below.

npm test https://<api-gateway-id>.execute-api.us-east-1.amazonaws.com
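
The command passes the API Gateway endpoint as a positional argument. As a rough sketch of how a test script could turn that argument into the URL it calls, assuming a hypothetical resolveImportUrl helper that is not part of the example repo:

```typescript
// Hypothetical helper: turns the CLI argument into the URL of the import endpoint.
function resolveImportUrl(args: string[]): string {
  const [endpoint] = args;
  if (!endpoint) {
    throw new Error(
      "Usage: npm test https://<api-gateway-id>.execute-api.<region>.amazonaws.com"
    );
  }
  const url = new URL(endpoint); // throws a TypeError on malformed input
  if (url.protocol !== "https:") {
    throw new Error(`Expected an https endpoint, got ${url.protocol}`);
  }
  // /import is the route declared for the function in serverless.yaml.
  return new URL("/import", url).toString();
}
```

In a Node.js script you would call it with process.argv.slice(2). Failing early on a missing or malformed endpoint gives a clearer error than a test run that times out.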

Follow the links provided in the npm test command output to find the full results, which include the generated trace and the test specs validation results.

[Output]

Find the trace in Grafana Cloud Tempo

The full list of spans generated by the AWS Lambda function can be found in your Tempo instance. These are the same spans that the Tracetest App displays after fetching them from Tempo.

👉 Join the demo organization where you can start playing around with the Serverless example with no setup!! 👈

From the Tracetest test run view, you can view the list of spans generated by the Lambda function, their attributes, and the test spec results, which validate the key points.

Key Takeaways

Simplified Observability with OpenTelemetry Lambda Layers

In this post I’ve highlighted how using OpenTelemetry Lambda layers allows for automatic tracing without additional code, making it easier than ever to set up observability for your Serverless applications.

Powerful Integration with Grafana Cloud

Grafana Cloud has become an essential tool in our observability toolkit. By leveraging Grafana Tempo for tracing, we can store and analyze OpenTelemetry spans effectively, showcasing the seamless integration and its benefits.

Enhanced Trace-Based Testing with Tracetest

Tracetest is a game-changer for trace-based testing. By validating telemetry data from our instrumented services, it provides unparalleled visibility and testability, empowering us to ensure our distributed systems perform as expected.

Would you like to learn more about Tracetest and what it brings to the table? Check the docs and try it out today by signing up for free!

Also, please feel free to join our Slack community, give Tracetest a star on GitHub, or schedule a time to chat 1:1.

Originally published at https://tracetest.io.
