Trace-Based Testing with OpenTelemetry: Meet Open Source Malabi

Michael Haberman
Published in OpenTelemetry
7 min read · Oct 1, 2021

By Yuri Shkuro, creator and maintainer of Jaeger, and Michael Haberman, Co-Founder & CTO of Aspecto.

If you deal with distributed applications at scale, you probably use tracing. And if you use tracing data, you already realize its crucial role in understanding your system and the relationships between system components, as many software issues are caused by failed interactions between these components.

Most of you reading this article are well aware of how powerful tracing is. But here’s the thing: you can do a lot more with it in pre-production phases. Let us explain why, and introduce you to Malabi, a new open-source tool for trace-based testing.

How We Use Tracing Today

Companies these days use tracing data for critical functions such as application performance monitoring, allowing DevOps teams and SREs to find and fix issues in production after they happen. Developers usually use it when trying to debug a complex issue that happened in production.

In most cases, they are being reactive to issues, trying to understand what went wrong. But there’s another use case for tracing data: utilizing trace output to test and validate your application’s behavior.

What is trace-based testing

Trace-based testing is a method that improves our assertion capabilities by making tracing data accessible when we set our expectations for a test. That enables us to validate essential relationships between software components that are otherwise put to the test only in production.

Trace-based validation enables developers to be proactive about issues instead of reactive.

Why trace-based testing

Trace-based testing answers the complexities of current testing strategies and techniques for distributed applications. We will later see how trace-based testing can help us overcome these challenges.

To set the stage for the example that follows, we will discuss all types of tests that make actual network calls: integration tests, system tests, service tests, end-to-end tests, contract tests, API tests, you-name-it tests. Unit tests and usage of mocks are out of scope.

It is also important to mention that you can leverage trace data for testing without the need to add another layer to your existing tests.

Why traditional testing techniques aren’t enough

Let’s take a look at a use case where “regular” tests fall short and what we can do to combat these challenges. We’ll use a real use case that many people can easily relate to: ordering food via your favorite food delivery service.

Here is pseudo-code representing the logic when ordering food:

// This function is called when you order food via an app
async function orderFoodDelivery(order: Order) {
  if (!valid(order)) {
    throw new Error('Bad order');
  }

  const orderDetails = await restaurantApproval(order);
  if (orderDetails.failed) {
    throw new Error('Restaurant too busy');
  }

  // Wait for all three side effects so that failures are not silently dropped
  await Promise.all([
    // Send a notification to mobile app
    notifyClient(orderDetails),
    // Start delivery
    enqueueDelivery(orderDetails),
    // Call Stripe and, if successful, upload the invoice to a file storage.
    // If failed, put in queue for retry
    paymentAndInvoice(orderDetails),
  ]);
}

This code flow is pretty simple.

The backend gets the order details, validates them, and asks the restaurant to approve the order.

Then we need to notify the customer, find a delivery person, and charge the customer.

One major concern we can all agree on is that getting someone else’s food could make you mad. Really mad!

We want the delivery person, customer, and restaurant to be all aligned with the order details. Making it happen with “traditional” testing techniques can be a significant hassle.

Essentially, the problem is that we’re not testing the whole process. Usually, we can only validate the response, which by design keeps us from peeking into the internal workflow and does not offer the level of reliability we’re looking for when testing.

For example, the fact that we got an order approval from the restaurant doesn’t mean the delivery person received our correct address.
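To make that gap concrete, here is a minimal sketch in plain JavaScript. Every name in it (`orderFood`, `enqueueDelivery`, `sentMessages`, the address fields) is hypothetical and exists only for illustration; `sentMessages` stands in for whatever record of internal calls a test could inspect.

```javascript
// Hypothetical stand-ins: none of these names come from a real service.
const sentMessages = []; // records every outgoing message, much like spans would

function enqueueDelivery(order) {
  // Deliberate bug: the delivery message carries the billing address.
  sentMessages.push({
    channel: 'delivery-queue',
    orderId: order.id,
    address: order.billingAddress,
  });
}

function orderFood(order) {
  enqueueDelivery(order);
  // The response looks perfectly fine despite the bug above.
  return { status: 200, orderId: order.id };
}

const order = { id: 1, deliveryAddress: '1 Main St', billingAddress: '9 Office Rd' };
const response = orderFood(order);

// A response-only check passes even though the courier got the wrong address.
const responseLooksOk = response.status === 200 && response.orderId === order.id;

// Inspecting the recorded internal call exposes the mismatch.
const deliveryMessage = sentMessages.find((m) => m.channel === 'delivery-queue');
const addressIsCorrect = deliveryMessage.address === order.deliveryAddress;

console.log({ responseLooksOk, addressIsCorrect });
```

A test that asserts only on `response` would stay green here, while an assertion on the recorded internal message would fail, which is exactly the failure we want to catch before production.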

How can trace-based testing help?

There are a few common cases in which traces are useful for testing internal workflows:

  1. When dealing with multiple components, your test needs to be familiar with different APIs and configure many clients. Testing against traces uses a single uniform API.
  2. A trace can capture a transient state which is difficult to retrieve after the workflow has finished. For instance, there may be no record anywhere other than logs that an error occurred in a given component.
  3. Each component may not even expose its internal state in sufficient detail to validate the workflow. Tracing provides us with granular data to overcome this.

Trace-based testing validates that the proper interaction between components occurs as part of the tested workflow. We want to validate there’s complete sync between the customer, restaurant, and delivery.
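The points above can be sketched with a tiny helper that queries recorded spans. This is an illustration under assumptions, not Malabi’s data model: the span shape (`kind`, `attributes`, `error`), the `findSpans` helper, and the sample values are all made up for the example.

```javascript
// Hypothetical span shape: { kind, attributes, error }. Not Malabi's real data model.
const spans = [
  { kind: 'kafka', attributes: { topic: 'match-delivery-order', orderId: 1 }, error: null },
  { kind: 'websocket', attributes: { room: 'user-42', orderId: 1 }, error: null },
  { kind: 'http', attributes: { url: 'https://payments.example/charge', orderId: 1 }, error: 'timeout' },
];

// One helper covers every component: filter by kind and by exact attribute values.
function findSpans(allSpans, kind, attributes = {}) {
  return allSpans.filter(
    (span) =>
      span.kind === kind &&
      Object.entries(attributes).every(([key, value]) => span.attributes[key] === value)
  );
}

// The same query style works for the message broker and for the WebSocket.
const kafkaSpans = findSpans(spans, 'kafka', { topic: 'match-delivery-order' });
const wsSpans = findSpans(spans, 'websocket', { room: 'user-42' });
const sameOrder = kafkaSpans[0].attributes.orderId === wsSpans[0].attributes.orderId;

// A failure recorded on a span stays visible even if no component exposes it later.
const failedSpans = spans.filter((span) => span.error !== null);

console.log({ sameOrder, failed: failedSpans.map((span) => span.kind) });
```

The test never needs a Kafka consumer, a WebSocket client, or access to a payment provider; it only reads spans, and the transient payment error is still there to assert on.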

We mentioned we’d introduce you to a new open-source tool for trace-based testing called Malabi. Take a look at the following integration test to see how we use Malabi to easily validate this sync:

// Nothing worse than getting the wrong food.
it('Making sure the right person gets the food.', async () => {
  const order = { id: 1, userId };
  await orderFoodDelivery(order);

  // Using Malabi to make sure the internal calls to Kafka and WS are synced with the same order ID.
  expect(malabi.kafka({ topic: 'match-delivery-order' })).toMatch({ id: order.id });
  expect(malabi.websocket({ room: order.userId })).toMatch({ id: order.id });
});

Say Hello to “Malabi”

In the code example above, you can see how Malabi helps us by making traces accessible in the assertion phase while setting our expectations from the test. We can then validate the internals of the API call.

Malabi is an open-source JavaScript framework (still in its early days) based on OpenTelemetry that allows you to leverage trace data and improve assertion capabilities.

(It is also a delicious milk pudding dessert made of rice, sugar, rice flour, and milk.)

source: @malabi_marshmallow

This is how open source Malabi works

Let’s review the diagram below, which shows what happens in Malabi when we run the integration test from above. We are validating that the Order Service submitted the correct messages to Kafka and to a WebSocket.

The Malabi library that runs inside the Test Runner fetches traces collected by the OpenTelemetry SDK. This is done by adding a custom exporter to the OTEL SDK that stores traces in memory (the exporter is already provided by Malabi). You can add the custom exporter yourself or use the OpenTelemetry SDK we created for you (with a good number of auto-instrumentations from both OTEL contrib and Aspecto’s repo).
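To give a feel for what an in-memory exporter does, here is a rough, dependency-free sketch. It is not Malabi’s exporter: the class name `InMemoryTestExporter` and its helper methods are our own for this example, it only mirrors the shape of the OpenTelemetry JS `SpanExporter` interface (`export(spans, resultCallback)` and `shutdown()`), and the numeric success code is an assumption standing in for the SDK’s `ExportResultCode.SUCCESS`.

```javascript
// Sketch only: mirrors the shape of OpenTelemetry JS's SpanExporter interface
// without importing the SDK. This is not Malabi's actual exporter.
const SUCCESS = 0; // assumed stand-in for ExportResultCode.SUCCESS

class InMemoryTestExporter {
  constructor() {
    this.finishedSpans = [];
  }

  // A span processor would call this with a batch of finished spans.
  export(spans, resultCallback) {
    this.finishedSpans.push(...spans);
    resultCallback({ code: SUCCESS });
  }

  // Drop stored spans when the SDK shuts down.
  shutdown() {
    this.finishedSpans = [];
    return Promise.resolve();
  }

  // Helpers a test runner could use between test cases.
  getFinishedSpans() {
    return this.finishedSpans;
  }

  reset() {
    this.finishedSpans = [];
  }
}

// Simulate what a span processor would do after a request completes.
const exporter = new InMemoryTestExporter();
let lastResult;
exporter.export(
  [{ name: 'kafka.send', attributes: { topic: 'match-delivery-order', orderId: 1 } }],
  (result) => {
    lastResult = result;
  }
);

console.log(exporter.getFinishedSpans().length, lastResult);
```

Because the spans never leave the process, a test can read them right after the request returns and clear them with `reset()` before the next case. The OpenTelemetry JS SDK also ships an `InMemorySpanExporter` built on the same idea.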

Currently, Malabi supports validating spans created by the service under test (the order service in the diagram above). To support downstream services and async message brokers, you will need to spin up a backend (such as Jaeger or the OTEL Collector) and store those spans as well.

By collecting the output traces of each test and serving them to the developer in the assertion phase, we easily get deeper visibility into the workflow’s internal processes.

You can check Malabi’s repo for more technical details and a demo you can run locally: https://github.com/aspecto-io/malabi

A quick code example for the assertion (this is how we use it internally):

it('happy flow - pull_request opened event is written to db and sqs', async () => {
  const event = 'pull_request';
  const eventId = 'my_eventId';
  const payload = {
    action: 'opened',
    name: 'Rick',
    lastName: 'Sanchez',
    phrase: 'Wubba Lubba Dub-Dub',
    organization: {
      login: 'aspecto-io',
    },
  };
  const res = await client.post('/api/v1/webhook', payload, {
    headers: {
      [sigHeaderName]: getGithubSignature(payload),
      'X-Github-Event': event,
      'X-GitHub-Delivery': eventId,
    },
  });

  expect(res.status).toBe(200);
  const spans = await getMalabiExtract();

  // Assert DB write
  expect(spans.mongo().length).toBe(1);
  const mongo = spans.mongo().first;
  expect(mongo.hasError).toBeFalsy();
  expect(mongo.dbOperation).toBe('save');
  expect(mongo.mongoCollection).toBe('github-events');
  const doc = JSON.parse(mongo.dbResponse);
  expect(doc.payload).toEqual(payload);
  expect(doc.eventId).toEqual(eventId);
  expect(doc.type).toEqual(event);
  // Make sure timestamps are written
  expect(doc.createdAt).toBeDefined();
  expect(doc.updatedAt).toBeDefined();

  // Assert SQS Send
  expect(spans.awsSqs().length).toBe(1);
  const sqs = spans.awsSqs().first;
  expect(sqs.hasError).toBeFalsy();
  expect(sqs.rpcMethod).toBe('sendMessage');
  expect(JSON.parse(sqs.messagingPayload)).toEqual({
    type: 'git-hub-pr',
    data: payload,
  });

  // Assert the incoming HTTP request completed without an error
  const incomingHttpSpan = spans.http().incoming().first;
  expect(incomingHttpSpan.hasError).toBe(false);
});

Getting started with Malabi

Malabi’s installation is easy and requires only two parts:

  1. An OpenTelemetry SDK is installed in the service under test.
  2. A test assertion NPM package is installed in the test runner.

Calling all early adopters and the bottom line

There is a whole world of opportunities to make testing more effective: getting more visibility into the processes running across our entire system, and leveraging available tracing data to increase test reliability and possibly prevent issues early in the development cycle.

This is what Malabi is meant to do.

Malabi is an early-stage open-source project, and there is much work to be done. This is where you, brilliant people, come in! We would love your support in this project, so feel free to help in any shape or form: contribute code, knowledge, ideas, or just share it. (Oh yeah! GitHub stars make traces better.)

If that sounds interesting, ping us and we’d be happy to answer any questions you may have on Malabi.

Feel free to also email us if you’d like to contribute to the project.
