“Resources” and “Transactions”: a fundamental duality in observability

Ben Sigelman
Published in LightstepHQ
3 min read · Mar 19, 2021

Originally posted as a Twitter thread.

0/ Fundamentally, there are only two types of “things worth observing” when it comes to production systems:

1) Resources
2) Transactions

The tricky (and interesting) part is that they’re entirely codependent. This is a thread about that tricky/interesting part…

1/ But first, some definitions.

Transactions: these are the things that traverse your system and (hopefully) “do something.” The classic example would be an end-user request that propagates across networks and process boundaries.

2/ “Transactions” can be described at wildly different granularities: actions in a mobile app, HTTP reqs, function calls, CPU instructions, etc. They’re all part of the overall transaction, and the question is how much detail you can afford to expose to observability tools.

3/ Resources: these are the things in your system that transactions consume in order to do their work. Resources survive across transactions.

Unless you’re looking to operate with negative margins, your resources are also shared between many transactions concurrently.

4/ And “Resources” exist at every scale, too: every VM, Kafka topic, CPU, hashtable, mutex lock, and even your AWS account (just ask your CFO!) is a Resource. Resources are everywhere and come in many shapes and sizes. They do have one thing in common, though: they are finite.

5/ Now, as Transactions transact, they use Resources.

And as Resources saturate, Transactions suffer.

You know what that smells like: ordinary DB queries that suddenly get really slow; backend services that don’t respond to requests anymore; OOM flapping; etc.
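To put rough numbers on "as Resources saturate, Transactions suffer," here is a minimal sketch that models one Resource as a textbook M/M/1 queue. That model is an assumption made for illustration (real Resources are rarely this tidy), and `SERVICE_RATE` and the arrival rates are made-up values.

```python
def mean_latency_s(arrival_rate: float, service_rate: float) -> float:
    """Mean time a Transaction spends at the Resource (queueing + service).

    Uses the standard M/M/1 result W = 1 / (mu - lambda), where mu is the
    service rate and lambda is the arrival rate, both in requests/second.
    """
    if arrival_rate >= service_rate:
        # Saturated: work arrives at least as fast as it can be served,
        # so the queue (and the latency) grows without bound.
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical Resource that can serve 100 requests per second.
SERVICE_RATE = 100.0

for arrival_rate in (50.0, 90.0, 99.0):
    utilization = arrival_rate / SERVICE_RATE
    latency_ms = mean_latency_s(arrival_rate, SERVICE_RATE) * 1000
    print(f"utilization={utilization:.0%}  mean latency={latency_ms:.0f} ms")
```

Under these assumptions, going from 50% to 99% utilization takes mean latency from about 20 ms to about a full second, even though nothing about any individual Transaction changed.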

6/ {A relevant pet peeve: the very idea that software can be deemed “correct” or “incorrect” without a real workload.

I 💙 CI as much as the next dev, but without a real(istic) workload, we’re only testing Transactions, not Resources. And both need to be healthy, or 💀 💻}

7/ So, about the interesting-and-tricky part:

End-users don’t give a s*** about your Resources. They only care about their Transactions.

BUT! When Transactions fail (or get slow), it’s almost always due to a Resource.

How can observability help here?

8/ As I’ve written about recently (e.g., here), “What Caused That Change?” is the most important and most central question in observability.

Changes in what? You guessed it: Transactions and Resources.

9/ Those changes can either be discovered via Transaction issues (e.g., “end-user requests just got slow”) or via Resource issues (e.g., “our MySQL database is falling over” or “it’s taking 3s to grab this mutex lock!”).

10/ For observability to help here, we must be able to pivot, naturally, from (contended) Resources to Transactions and back again. This is typically done through shared metadata (aka “tags” or “attributes”), but it’s hard from a data engineering standpoint.
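As a toy illustration of pivoting through shared metadata, the sketch below matches a Resource's metric series against Transaction spans on a common tag, in both directions. Everything in it is hypothetical: the `MetricSeries` and `Span` types, the `db.instance` and `customer` tag names, and the sample data are assumptions for the example, not any particular tool's data model.

```python
from dataclasses import dataclass, field


@dataclass
class MetricSeries:
    """Resource telemetry: one metric time series plus its identifying tags."""
    name: str
    attributes: dict[str, str]
    values: list[float] = field(default_factory=list)


@dataclass
class Span:
    """Transaction telemetry: one unit of work inside a trace."""
    trace_id: str
    operation: str
    duration_ms: float
    attributes: dict[str, str]


def shares_tags(a: dict[str, str], b: dict[str, str], keys: list[str]) -> bool:
    """True only if both sides carry every key and agree on its value."""
    return all(k in a and k in b and a[k] == b[k] for k in keys)


def spans_touching(resource: MetricSeries, spans: list[Span], keys: list[str]) -> list[Span]:
    """Pivot Resource -> Transactions."""
    return [s for s in spans if shares_tags(resource.attributes, s.attributes, keys)]


def resources_touched(span: Span, series: list[MetricSeries], keys: list[str]) -> list[MetricSeries]:
    """Pivot Transaction -> Resources."""
    return [m for m in series if shares_tags(span.attributes, m.attributes, keys)]


# Hypothetical telemetry: a contended database and an unrelated cache.
db_cpu = MetricSeries("mysql.cpu.utilization", {"db.instance": "orders-db"}, [0.41, 0.44, 0.97])
cache_mem = MetricSeries("cache.memory.used", {"cache.cluster": "sessions"}, [0.30, 0.31, 0.30])
spans = [
    Span("t1", "SELECT orders", 1800.0, {"db.instance": "orders-db", "customer": "globex"}),
    Span("t2", "GET session", 3.0, {"cache.cluster": "sessions", "customer": "acme"}),
    Span("t3", "SELECT orders", 2100.0, {"db.instance": "orders-db", "customer": "globex"}),
]

for span in spans_touching(db_cpu, spans, ["db.instance"]):
    print(span.trace_id, span.operation, span.attributes["customer"])
```

Even in this toy, the join only works because both sides happen to emit the same `db.instance` tag with the same spelling and the same values. Keeping that true across a real fleet is a big part of the data engineering difficulty.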

11/ Resource telemetry is almost always represented as metric time series, and Transaction telemetry is almost always represented as traces (or structured logs/events, but TBH those are just traces with a missing context field!).

12/ Pivoting between the two is a lot more than supporting metrics and traces in a single invoice — what’s needed is a fully-fledged data layer that can join across telemetry attributes to find workload (Transaction) changes that explain utilization (Resource) changes.
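One very simplified way to picture "workload changes that explain utilization changes": take the spans that touched a Resource before and after its utilization jumped, then rank the values of some attribute by how much their share of the traffic grew. The sketch below does only that. The `workload_shift` helper, the `customer` attribute, and the sample windows are all hypothetical, and a real data layer would need to handle far more (sampling, cardinality, many attributes at once).

```python
from collections import Counter


def workload_shift(baseline_spans, regression_spans, key):
    """Rank values of `key` by how much their span count grew between windows.

    Each span is represented here as a plain dict of attributes.
    Returns (value, delta) pairs, largest increase first.
    """
    before = Counter(s[key] for s in baseline_spans if key in s)
    after = Counter(s[key] for s in regression_spans if key in s)
    deltas = {value: after[value] - before[value] for value in set(before) | set(after)}
    # Sort by descending delta, then by value so ties come out in a stable order.
    return sorted(deltas.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical spans that hit the same database in two time windows:
# one before the utilization spike, one during it.
baseline = [{"customer": "acme"}] * 2 + [{"customer": "globex"}] * 2
regression = [{"customer": "acme"}] * 2 + [{"customer": "globex"}] * 9

for value, delta in workload_shift(baseline, regression, "customer"):
    print(f"customer={value}: {delta:+d} spans")
```

With this sample data the ranking puts `globex` first (+7 spans) and `acme` last (+0), which is the kind of "this Transaction population changed when that Resource got hot" answer the pivot is supposed to surface.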

13/ I wrote recently about how this looks for Kafka, but the pattern appears in many other areas.

14/ This thread is meant to provide a conceptual summary of this topic in general — if there’s interest or excitement to learn more, I will follow up with specific examples of how observability tooling can/should pivot from Transactions to Resources and back again.

15/ In the meantime, I hope this is a helpful overview of these two critical concepts, how they relate to metrics and traces, and how they relate to each other. Thank you for reading! 🤓 💙

For more threads like this one, please follow me here on Medium or as @el_bhs on Twitter!

Ben Sigelman
LightstepHQ

Co-founder and CEO at LightStep, Co-creator of @OpenTelemetry and @OpenTracing, built Dapper (Google’s tracing system).