Unpacking Observability: Understanding Logs, Events, Traces, and Spans
I’ve spent the last few weeks trying to wrap my head around Observability, consuming every book, article, and podcast that I could get my hands on. My most recent explorations have gotten me digging into OpenTelemetry. OpenTelemetry (or OTel for short) is an open-source framework for instrumenting code, and many of the major Observability vendors, such as Datadog, Lightstep, and Honeycomb, support it. It’s vendor-agnostic, so if you choose to switch Observability vendors, you won’t be royally screwed. I’m in charge of the Observability team at my current company, and my goal is to have the organization follow best practices around Observability. Among other things, this means steering the organization towards adopting OpenTelemetry for instrumentation.
Before jumping into using OpenTelemetry, it’s important to understand core concepts, such as Spans and Traces. But what about Logs? Where do these fit in? How about Events? Most of the literature I’ve read about Observability talks about wide Events and deep Traces; however, the OpenTelemetry docs don’t seem to put a huge emphasis on Events in the same way. Was I missing something?
So, of course, I decided to do some digging, asking questions in the Observability community and reading all sorts of online docs (see references below) to try to understand things better. The purpose of this blog post is to walk you through the differences between Logs, Events, Spans, and Traces so that you can start digging into OpenTelemetry.
Logs are human-readable flat text files that are used by developers to capture useful data. Log messages occur at a single point in time (though not necessarily at every point in time).
Unfortunately, log formats aren’t standardized across languages or frameworks, and they can be hard to parse and challenging to query. It’s also hard to group related logs together.
Events are structured logs. They follow a consistent, machine-parseable format (typically JSON), and are waaaay easier to query.
Behold a sample log:
And its Event counterpart:
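To make the contrast concrete, here's a minimal Python sketch. The log line, the field names, and the values are all made up for illustration; they don't come from any real system. The point is only that pulling a value out of free text needs a pattern tailored to the wording, while pulling it out of an Event is a field lookup.

```python
import json
import re

# Hypothetical unstructured log line (wording and format invented for illustration).
log_line = "2021-07-14 10:32:01 ERROR checkout failed for user 42 after 350ms"

# The same information as a structured Event: one JSON object with named fields.
event_json = json.dumps({
    "timestamp": "2021-07-14T10:32:01Z",
    "level": "ERROR",
    "message": "checkout failed",
    "user_id": 42,
    "duration_ms": 350,
})

# Querying the log: a regex that only works for this exact phrasing.
match = re.search(r"user (\d+) after (\d+)ms", log_line)
user_from_log = int(match.group(1))

# Querying the Event: parse once, then look fields up by name.
event = json.loads(event_json)
user_from_event = event["user_id"]

print(user_from_log, user_from_event)
```

If someone rewords the log message, the regex silently stops matching. The Event keeps working as long as the field names stay the same.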
A Span represents a unit of work. It can be thought of as the work being done during an operation’s execution.
Logs represent occurrences at a specific point in time. Events aren’t that much more useful, other than being easier to read and query. The problem is that in isolation, Events don’t really tell a story. What if instead, we captured info for a given block of time (i.e. a time span)?
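One way to picture "capturing info for a block of time" is a small Python context manager that wraps a piece of work and records when it started and how long it took. This is a rough sketch in plain Python, not the OpenTelemetry API; `timed_block`, the operation name, and the `item_id` field are hypothetical names chosen for the example.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_block(name, records):
    """Record the start time and duration of a block of work (hypothetical helper)."""
    record = {"name": name, "start_time": time.time()}  # wall-clock start
    started = time.perf_counter()                       # steady clock for measuring duration
    try:
        yield record  # the caller can attach extra context while the work runs
    finally:
        record["duration_s"] = time.perf_counter() - started
        records.append(record)

records = []
with timed_block("add_item_to_cart", records) as current:
    current["item_id"] = "sku-123"  # context attached during the operation
    time.sleep(0.01)                # stand-in for real work

finished = records[0]
print(finished["name"], finished["duration_s"])
```

Unlike a single log line, the record describes the whole interval: what ran, when it began, how long it lasted, and whatever context was attached along the way.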
Suppose we had the scenario below:
In the olden days, we’d have a log that looked like this:
We have a span that looks like this:
Um…letdown? Yeah…if you only had those 3 fields, it would for sure be a letdown. In order for the Span to be more useful to us, we need some additional information. In OpenTelemetry, we can also include the following metadata in our Spans:
- Operation name: The name of the microservice being executed, or a function call
- Start timestamp
- End timestamp (or duration)
- Attributes: (Optional) List of key-value pairs used for aggregation or for filtering trace data (e.g. customer identifier, process hostname). Used to describe and contextualize the work being done under a Span.
- Events: (Optional) Time-stamped annotations which are made up of a timestamp, a name, and (optional) Attributes. Used to mark something notable that happened at a single point in time during a Span.
- Parent ID: Unique identifier of the Span’s parent
- Links: (Optional) References to other causally-related Spans
Now with the above metadata, we’ve got the proper context which helps us paint a picture of what happens during that operation.
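The fields above can be sketched as a plain Python data structure. This is only a model of the concepts, not the OpenTelemetry SDK's actual classes. The `span_id` field is my own addition (a Parent ID only makes sense if each Span has an identifier of its own), and all the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class SpanEvent:
    """A time-stamped annotation attached to a Span."""
    timestamp: float
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)

@dataclass
class Span:
    """Illustrative model of the Span metadata listed above."""
    name: str                         # operation name
    span_id: str                      # assumption: this Span's own identifier
    start_time: float                 # start timestamp (seconds)
    end_time: float                   # end timestamp (seconds)
    parent_id: Optional[str] = None   # None means this is a Root Span
    attributes: Dict[str, Any] = field(default_factory=dict)
    events: List[SpanEvent] = field(default_factory=list)
    links: List[str] = field(default_factory=list)  # IDs of causally-related Spans

    @property
    def duration(self) -> float:
        return self.end_time - self.start_time

span = Span(
    name="add_item_to_cart",
    span_id="b2",
    start_time=100.0,
    end_time=100.25,
    parent_id="a1",
    attributes={"customer_id": "c-42", "hostname": "web-1"},
)
span.events.append(SpanEvent(timestamp=100.1, name="inventory_checked"))

print(span.name, span.duration, [e.name for e in span.events])
```

Attributes describe the Span as a whole, while each Event pins a named moment inside it.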
Traces are also known as distributed traces. They traverse network, process, and security boundaries, to give you a holistic view of your system.
A Span is the basic building block of a Trace. A Trace is made up of a tree of Spans, starting with a Root Span (i.e. Span with no parent), which encapsulates the end-to-end time that it takes to accomplish a task. The Root Span represents a single logical operation, such as clicking a button to add an item to a shopping cart.
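Here's a small sketch of how a flat list of Spans becomes a Trace tree: find the Span with no parent (the Root Span), group the rest by parent, and walk down from the root. The span names, IDs, and timings are hypothetical, and real tracing backends do considerably more than this.

```python
from collections import defaultdict

# Hypothetical Spans belonging to one Trace (times in milliseconds).
spans = [
    {"id": "a", "parent": None, "name": "click add-to-cart", "start": 0, "end": 120},
    {"id": "b", "parent": "a", "name": "cart-service: add item", "start": 10, "end": 100},
    {"id": "c", "parent": "b", "name": "inventory-service: check stock", "start": 20, "end": 60},
    {"id": "d", "parent": "b", "name": "db: insert cart row", "start": 65, "end": 95},
]

# Group child Spans under their parent ID and pick out the Root Span.
children = defaultdict(list)
root = None
for span in spans:
    if span["parent"] is None:
        root = span
    else:
        children[span["parent"]].append(span)

def render(span, depth=0):
    """Return one indented line per Span, parents before children."""
    lines = [f"{'  ' * depth}{span['name']} ({span['end'] - span['start']} ms)"]
    for child in sorted(children[span["id"]], key=lambda s: s["start"]):
        lines.extend(render(child, depth + 1))
    return lines

tree_lines = render(root)
print("\n".join(tree_lines))
```

The Root Span's duration covers the whole operation end to end, and the indentation shows which piece of work triggered which.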
Example 1: Trace visualization in Lightstep
Example 2: Trace visualization in Honeycomb
In the world of Observability, Spans and Traces reign supreme. What we’ve learned:
- Logs tell you about something at a particular point in time. They don’t have a standardized format, and are therefore hard to query.
- Events are structured logs (JSON), and are easier to query.
- Spans represent an operation. They paint a picture of what happened during the time in which that operation was executed, through contextual information such as associated Events and attributes.
- A Root Span is a Span without a parent, and represents your high-level operation (e.g. clicking a button to add an item to a shopping cart).
- Traces stitch all related spans (as a tree) together to tell you the whole story.
I shall now reward you with a picture of a calf.
Peace, love, and code.
I wanted to give a big shoutout to the Observability community on the Honeycomb Pollinators Slack. Folks there have been super responsive and patient with my many questions. I really appreciate it. Also, a shout-out to Alex Vondrak, who put together a great set of slides which clarified a LOT of this stuff for me.
I would also suggest that you reach out to other Observability user communities. I figure that it’s always good to get different points of view from the community! Datadog, for example, also has a Slack user community, and Lightstep has a Discord user community.
More from the Unpacking Observability Series
Unpacking Observability: A Beginner’s Guide
A beginner’s guide to understanding Observability, why it matters, and how you can get started.
Unpacking Observability: The Observability Stack
Putting together a simple, yet effective OpenTelemetry-centric Observability stack
Unpacking Observability: The Path to OpenTelemetry
How to roll out OpenTelemetry across your organization to achieve Observability vendor neutrality
References & Resources
- Honeycomb Pollinators Slack
- Datadog User Community Slack
- Lightstep Community Discord
- The Path from Logs to Traces, by Alex Vondrak
- Modern Observability with OpenTelemetry (LightStep)
- OpenTelemetry 101: What is Tracing? (LightStep)
- Span Attributes (OpenTelemetry)
- Spans with Events (OpenTelemetry)
- OpenTelemetry Specification (OpenTelemetry GitHub)