Unpacking Observability: Understanding Logs, Events, Traces, and Spans

Adriana Villela
Published in Dzero Labs
Jul 14, 2021
Hawaiian Sunset. Image by Adri V

I’ve spent the last few weeks trying to wrap my head around Observability, consuming every book, article, and podcast that I could get my hands on. My most recent explorations have gotten me digging into OpenTelemetry. OpenTelemetry (or OTel for short) is an open-source framework for instrumenting code, and many of the major Observability vendors such as Datadog, Lightstep, and Honeycomb support it. It’s vendor-agnostic, so if you choose to switch Observability vendors, you won’t be royally screwed. I’m in charge of the Observability team at my current company, and my goal is to have the organization follow best practices around Observability. Among other things, this means steering the organization towards adopting OpenTelemetry for instrumentation.

Before jumping into using OpenTelemetry, it’s important to understand core concepts, such as Spans and Traces. But what about Logs? Where do these fit in? How about Events? Most of the literature I’ve read about Observability talks about wide Events and deep Traces; however, OpenTelemetry docs don’t seem to put a huge emphasis on Events in the same way. Was I missing something?

So, of course, I decided to do some digging, asking questions in the Observability community and reading all sorts of online docs (see references below) to try to understand things better. The purpose of this blog post is to educate you on the differences between Logs, Events, Spans, and Traces so that you can start digging into OpenTelemetry.

Logs

Logs are human-readable, flat text records (usually written to files) that developers use to capture useful data. Log messages occur at a single point in time (though not every point in time gets a log message).

Unfortunately, log formats aren’t standardized across languages or frameworks, and they can be hard to parse and challenging to query. It’s also hard to group related logs together.
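To make the parsing problem concrete, here’s a minimal Python sketch. The two log lines and their formats are made up for illustration (they don’t come from any real service or framework). A regex written for one format silently fails on the other, and even when it does match, details like the user name stay buried in free text.

```python
import re

# Hypothetical log lines from two services, each with its own (made-up) format.
LOG_LINES = [
    "2021-07-14 10:32:01 INFO cart-service Added item 42 to cart for user alice",
    "[INFO] 14/Jul/2021:10:32:05 checkout: user=bob action=checkout total=19.99",
]

# A regex written with only the first format in mind.
CART_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+) (?P<service>\S+) (?P<message>.*)$"
)

def parse(line):
    """Return a dict of fields, or None if the line doesn't fit the pattern."""
    match = CART_PATTERN.match(line)
    return match.groupdict() if match else None

for line in LOG_LINES:
    # The first line parses (but the user is still inside "message");
    # the second line doesn't match at all and comes back as None.
    print(parse(line))
```

Every new format means another regex, which is exactly why querying and grouping raw logs gets painful.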

Events

Events are structured logs. They follow a consistent, machine-readable format (typically JSON), and are waaaay easier to query.

Behold a sample log:

Source: The Path from Logs to Traces, by Alex Vondrak

And its Event counterpart:

Source: The Path from Logs to Traces, by Alex Vondrak
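Here’s a rough Python sketch of why structure helps. The events, their field names, and the query() helper are hypothetical (this isn’t any vendor’s query API); the point is that once every field is a named key, filtering is a simple equality check instead of a regex.

```python
import json

# Hypothetical structured Events, one JSON object per line.
RAW_EVENTS = [
    '{"timestamp": "2021-07-14T10:32:01Z", "service": "cart-service", "action": "add_item", "item_id": 42, "user": "alice"}',
    '{"timestamp": "2021-07-14T10:32:05Z", "service": "checkout", "action": "checkout", "total": 19.99, "user": "bob"}',
    '{"timestamp": "2021-07-14T10:33:10Z", "service": "cart-service", "action": "add_item", "item_id": 7, "user": "bob"}',
]

events = [json.loads(line) for line in RAW_EVENTS]

def query(events, **criteria):
    """Return the Events whose fields equal all of the given key/value pairs."""
    return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]

# "Which items did bob add to his cart?" becomes a one-liner.
bob_cart_adds = query(events, user="bob", action="add_item")
print([e["item_id"] for e in bob_cart_adds])  # [7]
```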

Spans

A Span represents a unit of work. It can be thought of as the work being done during an operation’s execution.

Logs represent occurrences at a specific point in time. Events aren’t that much more useful, other than being easier to read and query. The problem is that in isolation, Events don’t really tell a story. What if instead, we captured info for a given block of time (i.e. a time span)?
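As a minimal sketch of “capturing info for a block of time,” here’s a hypothetical Python context manager (timed_block is made up for illustration, not part of OpenTelemetry or any other library). Instead of a single timestamp, it records when a block of work started, when it ended, and how long it took.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_block(name, records):
    """Record a name, start/end timestamps, and duration for the enclosed work."""
    start_ts = time.time()        # wall-clock start, for display and correlation
    t0 = time.perf_counter()      # monotonic clock, for an accurate duration
    try:
        yield
    finally:
        duration = time.perf_counter() - t0
        records.append({
            "name": name,
            "start": start_ts,
            "end": start_ts + duration,
            "duration_s": duration,
        })

records = []
with timed_block("add_item_to_cart", records):
    time.sleep(0.05)  # stand-in for the real work being measured
print(records[0]["name"], f"{records[0]['duration_s']:.3f}s")
```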

Suppose we had the scenario below:

Source: The Path from Logs to Traces, by Alex Vondrak

In the olden days, we’d have a log that looked like this:

Source: The Path from Logs to Traces, by Alex Vondrak

Nowadays, we’d have a Span that looks like this:

Source: The Path from Logs to Traces, by Alex Vondrak

Um…letdown? Yeah…if you only had those 3 fields, it would for sure be a letdown. In order for the Span to be more useful to us, we need some additional information. In OpenTelemetry, we can also include the following metadata in our Spans:

  • Operation name: The name of the operation being executed, such as a microservice call or a function call
  • Start timestamp
  • End timestamp (or duration)
  • Attributes: (Optional) List of key-value pairs used for aggregation or for filtering trace data (e.g. customer identifier, process hostname). Used to describe and contextualize the work being done under a Span.
  • Events: (Optional) Time-stamped annotations, each made up of a timestamp, a name, and (optional) Attributes. Used to mark meaningful, singular points in time during the Span’s duration.
  • Parent ID: Unique identifier of the Span’s parent
  • Links: (Optional) References to other causally-related Spans

With the above metadata, we’ve got the context we need to paint a picture of what happened during that operation.
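To tie the metadata list together, here’s a simplified, plain-Python data model of a Span. Everything here (the Span and SpanEvent classes, add_event(), finish(), and the sample attribute keys) is hypothetical and only mirrors the fields listed above; the real OpenTelemetry SDKs have their own APIs for creating and ending Spans. The 16-hex-character span_id loosely mimics OpenTelemetry’s 8-byte Span IDs.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SpanEvent:
    """A time-stamped annotation on a Span: name, timestamp, optional attributes."""
    name: str
    timestamp: float
    attributes: dict = field(default_factory=dict)

@dataclass
class Span:
    """Hypothetical, simplified Span record mirroring the metadata listed above."""
    name: str                                  # operation name
    start: float                               # start timestamp (seconds since epoch)
    end: Optional[float] = None                # end timestamp, set by finish()
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None            # None means this is a Root Span
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)
    links: list = field(default_factory=list)  # span_ids of causally-related Spans

    def add_event(self, name, **attributes):
        """Attach a SpanEvent stamped with the current time."""
        self.events.append(SpanEvent(name, time.time(), attributes))

    def finish(self):
        """Mark the end of the unit of work."""
        self.end = time.time()

    @property
    def duration(self):
        return None if self.end is None else self.end - self.start

span = Span(name="add_item_to_cart", start=time.time(),
            attributes={"customer.id": "alice", "host.name": "web-01"})
span.add_event("inventory_checked", item_id=42, in_stock=True)
span.finish()
print(span.name, span.attributes, [e.name for e in span.events], span.duration)
```

Attributes describe the whole Span, while each SpanEvent pins down one moment inside it.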

Traces

Traces are also known as distributed traces. They traverse network, process, and security boundaries to give you a holistic view of your system.
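How does a Trace cross a network boundary? The calling service has to pass its trace context along with the request. One widely used format (and the one OpenTelemetry commonly propagates) is the W3C Trace Context traceparent header: version, trace ID, parent Span ID, and trace flags, separated by dashes. The sketch below is a simplified, hand-rolled version (version 00 only, hypothetical helper names make_traceparent and parse_traceparent) rather than a real propagator.

```python
import re
import secrets

# W3C Trace Context "traceparent": version-trace_id-parent_id-trace_flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def make_traceparent(trace_id, span_id, sampled=True):
    """Build a traceparent header value for an outgoing request."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"

def parse_traceparent(header):
    """Return (trace_id, parent_span_id, sampled), or None if the header is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if not match:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are not valid
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)

# Service A starts a Trace and calls Service B over HTTP.
trace_id = secrets.token_hex(16)   # 16 bytes -> 32 hex chars
a_span_id = secrets.token_hex(8)   # 8 bytes -> 16 hex chars
outgoing_headers = {"traceparent": make_traceparent(trace_id, a_span_id)}

# Service B reads the header and parents its own Span under Service A's Span.
b_trace_id, b_parent_id, sampled = parse_traceparent(outgoing_headers["traceparent"])
b_span_id = secrets.token_hex(8)
print(b_trace_id == trace_id, b_parent_id == a_span_id, sampled)
```

Because Service B reuses the trace ID and records Service A’s Span as its parent, both Spans end up in the same Trace even though they ran in different processes.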

A Span is the basic building block of a Trace. A Trace is made up of a tree of Spans, starting with a Root Span (i.e. a Span with no parent), which encapsulates the end-to-end time that it takes to accomplish a task. The Root Span represents a single logical operation, such as clicking a button to add an item to a shopping cart.
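Here’s a small Python sketch of how Parent IDs turn a flat list of Spans into a tree. The Span names, IDs, and millisecond timings are invented for an “add an item to the cart” request, and SpanRecord, build_tree(), and render() are hypothetical helpers, loosely imitating the waterfall view that tracing tools draw.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

class SpanRecord(NamedTuple):
    span_id: str
    parent_id: Optional[str]   # None marks the Root Span
    name: str
    start_ms: int
    end_ms: int

# Hypothetical Spans from a single Trace for "add an item to the cart".
SPANS = [
    SpanRecord("a1", None, "POST /cart/items", 0, 120),
    SpanRecord("b2", "a1", "auth.check_session", 5, 20),
    SpanRecord("c3", "a1", "inventory.reserve", 22, 90),
    SpanRecord("d4", "c3", "db.query inventory", 30, 80),
    SpanRecord("e5", "a1", "cart.save", 92, 115),
]

def build_tree(spans):
    """Return (root_span, children_by_parent_id) for the Spans of one Trace."""
    children = defaultdict(list)
    roots = []
    for span in spans:
        if span.parent_id is None:
            roots.append(span)
        else:
            children[span.parent_id].append(span)
    if len(roots) != 1:
        raise ValueError(f"expected exactly one Root Span, found {len(roots)}")
    return roots[0], children

def render(span, children, depth=0):
    """Return one indented line per Span, with children nested under parents."""
    lines = [f"{'  ' * depth}{span.name} ({span.end_ms - span.start_ms} ms)"]
    for child in sorted(children[span.span_id], key=lambda s: s.start_ms):
        lines.extend(render(child, children, depth + 1))
    return lines

root, children = build_tree(SPANS)
print("\n".join(render(root, children)))
```

The Root Span (POST /cart/items, 120 ms) covers the whole request, and each child Span is nested under the Span that called it, so db.query inventory shows up under inventory.reserve.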

Below are a few examples of trace visualizations using Lightstep and Honeycomb.

Example 1: Trace visualization in Lightstep

Source: The Path from Logs to Traces, by Alex Vondrak

Example 2: Trace visualization in Honeycomb

Source: The Path from Logs to Traces, by Alex Vondrak

Conclusion

In the world of Observability, Spans and Traces reign supreme. What we’ve learned:

  • Logs tell you about something at a particular point in time. They don’t have a standardized format, and are therefore hard to query.
  • Events are structured logs (typically JSON), and are easier to query.
  • Spans represent an operation. They paint a picture of what happened during the time in which that operation was executed, through contextual information such as associated Events and attributes.
  • A Root Span is a Span without a parent, and represents your high-level operation (e.g. clicking a button to add an item to a shopping cart).
  • Traces stitch all related spans (as a tree) together to tell you the whole story.

I shall now reward you with a picture of a calf.

Photo by Sean Nyatsine on Unsplash

Peace, love, and code.

Acknowledgements

I wanted to give a big shout-out to the Observability community on the Honeycomb Pollinators Slack. Folks there have been super responsive and patient with my many questions. I really appreciate it. Also, a shout-out to Alex Vondrak, who put together a great set of slides which clarified a LOT of this stuff for me.

I would also suggest that you reach out to other Observability user communities. I figure that it’s always good to get different points of view from the community! Datadog, for example, also has a Slack user community, and Lightstep has a Discord user community.


References & Resources
