Bringing Context and Environment back to Event Collection

Does the software industry need yet another logging library? No, if the library merely replicates the same old technical approach with minor syntactic changes at the surface of the programming interface. The current method has far too many design issues left over from yesteryear’s thinking. Yes, if we can re-imagine what it means to log or, more to the point, to emit.

Present Day Logging Instrumentation

Before listing various issues with today’s logging, let’s recap what it looks like for the average enterprise developer when it comes to instrumentation.

The above code is focused on the capturing of the phenomenon to be logged. In contrast, the…

My last few posts have been concerned with benchmarking distributed tracing solutions, including OpenTracing and OpenTelemetry. In this post, I’m going to shift the focus slightly, initially looking at how OpenSignals fares compared with the considerable overhead that all the benchmarked tracers added when injected into the playback of an episodic (event) machine recording, and then looking at what can be gleaned from other profiling techniques, such as thread sampling, in observing time.

Below is the interception code used to create an OpenSignals Service and fire START and STOP signals. The Interceptor maps each named probe (a Java class…

Enter the Matrix

In this post, I’m going to look at how quickly Stenos can play back an episodic memory of some previously recorded software execution of Apache Kafka. Also, we’ll look at what can be achieved within a replay, especially when observability tooling and extensions, built on an activity-based metering model, cannot discern the difference between real and simulated execution.

The episodic memory file is 6.1GB and contains more than 2.1 billion recorded events of calls made by instrumented methods within Apache Kafka. Included within this event stream is the metadata for the process and threads. Also, the event stream consists of the…

When I started this series of posts on distributed tracing benchmarking, the aim was to demonstrate that, though at a conceptual level there is very little difference between tracing and method invocation profiling, the difference in terms of overhead and storage cost is immense. This difference is due to distributed tracing implementations always looking to move events over the network to some central point of trace reconciliation. Distributed tracing doesn’t distribute computation well.

Distributed tracing client libraries (and pipelines) do the minimum and offer hardly any value locally other than measuring the timing of activities, queuing (span) events…

Over the last few posts I’ve benchmarked Elastic APM and OpenTelemetry. Today it is the turn of Uber’s Jaeger Tracing built with Go and supporting OpenTracing — another proposed tracing standard heading for the sunset.

Initial Setup

I initially hoped to run the Jaeger backend within a local container, but after many abrupt crashes, I opted to run directly downloaded binaries.

export SPAN_STORAGE_TYPE=badger
./jaeger-all-in-one --badger.ephemeral=false

Note: I chose the badger storage type so I could estimate storage costs.


Like in the other posts, I used an episodic machine memory played back by Stenos to create a benchmark environment for comparing tracing…

In a previous post, I attempted to benchmark the Elastic APM JavaAgent in comparing a profiler such as Satoris against a (distributed) tracing solution. Here I will reuse the harness to do a quick assessment of OpenTelemetry.


I found many aspects of the bootstrapping of the OpenTelemetry API and SDK to be confusing. I also encountered design bugs that needed to be worked around to get things to load correctly into the Stenos simulated runtime.

But after some trial and error, I finally got the following code to work.

Note: I might have complicated the task in trying to only benchmark…

In my last post, I benchmarked the Elastic APM Java agent against Satoris, effectively comparing the runtime overhead of tracing against activity-based metering with some augmentation based on adaptive strategies. But there was a problem with the benchmarking of Elastic APM: it never managed to store all the transactions and spans that had been created and sent to it for storage in Elasticsearch.

The Elastic APM server had dropped nearly 99% of all trace event traffic. The server had been unable to keep up and simply dropped events.😱

But the dropping did not just occur at the server; there was…

Performance Investigation

Last week I was contracted to assist in the performance investigation of an enterprise Java server-based application that was experiencing intermittent performance degradation issues in production. All went well after installing Satoris, an adaptive profiling (metering) agent for the JVM, and then very quickly identifying the various performance hotspots. Everyone involved in the investigation was extremely pleased. So much so that the client, a software vendor themselves, was interested in deploying the Satoris agent alongside their software stack at customer site installations where the software was heavily customized. …

A question was tweeted the other day on my timeline, exclaiming how great it would be to be able to trace a microservice call down to a malloc call.

I’m not going to debate the value of doing so in this article. Instead, I would like to discuss how this can be approached from the perspective of observability as defined by the three pillars we keep hearing being touted.

Leaving aside the impracticalities, a logging vendor would advocate writing a log record listing the method as well as the size of the allocation requested. …

A visual comparison of tracing and metering (data) models

Following my last post, where I detailed three of the most important reasons for a movement away from tracing to activity-based multi-resource metering, I thought it might be useful to attempt to explain an essential difference in the model captured from an entirely visual perspective.

Distributed or local tracing is an API for creating nodes (spans) in a tree (context) and laying them out over a timestamped timeline. Nothing more.
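To make that description concrete, here is a minimal sketch of such a model in Java. It is only an illustration under stated assumptions: the Span and SpanTreeSketch classes and their methods are hypothetical names invented for this example, not the API of OpenTracing, OpenTelemetry, or any other tracing library, and timestamps are passed in explicitly rather than read from a clock.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the API of any real tracing library.
final class Span {
    final String name;
    final Span parent;                              // null for the root of the tree
    final List<Span> children = new ArrayList<>();
    final long startNanos;
    long endNanos = -1;                             // -1 until the span is finished

    Span(String name, Span parent, long startNanos) {
        this.name = name;
        this.parent = parent;
        this.startNanos = startNanos;
        if (parent != null) {
            parent.children.add(this);              // attach the node to the tree
        }
    }

    // Create a child node one level deeper in the tree.
    Span child(String name, long startNanos) {
        return new Span(name, this, startNanos);
    }

    // Stamp the end time and navigate back up to the parent.
    Span finish(long endNanos) {
        this.endNanos = endNanos;
        return parent;
    }

    long durationNanos() {
        return endNanos - startNanos;
    }

    int depth() {
        return parent == null ? 0 : parent.depth() + 1;
    }
}

final class SpanTreeSketch {
    // Render the tree as indented lines, one per span, with its time interval.
    static String render(Span span) {
        StringBuilder sb = new StringBuilder();
        append(span, sb);
        return sb.toString();
    }

    private static void append(Span span, StringBuilder sb) {
        for (int i = 0; i < span.depth(); i++) {
            sb.append("  ");
        }
        sb.append(span.name)
          .append(" [").append(span.startNanos)
          .append(", ").append(span.endNanos).append("]\n");
        for (Span c : span.children) {
            append(c, sb);
        }
    }

    public static void main(String[] args) {
        Span root = new Span("handleRequest", null, 0);
        Span query = root.child("queryDatabase", 10);
        query.finish(40);                           // returns to the root
        root.finish(50);
        System.out.print(render(root));
    }
}
```

Everything the sketch records is a name, a parent link, and two timestamps per node, which is the point being made: the model is a tree laid out over a timeline and little else.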

The significant part of the tracing model, derived from calls to create and navigate to and from one node to another across depths, is that for every entry point call…


William Louth. Software engineer specialized in self-adaptive software runtimes, adaptive control, self-regulation, resilience engineering & visualization.
