The Paradox of Telemetry

Tuning EventSourcing.Backbone for effective insights

Bnaya Eshet
Cloud Native Daily
5 min read · Jul 4, 2023


Introduction

Telemetry is a powerful tool for understanding the health and behavior of event-driven systems. However, there is a paradoxical challenge: too much telemetry can overwhelm and distract, while the right amount can provide valuable insights at a glance. In this post, we explore the art of tuning EventSourcing.Backbone for effective telemetry, focusing on the principle of “less is more”. By identifying key indicators and expanding telemetry strategically, we can achieve a clearer understanding of system behavior and swiftly pinpoint issues when they arise.

Don’t forget to check out the other posts in our EventSourcing.Backbone Series for more insights into event-driven architectures and observability.

The Pitfall of Excessive Telemetry

Imagine a scenario where every metric and event is logged and displayed on a dashboard. While it may seem comprehensive, this abundance of information can be overwhelming. It becomes difficult to discern meaningful patterns or identify critical issues amidst the noise. Too much telemetry can lead to information overload, obscuring the true health and performance of the system.

The Power of Key Indicators

In contrast, the right telemetry, namely the essential key indicators, provides a concise and actionable overview of the system’s health. These indicators act as high-level signposts, enabling quick assessments of system behavior. By focusing on these crucial metrics, we can understand the system’s state at a glance and gain valuable insights without drowning in unnecessary details.

Strategic Expansion for Troubleshooting

When encountering issues or investigating anomalies, a well-defined set of key indicators simplifies troubleshooting. By expanding telemetry strategically and focusing on specific areas of interest, we can dive deeper into the problem space. This targeted approach lets us gather additional telemetry around the relevant components, events, or interactions, which provides the context needed to diagnose and resolve issues efficiently.

EventSourcing.Backbone Telemetry Tuning

The EventSourcing.Backbone template provides a streamlined approach to telemetry by keeping verbosity low by default. It also lets you fine-tune the telemetry setup to your needs. You can adjust the telemetry filter, add or remove instrumentation, enable specific telemetry features through environment variables, and even implement a custom sampler. This level of control lets you strike the right balance between observability and overhead, so you get the insights you need without overwhelming your system.

  • Out of the box, the template exposes minimal tracing information:

The flow is crystal clear, indicating either producer or consumer activities without unnecessary internal details.

  • To expose some of the internals, increase the telemetry verbosity:
// Program.cs

// A single level, applied to both tracing and metrics:
services.AddSingleton<TelemetryLevel>(LogLevel.Debug);
// or, for finer-grained control, set tracing and metrics separately
// (register only one of the two):
services.AddSingleton(new TelemetryLevel
{
    Metric = LogLevel.Information,
    Trace = LogLevel.Debug
});

Which will result in:

The trace captures a wealth of details, but its value remains a subject of debate.

  • To expose Redis- and S3-level information:

Redis:

Set the `EVENT_SOURCE_WITH_REDIS_TRACE` environment variable to `true`.

You can set it from the project properties, or edit the `launchSettings.json` file directly:

{
  "profiles": {
    "http": {
      "commandName": "Project",
      "launchBrowser": true,
      "launchUrl": "swagger",
      "environmentVariables": {
        "ASPNETCORE_ENVIRONMENT": "Development",
        "EVENT_SOURCE_WITH_REDIS_TRACE": "true"
      },
      "dotnetRunMessages": true,
      "applicationUrl": "http://localhost:5109"
    }
  }
}

Traces (log level = information), now including Redis operations:

S3:

Change the HTTP client instrumentation filter so that S3 requests are no longer excluded:

services.AddOpenTelemetry()
    .WithEventSourcingTracing(environment,
        cfg =>
        {
            cfg
                .AddAspNetCoreInstrumentation(m =>...)
                .AddHttpClientInstrumentation(m =>
                {
                    // m.Enrich
                    m.RecordException = true;
                    m.FilterHttpRequestMessage = request =>
                    {
                        // The template excludes S3 calls here;
                        // keep these lines commented out to record S3 tracing.
                        //if (request.RequestUri?.Host == "s3.amazonaws.com")
                        //    return false;
                        return true;
                    };
                })
                .AddGrpcClientInstrumentation()
                .AddOtlpExporter();
            // if (environment.IsDevelopment())
            //     cfg.AddConsoleExporter();
        });

It will result in the following traces (log level = information):

Performance Awareness

Tracing each and every operation in an event-driven system can provide extensive insights, but it comes at a performance cost. Balancing the need for detailed traces against system efficiency is a crucial tradeoff, and a smart sampling strategy is the key to navigating it. The sampling must be implemented with care, though. Sampling individual spans independently breaks the trace sequence and produces misleading flow indicators. The decision should honor the trace ID, so that all spans sharing a trace ID are sampled (or dropped) together. This keeps every sampled flow complete, giving an accurate picture of the system’s behavior while maintaining performance.
To see it in action, open `OpenTelemetryExtensions.cs` and enable the `.SetSampler<TraceSampler>()` call in the telemetry builder.

Make sure to inspect the `TraceSampler` before putting it into production!

The `TraceSampler` bases its decision on the trace ID. Change the `SAMPLE_RATE` constant to throttle the sampling rate; with `SAMPLE_RATE = 4`, roughly one trace in four is kept.

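using System;
using System.Buffers.Binary;
using OpenTelemetry.Trace;

internal class TraceSampler : Sampler
{
    // Keep roughly one trace out of SAMPLE_RATE
    private const int SAMPLE_RATE = 4;

    public override SamplingResult ShouldSample(in SamplingParameters samplingParameters)
    {
        // Derive the decision from the trace ID bytes instead of TraceId.GetHashCode(),
        // which hashes the hex string. String hashing is randomized per process in .NET,
        // so producer and consumer processes could disagree on the same trace.
        Span<byte> bytes = stackalloc byte[16];
        samplingParameters.TraceId.CopyTo(bytes);
        ulong value = BinaryPrimitives.ReadUInt64BigEndian(bytes.Slice(8));
        if (value % SAMPLE_RATE == 0)
            return new SamplingResult(SamplingDecision.RecordAndSample);
        return new SamplingResult(SamplingDecision.Drop);
    }
}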
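The `TraceSampler` above throttles with an integer `SAMPLE_RATE`. If you prefer to express the rate as a probability, here is a minimal sketch of a trace-ID-based decision function. `TraceIdSampling` and `ShouldKeep` are hypothetical names and are not part of EventSourcing.Backbone or the template. The function reads only the bytes of the trace ID, so every service that handles the same trace reaches the same decision. This assumes trace IDs are generated randomly, as `ActivityTraceId.CreateRandom()` does. OpenTelemetry .NET also ships a built-in `TraceIdRatioBasedSampler` (often wrapped in `ParentBasedSampler`) that follows the same principle.

```csharp
using System;
using System.Buffers.Binary;
using System.Diagnostics;

// Hypothetical helper (not part of EventSourcing.Backbone): keeps a fixed
// fraction of traces, deciding purely from the trace ID bytes.
public static class TraceIdSampling
{
    // Returns true when the trace should be kept; probability is in [0, 1].
    public static bool ShouldKeep(ActivityTraceId traceId, double probability)
    {
        if (probability <= 0) return false;
        if (probability >= 1) return true;

        // Copy the 16 trace ID bytes and read the trailing 8 as an unsigned number.
        // For randomly generated IDs this value is uniformly distributed.
        Span<byte> bytes = stackalloc byte[16];
        traceId.CopyTo(bytes);
        ulong value = BinaryPrimitives.ReadUInt64BigEndian(bytes.Slice(8));

        // Keep the trace when the value falls below probability * 2^64.
        double threshold = probability * 18446744073709551616.0; // 2^64
        return value < threshold;
    }
}
```

Inside a `Sampler` subclass, `ShouldSample` could return `SamplingDecision.RecordAndSample` when `TraceIdSampling.ShouldKeep(samplingParameters.TraceId, 0.25)` is true, and `SamplingDecision.Drop` otherwise.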

Environment Variables

Set the following environment variables for Redis and S3:

  • REDIS_EVENT_SOURCE_ENDPOINT
  • REDIS_EVENT_SOURCE_PASS
  • S3_EVENT_SOURCE_ACCESS_KEY
  • S3_EVENT_SOURCE_SECRET
  • S3_EVENT_SOURCE_REGION
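A missing variable usually surfaces later as a confusing connection failure, so it can help to fail fast at startup. The following is a minimal sketch that assumes you call it early in `Program.cs`. `EventSourceEnvironment`, `Missing`, and `EnsureConfigured` are hypothetical names and are not part of the template.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical startup helper (not part of EventSourcing.Backbone):
// reports which of the required Redis/S3 variables are missing.
public static class EventSourceEnvironment
{
    public static readonly string[] Required =
    {
        "REDIS_EVENT_SOURCE_ENDPOINT",
        "REDIS_EVENT_SOURCE_PASS",
        "S3_EVENT_SOURCE_ACCESS_KEY",
        "S3_EVENT_SOURCE_SECRET",
        "S3_EVENT_SOURCE_REGION",
    };

    // Returns the names of required variables that are unset or blank.
    // By default it reads the process environment; tests can pass a fake lookup.
    public static IReadOnlyList<string> Missing(Func<string, string?>? lookup = null)
    {
        if (lookup == null)
            lookup = Environment.GetEnvironmentVariable;
        return Required.Where(name => string.IsNullOrWhiteSpace(lookup(name))).ToList();
    }

    // Throws a single exception that lists every missing variable.
    public static void EnsureConfigured()
    {
        var missing = Missing();
        if (missing.Count > 0)
            throw new InvalidOperationException(
                "Missing environment variables: " + string.Join(", ", missing));
    }
}
```

Calling `EventSourceEnvironment.EnsureConfigured();` before building the host reports every missing variable at once, instead of failing on the first Redis or S3 call.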

Conclusion

In the world of event-driven systems, effective telemetry is all about finding the right balance. Rather than drowning in a sea of data, focus on key indicators and a smart sampling strategy; together they give a clear and actionable overview of system health. By expanding telemetry strategically when needed, we gain the insights necessary to troubleshoot and optimize performance. Remember: when it comes to telemetry, less can indeed be more.

Stay tuned for more posts in our EventSourcing.Backbone Series, where we continue to explore the fascinating world of event-driven architecture.


Next

This post is part of a series about event sourcing and an exciting framework called EventSourcing.Backbone. Start from the first post of the series to learn more.
