Correlate Integration Events With The Command That Triggered Them In Application Insights

Extend the Application Insights’ distributed tracing by background jobs or integration events telemetry correlation

Boris Donev
Nerd For Tech
5 min readApr 17, 2021

--

Application Insights is a great tool for monitoring your system. Configuring is really easy too — since it’s a Microsoft product, you only need to click “Add Application Insights” in Visual Studio, and you’re all set up. You will start getting all kinds of logs from application exceptions up to hosting machine CPU performance.

You get this with absolutely no custom code

One really cool stuff with Application Insights is that distributed tracing is enabled by default. In other words, a request that started from your frontend, executed an API endpoint, and that endpoint queried the database and called another HTTP integration or service before returning a result — all of these operations will be correlated when you inspect the logs.

This greedy request made 6 queries into the database

One thing that is not correlated out-of-the-box though are background jobs and integration events (unless you use Microsoft Azure Service Bus Client).

Correlating background jobs and integration events

Let’s say we have a setup where a command can raise an integration event:

Continuing with our case, we have the infrastructure code setup such that it starts all of the handlers for MyEvent as a background jobs (we could use Hangfire for example).
In the context of monitoring, when inspecting the logs we would like all of the handlers for MyEvent to be correlated to the initial command/request, as we saw above with the database queries.
If we log the command and event handlers, however, we will see this:

Operation ID, which correlates operations in Application Insights, is different. This means they will show up as a totally different requests in Application Insights portal, and we will have no way to know where these integration events originated from. This will make inspecting our logs more difficult when looking for bugs.
This happens because the background job service executes in another context. It can be a message queue as well, which may even execute on a different machine. Application Insights therefore assigns new Operation ID to those methods.

Solution for this? We need to get the Operation ID before we go to the other context (background job), propagate it there, and then set it up in the tracking client in the other context. Let’s see how we can do each of these steps.

This is how we can get the current Operation ID from Application Insights, just before we raise the integration event and switch context.

Basically we are creating a fake RequestTelemetry and initialize it with the telemetry context. It is essential that we use OperationCorrelationTelemetryInitializer for initializing custom telemetry requests — as that is the initializer that knows the current context and all the necessary telemetry properties that it needs to assign to the request telemetry, and the Operation ID is one of them. You might be tempted to use System.Diagnostics.Activity.Current.RootId as an Operation ID, which might work, but if you look into the code of the initializer, you’ll see that it does a bunch of checks to make sure the correct properties are initialized.
And if you are wondering where did telemetryConfiguration come from, you just inject TelemetryConfiguration through the constructor.

Now that we have the Operation ID, which can correlate different monitoring events, we need to pass it to the other, background context. One way we can go about it is if we add a baggage to our MyEvent object, which will contain the Operation ID.

Once in the background context, and having the Operation ID, we need to pass the operation ID back to the Application Insights tracker, so it can be correlated to the calling command.
We can create a simple decorator over our event handlers, that could look like this:

We should always use RequestTelemetry or DependencyTelemetry and set the Operation ID (and any other properties) on them. We can set the Name of the operation, add custom properties such as Data and so on. We should never set the Operation ID directly on the TelemetryClient as that will make it use that same ID for all telemetry!
In order for all subsequent telemetry in the event handler to be automatically correlated again, such as HTTP requests and database queries, we need to wrap the handler with a Start/Stop operation.
StartOperation does the proper initialization of your request. We just need to override the Operation ID with the value we got from the event baggage. Once we start the operation, all the following monitoring events will automatically be correlated to the same operation ID.
As soon as we call StopOperation , all operation’s telemetry data will be sent to Application Insights.

Operation ID is the same, which means they are correlated and can be investigated together

Just so we can see a final result of what we did, here is how a more complex scenario is now tracked in Application Insights:

  • Command queries/writes to the database and calls 3 HTTP dependencies
  • The command also raises 2 integration events
  • Each integration event queries the database
  • One of the integration events calls an HTTP dependency
On top is the initial command and its dependencies. The integration events below have specific type and the anonymized data is also traced.

Conclusion

While you could still debug most of the issues without correlating integration events, you can be certain that one time you would encounter such a bug that you would want to know which command raised the integration event, and what were the properties.
Don’t let that moment come when you can anticipate this scenario and be ready when it comes.

--

--

Boris Donev
Nerd For Tech

As a technical team lead of several startup projects, I've come across different issues which I'll try to document and maybe help someone else.