Correlate Integration Events With The Command That Triggered Them In Application Insights
Extend Application Insights’ distributed tracing with telemetry correlation for background jobs and integration events
Application Insights is a great tool for monitoring your system. Configuring it is really easy too: since it’s a Microsoft product, you only need to click “Add Application Insights” in Visual Studio and you’re all set. You will start getting all kinds of logs, from application exceptions to hosting machine CPU performance.
One really cool feature of Application Insights is that distributed tracing is enabled by default. In other words, a request that started from your frontend, executed an API endpoint, queried the database, and called another HTTP integration or service before returning a result: all of these operations will be correlated when you inspect the logs.
One thing that is not correlated out of the box, though, is background jobs and integration events (unless you use the Microsoft Azure Service Bus Client).
Correlating background jobs and integration events
Let’s say we have a setup where a command can raise an integration event:
public async Task CommandHandler(Command command)
{
    // business logic code
    await dbContext.SaveAsync();

    await eventDispatcher.Publish(new MyEvent());
}
Continuing with our case, we have the infrastructure code set up such that it starts all of the handlers for MyEvent as background jobs (we could use Hangfire, for example).
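Such a dispatcher could be sketched roughly like this. It's a simplified, hypothetical implementation: IEventHandler&lt;TEvent&gt; is an illustrative abstraction, while BackgroundJob.Enqueue is Hangfire's actual API.

```csharp
using System.Threading.Tasks;
using Hangfire;

// Illustrative handler abstraction (not part of Hangfire)
public interface IEventHandler<TEvent>
{
    Task Handle(TEvent @event);
}

public class HangfireEventDispatcher
{
    // Enqueues the handler as a background job; Hangfire resolves
    // IEventHandler<TEvent> from the DI container and runs it on a
    // worker, outside the current request's execution context
    public Task Publish<TEvent>(TEvent @event)
    {
        BackgroundJob.Enqueue<IEventHandler<TEvent>>(h => h.Handle(@event));
        return Task.CompletedTask;
    }
}
```

Because the handler runs on a Hangfire worker rather than in the request pipeline, it loses the ambient tracing context, which is exactly the problem described next.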
In the context of monitoring, when inspecting the logs we would like all of the handlers for MyEvent to be correlated to the initial command/request, just as we saw above with the database queries.
If we log the command and event handlers, however, we will see that the Operation ID, which correlates operations in Application Insights, is different for each of them. This means they will show up as totally different requests in the Application Insights portal, and we will have no way to know where these integration events originated from. This makes inspecting our logs for bugs much more difficult.
This happens because the background job service executes in another context. It can be a message queue as well, which may even execute on a different machine. Application Insights therefore assigns a new Operation ID to those methods.
The solution? We need to get the Operation ID before we go to the other context (the background job), propagate it there, and then set it on the tracking client in that context. Let’s see how we can do each of these steps.
This is how we can get the current Operation ID from Application Insights, just before we raise the integration event and switch context.
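A sketch of that lookup, assuming the Microsoft.ApplicationInsights SDK with TelemetryConfiguration injected through the constructor (the helper method name is illustrative):

```csharp
using System.Linq;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;

private string GetCurrentOperationId()
{
    // A throw-away telemetry item, used only to capture the context
    var requestTelemetry = new RequestTelemetry();

    // Let the correlation initializer fill in the current operation's
    // properties, including the Operation ID
    foreach (var initializer in telemetryConfiguration.TelemetryInitializers
                 .OfType<OperationCorrelationTelemetryInitializer>())
    {
        initializer.Initialize(requestTelemetry);
    }

    return requestTelemetry.Context.Operation.Id;
}
```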
Basically, we create a fake RequestTelemetry and initialize it with the telemetry context. It is essential that we use OperationCorrelationTelemetryInitializer for initializing custom telemetry requests, as that is the initializer that knows the current context and all the necessary telemetry properties that need to be assigned to the request telemetry, the Operation ID being one of them. You might be tempted to use System.Diagnostics.Activity.Current.RootId as an Operation ID, which might work, but if you look into the code of the initializer, you’ll see that it does a bunch of checks to make sure the correct properties are initialized.
And if you are wondering where telemetryConfiguration came from: you just inject TelemetryConfiguration through the constructor.
Now that we have the Operation ID, which can correlate different monitoring events, we need to pass it to the other, background context. One way to go about it is to add a baggage property to our MyEvent object that contains the Operation ID.
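For example (a sketch; the property name and the GetCurrentOperationId helper are illustrative):

```csharp
public class MyEvent
{
    // Baggage: carries the Application Insights Operation ID
    // across the context switch to the background job
    public string OperationId { get; set; }
}

// In the command handler, just before dispatching:
await eventDispatcher.Publish(new MyEvent
{
    OperationId = GetCurrentOperationId()
});
```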
Once in the background context, and having the Operation ID, we need to pass it back to the Application Insights tracker so the handler can be correlated to the calling command.
We can create a simple decorator over our event handlers that could look like this:
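A sketch of such a decorator, assuming a generic IEventHandler&lt;TEvent&gt; abstraction and an event that carries the Operation ID in an OperationId property (both illustrative); TelemetryClient, StartOperation, and StopOperation are the SDK's actual API:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;

public class CorrelatingEventHandlerDecorator<TEvent> : IEventHandler<TEvent>
    where TEvent : MyEvent
{
    private readonly IEventHandler<TEvent> decorated;
    private readonly TelemetryClient telemetryClient;

    public CorrelatingEventHandlerDecorator(
        IEventHandler<TEvent> decorated, TelemetryClient telemetryClient)
    {
        this.decorated = decorated;
        this.telemetryClient = telemetryClient;
    }

    public async Task Handle(TEvent @event)
    {
        var requestTelemetry = new RequestTelemetry
        {
            // Shows up as the operation name in the portal
            Name = $"EVENT {typeof(TEvent).Name}"
        };

        // Override the Operation ID with the one from the event baggage,
        // so this handler correlates to the command that raised the event
        requestTelemetry.Context.Operation.Id = @event.OperationId;

        // StartOperation makes all telemetry emitted inside the operation
        // (HTTP calls, database queries, traces) share the Operation ID
        var operation = telemetryClient.StartOperation(requestTelemetry);
        try
        {
            await decorated.Handle(@event);
        }
        catch (Exception)
        {
            requestTelemetry.Success = false;
            throw;
        }
        finally
        {
            // StopOperation sends the operation's telemetry
            // to Application Insights
            telemetryClient.StopOperation(operation);
        }
    }
}
```

Register the decorator around every event handler in your DI container so each background handler is wrapped automatically.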
We should always use RequestTelemetry or DependencyTelemetry and set the Operation ID (and any other properties) on them. We can set the Name of the operation, add custom properties such as Data, and so on. We should never set the Operation ID directly on the TelemetryClient, as that would make it use that same ID for all telemetry!
In order for all subsequent telemetry in the event handler, such as HTTP requests and database queries, to be automatically correlated again, we need to wrap the handler with a Start/Stop operation. StartOperation does the proper initialization of your request; we just need to override the Operation ID with the value we got from the event baggage. Once we start the operation, all following monitoring events will automatically be correlated to the same Operation ID.
As soon as we call StopOperation, all of the operation’s telemetry data is sent to Application Insights.
To see the final result of what we did, here is how a more complex scenario is now tracked in Application Insights:
- Command queries/writes to the database and calls 3 HTTP dependencies
- The command also raises 2 integration events
- Each integration event queries the database
- One of the integration events calls an HTTP dependency
Conclusion
While you could still debug most issues without correlating integration events, you can be certain that one day you will hit a bug where you want to know which command raised the integration event, and with what properties. Don’t let that moment catch you unprepared when you can anticipate this scenario and be ready for it.