An Adventure with Datadog and Azure App Services

Robert Glickman
PayScale Tech
Published in
7 min readJun 28, 2019

--

The result can be seen in this repo or it can be installed from the App Service by going to the Extensions tab within the Azure portal. Within that repo is a README that will tell you how to install Datadog in your App Service.

Set up the agent host

Set up a host running the Datadog agent, like so: https://docs.datadoghq.com/tracing/send_traces/. Grant that host a public IP or otherwise make the host accessible on port 8126 from the Azure app service. Make sure the agent has attribute “apm_non_local_traffic: true” (under apm_config).

Add the Datadog nuget package to your application

Add the same version as is used in this extension (currently 1.3.0). See here: https://www.nuget.org/packages/Datadog.Trace.ClrProfiler.Managed

Set the environment variables

Go to your app service and click on “Configuration.” Add new variables by clicking “New application setting” for the following:

DD_AGENT_HOST pointing to the accessible IP for the machine in the step above (“Set up the agent host”).

DD_API_KEY pointing to your Datadog API key (From the Datadog portal: Integrations -> APIs -> API Keys)

DD_ENV set to the environment name you want to appear in Datadog

DD_SERVICE_NAME pointing to the service name you want to appear in Datadog

Add the extension

From the left pane of the app service, select “Extensions.” Click on “Add” and select the “.NET Core Datadog APM x86” for an app that is running .Net Core in a non-self-contained application. Accept the license terms and select “OK” to install. Restart the application for this to take effect (and for any of the above environment variable changes to take effect).

You probably also need to uninstall any other APM extensions you have installed on the app service.

Background

Datadog’s APM product for .NET and .NET Core graduated from beta in April. We had been eagerly awaiting this release, as we had already been monitoring some of our Node.js applications with Datadog with positive results and we wanted to add our various .NET Core services to the mix. Adding these other interleaved services to the same observability platform seemed like an impactful feature to pursue.

Currently, we run most of our .NET Core services on Azure App Services. This product is a hosted solution for services that abstracts away most of the concern for managing the underlying VMs. As such, Azure allows very limited access to the underlying host(s) and provides extensibility through App Service extensions, an open platform for installing pieces of additional functionality within your App Service.

When on the technological edge, one often needs to work through significant ambiguity. While Azure App Services can run with a variety of platforms, it is often natural to see them with .NET and .NET Core applications. Since Datadog’s support for these types of applications is so new, they have yet to offer official support for an App Service extension, and had no solid timeline for when it would be developed. This feature seemed important enough from our perspective to pursue a solution on our own and Datadog support was helpful and impactful through the entire process.

Figuring out how to build an extension

This was my first foray into building an App Service extension. The documentation is scattered, and examples were a good start, but didn’t necessarily cover the requirements for this extension. The primary documentation I found was here, with much of the documentation I found pointing to this ancient example.

Since there are other APM products that already exist as extensions, I sought one of these out as an example. The above documentation referred to a site extensions portal as the source of the extensions and I downloaded a few to no avail, but I couldn’t find, for example, the RayGun APM extension, which I ended up finding through a web search here (this should have been my first clue to a problem I encountered later).

Attempt #1: Manually install agent and APM artifacts on the App Service

With the RayGun extension, it was clear how they were setting many of the environment variables (through applicationHost.xdt), and it seemed like there was an install.cmd and uninstall.cmd which were invoked by convention when the extension was installed and uninstalled, respectively. They were taking the RayGun agent and running it through a WebJob on the App Service.

OK, this seemed like something I could try to replicate. I installed the Agent and .NET Core APM on my dev machine, and copied the files that resulted from the installation to Azure through Kudu. I set up the environment variables that were needed — according to the Datadog documentation — through the App Service application settings, which I previously knew to work for this purpose. I set up the WebJob to run the agent EXE and restarted the app service to allow the changes to take effect.

The result: I was able to get the agent running and logging within the WebJob. There were a ton of errors logged, and I added some environment variables to point config files to the right location on these machines. However, there were a number of errors still about loading files and not being able to access various system resources and the like.

WARN | (pkg/util/log/log.go:482 in func1) | Failed to open registry key: The system cannot find the file specified.

ERROR | (pkg/util/winutil/pdhutil/pdhhelper.go:85 in pdhEnumObjectItems) | Failed to enumerate windows performance counters (class Processor)
ERROR | (pkg/util/winutil/pdhutil/pdhhelper.go:86 in pdhEnumObjectItems) | This error indicates that the Windows performance counter database may need to be rebuilt

ERROR | (pkg/collector/corechecks/loader.go:76 in Load) | core.loader: could not configure check file_handle: Requested counter is a single-instance: Process
ERROR | (pkg/collector/scheduler.go:184 in GetChecksFromConfigs) | Unable to load the check: unable to load any check from config ‘file_handle’

But, there were also some promising logs that indicated things were working to some extent:

INFO | (cmd/agent/app/start.go:121 in StartAgent) | Starting Datadog Agent v6.10.1
INFO | (cmd/agent/app/start.go:149 in StartAgent) | Hostname is: ***************
INFO | (cmd/agent/gui/gui.go:81 in StartGUIServer) | GUI server is listening at 127.0.0.1:5002
INFO | (pkg/forwarder/forwarder.go:154 in Start) | Forwarder started, sending to 1 endpoint(s) with 1 worker(s) each: “https://6-10-1-app.agent.datadoghq.com" (1 api key(s))

INFO | (pkg/forwarder/transaction.go:193 in Process) | Successfully posted payload to “https://6-10-1-app.agent.datadoghq.com/intake/?api_key=******************************", the agent will only log transaction success every 20 transactions

Ultimately, I engaged Datadog support at this point with the errors that we were seeing and another possible solution surfaced: to move the Agent to a separate host and have the APM send things to this remote host.

Attempt 2: Separate Agent Host

The APM MSI install involved only two files: the Datadog Native DLL and an integrations.json file, so this part seemed pretty straightforward. There were instructions from Datadog for how to install manually, as well. Setting up a machine with the Datadog agent seemed pretty straightforward, too. So I acquired a VM, installed the agent and configured it with the setting “apm_non_local_traffic: true,” per these instructions. The host was set up to accept traffic from Azure IPs on port 8126 for the metrics being sent over.

The first issue we faced was using the x64 version of .NET Core APM files causing the machine to not be instrumented and for no logs to show up from the APM. The application we were instrumenting was not a self-contained x64 application, so it required the x86 APM files.

With this change, our first sign of positive results! We were seeing logs that indicated that Datadog was attempting to send messages out.

[dotnet.exe] 126944: [info] CorProfiler::JITCompilationStarted() replaced calls from System.Net.Http.HttpClientHandler.SendAsync() to System.Net.Http.HttpMessageHandler.SendAsync() 100663951 with calls to Datadog.Trace.ClrProfiler.Integrations.HttpMessageHandlerIntegration.HttpMessageHandler_SendAsync() 167773341
[dotnet.exe] 126944: [info] CorProfiler::JITCompilationStarted() replaced calls from System.Net.Http.HttpClientHandler.SendAsync() to System.Net.Http.HttpMessageHandler.SendAsync() 100663951 with calls to Datadog.Trace.ClrProfiler.Integrations.HttpMessageHandlerIntegration.HttpMessageHandler_SendAsync() 167773341
[dotnet.exe] 126944: [info] CorProfiler::JITCompilationStarted() replaced calls from System.Net.Http.HttpClientHandler.SendAsync() to System.Net.Http.HttpMessageHandler.SendAsync() 100663951 with calls to Datadog.Trace.ClrProfiler.Integrations.HttpMessageHandlerIntegration.HttpMessageHandler_SendAsync() 167773341

However, on the other side, we were seeing that nothing was being received by the agent.

TRACE | INFO | (pkg/trace/api/api.go:147 in Listen) | listening for traces at http://localhost:8126
TRACE | INFO | (pkg/trace/api/api.go:342 in logStats) | no data received
TRACE | INFO | (pkg/trace/agent/service.go:63 in Run) | total number of tracked services: 0

The key here, I would find out later, was “listening for traces at http://localhost:8126” which indicated that the “apm_non_local_traffic” config setting didn’t take. Looking at an example config showed that the “apm_non_local_traffic” setting should be nested within “apm_config.” Fixing this and we started to receive traces!

TRACE | INFO | (pkg/trace/api/api.go:147 in Listen) | listening for traces at http://0.0.0.0:8126

TRACE | INFO | (pkg/trace/api/api.go:342 in logStats) | [lang:.NET interpreter:.NET Core 4.6.27617.05 tracer_version:1.2.0.0] -> traces received: 207, traces dropped: 0, traces filtered: 0, traces amount: 107659 bytes, services received: 0, services amount: 0 bytes, events extracted: 0, events sampled: 0
TRACE | INFO | (pkg/trace/agent/service.go:63 in Run) | total number of tracked services: 4

Notice now it says “listening for traces at http://0.0.0.0:8126.”

Writing the App Service extension

Now came the portion of writing the App Service extension, which can be seen here. I followed the example of the RayGun APM extension and figured out how to get the environment variables and install set up, which you can infer from the content of that repo. However, the site extensions website has been deprecated in favor of uploading the package to Nuget with packageType of “AzureSiteExtension.” Upload this to nuget.org and wait for it to validate and index the package. At this point the extension was available to install to Azure App Services!

--

--