Screenshot of the Azure Monitor dashboard generated using the corresponding GitHub repo

Monitoring and Logging for Terraform Enterprise — Azure Monitor

Peyton Casper
6 min read · Jul 6, 2020


Introduction

In the first post of this series, we explored Terraform Enterprise (TFE) and presented a starting point for monitoring TFE. In this post, we’re going to focus solely on Azure Monitor, including how to set up the OMS Agent, the various Log Analytics queries, and the Azure Dashboard featured above. All of the queries, along with the Terraform code to set up TFE on Azure, can be found in the accompanying GitHub repo.

OMS Agent

Azure offers a few different agents for collecting metrics from VMs; which one you need depends primarily on the operating system, the destination system, and the functionality required. In our case, we’re running TFE on a Linux VM and will be streaming metrics to a Log Analytics workspace. Azure provides the MMA agent for Windows and the OMS agent for Linux VMs, both of which support sending metrics and logs to a Log Analytics workspace.

Azure also supports two methods of installing the OMS agent: directly on the host or via a Docker container. In our specific scenario, we will be using the Docker variant, as it can tap directly into the Docker log socket and integrates with the Azure Marketplace Container Monitoring solution that we will set up later in this post.

I’m going to start by assuming you either already have TFE deployed along with a Log Analytics workspace, or have deployed the example resources detailed in the GitHub repo linked at the end of this post. In the latter case, an OMS agent has already been deployed as part of the startup script.

Installation

1. Export Log Analytics Details

export RESOURCE_GROUP="azure_resource_group"
export LOG_ANALYTICS_WORKSPACE_NAME="log_analytics_workspace_name"

2. Retrieve Log Analytics Workspace ID

az monitor log-analytics workspace show --resource-group $RESOURCE_GROUP --workspace-name $LOG_ANALYTICS_WORKSPACE_NAME  --query "customerId"

3. Retrieve Log Analytics Workspace Primary Key

az monitor log-analytics workspace get-shared-keys --resource-group $RESOURCE_GROUP --workspace-name $LOG_ANALYTICS_WORKSPACE_NAME --query "primarySharedKey"
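If you'd rather avoid copy-pasting values between steps, the two lookups above can feed the exports that step 4 expects directly (the `-o tsv` flag strips the JSON quoting from the CLI output):

```shell
# Capture the workspace ID and primary key into the variables used in step 4.
# -o tsv returns the raw value without surrounding quotes.
export WORKSPACE_ID=$(az monitor log-analytics workspace show \
  --resource-group "$RESOURCE_GROUP" \
  --workspace-name "$LOG_ANALYTICS_WORKSPACE_NAME" \
  --query "customerId" -o tsv)

export WORKSPACE_KEY=$(az monitor log-analytics workspace get-shared-keys \
  --resource-group "$RESOURCE_GROUP" \
  --workspace-name "$LOG_ANALYTICS_WORKSPACE_NAME" \
  --query "primarySharedKey" -o tsv)
```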

4. Start the OMS Agent Docker

export WORKSPACE_ID="workspace_id"
export WORKSPACE_KEY="workspace_key"
sudo docker run --privileged -d \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/log:/var/log \
  -v /var/lib/docker/containers:/var/lib/docker/containers \
  -e WSID="${WORKSPACE_ID}" \
  -e KEY="${WORKSPACE_KEY}" \
  -p 127.0.0.1:25227:25225 \
  -p 127.0.0.1:25226:25224/udp \
  --name="omsagent" \
  -h=`hostname` \
  --restart=always \
  microsoft/oms

Note that the OMS agent above binds to ports 25227 and 25226 on the host instead of the default ports (25225 and 25224). This leaves the defaults free so that you can also configure VM Insights, which would otherwise create a port conflict with our OMS agent.
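Once the container is up, a quick sanity check confirms the agent is running and lets you inspect its recent output:

```shell
# Confirm the omsagent container is running and show its status.
sudo docker ps --filter "name=omsagent" --format "{{.Names}}: {{.Status}}"

# Tail the agent's logs to verify it is connecting to the workspace.
sudo docker logs --tail 20 omsagent
```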

Log Analytics

Log Analytics is Microsoft’s log monitoring and analysis tool suite that is built into Azure Monitor. The OMS Agent streams both metrics and log entries back to Log Analytics, which makes this the primary tool that we will use to define queries and metrics for our Azure Dashboard.

Log Analytics Workspace showcasing the schema explorer, query engine and results tab

The Schema Explorer runs down the left-hand side of the Log Analytics workspace and features a few different sub-headers which stem from the data our OMS Agent is streaming back. For this post, we’re only going to focus on data in the LogManagement and Container Monitoring Solution sections.

I’ll be the first to admit that generating these charts is not the most complicated task. However, I did want to showcase one example below with supporting screenshots for anyone who is new to Log Analytics and might be trying to get a dashboard set up for TFE.

Container Monitoring is a Log Analytics Marketplace offering that has to be installed separately. If you’ve used the example repo, it has already been installed; otherwise, enable it on the Log Analytics workspace that will be collecting TFE’s logs.

Generating Charts

First, we need to define our query within Log Analytics and then generate a chart that we can pin to our dashboard. Grab the RAM per Container query from here and paste it into the Log Analytics editor. Next, click the run button and flip over to the Chart tab. The initial chart won’t be a stacked bar chart, so use the selector to change it.
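For reference, a RAM-per-container query generally looks something like the sketch below. The exact query lives in the repo; the table and counter names here assume the Container Monitoring solution's default schema, so treat this as an illustration rather than the canonical version:

```kusto
Perf
| where ObjectName == "Container" and CounterName == "Memory Usage MB"
| summarize AvgMemoryMB = avg(CounterValue) by InstanceName, bin(TimeGenerated, 5m)
| render barchart
```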

As mentioned above, we won’t be going through every chart that is covered in the first post, but all of these charts are documented within the Azure directory of the corresponding GitHub repo.

Screenshot outlining the “Chart” button and the resulting chart that is generated by Log Analytics

Pinning

Finally, after that painstaking process, we need to pin our chart to an existing dashboard or create a new one. The “Pin to dashboard” button is outlined in red below.

Screenshot outlining the “Pin to dashboard” within Log Analytics and the resulting pane

Health Checks

Health checks typically refer to some form of continuous poll that queries a given interface for the status of the underlying service. Terraform Enterprise provides such an interface, which returns a standard 200 OK status via an HTTP request. In a similar vein, Azure Blob Storage provides a simple Availability metric that details the Azure region’s status. Unfortunately, Azure PostgreSQL does not offer as simple an interface; thus, we will be utilizing a line chart that depicts the number of active connections vs. failed connections as a stand-in.

Application Insights

Azure’s Application Insights provides a mechanism to set up an HTTP Health Check monitor that can be used to poll TFE’s health check endpoint on a specified frequency. Create or utilize an existing Application Insights resource and navigate to the Availability tab to create a URL ping test. These menu items are outlined in red below.

Terraform Enterprise’s health check endpoint is documented here and should look something like this: http://tfe.company.com/_health_check.
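To see what the availability test will observe, you can poll the endpoint yourself. The sketch below wraps the status-code check in a small function; the hostname is a placeholder for your own TFE instance:

```shell
# check_health: classify an HTTP status code from TFE's health check endpoint.
check_health() {
  if [ "$1" -eq 200 ]; then
    echo "healthy"
  else
    echo "unhealthy"
  fi
}

# Probe the endpoint (replace the hostname with your TFE instance), then classify:
# STATUS=$(curl -s -o /dev/null -w "%{http_code}" "http://tfe.company.com/_health_check")
# check_health "$STATUS"
check_health 200  # prints "healthy"
```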

Screenshot outlining the Application Insight’s “Availability” and “Add test” buttons along with the “Create Test” pane which is used to add the TFE health check.

Azure Blob Storage

As mentioned above, Azure’s Blob Storage provides a built-in Availability metric that we can leverage. Navigate to the storage account that TFE utilizes, scroll to the Metrics tab, and select the Availability metric, which showcases the availability of Azure Blob Storage within the corresponding region. These menu items are outlined in red below.
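The same Availability metric can also be pulled from the CLI if you’d rather script it. The sketch below assumes the storage account name is in a `$STORAGE_ACCOUNT` variable alongside the `$RESOURCE_GROUP` exported earlier:

```shell
# Look up the storage account's resource ID, then query the blob service's
# Availability metric at one-minute resolution.
STORAGE_ID=$(az storage account show \
  --resource-group "$RESOURCE_GROUP" \
  --name "$STORAGE_ACCOUNT" \
  --query "id" -o tsv)

az monitor metrics list \
  --resource "$STORAGE_ID/blobServices/default" \
  --metric "Availability" \
  --interval PT1M
```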

Screenshot outlining the Storage Account’s “Metrics” and “Availability” chart

Azure Database for PostgreSQL

The health check for Azure PostgreSQL utilizes active connections and failed connections, which act as a stand-in metric for availability. In a similar fashion to Azure Blob Storage, navigate to the Metrics tab of your Azure DB resource, select Active Connections as the first metric, add an additional metric, and set it to Failed Connections. These menu items are outlined in red below.
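The same pair of metrics is available from the CLI as well. The sketch below assumes a single-server Azure Database for PostgreSQL whose resource ID is in a `$POSTGRES_ID` variable, and uses the `active_connections` and `connections_failed` metric names from that service’s metric namespace:

```shell
# Query active vs. failed connections at five-minute resolution.
az monitor metrics list \
  --resource "$POSTGRES_ID" \
  --metric "active_connections" "connections_failed" \
  --interval PT5M
```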

Screenshot showing a PostgreSQL chart that contains “Active Connections” vs. “Failed Connections”

Conclusion

Every monitoring solution has subtle differences, and I wanted to provide simple instructions for setting up the dashboard that was introduced in the first post. The next post will focus on setting up these same metrics on GCP’s Operations Monitoring toolset.

Did something trip you up? Have additional questions? Drop a comment below or send me a message on Twitter, and I’ll gladly offer any help that I can.
