Published in CodeShake
Ultimate Google Cloud Operations configuration for external services

As an SRE, you already know how difficult it is to manage applications across multiple cloud providers and managed services. Thanks to OpenTelemetry, having a single place to monitor all your services, including third-party databases, queues, and more, has never been easier.

Google Cloud Operations (formerly Stackdriver) provides a set of tools to monitor, troubleshoot, and improve cloud infrastructure, software, and application performance on GCP and AWS.
In this article, we will feed Cloud Operations Monitoring with metrics from an Elasticsearch service deployed on Elastic Cloud, thanks to the OpenTelemetry Collector (also known as the otel-collector).

OpenTelemetry Collector, the vendor-agnostic glue for all your services

The otel-collector is a simple standalone agent that can:

  • Receive or scrape data (receivers) from platforms (Kubernetes clusters, vCenter, …), aggregators (CollectD, OpenCensus, …), services (Kafka, MongoDB, …), and more (see the receivers list).
  • Transform data (processors) by adding context labels, sampling data, renaming metrics, and more (see the processors list).
  • Export data (exporters) to backends like Cloud Operations, Datadog, Kafka, and more (see the exporters list).
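A collector configuration simply wires these three component types together in a pipeline. As a minimal sketch (using the OTLP receiver, batch processor, and logging exporter from the standard distribution; the endpoint is a placeholder):

```yaml
receivers:
  otlp:                      # receive OTLP data pushed by instrumented apps
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch: {}                  # group data before export

exporters:
  logging: {}                # print received data to stdout

service:
  pipelines:
    metrics:                 # a pipeline chains receivers -> processors -> exporters
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
```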

Why did I choose to illustrate this article with an Elasticsearch cluster?

More and more customers use services from the marketplace. Most of the time, these services provide their own monitoring system; this is also the case for Elastic Cloud. However, Cloud Operations does not (yet) provide a native monitoring integration for Elastic Cloud, so if something goes wrong, the MTTD (mean time to detect) will drastically increase.

The technical part

Let’s start with the otel-collector configuration.
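A configuration along these lines could look as follows. This is a sketch built from the contrib distribution's component names: the Elastic Cloud endpoint, credentials, cluster label, and renamed metric are placeholders, and the exact metric names depend on the elasticsearch receiver version:

```yaml
receivers:
  elasticsearch:
    endpoint: https://my-cluster.es.europe-west1.gcp.cloud.es.io:9243  # placeholder
    username: elastic
    password: ${env:ELASTIC_PASSWORD}
    collection_interval: 60s

processors:
  resource:
    attributes:
      - key: cluster_name        # custom label to identify the cluster
        value: my-els-cluster    # placeholder
        action: insert
  resourcedetection:
    detectors: [env]             # picks up OTEL_RESOURCE_ATTRIBUTES
  metricstransform:
    transforms:
      - include: elasticsearch.node.cache.memory.usage
        action: update
        new_name: elasticsearch/node/cache_memory_usage  # usable key in Cloud Monitoring

exporters:
  googlecloud:
    metric:
      service_resource_labels: true   # propagate service.* resource labels

service:
  pipelines:
    metrics:
      receivers: [elasticsearch]
      processors: [resource, resourcedetection, metricstransform]
      exporters: [googlecloud]
```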

As described in the service block, the otel-collector pulls data from my Elasticsearch cluster, adds labels to the metrics (useful to identify clusters), renames them with usable metric keys in Cloud Monitoring, and then sends them to GCP.

The resource and resourcedetection processors helped me add more relevant labels to the metrics, and the service_resource_labels setting on the exporter allowed me to propagate them to Cloud Monitoring.
If you want to dig into labels, I strongly suggest adding a file exporter to inspect the labels set on the metric (not on the datapoints).
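For example, a file exporter can be declared and appended to the metrics pipeline (a fragment, assuming a pipeline that already exports to googlecloud; the path is arbitrary):

```yaml
exporters:
  file:
    path: /tmp/metrics-dump.json   # raw OTLP payload, handy for inspecting resource labels

service:
  pipelines:
    metrics:
      exporters: [googlecloud, file]   # fan out to both backends
```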

Now let’s start the collector locally using the Docker image (in production, take a look at this manifest to run it on a GKE cluster, for example):

docker run -ti --rm \
-v $(pwd)/service-account-key-file.json:/etc/otel/key.json \
-v $(pwd)/otel-els.yaml:/etc/otel/config.yaml \
--env GOOGLE_APPLICATION_CREDENTIALS=/etc/otel/key.json \
--env OTEL_RESOURCE_ATTRIBUTES="service.namespace=my-deployment,service.name=test,service.version=6816xx" \
otel/opentelemetry-collector-contrib \
--config=/etc/otel/config.yaml

If your credentials are correctly configured, all components start successfully:

Metrics on Cloud Monitoring

In Cloud Monitoring, you can now use the Metrics Explorer to list all metrics provided by the elasticsearch otel-receiver:

And then, enjoy all the data and labels:

Based on these new metrics in Cloud Monitoring, you can easily create alerts, dashboards, or SLOs in a uniform way for all your services inside and outside of Google Cloud 🥳🥳

If you want to go deeper, take a look at Google’s opinionated distribution of the otel-collector.

In a production environment, don’t forget to collect metrics from the collector itself. By default, it exposes its own metrics on port 8888.
You can add the prometheus receiver to scrape these metrics and send them to Google Cloud using another pipeline in the service configuration.
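A sketch of such a self-monitoring pipeline, using the prometheus receiver against the collector's default metrics port (the pipeline name and scrape interval are arbitrary):

```yaml
receivers:
  prometheus:
    config:                        # standard Prometheus scrape configuration
      scrape_configs:
        - job_name: otel-collector
          scrape_interval: 30s
          static_configs:
            - targets: ['0.0.0.0:8888']   # collector's own metrics endpoint

service:
  pipelines:
    metrics/self:                  # separate pipeline for the collector's own telemetry
      receivers: [prometheus]
      exporters: [googlecloud]
```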

Jérôme NAHELOU

Cloud rider at SFEIR the day, Akita Inu lover #MyAkitaInuIsNotAWolf