Google Managed Prometheus with OpenTelemetry

Pranav Dhopey
Google Cloud - Community
9 min read · May 3, 2023

Metric collection and visualization are among the most important aspects of application reliability, and they also help business owners make informed decisions about future goals. There is a wide variety of services in this domain, and Prometheus is one of the most popular open-source tools for metrics. However, running open-source Prometheus yourself brings challenges around scalability, infrastructure management, and so on, which can become a nightmare for any large organization if something goes down.

Google Cloud offers a solution that overcomes these hurdles and makes the experience much smoother: a fully managed offering called Managed Service for Prometheus.

What is Managed Service for Prometheus?
Google Cloud Managed Service for Prometheus is Google Cloud’s fully managed, multi-cloud, cross-project solution for Prometheus metrics. It lets you globally monitor and alert on your workloads, using Prometheus, without having to manually manage and operate Prometheus at scale.

Managed Service for Prometheus collects metrics from Prometheus exporters and lets you query the data globally using PromQL, meaning that you can keep using any existing Grafana dashboards, PromQL-based alerts, and workflows. It is hybrid- and multi-cloud compatible, can monitor both Kubernetes and VM workloads, retains data for 24 months, and maintains portability by staying compatible with upstream Prometheus. You can also supplement your Prometheus monitoring by querying over 1,500 free metrics in Cloud Monitoring, including free GKE system metrics, using PromQL.

Managed Service for Prometheus is built on top of Monarch, a globally scalable data store used for Google’s own monitoring. Because Managed Service for Prometheus uses the same backend and APIs as Cloud Monitoring, both Cloud Monitoring metrics and metrics ingested by Managed Service for Prometheus are queryable by using PromQL in Cloud Monitoring, Grafana, or any other tool that can read the Prometheus API.

Managed Service for Prometheus splits up into multiple components:

Data collection:
It is handled by managed collectors, self-deployed collectors, or the Ops Agent, which scrapes local exporters and forwards the collected data to Monarch.

Query evaluation:
It is handled by Monarch, which functions as a globally scaled and distributed data store; PromQL queries issued through Cloud Monitoring, Grafana, or any other tool that can read the Prometheus API are evaluated against the data stored in Monarch.

Rule and alert evaluation:
It is handled by locally-run and locally-configured rule evaluator components, which execute rules and alerts against the global Monarch data store and forward any fired alerts to Prometheus AlertManager.

Data storage:
It is handled by Monarch, which stores all Prometheus data for 24 months at no additional cost.

Managed collection is supported for GKE and all other Kubernetes environments. It runs Prometheus-based collectors as a DaemonSet and ensures scalability by only scraping targets on colocated nodes. You configure the collectors with lightweight custom resources to scrape exporters using pull collection, and the collectors then push the scraped data to the central data store, Monarch.
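For reference, a minimal sketch of such a custom resource (a PodMonitoring) could look like the following, assuming an application labelled app=prom-example that exposes a port named metrics:

apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: prom-example
  namespace: app
spec:
  selector:
    matchLabels:
      app: prom-example
  endpoints:
  - port: metrics
    interval: 30s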

Let’s start with the configuration.

1. Enable API

Go to APIs & Services, click Enable APIs and Services, and search for "Monitoring".
In the search results, click through to "Cloud Monitoring API". If "API enabled" is not displayed, click the Enable button.
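Alternatively, assuming the gcloud CLI is authenticated against your project, you can enable the API from the command line:

gcloud services enable monitoring.googleapis.com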

2. Enable the Managed Service for Prometheus feature

Create a GKE cluster with the Managed Service for Prometheus feature enabled. Below is a sample command:

gcloud container clusters create CLUSTER_NAME --zone ZONE --enable-managed-prometheus

If you already have a GKE cluster running, you can update the existing cluster to enable Managed Service for Prometheus with the following command.

gcloud container clusters update CLUSTER_NAME --enable-managed-prometheus --zone ZONE
OR
gcloud container clusters update CLUSTER_NAME --enable-managed-prometheus --region REGION
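To confirm that the feature is enabled on an existing cluster, one option (the exact field path may vary across gcloud versions) is:

gcloud container clusters describe CLUSTER_NAME --zone ZONE \
  --format="value(monitoringConfig.managedPrometheusConfig.enabled)"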

Google Cloud Managed Service for Prometheus with OpenTelemetry Collector

The OpenTelemetry Collector is an agent that can be deployed and configured to scrape standard Prometheus metrics and export them to Google Cloud Managed Service for Prometheus.

The Collector consists of three components that access telemetry data:
Receivers:
A receiver defines how data gets into the Collector (push- or pull-based). Receivers may support one or more data sources such as logs, traces, and metrics.
Processors:
Processors run on data between the receive and export stages.
Exporters:
An exporter defines how data is sent to one or more backends/destinations.

The OpenTelemetry Collector offers the following advantages:
1. You can configure different exporters in your pipeline to route your telemetry data to multiple backends (see the sketch after this list).
2. OpenTelemetry supports three types of signals: metrics, logs, and traces.
3. OpenTelemetry supports various libraries and pluggable collector components, which allow you to customize how your data is received, processed, and exported.
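As a hypothetical illustration of the first point, the same metrics pipeline could fan out both to Managed Service for Prometheus and to any OTLP-compatible backend; the otlp endpoint below is an assumption and not part of this setup:

exporters:
  googlemanagedprometheus:
  otlp:
    endpoint: otel-backend.example.com:4317   # assumed OTLP endpoint

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch, memory_limiter, resourcedetection]
      exporters: [googlemanagedprometheus, otlp]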

This blog mainly focuses on using an OpenTelemetry Collector with Managed Service for Prometheus, and on the steps involved in collecting metrics from resources such as Pods (via kubernetes_sd_configs) and GCE VMs (via node exporter) and sending them to Managed Service for Prometheus.

Let’s Start with the Setup:

1. Create an IAM Service Account

Create a Google IAM service account named “gmp-test-sa”.

gcloud config set project PROJECT_ID &&
gcloud iam service-accounts create gmp-test-sa

2. Grant the required permissions to the service account

Grant the IAM service account the Monitoring Metric Writer and Monitoring Viewer roles. Note that add-iam-policy-binding accepts a single --role per invocation, so run the command once per role.

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member=serviceAccount:gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/monitoring.metricWriter

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member=serviceAccount:gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/monitoring.viewer

3. Create K8s Namespace

Now create a Kubernetes namespace named "monitoring" on the GKE cluster on which Managed Service for Prometheus is enabled.

kubectl create namespace monitoring

4. Bind K8s SA with GCP IAM SA

Bind the GCP IAM SA named "gmp-test-sa" to the Kubernetes default SA in the monitoring namespace with the following command.

gcloud iam service-accounts add-iam-policy-binding \
  gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[monitoring/default]"

5. Annotate K8s SA

Annotate the default SA in the monitoring namespace with the GCP IAM SA.

kubectl annotate serviceaccount \
  --namespace monitoring \
  default \
  iam.gke.io/gcp-service-account=gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com

6. Deploy an example application

Now deploy an example application that emits the example_requests_total counter metric and the example_random_numbers histogram metric on its metrics port. The application uses three replicas.

kubectl create namespace app
kubectl -n app apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/main/examples/example-app.yaml
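To confirm the example application is up, you can list its pods; the three replicas should be in the Running state:

kubectl -n app get pods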

7. Create a config.yaml to set up the OpenTelemetry Collector

First, we need to create a configuration and save it as config.yaml, which will be loaded into the OpenTelemetry Collector deployment as a ConfigMap. When the collector is deployed, it mounts the ConfigMap and loads the file.

receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: 'kubernetes-pod'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_label_app]
          action: keep
          regex: prom-example
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: (.+):(?:\d+);(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)

processors:
  resourcedetection:
    detectors: [gcp]
    timeout: 10s

  batch:
    # batch metrics before sending to reduce API usage
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s

  memory_limiter:
    # drop metrics if memory usage gets too high
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20

exporters:
  # Note that the googlemanagedprometheus exporter block is intentionally blank
  googlemanagedprometheus:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch, memory_limiter, resourcedetection]
      exporters: [googlemanagedprometheus]
This is a configuration file for OpenTelemetry Collector that includes a Prometheus receiver to scrape metrics from Kubernetes pods using the kubernetes_sd_configs mechanism, and export them to a Google Managed Prometheus instance using the googlemanagedprometheus exporter.

kubectl -n monitoring create configmap otel-config --from-file config.yaml

8. Deploy the OTEL collector

Create a file called collector-deployment.yaml containing three sections: a ClusterRole, a ClusterRoleBinding, and a Deployment.
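The original manifest is not reproduced here, so the following is only a minimal sketch under a few assumptions: the collector runs as the default ServiceAccount in the monitoring namespace (the one bound via Workload Identity above), mounts the otel-config ConfigMap created earlier, and uses an otel/opentelemetry-collector-contrib image whose tag below is an assumption; pin whichever version you have tested.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector
rules:
# the prometheus receiver's kubernetes_sd (pod role) needs to list and watch pods;
# the extra resources cover other discovery roles if you add them later
- apiGroups: [""]
  resources: ["pods", "nodes", "services", "endpoints"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-collector
subjects:
- kind: ServiceAccount
  name: default
  namespace: monitoring
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      serviceAccountName: default
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector-contrib:0.76.1   # assumed tag
        args:
        - --config=/etc/otel/config.yaml
        volumeMounts:
        - name: otel-config
          mountPath: /etc/otel
      volumes:
      - name: otel-config
        configMap:
          name: otel-config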

Create the Collector deployment in your Kubernetes cluster by running the following command against the monitoring namespace.

kubectl -n monitoring create -f collector-deployment.yaml

9. Deploy the standalone Prometheus frontend UI

We can deploy a standalone Prometheus frontend UI to access and visualize ingested data.

curl https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/main/examples/frontend.yaml |
sed 's/\$PROJECT_ID/PROJECT_ID/' |
kubectl apply -n monitoring -f -

Port-forward the frontend service to your local machine. The following command forwards the service to port 9090.

kubectl -n monitoring port-forward svc/frontend 9090

You can access the standalone Prometheus frontend UI in your browser at the URL http://localhost:9090.

10. Deploy Grafana

Apply the grafana.yaml manifest on your cluster and port-forward the Grafana service to your local machine.

kubectl -n monitoring apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/main/examples/grafana.yaml

The following command forwards the service to port 3000.

kubectl -n monitoring port-forward svc/grafana 3000

You can access Grafana in your browser at the URL http://localhost:3000 with the username:password as admin:admin.

To configure the data source, refer to this link:

https://cloud.google.com/stackdriver/docs/managed-prometheus/query#grafana-datasource
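As a rough sketch (not the official procedure linked above), a Grafana data source provisioning file pointing at the frontend service from step 9 could look like the following, assuming Grafana runs in the same cluster and can resolve the monitoring namespace:

apiVersion: 1
datasources:
- name: Managed Service for Prometheus
  type: prometheus
  access: proxy
  # frontend service deployed in step 9; adjust if you exposed it differently
  url: http://frontend.monitoring.svc:9090
  isDefault: true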

So far, we have enabled managed Prometheus on the GKE cluster, deployed the OTel collector in the monitoring namespace, and deployed the example application in the app namespace.
We have also configured the standalone Prometheus UI and an ephemeral Grafana to visualize the data from Managed Service for Prometheus.

On the Prometheus dashboard, execute the query up{job="kubernetes-pod"} and check the result.

We can see the pods of the example application running in the app namespace.
We can fetch a similar output in the Cloud Monitoring console under Metrics Explorer, in the PromQL query section, as shown below.
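Beyond the up check, a couple of example queries against the metrics emitted by the example application (metric names taken from step 6; treat them as a sketch) would be:

# per-pod request rate over the last 5 minutes
rate(example_requests_total{job="kubernetes-pod"}[5m])

# 95th percentile of the example histogram
histogram_quantile(0.95, sum by (le) (rate(example_random_numbers_bucket{job="kubernetes-pod"}[5m])))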

11. Node Exporter on Linux VM

Now we will see how to scrape metrics from a GCE VM using node exporter. First, create a Linux VM in GCP with the command below.

gcloud compute instances create ubuntu-vm --zone=asia-south1-a \
  --machine-type=e2-medium \
  --create-disk=auto-delete=yes,boot=yes,device-name=ubuntu-vm,image=projects/ubuntu-os-cloud/global/images/ubuntu-2004-focal-v20230302,size=30,type=pd-standard

Configure the user for the node exporter service:

sudo groupadd -f node_exporter
sudo useradd -g node_exporter --no-create-home --shell /bin/false node_exporter
sudo mkdir /etc/node_exporter
sudo chown node_exporter:node_exporter /etc/node_exporter

Download the desired version of node exporter on the VM with the help of the reference link below.

Ref: https://prometheus.io/download/#node_exporter

NODE_EXPORTER_VERSION=1.5.0   # example only; pick the version from the link above
wget https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz
tar xvfz node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz
cd node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64
sudo cp node_exporter /usr/bin/
sudo chown node_exporter:node_exporter /usr/bin/node_exporter

Set up the node exporter systemd service with the below set of commands.

cat << EOF | sudo tee /usr/lib/systemd/system/node_exporter.service

[Unit]
Description=Node Exporter
Documentation=https://prometheus.io/docs/guides/node-exporter/
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
Restart=on-failure
ExecStart=/usr/bin/node_exporter \
--web.listen-address=:9100

[Install]
WantedBy=multi-user.target
EOF


sudo systemctl daemon-reload
sudo systemctl start node_exporter
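Optionally, enable the service so it survives reboots, and do a quick sanity check that metrics are being served on port 9100:

sudo systemctl enable node_exporter
curl -s http://localhost:9100/metrics | head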

Ref: https://developer.couchbase.com/tutorial-node-exporter-setup

Once the VM is set up, update the config.yaml file created for the OTel collector on GKE by adding the job below under scrape_configs, and recreate the otel-config ConfigMap (one way to do this is shown after the snippet).

      - job_name: asia-south1-vms
        gce_sd_configs:
        - project: PROJECT_ID
          zone: 'asia-south1-a'
          port: 9100
        - project: PROJECT_ID
          zone: 'asia-south1-b'
          port: 9100
        - project: PROJECT_ID
          zone: 'asia-south1-c'
          port: 9100
        relabel_configs:
        - source_labels: [__meta_gce_instance_name]
          regex: gke-.*
          action: drop
        - source_labels: [__meta_gce_tags]
          action: keep
        - source_labels: [__meta_gce_private_ip]
          regex: '(.+):(.+)'
          replacement: '${1}:9100'
          target_label: __address__
        - source_labels: [__meta_gce_instance_id]
          target_label: instance_id
        - source_labels: [__meta_gce_instance_name]
          target_label: instance_name

The above job scrapes the GCE VMs in PROJECT_ID across all three zones of the asia-south1 region on port 9100.
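One way to push the updated configuration is to recreate the ConfigMap and restart the collector; the deployment name otel-collector matches the sketch in step 8 and is an assumption if your manifest names it differently:

kubectl -n monitoring delete configmap otel-config
kubectl -n monitoring create configmap otel-config --from-file config.yaml
kubectl -n monitoring rollout restart deployment otel-collector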

Once the new ConfigMap is picked up by the Pod, the OTel collector will start scraping the GCE VMs. We can then run the PromQL query up{job="asia-south1-vms"} in the Monitoring dashboard, as shown below.

We can see the VMs are scraped by Prometheus as per the scraping configuration passed in the OTEL config.

So far we have seen how the OTel Collector is used to scrape the target metrics of Pods and GCE VMs and send them to Managed Service for Prometheus.

In the next blog, we will see how to use the OTel collector to scrape metrics of AWS Lambda functions using the CloudWatch Exporter into Managed Service for Prometheus.

Thank you for Reading!
