Develop (and debug) Prometheus configuration locally using data from remote Prometheus database

Miron Gofer
Published in Israeli Tech Radar
Jul 23, 2020

The Prometheus operator is installed on our GKE cluster, and all the default metrics and dashboards work great. It is now time to add our own app metrics. A very talented developer added the instrumentation and metric exposure, the service discovery did its magic, and all the new metrics are in the time-series database. Some cool Grafana dashboards were added and life is good.

My next task is to add the alerting configuration, meaning firing alerts and configuring Alertmanager to handle them. This is not a trivial task for me: I don’t have much experience with Prometheus configuration syntax and there are a lot of moving parts to keep in sync. I don’t want to do it directly on my running cluster (used by others for dev and QA stuff), but I do want to test my configuration using real application metrics from my ‘live’ cluster.

So this is what I did:

First step: set up a local Kubernetes cluster

I tried all sorts of local Kubernetes solutions. I love k3s managed with k3d. There are other options as well, such as Minikube, Docker Desktop, etc. Any solution that gives you a working Kubernetes cluster will do.

Using k3d I run

k3d create

Since k3d works so fast and easy, I re-create clusters all the time. To avoid adding a new cluster to the kubeconfig every time I recreate one, I just run:

export KUBECONFIG="$(k3d get-kubeconfig --name='k3s-default')"

and my kubectl knows where to go.

Actually, this is it.

But an important note is in place here: I am going to work on two Kubernetes clusters in parallel, so it is very important to make sure every kubectl (and helm) command is executed against the right cluster. I use tmux to separate the environments. In one window I work with the local k3s cluster (exporting this KUBECONFIG) and in another window I work with my ‘live’ cluster, where my app is running and the metrics are coming from (in my case it is a GKE cluster).

Install Prometheus-operator

I used the same command I used on my GKE cluster. Since I use Helm 3, no Tiller installation is required, just Helm. And then:

helm install prometheus stable/prometheus-operator

My base installation had no values.yaml file and used all the defaults of the Helm chart. Developing the values.yaml is what I came here to do. I split it into three files: one for Grafana, one for Prometheus and one for Alertmanager.

I wrote my configurations and rules into the values files. So the next command (after writing the values) was:

helm upgrade prometheus stable/prometheus-operator \
-f ./grafana/values.yaml \
-f ./prometheus/values.yaml \
-f ./alertManager/values.yaml

(I think you can guess my directory layout)
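Just for illustration, here is roughly what my prometheus and alertManager values files could look like at this stage. The alert name, expression and receiver below are made up for the example, and additionalPrometheusRules and alertmanager.config are the keys the stable/prometheus-operator chart exposes for this; check the chart’s default values.yaml for the exact structure before copying:

# prometheus/values.yaml: a single made-up alert rule
additionalPrometheusRules:
- name: my-app-rules
  groups:
  - name: my-app.alerts
    rules:
    - alert: MyAppHighErrorRate        # hypothetical alert name
      expr: rate(http_requests_total{namespace="staging", status=~"5.."}[5m]) > 1   # hypothetical metric
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: High 5xx rate in staging

# alertManager/values.yaml: route everything to one made-up receiver
alertmanager:
  config:
    route:
      receiver: default
      group_by: ['alertname']
    receivers:
    - name: default   # a real receiver would add slack_configs, email_configs, etc.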

Now my new configuration (Prometheus alert rules and Alertmanager configuration) is installed. I know my YAML syntax is correct, and if I want to look at the resulting config I just run:

kubectl port-forward service/prometheus-prometheus-oper-alertmanager 9093

to look at the Alertmanager web UI at http://localhost:9093, and:

kubectl port-forward service/prometheus-prometheus-oper-prometheus 9090

to look at the Prometheus web UI at http://localhost:9090 (tmux can help here too: I open two panes for the port-forward commands to run and use a third for everything else).

This is great. But remember that I want to test it ‘for real’, to see if my alerts work with real app metrics. The local Prometheus db is full of local Kubernetes metrics (and some Prometheus-operator metrics as well) but not my metrics, so nothing will ever trigger my new alerting rules. I need some real data.

Create the tunnel to the remote Prometheus (running on GKE cluster)

To work on my GKE installation I need to log in with my Google account and connect to the VPN. Let’s say I did both. So, in another terminal (tmux window in my case) I use the kubeconfig that I get from gcloud by running:

gcloud container clusters get-credentials <my-cluster-name> --region <my-region>

So now kubectl works with my GKE cluster, and I can port-forward my Prometheus service to a localhost port. I use --address 0.0.0.0 because I need it to be available on an IP that is not the loopback 127.0.0.1; the loopback address can’t be used by the external-address Endpoints object I am about to create. Adding this flag (--address) makes Prometheus reachable on any IP in my laptop’s root namespace. (I work in a secure environment. I don’t think using this setting on a coffee shop wifi is a good idea. Don’t!)

kubectl -n monitoring port-forward --address 0.0.0.0 services/prometheus-prometheus-oper-prometheus 39090:9090

On the local cluster I run only the Prometheus-operator, so I allow myself to run it in the default namespace. On the real GKE cluster it gets its own namespace: monitoring. I use the local port 39090 because port 9090 is already taken by the local Prometheus. (And you can’t open low ports without being root.)

Create external endpoint and ExternalName service

Now I’m going to use one of the coolest features of Kubernetes Services. Services in Kubernetes are normally used to route traffic to a group of pods selected by a selector, but a Service can actually route traffic to any endpoint, including an external IP.
So first, I created such a service, using this yaml file:

apiVersion: v1
kind: Service
metadata:
  name: federate
  labels:
    k8s-app: federate
spec:
  type: ExternalName
  externalName: 10.2.0.44
  clusterIP: ""
  ports:
  - name: prom
    port: 39090
    protocol: TCP
    targetPort: 39090

Let’s have a look at it: I named it federate since I am going to use Prometheus federation to get all the metrics from my remote Prometheus db. The type: ExternalName tells Kubernetes that this Service routes to, well, an external name (or IP). The externalName: points to the IP of my VPN connection; you can use the IP of your machine (again, do not do it if your IP comes from a public, unsecured network). An ExternalName Service is basically a DNS translator: it returns the external name (or IP) instead of the service name. This is why it cannot return localhost or 127.0.0.1; you must use a DNS name or some other IP.

The rest is a regular Service configuration. The thing with ExternalName Services is that, unlike the other types of Services (ClusterIP, NodePort or LoadBalancer), Kubernetes will not create an Endpoints object for you. You need to do it yourself, so I add this section to my .yaml:

---
apiVersion: v1
kind: Endpoints
metadata:
  name: federate
  labels:
    k8s-app: federate
subsets:
- addresses:
  - ip: 10.2.0.44
  ports:
  - name: prom
    port: 39090
    protocol: TCP

Now Kubernetes creates all the necessary routing and I can access the Prometheus server that runs on my GKE cluster via a local service in the k3s cluster running on my laptop. (Actually, I don’t really need the Service object to connect to an external IP; the Endpoints object is enough. But the ServiceMonitor object I am going to set up next needs a Service object; an Endpoints object is not enough for it.)

The only thing left to do is configure my local Prometheus to get the metrics from the remote one. In Prometheus lingo we say: ‘to scrape the metrics from the federate endpoint of the remote server’. Luckily, we use the Prometheus-operator, so adding a scrape configuration is as easy as adding a ServiceMonitor object to the Kubernetes cluster. Therefore I added this section to my yaml:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: federate
  labels:
    k8s-app: federate
    prometheus: kube-prometheus
spec:
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      k8s-app: federate
  endpoints:
  - port: prom
    path: /federate
    params:
      'match[]':
      - '{namespace="staging"}'
    interval: 10s
    honorLabels: true

A ServiceMonitor finds a Service to scrape using the namespaceSelector and selector fields; in my case you can see it points to the Service I created earlier. The endpoint to scrape is the federation endpoint of Prometheus: /federate. But to scrape it, a ‘matcher’ must be provided. In my case, I need all the metrics from the staging namespace (on my remote GKE cluster).
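The match[] parameter takes any Prometheus series selector, and you can pass more than one. For example, a variant of the endpoints section above could federate the staging namespace plus a single job from another namespace (the dev namespace and my-app job name here are made up for the example):

# replaces the endpoints section of the ServiceMonitor above
endpoints:
- port: prom
  path: /federate
  params:
    'match[]':
    - '{namespace="staging"}'
    - '{namespace="dev", job="my-app"}'   # hypothetical second selector
  interval: 10s
  honorLabels: true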

When everything is in place, I can go to the local Prometheus web UI (remember? http://localhost:9090) and run queries on my app metrics. All the metrics are there. And, of course, now I can test my alert configuration against real data.

To read more about:

Kubernetes ExternalName type services — https://kubernetes.io/docs/concepts/services-networking/service/#externalname

Prometheus federation — https://prometheus.io/docs/prometheus/latest/federation/

Prometheus-operator serviceMonitor CRD — https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/getting-started.md
