Blackbox monitoring autodiscovery in Kubernetes

Published in Juro Tech, Aug 4, 2023

In the world of software engineering, it’s essential to monitor the availability and performance of your applications. There are some crucial questions we can ask ourselves:

Is the system functioning as expected or are there any unexpected outputs or errors?
Is the system performing optimally in terms of speed, responsiveness, and throughput?
How well is the system performing from the end user’s perspective? Are the system’s load times, response times, and transaction success rates within acceptable limits?
Are there any problems caused by external factors, such as network issues, server overload, or third-party services?
Can we monitor and ensure the performance and functionality of different systems or services, irrespective of their internal design or technology stack?
Do we have a redundant monitoring layer that can catch system-level failures not detected by internal component-level monitoring?
Are we meeting the output and performance standards defined by our Service Level Agreements (SLAs) and Service Level Objectives (SLOs)?

Blackbox monitoring answers all of those questions and more.

Overview

What is Blackbox Monitoring?

Blackbox monitoring is a type of application monitoring that focuses on the behavior of an application from the outside, without needing any access to the application's code or internal workings. It's called blackbox monitoring because the application is treated as a black box: we have no insight into its internals, and only its input/output behavior is observed, through the probe results.

How Does Blackbox Monitoring Work?

Blackbox monitoring involves monitoring the input and output of an application, without needing any knowledge of the application’s code or internal workings. It can be done in a number of ways, such as:

Network monitoring: Monitoring the traffic going in and out of the application, including HTTP/S, DNS, TCP, ICMP, gRPC requests, API calls, and database queries.
End-user monitoring: Monitoring the application’s user interface from the perspective of the end-user, including page load times, error rates, and user behavior.
Synthetic monitoring: Creating synthetic transactions to simulate real-world user behavior, and monitoring the application’s response to these transactions.

Why is Blackbox Monitoring Important?

Blackbox monitoring is important because it allows you to monitor the behavior of an application from the outside, without needing any access to the application’s code or internal workings. This means that you can monitor the performance of an application in production, where it’s difficult or impossible to modify the code or configuration. Blackbox monitoring is also useful for monitoring third-party services and APIs, which you don’t have control over.

Blackbox monitoring provides insights into the behavior of an application that can help you identify performance issues, diagnose errors, and improve the overall user experience. By monitoring the input and output of an application, you can detect problems such as slow response times, errors, and downtime, and take proactive measures to resolve them before they impact your users.

Blackbox monitoring in Kubernetes

What is blackbox-exporter?

Blackbox Exporter is an open-source tool developed by the Prometheus community that enables blackbox monitoring by sending probes to endpoints, following the multi-target exporter pattern. It exposes the probe results as metrics in the Prometheus format, making it easy to scrape and analyze the data with Prometheus or other compatible monitoring tools.
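For reference, here is a minimal static scrape job in the multi-target pattern, adapted from the blackbox_exporter README. Prometheus scrapes the exporter's /probe endpoint once per target and passes the real target and the module as URL parameters (roughly GET /probe?target=https://example.com&module=http_2xx). The example.com targets and the exporter address are placeholders.

```yaml
scrape_configs:
  - job_name: "blackbox/static"
    metrics_path: /probe
    params:
      module: [http_2xx]  # sent as the ?module= URL parameter
    static_configs:
      - targets:
          - https://example.com         # placeholder target
          - https://example.com/health  # placeholder target
    relabel_configs:
      # Pass the original target to the exporter as the ?target= URL parameter
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Actually scrape the blackbox exporter itself (placeholder address)
      - target_label: __address__
        replacement: blackbox-exporter:9115
```

The autodiscovery jobs later in this article follow the same three relabeling steps; they replace static_configs with kubernetes_sd_configs so the target list comes from the cluster.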

There are numerous guides on how to deploy blackbox-exporter in Kubernetes with common configurations. Below we’ll focus on how to configure auto-discovery of targets for the blackbox exporter in Kubernetes.

Blackbox-exporter autodiscovery

Most configurations of blackbox-exporter in Kubernetes are based on a static list of hosts. However, as the number of services grows, we need an automatic way to find and monitor new services as they are added to our environment. This removes the need for manual configuration every time something new is deployed.

Prometheus already supports auto-discovery of Kubernetes endpoints, services and ingresses, so we can reuse that mechanism for the blackbox exporter. At Juro we currently use Grafana Agent and Mimir rather than a standalone Prometheus. The Grafana Agent Operator accepts additional Prometheus scrape configurations from a Kubernetes Secret.
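As a sketch, assuming the Grafana Agent Operator, such a Secret is wired in through the additionalScrapeConfigs field of a MetricsInstance resource, which points at one key of the Secret. The instance name and the Mimir remote-write URL below are hypothetical; the Secret name and key match the manifest later in this article.

```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
  name: primary              # hypothetical instance name
  namespace: monitoring
spec:
  # Extra Prometheus-style scrape jobs, read from a key of a Secret
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: ingress-and-services-autodiscovery.yaml
  remoteWrite:
    # Hypothetical Mimir push endpoint; adjust to your installation
    - url: http://mimir-nginx.mimir.svc:80/api/v1/push
```

Prometheus Operator users have a similarly named spec.additionalScrapeConfigs field on the Prometheus resource.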

Below is the auto-discovery configuration for services and ingresses, packaged as such a Secret.

---
apiVersion: v1
kind: Secret
metadata:
  name: additional-scrape-configs
  namespace: monitoring
stringData:
  ingress-and-services-autodiscovery.yaml: |
    # More can be found here:
    # - https://github.com/prometheus/prometheus/blob/main/documentation/examples/prometheus-kubernetes.yml
    # - https://github.com/prometheus-community/helm-charts/blob/main/charts/prometheus/values.yaml
    - job_name: "blackbox/services"

      metrics_path: /probe
      params:
        module: [http_2xx]

      kubernetes_sd_configs:
        - role: service

      relabel_configs:
        # Add the annotation blackbox.io/enabled: "true" to enable blackbox monitoring
        - source_labels: [__meta_kubernetes_service_annotation_blackbox_io_enabled]
          action: keep
          regex: true
        # Skip service ports numbered 443
        - source_labels: [__meta_kubernetes_service_port_number]
          regex: "443"
          action: drop
        # Default scheme for the probed URL. A temporary label is used instead of
        # __scheme__, because __scheme__ also decides how Prometheus scrapes the
        # blackbox exporter itself (https here would break the exporter scrape)
        - target_label: __tmp_scheme
          replacement: http
        # Use the service port name as the scheme if it is "http" or "https"
        - source_labels: [__meta_kubernetes_service_port_name]
          regex: "(https?)"
          target_label: __tmp_scheme
          replacement: "$1"
        # Overwrite the scheme using blackbox.io/scheme annotation
        - source_labels: [__meta_kubernetes_service_annotation_blackbox_io_scheme]
          regex: "(http|https)"
          target_label: __tmp_scheme
          replacement: "$1"
        # If no blackbox.io/path annotation is specified, default to /api/health
        - source_labels: [__tmp_scheme, __address__]
          regex: "(.+);(.+):(.+)"
          target_label: __param_target
          replacement: "${1}://$2:$3/api/health"
        # Overwrite the path using blackbox.io/path annotation (must start with "/")
        - source_labels:
            [
            __tmp_scheme,
            __address__,
            __meta_kubernetes_service_annotation_blackbox_io_path,
            ]
          regex: "(.+);(.+);(.+)"
          replacement: ${1}://${2}${3}
          target_label: __param_target
        # Overwrite the module using blackbox.io/module annotation
        - action: labelmap
          regex: __meta_kubernetes_service_annotation_blackbox_io_module
          replacement: __param_module
        - target_label: __address__
          replacement: blackbox-exporter:9115
        - source_labels: [__param_target]
          target_label: instance
        - source_labels: [__meta_kubernetes_namespace]
          target_label: namespace
        - source_labels: [__meta_kubernetes_service_name]
          target_label: service
        # Specify 'Host' header for probe requests using blackbox.io/hostname annotation
        # https://github.com/prometheus/blackbox_exporter#prometheus-configuration
        - source_labels: [__meta_kubernetes_service_annotation_blackbox_io_hostname]
          target_label: __param_hostname
        - source_labels: [__meta_kubernetes_service_annotation_blackbox_io_hostname]
          target_label: hostname

        # https://grafana.com/docs/agent/latest/operator/add-custom-scrape-jobs/
        - action: hashmod
          modulus: $(SHARDS)
          source_labels:
          - __address__
          target_label: __tmp_hash
        - action: keep
          regex: $(SHARD)
          source_labels:
          - __tmp_hash

    # Example scrape config for probing ingresses via the Blackbox Exporter.
    #
    # The relabeling allows the actual ingress scrape endpoint to be configured
    # for all or only some services.
    - job_name: "blackbox/ingresses"

      metrics_path: /probe
      params:
        module: [http_juro_health] # Custom module; it must be defined in the blackbox exporter configuration

      kubernetes_sd_configs:
        - role: ingress

      relabel_configs:
        # Add the annotation blackbox.io/enabled: "true" to enable blackbox monitoring
        - source_labels: [__meta_kubernetes_ingress_annotation_blackbox_io_enabled]
          action: keep
          regex: true
        # If no blackbox.io/path annotation is specified, default to /api/health
        - source_labels:
            [
              __meta_kubernetes_ingress_scheme,
              __address__,
            ]
          regex: (.+);(.+)
          replacement: ${1}://${2}/api/health
          target_label: __param_target
        # Overwrite the path using blackbox.io/path annotation
        - source_labels:
            [
              __meta_kubernetes_ingress_scheme,
              __address__,
              __meta_kubernetes_ingress_annotation_blackbox_io_path,
            ]
          regex: (.+);(.+);(.+)
          replacement: ${1}://${2}${3}
          target_label: __param_target
        # Overwrite the module using blackbox.io/module annotation
        - action: labelmap
          regex: __meta_kubernetes_ingress_annotation_blackbox_io_module
          replacement: __param_module
        - target_label: __address__
          replacement: blackbox-exporter:9115
        - source_labels: [__param_target]
          target_label: instance
        - source_labels: [__meta_kubernetes_namespace]
          target_label: namespace
        - source_labels: [__meta_kubernetes_ingress_name]
          target_label: ingress
        # https://grafana.com/docs/agent/latest/operator/add-custom-scrape-jobs/
        - action: hashmod
          modulus: $(SHARDS)
          source_labels:
          - __address__
          target_label: __tmp_hash
        - action: keep
          regex: $(SHARD)
          source_labels:
          - __tmp_hash
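The ingresses job expects a custom http_juro_health module in the blackbox exporter's own configuration file (blackbox.yml). The article does not show that module, so the definition below is purely hypothetical: an HTTP probe that accepts only a 200 response and, as an illustrative assumption, checks the response body. The http_2xx entry mirrors the module of the same name in the exporter's example configuration.

```yaml
modules:
  # Generic module used by the services job
  http_2xx:
    prober: http
  # Hypothetical definition of the custom module used by the ingresses job
  http_juro_health:
    prober: http
    timeout: 5s
    http:
      method: GET
      valid_status_codes: [200]
      preferred_ip_protocol: ip4
      # Hypothetical body check; match whatever your health endpoint returns
      fail_if_body_not_matches_regexp:
        - "ok"
```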

Services:

To enable blackbox monitoring for a service, add the blackbox.io/enabled: "true" annotation.
The scheme is derived from the service port name: a port named http or https sets that scheme, and anything else defaults to http. It can be overridden with the blackbox.io/scheme annotation.
The default health path for a service is /api/health, but it can be overridden with the blackbox.io/path annotation (the value must start with "/").
The default module (http_2xx) can also be overridden with the blackbox.io/module annotation. The main use case is custom modules: for example, most of your endpoints are tested with a custom module X, but one specific endpoint needs a different module Y.
The Host header sent with the probe can be set with the blackbox.io/hostname annotation.
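For example, a Service annotated like the one below (name, namespace, port and path are hypothetical) would be picked up by the blackbox/services job. Its port is named http, so the scheme is http, and with the path annotation the probed target would be http://example-api.default.svc:8080/healthz.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-api              # hypothetical service name
  namespace: default
  annotations:
    blackbox.io/enabled: "true"  # quoted: annotation values must be strings
    blackbox.io/path: "/healthz" # overrides the default /api/health
    # Optional overrides:
    # blackbox.io/scheme: "https"
    # blackbox.io/module: "http_2xx"
    # blackbox.io/hostname: "api.example.com"
spec:
  selector:
    app: example-api
  ports:
    - name: http                 # "http" or "https" sets the default scheme
      port: 8080
      targetPort: 8080
```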

Ingresses:

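For ingresses, the same annotations are available except blackbox.io/scheme and blackbox.io/hostname. The scheme comes from the ingress itself (https when TLS is configured for the host, otherwise http), and the default module is http_juro_health.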
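Similarly, a hypothetical Ingress like the one below would be probed by the blackbox/ingresses job with the default http_juro_health module. Because TLS is configured for its host, the discovered scheme is https, so the probed target would be https://app.example.com/api/health.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-web              # hypothetical ingress name
  namespace: default
  annotations:
    blackbox.io/enabled: "true"
    # Optional: blackbox.io/path, blackbox.io/module
spec:
  tls:
    - hosts: [app.example.com]   # TLS for this host makes the scheme https
      secretName: example-web-tls
  rules:
    - host: app.example.com      # becomes the address of the probe target
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-web
                port:
                  number: 80
```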

Probe custom resource

An alternative approach for Prometheus Operator users is the Probe custom resource, instead of the auto-discovery mechanism. This requires the Prometheus Operator (or the Grafana Agent Operator) to be installed in your cluster. Note that, at the time of writing, the Probe CR has no straightforward way to discover services or endpoints within the cluster: it only supports ingress discovery or static targets. If you need to probe Service or Endpoints objects, you therefore have to configure static probes. In that case, annotation-based auto-discovery scales better as the number of targets grows, and it is easier to configure, since enabling a target only requires adding annotations.
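As a rough sketch of the static alternative, a Probe for a single in-cluster Service could look like this. The resource name, job name and target URL are hypothetical, and the prober address assumes the exporter Service lives in the monitoring namespace. Every additional Service needs another entry under static, which is the manual work the annotation approach avoids.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: example-api-health           # hypothetical name
  namespace: monitoring
spec:
  jobName: blackbox/static-services  # hypothetical job name
  interval: 30s
  module: http_2xx
  prober:
    # host:port of the blackbox exporter, without a scheme
    url: blackbox-exporter.monitoring.svc:9115
    path: /probe
  targets:
    staticConfig:
      static:
        # Each Service has to be listed explicitly
        - http://example-api.default.svc:8080/api/health
      labels:
        service: example-api
```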