Best practice: a k8s Node/POD resource usage and log monitoring system for a multi-k8s cluster environment using open source

allen
4 min read · Aug 17, 2022


It is very difficult to operate Kubernetes properly without observability. Kubernetes has strong advantages as a container orchestrator, such as managing deployment complexity, high availability, and reliability, but in terms of operations it has a steep learning curve and low visibility. Because of this, Kubernetes’ monitoring ecosystem is developing rapidly. Today, we will configure a monitoring system that can monitor the resource usage and logs of Nodes and PODs in a multi-k8s cluster environment using open source.

Clymene-project: https://github.com/Clymene-project/Clymene

The Clymene project is an open source project consisting of agents that collect metrics and logs, plus scalable back-end components. If you have many observation targets and need to handle a large volume of data, you can build the architecture around Kafka; if you want a simpler setup, the agent can store data directly in the database. A variety of databases are supported, including Elasticsearch, Prometheus, and OpenTSDB, so users can keep using a familiar store. In this post, we will build the Kafka-based architecture for a high-load environment, assuming Kafka is already deployed.
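For reference, the simpler Kafka-less layout only needs the agent pointed straight at a database. The snippet below is a minimal sketch, assuming clymene-agent accepts the same STORAGE_TYPE environment variable and --es.server-urls flag that the ingester uses later in this post, and that its image name follows the same pattern as the other components; check the agent options in the Clymene docs before relying on it.

# hedged sketch: clymene-agent writing directly to Elasticsearch, without Kafka
containers:
  - name: clymene-agent
    image: bourbonkk/clymene-agent:latest    # assumed image name, same pattern as the gateway image
    env:
      - name: STORAGE_TYPE
        value: elasticsearch                 # assumed to be supported by the agent as well
    args:
      - --es.server-urls=http://elasticsearch.es:9200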

1. Metric backend Configuration

  • clymene-gateway deploy
# https://github.com/Clymene-project/Clymene/blob/master/k8s/clymene-metrics/3.clymene_gateway.yaml
$ kubectl create -f 3.clymene_gateway.yaml

# 3.clymene_gateway.yaml (excerpt)
containers:
  - name: gateway
    image: bourbonkk/clymene-gateway:latest
    imagePullPolicy: Always
    ports:
      - containerPort: 15694
    args:
      - --kafka.producer.brokers=kafka.kafka:9092  # kafka setting
  • clymene-ingester deploy (stores simultaneously in Prometheus and Elasticsearch)
# https://github.com/Clymene-project/Clymene/blob/master/k8s/clymene-metrics/4.clymene_ingester.yaml
$ kubectl create -f 4.clymene_ingester.yaml

# 4.clymene_ingester.yaml (excerpt)
args:
  - --prometheus.remote.url=http://prometheus-server-http.prometheus:9090/api/v1/write
  - --kafka.consumer.brokers=kafka.kafka:9092
  - --es.server-urls=http://elasticsearch.es:9200
env:
  - name: STORAGE_TYPE
    value: prometheus,elasticsearch  # use prometheus and elasticsearch
  • clymene-agent deploy
# https://github.com/Clymene-project/Clymene/blob/master/k8s/clymene-metrics/1.clymene_rbac.yaml
$ kubectl create -f 1.clymene_rbac.yaml
# https://github.com/Clymene-project/Clymene/blob/master/k8s/clymene-metrics/2.clymene_agent.yaml
$ kubectl create -f 2.clymene_agent.yaml

# Example of using the service discovery function (clymene-agent-config excerpt)
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
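The relabel rule above keeps only PODs that opt in through the common prometheus.io/scrape annotation. As a minimal sketch, a workload can opt in like this; the prometheus.io/port and prometheus.io/path annotations are assumptions that only take effect if the rest of the (elided) relabel_configs maps them, so adjust to your actual config.

# hedged sketch: POD template annotations so the kubernetes-pods job scrapes this workload
metadata:
  annotations:
    prometheus.io/scrape: "true"    # matched by the keep rule shown above
    prometheus.io/port: "8080"      # assumption: needs an additional relabel rule (not shown)
    prometheus.io/path: "/metrics"  # assumption: needs an additional relabel rule (not shown)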

2. Log backend Configuration

  • clymene: promtail-gateway deploy
# https://github.com/Clymene-project/Clymene/blob/master/k8s/clymene-logs/2.clymene-promtail_gateway.yaml
$ kubectl create -f 2.clymene-promtail_gateway.yaml
  • clymene: promtail-ingester deploy (stores simultaneously in Loki and Elasticsearch)
# https://github.com/Clymene-project/Clymene/blob/master/k8s/clymene-logs/3.clymene-promtail_ingester.yaml
$ kubectl create -f 3.clymene-promtail_ingester.yaml

# 3.clymene-promtail_ingester.yaml (excerpt)
args:
  - --kafka.consumer.brokers=kafka.kafka:9092
  - --loki.client.url=http://loki.loki:3100/loki/api/v1/push
  - --es.server-urls=http://elasticsearch.es:9200
  - --log-level=info
env:
  - name: STORAGE_TYPE
    value: loki,elasticsearch
  • clymene: promtail-agent deploy
# https://github.com/Clymene-project/Clymene/blob/master/k8s/clymene-logs/1.clymene-promtail_agent.yaml
$ kubectl create -f 1.clymene-promtail_agent.yaml

# configmap - Example of using the service discovery function (promtail-config excerpt)
- job_name: kubernetes-pods
  pipeline_stages:
    - cri: {}
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels:
        - __meta_kubernetes_pod_controller_name
      regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
      ....
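Before moving on, it is worth a quick sanity check that the agents and back-end components are actually running. The commands below are a hedged sketch; the grep pattern and the DaemonSet assumption come from the manifest file names, not from the manifests themselves, so adjust names and namespaces to your deployment.

# list the Clymene metric and log components across all namespaces
$ kubectl get pods --all-namespaces | grep -E 'clymene|promtail'
# if the promtail agent runs as a DaemonSet (typical for log agents), tail one of its pods
$ kubectl logs -n <namespace> daemonset/<promtail-agent-daemonset> --tail=20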

The back-end configuration of Clymene is now complete. For each k8s cluster that needs to be observed, you must change the cluster label before deploying promtail-agent and clymene-agent into that cluster.

# configmap - clymene-agent-config, promtail-config
....
  target_label: kubernetes_name
- action: replace
  target_label: cluster
  replacement: target-cluster  # set the per-cluster name here
....
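As a hedged sketch of the multi-cluster rollout (the kubectl context name is a placeholder, and it assumes the edited configmaps ship with these manifests), after changing the replacement value in clymene-agent-config and promtail-config for the target cluster, deploy the agent manifests into that cluster using its own context:

$ kubectl --context=second-cluster create -f 1.clymene_rbac.yaml
$ kubectl --context=second-cluster create -f 2.clymene_agent.yaml
$ kubectl --context=second-cluster create -f 1.clymene-promtail_agent.yaml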

Next, let’s import the example Grafana dashboards and visualize the Node/POD data of k8s.

https://github.com/Clymene-project/Clymene/blob/master/grafana/1.clymene-node-monitoring.json
https://github.com/Clymene-project/Clymene/blob/master/grafana/2.clymene-pod-monitoring.json
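These dashboards can be imported through the Grafana UI (Dashboards > Import > Upload JSON file) and then pointed at the Prometheus and Loki data sources. The command below is only a hedged sketch of scripting the same step against Grafana’s HTTP API; the host and API token are placeholders, and if the exported JSON contains data source placeholders the UI import flow is the easier path.

# hedged sketch: push one of the example dashboards to Grafana via its HTTP API
$ curl -sS -X POST "http://<grafana-host>:3000/api/dashboards/db" \
    -H "Authorization: Bearer <grafana-api-token>" \
    -H "Content-Type: application/json" \
    -d "{\"dashboard\": $(cat 1.clymene-node-monitoring.json), \"overwrite\": true}"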

Node visualization

You can view the resource usage of a node using the node-exporter data being collected. You can select a cluster and view only the Node metrics for that cluster.

Detailed resource monitoring is possible by visualizing all resource metrics on the nodes.

You can check the logs of all PODs running on that node, and you can simply modify the query to see only the logs you want. By default, it shows the logs of the PODs running on that node for the last 15 minutes.
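Since the log panels are backed by Loki, narrowing them down is a matter of editing the LogQL selector. A hedged example is shown below; the cluster label comes from the relabel rule added earlier, while any other label names depend on the promtail-config relabel_configs, so treat namespace here as an assumption.

# hedged LogQL example: only error lines from one cluster (namespace label is an assumption)
{cluster="target-cluster", namespace="kube-system"} |= "error"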

POD visualization

Detailed inspection may be required to locate a problematic POD, and to address performance issues and bottlenecks you need to see the various kinds of resource usage.

You can check various resource usage metrics for a POD and also check its logs. Combining resource usage and logs can help you determine the cause of a problem faster.
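For ad-hoc checks outside the dashboard, a PromQL query in Grafana Explore can slice the same data. The example below is a hedged sketch that assumes cAdvisor/kubelet container metrics (container_cpu_usage_seconds_total) are among the metrics being collected; the dashboard’s own queries remain the reference.

# hedged PromQL sketch: per-POD CPU usage in one cluster and namespace over the last 5 minutes
sum by (pod) (rate(container_cpu_usage_seconds_total{cluster="target-cluster", namespace="default"}[5m]))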

We used Clymene, an open source project for metric and log collection, to see how to monitor a multi-k8s cluster environment. Thank you for your support.


allen

I’m allen, the author of the Clymene project, with a strong interest in the observability domain.