Best practice: a k8s Node/POD resource usage and log monitoring system for a multi-k8s cluster environment using open source

allen
4 min read · Aug 17, 2022


It is very difficult to operate Kubernetes properly without observability. Kubernetes has strong advantages as a container orchestrator, such as managing deployment complexity, high availability, and reliability, but in terms of operations it has a steep learning curve and low visibility. Because of this, Kubernetes’ monitoring ecosystem is developing rapidly. Today, we will configure a monitoring system that can monitor the resource usage and logs of Nodes and PODs in a multi-k8s cluster environment using open source.

Clymene-project: https://github.com/Clymene-project/Clymene

The Clymene project is an open source project consisting of agents that collect metrics and logs, plus scalable back-end components. If you have many observation targets and need to handle a large volume of data, you can build the architecture around Kafka; if you want a simpler setup, the agent can store data directly in the database. A variety of databases are supported, including Elasticsearch, Prometheus, and OpenTSDB, so users can keep using a familiar store. In this post, we will build the Kafka-based architecture for a high-load environment, assuming Kafka is already deployed.
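For reference, the simpler Kafka-less layout only needs the agent pointed straight at a database. The snippet below is a minimal sketch, assuming clymene-agent accepts the same STORAGE_TYPE environment variable and --es.server-urls flag that the ingester uses later in this post, and that its image name follows the same pattern as the other components; check the agent options in the Clymene docs before relying on it.

# hedged sketch: clymene-agent writing directly to Elasticsearch, without Kafka
containers:
  - name: clymene-agent
    image: bourbonkk/clymene-agent:latest    # assumed image name, same pattern as the gateway image
    env:
      - name: STORAGE_TYPE
        value: elasticsearch                 # assumed to be supported by the agent as well
    args:
      - --es.server-urls=http://elasticsearch.es:9200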

1. Metric backend Configuration

  • clymene-gateway deploy
# https://github.com/Clymene-project/Clymene/blob/master/k8s/clymene-metrics/3.clymene_gateway.yaml
$ kubectl create -f 3.clymene_gateway.yaml

# 3.clymene_gateway.yaml (excerpt)
containers:
  - name: gateway
    image: bourbonkk/clymene-gateway:latest
    imagePullPolicy: Always
    ports:
      - containerPort: 15694
    args:
      - --kafka.producer.brokers=kafka.kafka:9092  # kafka setting
  • clymene-ingester deploy (stores simultaneously in Prometheus and Elasticsearch)
# https://github.com/Clymene-project/Clymene/blob/master/k8s/clymene-metrics/4.clymene_ingester.yaml
$ kubectl create -f 4.clymene_ingester.yaml

# 4.clymene_ingester.yaml (excerpt)
args:
  - --prometheus.remote.url=http://prometheus-server-http.prometheus:9090/api/v1/write
  - --kafka.consumer.brokers=kafka.kafka:9092
  - --es.server-urls=http://elasticsearch.es:9200
env:
  - name: STORAGE_TYPE
    value: prometheus,elasticsearch  # use prometheus and elasticsearch
  • clymene-agent deploy
# https://github.com/Clymene-project/Clymene/blob/master/k8s/clymene-metrics/1.clymene_rbac.yaml
$ kubectl create -f 1.clymene_rbac.yaml
# https://github.com/Clymene-project/Clymene/blob/master/k8s/clymene-metrics/2.clymene_agent.yaml
$ kubectl create -f 2.clymene_agent.yaml

# Example of using the service discovery function (clymene-agent-config excerpt)
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
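The relabel rule above keeps only PODs that opt in through the common prometheus.io/scrape annotation. As a minimal sketch, a workload can opt in like this; the prometheus.io/port and prometheus.io/path annotations are assumptions that only take effect if the rest of the (elided) relabel_configs maps them, so adjust to your actual config.

# hedged sketch: POD template annotations so the kubernetes-pods job scrapes this workload
metadata:
  annotations:
    prometheus.io/scrape: "true"    # matched by the keep rule shown above
    prometheus.io/port: "8080"      # assumption: needs an additional relabel rule (not shown)
    prometheus.io/path: "/metrics"  # assumption: needs an additional relabel rule (not shown)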

2. Log backend Configuration

  • clymene: promtail-gateway deploy
# https://github.com/Clymene-project/Clymene/blob/master/k8s/clymene-logs/2.clymene-promtail_gateway.yaml
$ kubectl create -f 2.clymene-promtail_gateway.yaml
  • clymene: promtail-ingester deploy (stores simultaneously in Loki and Elasticsearch)
# https://github.com/Clymene-project/Clymene/blob/master/k8s/clymene-logs/3.clymene-promtail_ingester.yaml
$ kubectl create -f 3.clymene-promtail_ingester.yaml

# 3.clymene-promtail_ingester.yaml (excerpt)
args:
  - --kafka.consumer.brokers=kafka.kafka:9092
  - --loki.client.url=http://loki.loki:3100/loki/api/v1/push
  - --es.server-urls=http://elasticsearch.es:9200
  - --log-level=info
env:
  - name: STORAGE_TYPE
    value: loki,elasticsearch
  • clymene: promtail-agent deploy
# https://github.com/Clymene-project/Clymene/blob/master/k8s/clymene-logs/1.clymene-promtail_agent.yaml
$ kubectl create -f 1.clymene-promtail_agent.yaml

# configmap - Example of using the service discovery function (promtail-config excerpt)
- job_name: kubernetes-pods
  pipeline_stages:
    - cri: {}
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels:
        - __meta_kubernetes_pod_controller_name
      regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
      ....
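Before moving on, it is worth a quick sanity check that the agents and back-end components are actually running. The commands below are a hedged sketch; the grep pattern and the DaemonSet assumption come from the manifest file names, not from the manifests themselves, so adjust names and namespaces to your deployment.

# list the Clymene metric and log components across all namespaces
$ kubectl get pods --all-namespaces | grep -E 'clymene|promtail'
# if the promtail agent runs as a DaemonSet (typical for log agents), tail one of its pods
$ kubectl logs -n <namespace> daemonset/<promtail-agent-daemonset> --tail=20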

The back-end configuration of Clymene is now complete. For each k8s cluster that needs to be observed, you must change the cluster label before deploying promtail-agent and clymene-agent into that cluster.

# configmap - clymene-agent-config, promtail-config
....
  target_label: kubernetes_name
- action: replace
  target_label: cluster
  replacement: target-cluster  # set the per-cluster name here
....
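As a hedged sketch of the multi-cluster rollout (the kubectl context name is a placeholder, and it assumes the edited configmaps ship with these manifests), after changing the replacement value in clymene-agent-config and promtail-config for the target cluster, deploy the agent manifests into that cluster using its own context:

$ kubectl --context=second-cluster create -f 1.clymene_rbac.yaml
$ kubectl --context=second-cluster create -f 2.clymene_agent.yaml
$ kubectl --context=second-cluster create -f 1.clymene-promtail_agent.yaml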

Next, let’s import the example Grafana dashboards and visualize the Node/POD data of k8s.

https://github.com/Clymene-project/Clymene/blob/master/grafana/1.clymene-node-monitoring.json
https://github.com/Clymene-project/Clymene/blob/master/grafana/2.clymene-pod-monitoring.json
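These dashboards can be imported through the Grafana UI (Dashboards > Import > Upload JSON file) and then pointed at the Prometheus and Loki data sources. The command below is only a hedged sketch of scripting the same step against Grafana’s HTTP API; the host and API token are placeholders, and if the exported JSON contains data source placeholders the UI import flow is the easier path.

# hedged sketch: push one of the example dashboards to Grafana via its HTTP API
$ curl -sS -X POST "http://<grafana-host>:3000/api/dashboards/db" \
    -H "Authorization: Bearer <grafana-api-token>" \
    -H "Content-Type: application/json" \
    -d "{\"dashboard\": $(cat 1.clymene-node-monitoring.json), \"overwrite\": true}"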

Node visualization

You can view the resource usage of a node using the node-exporter data being collected. You can select a cluster and view only the Node metrics for that cluster.

Detailed resource monitoring is possible by visualizing all resource metrics on the nodes.

You can check the logs of all PODs running on that node, and you can simply modify the query to see only the logs you want. By default, it shows the logs of the PODs running on that node for the last 15 minutes.
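Since the log panels are backed by Loki, narrowing them down is a matter of editing the LogQL selector. A hedged example is shown below; the cluster label comes from the relabel rule added earlier, while any other label names depend on the promtail-config relabel_configs, so treat namespace here as an assumption.

# hedged LogQL example: only error lines from one cluster (namespace label is an assumption)
{cluster="target-cluster", namespace="kube-system"} |= "error"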

POD visualization

Detailed inspection may be required to locate a problematic POD, and to address performance issues and bottlenecks you need to see the various kinds of resource usage.

You can check various resource usage metrics for a POD and also check its logs. Combining resource usage and logs can help you determine the cause of a problem faster.
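For ad-hoc checks outside the dashboard, a PromQL query in Grafana Explore can slice the same data. The example below is a hedged sketch that assumes cAdvisor/kubelet container metrics (container_cpu_usage_seconds_total) are among the metrics being collected; the dashboard’s own queries remain the reference.

# hedged PromQL sketch: per-POD CPU usage in one cluster and namespace over the last 5 minutes
sum by (pod) (rate(container_cpu_usage_seconds_total{cluster="target-cluster", namespace="default"}[5m]))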

We used Clymene, an open source project for metric and log collection, to see how to monitor a multi-k8s cluster environment. Thank you for your support.


allen

I’m allen, the author of the Clymene project, with a strong interest in the observability domain.