Google Managed Prometheus II (Kiali Integration)

Published in

輕鬆小品-k8s的點滴

12 min read · Sep 20, 2022

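In the previous GMP article, we covered how to use the Managed Collector to collect application metrics (via PodMonitoring), with a bonus on configuring OperatorConfig to collect cAdvisor/Kubelet data, and finally showed how to deploy the Frontend as a Prometheus-like data source for Grafana.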

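A customer recently asked me: for many developers and SREs, service observability matters more than other kinds of monitoring, so why do my articles mostly focus on Cloud Monitoring, Logging, and even Google Managed Prometheus?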

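Fair point. My last piece on service observability goes back to the Anthos Service Mesh (ASM) article from my 2020 SRE introduction. Back then Kiali was still a bare-bones tool, but after two years of refinement its feature set has matured considerably. Since someone asked, let's validate the GMP + Istio + Kiali integration.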

How it works:

Istio -> Prometheus:

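At the time of writing, ASM has reached 1.14 while OSS Istio is already at 1.15. Here we use ASM 1.14 (equivalent to OSS Istio 1.14) as the example. Many of you probably want to know how Istio actually sends data to Prometheus. According to the standard Istio documentation, Istio offers three ways to deliver the data: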

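Metrics Merging: during sidecar injection, the prometheus.io scrape annotations are added automatically so that Prometheus picks up the metrics on its own. This is also the default collection mode. How Prometheus initiates the scrape is covered in the next section.
Customized Scraping Configurations: with an existing Prometheus, we need to add the following two jobs to the Prometheus configuration file (or ConfigMap):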

# Collect metrics from istiod and from the sidecars
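For reference, the two jobs described in the Istio Prometheus integration documentation look roughly like the sketch below. The job names `istiod` and `envoy-stats` and the `istio-system` namespace are assumptions taken from that documentation rather than from this cluster; with a revisioned ASM install, the service name in the first regex may need adjusting.

```yaml
# Sketch only: entries to place under scrape_configs in the Prometheus config.
# Job 1: scrape istiod through its Service endpoints.
- job_name: 'istiod'
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - istio-system          # assumed control-plane namespace
  relabel_configs:
  # Keep only the endpoint of service "istiod" whose port is named http-monitoring.
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: istiod;http-monitoring

# Job 2: scrape the Envoy sidecars directly.
- job_name: 'envoy-stats'
  metrics_path: /stats/prometheus
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only container ports whose name ends in -envoy-prom (e.g. http-envoy-prom).
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    action: keep
    regex: '.*-envoy-prom'
```

The first job relies on the port name rather than the port number, so it keeps working if the monitoring port changes as long as the name stays `http-monitoring`.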

Google Managed Prometheus(GMP):

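The previous article mentioned that GMP has two collection modes: Managed Collection and Self-Deployed Collection. Last time we used the former; this time we use the latter (if you want to know how to collect ASM metrics with the Managed Collector, leave a comment).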

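Self-Deployed Collection, as the name suggests, deploys a Prometheus instance inside your own cluster. Compared with the Frontend in Managed Collector mode, this instance behaves much more like OSS Prometheus, so for metrics that rely on native Prometheus usage (e.g. collection by job_name) this approach may feel more intuitive. You might then ask how this differs from deploying Prometheus Operator or OSS Prometheus. In Self-Deployed mode, the Prometheus instance we deploy is an image customized by GCP. The biggest difference from OSS is that this instance only scrapes data the Prometheus way and then writes it into Monarch, Google's TSDB. As administrators, we rely on the Google Monitoring service for writes and no longer need to maintain local storage.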

Assuming ASM is using Metrics Merging, let's first look at what Metrics Merging actually does for us when Istio sidecar injection is in effect.

(1) It adds the annotation prometheus.io/scrape: "true".

(2) It defines a port named http-envoy-prom on the sidecar container.

So if we start a pod and curl our sidecar-injected service, either curl http://pod-ip:15020/stats/prometheus or curl http://pod-ip:15090/stats/prometheus returns Prometheus metrics.

# Pod level
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/default-container: server
    kubectl.kubernetes.io/default-logs-container: server
    prometheus.io/path: /stats/prometheus
    prometheus.io/port: "15020"
    prometheus.io/scrape: "true"
    sidecar.istio.io/status: '{"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["workload-socket","workload-certs","istio-envoy","istio-data","istio-podinfo","istio-token"],"imagePullSecrets":null,"revision":"asm-1143-1"}'
spec:
  containers:
  # sidecar container
  - image: gcr.io/gke-release/asm/proxyv2:1.14.3-asm.1
    imagePullPolicy: IfNotPresent
    name: istio-proxy
    ports:
    - containerPort: 15090
      name: http-envoy-prom
      protocol: TCP

What about the istiod control plane? In ASM, the istiod deployment is named:

"istiod-asm-1143-1" (where 1143-1 is the revision: 1.14.3-1). Its pods also carry the prometheus.io/scrape: "true" annotation (no path is shown here, so prometheus.io/path defaults to /metrics).

apiVersion: v1
kind: Pod
metadata:
  annotations:
    prometheus.io/port: "15014"
    prometheus.io/scrape: "true"

Alternatively, kubectl get endpoints istiod -n istio-system also shows the istiod endpoints and the corresponding http-monitoring port.

apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    endpoints.kubernetes.io/last-change-trigger-time: "2022-09-17T14:32:29Z"
  creationTimestamp: "2022-09-17T14:32:03Z"
  labels:
    app: istiod
subsets:
# addresses omitted
- ports:
  - name: https-dns
    port: 15012
    protocol: TCP
  - name: grpc-xds
    port: 15010
    protocol: TCP
  - name: https-webhook
    port: 15017
    protocol: TCP
  - name: http-monitoring
    port: 15014
    protocol: TCP

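So we only need to curl http://istiod-pod-ip:15014/metrics to see the Prometheus metrics that istiod exposes.

So how should we adjust our Prometheus ConfigMap YAML?

Method 1: scrape pods annotated with prometheus.io/scrape: "true". This automatically feeds the annotation, port, and path that istiod and the sidecars already carry into the Prometheus scrape jobs.

Method 2: add two jobs that scrape the http-monitoring port on the istiod endpoints and the http-envoy-prom port on the sidecars, respectively.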
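As a rough sketch of Method 1, an annotation-driven job in the style of the common Prometheus Kubernetes example configuration could look like the following. The job name `kubernetes-pods` is a hypothetical choice, and the snippet assumes the pods carry the prometheus.io/scrape, prometheus.io/path, and prometheus.io/port annotations shown earlier.

```yaml
# Sketch only: one entry to place under scrape_configs in the Prometheus config.
- job_name: 'kubernetes-pods'   # hypothetical job name
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only pods annotated with prometheus.io/scrape: "true".
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: 'true'
  # Use prometheus.io/path as the metrics path when the annotation is set;
  # otherwise the Prometheus default (/metrics) applies.
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: '(.+)'
  # Rewrite the scrape address to use the port from prometheus.io/port.
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: '([^:]+)(?::\d+)?;(\d+)'
    replacement: $1:$2
    target_label: __address__
```

With this single job, a sidecar-injected pod is scraped on port 15020 at /stats/prometheus and an istiod pod on port 15014 at /metrics, because each target's path and port come from its own annotations.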

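Installation

Install Google Managed Prometheus

As in the previous article, first create a GCP service account with the roles/monitoring.metricWriter role, then use Workload Identity to bind it to the default Kubernetes service account (KSA) in the gmp-test namespace.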

# Create the GCP service account and grant it metric-write permission
gcloud config set project ${PROJECT_ID}
gcloud iam service-accounts create gmp-test-sa
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member=serviceAccount:gmp-test-sa@${PROJECT_ID}.iam.gserviceaccount.com \
  --role=roles/monitoring.metricWriter

# Set up Workload Identity, binding the default KSA in the gmp-test namespace
kubectl create ns gmp-test
gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:${PROJECT_ID}.svc.id.goog[gmp-test/default]" \
  gmp-test-sa@${PROJECT_ID}.iam.gserviceaccount.com
kubectl annotate serviceaccount --namespace gmp-test \
  default iam.gke.io/gcp-service-account=gmp-test-sa@${PROJECT_ID}.iam.gserviceaccount.com

Download the Self-Deployed Prometheus YAML and add the jobs for Method 1 or Method 2 described above to the prometheus-test ConfigMap.

wget https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.4.3-gke.0/examples/prometheus.yaml

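Deploy the modified prometheus.yaml to the gmp-test namespace.

Install ASM

ASM offers two modes, Managed and in-cluster control plane. This article focuses on installing the in-cluster control plane. Our test GKE cluster is Standard v1.22.11-gke.400 (Autopilot currently supports only Managed ASM).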

# Download asmcli and make it executable
curl https://storage.googleapis.com/csm-artifacts/asm/asmcli_1.14 > asmcli
chmod +x asmcli

# Tell ASM to send metrics to Prometheus instead of Cloud Monitoring
# (on-prem Anthos uses Prometheus by default)
./asmcli install \
  --project_id ${PROJECT_ID} \
  --cluster_name ${CLUSTER_NAME} \
  --cluster_location ${CLUSTER_LOCATION} \
  --output_dir ${DIR_PATH}  \
  --enable_all \
  --ca mesh_ca \
  --option prometheus

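Deploy Kiali

Under the --output_dir used during the ASM installation, the istio-1.14.3-asm.1/samples/addons directory contains a kiali.yaml. By default it points to the prometheus service in the istio-system namespace. Because we deliberately placed Prometheus in the gmp-test namespace, we must also modify the ConfigMap settings in kiali.yaml as follows: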

external_services:
  prometheus:
    url: "http://prometheus-test.gmp-test:9090"
  grafana:
    enabled: true
    # Grafana service name is "grafana" and is in the "telemetry" namespace.
    in_cluster_url: 'http://grafana.gmp-test:3000/'
  custom_dashboards:
    enabled: true
  istio:
    root_namespace: istio-system
    istiod_pod_monitoring_port: 15014
    config_map_name: "istio-asm-1143-1"
    istiod_deployment_name: "istiod-asm-1143-1"

Results:

After all that effort, let's look at the result. Kiali is deployed as a ClusterIP service by default, so we reach it by port-forwarding to port 20001.

Written by Shawn Ho