Istio Series — 1: Monitoring: Traffic Flow Diagram for applications

Anand Thiyagarajan
Nov 7, 2021


Hi folks, we have all wanted to visualise the traffic flowing to and from the applications in our environment. Let us see in this discussion how to achieve it. Each application depends on infra components such as elasticsearch, redis, kafka, aerospike, etc., so tracking the traffic flowing in and out of each application is very useful for understanding the overall architecture of our environment. It also helps identify the infra components that most of our applications rely on.

Kiali is one of the handy tools that comes to mind for this, especially when the applications are backed by a service mesh like Istio. But Kiali only shows Ingress-to-service traffic flow diagrams and the corresponding metrics up to the PassthroughCluster. It also shows only the services and components inside the Kubernetes cluster. So with this diagram we cannot visualise the complete architecture.

Given these drawbacks, I started building a more complete traffic flow dashboard using the Service Dependency Graph plugin for Grafana together with the Istio metrics stored in Prometheus.

Plugin Installation

Installing the plugin and setting up a panel are covered in the plugin's official documentation page on Grafana.

Building our Dashboard

Let’s look in detail at the PromQL queries used here and the configuration of the “Panel Options”.

PROMQL Queries:

A: sum without (instance,source_canonical_service,destination_app,source_app,connection_security_policy,destination_canonical_revision,destination_canonical_service,destination_principal,destination_service,destination_service_name,destination_service_namespace,destination_version,destination_workload,destination_workload_namespace,job,source_principal,source_canonical_revision,source_version,source_workload,source_workload_namespace) (label_replace(label_replace(istio_requests_total{reporter="$qrep",destination_workload_namespace=~"$namespace",destination_workload=~"$workload"},"dapp","$1","destination_canonical_service","(.*)"),"sapp","$1","source_canonical_service","(.*)"))

B: sum without (instance,source_app,destination_canonical_service,source_canonical_service,connection_security_policy,destination_canonical_revision,destination_app,destination_principal,destination_service,destination_service_name,destination_service_namespace,destination_version,destination_workload,destination_workload_namespace,job,source_principal,source_canonical_revision,source_version,source_workload,source_workload_namespace) (label_replace(label_replace(istio_requests_total{reporter="$qrep",source_workload_namespace=~"$namespace",source_workload=~"$workload"},"sapp","$1","source_canonical_service","(.*)"),"dapp","$1","destination_service","(.*)"))

C: sum without (instance,source_canonical_service,destination_app,source_app,connection_security_policy,destination_canonical_revision,destination_canonical_service,destination_principal,destination_service,destination_service_name,destination_service_namespace,destination_version,destination_workload,destination_workload_namespace,job,source_principal,source_canonical_revision,source_version,source_workload,source_workload_namespace) (label_replace(label_replace(istio_requests_total{reporter="$qrep",destination_workload_namespace=~"$namespace",destination_workload=~"$workload",response_code=~"5.."},"dapp","$1","destination_canonical_service","(.*)"),"sapp","$1","source_canonical_service","(.*)") / label_replace(label_replace(istio_requests_total{reporter="$qrep",destination_workload_namespace=~"$namespace",destination_workload=~"$workload"},"dapp","$1","destination_canonical_service","(.*)"),"sapp","$1","source_canonical_service","(.*)"))

D: sum without (instance,source_app,destination_canonical_service,source_canonical_service,connection_security_policy,destination_canonical_revision,destination_app,destination_principal,destination_service,destination_service_name,destination_service_namespace,destination_version,destination_workload,destination_workload_namespace,job,source_principal,source_canonical_revision,source_version,source_workload,source_workload_namespace) (label_replace(label_replace(istio_requests_total{reporter="$qrep",source_workload_namespace=~"$namespace",source_workload=~"$workload",response_code=~"5.."},"sapp","$1","source_canonical_service","(.*)"),"dapp","$1","destination_service","(.*)") / label_replace(label_replace(istio_requests_total{reporter="$qrep",source_workload_namespace=~"$namespace",source_workload=~"$workload"},"sapp","$1","source_canonical_service","(.*)"),"dapp","$1","destination_service","(.*)"))

Note: In all the above queries, the istio_requests_total selector should be wrapped in a rate function, i.e. rate(istio_requests_total{...}[x]), where x is the time window. But if traffic is minimal in the environment (say staging), we can omit the rate function so that the traffic flow diagram still appears.
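
As a rough sketch of what this note describes, here is query A with rate() applied to the innermost selector, before the label_replace calls. The 5m window is an assumption and not a value from the original dashboard; in Grafana, the built-in $__rate_interval variable can be used in its place.

```promql
# Sketch: query A with a per-second rate instead of the raw counter.
# The [5m] window is an assumed value; tune it to your scrape interval.
sum without (instance,source_canonical_service,destination_app,source_app,connection_security_policy,destination_canonical_revision,destination_canonical_service,destination_principal,destination_service,destination_service_name,destination_service_namespace,destination_version,destination_workload,destination_workload_namespace,job,source_principal,source_canonical_revision,source_version,source_workload,source_workload_namespace) (
  label_replace(
    label_replace(
      # rate() has to wrap the range selector directly
      rate(istio_requests_total{reporter="$qrep",destination_workload_namespace=~"$namespace",destination_workload=~"$workload"}[5m]),
      "dapp","$1","destination_canonical_service","(.*)"),
    "sapp","$1","source_canonical_service","(.*)")
)
```

The same change applies to queries B, C and D: only the istio_requests_total{...} selectors gain the rate(...[5m]) wrapper, the rest stays as it is.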

PANEL Configuration:

Component Column: sapp
Target Component Column: dapp
Request Rate Column: Value #A
Error Rate Column: Value #C
Request Rate Column (Outgoing): Value #B
Error Rate Column (Outgoing): Value #D
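
One thing worth checking in queries C and D: PromQL's "/" matches series on all labels by default, including response_code, so each 5xx series in the numerator is divided only by the denominator series with the same response_code. If the error figures look off, a variant that aggregates both sides before dividing may be worth trying. The following is only a sketch for the incoming direction; the 5m window and the choice of sum by (sapp, dapp) are my assumptions, not part of the original dashboard.

```promql
# Sketch: share of 5xx responses among incoming requests, per (sapp, dapp) pair.
# Both sides are reduced to the sapp/dapp labels first, so the division
# no longer matches on response_code.
  sum by (sapp, dapp) (
    label_replace(
      label_replace(
        rate(istio_requests_total{reporter="$qrep",destination_workload_namespace=~"$namespace",destination_workload=~"$workload",response_code=~"5.."}[5m]),
        "dapp","$1","destination_canonical_service","(.*)"),
      "sapp","$1","source_canonical_service","(.*)")
  )
/
  sum by (sapp, dapp) (
    label_replace(
      label_replace(
        rate(istio_requests_total{reporter="$qrep",destination_workload_namespace=~"$namespace",destination_workload=~"$workload"}[5m]),
        "dapp","$1","destination_canonical_service","(.*)"),
      "sapp","$1","source_canonical_service","(.*)")
  )
```

The result is a ratio between 0 and 1, and pairs with no 5xx series at all are simply absent from it. Since this result carries only the sapp and dapp labels, it may not line up row for row with query A in the table view, and it is worth confirming in the plugin documentation whether the Error Rate Column expects a ratio or an absolute error rate.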

Note: We can also write queries that return the response time of the traffic flow and use them for the Response Time Column and Response Time Column (Outgoing) fields respectively. The above queries can probably be optimized further (I’m still a beginner).
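
A response time query could look something like the sketch below, which computes the average latency of incoming requests per source and destination pair. It assumes the mesh exposes the standard istio_request_duration_milliseconds histogram (older Istio releases may name it differently), and the 5m window and the sum by (sapp, dapp) grouping are my own choices rather than something taken from the dashboard.

```promql
# Sketch: average response time in milliseconds per (sapp, dapp) pair.
# The histogram's _sum is the total latency, _count is the number of requests.
  sum by (sapp, dapp) (
    label_replace(
      label_replace(
        rate(istio_request_duration_milliseconds_sum{reporter="$qrep",destination_workload_namespace=~"$namespace",destination_workload=~"$workload"}[5m]),
        "dapp","$1","destination_canonical_service","(.*)"),
      "sapp","$1","source_canonical_service","(.*)")
  )
/
  sum by (sapp, dapp) (
    label_replace(
      label_replace(
        rate(istio_request_duration_milliseconds_count{reporter="$qrep",destination_workload_namespace=~"$namespace",destination_workload=~"$workload"}[5m]),
        "dapp","$1","destination_canonical_service","(.*)"),
      "sapp","$1","source_canonical_service","(.*)")
  )
```

For the outgoing direction, the selectors would switch to source_workload_namespace and source_workload, and dapp would be taken from destination_service, mirroring the difference between queries A and B.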

That’s all, our dashboard is ready!!

A few clippings of my dashboard:

The “play” button appears at the top right inside the panel. On clicking it, we can visualise the traffic flow as running dots on the links (as shown below).

In the diagram above, the volume of dots represents the volume of traffic, i.e. the request rate.
If we click on a particular node, it shows the traffic statistics in detail (as shown below):

The above diagram shows the traffic details of the istio-ingressgateway node.
The red colour in a node represents the share of errors in the total requests.

The above diagram shows the incoming traffic to an elasticsearch node (which runs outside the Kubernetes cluster) from different applications.

This satisfies our need for a dashboard that visualises the traffic flow of our complete infrastructure.

I’m working on improving this dashboard further with a more detailed view. For now, I’m contributing this dashboard to the Grafana community and will share it soon.
