Kiali Sprint #36 & #37 — GRPC and Kafka enhancements, caching and more
Two Sprints have passed since the last Sprint update blog post. Naturally, two Kiali versions were also released: 1.15 and 1.16. I was going to write the post for Sprint #36, but I got pulled into an important fix and priorities changed.
In case you haven’t heard, Kiali had a security vulnerability. It was pretty bad and affected all versions since 0.4.0. A few members of the team worked on the fix, and the secured Kiali is version 1.15.1. If you run an earlier Kiali version, sorry for the capital letters, but UPDATE NOW!
The vulnerability is published in security bulletin 001. There is a mitigation path if you cannot update your Kiali version. This news was announced on several channels at the time of disclosure. I just want to be sure that people following the Kiali blog are aware of it.
OK, going back… Two Sprints have finished since the last update. This also means there are two recordings (I won’t embed them):
I encourage you to watch the demos. But in case you prefer to read, here are the latest features from the last two Sprints:
GRPC status in metrics
Starting in Kiali 1.15, there is a new GRPC status option in metrics pages. This is where you can find this new option:
To add some background, GRPC status codes are also a new feature of Istio 1.5. Istio versions prior to 1.5 didn’t report GRPC status codes in the telemetry. Now that Istio 1.5 makes this data available, Kiali can take advantage of it.
Please note that this is available in the default telemetry. If you upgraded from a previous version of Istio, or if you customized the default telemetry, you may not have this field available and Kiali won’t be able to use this new data.
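If you want to check by hand whether your telemetry carries the new field, one option is to query Prometheus directly. The sketch below only builds a PromQL string; it is not the query Kiali itself runs. It assumes the standard Istio metric `istio_requests_total` and its `grpc_response_status` label, and the helper name `grpc_status_rate_query` is made up for this example.

```python
def grpc_status_rate_query(namespace: str, workload: str, window: str = "5m") -> str:
    """Build a PromQL query that breaks GRPC traffic down by GRPC status code.

    Assumes the default Istio telemetry, where istio_requests_total carries
    the grpc_response_status label. This helper is illustrative only.
    """
    selector = (
        f'destination_workload_namespace="{namespace}",'
        f'destination_workload="{workload}",'
        'request_protocol="grpc"'
    )
    # Triple braces: a literal "{", the selector, and a literal "}".
    return (
        f"sum(rate(istio_requests_total{{{selector}}}[{window}])) "
        "by (grpc_response_status)"
    )


if __name__ == "__main__":
    print(grpc_status_rate_query("bookinfo", "reviews-v1"))
```

If the query returns no series with a `grpc_response_status` label for a workload that does receive GRPC traffic, your telemetry is probably customized or predates Istio 1.5.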
Health calculation involving GRPC failures
Have you seen an indicator similar to the one at the left? It’s a Health indicator present in several places in Kiali. In the graph’s side panel there is a small one shown when you select a node. The Applications, Workloads and Services list pages and their details pages also show the Health indicator; the one shown in the image at the left is from the Applications details page.
The Health indicator looks exactly the same as in past versions of Kiali. Although it is not evident, starting in Kiali 1.15 the Health indicators take the new GRPC status codes in the telemetry into account to calculate Health and show a degraded or error status, if applicable.
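To make the idea concrete, here is a minimal sketch of how GRPC status codes could feed a health calculation. It is not Kiali’s actual code. The only hard fact it relies on is that GRPC status `0` means OK; the function names and the threshold values are placeholders chosen for this example.

```python
from typing import Dict


def grpc_error_ratio(rate_by_status: Dict[str, float]) -> float:
    """Return the share of GRPC requests that ended with a non-OK status.

    rate_by_status maps a GRPC status code (as a string) to a request rate.
    GRPC status "0" is OK; every other code is counted as an error here.
    """
    total = sum(rate_by_status.values())
    if total == 0:
        return 0.0
    errors = sum(rate for status, rate in rate_by_status.items() if status != "0")
    return errors / total


def health_status(error_ratio: float,
                  degraded_at: float = 0.05,
                  failure_at: float = 0.2) -> str:
    """Map an error ratio to a status. The thresholds are illustrative placeholders."""
    if error_ratio >= failure_at:
        return "failure"
    if error_ratio >= degraded_at:
        return "degraded"
    return "healthy"


if __name__ == "__main__":
    ratio = grpc_error_ratio({"0": 9.0, "14": 1.0})
    print(ratio, health_status(ratio))
```

Before Istio 1.5 reported these codes, a calculation like this had no way to see a failing GRPC call, which is why the indicator can now show degraded or error states that earlier versions missed.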
Kiali’s own caching enabled by default
Kiali makes a lot of requests to the cluster API, Prometheus and Jaeger to compile and synthesize the Service Mesh data and show it in a friendly way. Naturally, the bigger your Service Mesh, the longer Kiali needs to load some screens.
To improve the load time, a caching mechanism was added in Kiali 1.8 to reduce the number of requests to the cluster API, which is the most queried of the three. Caching was turned off by default. Now that we are more confident in the stability of this feature, caching is enabled by default starting in Kiali 1.15, and we hope you see an improvement in performance.
If you want to customize the settings of the Kiali caching mechanism, options are available in the Kiali CR used by the operator.
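For readers unfamiliar with the general technique, the sketch below shows a simple time-based cache placed in front of an expensive fetch function. It only illustrates why caching reduces the number of requests; it is not how Kiali’s cache is implemented, and `TTLCache`, `fetch` and `ttl_seconds` are names invented for this example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Cache results of an expensive fetch function for a fixed time window."""

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            # Still fresh: serve from memory, no request is made.
            return entry[1]
        # Missing or expired: call the backend and remember the answer.
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    calls = []

    def list_services(namespace: str) -> list:
        # Stand-in for a real cluster API request.
        calls.append(namespace)
        return [f"{namespace}/productpage", f"{namespace}/reviews"]

    cache = TTLCache(list_services, ttl_seconds=300)
    cache.get("bookinfo")
    cache.get("bookinfo")
    print(f"backend calls: {len(calls)}")
```

The trade-off is the usual one: a longer time window means fewer requests but a higher chance of showing slightly stale data.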
New Istio config validations
There are five newly added validations. All of them are well documented on Kiali’s website, so I’m just listing them and linking to the relevant section of the website for a detailed description:
- Sidecar validation: egress host not present in Istio’s service registry
- Sidecar validation: no matching workloads for selector
- Sidecar validation: more than one selector-less Sidecar in namespace
- Sidecar validation: multiple Sidecars affecting the same workload
- AuthorizationPolicy validation: host not present in Istio’s service registry
API doc feature superseded by Annotations feature
Kiali has had an API doc feature that could show the documentation of an API-based service if the service exposes a spec file. It was released in Kiali 1.3. You can read the short description in the Sprint #26 post.
Some sprints later, a feature was added to Kiali that lets users configure annotations to display in the Workloads and Services detail pages. This was released in Kiali 1.10. It’s currently undocumented, but you can check the pull request implementing the feature to get an idea of how to configure it.
The API doc feature had stalled, and the Annotations feature is quite generic. We chose to remove the API doc feature and replace it with the Annotations feature. The Annotations feature was improved to include an icon, and it now looks like the following image:
The downside is that Kiali will no longer show the list of endpoints of the API. The Annotations feature will just show a link where that documentation should be available.
Notification of changes while in YAML editor
In Kiali’s Istio Config detail pages there is a YAML editor where resources can be updated. While you are editing, somebody else may edit the resource first. In Kiali 1.15 a polling mechanism was added that runs while you are in the editor and notifies you about changes to the resource. The notification is shown in an overlay:
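As a rough illustration of the polling idea (not Kiali’s implementation), the sketch below compares the `metadata.resourceVersion` of the copy loaded into the editor with a freshly fetched copy. Kubernetes changes that field on every update to a resource, which is what makes the comparison meaningful. The function names, the polling interval and `max_polls` are assumptions made for this example.

```python
import time
from typing import Callable


def resource_version(resource: dict) -> str:
    """Read metadata.resourceVersion, which Kubernetes changes on every update."""
    return resource.get("metadata", {}).get("resourceVersion", "")


def poll_until_changed(loaded: dict,
                       fetch_latest: Callable[[], dict],
                       interval_seconds: float = 5.0,
                       max_polls: int = 10,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True as soon as the resource on the cluster differs from `loaded`.

    fetch_latest is a hypothetical callable that re-reads the resource.
    Returns False if no change was seen after max_polls attempts.
    """
    baseline = resource_version(loaded)
    for _ in range(max_polls):
        if resource_version(fetch_latest()) != baseline:
            return True
        sleep(interval_seconds)
    return False


if __name__ == "__main__":
    in_editor = {"metadata": {"name": "reviews", "resourceVersion": "100"}}
    versions = iter(["100", "100", "101"])

    def fake_fetch() -> dict:
        return {"metadata": {"name": "reviews", "resourceVersion": next(versions)}}

    if poll_until_changed(in_editor, fake_fetch, sleep=lambda _: None):
        print("The resource was modified by someone else.")
```

A UI would show the overlay at the point where this sketch returns `True`, leaving it to the user to decide whether to reload or keep editing.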
Graph Overview in detail pages
In Kiali 1.15 a Graph Overview card was added to the Applications, Workloads and Services detail pages. This card shows the graph of the neighborhood of the entity being displayed. Here, neighborhood means the nodes of the mesh that send traffic to or receive traffic from the displayed entity:
In the screenshot, you may notice that the Show on graph link that was available next to the page title is no longer there. That link was moved under the kebab menu of the Graph Overview card because, contextually, we think it’s a better place for it:
Filter by label in Overview and list pages
Any cluster object can be labeled, and it’s common practice to add labels to workloads, services, namespaces, etc. To make finding objects easier, a label filter was added to the list pages. The label filter was also added to the Overview page to make finding namespaces easier.
The screenshot is from the Workloads list page, but the same option is available in the Overview page and in the Applications and Services list pages.
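Conceptually, a label filter keeps only the objects whose labels match what you asked for. The sketch below shows that idea on plain dictionaries. It assumes that every requested label must match (AND semantics), which may not be exactly how the Kiali filter combines labels, and the function names are invented for this example.

```python
from typing import Dict, List


def matches_labels(labels: Dict[str, str], wanted: Dict[str, str]) -> bool:
    """True if every wanted key/value pair is present in labels (AND semantics)."""
    return all(labels.get(key) == value for key, value in wanted.items())


def filter_by_labels(items: List[dict], wanted: Dict[str, str]) -> List[dict]:
    """Keep the items whose 'labels' mapping satisfies all wanted labels."""
    return [item for item in items if matches_labels(item.get("labels", {}), wanted)]


if __name__ == "__main__":
    workloads = [
        {"name": "reviews-v1", "labels": {"app": "reviews", "version": "v1"}},
        {"name": "reviews-v2", "labels": {"app": "reviews", "version": "v2"}},
        {"name": "ratings-v1", "labels": {"app": "ratings", "version": "v1"}},
    ]
    for workload in filter_by_labels(workloads, {"app": "reviews"}):
        print(workload["name"])
```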
Unified graph display options in one drop-down
This is a screenshot of the old graph toolbar:
Notice the drop-down showing Response time, which is the text of the selected option. This drop-down selects the labels displayed on the graph edges. There is also the Display drop-down. Both drop-downs are about displaying something in the graph, so they were unified, and this is the new look of the toolbar:
A side motivation for this unification is to avoid wasting space that the graph could use. On some screen resolutions, some items of the toolbar wrapped to the next line, which reduced the space available for the graph. With the unified drop-down we expect less wrapping and, therefore, more space for the graph.
Improved graph traffic animation
Let’s see the traffic animation before improvements:
Then, see the animation after the improvement:
The most visible difference in these recordings is probably the animation of the TCP edges (the blue ones). But also watch the reviews box closely. You will notice that the animation on those edges is different too, which also affects the animation going to the ratings service.
We realized that the algorithm of the animation was not properly taking into account the volume of the traffic. It was giving the idea that traffic is equal in almost all edges, but that isn’t true. The improved algorithm will better reflect traffic volume across the graph.
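The actual animation algorithm is not described here, so the following is only a sketch of the general principle: derive an animation speed from each edge’s traffic rate relative to the busiest edge, so that busier edges look busier. The function name, the linear scaling and the speed bounds are all assumptions made for this example.

```python
def animation_speed(rate: float, max_rate: float,
                    min_speed: float = 0.1, max_speed: float = 1.0) -> float:
    """Scale an edge's animation speed linearly with its share of the peak rate.

    Edges with no traffic are not animated (speed 0.0). Any edge with traffic
    gets at least min_speed so that it remains visibly animated.
    """
    if rate <= 0 or max_rate <= 0:
        return 0.0
    fraction = min(rate / max_rate, 1.0)
    return min_speed + (max_speed - min_speed) * fraction


if __name__ == "__main__":
    # Hypothetical request rates per edge, in requests per second.
    edge_rates = {
        "productpage -> reviews": 40.0,
        "reviews -> ratings": 10.0,
        "productpage -> details": 0.0,
    }
    peak = max(edge_rates.values())
    for edge, rate in edge_rates.items():
        print(f"{edge}: {animation_speed(rate, peak):.2f}")
```

An algorithm that ignores the rate would give every active edge roughly the same speed, which is the misleading impression described above.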
Sparklines on graph side panel when selecting edge leading to Service Entry
When you select an edge in the graph, the side panel will show some information which includes charts (sparklines) about the traffic happening on that edge. In some cases, the sparkline is not provided. One such case was an edge leading to a Service Entry.
There was a reason for this omission, but I can no longer remember it 😕. Anyway, in a recent iteration we found that it was no longer blocked, and sparklines are now provided for a Service Entry edge in Kiali 1.16:
Improved Kafka compatibility
There were some issues when using Kafka and trying to inspect the traffic with Kiali, such as missing charts, missing metrics and missing nodes in the graph. Most of these issues were classified as bugs. To improve Kafka compatibility, Joel Takvorian created a Kafka setup to find all the issues with it. The work was tracked in issue #2197; check it if you want more details.
Several things were fixed, and we hope you will have a better experience with Kiali if your environment includes Kafka. Of course, a test setup cannot reveal every issue that may appear in a more complete setup. So, if you use Kafka and find more issues in Kiali, feel free to report them.
Latest videos in Kiali Website
The Kiali website has a “Latest in our blog” section on the homepage. This post is part of the blog and should appear in that section 😃. But we also have a YouTube channel where we post sprint demos, tutorials, guides, etc. So, we added “Latest videos” and “Sprint demos videos” sections to give more visibility to the content shared on the YouTube channel.
Visit the Kiali website to check it out!
Stay in touch
That’s the update for these two Sprints. Let’s hope there won’t be more security issues in Kiali. If you find one, keep in mind that security issues can lead to stolen data, so report it to us, but not publicly. We learned that people need a way to contact the team privately, and we now have a security policy. Read the SECURITY.md file in the main Kiali repository to learn how to report sensitive issues.