Starting with Istio

Rich Marshall
Wealth Wizards Engineering
Jun 12, 2018

We’ve been talking about using Istio for nearly a year now. Digging into a recurring problem has now given us the excuse we’ve been looking for to get started with it.

Our apps each expose a “health” endpoint. This is really just a smoke test to confirm that all the things the app relies on (downstream web APIs, databases, and so on) are available to it. A request to /health returns a JSON payload and a relevant HTTP response code. The payload tells us what is or isn’t available, and therefore what is likely to lead to a failed request.
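To make that concrete, here’s a minimal sketch of the kind of handler we mean, written in Python with Flask. The dependency name and URL are placeholders rather than our real services:

```python
# Minimal sketch of a /health endpoint that reports on downstream dependencies.
# "pricing-api" and its URL are hypothetical stand-ins for real dependencies.
from flask import Flask, jsonify
import requests

app = Flask(__name__)

def check_pricing_api():
    # Any non-2xx response or a timeout counts as the dependency being unavailable.
    try:
        return requests.get("http://pricing-api/health", timeout=2).ok
    except requests.RequestException:
        return False

@app.route("/health")
def health():
    checks = {"pricing-api": check_pricing_api()}
    healthy = all(checks.values())
    # 200 if everything responded, 503 otherwise; the payload says which check failed.
    return jsonify(status="ok" if healthy else "degraded", checks=checks), (200 if healthy else 503)
```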

Over time we’ve seen these health checks flap, and we often struggle to find what caused a given flap. The downstream dependencies all appear to be healthy and show no signs of having failed (occasionally we _will_ see a cascade of failures, and knowing which one is the root cause is difficult, but that’s for another time).

We’re increasingly coming to suspect that a number of these health failures lie within the Kubernetes networking subsystem. The kube-proxy mechanism is a clever manipulation of iptables that effectively provides a cluster-wide, host-neutral IP address from which to serve your Kubernetes services. This is part of the magic of Kubernetes, and it’s what allows us, as consumers of the kube service, to worry less about where something is running. The problem we’re seeing suggests the loss of a request packet somewhere within the system. The kube proxy doesn’t provide a level of logging that helps us solve the problem, and so we were a little stumped about where to go next.
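With nothing useful coming out of the kube proxy itself, the best we can do is watch from the application side. This is the sort of crude probe we could run from inside the cluster to catch a dropped request in the act; the service name and interval are placeholders, not our real configuration:

```python
# Crude in-cluster probe: hit a service's cluster DNS name in a loop and log
# anything that fails or times out. The service name below is hypothetical.
import time
import requests

SERVICE_URL = "http://pricing-api.default.svc.cluster.local/health"

while True:
    try:
        r = requests.get(SERVICE_URL, timeout=2)
        if r.status_code != 200:
            print(f"{time.strftime('%H:%M:%S')} non-200 response: {r.status_code}")
    except requests.RequestException as e:
        # A lost request packet typically surfaces here as a connect or read timeout.
        print(f"{time.strftime('%H:%M:%S')} request failed: {e}")
    time.sleep(1)
```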

We’re already making use of the Weave network overlay within kube, so my first port of call was to see if anyone in the Weave community had seen something similar or could suggest some answers. The reply I got was to use an application proxy which would handle things like retrying requests. We already use nginx as an app proxy within the cluster, in the role of ingress controller, and while this does offer retries, the ingress controller is really concerned with traffic coming into the cluster from outside, or North-South traffic as it’s often called. We were seeing problems with traffic which was essentially already within the cluster, which is referred to as East-West. The folks on the Weave Slack channel offered Envoy as a possible solution. It’s a small, lightweight application proxy which is finding friends in the kube community, and this is where Istio comes in.
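To make the retry point concrete, this is roughly the logic an application-layer proxy like Envoy can take off our hands, sketched in Python. Writing this by hand in every client is exactly the boilerplate a proxy saves us from; the URL and retry settings here are illustrative only:

```python
# Sketch of client-side retries with exponential backoff, i.e. the behaviour an
# application proxy can provide transparently instead of each service doing this.
import time
import requests

def get_with_retries(url, attempts=3, backoff=0.2):
    last_error = None
    for attempt in range(attempts):
        try:
            response = requests.get(url, timeout=2)
            if response.status_code < 500:
                return response          # success, or a client error not worth retrying
            last_error = f"HTTP {response.status_code}"
        except requests.RequestException as e:
            last_error = e               # e.g. a lost packet surfacing as a timeout
        time.sleep(backoff * (2 ** attempt))  # back off before the next attempt
    raise RuntimeError(f"all {attempts} attempts failed: {last_error}")
```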

Istio is a “service mesh” which makes use of the Envoy proxy, building on it to provide a system-aware application layer within the cluster. Istio provides a control plane within the cluster and installs an Envoy proxy into each application pod as a sidecar. By default, Envoy will intercept all traffic into and out of a pod. Communication between the control plane and the proxies gives Istio a network-traffic view of what’s happening within the cluster, and it also gives the operator the ability to control how traffic flows within it. It also allows us to wrap a further layer of security around traffic within the cluster by establishing TLS tunnels between pods, should you wish.

Further to this, because Istio deploys a traffic-intercepting agent into each pod, it is in the ideal location to understand where requests are failing, taking too long, and so on. By sending these metrics to an external tool such as Zipkin or Jaeger, we’re able to build charts showing how the services communicate with each other, as well as where delays are occurring within the system.
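One caveat worth knowing here: Envoy generates the trace headers, but the application still has to copy them from incoming requests onto its outgoing ones so the spans join up into a single trace. A rough sketch of that propagation, again with Flask and requests (the /quote route and downstream service name are made up):

```python
# Forward the B3/tracing headers attached by the sidecar so Zipkin/Jaeger can
# stitch the inbound and outbound calls into one trace.
from flask import Flask, request
import requests

app = Flask(__name__)

TRACE_HEADERS = [
    "x-request-id", "x-b3-traceid", "x-b3-spanid",
    "x-b3-parentspanid", "x-b3-sampled", "x-b3-flags",
]

@app.route("/quote")
def quote():
    # Copy whichever trace headers arrived on the inbound request.
    headers = {h: request.headers[h] for h in TRACE_HEADERS if h in request.headers}
    downstream = requests.get("http://pricing-api/price", headers=headers, timeout=2)
    return downstream.json()
```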

In our scenario the benefits are wide-ranging. Without putting much effort into our Istio configuration we’re able to gain visibility and improved reliability, along with an opportunity to improve the security of the system and to open up whole new traffic-management and application-deployment options.

What does Istio add on top of kube? Kubernetes is concerned with making sure your applications get deployed and are available, at a relatively coarse-grained level. Sure, it’ll make sure you have the number of pods you asked for, and it’ll take care of turning a pretty complex distribution and scaling problem into a simple declarative command, but it’s much less concerned about your traffic on a request-by-request level, and this is where Istio (other service meshes are also available; see Linkerd, for example) steps in. In many ways it sits in a similar role to a traditional layer-7 load balancer (think F5), but it’s less concerned about what traffic is coming into your cluster and more interested in what’s happening within the cluster.

In part II I’ll cover what steps we’ve had to go through to install Istio, and the issues we found in doing so!
