Using OpenTracing with Istio/Envoy

Sidecar proxies offer a very simple way to get started with observability and collect monitoring data without any in-process instrumentation. This is a big advantage, because instrumentation can be considered the hardest part of deploying monitoring, and especially tracing, in large-scale distributed systems. In this post we will look at the advantages and drawbacks of this approach and at how OpenTracing helps.


Saying that a sidecar proxy requires no instrumentation for tracing is not entirely true. It is necessary to pass a set of headers from inbound to outbound requests. Only if the service is at the end of the call chain is no instrumentation required. This sounds pretty amazing, right?

In hello-world style apps this is very straightforward; however, as complexity increases, passing headers around can become inconvenient. Imagine adjusting business-layer interfaces to pass a set of headers. Of course it can be done in a less intrusive way, for instance by storing the headers in thread-locals. But if we try to automate this, we will effectively end up writing instrumentation for the framework we are using. So we should first consider whether existing solutions for this problem are already available.
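As a minimal sketch of the thread-local approach mentioned above (the class and method names are hypothetical, not from any library): a servlet filter would store the inbound headers on the way in and clear them on the way out, and the HTTP client code would read them back when making a downstream call.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: keeps the inbound tracing headers for the
// current request thread so outbound calls can copy them forward.
public class HeaderHolder {
    private static final ThreadLocal<Map<String, String>> HEADERS =
            ThreadLocal.withInitial(HashMap::new);

    public static void set(Map<String, String> headers) {
        HEADERS.set(new HashMap<>(headers));
    }

    public static Map<String, String> get() {
        return Collections.unmodifiableMap(HEADERS.get());
    }

    public static void clear() {
        HEADERS.remove();
    }
}
```

Note that this breaks down as soon as the request is processed across multiple threads, which is one more reason to prefer an existing instrumentation library.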

Several open-source instrumentation libraries and tracers are available for this. In this post we will focus on OpenTracing.


Using only Envoy for tracing

In this section we will briefly look at what header propagation may look like. The following code snippet shows a Spring controller with a chaining endpoint.
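A sketch of the propagation logic such a controller needs (the class and method names are assumptions for illustration): the headers Envoy expects services to forward are the B3 set plus `x-request-id` and `x-ot-span-context`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Headers that Envoy/Istio expects a service to propagate from
// inbound to outbound requests so spans join the same trace.
public class TracingHeaders {
    static final List<String> PROPAGATED = Arrays.asList(
            "x-request-id",
            "x-b3-traceid",
            "x-b3-spanid",
            "x-b3-parentspanid",
            "x-b3-sampled",
            "x-b3-flags",
            "x-ot-span-context");

    // Copies whichever tracing headers are present on the inbound request.
    public static Map<String, String> extract(Function<String, String> getHeader) {
        Map<String, String> out = new HashMap<>();
        for (String name : PROPAGATED) {
            String value = getHeader.apply(name);
            if (value != null) {
                out.put(name, value);
            }
        }
        return out;
    }
}
```

In a Spring controller, the chaining endpoint would call `TracingHeaders.extract(request::getHeader)`, copy the result into the outbound headers, and then issue the downstream request.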

There is nothing really special here; this code can easily be extracted into a helper function to slim down the endpoint method. As we pointed out before, the problem arises when calls to downstream services are made inside the business layer: header propagation might then require some refactoring, and someone may eventually forget to pass the headers altogether. In the next section we will solve this problem by using OpenTracing integrations.

Envoy and OpenTracing

Now we are going to add OpenTracing instrumentation to our application. Spring Boot can be instrumented just by adding a JAR to the classpath. Auto-configuration will add all necessary tracing code to the app without any further developer interaction. The Spring Boot instrumentation artifact is opentracing-spring-cloud-starter [1].
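With Maven, this amounts to one dependency in the pom (the version shown is an assumption; use the latest release):

```xml
<dependency>
    <groupId>io.opentracing.contrib</groupId>
    <artifactId>opentracing-spring-cloud-starter</artifactId>
    <!-- version is illustrative; check the project for the latest release -->
    <version>0.2.0</version>
</dependency>
```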

OpenTracing is vendor-neutral, so we also have to supply a tracer implementation. In this case we are going to use jaeger-java-client [2]. Finally, we have to instantiate and configure the tracer bean. Note that Envoy uses B3 propagation, which is not enabled in Jaeger by default and has to be registered explicitly:
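A sketch of what that bean definition might look like with the jaeger-client builder API (the service name and sampler choice are illustrative assumptions):

```java
import io.jaegertracing.internal.JaegerTracer;
import io.jaegertracing.internal.propagation.B3TextMapCodec;
import io.jaegertracing.internal.samplers.ConstSampler;
import io.opentracing.propagation.Format;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TracerConfiguration {

    @Bean
    public io.opentracing.Tracer tracer() {
        // Envoy propagates trace context in B3 headers (x-b3-*), so we
        // register the B3 codec for HTTP header injection and extraction.
        B3TextMapCodec b3Codec = new B3TextMapCodec.Builder().build();
        return new JaegerTracer.Builder("spring-boot-app")
                .withSampler(new ConstSampler(true))
                .registerInjector(Format.Builtin.HTTP_HEADERS, b3Codec)
                .registerExtractor(Format.Builtin.HTTP_HEADERS, b3Codec)
                .build();
    }
}
```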

Now let’s deploy Istio, Jaeger and our application into Kubernetes running on minikube. All instructions can be found in the readme [3]. Once everything is up and running, we can make some requests to the /chaining endpoint.

Trace for /chaining endpoint

Generally speaking, there are two latency rules that apply to proxy spans: first, the duration of a proxy client span is always shorter than the duration of the original client span; second, exactly the opposite, the duration of a proxy server span is always longer than the original server span. The proxy spans in the figure above are named default-route.

The first span is an Istio ingress span, followed by a server span created in the Envoy proxy for the /chaining endpoint. The third span is created inside the monitored process via OpenTracing. The most interesting is the fourth span, named GET, which represents a client request to the /hello endpoint. It is interesting because of the duration difference to its proxy (child) span. The duration captured inside the process is 102.92ms, whereas the proxy span says the request took only 14.24ms. Why is there such a big difference? Maybe the HTTP client library is slow, maybe class loading took longer, or maybe Java was establishing a connection. If we repeat the request the difference decreases significantly, so the root cause of this latency was probably class loading in the JVM.

Let’s continue with a different span and have a closer look at the /chaining server span reported from the process via OpenTracing:

Within the logs associated with the span we can see which controller method was invoked. If there were redirects, it would show exactly which method the time was spent in. This time we are lucky because everything worked as expected; however, if there were an exception, the instrumentation would capture it and put it into these logs along with the stack trace. Again, this is very valuable information for root cause analysis.


Conclusion

We have seen that tracing with only Envoy is very simple to set up. It does not require any additional libraries; however, header propagation still demands some work in the application. OpenTracing can do this propagation automatically, and it also adds more visibility into the monitored process.