Distributed Tracing in Microservices using Zipkin, Sleuth and the ELK Stack.
What is Distributed Tracing?
One of the major challenges in microservices is the ability to debug issues and monitor them. A simple action can trigger a chain of microservice calls and it would be tedious to trace these actions across the invoked microservices. This is because each microservice runs in an environment isolated from other microservices so they don’t share resources such as databases or log files. In addition to that, we might also want to track down why a certain microservice call is taking so much time in a given business flow.
The Distributed Tracing pattern addresses the above challenges developers face while building microservices. There are some helpful open-source tools that can be used for distributed tracing when creating microservices with the Spring Boot and Spring Cloud frameworks. This blog walks through the installation steps and implementation of these tools.
The Tools
Spring Cloud Sleuth: A Spring Cloud library that lets you track the progress of a request across subsequent microservices by adding trace and span IDs to the appropriate HTTP request headers. The library is based on the MDC (Mapped Diagnostic Context) concept, where you can easily extract values put into the context and display them in the logs.
Zipkin: A Java-based distributed tracing application that helps gather timing data for every request propagated between independent services. It has a simple management console where we can find a visualization of the time statistics generated by subsequent services.
ELK Stack: Three open source tools — Elasticsearch, Logstash and Kibana form the ELK stack. They are used for searching, analyzing, and visualizing log data in real-time. Elasticsearch is a search and analytics engine. Logstash is a server‑side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a “stash” like Elasticsearch. Kibana lets us visualize this data with charts and graphs.
How do they work together?
Based on the below diagram (Image A), when the Orchestrator Service makes an HTTP call to the service `/order/{orderId}`, the call is intercepted by Sleuth, which adds the necessary tracing headers to the request. After the Orchestrator Service receives the HTTP response, the trace data is sent asynchronously to Zipkin, so that delays or failures in the tracing system do not slow down or break the main flow.
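To make the interception concrete: Sleuth (in its pre-3.x releases) propagates context using B3 headers by default, so the intercepted call carries headers like the following. This is an illustrative sketch; the ID values are made up:

```
GET /order/100 HTTP/1.1
Host: localhost:8081
X-B3-TraceId: 5f29e3f0a1b2c3d4e5f60718293a4b5c   # same for the whole business flow
X-B3-SpanId: a1b2c3d4e5f60718                    # unique to this client call
X-B3-ParentSpanId: 5f29e3f0a1b2c3d4              # the span that triggered this one
X-B3-Sampled: 1                                  # whether this trace is reported to Zipkin
```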
Sleuth adds two types of IDs to the log file: a trace ID and a span ID. A span represents a basic unit of work, for example sending an HTTP request. A trace contains a set of spans, forming a tree-like structure, and the trace ID stays the same as one microservice calls the next.
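In the logs this shows up as a bracketed section that Sleuth adds to the default Spring Boot log pattern, in the form `[application-name,traceId,spanId,exportable]`. A made-up example (timestamps and IDs are illustrative):

```
INFO [orchestrator-service,5f29e3f0a1b2c3d4,5f29e3f0a1b2c3d4,true] ... : Fetching order 100
INFO [order-service,5f29e3f0a1b2c3d4,a1b2c3d4e5f60718,true] ... : Returning order 100
```

Note how the first ID (the trace ID) is identical in both services, while the second ID (the span ID) differs per unit of work.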
The logs are published directly to Logstash in this example for convenience, but we can also use Beats. Beats are lightweight data shippers that sit on servers or containers, watch the configured log file locations, and ship the logs either to Logstash for transformation or directly to Elasticsearch.
Installation of the needed tools
The guide assumes you have Docker pre-installed. If not, you can follow the installation steps here.
1) Installing Zipkin
Run the first docker command to pull the Zipkin image from hub.docker.com and then the next docker command to start it on port 9411.
$ docker pull openzipkin/zipkin
$ docker run -d -p 9411:9411 openzipkin/zipkin
Validate the setup by accessing the Zipkin web interface at http://localhost:9411/zipkin/. The below screen (Image 1) should open up if there are no issues.
2) Installing ELK Stack
This install uses the image `sebp/elk`. On top of this image we will make changes to disable SSL and set up the indexes for Elasticsearch in the Logstash configuration files.
Create the two files with the configuration below:
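The embedded configuration is not shown above, so here is a minimal sketch of what the two Logstash pipeline files could look like, assuming the logs arrive as JSON over plain TCP on port 5044 and should land in an index named `logstash-local` (the file names are illustrative):

```
# 02-tcp-input.conf — accept JSON log events over plain TCP (no SSL)
input {
  tcp {
    port  => 5044
    codec => json_lines
  }
}
```

```
# 30-output.conf — write events to the bundled Elasticsearch under the logstash-local index
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-local"
  }
}
```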
Then create a `Dockerfile` as below, using the configuration files created above.
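The Dockerfile itself is not shown above; a minimal sketch could look like this, assuming the two configuration files are named `02-tcp-input.conf` and `30-output.conf` (names are illustrative), and that the `sebp/elk` image keeps its Logstash pipeline configuration under `/etc/logstash/conf.d`:

```
FROM sebp/elk

# Replace the image's default (SSL-enabled Beats) pipeline with our own
# plain-TCP input and logstash-local index output.
RUN rm -f /etc/logstash/conf.d/*
COPY 02-tcp-input.conf /etc/logstash/conf.d/
COPY 30-output.conf /etc/logstash/conf.d/
```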
Execute the below docker commands to build the image with tag `local-elk` and start all three components.
$ docker build . --tag local-elk
$ docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -it --name elk local-elk
The `docker run` command starts Kibana on port 5601, Elasticsearch on port 9200 and Logstash on port 5044.
Validate the Kibana setup by accessing the web console at http://localhost:5601. The below screen (Image 2) should show up in the browser.
Validate Elasticsearch with the below curl command:
curl http://localhost:9200/_cat/indices
This completes our installations!
Example Microservices
As depicted in Image 3, we have three microservices. The Order service (running on port 8081) has operations to fetch an order based on a given Order ID. The Customer service (running on port 8082) has operations to fetch a customer based on a given Customer ID. The Orchestrator (running on port 8080) exposes an operation to get both the order and customer details for a given Order ID. The Orchestrator first calls the order service to get the order details, then makes another call to the customer service to get the customer details, and returns both sets of details.
You can find all the code here
To enable Sleuth, Zipkin and the ELK stack, we need to make the below changes in all three microservices.
The first change is in the pom.xml, where we add the Spring Cloud starter dependencies for both Sleuth and Zipkin, as well as the Logback dependencies needed for Logstash.
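The dependency snippet is not shown above; a sketch of what it could look like follows. The exact artifact names and versions depend on your Spring Cloud release train (these starters apply to pre-2020 release trains), so treat this as an assumption to verify against your build:

```xml
<!-- Sleuth: adds trace/span IDs and propagates them on outgoing calls -->
<dependency>
  <groupId>org.springframework.cloud</groupId>
  <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>

<!-- Zipkin: reports completed spans to the Zipkin server -->
<dependency>
  <groupId>org.springframework.cloud</groupId>
  <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>

<!-- Logstash encoder for Logback, used to ship JSON logs over TCP -->
<dependency>
  <groupId>net.logstash.logback</groupId>
  <artifactId>logstash-logback-encoder</artifactId>
  <version>6.6</version>
</dependency>
```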
The second change is to add the Zipkin URL in application.properties, so that Spring knows where to publish the trace data.
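A sketch of the relevant properties, assuming Zipkin runs locally on its default port and Sleuth 2.x property names; the service name shown is just an example:

```
# Where Sleuth should publish trace data
spring.zipkin.base-url=http://localhost:9411
# Sample every request (lower this in production)
spring.sleuth.sampler.probability=1.0
# The service name that appears in Zipkin traces and in the logs
spring.application.name=orchestrator-service
```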
The final change is in logback.xml, to publish the logs to Logstash. The appender publishes all logs to Logstash running on port 5044 using an async TCP appender. Again, as mentioned above, Beats can also be used to ship logs to Logstash.
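The appender configuration is not shown above; a minimal sketch using the `logstash-logback-encoder` library's async TCP appender could look like this (destination assumes Logstash on localhost:5044, matching the ELK setup earlier):

```xml
<!-- Ships JSON-encoded log events to Logstash over TCP, asynchronously -->
<appender name="logstash"
          class="net.logstash.logback.appender.LogstashTcpSocketAppender">
  <destination>localhost:5044</destination>
  <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
</appender>

<root level="INFO">
  <appender-ref ref="logstash"/>
</root>
```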
All the services that need the distributed tracing feature will require the above three changes.
Seeing the magic happen
Once all three services are up and running, we can test the setup by executing the below curl which returns the Order and Customer details.
curl -X GET http://localhost:8080/customer-orders/100
Now when we log on to the Zipkin dashboard and click the “Find Traces” button, we will notice a trace spanning three services. The trace shows a trace ID at the top and durations for each of the calls.
By stopping the customer-service, we can see the failure (Image 5) being flagged on Zipkin dashboard.
Now let's look at the ELK Stack logs.
Before we can see the logs, we need to configure Kibana to use the index we created in Logstash.
In the output configuration, we created the index with the name ‘logstash-local’. On the Kibana dashboard, click on `Create Index Pattern`, enter that as the index pattern and click on “Next Step”.
The next step is to select `@timestamp` on the configure settings screen. Once the pattern has been created, you should see a screen like Image 7.
Now by clicking the Discover link (the compass icon on the left menu), we should be able to see the logs generated by the services. We can also filter them by the trace IDs from the Zipkin dashboard.
Conclusion
With tools like Sleuth, Zipkin and the ELK Stack, distributed tracing doesn’t seem to be a very difficult problem to solve. As the application grows, these tools can provide us with much-needed information on where requests are spending their time and help trace the flow. There are also other tools that provide similar solutions, such as OpenTracing and Jaeger.