Distributed Tracing in Microservices

Amila Iroshan
Published in The Fresh Writes
10 min read · Jan 25, 2024

Error Tracing — Backend Services

Error tracing in a Java backend application is a critical part of the development and debugging process. Effective error tracing helps you to identify and fix issues in your code, ensuring that your application runs smoothly.

Designing a robust error tracing system is crucial for identifying, diagnosing, and resolving issues in your software applications.

When working with Spring Boot applications, you have several options for implementing error tracing and monitoring systems. The choice of the best error-tracing system depends on your specific needs and preferences.

Two terms come up whenever error tracing is discussed:

· Tracing

· Logging

1). What is logging?

The purpose of logging is to track error reporting and related data in a centralized way. Log files can show any discrete event within an application or system, such as a failure, an error, or a state transformation. When something inevitably goes wrong, such transformations in state help indicate which change actually caused an error.

The most successful log files are not noisy: they don’t contain extraneous or distracting information. Instead, they record only what is necessary, such as actionable items.

· Log errors with different severity levels (e.g., INFO, WARN, ERROR, FATAL) to distinguish between different types of issues.

· Include context-specific data in logs, such as timestamps, request/response details, user IDs, and transaction IDs.
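The two guidelines above can be sketched with the JDK's built-in java.util.logging. This is only an illustration: the helper class ContextLog and the context keys used with it (for example userId and transactionId) are hypothetical names, and because java.util.logging has no FATAL level, SEVERE would have to stand in for both ERROR and FATAL.

```java
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: appends key=value context to a message before logging it.
final class ContextLog {
    private static final Logger LOG = Logger.getLogger(ContextLog.class.getName());

    // Renders "message | key1=value1 key2=value2" in the map's iteration order.
    static String format(String message, Map<String, String> context) {
        StringBuilder sb = new StringBuilder(message);
        if (!context.isEmpty()) {
            sb.append(" |");
            for (Map.Entry<String, String> e : context.entrySet()) {
                sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
            }
        }
        return sb.toString();
    }

    // Logs at the given severity; the logging framework adds the timestamp.
    static void log(Level level, String message, Map<String, String> context) {
        LOG.log(level, format(message, context));
    }
}
```

A call such as ContextLog.log(Level.WARNING, "Payment failed", context) with an insertion-ordered map keeps the context fields in a predictable order, which makes the lines easier to search later.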

2). What is tracing?

A trace represents a single user’s journey through the entire stack of an application, and it is often used for optimization. In a microservice architecture, distributed tracing comes into the picture: it tracks the flow of requests as they move through the various components and services of a distributed system.

Implement distributed tracing with tools like Zipkin or Jaeger to trace requests across microservices, helping identify bottlenecks and latency issues.

In brief, distributed tracing follows a request from the first microservice to the last one it reaches, so that we can find out where a failure happened.

Benefits of Distributed Tracing

· End-to-end visibility of the user request across the entire system of microservices

· Provides information about service dependencies

· Faster fault isolation when the system encounters a failure

Every request carries a trace ID, timestamps, and other useful metadata. With this, we can see how long the request spends in each microservice and collect the metrics needed to reduce latency.

Distributed tracing consists of two main concepts

· Trace Id

· Span Id

The trace ID identifies an incoming request and stays the same across all the services that cooperate to satisfy it.

The span ID identifies a single unit of work within that trace, such as one service call, from the moment the request is received to the moment the response is sent.
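The relationship between the two IDs can be sketched as a small data model. This is an illustration under stated assumptions, not the internals of any tracing library: the class name SpanContext, its fields, and the choice of 16-character hexadecimal IDs are all hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical model of the IDs a tracer attaches to each unit of work.
final class SpanContext {
    final String traceId;      // shared by every span that belongs to one request
    final String spanId;       // unique per unit of work
    final String parentSpanId; // null for the root span

    private SpanContext(String traceId, String spanId, String parentSpanId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }

    // Random 64-bit ID rendered as 16 lowercase hex characters.
    static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    // First service in the chain: in this sketch the root span ID doubles as the trace ID.
    static SpanContext newRoot() {
        String id = newId();
        return new SpanContext(id, id, null);
    }

    // Downstream work: same trace ID, fresh span ID, parent recorded.
    SpanContext newChild() {
        return new SpanContext(traceId, newId(), spanId);
    }
}
```

Calling newRoot() and then newChild() yields two spans that share one trace ID, while the child's parentSpanId points back at the root. That link is what lets a tracing backend rebuild the call tree.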

Distributed tracing in Spring Boot can be achieved using various libraries and frameworks that integrate seamlessly with Spring Boot applications. These tools allow you to capture, trace, and analyze requests as they flow through different components of your distributed systems. Here are some popular options:

1). Spring Cloud Sleuth: Spring Cloud Sleuth is an official project within the Spring Cloud ecosystem that provides distributed tracing capabilities for Spring Boot applications. It integrates with other Spring Cloud components also.

2). Zipkin: Zipkin is a distributed tracing system that can be used with Spring Boot. It allows you to collect and visualize trace data, providing insights into request flows, latencies, and dependencies. Zipkin has a variety of instrumentation options, including Spring Boot.

3). Jaeger: Jaeger is an open-source, end-to-end distributed tracing system that is compatible with Spring Boot. It offers features like trace visualization and service dependency graphs.

4). New Relic: New Relic is another observability platform that offers distributed tracing as part of its feature set. It has Java agents that can be added to Spring Boot applications for trace collection and analysis.

Among the above solutions, Spring Cloud Sleuth is an official project within the Spring Cloud ecosystem, and it integrates with Zipkin out of the box. Therefore, I’ll choose Spring Cloud Sleuth and Zipkin as the distributed tracing stack.

3). How does distributed tracing work?

Figure 1. Tracing in between two microservices.

The incoming request doesn’t have a trace ID, so the first service that intercepts the call generates the trace ID “ID1” and its own span ID “A”. Span “B” covers the time from when the client in service 1 sends out the request until service 2 has received it, processed it, and sent back the response.
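The hand-off between the two services can be sketched with plain maps standing in for HTTP headers. Sleuth 2.x propagates IDs in B3 headers by default, and the three header names below are the standard B3 ones. Everything else is an assumption made for illustration: the class B3Propagation, its method names, and the ID generation are hypothetical, and real HTTP header lookup is case-insensitive while this map lookup is not.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch of B3-style header propagation; not Sleuth's actual implementation.
final class B3Propagation {
    static final String TRACE_ID = "X-B3-TraceId";
    static final String SPAN_ID = "X-B3-SpanId";
    static final String PARENT_SPAN_ID = "X-B3-ParentSpanId";

    // 16 hex characters taken from a random UUID.
    static String newId() {
        return UUID.randomUUID().toString().replace("-", "").substring(0, 16);
    }

    // Server side: reuse the caller's IDs, or start a new trace if none arrived.
    static Map<String, String> onReceive(Map<String, String> incoming) {
        Map<String, String> current = new HashMap<>();
        String traceId = incoming.get(TRACE_ID);
        if (traceId == null) {
            String id = newId();
            current.put(TRACE_ID, id);
            current.put(SPAN_ID, id);
        } else {
            current.put(TRACE_ID, traceId);
            current.put(SPAN_ID, incoming.get(SPAN_ID));
            if (incoming.containsKey(PARENT_SPAN_ID)) {
                current.put(PARENT_SPAN_ID, incoming.get(PARENT_SPAN_ID));
            }
        }
        return current;
    }

    // Client side: headers for an outgoing call made while 'current' is active.
    static Map<String, String> onSend(Map<String, String> current) {
        Map<String, String> outgoing = new HashMap<>();
        outgoing.put(TRACE_ID, current.get(TRACE_ID));
        outgoing.put(SPAN_ID, newId());
        outgoing.put(PARENT_SPAN_ID, current.get(SPAN_ID));
        return outgoing;
    }
}
```

In the terms of Figure 1, onReceive on an empty header map plays the role of service 1 creating “ID1” and span “A”, and onSend produces the headers for span “B” that service 2 then picks up with its own onReceive.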

4). Let’s start distributed tracing with Spring Boot and Spring Cloud — Zipkin and Sleuth

Spring Cloud Sleuth (v2.2.6):

Spring Cloud Sleuth is a framework that provides distributed tracing capabilities for Spring Boot applications. It helps you track and visualize the flow of requests as they travel through different microservices in your application. Spring Cloud Sleuth automatically adds unique trace and span IDs to log entries and injects trace information into HTTP headers.

Zipkin (v2.24.2):

Zipkin is an open-source distributed tracing system that helps you collect, store, and visualize trace data from your microservices. It provides a web-based user interface for exploring traces and understanding how requests flow through your application.

4.1). Set Up a Zipkin Server

First, we need to set up a Zipkin server to collect and store tracing data. We can run Zipkin as a standalone service or use a Docker container. Please refer to the Zipkin Server installation guide.

4.2). Add Dependencies to Your Spring Boot Microservices.

In each Spring Boot microservice, we need to add dependencies for Spring Cloud Sleuth and a Zipkin client. Add these dependencies to the pom.xml:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>

4.3). Configure Your Microservices.

In each microservice, we need to configure the service name and the URL of your Zipkin server. We can do this by adding the following properties to our application.properties

spring.application.name=your-service-name
spring.zipkin.base-url=http://localhost:9411

Note: Replace your-service-name with the actual name of the microservice.

4.4). Trace Requests.

With Sleuth and Zipkin configured, we can now trace requests as they flow through our microservices. Spring Cloud Sleuth automatically adds trace and span IDs to our logs and HTTP headers, allowing Zipkin to correlate requests.

Each log line is then prefixed with the application name, trace ID, and span ID in the following format:

[application-name,trace-id,span-id]
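Because every line carries this prefix, log lines from different services can be grouped by trace ID. A minimal sketch of extracting the trace ID from a log line follows, assuming the bracketed prefix shown above; the class name SleuthLogPrefix is hypothetical, and the pattern tolerates extra spaces and additional trailing fields in case a given setup prints them.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for pulling the trace ID out of a prefixed log line.
final class SleuthLogPrefix {
    // Matches "[app,traceId,spanId" followed by either "," or "]".
    private static final Pattern PREFIX = Pattern.compile(
        "\\[\\s*([^,\\]\\s]+)\\s*,\\s*([0-9a-fA-F]+)\\s*,\\s*([0-9a-fA-F]+)\\s*[,\\]]");

    // Returns the trace ID if the line contains a recognizable prefix.
    static Optional<String> traceIdOf(String logLine) {
        Matcher m = PREFIX.matcher(logLine);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }
}
```

Filtering the combined logs of all services on the value returned by traceIdOf gives the full story of one request, even without opening Zipkin.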

4.5). View Traces in Zipkin

Now, when we send requests through our microservices, the tracing data will be sent to Zipkin. We can access the Zipkin web UI by navigating to http://localhost:9411 (or the URL where we’ve deployed Zipkin).

In the Zipkin UI, we can search for and view traces, analyze the flow of requests, identify latency issues, and monitor and troubleshoot our microservices architecture.
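The same search can also be done programmatically through Zipkin's v2 HTTP API, whose trace search endpoint is GET /api/v2/traces. The sketch below only builds the request URL; the class name ZipkinQuery is hypothetical, and the base URL and service name passed to it are placeholders for your own values.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

// Hypothetical helper that builds a trace search URL for a Zipkin server.
final class ZipkinQuery {
    // Returns the URL for "GET /api/v2/traces" filtered by service name.
    static URI tracesFor(String baseUrl, String serviceName, int limit) {
        try {
            String encoded = URLEncoder.encode(serviceName, "UTF-8");
            return URI.create(baseUrl + "/api/v2/traces?serviceName=" + encoded + "&limit=" + limit);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, ZipkinQuery.tracesFor("http://localhost:9411", "your-service-name", 10) gives a URL that any HTTP client can call to fetch up to ten recent traces for that service as JSON.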

Figure 2. Zipkin Dashboard

5). Integrate Zipkin with Grafana

5.1). What is Grafana?

Grafana is an open-source solution that enables you to query, visualize, alert on, and explore your metrics, logs, and traces wherever they are stored. Its customizable dashboards give us insight into complex infrastructure and the massive amounts of data that our services deal with.

5.2). Why do we need to integrate Zipkin with Grafana?

The main objective of integrating Zipkin with Grafana is to provide a common dashboard for exploring our metrics, logs and error tracing.

5.3). How to integrate Zipkin with Grafana

Step 1: Install and Configure Zipkin

Before integrating with Grafana, make sure we have a running Zipkin server. We can either run Zipkin as a standalone service or use Docker.

Please refer to the Zipkin Server installation guide.

Step 2: Install and Configure Grafana

We can download Grafana from the official website (https://grafana.com/) and follow the installation instructions for our platform.

Step 3: Add the Zipkin Data Source

· In Grafana, navigate to the “Configuration” section and select “Data Sources.”

· Click on “Add data source” to open the data source configuration page.

· Search for “Zipkin” in the list of data sources and select it. Zipkin support ships with Grafana, so no separate plugin installation is needed.

· Configure the Zipkin data source by specifying the URL of your Zipkin server (e.g., http://localhost:9411) and other settings as needed. Make sure to provide a name for the data source.

· Click “Save & Test” to verify that Grafana can connect to your Zipkin server successfully.

Please refer to the Zipkin DataSource configuration guide.

Step 4: Create Grafana Dashboards

Now that we have set up the Zipkin data source, we can create Grafana dashboards to visualize our Zipkin trace data.

· In Grafana, navigate to the “Create” section and select “Dashboard.”

· Add a new panel to the dashboard.

· In the panel configuration, select the “Query” section and choose “Zipkin” as the data source.

· Configure the query to fetch the Zipkin trace data you want to visualize. You can specify various filters, such as service names, duration, and tags, to narrow down the data.

· Customize the panel settings, visualizations, and layout according to our requirements.

· Add more panels or repeat the process to create additional panels for our dashboard.

· Once we have designed our dashboard, click “Save” to save it.

Step 5: View and Share Our Dashboard

We can now view and interact with our Grafana dashboard, which displays visualizations of our Zipkin trace data. We can use Grafana’s features to explore and analyze the traces.

Additionally, we can share our Grafana dashboard with our team or stakeholders by generating a shareable link or embedding it in other tools or websites.

6). AWS Solution for Error Tracing (AWS X-Ray)

6.1). What is AWS X-Ray?

AWS X-Ray is a distributed tracing service offered by Amazon Web Services (AWS) that helps developers analyze and troubleshoot applications built using microservices and serverless architectures. X-Ray allows you to track and visualize the flow of requests as they move through the various components of your application, providing valuable insight into its behavior. The key features and components of AWS X-Ray are:

Distributed Tracing: X-Ray enables distributed tracing, which means it traces requests as they travel through multiple microservices, AWS resources, and Lambda functions within your application.

Trace Representation: Traces are represented as a directed graph, where each node represents a component or service involved in processing the request, and edges indicate the flow between them. This visualization helps you understand the request flow and identify issues.

Sampling: You can configure X-Ray to sample traces, allowing you to control the amount of trace data collected. Sampling helps manage costs and reduce the overhead of tracing in high-traffic applications.

Integration with AWS Services: X-Ray integrates seamlessly with various AWS services, including AWS Lambda, AWS Elastic Beanstalk, Amazon EC2, Amazon ECS, and Amazon API Gateway. It automatically captures trace data from these services.

Trace Analysis: AWS X-Ray provides a web-based console where you can view and analyze trace data. You can search for specific traces, view detailed trace information, and identify areas for optimization.

Service Maps: X-Ray generates service maps that visualize the dependencies between different components of your application. Service maps help you understand the architecture and identify points of failure.

Integration with AWS CloudWatch: You can integrate AWS X-Ray with Amazon CloudWatch to correlate trace data with metrics and logs.
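The sampling feature described above can be illustrated with a small reservoir-plus-rate sampler. X-Ray's documented default rule records the first request each second plus five percent of any additional requests. The sketch below reproduces only that idea and is not the X-Ray SDK's implementation: the class ReservoirSampler, its constructor parameters, and the injected random source are assumptions made for illustration.

```java
import java.util.function.DoubleSupplier;

// Illustrative reservoir-plus-rate sampler; not the X-Ray SDK's implementation.
final class ReservoirSampler {
    private final int reservoirPerSecond; // requests always traced each second
    private final double fixedRate;       // fraction of the remaining requests to trace
    private final DoubleSupplier random;  // source of values in [0, 1), injectable for tests
    private long currentSecond = -1;
    private int usedThisSecond = 0;

    ReservoirSampler(int reservoirPerSecond, double fixedRate, DoubleSupplier random) {
        this.reservoirPerSecond = reservoirPerSecond;
        this.fixedRate = fixedRate;
        this.random = random;
    }

    // Returns true if the request arriving during 'epochSecond' should be traced.
    synchronized boolean shouldSample(long epochSecond) {
        if (epochSecond != currentSecond) {
            // A new second started: refill the reservoir.
            currentSecond = epochSecond;
            usedThisSecond = 0;
        }
        if (usedThisSecond < reservoirPerSecond) {
            usedThisSecond++;
            return true;
        }
        // Reservoir exhausted: fall back to the fixed sampling rate.
        return random.getAsDouble() < fixedRate;
    }
}
```

With new ReservoirSampler(1, 0.05, Math::random), a quiet service still gets at least one trace per second, while a busy one traces only about five percent of the rest, which keeps overhead and cost bounded.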

6.2). How to integrate AWS X-Ray with a Spring Boot application

Step 1: Add the AWS X-Ray SDK Dependency

In our Spring Boot project, we need to add the AWS X-Ray SDK as a dependency. If we are using Maven, we can add the following dependency to our pom.xml:

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-xray-recorder-sdk-core</artifactId>
    <version></version> <!-- Use the latest version -->
</dependency>

Step 2: Add a Tracing Filter to Your Application

Add a Filter to our WebConfig class. Pass the segment name to the AWSXRayServletFilter constructor as a string.

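import javax.servlet.Filter;
import com.amazonaws.xray.javax.servlet.AWSXRayServletFilter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WebConfig {
    // Wraps every incoming request in an X-Ray segment with the given name.
    @Bean
    public Filter tracingFilter() {
        return new AWSXRayServletFilter("SpringBootXRayDemoApplication");
    }
}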

Step 3: Activate X-Ray in Your Application

By annotating methods or classes with @XRayEnabled, we enable X-Ray tracing for those components, and AWS X-Ray will automatically capture trace data for requests that pass through them. The @XRayEnabled annotation and BaseAbstractXRayInterceptor come from the aws-xray-recorder-sdk-spring artifact, so that dependency is needed in addition to the core SDK.

To activate X-Ray tracing in our application, our code must extend the abstract class BaseAbstractXRayInterceptor and override the following methods.

· generateMetadata(): This method allows customization of the metadata attached to the current function’s trace. By default, the class name of the executing function is recorded in the metadata. We can add more data if we need additional information.

· xrayEnabledClasses(): This method is empty and should remain so. It serves as the host for a pointcut that tells the interceptor which methods to wrap. Define the pointcut by specifying which of the classes annotated with @XRayEnabled to trace. The following pointcut statement tells the interceptor to wrap all controller beans annotated with the @XRayEnabled annotation.

@Pointcut("@within(com.amazonaws.xray.spring.aop.XRayEnabled) && bean(*Controller)")

Step 4: Monitor and Analyze Traces

Once our Spring Boot application is running and integrated with AWS X-Ray, we can use the AWS X-Ray Console to monitor and analyze traces.

Figure 3. X-Ray Console

6.3). How to integrate AWS X-Ray with Grafana

Grafana can query X-Ray directly through the X-Ray data source plugin.

Please refer to the AWS X-Ray Plugin configuration with Grafana guide.

Conclusion

Spring Cloud Sleuth is part of the Spring Cloud ecosystem and can be used with Spring-based applications. Zipkin is an open-source distributed tracing system that can work with various programming languages and frameworks. This combination provides flexibility for developers working in a Spring-based environment or a mixed technology stack.

AWS X-Ray, by contrast, is tightly integrated with other AWS services, making it a convenient choice if our application is hosted on AWS. It can automatically capture trace data for AWS resources like Lambda functions, EC2 instances, and more. AWS X-Ray is a fully managed service, which means AWS takes care of scaling, availability, and maintenance. We don’t need to manage the X-Ray infrastructure ourselves.

References

[1] Distributed Tracing in Microservices / Spring Boot [Online]. Available: https://medium.com/javarevisited/distributed-tracing-in-microservices-spring-boot-125272b58ad8

[2] Zipkin Data Source [Online]. Available: https://grafana.com/docs/grafana/latest/datasources/zipkin/

[3] What is AWS X-Ray [Online]. Available: https://docs.aws.amazon.com/xray/latest/devguide/aws-xray.html

[4] AWS X-Ray Data Source [Online]. Available: https://grafana.com/grafana/plugins/grafana-x-ray-datasource/

