Understanding Tracers: Monitoring and Debugging Distributed Systems

Shubham Chadokar
Published in DevGlossary
4 min read · Jul 18, 2023

Photo by Tobias Rademacher on Unsplash

A tracer, in the context of distributed systems and software development, is a tool or component that helps developers and system administrators monitor and analyze the flow of requests and activities across different parts of a system. It is commonly used in distributed tracing systems.

The main purpose of a tracer is to capture information about the execution of a specific operation or request as it moves through various services and components of an application. This information is recorded as “spans,” and the set of spans that belong to a single request is called a “trace.”

Here’s how a tracer typically works:

  1. Instrumentation: Developers add code to their application to “instrument” it with the tracer. This means that certain parts of the code are modified to start and stop tracing at specific points in the execution flow. Usually, the start of a request is marked with a “span start,” and the end is marked with a “span end.”
  2. Span: A span is a fundamental building block of tracing data. It represents a single unit of work in the application and contains information such as the start and end timestamps, the name of the operation being traced, any relevant metadata, and sometimes annotations or tags to provide additional context about the span.
  3. Context Propagation: As a request flows through different services, the tracing context (i.e., the information about the ongoing trace) needs to be propagated to ensure that all related spans are correctly linked together. This allows the tracer to reconstruct the entire trace path.
  4. Centralized Collection: The trace data (spans) is usually sent to a centralized collector, such as the one provided by Zipkin, Jaeger, or another distributed tracing system. The collected data is stored and analyzed in this central location.
  5. Visualization: Once the tracing data is collected and analyzed, it can be visualized in a user interface provided by the tracing system. This visual representation allows developers and system administrators to understand the performance and behaviour of their applications, identify bottlenecks, and troubleshoot issues.
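
The steps above can be sketched with a small in-memory tracer. This is only an illustration under stated assumptions: `ToyTracer`, `Span`, and the span names are hypothetical and do not come from any real tracing library, and the `collected` list stands in for a central collector such as Zipkin or Jaeger.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """One unit of work: name, timing, IDs, and free-form tags."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    start: float = 0.0
    end: float = 0.0
    tags: Dict[str, str] = field(default_factory=dict)

    @property
    def duration(self) -> float:
        return self.end - self.start


class ToyTracer:
    """Hypothetical tracer; a real one would ship spans over the network."""

    def __init__(self) -> None:
        self.collected: List[Span] = []  # stands in for the central collector
        self._stack: List[Span] = []     # spans that are currently open

    @contextmanager
    def span(self, name: str, **tags: str):
        # Context propagation: a new span inherits the trace ID of its parent.
        parent = self._stack[-1] if self._stack else None
        span = Span(
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
            name=name,
            tags=dict(tags),
        )
        span.start = time.perf_counter()  # "span start"
        self._stack.append(span)
        try:
            yield span
        finally:
            span.end = time.perf_counter()  # "span end"
            self._stack.pop()
            self.collected.append(span)     # hand the span to the collector


# Instrumentation: wrap the interesting parts of the code in spans.
tracer = ToyTracer()
with tracer.span("handle-request", route="/checkout"):
    with tracer.span("load-user"):
        time.sleep(0.01)
    with tracer.span("charge-card"):
        time.sleep(0.02)

# A very rough stand-in for visualization.
for s in tracer.collected:
    print(f"{s.name:15} parent={s.parent_id} {s.duration * 1000:.1f} ms")
```

Child spans finish before their parent, so the root span is reported last. A real tracing UI would use the parent IDs to rebuild the tree and draw it as a timeline.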

Tracers are valuable tools in complex, distributed systems where a single request can span multiple services, databases, and network calls. They provide crucial insights into the interactions and latencies between various components, helping developers optimize performance and improve the overall reliability of the system.

Zipkin is an open-source distributed tracing system. A Zipkin tracer tracks and monitors the flow of requests in a distributed system. It helps developers understand how different components of their applications communicate with each other and identify performance issues or bottlenecks.

In simple terms, imagine you have a large application with various microservices or components that work together to serve user requests. When a user makes a request that requires multiple services to handle it, the Zipkin tracer records information about that request as it moves through the different services. This information includes timestamps, the services involved, and any data relevant to that request.
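
To make the recorded information more concrete, here is a rough sketch of what two spans might look like when serialized for Zipkin. The field names are intended to follow Zipkin's v2 JSON span format as I understand it (timestamps and durations in microseconds). The helper `make_zipkin_span`, the service names, and the tag values are assumptions made up for this example.

```python
import json
import secrets
import time


def make_zipkin_span(service, name, duration_us, trace_id=None, parent_id=None, tags=None):
    """Hypothetical helper: build one span as a dict shaped like a Zipkin v2 span."""
    span = {
        "traceId": trace_id or secrets.token_hex(16),  # 32 hex characters
        "id": secrets.token_hex(8),                    # 16 hex characters
        "name": name,
        "timestamp": int(time.time() * 1_000_000),     # start time, epoch microseconds
        "duration": duration_us,                       # microseconds
        "localEndpoint": {"serviceName": service},     # the service that did the work
        "tags": tags or {},                            # data relevant to the request
    }
    if parent_id:
        span["parentId"] = parent_id                   # links the span to its caller
    return span


root = make_zipkin_span("frontend", "get /products", 42_000,
                        tags={"http.method": "GET"})
child = make_zipkin_span("catalogue", "lookup-product", 15_000,
                         trace_id=root["traceId"], parent_id=root["id"])

# A reporter would typically send a JSON array like this to the collector.
payload = json.dumps([root, child], indent=2)
print(payload)
```

In a real setup, a Zipkin client library would usually build and send these spans for you, so treat this only as a picture of the data, not as the recommended way to report it.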

Here’s a simple example to illustrate the concept:

Example 1

Let’s say you have an online store with several microservices:

  1. User Service: Responsible for managing user accounts.
  2. Catalogue Service: Handles product information.
  3. Order Service: Takes care of processing orders.

When a user visits the online store and wants to buy a product, the following steps take place:

  1. The user’s request first goes to the User Service to check if they are logged in.
  2. Then, the request goes to the Catalogue Service to fetch product details.
  3. Finally, the request reaches the Order Service to complete the purchase.

With Zipkin, you can instrument your code in each of these services to record the start and end times of each request and how much time each service takes to process it.
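
As a rough sketch of that idea, the snippet below times each of the three services with a decorator. It does not use a real Zipkin client: the `traced` decorator, the function names, and the `timings` list (which stands in for spans reported to Zipkin) are all hypothetical, and the `time.sleep` calls simulate work.

```python
import functools
import time

timings = []  # (service, start, end) tuples; a stand-in for reported spans


def traced(service):
    """Hypothetical decorator that records the start and end time of a call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the timing even if the call raises.
                timings.append((service, start, time.perf_counter()))
        return wrapper
    return decorator


@traced("user-service")
def check_login(user_id):
    time.sleep(0.005)  # simulated work
    return True


@traced("catalogue-service")
def fetch_product(product_id):
    time.sleep(0.010)
    return {"id": product_id, "name": "Mug"}


@traced("order-service")
def place_order(user_id, product):
    time.sleep(0.015)
    return {"user": user_id, "product": product["id"], "status": "confirmed"}


def buy(user_id, product_id):
    """Follows the three steps of Example 1 in order."""
    if not check_login(user_id):
        raise PermissionError("user is not logged in")
    product = fetch_product(product_id)
    return place_order(user_id, product)


order = buy("u1", "p42")
for service, start, end in timings:
    print(f"{service:18} {(end - start) * 1000:.1f} ms")
```

A decorator keeps the timing logic out of the business code, which is roughly how many tracing libraries approach instrumentation, although their actual APIs differ.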

Example 2

Let’s say you have an online shopping website with three microservices: “Frontend” (for handling user requests), “Inventory” (managing product stock), and “Payments” (processing payments).

  1. A user visits the website and searches for a product.
  2. The “Frontend” microservice receives the search request and needs to find the product’s availability from the “Inventory” service.
  3. The “Frontend” service sends a request to the “Inventory” service to check the stock.
  4. The “Inventory” service checks its database and responds to the “Frontend” service with the product’s availability.
  5. When the user goes on to buy the product, the “Frontend” service asks the “Payments” service to initiate the payment process.
  6. The “Payments” service processes the payment request and sends back the payment confirmation.

Throughout this process, the Zipkin tracer records the interactions between these services. So, when you view the Zipkin dashboard, you’ll see a timeline or flow chart of how the request moved through the “Frontend,” “Inventory,” and “Payments” services. This way, you can see the time taken at each step and identify any delays or issues that might arise.
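
The flow above can be simulated in a single process to show how the trace context travels between services. This is a simplified sketch, not a real Zipkin integration: the header names follow Zipkin's B3 propagation convention (`X-B3-TraceId`, `X-B3-SpanId`), while the functions, span names, and the `spans` list that plays the role of the collector are assumptions for illustration. Here each callee simply creates a child span of the caller's span.

```python
import secrets
import time

spans = []  # stand-in for the Zipkin collector


def start_span(service, name, headers=None):
    """Continue the trace found in B3-style headers, or start a new trace."""
    headers = headers or {}
    return {
        "service": service,
        "name": name,
        "trace_id": headers.get("X-B3-TraceId") or secrets.token_hex(16),
        "parent_id": headers.get("X-B3-SpanId"),  # the caller's span, if any
        "span_id": secrets.token_hex(8),
        "start": time.perf_counter(),
    }


def finish_span(span):
    span["end"] = time.perf_counter()
    spans.append(span)


def outgoing_headers(span):
    """Headers the caller attaches so the callee can join the same trace."""
    return {"X-B3-TraceId": span["trace_id"], "X-B3-SpanId": span["span_id"]}


def inventory_check_stock(headers):
    span = start_span("inventory", "check-stock", headers)
    time.sleep(0.01)  # simulated database lookup
    finish_span(span)
    return True


def payments_charge(headers):
    span = start_span("payments", "charge", headers)
    time.sleep(0.02)  # simulated payment processing
    finish_span(span)
    return "confirmed"


def frontend_handle_request():
    span = start_span("frontend", "search-and-buy")
    in_stock = inventory_check_stock(outgoing_headers(span))
    result = payments_charge(outgoing_headers(span)) if in_stock else "out-of-stock"
    finish_span(span)
    return result


result = frontend_handle_request()

# Print a crude text timeline, similar in spirit to the Zipkin dashboard view.
t0 = min(s["start"] for s in spans)
for s in sorted(spans, key=lambda s: s["start"]):
    offset = (s["start"] - t0) * 1000
    took = (s["end"] - s["start"]) * 1000
    print(f"{s['service']:10} +{offset:5.1f} ms  took {took:5.1f} ms")
```

Because all three spans share one trace ID and the two downstream spans point at the “Frontend” span as their parent, a tracing UI has enough information to draw the request as a single timeline.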
