Velotio Technologies
Feb 1 · 7 min read

Introduction

Recently, there has been a lot of discussion around OpenTracing. We’ll start this blog by introducing OpenTracing, explaining what it is and why it is gaining attention. Next, we will discuss the distributed tracing system Jaeger and how it helps in troubleshooting microservices-based distributed systems. Finally, we will set up Jaeger and use it for monitoring and troubleshooting.

Drift to Microservice Architecture

Microservice Architecture has become a popular choice for application developers. In a Microservice Architecture, a monolithic application is broken down into a group of independently deployed services; in other words, an application becomes a collection of microservices. When a large number of such intertwined microservices work together, it is almost impossible to map their inter-dependencies and understand how a single request is executed.

When a monolithic application fails, it is much easier to follow the path of a transaction and do root cause analysis with the help of logging frameworks. In a microservice architecture, however, logging alone fails to deliver the complete picture.

Which service is the first one in the call chain? How do I follow a request across all these services to get insight into the application? Questions like these make debugging a set of interdependent distributed services a significantly larger problem than debugging a single monolithic application, which is why OpenTracing is becoming more and more popular.

OpenTracing

What is Distributed Tracing?

Distributed tracing is a method used to profile and monitor applications, especially those built using a microservices architecture. Distributed tracing helps pinpoint where failures occur and what causes poor performance.

How Does OpenTracing Fit Into This?

The OpenTracing API provides a standard, vendor-neutral framework for instrumentation. This means that if a developer wants to try out a different distributed tracing system, then instead of repeating the whole instrumentation process for the new distributed tracing system, the developer can simply change the configuration of the Tracer.
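To make the vendor-neutral idea concrete, here is a toy sketch in plain Python with no tracing library involved. The instrumented function depends only on a minimal tracer interface, so switching backends is a configuration change. `Tracer`, `ConsoleTracer`, `RecordingTracer`, `make_tracer` and `book_ticket` are hypothetical names for illustration, not part of the OpenTracing API.

```python
from contextlib import contextmanager
from typing import ContextManager, Dict, Iterator, List, Protocol


class Tracer(Protocol):
    """Hypothetical minimal interface that the application code depends on."""

    def start_span(self, operation_name: str) -> ContextManager[None]: ...


class ConsoleTracer:
    """One hypothetical backend: prints spans as they start and finish."""

    @contextmanager
    def start_span(self, operation_name: str) -> Iterator[None]:
        print(f"start {operation_name}")
        try:
            yield
        finally:
            print(f"finish {operation_name}")


class RecordingTracer:
    """Another hypothetical backend: remembers the names of finished spans."""

    def __init__(self) -> None:
        self.finished: List[str] = []

    @contextmanager
    def start_span(self, operation_name: str) -> Iterator[None]:
        try:
            yield
        finally:
            self.finished.append(operation_name)


def make_tracer(config: Dict[str, str]) -> Tracer:
    # Choosing a backend is a configuration decision, not a code change.
    backends = {"console": ConsoleTracer, "recording": RecordingTracer}
    return backends[config["tracer"]]()


def book_ticket(tracer: Tracer, movie: str) -> str:
    # Instrumented business logic: it never names a concrete tracer.
    with tracer.start_span("CheckCinema"):
        cinema = f"cinema showing {movie}"
    with tracer.start_span("BookShow"):
        return f"ticket for {movie} at {cinema}"


tracer = make_tracer({"tracer": "recording"})
ticket = book_ticket(tracer, "Inception")
print(ticket)
```

Changing the config to `{"tracer": "console"}` redirects the spans without touching `book_ticket`. In real OpenTracing code the shared interface is the `opentracing.Tracer` class, which Jaeger's Python client implements.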

Here are some basic OpenTracing terms:

Span — A logical unit of work that has an operation name, a start time, and a duration.

Trace — A Trace tells the story of a transaction or workflow as it propagates through a distributed system. It is simply a set of spans sharing a TraceID. Each component in a distributed system contributes its own span.
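The two definitions can be sketched as a small data model. The field names below are illustrative assumptions rather than OpenTracing's API; the point is that a span carries an operation name, a start time and a duration plus identifiers, and a trace is nothing more than the spans that share a trace ID.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    # Illustrative span record: one logical unit of work.
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span of a trace
    operation_name: str
    start_time: float  # seconds since the epoch
    duration: float  # seconds


def group_into_traces(spans: List[Span]) -> Dict[str, List[Span]]:
    # A trace is simply the set of spans that share a trace ID.
    traces: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


spans = [
    Span("t1", "a", None, "booking", 100.0, 1.2),
    Span("t1", "b", "a", "CheckCinema", 100.1, 0.4),
    Span("t2", "c", None, "booking", 200.0, 0.9),
]
traces = group_into_traces(spans)
print({tid: [s.operation_name for s in group] for tid, group in traces.items()})
```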

OpenTracing is a way for services to “describe and propagate distributed traces without knowledge of the underlying OpenTracing implementation.”
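Propagation means serializing the current span context into whatever the transport carries (for example HTTP headers) and rebuilding it on the receiving side. In the OpenTracing Python API this is done with `tracer.inject(...)` and `tracer.extract(...)`; the hand-rolled sketch below only illustrates the idea. It assumes Jaeger's `uber-trace-id` header and its `trace-id:span-id:parent-span-id:flags` hex layout, which is also the layout of the "Reporting span" lines later in this post. `SpanContext`, `inject`, `extract` and `child_of` here are hypothetical helpers, not library functions.

```python
from dataclasses import dataclass
from typing import Dict

HEADER = "uber-trace-id"  # header name used by Jaeger clients


@dataclass(frozen=True)
class SpanContext:
    trace_id: int
    span_id: int
    parent_id: int  # 0 when the span has no parent
    flags: int  # 0x01 = sampled


def inject(ctx: SpanContext, carrier: Dict[str, str]) -> None:
    # Caller side: write the context into outgoing headers as hex fields.
    carrier[HEADER] = f"{ctx.trace_id:x}:{ctx.span_id:x}:{ctx.parent_id:x}:{ctx.flags:x}"


def extract(carrier: Dict[str, str]) -> SpanContext:
    # Callee side: parse the header back into a context.
    trace_id, span_id, parent_id, flags = (int(part, 16) for part in carrier[HEADER].split(":"))
    return SpanContext(trace_id, span_id, parent_id, flags)


def child_of(parent: SpanContext, new_span_id: int) -> SpanContext:
    # The callee's span joins the same trace, with the caller's span as parent.
    return SpanContext(parent.trace_id, new_span_id, parent.span_id, parent.flags)


outgoing = SpanContext(trace_id=0xCFE1CC4B355AACD9, span_id=0xCFE1CC4B355AACD9, parent_id=0, flags=1)
headers: Dict[str, str] = {}
inject(outgoing, headers)
print(headers)  # {'uber-trace-id': 'cfe1cc4b355aacd9:cfe1cc4b355aacd9:0:1'}

received = extract(headers)
child = child_of(received, 0x8D6DA6E9161F32AC)
```

Neither side needs to know which tracing backend produced the context; it only has to agree on the carrier format.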

Take the example of a service for renting a movie on iTunes (or any other movie rental service). Such a service relies on several other microservices to check that the movie is available, that valid payment credentials have been received, and that enough space exists on the viewer’s device for the download. If any one of those microservices fails, the entire transaction fails. In that case, logs from the main rental service alone wouldn’t be very useful for debugging. If, however, you could analyze each service, you wouldn’t have to scratch your head to work out which microservice failed and what made it fail.

Real-life applications are even more complex, and as their complexity grows, monitoring them becomes a tedious task. OpenTracing helps us easily monitor:
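A toy version of that rental flow shows why per-service records help. The three checks (`check_availability`, `check_payment`, `check_storage`) and `rent_movie` are hypothetical; each call produces its own span-like record with an error flag, so a failed transaction points at the culprit instead of leaving only a generic failure in the rental service's log.

```python
from typing import Callable, Dict, List, Tuple


def check_availability(movie: str) -> None:
    pass  # hypothetical: the catalog says the movie is available


def check_payment(movie: str) -> None:
    raise RuntimeError("card declined")  # hypothetical: payment fails


def check_storage(movie: str) -> None:
    pass  # hypothetical: the device has enough free space


def rent_movie(movie: str, checks: List[Callable[[str], None]]) -> Tuple[bool, List[Dict]]:
    records: List[Dict] = []
    for check in checks:
        # One span-like record per downstream call, tagged with its outcome.
        record: Dict = {"operation": check.__name__, "error": False}
        try:
            check(movie)
        except Exception as exc:
            record.update(error=True, message=str(exc))
        records.append(record)
        if record["error"]:
            break  # a single failing dependency fails the whole rental
    succeeded = not any(r["error"] for r in records)
    return succeeded, records


ok, records = rent_movie("Inception", [check_availability, check_payment, check_storage])
print(ok)
print([r for r in records if r["error"]])
```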

  • Spans of services
  • Time taken by each service
  • Latency between the services
  • Hierarchy of services
  • Errors or exceptions during execution of each service.
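Given a list of finished spans, most of these views take only a few lines of code. The sketch below assumes a simple hypothetical span layout (not Jaeger's storage model) and prints each span indented under its parent, with its duration, its start offset relative to the parent, and an error marker. The offset is only a rough hint at where time goes between services.

```python
from typing import Dict, List, NamedTuple, Optional


class FinishedSpan(NamedTuple):
    span_id: str
    parent_id: Optional[str]
    name: str
    start: float  # seconds
    end: float  # seconds
    error: bool = False


def render_tree(spans: List[FinishedSpan]) -> List[str]:
    # Index children by parent ID, ordered by start time.
    children: Dict[Optional[str], List[FinishedSpan]] = {}
    for span in sorted(spans, key=lambda s: s.start):
        children.setdefault(span.parent_id, []).append(span)

    lines: List[str] = []

    def walk(parent: Optional[FinishedSpan], depth: int) -> None:
        key = parent.span_id if parent else None
        for span in children.get(key, []):
            duration_ms = (span.end - span.start) * 1000
            offset_ms = (span.start - parent.start) * 1000 if parent else 0.0
            marker = "  [error]" if span.error else ""
            lines.append(f"{'  ' * depth}{span.name}: {duration_ms:.0f} ms (starts +{offset_ms:.0f} ms){marker}")
            walk(span, depth + 1)

    walk(None, 0)
    return lines


spans = [
    FinishedSpan("a", None, "booking", 0.0, 1.0),
    FinishedSpan("b", "a", "CheckCinema", 0.125, 0.375, error=True),
    FinishedSpan("c", "a", "CheckShowtime", 0.5, 0.75),
    FinishedSpan("d", "a", "BookShow", 0.75, 1.0),
]
report = render_tree(spans)
print("\n".join(report))
```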

Jaeger: A Distributed Tracing System by Uber

Jaeger, inspired by Dapper and OpenZipkin, is a distributed tracing system released as open source by Uber Technologies. It is used for monitoring and troubleshooting microservices-based distributed systems, including:

  • Distributed transaction monitoring
  • Performance and latency optimization
  • Root cause analysis
  • Service dependency analysis
  • Distributed context propagation
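Service dependency analysis, for instance, boils down to following parent/child links between spans emitted by different services. A minimal sketch, assuming each span is a dict with hypothetical `span_id`, `parent_id` and `service` keys:

```python
from collections import Counter
from typing import Dict, List, Optional


def service_dependencies(spans: List[Dict[str, Optional[str]]]) -> Counter:
    # Map each span ID to the service that emitted it.
    service_of = {span["span_id"]: span["service"] for span in spans}
    edges: Counter = Counter()
    for span in spans:
        parent_service = service_of.get(span["parent_id"])
        # A parent in a different service means "parent service calls this one".
        if parent_service is not None and parent_service != span["service"]:
            edges[(parent_service, span["service"])] += 1
    return edges


spans = [
    {"span_id": "1", "parent_id": None, "service": "frontend"},
    {"span_id": "2", "parent_id": "1", "service": "booking"},
    {"span_id": "3", "parent_id": "2", "service": "payments"},
    {"span_id": "4", "parent_id": "2", "service": "booking"},
    {"span_id": "5", "parent_id": "4", "service": "payments"},
]
deps = service_dependencies(spans)
print(deps)
```

Calls within the same service (span 4 under span 2) are ignored, so only cross-service edges are counted.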

Major Components of Jaeger

Jaeger Client Libraries — Jaeger clients are language-specific implementations of the OpenTracing API.

Agent — The Jaeger agent is a network daemon that listens for spans sent over UDP, which it batches and sends to the collector. It is designed to be deployed to all hosts as an infrastructure component. The agent abstracts the routing and discovery of the collectors away from the client.
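The receive, batch and forward loop of the agent can be illustrated with the standard library. This is a toy stand-in: real Jaeger clients send Thrift-encoded spans to the agent (port 6831/udp in the Docker command later in this post), whereas here each datagram is opaque bytes, and `forward` is a hypothetical callback standing in for the connection to the collector.

```python
import socket
from typing import Callable, List


def run_agent(sock: socket.socket, batch_size: int, expected: int,
              forward: Callable[[List[bytes]], None]) -> None:
    """Receive `expected` datagrams and forward them in batches of `batch_size`."""
    batch: List[bytes] = []
    for _ in range(expected):
        data, _addr = sock.recvfrom(65535)  # one datagram per reported span
        batch.append(data)
        if len(batch) == batch_size:
            forward(batch)
            batch = []
    if batch:
        forward(batch)  # flush the final, partially filled batch


agent_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
agent_sock.bind(("127.0.0.1", 0))  # any free port on loopback
agent_sock.settimeout(2.0)
port = agent_sock.getsockname()[1]

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for i in range(5):
    client.sendto(f"span-{i}".encode(), ("127.0.0.1", port))  # fire-and-forget

batches: List[List[bytes]] = []
run_agent(agent_sock, batch_size=2, expected=5, forward=batches.append)
client.close()
agent_sock.close()
print(batches)
```

A long-running agent would loop indefinitely and also flush on a timer; UDP keeps the client side cheap because the application never waits for an acknowledgement.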

Collector — The Jaeger collector receives traces from Jaeger agents and runs them through a processing pipeline. Currently, the pipeline validates traces, indexes them, performs transformations, and finally, stores them. Jaeger’s storage is a pluggable component which currently supports Cassandra, Elasticsearch, and Kafka.
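The collector pipeline and its pluggable storage might be sketched as follows. Everything here is a hypothetical simplification: `SpanStorage`, `InMemoryStorage` and `Collector` are illustrative names, indexing is reduced to grouping spans by service inside the backend, and the in-memory backend plays the role Cassandra or Elasticsearch would play in a real deployment.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, List


class SpanStorage(ABC):
    # Pluggable backend: swap the implementation to change where spans live.
    @abstractmethod
    def write(self, span: Dict) -> None: ...

    @abstractmethod
    def find_by_service(self, service: str) -> List[Dict]: ...


class InMemoryStorage(SpanStorage):
    def __init__(self) -> None:
        self._by_service: Dict[str, List[Dict]] = defaultdict(list)  # index by service

    def write(self, span: Dict) -> None:
        self._by_service[span["service"]].append(span)

    def find_by_service(self, service: str) -> List[Dict]:
        return list(self._by_service.get(service, []))


class Collector:
    REQUIRED = ("trace_id", "span_id", "service", "operation")

    def __init__(self, storage: SpanStorage) -> None:
        self.storage = storage
        self.rejected = 0

    def process(self, spans: List[Dict]) -> None:
        for span in spans:
            if not all(key in span for key in self.REQUIRED):  # validate
                self.rejected += 1
                continue
            span = {**span, "service": span["service"].lower()}  # transform
            self.storage.write(span)  # store (the backend indexes it)


collector = Collector(InMemoryStorage())
collector.process([
    {"trace_id": "t1", "span_id": "a", "service": "Booking", "operation": "CheckCinema"},
    {"trace_id": "t1", "span_id": "b", "operation": "BookShow"},  # missing service
])
found = collector.storage.find_by_service("booking")
print(collector.rejected, found)
```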

Query — Query is a service that retrieves traces from storage and hosts a UI to display them.

Ingester — Ingester is a service that reads from a Kafka topic and writes to another storage backend (Cassandra, Elasticsearch).

Running Jaeger in a Docker Container

1. First, install the Jaeger Python client on your machine:

$ pip install jaeger-client

2. Now, let’s run the Jaeger backend as an all-in-one Docker image. The image launches the Jaeger UI, collector, query service, and agent:

$ docker run -d -p6831:6831/udp -p16686:16686 jaegertracing/all-in-one:latest

TIP: To check whether the Docker container is running, use: docker ps.

Once the container starts, open http://localhost:16686/ to access the Jaeger UI. The container runs the Jaeger backend with an in-memory store, which starts out empty, so there is not much to see in the UI until we send it some traces.

Creating Traces on Jaeger UI

1. Create a Python program that generates traces:

Let’s generate some traces using a simple Python program. You can clone the Jaeger-Opentracing repository for the sample program used in this blog.

The Python program takes a movie name as an argument and calls three functions that get the cinema details, get the movie showtime details, and finally book a movie ticket.

It adds random delays to all the functions to make things more interesting, since in reality each function would take some time to fetch its details. The functions also throw random errors to give us a feel for how the traces of a real-life application may look in case of failures.

Here is a brief description of how OpenTracing has been used in the program:

  • Initializing a tracer and obtaining the tracer instance:
tracer = init_tracer('booking')
  • Starting new child spans using start_span:
with tracer.start_span('CheckCinema', child_of=get_current_span()) as span:
  • Using tags:
span.set_tag('Movie', movie)
  • Using logs:
span.log_kv({'event': 'CheckCinema', 'value': cinema_details})
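The snippets above are fragments and do not show how `init_tracer` is defined. The following is a minimal sketch of a comparable program, not the repository's actual code. It assumes the `jaeger-client` package from step 1 and its `Config(...).initialize_tracer()` entry point; `TRACER_CONFIG`, `traced_step` and the failure rate are hypothetical choices. The original obtains the parent span via `get_current_span()`, whereas this sketch passes the root span explicitly.

```python
import logging
import random
import sys
import time

# Settings consistent with the run log below: a constant sampler
# ("Using sampler ConstSampler(True)") and span logging ("Reporting span ...").
TRACER_CONFIG = {
    'sampler': {'type': 'const', 'param': 1},  # sample every trace
    'logging': True,                           # log each reported span
}


def init_tracer(service):
    # Imported lazily so the module loads even where jaeger-client is absent.
    from jaeger_client import Config

    config = Config(config=TRACER_CONFIG, service_name=service, validate=True)
    return config.initialize_tracer()


def traced_step(tracer, parent, operation, movie, failure_rate=0.3):
    # One child span per step, with a random delay and a random failure.
    with tracer.start_span(operation, child_of=parent) as span:
        span.set_tag('Movie', movie)
        time.sleep(random.uniform(0.1, 0.5))  # simulated lookup latency
        if random.random() < failure_rate:
            span.set_tag('error', True)       # the tag the UI filters on
            span.log_kv({'event': 'error', 'message': f'{operation} failed'})
        else:
            span.log_kv({'event': operation, 'value': f'{operation} ok for {movie}'})


def main(movie):
    logging.basicConfig(level=logging.DEBUG, format='%(message)s')
    tracer = init_tracer('booking')
    with tracer.start_span('booking') as root:
        for operation in ('CheckCinema', 'CheckShowtime', 'BookShow'):
            traced_step(tracer, root, operation, movie)
        print('Ticket Details')
    time.sleep(2)   # give the UDP reporter time to flush buffered spans
    tracer.close()


if __name__ == '__main__':
    if len(sys.argv) != 2:
        print('usage: python booking-mgr.py <movie-name>')
    else:
        main(sys.argv[1])
```

Errors are recorded as tags rather than raised, so a failing step does not abort the remaining steps, which matches the trace described later where two steps report errors but BookShow still runs.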

2. Run the Python program:

$ python booking-mgr.py <movie-name>
Initializing Jaeger Tracer with UDP reporter
Using sampler ConstSampler(True)
opentracing.tracer initialized to <jaeger_client.tracer.Tracer object at 0x7f72ffa25b50>[app_name=booking]
Reporting span cfe1cc4b355aacd9:8d6da6e9161f32ac:cfe1cc4b355aacd9:1 booking.CheckCinema
Reporting span cfe1cc4b355aacd9:88d294b85345ac7b:cfe1cc4b355aacd9:1 booking.CheckShowtime
Ticket Details
Reporting span cfe1cc4b355aacd9:98cbfafca3aa0fe2:cfe1cc4b355aacd9:1 booking.BookShow
Reporting span cfe1cc4b355aacd9:cfe1cc4b355aacd9:0:1 booking.booking
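Each "Reporting span" line encodes `trace-id:span-id:parent-span-id:flags`, followed by the service and operation name. As a quick sanity check, the sketch below parses the output of this run: all spans share one trace ID, the root span (parent `0`) is `booking`, and the three step spans name the root as their parent. `parse_reported_spans` is a hypothetical helper written for this post.

```python
from typing import Dict, List

LOG = """\
Reporting span cfe1cc4b355aacd9:8d6da6e9161f32ac:cfe1cc4b355aacd9:1 booking.CheckCinema
Reporting span cfe1cc4b355aacd9:88d294b85345ac7b:cfe1cc4b355aacd9:1 booking.CheckShowtime
Ticket Details
Reporting span cfe1cc4b355aacd9:98cbfafca3aa0fe2:cfe1cc4b355aacd9:1 booking.BookShow
Reporting span cfe1cc4b355aacd9:cfe1cc4b355aacd9:0:1 booking.booking
"""


def parse_reported_spans(log: str) -> List[Dict]:
    spans = []
    for line in log.splitlines():
        if not line.startswith("Reporting span "):
            continue  # skip other output such as "Ticket Details"
        _, _, context, name = line.split()
        trace_id, span_id, parent_id, flags = context.split(":")
        service, operation = name.split(".", 1)
        spans.append({
            "trace_id": trace_id, "span_id": span_id, "parent_id": parent_id,
            "sampled": bool(int(flags, 16) & 1),  # lowest flag bit = sampled
            "service": service, "operation": operation,
        })
    return spans


spans = parse_reported_spans(LOG)
root = next(s for s in spans if s["parent_id"] == "0")
children = [s["operation"] for s in spans if s["parent_id"] == root["span_id"]]
print(root["operation"], children)
```

Note that in this output the root span's ID is the same value as the trace ID.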

Now open the Jaeger UI again: a new service, “booking”, appears in the service list. Select the service and click “Find Traces” to see its traces. Every time you run the program, a new trace is created.

You can now compare the duration of traces through the graph shown above. You can also filter traces using the “Tags” field under “Find Traces”. For example, setting the tag “error=true” shows only the traces that contain errors, as shown:
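Conceptually, the “error=true” search keeps every trace in which at least one span carries that tag. A small sketch of that filter over in-memory traces follows; the data layout and `find_traces` are hypothetical simplifications, since the Jaeger UI runs this query for you against the storage backend.

```python
from typing import Dict, List

Trace = List[Dict]  # each span: {"operation": str, "tags": dict}


def find_traces(traces: Dict[str, Trace], tag: str, value: object) -> List[str]:
    # Keep a trace if any of its spans has the tag set to the requested value.
    return [
        trace_id
        for trace_id, spans in traces.items()
        if any(span["tags"].get(tag) == value for span in spans)
    ]


traces = {
    "run-1": [{"operation": "CheckCinema", "tags": {"Movie": "Inception", "error": True}}],
    "run-2": [{"operation": "CheckCinema", "tags": {"Movie": "Inception"}}],
    "run-3": [{"operation": "BookShow", "tags": {"error": True}}],
}
print(find_traces(traces, "error", True))
```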

To view the detailed trace, you can select a specific trace instance and check details like the time taken by each service, errors during execution and logs.

The above trace instance has four spans: the first is the root span booking, the second is CheckCinema, the third is CheckShowtime, and the last is BookShow. In this particular trace instance, both the CheckCinema and CheckShowtime invocations reported an error, as indicated by the error=true tag.

Conclusion

In this blog, we’ve described the importance and benefits of OpenTracing for monitoring modern, microservices-based applications. We also explored how the distributed tracer Jaeger collects and stores traces and helps reveal inefficient portions of our applications. Jaeger is fully compatible with the OpenTracing API and has clients for many programming languages, including Java, Go, Node.js, Python, PHP, and more.


******************************************************************
Velotio Technologies is a software engineering firm, with core expertise in Data Science, Machine Learning and DevOps. Our modus operandi is working with the latest transformative tech to turbocharge customer success.

Interested in learning more about us? We would love to connect with you on our Website, LinkedIn or Twitter.

*******************************************************************

Velotio Perspectives

Thoughts and ideas on startups, enterprise software & technology by the Velotio team. Learn more at www.velotio.com.
