X-Ray Vision

Or how to set up an AWS ECS Fargate service with X-Ray tracing using TypeScript CDK

Timur Abduljalil
The Startup
5 min read · Sep 4, 2020

For larger organizations, microservice architectures have become the de facto standard. Admittedly, such architectures come with a set of trade-offs, one of which is the complexity they introduce. A single HTTP request may trigger a web of services working together to satisfy it. As you can imagine, this can result in significant operational complexity.

Figure: Microservice Deathstars. Amazon and Netflix microservices depicted as death star diagrams, showing the complexity of these systems.

Distributed tracing has become a critical component of observability. It allows us to trace the lifecycle of a request through the entire system. AWS X-Ray is one such tool. In this post, I will show you how to set up X-Ray distributed tracing for services that run in ECS. Here is what the result should look like:

How does X-Ray tracing for ECS work?

Figure: X-Ray Architecture

The AWS documentation does a good job of explaining how tracing with X-Ray works, but let’s summarize some of the concepts here. Our goal is to assemble a big picture of what the request lifecycle looks like. To accomplish this, every request passing through the system needs to carry some context that groups it with the other requests in the same lifecycle. This is done through a tracing header (X-Ray uses the X-Amzn-Trace-Id HTTP header). With that in place, each time a request comes in, we can parse the context from the header and add our own context to the header of any downstream requests.
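To make the context passing concrete, here is a minimal sketch of parsing an incoming X-Amzn-Trace-Id header and building the header for a downstream call. The helper names (parseTraceHeader, buildDownstreamHeader) and the sample IDs are illustrative assumptions; in practice the X-Ray SDK does this for you.

```typescript
// Sketch of trace-context propagation via the X-Amzn-Trace-Id header.
// Helper names and sample IDs are hypothetical; the X-Ray SDK handles this for you.

interface TraceContext {
  root: string;                 // trace ID shared by everything in one request lifecycle
  parent: string | undefined;   // ID of the upstream segment that called us
  sampled: boolean | undefined; // upstream sampling decision, if any
}

// Parse "Root=...;Parent=...;Sampled=1" into its fields.
function parseTraceHeader(header: string): TraceContext | undefined {
  const fields: Record<string, string | undefined> = {};
  for (const part of header.split(";")) {
    const idx = part.indexOf("=");
    if (idx > 0) {
      fields[part.slice(0, idx).trim()] = part.slice(idx + 1).trim();
    }
  }
  const root = fields["Root"];
  if (!root) return undefined; // without a Root there is no trace to join
  const sampled = fields["Sampled"];
  return {
    root,
    parent: fields["Parent"],
    sampled: sampled === undefined ? undefined : sampled === "1",
  };
}

// Build the header for a downstream call: same Root, our own segment ID as Parent.
function buildDownstreamHeader(incoming: TraceContext, ourSegmentId: string): string {
  const parts = [`Root=${incoming.root}`, `Parent=${ourSegmentId}`];
  if (incoming.sampled !== undefined) {
    parts.push(`Sampled=${incoming.sampled ? "1" : "0"}`);
  }
  return parts.join(";");
}

const incomingContext = parseTraceHeader(
  "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
);
if (incomingContext) {
  console.log(buildDownstreamHeader(incomingContext, "70de5b6f19ff9a0a"));
  // prints "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=70de5b6f19ff9a0a;Sampled=1"
}
```

The Root value stays the same across every hop, which is what lets X-Ray stitch the individual reports into one trace, while Parent changes at each hop to record who called whom.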

Having a tracing header to pass context along is only one part of the puzzle. We also need to collect all the traces within the same lifecycle and put them together. This part is known as reporting. Each component needs to report its own trace info to a single entity that builds the big picture for us. Within AWS X-Ray, reporting is done through the X-Ray API.

This implies that we must send traces to the X-Ray API every time there is a request in the system, which could add unnecessary overhead. AWS solves this problem by introducing the X-Ray Daemon, a process that runs alongside your service. Its job is to collect the traces and send them to the API efficiently. Instead of making an HTTP request to the X-Ray API from within the service, the instrumentation reports tracing info to the daemon over a lighter protocol (UDP), and the daemon in turn buffers the data and uploads it to the X-Ray API in batches.
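To illustrate how light the reporting step is, here is a rough sketch of sending one segment to the daemon over UDP using Node's built-in dgram module. The function names and the reduced Segment shape are assumptions made for illustration; the SDK normally builds and sends these documents itself. The daemon expects a small JSON header line followed by the segment document.

```typescript
import * as dgram from "dgram";

// Reduced shape of a completed segment document (required fields only).
// This interface is a simplification for illustration.
interface Segment {
  name: string;       // logical name of the service
  id: string;         // 16 hex digits, unique within the trace
  trace_id: string;   // e.g. "1-5759e988-bd862e3fe1be46a994272793"
  start_time: number; // epoch seconds (fractions allowed)
  end_time: number;   // epoch seconds (fractions allowed)
}

// The daemon expects a JSON header, a newline, then the segment JSON.
function buildDaemonPacket(segment: Segment): string {
  return '{"format": "json", "version": 1}\n' + JSON.stringify(segment);
}

// Fire-and-forget send; 127.0.0.1:2000 is the daemon's default UDP address.
function sendToDaemon(segment: Segment, host = "127.0.0.1", port = 2000): void {
  const socket = dgram.createSocket("udp4");
  const payload = Buffer.from(buildDaemonPacket(segment));
  // UDP gives no delivery guarantee; we only close the socket once the send completes.
  socket.send(payload, port, host, () => socket.close());
}
```

Calling sendToDaemon with a completed segment hands it off in a single datagram, so the service never waits on a round trip to the X-Ray API.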

Interestingly, this is how AWS handles reporting internally for other AWS resources. For example, if we turn tracing on for a Lambda function, the function will report traces to an X-Ray Daemon that AWS runs internally for that service.

This is what all of this looks like in practice:

  1. Instrument our code to collect trace data. Instrumentation simply means that we parse incoming requests for external context and wrap outgoing requests with our own context. This is done using the AWS X-Ray SDK. By default, the SDK emits trace data over UDP to port 2000.
  2. Set up X-Ray Daemon as a sidecar to our service. It will listen on UDP port 2000, collect the traces and pass them to the AWS X-Ray API.
  3. Make sure that the task role has permission to write to the AWS X-Ray API.

Let’s see the code

Just a reminder that this article primarily concerns itself with how to do this in CDK + TypeScript. How ECS works and how to set it up is out of scope. For a full example, however, refer to the repo here. But please remember that it is just an example and needs refinement before it can be used in production.

Step 1: Instrument our code

The process of instrumentation depends on the language of your service. In this article, I will show you how to instrument a Node service. Look through the docs to see if instrumentation is available for your language.

A Node application is really easy to instrument. We just need aws-xray-sdk to do the heavy lifting. More info on it here. The code below is a very simplistic example of how a Node Express service can be set up.

In the code above, line 5 sets up the X-Ray SDK to trace every call made through the AWS SDK. That is why, on line 15, we create the SNS client using our wrapper. Lines 6 and 7 wrap http and https globally so that any outgoing requests made with axios are traced.
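A minimal sketch of such a setup follows, assuming express, axios, aws-sdk (v2) and aws-xray-sdk are installed. The segment name, the routes, the TOPIC_ARN environment variable and the upstream URL are placeholders, and the line numbers cited above do not correspond to this sketch.

```typescript
import * as AWSXRay from "aws-xray-sdk";
import express from "express";
import axios from "axios";

// Wrap the AWS SDK (v2) so every client call is recorded as a subsegment.
// require() is used so the SDK patches the real module objects.
const AWS = AWSXRay.captureAWS(require("aws-sdk"));

// Patch http and https globally so outgoing requests (axios uses them) are traced.
AWSXRay.captureHTTPsGlobal(require("http"));
AWSXRay.captureHTTPsGlobal(require("https"));

const app = express();

// Open a segment for every incoming request; "my-node-service" is a placeholder name.
app.use(AWSXRay.express.openSegment("my-node-service"));

// Client created from the wrapped SDK, so publish calls show up in the trace.
const sns = new AWS.SNS();

app.get("/publish", async (_req, res) => {
  // TOPIC_ARN is a hypothetical environment variable.
  await sns.publish({ TopicArn: process.env.TOPIC_ARN, Message: "hello" }).promise();
  res.send("published");
});

app.get("/fetch", async (_req, res) => {
  // Placeholder upstream URL; the patched https module adds the trace header.
  const upstream = await axios.get("https://example.com");
  res.send({ status: upstream.status });
});

// Close the segment after the routes so it gets sent to the daemon.
app.use(AWSXRay.express.closeSegment());

app.listen(3000);
```

Error handling is left out to keep the sketch short. The openSegment middleware has to come before the routes and closeSegment after them, otherwise the handlers run without an active segment.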

Step 2: Set up an X-Ray Daemon

Setting up a sidecar with X-Ray Daemon is super simple. Assuming we run in ECS and our Node app is defined as a task, all we need to do is add a container that runs X-Ray Daemon to the same task. Here is an example:
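A minimal sketch of such a task definition, assuming CDK v2 (aws-cdk-lib) and the public amazon/aws-xray-daemon image. The stack name, the application image, and the CPU and memory figures are placeholders to adjust for your own service.

```typescript
import { Stack, StackProps } from "aws-cdk-lib";
import * as ecs from "aws-cdk-lib/aws-ecs";
import { Construct } from "constructs";

// Hypothetical stack showing only the task definition; cluster and service are omitted.
export class TracedServiceStack extends Stack {
  public readonly taskDefinition: ecs.FargateTaskDefinition;

  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    this.taskDefinition = new ecs.FargateTaskDefinition(this, "TaskDef", {
      cpu: 256,
      memoryLimitMiB: 512,
    });

    // The instrumented Node app; the image name is a placeholder.
    this.taskDefinition.addContainer("app", {
      image: ecs.ContainerImage.fromRegistry("my-org/my-node-app"),
      portMappings: [{ containerPort: 3000 }],
      logging: ecs.LogDrivers.awsLogs({ streamPrefix: "app" }),
    });

    // X-Ray Daemon sidecar listening on UDP port 2000 in the same task.
    this.taskDefinition.addContainer("xray-daemon", {
      image: ecs.ContainerImage.fromRegistry("amazon/aws-xray-daemon"),
      cpu: 32,
      memoryReservationMiB: 256,
      essential: false,
      portMappings: [{ containerPort: 2000, protocol: ecs.Protocol.UDP }],
      logging: ecs.LogDrivers.awsLogs({ streamPrefix: "xray" }),
    });
  }
}
```

Containers in the same Fargate task share a network namespace, so the SDK's default daemon address of 127.0.0.1:2000 reaches the sidecar without extra configuration.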

Step 3: Add Permission to report traces to X-Ray API

Last but not least, the task role must have permission to report to the X-Ray API. It is as simple as doing this in CDK:
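A minimal sketch, again assuming CDK v2. The helper name allowXRayWrites is hypothetical. It attaches the AWS managed policy AWSXRayDaemonWriteAccess to the task role of an existing task definition.

```typescript
import * as ecs from "aws-cdk-lib/aws-ecs";
import * as iam from "aws-cdk-lib/aws-iam";

// Hypothetical helper: lets containers in the task (including the daemon)
// upload trace data to the X-Ray API.
export function allowXRayWrites(taskDefinition: ecs.TaskDefinition): void {
  taskDefinition.taskRole.addManagedPolicy(
    iam.ManagedPolicy.fromAwsManagedPolicyName("AWSXRayDaemonWriteAccess")
  );
}
```

If the managed policy is broader than you want, a narrower alternative is to add a policy statement to the task role that allows only xray:PutTraceSegments and xray:PutTelemetryRecords.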

Conclusion: The Good, the Bad and the Ugly

That’s it! X-Ray is easy to set up and adds a ton of utility by enabling distributed tracing. However, it is not without its downsides. For example, right now, instrumenting an architecture such as SNS → SQS → Lambda might not produce the desired results. The service is evolving, and with time we expect these gaps to be filled. In the meantime, keep an eye on AWS announcements to stay updated!
