Tracing Requests in Serverless Systems with AWS CloudWatch ServiceLens
Tracing, the ability to follow a single incoming request and its subsequent effects across multiple services, has been an important topic in system architecture ever since microservices became mainstream. When a system consists of small services that each handle a specific, dedicated part, understanding the chain of events behind actions in individual services is critical for troubleshooting and performance optimization.
With serverless architectures, tracing becomes even more important. Because your building blocks are fundamentally at the function level, each endpoint of an API can be thought of as a separate service (a nanoservice, if you will). As these technologies mature, a significant part of a cloud platform’s serverless prowess will be the services it provides for tracing and subsequent monitoring. If a cloud platform can offer its own managed services that integrate natively and rival traditional solutions even partially, that would be a huge selling point.
The base managed tracing solution AWS offers is AWS X-Ray. If you instrument your application code with the X-Ray SDK and run an X-Ray daemon alongside your application (provided automatically for AWS Lambda functions; it takes a bit of work on EC2 or ECS/EKS), you can trace your requests through their entire journey in the AWS ecosystem.
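To make the mechanics a little more concrete: X-Ray carries trace context from one service to the next in a tracing header (`X-Amzn-Trace-Id`), whose `Root` field is the trace ID shared by every segment of the request. The following is a minimal sketch of reading that header by hand, using only the standard library. The `TraceContext` class and `parse_trace_header` function are hypothetical names made up for this illustration; in a real application the X-Ray SDK handles this for you.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TraceContext:
    root: str                # full trace ID, e.g. "1-5759e988-bd862e3fe1be46a994272793"
    parent: Optional[str]    # ID of the calling segment, if the header has one
    sampled: Optional[bool]  # None when the header carries no clear sampling decision
    started_at: datetime     # decoded from the timestamp part of the trace ID


def parse_trace_header(header: str) -> TraceContext:
    """Split an X-Ray tracing header into its fields (illustrative helper, not SDK code)."""
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:
            fields[key] = value

    root = fields["Root"]  # KeyError if the header has no trace ID
    # Trace ID layout: version, 8 hex digits of epoch seconds, 24 hex digits of unique ID.
    version, epoch_hex, unique = root.split("-")
    if version != "1" or len(epoch_hex) != 8 or len(unique) != 24:
        raise ValueError(f"unexpected trace ID format: {root}")

    return TraceContext(
        root=root,
        parent=fields.get("Parent"),
        sampled={"1": True, "0": False}.get(fields.get("Sampled")),
        started_at=datetime.fromtimestamp(int(epoch_hex, 16), tz=timezone.utc),
    )


if __name__ == "__main__":
    example = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
    print(parse_trace_header(example))
```

Because every downstream segment reuses the same `Root` value, that one identifier is what lets a tracing UI stitch the separate pieces of a request back together.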
Of course, capturing the trace is only half the battle: the information also needs to be readily available and searchable in a usable way. This is where X-Ray has fallen a bit short so far, especially for serverless architectures. The UI, with its service map overview, is not designed to handle the hundreds of nodes and entry points a serverless application may have, and it suffers from poor performance at times. Finding a certain trace can be quite laborious even if you know exactly what you are looking for.
AWS CloudWatch ServiceLens
In true AWS fashion, the next iteration of providing tracing-based information is a cryptically named service all the way on the other side of the AWS management console: ServiceLens, found under CloudWatch and released in November 2019. At its core it’s simply a new UI for the same X-Ray data collected in the same way, with some additional monitoring tools and integrations for things like alarms and logs.
The improved UI is actually quite useful without any further tinkering at all. The service map view, similar to the X-Ray one and clearly based on it, scales better thanks to subgraph views and more controls. Together with a new, separate list view, it serves as a handy dashboard that gives an overview of what’s going on. The time window restrictions are still the same: the total scope of a search is capped at 6 hours, which is understandable considering the possible volume of traces. The X-Ray sampling policy is configurable, but by default the first request every second is traced, plus 5% of any additional requests.
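To get a feel for how many traces the default sampling policy produces, here is a rough sketch that models it as "one guaranteed trace per second, plus a fixed 5% of everything else". This is a toy model for illustration only, not the X-Ray SDK's actual implementation; the class and function names (`DefaultRuleSampler`, `expected_traces_per_second`) are invented for this example.

```python
import random
from typing import Callable, Optional


class DefaultRuleSampler:
    """Toy model of a reservoir-plus-fixed-rate sampling rule (not the X-Ray SDK)."""

    def __init__(self, reservoir: int = 1, rate: float = 0.05,
                 rng: Callable[[], float] = random.random) -> None:
        self.reservoir = reservoir  # requests always traced in each one-second window
        self.rate = rate            # fraction of the remaining requests that are traced
        self.rng = rng              # injectable so the behavior can be made deterministic
        self._second: Optional[int] = None
        self._used = 0

    def should_trace(self, timestamp: float) -> bool:
        second = int(timestamp)
        if second != self._second:
            # A new one-second window starts: refill the reservoir.
            self._second = second
            self._used = 0
        if self._used < self.reservoir:
            self._used += 1
            return True
        # Reservoir exhausted for this second: fall back to the fixed rate.
        return self.rng() < self.rate


def expected_traces_per_second(requests_per_second: float,
                               reservoir: int = 1, rate: float = 0.05) -> float:
    """Average number of traced requests per second under steady traffic."""
    guaranteed = min(requests_per_second, reservoir)
    return guaranteed + (requests_per_second - guaranteed) * rate


if __name__ == "__main__":
    for rps in (1, 10, 100, 1000):
        print(rps, "req/s ->", round(expected_traces_per_second(rps), 2), "traces/s")
```

Under this model a steady 100 requests per second works out to roughly 1 + 99 × 0.05 = 5.95 traces per second, so even a modest 6-hour window can hold well over a hundred thousand traces for a single busy entry point.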
Beyond Simple Tracing
The idea behind ServiceLens is to integrate more closely with the other monitoring aspects of CloudWatch, including CloudWatch Alarms. While it’s certainly useful to see the alarms related to a resource on the service map, the alarms themselves are often quite cumbersome to work with if you want to provide any context for why they fire, even if you use Metric Math to fine-tune the alarm thresholds further.
The most important aspect of this kind of monitoring remains the log handling. ServiceLens can automatically provide the logs for the initial Lambda handler along with the trace (utilizing CloudWatch Logs Insights), which is already a significant improvement for troubleshooting. However, the holy grail of tracing would be automatic aggregation of all logs relevant to a whole trace from all appropriate services. ServiceLens can be set up to do this as well, but there’s a big catch: currently it only works with the X-Ray Java SDK. The SDK restriction applies to a lot of other aspects of X-Ray as well, and using Java unlocks a whole slew of additional features. Hopefully they will become available in other SDKs sooner rather than later.
So far, the way to go with AWS for more complicated production systems has been to use something like ELK or Graylog on top of CloudWatch for anything beyond the basic needs. Will ServiceLens and similar platform-provided services supersede these traditional centralized logging and monitoring solutions? Probably not. Can they be a more time- and cost-effective solution for the vast majority of use cases? Certainly. Time will tell which track advances faster.