Request log tracing in NodeJS services

by Deepak Narayana Rao and Gaurev Katoch

Photo by Roman Romashov on Unsplash

As CasaOne started growing, we started breaking our system into multiple services for better organization of our engineering efforts. In this distributed system setup, troubleshooting issues became more challenging as a lot of requests flow through multiple services in the system.

The ability to trace logs across services became the need of the hour. We divided the problem to build following capabilities in the system

  1. Aggregated log store to search & view logs
  2. Ability to correlate logs for a request served by multiple services

1. Log Aggregation

Our services run as containers in AWS ECS and logs from these containers are stored in CloudWatch log groups. We built a log aggregation pipeline using functionbeat to push logs from AWS CloudWatch to ElasticSearch, Kibana is used as the frontend for searching and viewing logs.

This topic itself can be a separate post with more details. We’ll focus more on log correlation in this post.

2. Log Correlation

Our system predominantly consists of NodeJs services. We wanted to make sure our log correlation implementation is easy to introduce in existing services and scales well for building new services. We laid down the following requirements

  • Each service should either receive or generate unique tracing-id per request. This request could be an HTTP request or a message from the Queue
  • Each service should add this tracing-id in all log statements
  • Each service should add tracing-id to the downstream HTTP requests and the events published to the event bus
  • All the above requirements should be achieved using nonobtrusive code

This is relatively easier to achieve in a platform that uses thread per request model. One would use a ThreadLocal variable to store the tracing-id in request context and make it available for the logging framework to add it in each log statement.

NodeJS uses a single-threaded event loop with async calls, which made it slightly trickier. But, thanks to continuation-local-storage & cls-hooked, we get a construct similar to ThreadLocal storage to maintain the context across multiple async calls within a single request boundary

Our implementation of tracing solution shown above has the following touchpoints

1. Incoming HTTP requests: The ExpressJS middleware in the API layer of services generates tracing-id if not present and sets the tracing-id in the context for this request. This trace-id will be available for all statements executed within this request.

2. Logger: The custom log format used in the logger reads the tracing-id from context and adds it to every log statement.

3. Outgoing HTTP requests: The HTTP request wrapper library used in all services, passes the tracing-id to the downstream system as an HTTP header. The downstream service also uses the above-mentioned ExpressJS middleware to set CLS context and the custom format for logging

4. Outgoing Messages: The event publisher library(a wrapper for SNS client in our case) used in all services adds the tracing-id as a message attribute in each message published.

5. Incoming Messages: Similar to HTTP request handling, all messages from the queues(SQS) are consumed using a queue consumer wrapper library. Queue consumers using this library read the tracing-id from message attributes and set it in the request context. The logger reads the tracing-id from context and adds it in each log statement as configured above.

6. HTTP Response: The ExpressJS middleware mentioned above adds tracing-id in the response headers as well. This helps client-side applications to log the trace-id or embed it in the error shown to the user (we employ this method in our internal application). This makes it easy to get the trace id for a request or an error.

Results

This solution has enabled us to debug issues in minutes instead of hours of digging logs across services. Using right tools and libraries, we were able to roll out this solution in short duration and it has proven to be worth the time invested in building this our platform

PS: We are looking for great engineers to join our team and grow with us. Checkout latest jobs here

--

--

--

Engineering efforts to convert furnitures to utility

Recommended from Medium

Javascript Async programming with Async-Await and Async-ification

The Complete Guide to Your (awesome) React Website

react default webpage

Day 34 — To do App

Document Object Model

Palindrome JavaScript

Everything about DataRaptor in Omnistudio Salesforce Vlocity

Comparing dates in Elixir

Localization using Translate Service in Angular

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Deepak Narayana Rao

Deepak Narayana Rao

Human being running on curiosity

More from Medium

scaling nodejs based micro services

scaling-microservices

Microservices API Gateway vs. Traditional API Gateway

Using Telepresence to intercept microservices on a Kubernetes cluster

Logging in Microservices