Keeping request context over the function call stack

Nico Andrade
SSENSE-TECH
May 20, 2022

Introduction

The SSENSE ecosystem is composed of multiple applications and services, each of which may have multiple entry points or execution triggers. Some of our services serve HTTP APIs, some are queue consumers, and others are both at the same time.

Oftentimes, the maintainers of these applications need to find all the events and transactions related to a particular request. For example, upon a failure when placing an order, we might want to investigate which services failed, misbehaved, or delayed the flow.

In order to get good observability across domains, we came up with a simple solution: we write a log entry for every request within our stack, including the response code, the response time, and a request identifier, a simple hash we call the requestId. When services perform API calls, they are expected to pass the requestId along as part of the request headers. When services handle an API call, they are expected to accept the requestId they received and pass it along to any other services they call in turn.

If a service receives a call without a requestId, it creates one and uses it as if it had been received from the client. This newly created requestId is then passed along to other services and used in all of their log entries.

In other words, every request ends up being identified by a hash, and that hash is passed along whenever possible. This allows us to search our log facility for all entries bound to a specific request identifier and obtain a trace of all requests related to a specific action, including relevant information such as the timestamp, the user-agent (or event publisher), the response time, and the response code.
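In code, the pass-along rule boils down to something like this sketch (the X-Request-Id header name and the helper functions are hypothetical, not the actual SSENSE implementation, and the requests library is assumed as the HTTP client):

```python
import uuid

import requests  # assumed HTTP client for this sketch

HEADER = "X-Request-Id"  # hypothetical header name

def resolve_request_id(incoming_headers):
    # Reuse the caller's requestId, or create one if this is the first hop.
    return incoming_headers.get(HEADER) or str(uuid.uuid4())

def call_downstream(url, request_id):
    # Forward the same requestId so the downstream service logs it too.
    return requests.get(url, headers={HEADER: request_id})
```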

Maintainers can then search all log entries with a specific requestId to obtain a list of all the services and actions involved in that sequence of operations, allowing us to identify bottlenecks, performance degradation, bugs, unexpected behavior, and other kinds of problems.

Below is an example of a call between two services:

This seems to work well, in theory. But in practice there’s a hiccup: in order to expose the requestId on every remote call and log output, we need to first ensure that requestId is available across the entire function call stack. This means that, in a normal situation, we’d need to pass the requestId as a parameter from the application entry-point, through all controllers, models, repositories, and API clients. As such, most of the functions within the call stack will need to be modified to receive an extra parameter, the requestId itself.

Can you imagine all the changes that would be needed to pass a requestId all the way from the main API entrypoint to the DB adapter used within a repository? It would be very tedious. See below the changes highlighted in green:

There must be a simpler way to keep context information across all the functions involved in a single request, such that every call within a service is aware of the context of its invocation. And yes, there is: in Python it is called ContextVars; in NodeJS, Asynchronous context tracking. It is the asynchronous equivalent of thread-local storage: a feature that lets you store state when executing an async function and make it available to all the code paths within that function.
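As a minimal illustration of the mechanism in Python (not the SSENSE code), a value set on a ContextVar before an async call remains visible to everything invoked underneath it, without being passed as a parameter:

```python
import asyncio
from contextvars import ContextVar

request_id = ContextVar("request_id", default="unknown")

async def deep_in_the_stack():
    # No parameter received, yet the value set by the caller is visible here.
    print("handling request", request_id.get())

async def handle_request(rid):
    request_id.set(rid)
    await deep_in_the_stack()

asyncio.run(handle_request("abc-123"))  # prints: handling request abc-123
```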

The diagram below shows a context, highlighted in yellow. Note that the requestId is no longer passed as a parameter; it has become part of the context itself:

By leveraging this feature, we are able to access the context in which a function has been called, and this without passing any extra parameter to the function. The downside is an increase in memory usage and a slight performance penalty; expect a 5 to 10% hit. This is because NodeJs is single-threaded: the cost comes from simulating thread-local behavior when, in fact, there is only a single thread. Moreover, threaded languages (imagine, for example, a PHP application running behind Nginx as a web server) also pay a penalty in resource consumption, given that each thread needs to boot the entire application, something that does not happen with NodeJs.

Now let's look at how to add context-tracking capabilities to two HTTP servers: one written in Python 3.7 with FastAPI, and another written in NodeJs with Express.

A Word of Caution

All the examples below are meant to work exactly as written. Note, however, that the implementations are deliberately naive: the main purpose is to showcase each language's capabilities, not to provide production-level code.

NodeJs Notice

For NodeJs, different frameworks and libraries offer their own integration layer with asynchronous local storage. Depending on the frameworks you use in your project, you might need to leverage their recommended implementation rather than writing an adapter from scratch.

Although the async_hooks module is still considered experimental at the time of this writing (Node 18), the AsyncLocalStorage class it provides is considered stable and ready to be used in production environments.

Python Implementation With ContextVars

What does our context contain?

For our example, the context will contain:

  1. The username that initiated the request. For these examples we'll use a query parameter named "username", appending "?username=example" to all URLs. When no username is provided we'll fall back to "unknown".
  2. The user agent, which is sent by the browser automatically. When no user-agent is provided we'll fall back to "unknown".
  3. The requestId, which is expected to be kept throughout all internal calls but to change on every HTTP request.
  4. A mechanism to determine how much time has been spent so far on this specific request.

Hands on!

1. First, we need to define our context class and decide which information it will hold. In this case we want to log the username, the user-agent, the time when the request was initiated, and the time spent serving the request. The context class looks as follows:
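A minimal sketch of such a class, assuming a single ContextVar that holds the per-request data; class, field, and method names are illustrative, chosen to match the log output shown later:

```python
# context.py - a sketch of a request context helper backed by ContextVars
import uuid
from contextvars import ContextVar
from datetime import datetime, timezone

_request_context = ContextVar("request_context", default=None)

class Context:
    """Holds per-request data and exposes it as a plain dict for logging."""

    def __init__(self, username="unknown", user_agent="unknown", request_id=None):
        self._data = {
            "user_agent": user_agent,
            "started_at": datetime.now(timezone.utc),
            "username": username,
            "request_id": request_id or str(uuid.uuid4()),
        }
        self._token = None

    def __enter__(self):
        # Make this request's data visible to the whole call stack below.
        self._token = _request_context.set(self._data)
        return self

    def __exit__(self, *exc):
        _request_context.reset(self._token)

    @staticmethod
    def current():
        """Return the current context as a dict, including the elapsed time."""
        data = _request_context.get()
        if data is None:
            return {}
        elapsed = (datetime.now(timezone.utc) - data["started_at"]).total_seconds()
        return {
            "user_agent": data["user_agent"],
            "started_at": str(data["started_at"]),
            "username": data["username"],
            "request_id": data["request_id"],
            "time_elapsed": elapsed,
        }
```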

2. Then we're going to create a database class that pretends to execute a query. Here's the sample code:
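A sketch along those lines, where a sleep stands in for the real query and the Context helper comes from the previous snippet:

```python
# database.py - a fake repository that "runs" a query and logs the context
import json
import logging
import time

from context import Context  # the helper sketched in step 1

logger = logging.getLogger("database")

def pretend_sql_query():
    # No context is passed in, yet both log lines carry the full request context.
    logger.info("DB First Log: %s", json.dumps(Context.current()))
    time.sleep(2)  # pretend the query takes a couple of seconds
    logger.info("DB Second Log: %s", json.dumps(Context.current()))
```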

3. Then we'll define a logging format to set what logged lines will look like. This will be done by leveraging the Python logging module and its fileConfig facility, which allows configuring the logger from an external file as long as the right format is used.
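An approximate configuration, saved for example as logging.ini and loaded with logging.config.fileConfig, could look like this (the exact format string is an assumption reverse-engineered from the log output shown further down):

```ini
; logging.ini - approximate configuration for the format used in the examples
[loggers]
keys=root

[handlers]
keys=console

[formatters]
keys=detailed

[logger_root]
level=INFO
handlers=console

[handler_console]
class=StreamHandler
level=INFO
formatter=detailed
args=(sys.stdout,)

[formatter_detailed]
format=[%(name)s] Func[%(funcName)s()] Line[%(lineno)-3d] - %(message)s
```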

4. Once everything has been set up, we can start writing our main API Handler.
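A sketch of what that handler could look like, using an HTTP middleware to open the context for every request (the X-Request-Id header name and the file layout are assumptions that match the earlier snippets):

```python
# main.py - a sketch of the FastAPI application wiring everything together
import logging
import logging.config

from fastapi import FastAPI, Request

from context import Context
from database import pretend_sql_query

logging.config.fileConfig("logging.ini", disable_existing_loggers=False)
logger = logging.getLogger("main")

app = FastAPI()

@app.middleware("http")
async def context_middleware(request: Request, call_next):
    # Open a context for this request; everything underneath reads it implicitly.
    with Context(
        username=request.query_params.get("username", "unknown"),
        user_agent=request.headers.get("user-agent", "unknown"),
        request_id=request.headers.get("x-request-id"),
    ):
        return await call_next(request)

@app.get("/")
async def index():
    logger.info("Handling request: %s", Context.current())
    pretend_sql_query()  # no requestId passed, yet it shows up in the DB logs
    return Context.current()
```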

Finally, you might want to run this code in your local environment. You can find below a Dockerfile to build the sample image, and a docker-compose file that would let you run the example with a simple docker-compose up command.
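Something along these lines should do (the Python base image, the port, and the exact dependencies are assumptions):

```dockerfile
# Dockerfile - builds the sample FastAPI image
FROM python:3.7-slim
WORKDIR /app
RUN pip install --no-cache-dir fastapi uvicorn
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

```yaml
# docker-compose.yml - run with `docker-compose up`
version: "3.8"
services:
  api:
    build: .
    ports:
      - "8000:8000"
```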

Spin up your Docker container with docker-compose up and wait for the image to be built. Once the service is running on your computer you can hit its endpoints, appending ?username=some-user to the URL to set the username. The logs should look similar to this:

[database] Func[pretend_sql_query()] Line[16 ] - DB First Log: {"user_agent": "Mozilla/5.0 ...", "started_at": "2022-02-12 16:46:02.205214+00:00", "username": "some-user", "request_id": "b90afd07-557e-4038-87b0-96a2def4aa5c", "time_elapsed": 0.0005727999960072339}
[database] Func[pretend_sql_query()] Line[18 ] - DB Second Log: {"user_agent": "Mozilla/5.0 ...", "started_at": "2022-02-12 16:46:02.205214+00:00", "username": "some-user", "request_id": "b90afd07-557e-4038-87b0-96a2def4aa5c", "time_elapsed": 2.0036450999905355}

Note that the invocation of pretend_sql_query did not receive any parameters, but it still logged the context information, including the requestId, the username, and the time elapsed since the request was received.

Python Key Takeaway

There are two ways to execute code within a context in Python. One is using the with keyword with a helper Context class, as shown above. Alternatively, you can copy the current context with contextvars.copy_context() and then execute a function inside it with the run method of the context you obtained. This second alternative works similarly to the NodeJs implementation. And speaking of NodeJs, let's jump to the…
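A tiny sketch of this second approach:

```python
from contextvars import ContextVar, copy_context

username = ContextVar("username", default="unknown")

def handler():
    print("running as", username.get())

ctx = copy_context()                 # snapshot of the current context
ctx.run(username.set, "some-user")   # the change is contained in the copy
ctx.run(handler)                     # prints: running as some-user
print(username.get())                # still "unknown" outside the copied context
```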

NodeJs Implementation With Async Local Storage

In NodeJs the implementation isn't as straightforward as it is in Python. This is due to how NodeJs works internally: every asynchronous operation runs in its own asynchronous context, and some built-in functions, such as console.log, perform asynchronous work under the hood and therefore create new contexts of their own when called.

Because of this, the official async_hooks documentation recommends using a custom synchronous logger instead of console.log.

1. Create a synchronous logger leveraging the fs module.
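A sketch of such a logger, writing synchronously to stdout (file and function names are ours):

```js
// logger.js - a synchronous logger, as recommended by the async_hooks docs
const fs = require('fs');
const util = require('util');

function log(...args) {
  // util.format mimics console.log formatting; writeSync(1, ...) goes to stdout
  fs.writeSync(1, `${util.format(...args)}\n`);
}

module.exports = { log };
```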

2. Prepare a mock database service to simulate queries. The code is simple: we return the results after 300 ms. Two versions are exposed, one as a promise and another as a callback, for those who prefer a classic JavaScript approach:
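A sketch of the mock database (the getContext helper is defined in the next step; file and function names are ours, chosen to match the log output shown further down):

```js
// database.js - a mock database exposing promise and callback flavors
const { log } = require('./logger');
const { getContext } = require('./context'); // sketched in the next step

function pretendSqlQueryAsync() {
  return new Promise((resolve) => {
    setTimeout(() => {
      log('Within pretendSqlQueryAsync', { currentContext: getContext() });
      resolve([{ id: 1, name: 'fake row' }]);
    }, 300);
  });
}

function pretendSqlQueryCallback(callback) {
  setTimeout(() => {
    log('Within pretendSqlQueryCallback', { currentContext: getContext() });
    callback(null, [{ id: 1, name: 'fake row' }]);
  }, 300);
}

module.exports = { pretendSqlQueryAsync, pretendSqlQueryCallback };
```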

3. Define the context module. This module will rely on the AsyncLocalStorage class from the async_hooks module.
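A sketch of the context module, exposing an Express middleware that opens a store for each request plus a getter to read it anywhere down the stack (the header name and field names are assumptions based on the output shown below):

```js
// context.js - request context backed by AsyncLocalStorage
const { AsyncLocalStorage } = require('async_hooks');
const crypto = require('crypto');

const storage = new AsyncLocalStorage();

// Express middleware: everything triggered by this request shares the store.
function contextMiddleware(req, res, next) {
  const startedAt = new Date();
  const store = {
    userAgent: req.headers['user-agent'] || 'unknown',
    username: req.query.username || 'unknown',
    reqId: req.headers['x-request-id'] || crypto.randomUUID(),
    startedAt,
    getExecutionTime: () => Date.now() - startedAt.getTime(),
  };
  storage.run(store, () => next());
}

function getContext() {
  return storage.getStore() || {};
}

module.exports = { contextMiddleware, getContext };
```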

4. Create the actual web service. The service will listen on port 8080 and will contain three main endpoints: the index or root, an example using an async function, and an example using callbacks.
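A sketch of the service (the route names are assumptions):

```js
// index.js - the sample Express service listening on port 8080
const express = require('express');
const { contextMiddleware, getContext } = require('./context');
const { pretendSqlQueryAsync, pretendSqlQueryCallback } = require('./database');
const { log } = require('./logger');

const app = express();
app.use(contextMiddleware);

// Index: links to the two examples.
app.get('/', (req, res) => {
  res.send('<a href="/async?username=SOMEONE">async example</a><br>' +
           '<a href="/callback?username=SOMEONE">callback example</a>');
});

// Example 1: async/await all the way down.
app.get('/async', async (req, res) => {
  const rows = await pretendSqlQueryAsync();
  res.json({ rows, context: getContext() });
});

// Example 2: classic callback style.
app.get('/callback', (req, res) => {
  pretendSqlQueryCallback((err, rows) => {
    res.json({ rows, context: getContext() });
  });
});

app.listen(8080, () => log('Listening on http://0.0.0.0:8080'));
```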

5. With all files in place, the last step is to create the Docker container and the Docker-Compose stack.

The Dockerfile is very simple. Dependencies are hardcoded here to avoid having to ship a package.json. Save the snippet below as Dockerfile:
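For example (the Node base image is an assumption):

```dockerfile
# Dockerfile - dependencies installed directly to avoid shipping a package.json
FROM node:18-alpine
WORKDIR /app
RUN npm install express
COPY . .
CMD ["node", "index.js"]
```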

And the Docker-Compose stack to be saved as docker-compose.yml:
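A minimal stack along these lines:

```yaml
# docker-compose.yml - runs the NodeJs example
version: "3.8"
services:
  web:
    build: .
    ports:
      - "8080:8080"
```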

With all these files created you can start testing the application. Run docker-compose up in your terminal. Once the container starts you can visit the index at http://0.0.0.0:8080/ and click on the two examples. Feel free to change the URL to see how context changes too.

Upon hitting the different examples you’ll see some lines being logged in the console, showing the context information for each call. Here’s an example:

Within pretendSqlQueryCallback {
  currentContext: {
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36',
    username: 'SOMEONE',
    reqId: '40a39e79-c499-4a15-bc4c-06f2a8041889',
    startedAt: 2022-02-23T00:43:39.101Z,
    getExecutionTime: [Function: getExecutionTime]
  }
}

Note the username and the user-agent weren’t passed as parameters when calling functions on the mock database class, but they were still available through the Context.

Conclusion

Both Python and NodeJs seem to have good support for Asynchronous Local Storage; these features are considered stable in both languages. Although using this feature has an impact on memory and CPU usage, the resulting code ends up being cleaner and easier to maintain in the long term. That being said, assuming Moore’s law is still on point, such performance impact will gradually fade away as technology evolves.

Nico Andrade
SSENSE-TECH

Lead software developer, snowboarder, soccer player, and mixer of languages. http://www.nmac.com.ar