Investigating Node.js Performance: Event Loop and Network I/O (Part 2)

Xue Cai
The Airtable Engineering Blog
Sep 9, 2021 · 14 min read

Deep Dive with Code Examples and Node.js Async Hooks

In a previous blog post, we discussed a performance investigation that uncovered an unexpected interaction between the Node.js event loop and network input/output (I/O), which significantly slowed down loading Airtable bases. We solved the problem by increasing sendBufferSize from 8 KB (macOS) / 208 KB (Linux) to 4 MB.

In this blog post, we provide simplified code examples to help others experiment with Node.js performance in a more concrete way. You can also find all code and output logs in this GitHub repo. Today, we’ll cover various configurations that illustrate:

  • Basic Node.js web app
  • Node.js web app with large sendBufferSize
  • Node.js web app with large sendBufferSize and a concurrency queue
  • Node.js web app with large sendBufferSize and a concurrency queue and async gaps
  • Node.js web app with a strict concurrency queue

Basic Node.js web app

Below is a simple Express/Node.js web app. For each incoming request, the requestHandler serializes a big JSON object and sends the serialized string as the response. JSON.stringify is an expensive synchronous operation, especially when serializing big objects. In the example below, the object is around 55 MB after serialization.

Basic Node.js web app
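
To make the setup concrete, here is a minimal sketch of such an app. It is a reconstruction, not the exact code from the repo: bigObject, the elapsed() helper, and the payload construction are illustrative.

const express = require('express');

const app = express();

// Illustrative payload; the real example serializes to ~55 MB.
const bigObject = {
  rows: new Array(500000).fill({id: 'rec123', value: 'x'.repeat(100)}),
};

let startTime = null;
const elapsed = () => Date.now() - startTime; // ms since the first request

let requestId = 0;
app.get('/', function requestHandler(req, res) {
  if (startTime === null) startTime = Date.now();
  const id = ++requestId;

  console.log(`[${elapsed()}] Serializing response for request ${id}...`);
  const body = JSON.stringify(bigObject); // expensive synchronous operation

  console.log(`[${elapsed()}] Sending ${Math.round(body.length / 1e6)}MB response for request ${id}...`);
  const sendStart = Date.now();
  res.on('finish', () => {
    console.log(`[${elapsed()}] -- Took ${Date.now() - sendStart}ms to flush response for request ${id} --`);
  });
  res.send(body);
  console.log(`[${elapsed()}] - Handler done for request ${id} -`);
});

app.listen('/tmp/sock', () => {
  console.log('Example app listening on Unix domain socket /tmp/sock!');
});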

Controlled Experiment: Sending five concurrent requests

When sending five requests at the same time,

(for x in $(seq 5); do time curl --unix-socket /tmp/sock http://localhost/ > /dev/null & done)

we get log output that looks like this:

Example app listening on Unix domain socket /tmp/sock!
[0] Serializing response for request 1...
[340] Sending 55MB response for request 1...
[495] - Handler done for request 1 -
[496] Serializing response for request 2...
[804] Sending 55MB response for request 2...
[940] - Handler done for request 2 -
[940] Serializing response for request 3...
[1244] Sending 55MB response for request 3...
[1384] - Handler done for request 3 -
[1386] Serializing response for request 4...
[1688] Sending 55MB response for request 4...
[1825] - Handler done for request 4 -
[1831] Serializing response for request 5...
[2130] Sending 55MB response for request 5...
[2269] - Handler done for request 5 -
[2418] -- Took 1613ms to flush response for request 2 --
[2425] -- Took 1181ms to flush response for request 3 --
[2428] -- Took 298ms to flush response for request 5 --
[2429] -- Took 741ms to flush response for request 4 --
[2432] -- Took 2093ms to flush response for request 1 --

The above behavior provides a bad user experience, because early requests end up “waiting” for later requests so that all responses are flushed together at the end. The more concurrent requests there are, the longer the delay for all of them! In the example above, all five requests took nearly two and a half seconds to finish from the end user’s (curl’s) perspective. This bad behavior is exactly what we observed in the production environment described in the previous blog post.

Controlled Experiment: Sending just one request

As a comparison, here is the log output for sending only one request, which took only about half a second, with this breakdown:

  • JSON.stringify: 332 ms
  • res.send: 466 - 332 = 134 ms
  • res.send end to res.on("finish"): 554 - 466 = 88 ms

Example app listening on Unix domain socket /tmp/sock!
[0] Serializing response for request 1...
[332] Sending 55MB response for request 1...
[466] - Handler done for request 1 -
[554] -- Took 222ms to flush response for request 1 --

So why are early requests “waiting” for later requests? In the previous blog post, we hypothesized that in each event loop iteration, the Unix domain socket was only able to send 8 KB. Because the event loop was often blocked by the processing of later requests, we did not make much progress flushing the responses of the early requests.

Let’s use Node.js Async Hooks to examine this hypothesis in detail.

Analyze with Node.js async hooks

The Node.js async_hooks module provides an API to track lifecycle events of asynchronous resources. An asynchronous resource represents an object with associated callbacks. For example, a request is a resource, and its handler is an associated callback.

Four hooks are defined for asynchronous resources:

  • init: called during resource construction
  • before: called just before a resource callback is called
  • after: called just after a resource callback has returned
  • destroy: called when the resource is destroyed

Synchronous operations can be tracked by before and after hooks. Here is a simple example:

Analyze with Node.js async hooks (full code)
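
The full example is linked above; below is a minimal sketch of the hook setup. The 100 ms threshold for the "event loop blocked!" annotation and the exact log format are assumptions for illustration.

const asyncHooks = require('async_hooks');
const fs = require('fs');

const beforeTimes = new Map(); // asyncId -> timestamp of its `before` hook

asyncHooks.createHook({
  init(asyncId, type, triggerAsyncId, resource) {
    // console.log is itself asynchronous and would recursively trigger hooks,
    // so hooks must log with a synchronous write instead.
    fs.writeSync(1, `>> init: executionAsyncId: ${asyncHooks.executionAsyncId()} | ` +
      `asyncId: ${asyncId}, type: "${type}", triggerAsyncId: ${triggerAsyncId}, ` +
      `resource: ${resource.constructor.name}\n`);
  },
  before(asyncId) {
    beforeTimes.set(asyncId, Date.now());
    fs.writeSync(1, `>> before: executionAsyncId: ${asyncHooks.executionAsyncId()} | asyncId: ${asyncId}\n`);
  },
  after(asyncId) {
    // The callback ran synchronously between `before` and `after`; a long
    // duration means the event loop was blocked for that long.
    const durationMs = Date.now() - (beforeTimes.get(asyncId) || Date.now());
    const note = durationMs > 100 ? ', event loop blocked!' : '';
    fs.writeSync(1, `>> after: executionAsyncId: ${asyncHooks.executionAsyncId()} | ` +
      `asyncId: ${asyncId}, durationMs: ${durationMs}${note}\n`);
  },
  destroy(asyncId) {
    beforeTimes.delete(asyncId);
    fs.writeSync(1, `>> destroy: executionAsyncId: ${asyncHooks.executionAsyncId()} | asyncId: ${asyncId}\n`);
  },
}).enable();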

After adding async hooks to the basic web app, we can exercise their functionality by sending just one request for simplicity. This yields log output like the excerpt below. Since the full output can be overwhelming, we’ve retained only the most interesting messages:

  • asyncId: 8, type: "PIPESERVERWRAP": the web server listening on the Unix domain socket
  • asyncId: 14, type: "PIPEWRAP": the wrapper for a request’s connection, which triggered the initialization of asyncId: 16 and asyncId: 22
  • asyncId: 16, type: "HTTPINCOMINGMESSAGE": roughly equivalent to req
  • asyncId: 22, type: "WRITEWRAP": roughly equivalent to res.on("finish")
  • The requestHandler of asyncId: 16 took 459 ms in one synchronous operation and is annotated with "event loop blocked!": 459 ms ~= 328 ms (JSON.stringify) + 129 ms (res.send)

>> init: executionAsyncId: 0 | asyncId: 8, type: "PIPESERVERWRAP", triggerAsyncId: 0, resource: Pipe
...
Example app listening on Unix domain socket /tmp/sock!
...
>> init: executionAsyncId: 0 | asyncId: 14, type: "PIPEWRAP", triggerAsyncId: 8, resource: Pipe
...
>> init: executionAsyncId: 8 | asyncId: 16, type: "HTTPINCOMINGMESSAGE", triggerAsyncId: 14, resource: HTTPServerAsyncResource
...
>> before: executionAsyncId: 16 | asyncId: 16
...
[0] Serializing response for request 1...
...
[328] Sending 55MB response for request 1...
...
>> [456] init: executionAsyncId: 16 | asyncId: 22, type: "WRITEWRAP", triggerAsyncId: 14, resource: WriteWrap
[457] - Handler done for request 1 -
>> [457] after: executionAsyncId: 16 | asyncId: 16, durationMs: 459, event loop blocked!
...
>> [457] before: executionAsyncId: 23 | asyncId: 23
>> [457] after: executionAsyncId: 23 | asyncId: 23, durationMs: 0
...
>> [457] destroy: executionAsyncId: 0 | asyncId: 23
>> [543] before: executionAsyncId: 22 | asyncId: 22
...
>> [544] res.finish: executionAsyncId: 22 |
[544] -- Took 215ms to flush response for request 1 --

By tracking synchronous operations from the end of res.send to the start of res.on("finish"), we can roughly determine how much idle time, spread across async gaps, is needed to flush the response. You can see how the stats are calculated by referring to the code.
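
As a rough sketch of that calculation (syncOps and statsBetween are hypothetical names; assume the before/after hooks record one {startMs, endMs} span per synchronous operation):

// Aggregates the spans recorded by the hooks between two markers,
// e.g. resSendEnd and resFinish.
const syncOps = []; // filled by the before/after hooks

function statsBetween(startMs, endMs) {
  const ops = syncOps.filter((op) => op.startMs >= startMs && op.endMs <= endMs);
  const totalTimeUs = (endMs - startMs) * 1000;
  const cpuTimeUs = ops.reduce((sum, op) => sum + (op.endMs - op.startMs), 0) * 1000;
  return {
    totalTimeUs,                         // wall-clock time between the markers
    cpuTimeUs,                           // time spent inside synchronous callbacks
    idleTimeUs: totalTimeUs - cpuTimeUs, // time the event loop was free to flush
    numSyncOps: ops.length,
  };
}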

Between request 1 resSendEnd and request 1 resFinish
- total time: 87031 us
- total cpu time: 396 us
- total idle time: 86635 us
- number of sync operations: 5

From the stats above, for request 1, the event loop needed 86.6 ms of idle time to flush the response (given an 8 KB sendBufferSize). There were 5 synchronous operations in between, but fortunately they were all short and only spanned 396 us in total.

To summarize, the graph below illustrates the major events of request 1, with short synchronous operations omitted for clarity. The event loop was first blocked for 459 ms by one synchronous operation that did JSON.stringify and res.send. The loop then took 86.6 ms to flush the 55 MB response, during which no long synchronous operations blocked the event loop.

Next, let’s look at the stats when sending 5 concurrent requests. This time we track synchronous operations from the end of res.send to the start of either res.on("finish") or JSON.stringify (of the next request), whichever comes first. We know that JSON.stringify blocks the event loop for more than 300 ms, so if JSON.stringify happens first, the response flush is going to be delayed by at least another 300 ms.

Between request 1 resSendEnd and request 2 serializeStart
- total time: 1727 us
- total cpu time: 941 us
- total idle time: 786 us
- number of sync operations: 9
Between request 2 resSendEnd and request 3 serializeStart
- total time: 486 us
- total cpu time: 116 us
- total idle time: 370 us
- number of sync operations: 5
Between request 3 resSendEnd and request 4 serializeStart
- total time: 2008 us
- total cpu time: 133 us
- total idle time: 1875 us
- number of sync operations: 6
Between request 4 resSendEnd and request 5 serializeStart
- total time: 558 us
- total cpu time: 119 us
- total idle time: 439 us
- number of sync operations: 5
Between request 5 resSendEnd and request 4 resFinish
- total time: 169364 us
- total cpu time: 118 us
- total idle time: 169246 us
- number of sync operations: 5

We can see that before request 2 began to serialize, request 1 only got 786 us of idle time, which is not enough time for the event loop to flush the response. Only when all requests finished serializing did the event loop get 169 ms of idle time with which to flush responses.

The graph below visualizes what happened (for simplicity, we only visualize the case of sending two concurrent requests; full logs). Request 1 had less than 1 ms of idle time to flush its response; then the event loop was occupied by JSON.stringify and res.send of request 2, which delayed request 1 by more than 459 ms.

Summary

When the response size is 55 MB and sendBufferSize is 8 KB, Node.js needs ~86.6 ms of idle time to flush the response.

Any synchronous operations before the end of flushing will further delay the request. The longer the synchronous operation, the worse the delay.

Node.js web app with large sendBufferSize

As mentioned earlier, we solved the problem by increasing the Unix domain socket’s sendBufferSize to 4 MB. The following code example does that via setSendBufferSize. For simplicity, we only show the code snippets that differ from the basic app.

Node.js web app with large sendBufferSize (full code)
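
Node.js does not expose SO_SNDBUF on sockets out of the box, so in this sketch setSendBufferSize stands in for the native setsockopt wrapper used by the full code; only the delta from the basic app is shown.

const SEND_BUFFER_SIZE = 4 * 1024 * 1024; // 4 MB

const server = app.listen('/tmp/sock', () => {
  console.log('Example app listening on Unix domain socket /tmp/sock!');
});

server.on('connection', (socket) => {
  // Hypothetical helper: sets SO_SNDBUF on the accepted connection's socket.
  setSendBufferSize(socket, SEND_BUFFER_SIZE);
});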

However, the log output still doesn’t look good. Why is this happening?

Example app listening on Unix domain socket /tmp/sock!
[0] Serializing response for request 1...
[327] Sending 55MB response for request 1...
[461] - Handler done for request 1 -
[463] Serializing response for request 2...
[767] Sending 55MB response for request 2...
[896] - Handler done for request 2 -
[904] Serializing response for request 3...
[1212] Sending 55MB response for request 3...
[1342] - Handler done for request 3 -
[1343] Serializing response for request 4...
[1640] Sending 55MB response for request 4...
[1774] - Handler done for request 4 -
[1774] Serializing response for request 5...
[2067] Sending 55MB response for request 5...
[2207] - Handler done for request 5 -
[2223] -- Took 1896ms to flush response for request 1 --
[2224] -- Took 1457ms to flush response for request 2 --
[2239] -- Took 1027ms to flush response for request 3 --
[2240] -- Took 600ms to flush response for request 4 --
[2242] -- Took 175ms to flush response for request 5 --

Analyze with Node.js async hooks

To figure out why a larger sendBufferSize doesn’t solve the problem, we use async hooks to inspect, as we did earlier. (full code)

When sending just one request, the event loop needed 7.4 ms of idle time to flush the response. This is much smaller than before (86.6 ms) because sendBufferSize is now much larger (4 MB instead of 8 KB), so the event loop required fewer iterations, and therefore less idle time, to flush the full response.

Between request 1 resSendEnd and request 1 resFinish
- total time: 7845 us
- total cpu time: 425 us
- total idle time: 7420 us
- number of sync operations: 5

When sending five concurrent requests, unfortunately the idle time before the next request’s serialization is not long enough (516 us). As a result, the responses of all five requests are flushed at the end after all requests have finished serializing.

Between request 1 resSendEnd and request 2 serializeStart
- total time: 928 us
- total cpu time: 412 us
- total idle time: 516 us
- number of sync operations: 5
Between request 2 resSendEnd and request 3 serializeStart
- total time: 407 us
- total cpu time: 118 us
- total idle time: 289 us
- number of sync operations: 5
Between request 3 resSendEnd and request 4 serializeStart
- total time: 549 us
- total cpu time: 188 us
- total idle time: 361 us
- number of sync operations: 5
Between request 4 resSendEnd and request 5 serializeStart
- total time: 2074 us
- total cpu time: 111 us
- total idle time: 1963 us
- number of sync operations: 5
Between request 5 resSendEnd and request 1 resFinish
- total time: 8891 us
- total cpu time: 159 us
- total idle time: 8732 us
- number of sync operations: 6

Summary

When the response size is 55 MB and the sendBufferSize is 4 MB, Node.js needs ~7.4 ms of idle time to flush the response.

A larger sendBufferSize can reduce the amount of idle time that the event loop needs to flush the response, but if there is not enough idle time, the response will still be delayed.

Node.js web app with large sendBufferSize and a concurrency queue

The workerChild process described in the previous blog post uses a concurrency queue to process requests. It uses async.queue to control how many requests are processed concurrently. The code example shows how the queue is used:

Node.js web app with large sendBufferSize and a concurrency queue
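
In outline, the wiring looks like the sketch below (a reconstruction using the async library's queue with a concurrency of 1; the task shape is illustrative). As described next, note where callback() is invoked.

const async = require('async');

// Process at most one request at a time.
const requestQueue = async.queue((task, callback) => {
  requestHandler(task.req, task.res);
  // The queue moves on as soon as the handler returns; the response may still
  // be flushing, so flushing overlaps with the next task's processing.
  callback();
}, 1);

app.get('/', (req, res) => {
  requestQueue.push({req, res});
});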

Per the above, only one request is processed at a time. However, there is one subtlety: response flushing of the previous request overlaps with processing of the next request. This is because async.queue’s callback() is called right after requestHandler(task) returns, which doesn’t include the response flush time that ends at res.on("finish"). Therefore, the processing of the next request can still block response flushing of the previous request, as shown in the log output below:

Example app listening on Unix domain socket /tmp/sock!
[0] Serializing response for request 1...
[327] Sending 55MB response for request 1...
[463] - Handler done for request 1 -
[465] Serializing response for request 2...
[757] Sending 55MB response for request 2...
[894] - Handler done for request 2 -
[898] Serializing response for request 3...
[1192] Sending 55MB response for request 3...
[1321] - Handler done for request 3 -
[1321] Serializing response for request 4...
[1623] Sending 55MB response for request 4...
[1753] - Handler done for request 4 -
[1754] Serializing response for request 5...
[2055] Sending 55MB response for request 5...
[2181] - Handler done for request 5 -
[2198] -- Took 1870ms to flush response for request 1 --
[2209] -- Took 586ms to flush response for request 4 --
[2210] -- Took 1453ms to flush response for request 2 --
[2217] -- Took 1024ms to flush response for request 3 --
[2217] -- Took 162ms to flush response for request 5 --

Analyze with Node.js async hooks

Analysis with async hooks yields the same conclusion as above: the idle time is insufficient for the event loop to flush the response.

Between request 1 resSendEnd and request 2 serializeStart
- total time: 2745 us
- total cpu time: 1003 us
- total idle time: 1742 us
- number of sync operations: 19
Between request 2 resSendEnd and request 3 serializeStart
- total time: 231 us
- total cpu time: 90 us
- total idle time: 141 us
- number of sync operations: 2
Between request 3 resSendEnd and request 4 serializeStart
- total time: 123 us
- total cpu time: 71 us
- total idle time: 52 us
- number of sync operations: 1
Between request 4 resSendEnd and request 5 serializeStart
- total time: 5595 us
- total cpu time: 75 us
- total idle time: 5520 us
- number of sync operations: 1
Between request 5 resSendEnd and request 1 resFinish
- total time: 23329 us
- total cpu time: 237 us
- total idle time: 23092 us
- number of sync operations: 7

Node.js web app with large sendBufferSize and a concurrency queue and async gaps

What if we introduce some async gaps at the beginning of each request’s processing? Will these async gaps help flush the response of the previous request? Let’s find out!

Node.js web app with large sendBufferSize and a concurrency queue and async gaps (full code)
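
A minimal sketch of the idea, assuming setTimeout-based yields and async v3’s support for promise-returning workers; the iteration count of 14 matches the example discussed in the summary below.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const requestQueue = async.queue(async (task) => {
  // Each iteration yields to the event loop, giving the previous request's
  // response a chance to flush before we block the loop again.
  for (let i = 0; i < 14; i++) {
    await sleep(0);
  }
  requestHandler(task.req, task.res);
}, 1);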

Request 1 took half a second to complete, request 2 took less than a second, request 3 took 1.4 seconds, request 4 took 1.8 seconds, and request 5 took 2.2 seconds. Early requests no longer “wait” for later requests to be flushed together!

Example app listening on Unix domain socket /tmp/sock!
[18] Serializing response for request 1...
[358] Sending 55MB response for request 1...
[497] - Handler done for request 1 -
[505] -- Took 147ms to flush response for request 1 --
[513] Serializing response for request 2...
[817] Sending 55MB response for request 2...
[928] - Handler done for request 2 -
[937] -- Took 120ms to flush response for request 2 --
[943] Serializing response for request 3...
[1252] Sending 55MB response for request 3...
[1369] - Handler done for request 3 -
[1377] -- Took 125ms to flush response for request 3 --
[1384] Serializing response for request 4...
[1690] Sending 55MB response for request 4...
[1800] - Handler done for request 4 -
[1807] -- Took 117ms to flush response for request 4 --
[1814] Serializing response for request 5...
[2120] Sending 55MB response for request 5...
[2235] - Handler done for request 5 -
[2244] -- Took 124ms to flush response for request 5 --

Analyze with Node.js async hooks

Let’s use async hooks to look at the details. (full code)

By deliberately yielding with setTimeout, we were able to give each request at least 7 ms of idle time to flush its response. As a result, every request finishes flushing before the next request’s most expensive operation, JSON.stringify, begins.

Between request 1 resSendEnd and request 1 resFinish
- total time: 10066 us
- total cpu time: 798 us
- total idle time: 9268 us
- number of sync operations: 16
Between request 2 resSendEnd and request 2 resFinish
- total time: 8947 us
- total cpu time: 1024 us
- total idle time: 7923 us
- number of sync operations: 17
Between request 3 resSendEnd and request 3 resFinish
- total time: 8109 us
- total cpu time: 544 us
- total idle time: 7565 us
- number of sync operations: 17
Between request 4 resSendEnd and request 4 resFinish
- total time: 7379 us
- total cpu time: 465 us
- total idle time: 6914 us
- number of sync operations: 18
Between request 5 resSendEnd and request 5 resFinish
- total time: 8762 us
- total cpu time: 84 us
- total idle time: 8678 us
- number of sync operations: 3

Summary

What we see here is that the solution is a combination of an increased sendBufferSize and request concurrency control that deliberately interleaves response flushing with sufficient async gaps. In the code example above, we use 14 loop iterations to add async gaps. With fewer iterations, we wouldn’t have been able to add enough idle time to flush responses fast enough. As an alternative, we could add more async gaps and leave sendBufferSize untouched (8 KB), and still be able to flush responses in a timely manner.

Node.js web app with a strict concurrency queue

Another solution is to remove the overlap between response flushing and request processing completely. This is the idea mentioned in the previous blog post: “what if we employed…better priority/concurrency management of request processing vs. response flushing?” In this approach, we call async.queue’s callback() only after the response has finished flushing, i.e., in res.on("finish"). This also means we needn’t deliberately introduce async gaps, and we don’t need to increase sendBufferSize.

See the simplified code example below.

Node.js web app with a strict concurrency queue (full code)
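
A minimal sketch of the strict queue (again a reconstruction using the async library's queue):

const requestQueue = async.queue((task, callback) => {
  // Release the queue only once the response has been flushed to the socket,
  // not when the handler returns, so processing never overlaps flushing.
  task.res.on('finish', () => callback());
  requestHandler(task.req, task.res);
}, 1);

app.get('/', (req, res) => {
  requestQueue.push({req, res});
});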

We got the same good results:

Example app listening on Unix domain socket /tmp/sock!
[0] Serializing response for request 1...
[330] Sending 55MB response for request 1...
[476] - Handler done for request 1 -
[578] -- Took 248ms to flush response for request 1 --
[578] Serializing response for request 2...
[886] Sending 55MB response for request 2...
[1020] - Handler done for request 2 -
[1113] -- Took 227ms to flush response for request 2 --
[1113] Serializing response for request 3...
[1427] Sending 55MB response for request 3...
[1540] - Handler done for request 3 -
[1632] -- Took 206ms to flush response for request 3 --
[1632] Serializing response for request 4...
[1933] Sending 55MB response for request 4...
[2072] - Handler done for request 4 -
[2173] -- Took 240ms to flush response for request 4 --
[2173] Serializing response for request 5...
[2475] Sending 55MB response for request 5...
[2587] - Handler done for request 5 -
[2693] -- Took 219ms to flush response for request 5 --

Conclusion

In this blog post, we’ve shown you how the Node.js event loop interacts with network I/O through code examples and analysis using Node.js Async Hooks. These code examples are simplified versions of our production code, and the performance problems discussed here are exactly what we encountered in the real world (see the previous blog post)!

Special thanks to Adar Lieber-Dembo and Kah Seng Tay for reviewing this blog post!

If you found this blog post interesting, come join us to work on interesting and challenging problems! We are hiring!
