What you should know to really understand the Node.js Event Loop

Daniel Khan
Jul 20, 2017 · 8 min read

Node.js is an event-based platform. This means that everything that happens in Node is a reaction to an event. A transaction passing through Node traverses a cascade of callbacks.
Abstracted away from the developer, this is all handled by a library called libuv, which provides a mechanism called an event loop.

This event loop is maybe the most misunderstood concept of the platform.

I work for Dynatrace, a performance monitoring vendor, and when we approached the topic of event loop monitoring, we put a lot of effort into properly understanding what we are actually measuring.

In this article I will cover what we learned about how the event loop really works and how to monitor it properly.

Common misconceptions


Let me cover the most popular misunderstandings, or at least the ones I had myself.

Misconception 1: The event loop runs in a separate thread from the user code

Misconception

Reality
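
A minimal sketch of why the first point is a misconception: if the event loop ran in its own thread, a timer callback could fire while user code is busy. The helper names `blockFor` and `timerDelayWhileBlocked` are made up for this illustration. The sketch schedules a 0 ms timer, then blocks synchronously, and reports how late the timer actually fired.

```javascript
// Busy-wait for roughly `ms` milliseconds, blocking the calling thread.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin on purpose
  }
}

// Hypothetical helper: schedules a 0 ms timer, then blocks synchronously.
// Resolves with the number of milliseconds until the timer callback ran.
function timerDelayWhileBlocked(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    blockFor(blockMs); // the timer cannot fire until this returns
  });
}

timerDelayWhileBlocked(50).then((delay) => {
  console.log(`0 ms timer fired after ${delay} ms`);
});
```

The timer only fires once the blocking code has returned, which is what one would expect if user code and the event loop share a single thread.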

Misconception 2: Everything that’s asynchronous is handled by a thread pool

Misconception

Reality

Misconception 3: The event loop is something like a stack or queue

Misconception

Reality

Understanding the phases of an event loop cycle

Ticks and Phases of the Node.js Event Loop

Let’s discuss those phases. An in-depth explanation can be found on the Node.js website.

Timers

IO Callbacks

IO Polling

Set Immediate

Close
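
The phase order listed above can be observed from user code. The following is a small sketch, not part of the original measurements: inside an I/O callback, a `setImmediate` callback runs before a 0 ms `setTimeout` callback, because the loop moves on from polling to the immediate phase before it returns to the timers phase. The function name `observePhaseOrder` is an assumption made for this example.

```javascript
const fs = require('fs');

// Hypothetical helper: records the order in which callbacks run when a
// timer and an immediate are both scheduled from inside an I/O callback.
function observePhaseOrder() {
  return new Promise((resolve) => {
    const order = [];
    // fs.stat completes asynchronously; its callback runs during polling.
    fs.stat('.', () => {
      order.push('io callback');
      // The timers phase is only reached again on the next loop iteration.
      setTimeout(() => {
        order.push('timeout');
        resolve(order);
      }, 0);
      // The immediate phase directly follows polling in the same iteration.
      setImmediate(() => order.push('immediate'));
    });
  });
}

observePhaseOrder().then((order) => {
  console.log(order.join(' -> ')); // io callback -> immediate -> timeout
});
```

Outside of an I/O callback the relative order of the two is not guaranteed, which is why the sketch schedules both from within one.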

Monitoring the Event Loop

Tick Frequency

Tick Duration

As our agent runs as a native module it was relatively easy for us to add probes to provide us this information.

Tick frequency and tick duration metrics in action.

In the following scenario I am calling an Express.js application that makes an outbound call to another HTTP server.

There are four scenarios:

  1. Idle
    There are no incoming requests.
  2. ab -c 5
    Using Apache Bench, I created 5 concurrent requests at a time.
  3. ab -c 10
    10 concurrent requests at a time.
  4. ab -c 10 (slow backend)
    The HTTP server that is called returns data after 1s to simulate a slow backend. This should cause something called back pressure, as requests waiting for the backend to return pile up inside Node.

If we look at the resulting chart we can make an interesting observation:

Event loop duration and frequency are dynamically adapted

If the application is idle, which means that there are no pending tasks (timers, callbacks, etc.), it would not make sense to run through the phases at full speed, so the event loop adapts and blocks for a while in the polling phase to wait for new external events to come in.

This also means that the metrics under no load are similar (low frequency, high duration) to those of an application that talks to a slow backend under high load.

We also see that this demo application runs ‘best’ in the scenario with 5 simultaneous requests.

Consequently, tick frequency and tick duration need to be baselined, factoring in the current requests per second.

While this data already provides us with some valuable insights, we still don’t know in which phase the time is spent. So we researched further and came up with two more metrics.
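
The idle blocking described above can be made visible without a native module. This is a rough sketch that assumes a Node.js version that ships `performance.eventLoopUtilization()` in `perf_hooks` (it is not the probe our agent uses). The helper name `measureIdleUtilization` is made up for this example. It samples the loop, waits on a single timer, and reports how much of that time the loop spent idle.

```javascript
const { performance } = require('perf_hooks');

// Hypothetical helper: resolves with { idle, active, utilization } for a
// window in which the only pending work is one timer of `waitMs`.
function measureIdleUtilization(waitMs) {
  return new Promise((resolve) => {
    // Take the first sample from inside the loop, not during startup.
    setImmediate(() => {
      const start = performance.eventLoopUtilization();
      setTimeout(() => {
        // Passing the earlier sample returns the delta since that sample.
        resolve(performance.eventLoopUtilization(start));
      }, waitMs);
    });
  });
}

measureIdleUtilization(200).then(({ idle, active, utilization }) => {
  console.log(`idle: ${idle.toFixed(1)} ms, active: ${active.toFixed(1)} ms`);
  console.log(`utilization: ${(utilization * 100).toFixed(1)} %`);
});
```

With nothing else to do, almost the whole window should be reported as idle, which matches the low frequency and high duration seen in the idle scenario.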

Work processed latency

High work processed latency indicates a busy/exhausted thread pool.

To test this metric I created an express route that processes an image using a module called Sharp. As image processing is expensive, Sharp utilizes the thread pool to accomplish that.


Running Apache Bench with 5 concurrent connections against a route with this image processing function is directly reflected in this chart and can be clearly distinguished from a scenario of moderate load without the image processing in place.
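
To get a feeling for a saturated thread pool without Sharp, here is a rough stand-in that uses `crypto.pbkdf2`, which also runs on the libuv thread pool. It is not the metric our agent collects. It only measures, from JavaScript, the time between queuing a task and its callback running. The helper name `measureWorkLatency` and the task parameters are assumptions for this sketch.

```javascript
const crypto = require('crypto');

// Hypothetical helper: queues `taskCount` key derivations at once and
// resolves with the milliseconds each took from queuing to callback.
function measureWorkLatency(taskCount, iterations) {
  const tasks = [];
  for (let i = 0; i < taskCount; i++) {
    tasks.push(new Promise((resolve, reject) => {
      const queuedAt = process.hrtime.bigint();
      crypto.pbkdf2('secret', 'salt', iterations, 64, 'sha512', (err) => {
        if (err) return reject(err);
        // Convert the nanosecond difference to milliseconds.
        resolve(Number(process.hrtime.bigint() - queuedAt) / 1e6);
      });
    }));
  }
  return Promise.all(tasks);
}

// More tasks than the default pool of 4 threads, so some have to wait.
measureWorkLatency(8, 100000).then((latencies) => {
  latencies.forEach((ms, i) => console.log(`task ${i}: ${ms.toFixed(1)} ms`));
});
```

With the default pool size, the later tasks usually report a noticeably higher latency than the first four, because they sit in the queue until a thread becomes free.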

Event Loop Latency

A high event loop latency indicates an event loop busy with processing callbacks.

To test this metric, I created an Express route that calculates Fibonacci numbers using a very inefficient algorithm.


Running Apache Bench with 5 concurrent connections against a route with the Fibonacci function shows that the callback queue is now busy.
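
A comparable effect can be reproduced in a few lines. The sketch below is an approximation, not our agent's implementation: it assumes a Node.js version that provides `monitorEventLoopDelay` in `perf_hooks`, and it uses a naive recursive Fibonacci function as the expensive callback. The helper name `maxLoopDelayDuring` is made up for this example.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Deliberately inefficient recursive Fibonacci; it blocks the loop while running.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Hypothetical helper: runs `work` in a timer callback and resolves with the
// largest delay (in ms) seen by the sampling timer while it was enabled.
function maxLoopDelayDuring(work) {
  return new Promise((resolve) => {
    const histogram = monitorEventLoopDelay({ resolution: 10 });
    histogram.enable();
    setTimeout(() => {
      work(); // synchronous, so the sampling timer cannot fire meanwhile
      setTimeout(() => {
        histogram.disable();
        resolve(histogram.max / 1e6); // histogram values are in nanoseconds
      }, 50);
    }, 50);
  });
}

maxLoopDelayDuring(() => fib(30)).then((ms) => {
  console.log(`max event loop delay: ${ms.toFixed(1)} ms`);
});
```

The longer a single callback runs, the later the sampling timer fires, so the maximum delay grows with the cost of the callback.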

We clearly see that those four metrics can provide us with valuable insights and help to understand the inner workings of Node.js better.

Still, all of that needs to be seen in a bigger picture to make sense of it. Therefore, we are currently collecting information to factor this data into our anomaly detection.

Tuning the Event Loop


Utilize all CPUs

Tune the Thread Pool

Offload the work to Services

Summary

  • The event loop is what keeps a Node.js application running
  • Its functionality is often misunderstood — it is a set of phases that are traversed continuously with specific tasks for each phase
  • There are no out-of-the-box metrics provided by the event loop, so the metrics collected differ between APM vendors
  • The metrics clearly provide valuable insights about bottlenecks, but a deep understanding of the event loop and of the code that is running is key
  • In the future Dynatrace will add event loop telemetry to its root cause detection to correlate event loop anomalies with problems

For me there is no doubt that we just built the most comprehensive event loop monitoring solution on the market today, and I’m really happy that this amazing new feature will be rolled out to all of our customers within the next few weeks.

Credits

I hope this post shed some light on the topic. Please follow me on Twitter @dkhan. I’m happy to answer all your questions there or in the comment section below.

If you still want to learn more about the inner workings of the event loop and how to leverage them as a developer, I recommend this post by my friends at RisingStack.

If you want to give our Node.js monitoring a try, download our free trial. Feel free to share your feedback with me anytime; this is how we learn.

Node.js Collection

Community-curated content for the millions of Node.js
