How does non-blocking IO work under the hood?

Hielke de Vries
Nov 20, 2019 · 8 min read

At ING, we host several applications that have to handle 10000s of requests per second. For instance, we run a cluster of Nginx web servers and reverse proxies that serve all our customer-facing applications. With millions of customers using our applications every day, peak loads come up to 20K requests per second. Some of our internal applications have to handle even higher loads. An ING squad in Friesland has built the application TracING, that uses Vert.x to handle more than 50K requests per second during peaks.

Image for post
Image for post
how do we handle lots of traffic?

To deal with these amounts of concurrent connections, thread-per-connection models do not suffice. Nginx and Vert.x instead use non-blocking models to achieve such high levels of concurrency. As someone who likes to figure out how things work under the hood, this made me want to understand the following:

How does the caller of a non-blocking/asynchronous API gets notified when data is ready? How does this work under the hood, at low level?

After calling a non-blocking API, at some point later in time the caller must be notified that the data is ready. But how does the caller know when the data is ready? Does it get signalled? Is there some mechanism that constantly checks if the data is ready? When you think about the fact that a CPU can only execute instructions sequentially, how could it be possible that software or hardware can “listen” for an event or notification when the data is ready?

The only “listener” that is natively available in most computers is the hardware interrupt processor. But hardware interrupts do not scale well and lack flexibility. And what happens when there are 100k events per second, will there be 100k hardware interrupts per second? This also seems unlikely.

My first guess was that there should be some kind of constant checking, like an infinite loop, that polls to see if data is ready. But that also seems inefficient at first sight. An infinite loop must be taking quite some CPU time, right? First let’s understand some basics about IO and blocking.

Blocking IO

  • A file stored on a hard drive must be transferred through SATA cables and main board buses to the CPU
  • The data from a network resource located on a server far away must travel through network cables, routers and eventually the network interface card (NIC) in your computer to the CPU

Calling an API that requests data from IO will cause the running thread to “block”, i.e. it is waiting until the requested data has returned to the caller. When a thread is blocked in Linux, it will be put in a Sleep state by the kernel until data has returned to the caller. Threads in sleep state immediately give up its access to the CPU, so to not waste CPU time. After IO is ready, the thread is taken out of the Sleep state and put in Runnable state. Threads in this state are eligible to be executed on the CPU again. The thread scheduler will put the thread on a CPU when one is available. The process of taking threads on and off the CPU is called context switching.

A thread can be in different states. The names of these states are different for different programming languages and operating systems. For instance, in the Linux kernel there are only two states: running and not running. Java threads have six different states.

Why non-blocking IO?

Types of blocking

  • CPU-bound blocking
  • IO-bound blocking

CPU-bound blocking

In this case the thread gets blocked because of some CPU intensive task it performs takes more time than “instantly”. For example when generating a bunch of prime numbers or rendering a 3d model. With CPU-bound blocking the thread is blocked because it’s actively being executed on the processor.

Image for post
Image for post
CPU-bound blocking

IO-bound blocking

Here, the thread gets blocked because it has to wait for data to return from an IO source, such as a network or a hard drive. The kernel will notice that there is no data available from IO and will therefore put the thread in some “sleep” state. Hence, with IO-bound blocking the thread is not¹ actively being executed on the processor.

Image for post
Image for post
IO-bound blocking

Non-blocking IO

When data has returned from IO, the caller will be notified that the data is ready. This is generally done with a callback function that has access to the returned data.

callback

There are other ways to express a non-blocking or asynchronous action with e.g. futures, promises or coroutines. These constructs are only syntactically different. Under the hood they are all based on a routine (function) that is called the moment data has returned from IO.

Network IO and sockets

Image for post
Image for post

Non-blocking IO under the hood

”event”, “FD readiness” and “FD is ready for data” are synonyms in this text

You might think that an event loop can be expensive on the CPU if it’s endlessly running, but there are some optimizations to make them more efficient.

Let’s zoom a bit in on the event loop to see how these optimizations work. Each (major) operating system provides kernel level APIs to help create an event loop. In Linux there is epoll or io_uring, BSD uses kqueue and Windows has IOCP. Each of these APIs is able to check FDs for ready data with a computational complexity of around O(number_of_events_occurred). In other words, you can monitor 100.000s of FDs, but the API’s execution speed only depends on the amount of events that occur in the current iteration of the event loop. Compared to the older poll and select APIs this is a huge improvement:

operations    |  poll  |  select   | epoll
10 | 0.61 | 0.73 | 0.41
100 | 2.9 | 3.0 | 0.42
1000 | 35 | 35 | 0.53
10000 | 990 | 930 | 0.66

From: The Linux programming interface (Michael Kerrisk), table 63–9

Another important “optimization” is that a call tokqueue(..) or epoll(..) blocks if there is no ready data in the FDs of the interest list. This means that there are no unnecessary iterations in the event loop when there is no ready FD data.

Further optimization of the event loop can be done with what epoll calls “edge triggered mode” and kqueue calls EV_CLEAR. When there is new data available in at least one of the FDs in the interest list, and when this data is not (completely) read/emptied from the FDs, then in the next iteration of the event loop this FD will not be seen as having new data ready. This will save unnecessary iterations in the case that a FD is not completely read when the next iteration of the event loop is executed.

Here is an outline of how an event loop using kqueue would look like:

event loop outline using kqueue

Here is a complete TCP echo server example using a kqueue event loop

IO_uring

From kernel 5.1 Linux offers io_uring, which can even further optimize an event loop. It does so by minimizing expensive system calls and minimize copying of data.

Other ways to achieve non-blocking IO

Another possibility is using signals, which are used for inter-process communication. A signal could also be used as an “event” to signal ready IO data. Catching signals is expensive and therefore also not preferred over event loops in high event rate applications.

Hardware interrupts and signals do not offer the flexibility and scalability that an event loops offer. Of course, it this all depends on your needs. E.g. in embedded systems, hardware interrupts are often preferred over event loops.

Conclusion

¹ depending on the load and the thread scheduling algorithm.

Thanks to Bart Warmerdam for reviewing this text.

ING Blog

Blogs from ING employees

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store