Concurrency models for Cloud applications

Shashank Jain
7 min read · Aug 5, 2018


With the advent of cloud computing and the elasticity it introduces, the demand for more scalable and concurrent applications keeps growing. Most of the time these applications are I/O intensive: they receive data over the network, apply some business logic, and write the results to a database.

If we classify these workloads, they are mostly network oriented and revolve around network sockets. The question, then, is what the right concurrency model for such applications is, and why.

The concepts we discuss today are agnostic of the cloud platform you run them on (Cloud Foundry, K8s, Mesos, etc.).

We see at least three prevalent concurrency models:

1. Thread per connection — This is the model traditionally used by Java, where each connection is handled by its own operating system thread. With this 1:1 mapping, scaling limits are hit quickly, because the number of connections is bound by OS resources such as memory (each thread carries its own stack and other per-thread state). A minimal sketch of this model follows the list.

2. Event loop — This is the model advocated by Node.js, where a single thread runs an event loop and listens for events on different sockets (each socket being a file descriptor). The events range from new connections to data arriving on specific sockets. Node.js uses libuv behind the scenes, which on Linux is built on the kernel's epoll mechanism.

3. Userspace threads — This is the model Go uses: lightweight threads in userspace called goroutines, together with a userspace scheduler that maps m such goroutines onto n OS threads.
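To make model 1 concrete, here is a minimal, hypothetical sketch in C: a blocking accept loop that spawns one pthread per connection. The listening-socket setup and error handling are omitted, and the echo logic in handle_conn is purely illustrative.

```c
/* Hypothetical sketch of model 1 (thread per connection).
   Assumes listen_fd is an already bound and listening TCP socket;
   error handling is trimmed for brevity. */
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

static void *handle_conn(void *arg) {
    int fd = (int)(long)arg;
    char buf[512];
    ssize_t n;
    /* Blocking reads: this thread sleeps in the kernel until data arrives. */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        write(fd, buf, n);              /* echo the data back */
    close(fd);
    return NULL;
}

void serve(int listen_fd) {
    for (;;) {
        int conn_fd = accept(listen_fd, NULL, NULL);
        if (conn_fd < 0)
            continue;
        pthread_t tid;
        /* One kernel thread (with its own stack) per connection: memory use
           grows linearly with the number of clients, which is the scaling
           limit described above. */
        pthread_create(&tid, NULL, handle_conn, (void *)(long)conn_fd);
        pthread_detach(tid);
    }
}
```

Every accepted connection pins an OS thread and its stack for its whole lifetime, which is exactly the resource bound described in model 1.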

Of the approaches above, 2 and 3 are better placed to handle concurrency, as they let fewer threads do more work by multiplexing a large number of connections over a small number of threads.

So how do they achieve this? That is the main intent of this blog: to look under the hood of such a concurrency model and understand what makes an event loop scalable.
To do that, one has to understand the mechanism underlying the event loop, which on Linux is epoll, as well as the nuances of blocking versus non-blocking I/O.

Let's start with blocking and non-blocking I/O. Almost everything in Linux can be treated as a file: sockets, IPC primitives like pipes and queues, and devices are all exposed as files. Whenever a file is opened, the kernel returns a handle to userspace in the form of a file descriptor. So a process that owns a socket holds a file descriptor through which it interacts with that socket.

A file descriptor can be opened in blocking or non-blocking mode. In blocking mode, if there is no data to receive, or if the send buffers are full and the process cannot queue data for transmission, the kernel puts the process to sleep. In non-blocking mode, the same read/write system calls instead return immediately with an EWOULDBLOCK/EAGAIN error. Nothing blocks, and the process can retry the read or write later.
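As a minimal sketch (assuming fd is an already connected socket), switching a descriptor to non-blocking mode with fcntl and handling EAGAIN/EWOULDBLOCK on a read looks roughly like this:

```c
/* Sketch: put an existing fd into non-blocking mode and attempt a read.
   fd is assumed to be a connected socket; error handling kept minimal. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

void try_read(int fd) {
    char buf[512];
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        /* No data right now: the call returned immediately instead of
           putting the process to sleep; we can simply retry later. */
        printf("nothing to read yet\n");
    } else if (n > 0) {
        printf("read %zd bytes\n", n);
    } else if (n == 0) {
        printf("peer closed the connection\n");
    }
}
```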

In non-blocking mode, one option is for the process to keep checking each descriptor to see whether it can read or write, but as the number of fds (say, the number of open sockets) grows, this quickly stops scaling.

Luckily, Linux provides system calls like select and poll, which let us register the fds we want checked for data readiness. select and poll themselves block until some of those fds are ready. The problem is that every time we want to know which fds are ready, we have to pass the whole fd list to these calls, and the kernel iterates over the list to check readiness. That, too, runs into scaling issues as the fd count grows. To mitigate this, Linux introduced epoll: an object created inside the kernel that keeps track of the readiness of the fds the process asks it to watch.
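For contrast, here is a rough sketch of the select/poll style just described, using poll(): the full fd list is handed to the kernel on every call, and both the kernel and the caller walk it linearly. The fds array and its contents are assumed to be set up by the caller.

```c
/* Sketch of poll()-style readiness checking. fds[] and nfds are assumed to
   describe the sockets the process cares about (events = POLLIN). */
#include <poll.h>
#include <unistd.h>

void poll_loop(struct pollfd *fds, nfds_t nfds) {
    for (;;) {
        /* The whole fd list is passed in on every call; the kernel scans it. */
        int ready = poll(fds, nfds, -1);   /* -1: block until something is ready */
        if (ready < 0)
            break;
        /* The caller then scans the whole list again to find the ready fds. */
        for (nfds_t i = 0; i < nfds; i++) {
            if (fds[i].revents & POLLIN) {
                char buf[512];
                read(fds[i].fd, buf, sizeof(buf));   /* handle the ready fd */
            }
        }
    }
}
```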

Epoll provides a few system calls, tied together in the sketch that follows this list:

1. epoll_create — Creates an epoll object and returns an fd to the calling process. This fd is itself a handle to the epoll object.

2. epoll_ctl — This system call gives the process a mechanism to register interest in fds. These are the file descriptors the process asks epoll to monitor; they make up epoll's interest list, and whenever epoll sees data on one of them, it moves it to the ready list. There is a slight technicality here: although we speak of the interest list as a list of fds, it actually holds file descriptions, not file descriptors. A file description is a kernel structure that can be shared across file descriptors in different processes. For simplicity, in this blog we will treat file descriptors as the members of the interest list.

3. epoll_wait — The process makes this system call to check whether any fds are ready to send or receive data. The kernel returns only the ready fds; if none are ready, the call blocks. So the process can keep checking fd status via epoll_wait: either ready fds are returned or the process sleeps. This differs from the select/poll mechanism, where the complete fd list had to be walked on every call.
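Here is the sketch promised above: a minimal, hedged epoll event loop in C (level-triggered, the default). listen_fd is assumed to be a bound, listening, non-blocking TCP socket, and error handling is trimmed for brevity.

```c
/* Minimal epoll event loop: create the epoll object, register interest,
   then wait for ready fds. Error handling omitted for brevity. */
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_EVENTS 64

void event_loop(int listen_fd) {
    int epfd = epoll_create1(0);                      /* 1. create the epoll object */

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);   /* 2. register interest */

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        /* 3. block until some registered fd is ready; only ready fds come back */
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {
                /* New connection: accept it and add it to the interest list. */
                int conn = accept(listen_fd, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = conn };
                epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &cev);
            } else {
                char buf[512];
                ssize_t r = read(fd, buf, sizeof(buf));
                if (r <= 0)
                    close(fd);   /* peer closed or error */
            }
        }
    }
}
```

Conceptually, this is the kind of loop an event-loop runtime such as libuv drives on Linux, with callbacks hung off each fd.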

There is one more nuance with epoll: how events are delivered. There are two modes:

1. Level triggered — Say the process calls epoll_wait and an fd is ready with 500 bytes of data, but the process only has buffers to read 200 bytes; 300 bytes then remain in the kernel's network buffers. With level-triggered notification, the next epoll_wait call again reports that there is more data to consume on that fd, and the process can drain it then.

2. Edge triggered — With edge-triggered notification, epoll_wait reports the readable fd the first time. If the process drains only part of the data, the next epoll_wait will not report that fd again, and the process blocks until new data arrives on it. This is mitigated by putting the fds in non-blocking mode, so that the process drains each reported fd until it hits EWOULDBLOCK/EAGAIN (a sketch of such a drain loop follows the list). Note that this requirement does not exist with level triggering: level-triggered fds can be in blocking or non-blocking mode, whereas edge-triggered fds have to be non-blocking.
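Here is the drain loop referred to above, assuming the fd was registered with EPOLLIN | EPOLLET and put into non-blocking mode: the handler must keep reading until EAGAIN/EWOULDBLOCK before returning to epoll_wait.

```c
/* Drain a non-blocking fd after an edge-triggered notification: keep reading
   until EAGAIN/EWOULDBLOCK, otherwise leftover data will not be reported again. */
#include <errno.h>
#include <unistd.h>

void drain_fd(int fd) {
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            /* hand the n bytes just read to the application here */
            continue;
        }
        if (n == 0) {               /* peer closed the connection */
            close(fd);
            return;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return;                 /* socket fully drained; wait for the next edge */
        close(fd);                  /* any other error */
        return;
    }
}
```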

So, as we have seen, writing applications that scale probably requires understanding the guts of the system, so that one can make a judicious call about whether something like an event loop fits a specific workload, and within that, whether the software in use relies on select, poll, or an epoll-like mechanism. The goal of this blog was to understand, conceptually, the pros and cons of each of these approaches and how to choose the right scaling/concurrency model for an application. Since these workloads are mostly about network I/O, it makes a lot of sense to evaluate event-based concurrency models. This also lets us use resources better, rather than reaching for horizontal scale before the resources on a single node are properly used.

Another aspect worth mentioning is I/O multiplexing. Epoll, as we just saw, acts as an I/O multiplexer, and there are two variations of I/O multiplexing:

1. Reactor — This is the type of multiplexing we have discussed so far. The multiplexer (epoll in the case above) allows registration of events, and whenever it sees data readiness on specific file descriptors, it wakes up the process blocked in epoll_wait. The handlers then take care of doing the actual I/O.
2. Proactor — This is what we usually call async I/O. The difference is that instead of announcing the readiness of an fd, the multiplexer performs the I/O itself and fills the process's buffers. It notifies the handlers of I/O completion, whereas reactor-based multiplexing notifies them of I/O readiness. A hedged illustration follows the list.
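As a purely illustrative sketch of the proactor idea, here is what completion-style I/O can look like with POSIX AIO (aio_read/aio_error/aio_return). This only shows the shape of the pattern: on Linux, glibc implements POSIX AIO with helper threads and it is mainly used with regular files rather than sockets, and real code would use a completion notification instead of polling for completion.

```c
/* Proactor-style (completion-based) read sketch using POSIX AIO.
   The I/O is performed on the process's behalf; we are told when it is done
   and the buffer already holds the data. Link with -lrt on older glibc. */
#include <aio.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

void proactor_read(int fd) {
    static char buf[4096];
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    aio_read(&cb);                          /* submit: the I/O is done for us */

    while (aio_error(&cb) == EINPROGRESS)   /* real code would use a signal or
                                               callback, not busy-waiting */
        usleep(1000);

    ssize_t done = aio_return(&cb);         /* completion: buf is already filled */
    printf("aio completed with %zd bytes\n", done);
}
```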

So I/O can be classified as blocking or non-blocking from the perspective of the file descriptor, and as synchronous or asynchronous based on who does the work. Synchronous non-blocking I/O falls under the reactor pattern: the handler performs the actual reads and writes until it gets EWOULDBLOCK or EAGAIN, depending on edge or level triggering. With asynchronous non-blocking I/O, the OS performs the I/O and notifies the process of its completion.

This blog also shows that the term blocking mostly describes how a file descriptor is opened. A blocking file descriptor used with a multiplexing mechanism like epoll in level-triggered mode scales just as well as a non-blocking one in the same mode. For edge-triggered delivery with an epoll-like multiplexer, however, the fds have to be in non-blocking mode to scale. Understanding this nuance is important when designing applications for the cloud.

In the next blog I would like to touch on some of the possible glitches with epoll, primarily around its use of file descriptions, rather than file descriptors, in the interest and ready lists. I would also like to see if we can write a small event-loop program using epoll.

Updated with an example of epoll usage.

Disclaimer: The views expressed above are personal and not those of the company I work for.

