Building High Performance Applications — Lessons Learnt, Part 3

Vikas Sood · Published in Webmagic · 6 min read · Jun 14, 2018

Welcome to Part 3 of the series. In this part, I will suggest adopting a staged approach to event-driven concurrency and introduce an architecture that will allow you to build high performance services that support high concurrency demands. If you are building a service oriented architecture, or even a microservices architecture, for a highly scalable application, then this architecture will solve the problems that come with thread-based and event-based concurrency models: high resource usage, high context switch overhead, and lock contention, among others.

In Part 1 of this series, I made some suggestions that I learnt during my journey in building high performance applications. Maybe now is a good time to read it again, as it will help you understand some of the factors that affect service performance. Part 2 was mostly about the history and evolution of software architectures. In this part, I strongly encourage you to adopt a staged approach for building highly demanding internet applications. This approach was originally proposed by Matt Welsh, Principal Engineer at Google, and I came across it while struggling to solve the problems mentioned above. The results I obtained were beyond what I had imagined, and I really like how this approach brings the best of both worlds together, i.e. threads and events. For the curious ones, I implemented this in C++.

The Problem With Threads and Events

Let me start by making a statement: “Threads are fundamentally the wrong interface”. I know some of you may object to this, but here is my argument. Threads were never designed for high concurrency; rather, they were designed for time sharing of resources. A common problem with multithreaded architectures is the relatively high context switch overhead compared with event-based architectures. This overhead is primarily due to preemption, which requires saving registers and other state during context switches. So naturally, when you increase the number of threads beyond a certain point, the context switch overhead grows until throughput melts down. The solution is to avoid the overuse of threads, but then this limits the number of clients you can serve, and that is definitely not our design goal. Hence the statement!

With the event-driven approach to achieving high concurrency, a service is constructed with a limited number of threads, typically equal to the number of CPUs, that process events of different types from a queue. The task at hand is represented as a finite state machine (FSM), and transitions from one state to another in the FSM are triggered by events. There are variations on this event-driven pattern, but this is the overall idea. Event-driven systems tend to perform better under high load: as the number of tasks increases, server throughput increases until the CPU saturates. Beyond this point the event queue absorbs the backlog, so throughput remains roughly constant over a large range of event rates. Although the threads in an event-driven architecture emphasise the use of nonblocking I/O, they can still block for various reasons: interrupts, page faults, shared locks. This raises additional challenges for application developers, including the scheduling and ordering of event processing. Also, adding new functionality becomes difficult due to tight coupling of the event handling threads.
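To make the FSM idea concrete, here is a minimal sketch of a task modelled as a finite state machine whose transitions are driven by events. The request-style states and the Event payload here are hypothetical illustrations, not part of any fixed API:

// Hypothetical request-handling task modelled as a finite state machine.
// A single event loop thread advances many such tasks, one event at a time.
enum class State { ReadRequest, ProcessRequest, WriteResponse, Done };

struct Event { int taskId; /* payload */ };

struct Task {
    State state = State::ReadRequest;

    // one small, nonblocking step per event, then control returns to the loop
    void onEvent(const Event& e) {
        switch (state) {
        case State::ReadRequest:    /* parse the bytes */   state = State::ProcessRequest; break;
        case State::ProcessRequest: /* do the work */       state = State::WriteResponse;  break;
        case State::WriteResponse:  /* send the response */ state = State::Done;           break;
        case State::Done: break;
        }
    }
};

The event loop itself is then just: pop an event from the queue, look up the task it belongs to, and call onEvent on it.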

The Staged Approach Towards Event Driven Concurrency

The staged approach emphasises that the application may be viewed as a series of discrete stages, where each stage has its own event queue. Each stage does only a subset of the processing before passing the request on to the event queue of the next stage in the application.

Each stage comprises the following key elements (see the sketch after this list):

  • An Event Queue — holds events to be processed
  • A Thread Pool — a set of worker threads for the stage
  • A Thread Pool Controller — dynamically sizes the thread pool to optimize resource usage
  • Nonblocking I/O — SEDA emphasises the use of nonblocking I/O
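Put together, a stage might be structured roughly as below. This is a minimal sketch under my own assumptions: the Event type and the Stage interface are illustrative, and the workers wait on a condition variable for brevity, whereas SEDA proper emphasises the nonblocking wait shown later in this post:

#include <queue>
#include <vector>
#include <thread>
#include <mutex>
#include <condition_variable>

struct Event { /* stage-specific payload */ };

// One stage: an event queue, a worker pool, and a handoff to the next stage.
class Stage {
public:
    explicit Stage(Stage* next) : mNext(next) {}

    void push(const Event& e) {
        { std::lock_guard<std::mutex> lk(mMutex); mQueue.push(e); }
        mCond.notify_one();
    }

    // a thread pool controller would call this to grow the pool
    void spawnWorker() {
        mWorkers.emplace_back([this] {
            while (true) {
                Event e;
                {
                    std::unique_lock<std::mutex> lk(mMutex);
                    mCond.wait(lk, [this] { return !mQueue.empty(); });
                    e = mQueue.front(); mQueue.pop();
                }
                handle(e);                  // subset of processing only
                if (mNext) mNext->push(e);  // hand off to the next stage
            }
        });
    }

private:
    void handle(const Event& e) { /* stage logic */ }

    std::queue<Event> mQueue;
    std::vector<std::thread> mWorkers;
    std::mutex mMutex;
    std::condition_variable mCond;
    Stage* mNext;
};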

This approach allows you to write services that achieve the following design goals:

  • Supports Massive Concurrency — avoids high context switch overhead by using a limited number of threads. Discrete stages ensure that there is no shared data between stages, minimizing lock contention.
  • Modularity of Application — discrete stages keep the application modular, so extending the logic of one stage is independent of the others.
  • Enables Load Conditioning — allows you to dynamically examine the queue size of each stage and adjust the application logic accordingly, i.e. you may accept, reject, or provide a degraded service under high load.
  • Dynamic Thread Pool Sizing — a dynamic resource controller can be added to each stage to shrink or grow its thread pool as load decreases or increases.

A separate event queue for each stage allows you to alter the application's behaviour dynamically depending on the load. This is referred to as load conditioning. It is achieved by implementing dynamic thread pool sizing via a resource controller, which monitors the incoming event queue size and increases or decreases the thread pool size as appropriate.
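A controller loop might look something like the sketch below. The StageControl interface, the watermarks, and the sampling interval are all assumed tuning knobs of my own, not prescribed values:

#include <chrono>
#include <thread>

// Hypothetical interface a stage exposes to its resource controller.
struct StageControl {
    virtual size_t queueLength() const = 0;
    virtual size_t poolSize() const = 0;
    virtual void   addWorker() = 0;
    virtual void   removeWorker() = 0;
    virtual ~StageControl() = default;
};

// Periodically sample the queue depth and nudge the pool size up or down.
void controllerLoop(StageControl& stage) {
    const size_t kHighWatermark = 100;  // assumed thresholds; tune per stage
    const size_t kLowWatermark  = 10;
    const size_t kMaxThreads    = 32;

    while (true) {
        size_t depth = stage.queueLength();
        if (depth > kHighWatermark && stage.poolSize() < kMaxThreads)
            stage.addWorker();           // queue is backing up: grow
        else if (depth < kLowWatermark && stage.poolSize() > 1)
            stage.removeWorker();        // mostly idle: shrink
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}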

SEDA assumes that nonblocking I/O is used and that none of the threads block on a queue. For this, nonblocking I/O primitives are used; on Linux, you may have used select, poll, or epoll for this. A nonblocking wait on the event queue is critical to an effective SEDA design. Let's take a look at code for achieving this:

Example Implementation of a Nonblocking Wait on a Queue

#include <fcntl.h>
#include <unistd.h>
#include <sys/select.h>
#include <queue>

struct Event { /* stage-specific payload */ };

/* A pipe is attached to the event queue for notifications */
int mPipeFd[2];

/* select fd set */
fd_set mReadFDs;

/* the event queue itself; guard it with a mutex if multiple
   threads push concurrently */
std::queue<Event> mQueue;

/* Event handler function */
void HandleEvent(const Event& e) {
}

void WatchPipeReadEnd(int fd) {
    // make the fd nonblocking
    int flags = fcntl(fd, F_GETFL, 0);
    flags |= O_NONBLOCK;
    fcntl(fd, F_SETFL, flags);

    // add it to the watched read set
    FD_SET(fd, &mReadFDs);
}

/* prepare the pipe for read notifications when something is
   pushed in the event queue */
void initialize() {
    pipe(mPipeFd);
    FD_ZERO(&mReadFDs);

    // watch the read end of the pipe
    WatchPipeReadEnd(mPipeFd[0]);
}

/* notify the main thread by writing to the pipe to wake up the select */
void notify() {
    if (mPipeFd[1] > 0) {
        const char c = 'x';
        ::write(mPipeFd[1], &c, 1);
    }
}

/* push an event in the queue */
void push(const Event& e) {
    mQueue.push(e);

    // notify that there is something in the queue
    notify();
}

/* main thread run loop */
void run() {
    while (true) {
        /* select() mutates the set it is given, so wait on a copy */
        fd_set rfds = mReadFDs;

        /* wait up to three seconds */
        struct timeval tv;
        tv.tv_sec = 3;
        tv.tv_usec = 0;

        /* the first argument is the highest watched fd, plus one */
        int ret = select(mPipeFd[0] + 1, &rfds, nullptr, nullptr, &tv);

        if (ret == -1) {
            /* error in select; handle or log it */
        } else if (ret) {
            /* drain the notification byte so the pipe does not fill up */
            char buf;
            ::read(mPipeFd[0], &buf, 1);

            /* we got something in the queue */
            Event e = mQueue.front();
            mQueue.pop();

            /* handle the event */
            HandleEvent(e);
        } else {
            /* nothing in the queue in the last 3 seconds; this is
               idle time, so you can do some additional work here */
        }
    }
}

Note: The above code gives a starting point for implementing a nonblocking wait on event queues. I have skipped most of the error handling.

Additionally, a batching controller can be used to control the number of events consumed by each thread. Choosing a batching factor can be tricky: a larger batching factor yields higher throughput, while a smaller one lowers the response time of event processing. The ideal batching factor is the smallest value that still sustains throughput. Once you implement this and define your stages well, the right value starts to become obvious through some trial and error.
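As an illustration, the consume side might drain up to a batching factor of events per wakeup. popBatch and batchFactor are my own names for the idea, a sketch rather than a fixed API:

#include <queue>
#include <vector>

// Drain up to batchFactor events from the stage queue in one pass,
// amortizing wakeup and locking costs across the whole batch.
// (Sketch only: assumes the queue is already synchronized externally.)
template <typename Event>
std::vector<Event> popBatch(std::queue<Event>& queue, size_t batchFactor) {
    std::vector<Event> batch;
    while (!queue.empty() && batch.size() < batchFactor) {
        batch.push_back(queue.front());
        queue.pop();
    }
    return batch;
}

A worker thread would process the whole batch before waiting again; start with a small batchFactor and raise it until throughput stops improving.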

The entire implementation is too large to cover in a single post. But I have some news to share.

Open Source Implementation

To conclude this post, I would like to announce that I will be starting an open source project on GitHub for a complete staged implementation of an event-driven service. This will allow developers to quickly implement SOA and microservices based architectures for high throughput and scalability.

Please write your email address in the comments section if you would like to be notified about the project by email. You can also write to me if you would like to contribute to the project or share your views. I can be reached at vikas@lightwsd.com


Vikas Sood
Webmagic

15+ years in the IT industry; currently Architect and CTO at PushFYI Inc.