WSGI Is Not Enough Anymore — Part II

In the first part of this series we discussed the problems and limitations inherent in WSGI-based Python web applications.

In this part we will discuss what concurrency is and what an event driven architecture looks like.

Web applications need to be able to handle multiple requests simultaneously. A web server which can only process one request at a time is only good for development purposes or for very specialized applications. A good example of such a web server is the development server provided by Django. This server is single threaded, meaning it runs on a single thread, the main thread created by the server process. All requests are handled synchronously, one after another:
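This single-threaded, sequential model can be sketched as follows. The `handle_request` function is a hypothetical stand-in for whatever work a real server does per request:

```python
import time

def handle_request(request):
    # Stand-in for the real work of one request (routing, rendering, etc.).
    time.sleep(0.01)
    return f"response to {request}"

def serve(requests):
    # A single-threaded server: each request is handled to completion
    # before the next one is even looked at.
    responses = []
    for request in requests:
        responses.append(handle_request(request))
    return responses

print(serve(["/a", "/b", "/c"]))
```

While one request is being handled, every other client simply waits in line.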

This method is obviously not useful when the server needs to handle multiple requests simultaneously.

WSGI-based servers solve this problem by dedicating a thread to each incoming request, delegating the job of scheduling concurrent requests to the OS. When multiple requests arrive, a WSGI server takes a thread from a thread pool (or spawns a new one) to handle each request and return its response. The thread's scope is that single request. Context switching between threads is handled by the OS. Developers have little to no control over context switching between threads, and thus between requests.
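The thread-per-request model can be sketched with the standard library's `ThreadPoolExecutor`. This is a simplified illustration, not how any particular WSGI server is implemented; `handle_request` is again a hypothetical stand-in for per-request work:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(request):
    # Blocking work; while this thread sleeps, the OS runs other threads.
    time.sleep(0.01)
    return f"response to {request}"

# Each incoming request is dispatched to a worker thread from the pool;
# the OS decides when each thread actually runs.
with ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(handle_request, ["/a", "/b", "/c"]))

print(responses)
```

Note that the application code itself stays synchronous; the concurrency lives entirely in the threads and the OS scheduler.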

The best way to make sure that WSGI servers will be able to handle a large number of requests is to write efficient code and to make sure the architecture is optimized for the shortest request/response cycles. Different applications may have different choking points, or bottlenecks: operations which take a long time to execute and also happen frequently. Database access and data serialization are the most obvious examples. Accessing resources outside the scope of the application (such as 3rd party services) is not only slower but may also not be in the developer's control.

I/O Bound Applications

Web applications do not rely on application servers alone. Instead, they utilize other resources such as data stores, caches and external APIs to deliver meaningful responses to clients' requests. A request that is processed by the application server, in most cases, involves some I/O operations: accessing the file system, querying the database or sending requests to other services. The time it takes for the application server to perform I/O operations depends on many factors, mostly unrelated to the computational power of the application server itself. When the application server waits on an I/O operation it is essentially blocked until the operation is completed.

These applications are often referred to as I/O bound applications, which means applications which are bottlenecked by I/O operations and not CPU or memory.

It is safe to say that most of what I/O bound applications do is wait. They wait because they perform I/O operations sequentially, in a blocking manner, one after the other. There is only a single stream of operations.
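The cost of sequential blocking waits is easy to measure. In this sketch, `time.sleep` stands in for any blocking I/O call such as a database query or an HTTP request:

```python
import time

def blocking_io(duration):
    # Stand-in for a blocking I/O call (database query, HTTP request, ...).
    time.sleep(duration)

start = time.monotonic()
for d in (0.05, 0.05, 0.05):
    blocking_io(d)  # each wait only starts after the previous one ends
elapsed = time.monotonic() - start

print(f"elapsed ~{elapsed:.2f}s")  # roughly the sum of all the waits
```

The total time is the sum of the individual waits, even though the CPU did essentially nothing during any of them.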

The following diagram illustrates this:

Colors in this diagram represent all the operations done for a single request.

Concurrent Programming

In a concurrent program, several streams of operations can execute concurrently, or side by side. We can consider processing a client’s request a stream of operations which the application server executes sequentially. However, streams can communicate and interfere with one another. A single stream of operations may relinquish its running state to another stream which needs to perform its operations.

In other words, while a single request is waiting on an I/O operation, such as a query to the database, it gives up control to another request that may have already finished its I/O operation and wants to move on to its next operation.

The following diagram illustrates this:

Like the example above, colors in this diagram represent all the operations done for a single request. However, in this diagram, operations from multiple requests become woven together as the running state is passed from one stream of operations to another, or rather from one request to another. This allows for better utilization of the application server’s computational power.
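The difference this interleaving makes can be shown with a small `asyncio` sketch. The same three waits from the earlier sequential example now overlap, so the total time is close to the longest single wait rather than the sum:

```python
import asyncio
import time

async def non_blocking_io(duration):
    # Yields control back to the event loop while "waiting" on I/O.
    await asyncio.sleep(duration)

async def main():
    start = time.monotonic()
    # All three waits run concurrently in a single thread.
    await asyncio.gather(*(non_blocking_io(0.05) for _ in range(3)))
    return time.monotonic() - start

elapsed = asyncio.run(main())
print(f"elapsed ~{elapsed:.2f}s")  # close to the longest single wait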

Concurrent programming turns an I/O bound application into a more efficient mechanism which can execute multiple streams of operations concurrently.

Concurrency Model and the Event Loop

To overcome the limitations of synchronous execution, which results in blocking I/O calls, another approach is required. This is often referred to as event driven architecture. This architecture yields a very different type of execution flow.

An event driven architecture is built around an event loop and an event queue, and relies on a single threaded execution model.

Operations are executed in order by the event loop, which processes them from the queue.

The following diagram illustrates this:

In the context of a web application, requests are executed in a single thread, rather than one thread per request. Each request consists of various operations which are executed in order. Regular operations are executed sequentially by the call stack, while concurrent actions are fed into the event loop. Inside the event loop operations are executed in order of arrival and priority.
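The "in order of arrival" behavior of the event loop's queue can be observed directly with `asyncio`. Callbacks scheduled with `loop.call_soon` are queued and run in FIFO order:

```python
import asyncio

order = []

async def main():
    loop = asyncio.get_running_loop()
    # Queue three callbacks; the loop runs them in order of arrival.
    loop.call_soon(order.append, "first")
    loop.call_soon(order.append, "second")
    loop.call_soon(order.append, "third")
    await asyncio.sleep(0)  # yield so the queued callbacks can run
    return order

print(asyncio.run(main()))
```

The `await asyncio.sleep(0)` line is the coroutine explicitly handing control back to the loop so the queued work can proceed.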

Using an event driven architecture by employing an event loop is not enough to achieve concurrency. The operations which are pushed into the event loop must also be implemented in a non-blocking manner. Only then can the event loop achieve context switching: while waiting for an I/O operation to complete, it switches execution to another operation within the event loop. Code which is implemented to run in the concurrent model has to explicitly yield execution back to the event loop while waiting for I/O operations to complete. Execution is then passed to another piece of code that is done waiting for its I/O operation.

Web applications that employ an event loop cannot reduce the latency of CPU-bound operations, as they usually run on a single core (per process). But the latency of I/O-bound operations can be reduced significantly, because while an I/O operation executes, all the server does is wait. In addition, because all requests are handled in a single thread, no threads need to be created, destroyed or switched between.
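The explicit yielding described above is what `await` does in Python. In this sketch, two hypothetical request handlers share a single thread; each `await` hands control back to the event loop, which resumes whichever handler is ready:

```python
import asyncio

log = []

async def handler(name, wait):
    log.append(f"{name} start")
    # The await explicitly yields control back to the event loop
    # until the simulated I/O operation completes.
    await asyncio.sleep(wait)
    log.append(f"{name} done")

async def main():
    # Both handlers run in one thread; while A waits on its (longer)
    # I/O, B runs and finishes first.
    await asyncio.gather(handler("A", 0.02), handler("B", 0.01))

asyncio.run(main())
print(log)  # ["A start", "B start", "B done", "A done"]
```

Handler B finishes before handler A even though A started first, because A's longer wait let the loop hand execution to B in the meantime.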

Why is the event driven architecture suitable for web applications?

  1. Web applications are meant to handle multiple requests simultaneously
  2. Web applications are mostly I/O bound and therefore spend most of their time waiting on I/O

By implementing non-blocking concurrent code within an event driven architecture, a single instance of a web application can utilize its computational resources in a more efficient way. This results in lower latency and improved scalability. But this is only the beginning, and there are other advantages inherent to this approach.

In the next post we will discuss Python libraries which implement concurrency and provide the mechanisms for building an event driven architecture. We will also look at how such libraries are used in practice for developing web applications.