Software Architecture and Concurrency


This article originates from personal research on concurrency in software architecture. During the study phase of a small integration project using Unix socket files, I was looking for guidelines on how to keep the application running while waiting to read data from the file (non-blocking I/O), and I realized that when we talk about concepts related to concurrency, we often navigate a fog of misguided sources that leave much to free interpretation.

After collecting different materials and ideas, I decided to group them together into this article on concurrency and how it impacts the scalability and architecture of our applications, especially when you need to handle millions of requests or concurrent processes.

N.B. The code, and much of the content that follows, is taken from Sherif Ramadan’s excellent article.

This article will undergo changes in the coming months in order to explain concurrency through other approaches.


Having to choose a language to explain the concepts we will address, I chose PHP: first because I’m passionate about it, and then because it has strong points such as its learning curve, community, availability of material, documentation quality, and track record.

The following concepts are easily transferable, and references to other programming languages are frequent, but it is important to understand that, from an architectural point of view, language is no longer a limit nowadays.

In fact, we can say that any Turing-Complete language can approximate any runtime function of another language.

PHP multithreading

HTML was not conceived with the constructs necessary for programming, such as loops and if-then branching (which allow sections of code to be skipped under certain circumstances). It is in fact Turing-incomplete, and initially it was the web server that was responsible for the logic needed to show content based on what the user requested. PHP stems from the need to add more advanced functional and logical constructs without having to produce internal libraries on the web server itself every time.
For this reason, it did not originate with a threading library or an asynchronous multi-tasking library, since it was normally used to address the stateless nature of HTTP, one request at a time.

The advent of WebSockets led to a revolution that made it possible to create stateful services, and Node.js became synonymous with this architecture. Shortly afterwards, the various architectures and their back-end applications were put to the test, demonstrating how Go and Erlang (languages created for concurrent programming) could really thrive in this area.

But developers already had their own libraries written in other languages such as PHP and Ruby, and few were actually willing to abandon everything they had been working on and switch to something new just to follow fashion. The discussions arising from these novelties, however, led to an analysis of the structural compromises to which they expose us, so as to understand their impact on software architecture.

Software Architecture

First, we should analyze what software architecture is and why it plays such an important role.

The main goal of architecture is: “Do everything to prevent failure.”

Where “do everything” means that you have to make compromises, sacrificing something to gain something else. For example, it might be preferable to use a small amount of extra memory to get a slightly faster calculation.

A typical case of this kind of choice is represented by arrays: in PHP they are ordered hash maps. Thanks to their hashing properties, they allow faster access to their values, even though more data has to be stored per element. Since most PHP requests live only for a few milliseconds, this choice is considered acceptable: it is preferable to use more memory to get faster execution of requests, facilitating the use of a PHP array as a dictionary or a list.
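As a quick illustration (the values here are arbitrary examples), the same array type serves as both a list and a dictionary, with both kinds of lookup backed by the same hash table:

```php
<?php
// PHP arrays are ordered hash maps: one structure, two common uses.
$list = ['apple', 'banana', 'cherry'];           // used as a list
$dict = ['it' => 'Italian', 'en' => 'English'];  // used as a dictionary

// Both lookups go through the hash table, so access is O(1) on
// average, at the cost of storing hashes and buckets per element.
echo $list[1], "\n";      // banana
echo $dict['en'], "\n";   // English

// Unlike a plain hash map, insertion order is preserved.
foreach ($dict as $code => $name) {
    echo "$code => $name\n";
}
```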

Another example of a compromise PHP makes is evident in its concurrency model.

PHP is able to serve hundreds of simultaneous requests because it is embeddable and extensible by design.

This is not commonly referred to as a “quality attribute” and is often classified as a non-functional requirement, but since PHP can be directly embedded in another program, such as the Apache httpd server or the PHP FastCGI Process Manager (PHP-FPM), you do not have to worry about concurrency within the application code. The PHP runtime itself can be duplicated in a multi-threaded context (usually used only on Windows, where no forking is possible) or a multi-process context (the typical configuration on Linux). This is possible through the Server API (SAPI), which allows the memory of each request to be allocated and deallocated independently.


Implementing what is described above in the Apache httpd web server is quite easy, using the SAPI in the mod_php module together with the prefork Multi-Processing Module (MPM). Each child process (httpd worker) has a built-in copy of the PHP runtime, so the web server can handle as many simultaneous requests as it has processes (three, in the example above).

What’s probably surprising is that PHP has focused mostly on this architectural style for the last 20 years. So we can only wonder: why? The fact that it still works effectively has to tell us something about the choices PHP made from the beginning.

“Everything fails, all the time.”
Werner Vogels, Amazon VP and CTO

Since PHP shares nothing between requests, it does not matter much if something fails within one of them. Say, for example, there is a strange bug in PHP that causes a crash once every 100,000 requests. Well, on a heavily loaded server handling hundreds of requests per second, that means a critical error every 15 minutes or so. Since a PHP request lives in a single process in the architecture above, the error is quite acceptable: if a process crashes, only one request suffers.
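A quick back-of-the-envelope check of that claim (the request rate is an assumed figure, picked to match “hundreds of requests per second”):

```php
<?php
// How often does a 1-in-100,000 bug bite under load?
$crashEvery = 100000;   // requests per crash
$reqPerSec  = 110;      // assumed load: "hundreds of requests per second"

$secondsBetweenCrashes = $crashEvery / $reqPerSec;
printf("one crash every %.1f minutes\n", $secondsBetweenCrashes / 60);
```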

Note how this architecture is the same one in use in most PHP applications deployed and used on the web every day. If we use mod_php or PHP-FPM, we are relying on this same “shared-nothing” architecture.

If we consider the alternative, where PHP lives in a single process and uses threads to serve requests, a crash in a single thread brings down all the other requests served by that same process. So you end up losing hundreds of requests every 15 minutes instead of just one. This would be a poor architectural choice for PHP, and in fact we do not see many PHP servers on Windows (which, lacking the ability to fork, creates threads). It does not make sense.


If we want to talk about concurrency, we must try to understand exactly what it means and why it plays such an important role from an architectural point of view. Concurrency has two forms: there is hardware concurrency, where two things occur simultaneously at the physical level, and then there is software concurrency, where we have the feeling that two things are happening simultaneously in our software but (as far as the hardware is concerned) they are happening sequentially, one after the other.
People often confuse the terms asynchronous, concurrency, parallelism, and threading. They do not mean exactly the same thing, although we can correlate them. To resolve the confusion, it usually helps to think about these things in terms of processing and execution.

  • A process is the object or container that defines the memory, instructions, and context of a software program.
  • A thread is the unit of execution that runs within a process.

Starting from these definitions, we might mistakenly deduce that most processes are single-threaded, since each one corresponds to one running thread. However, a process can define multiple threads of execution and thus become multi-threaded.

We are therefore on the path of parallelism, because threading creates an additional context within a process and can be helped by hardware concurrency. All this explains how a process can create concurrency based on how many running threads it has inside.


When we think of synchronous and asynchronous, however, we refer to the way we see execution from the context of a process. The context of a process is the running thread itself.

Whether a process is single-threaded or multi-threaded, we can see its execution context as synchronous or asynchronous.

We may also think that the internal execution of each thread can be synchronous or asynchronous.

The term “synchronous execution”, or “blocking”, refers to a routine or set of instructions that is executed in a particular thread and can block other routines.

Since a thread is generally run on the CPU by a single physical or logical core, it is subject to sequential execution by the hardware. However, this does not mean that we cannot write code that creates additional threads of execution within the main one, allowing us to switch between the various execution threads within the same process thread. For now we tackle the topic of blocking from the point of view of the CPU loop (later we will analyze the I/O side).

Let’s look at what synchronous execution looks like at the code level. Let’s start with a function that checks whether a number is prime:
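The original listing is not reproduced here; what follows is a minimal sketch consistent with the description in the text (a naive isPrime() called in a foreach over two numbers), not Sherif Ramadan’s exact code:

```php
<?php
// Naive primality test: try every divisor from 2 up to $n - 1.
// Each call runs to completion before the outer loop can move on.
function isPrime(int $n): bool
{
    if ($n < 2) {
        return false;
    }
    for ($i = 2; $i < $n; $i++) {
        if ($n % $i === 0) {
            return false;   // found a divisor: not prime
        }
    }
    return true;
}

$numbers = [7, 4];

foreach ($numbers as $number) {
    echo $number, ' is ', isPrime($number) ? 'prime' : 'not prime', "\n";
}
```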

We can say quite simply that this loop requires the synchronous execution of each step. What it does is take the two numbers in $numbers and try to figure out whether each of them is a prime number.

It is easy to see that this test (performed by the isPrime() function) does not have a particularly efficient implementation. Each time we call isPrime() within our loop, we also need to execute up to n synchronous steps to verify that the number is prime, checking each time that the number in question is not divisible by the current candidate divisor. As we can see, if our isPrime() function requires many steps, it will block any other call to isPrime() in our outer loop.

Since 7 really is a prime number, a total of 5 steps are needed before the result is returned. So it is easy to understand how each step in the preceding foreach loop can be considered a synchronous task that involves every step of the for loop in our isPrime() function.
So how can we write this code asynchronously? We can use asynchronous coroutines in PHP through generators, which can be summarized as functions that can be interrupted, maintaining their state, and later resumed, optionally with a different return value.
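Again as a hedged sketch rather than the original listing (the round-robin scheduler below is invented for illustration), the generator-based version might look like this:

```php
<?php
// Generator version: isPrime() yields after every divisor check,
// so several primality tests can be interleaved in one thread.
function isPrime(int $n): Generator
{
    if ($n < 2) {
        return false;
    }
    for ($i = 2; $i < $n; $i++) {
        if ($n % $i === 0) {
            return false;       // finished early: not prime
        }
        yield;                  // pause; the scheduler may resume another task
    }
    return true;
}

$numbers = [7, 4];
$tasks   = [];
foreach ($numbers as $number) {
    $tasks[$number] = isPrime($number);
}

// Simple round-robin scheduler: advance each generator one step at a time.
while ($tasks) {
    foreach ($tasks as $number => $task) {
        if ($task->valid()) {
            $task->next();
        } else {
            echo $number, ' is ', $task->getReturn() ? 'prime' : 'not prime', "\n";
            unset($tasks[$number]);
        }
    }
}
```

Note that the result for 4 is printed before the check on 7 has finished: the two tasks interleave instead of queuing behind one another.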

Be careful: we still need the same number of steps to complete the verification, but since they are executed asynchronously, the real difference is that we can get the result for the number 4 (not prime) sooner.
In the synchronous version, we have to wait for all 5 steps of the test on the number 7 to complete before even starting to check the number 4.

In fact, it is faster only because whoever is waiting for the result of isPrime(4) is independent of whoever is waiting for the result of isPrime(7); otherwise, there is really no advantage! In terms of time actually spent, the two tasks take the same total amount of time, no matter whether you view the execution steps as synchronous or asynchronous.

We can therefore deduce that:

In addition to an asynchronous algorithm, asynchronous handling of the result is important!

Hardware Concurrency

Let’s imagine that these two tasks are two independent requests made by two different people to the PHP server. If PHP had a single thread and a single process and both of these requests arrived simultaneously, in our first synchronous implementation the person waiting for the result of isPrime(4) might have to wait much longer than the person who requested isPrime(7). In reality, the PHP concurrency model is based on actors: each request is an actor that passes a message to PHP through the SAPI.


However, every instance of the running PHP interpreter is a separate process, each of which can have its own thread of execution tied to a CPU core.

In a multi-core system, this allows request A to get an answer within two hardware clock cycles and request B within five cycles. This is real concurrency.

In our previous asynchronous model, within a single thread, we had to wait four complete hardware clock cycles to finish the first activity, whereas here we have true hardware concurrency and have to wait only two. Of course, this does not mean that every single process running on the system necessarily obtains true hardware concurrency.

There are systems where hardware concurrency is not always possible, for example on a single-core or embedded system, or wherever there are more processes running than available cores. There is also the concept of logical cores, as in Intel x86 hyper-threading technology, where a single physical core can produce the same kind of asynchronous execution we demonstrated above in PHP, with multiple execution threads (implemented at the micro-architecture level).

Concurrency and Error Handling

Threading is normally aided by hardware concurrency to speed up the unit of execution consisting of the process’s thread. All this is possible thanks to coordination between the operating system and the micro-architecture through something called the task scheduler. This is what makes multi-tasking possible, both physically at the hardware level and virtually at the software level when hardware capacity is saturated. Because CPUs are very fast, you never notice the difference between hardware and software concurrency, but you notice it very well when something fails.


Let’s return to our synchronous vs. asynchronous implementation of the prime-number calculation. What happens in the synchronous version if an exception is thrown from inside the loop in the isPrime() function? At best, if we intercept the exception from inside the foreach loop and manage to handle it, we have to start over again. That is not a big deal when we are testing the number 7 and happen to fail at step 3 or 4, but it is a significant problem if we fail at step 789340 during isPrime(789343). We would have to retrace hundreds of thousands of steps just to handle that failure case. Madness!

Through the asynchronous implementation, it is possible to resume from an error at a particular step without having to restart the entire task from scratch or block the execution of the other activities in our foreach loop. Since generators in PHP allow two-way communication, we can send information into the generator at a certain step to resume from the point of the error (assuming recovery is possible), and this is possible thanks to our architectural choice. That said, asynchronous code is not great for debugging, and its management is far more complicated than typical synchronous code, so it is preferable to use such constructs only where actually needed.
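As a sketch of the idea (the simulated one-off failure and the send()-based resume protocol below are invented for illustration, not taken from the original article):

```php
<?php
// Resumable primality check: yield hands the current divisor to the
// caller; if the caller send()s a divisor back, the check resumes
// from that divisor instead of restarting from 2.
function isPrime(int $n): Generator
{
    for ($i = 2; $i < $n; $i++) {
        if ($n % $i === 0) {
            return false;
        }
        $resume = yield $i;
        if ($resume !== null) {
            $i = $resume - 1;   // the for-loop increment restores $resume
        }
    }
    return true;
}

$task  = isPrime(11);
$tried = [];
$step  = $task->current();          // runs to the first yield (divisor 2)
while ($task->valid()) {
    try {
        $tried[] = $step;
        // Simulate a transient failure the first time we reach step 4.
        if ($step === 4 && count($tried) === 3) {
            throw new RuntimeException('transient failure at step 4');
        }
        $task->next();
        $step = $task->current();
    } catch (RuntimeException $e) {
        // Recover by resuming at the failed divisor: steps 2 and 3
        // are not redone, unlike the synchronous restart-from-scratch.
        $step = $task->send($step);
    }
}
echo $task->getReturn() ? 'prime' : 'not prime', "\n";
```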

Blocking I/O

Looking at what is currently on offer, and taking Node.js as the reference example for “non-blocking I/O”, the main difference between it and PHP is that in Node the code runs in a persistent process context that exists as long as the Node.js server is running. So, for example, if you write a value to a global variable in one request and read it in another request, you will get the same value.

By contrast, PHP creates a new context for each request, so global variables written during a request are lost when the script handling the request ends. Node is based on a single-process, single-threaded, software-based concurrency model that exposes the entire code base to all the errors of that same model. True, Node has a thread pool so as not to block on I/O, but a single thread error is enough to bring down the entire server and every concurrent request it was handling.

Node allows us to write routines that retrieve data from external sources without worrying about the latency of receiving it (network time, or reading and writing to disk), because it executes our code when the data is available. In PHP there are implementations of the event loop: the most famous is React, but we can name others such as Icicle, Amp, Kraken, etc. (there are also proposals for integrating some of these methodologies into future versions of PHP). But using external libraries (at least for now) implies that your standard code does not work as-is, and you will need to adopt the specific constructs of the library you have selected.

SAFETY NOTE: sharing memory in a multi-threaded, rather than multi-process, architecture means giving every thread access to the variables of the other threads in the same process. If we handle many requests in the same process, we share variable space, and this potentially means exposing our application to someone else’s ability to read data from other requests. This could be very serious given the amount of sensitive data we handle (usernames and passwords, session data, security keys, and credit card information for e-commerce sites).

PHP does not really suffer from these issues with its actor-based concurrency model. In PHP, every process can make blocking calls, but they only affect the request being handled by that process. In this way we can expand this same model, extending it with non-blocking code, without blocking any other process in our architecture.

We can therefore run multiple background processes that exchange messages, facilitating a multi-process, multi-actor concurrency model that leverages both hardware concurrency and asynchronous application-level concurrency. For the example of retrieving data from an external resource, we could implement a model in which several PHP processes retrieve resources and, when the data is received, send a message to a queue. Processes listening on the queue can then consume the data, each performing a single task. As our application grows, horizontal scalability will be easy and debugging will not be hell.
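A minimal sketch of this model, assuming the pcntl extension is available and using a Unix socket pair as a stand-in for a real message queue (in a real deployment fetchResource() would do actual network or disk I/O, and the channel would be something like Redis, RabbitMQ, or beanstalkd):

```php
<?php
// Worker processes each fetch one "resource" and push the result
// onto a shared channel; the parent consumes the messages.
function fetchResource(int $id): string
{
    return "resource-$id";   // placeholder for a slow blocking read
}

[$reader, $writer] = stream_socket_pair(
    STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP
);

$pids = [];
for ($id = 1; $id <= 3; $id++) {
    $pid = pcntl_fork();
    if ($pid === 0) {                     // child: one task, then exit
        fwrite($writer, fetchResource($id) . "\n");
        exit(0);
    }
    $pids[] = $pid;
}

fclose($writer);                          // parent keeps only the read side
$messages = [];
for ($i = 0; $i < 3; $i++) {
    $messages[] = trim(fgets($reader));   // consume messages as they arrive
}
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);         // reap the workers
}
sort($messages);
print_r($messages);
```

A blocking call in any one worker delays only that worker's task; the other processes, and the consumer, carry on independently.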