Let’s Tackle PHP Swoole Solemnly

Mert Simsek
Beyn Technology
Mar 12, 2022

I’d like to talk about Swoole in this article, but I won’t be covering the cumbersome parts such as installation, configuration, and usage. To be frank, that’s the easiest part. I’d like to cover the philosophy behind it. Why do we need it in the PHP world? In which scenarios is this kind of external package/extension the best fit? Without that understanding, using it would just be rote learning. Come with me and let’s dive into the deep together.

https://www.zend.com/blog/creating-basic-php-web-server-swoole

What’s Swoole at a glance?

Swoole is an external extension like other PHP extensions such as Opcache, Imagick, Redis, etc. Basically, Swoole brings some new concepts to PHP: server patterns, native coroutines, a coroutine scheduler, multiplexing and async I/O, state management, process management, a high-performance in-memory key-value store, and so on. Swoole gives developers a hand to create and manage TCP, HTTP, and WebSocket servers programmatically, with the PHP syntax most developers already know. Unlike PHP-FPM, which only supports a stateless model, with Swoole you can cache data and manage state within the server to increase performance. It provides an entirely new running model for PHP applications.

Fibers with PHP 8.1

Fibers are a low-level mechanism to manage control flow. They allow us to build (synchronous-looking) functions that can be paused and resumed. It is up to the developer of such a function to define where it can be paused and what event it waits for before resuming execution. On top of this, Swoole automatically manages I/O operations with its event loop. This is one of several reasons why Swoole is more practical than the other async PHP frameworks. Fibers don’t do this work for us without Swoole (for now): Fibers themselves don’t schedule these executions, but they allow an external scheduler to resume a paused Fiber. In any realistic environment, this would be handled through an event loop implementing the reactor pattern, like Swoole’s.
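To make the pause/resume idea concrete, here is a minimal sketch using the native `Fiber` class from PHP 8.1, with the main script playing the role of the scheduler. Nothing here is Swoole-specific; it only shows the primitive Swoole-like schedulers build on.

```php
<?php
// Minimal PHP 8.1 Fiber demo: the function suspends itself, and a
// "scheduler" (here, the main script) decides when it resumes.
$fiber = new Fiber(function (): void {
    echo "step 1\n";
    // Pause here and hand control back to whoever called start()/resume().
    $value = Fiber::suspend('waiting');
    echo "step 2, got: {$value}\n";
});

$state = $fiber->start();   // runs the fiber until the first suspend
echo "fiber suspended with: {$state}\n";
$fiber->resume('data');     // resumes the fiber exactly where it paused
```

Running this prints `step 1`, then `fiber suspended with: waiting`, then `step 2, got: data`: the function stopped mid-body and was continued later by external code.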

Now, I’d like to cover the idea behind Swoole and PHP. We’ll look at the execution units that run our apps: processes, threads, coroutines, and event loops. These concepts are important for understanding the general approach. In addition, we’ll look at programming languages’ async features and ways of handling concurrency. Finally, we’ll take a look at Swoole in PHP.

Executor Units

We call an independent execution unit that runs a sequence of instructions an executor. An executor runs within a certain situation or environment; we call that execution environment its context. Each type of executor has different functionality and a different running cost in terms of resources, so we should create and use them efficiently to improve overall performance.

https://www.geeksforgeeks.org/how-software-is-made/

Services

We can say each service is an execution unit. Each service holds some data and serves some functions and features. Each service can communicate with other services or with humans. Microservices have been one of the most popular system architecture methodologies of the past few years. A microservice system is more scalable than a monolithic one: each service can scale independently, whether being developed by teams or running in production. But the downside of microservices is the communication cost and overhead compared with monolithic systems. The state you request at a given moment may live on another server in the same data center, or even on a remote server across the internet.

Machines or Instances

Each web server or virtual server instance in the cloud can be called an executor unit or execution environment. It handles data in memory or on disk. Each server can communicate with other servers via IP addresses and ports by establishing TCP/UDP connections. The scope of a server or instance is smaller than that of a service.

Processes

The most common executor unit in a GNU/Linux OS is the process. One task can run within a process. Different applications run simultaneously on one box as multiple isolated processes. A process holds the current state of its application in memory at any point in time. Generally, that memory cannot be accessed or modified externally. Using more processes won’t necessarily make your application faster. Firstly, multiple processes may have to use the same resources, such as reading the same file on disk; the combined throughput of multiple readers can actually be lower than that of a single reader. Secondly, there is the overhead of the OS managing more processes, plus extra context-switching costs. If you have experience managing PHP-FPM servers, you know you cannot serve more requests just by increasing the PHP-FPM process count; more processes may use more CPU and reduce performance. In general, only CPU-bound calculations can be sped up by multiple processes. But most of our applications and web systems are I/O bound, so we have to solve the problem in another way.

https://snipcademy.com/linux-command-line-processes

Threads

Threads are more lightweight than processes. Multiple threads execute within one process and share some global state. By default, each thread can access and modify that global state at any point in time. To keep the data in sync, locks are used to prevent the data loss caused by multiple threads modifying global state; if multiple threads access the same mutable state without appropriate synchronization, your program is broken. It is notoriously difficult to write and debug high-performance concurrent programs with threads and locks: we have to deal with problems like deadlocks, data races, etc. Some modern runtimes, such as Node.js and PHP, don’t even expose threads to userland code.

https://www.slashroot.in/difference-between-process-and-thread-linux

Coroutines

Coroutines are user-space, lightweight threads created within one process. Each coroutine can cooperatively hand execution over to other coroutines. Swoole coroutines can be thought of as fibers: Swoole schedules coroutines based on their I/O wait/ready status, and also provides a preemptive scheduler that switches away from a coroutine that takes up too much of the CPU’s time slice. You can also find coroutine implementations in programming languages such as Lua and Go; the concepts and ways of using them are almost the same.
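To see what “cooperative” means in code, here is a toy scheduler built on plain PHP 8.1 Fibers. It is only an illustration of the idea behind a coroutine scheduler: a real one like Swoole’s resumes a coroutine when its I/O is ready rather than in simple round-robin order, and `ToyScheduler` is a name invented for this sketch.

```php
<?php
// A toy cooperative scheduler built on PHP 8.1 Fibers. Tasks voluntarily
// yield with Fiber::suspend(); the scheduler resumes them round-robin.
final class ToyScheduler
{
    /** @var Fiber[] */
    private array $queue = [];

    public function spawn(callable $task): void
    {
        $this->queue[] = new Fiber($task);
    }

    public function run(): void
    {
        while ($this->queue) {
            $fiber = array_shift($this->queue);
            $fiber->isStarted() ? $fiber->resume() : $fiber->start();
            if (!$fiber->isTerminated()) {
                $this->queue[] = $fiber; // re-queue until it finishes
            }
        }
    }
}

$log = [];
$sched = new ToyScheduler();
$sched->spawn(function () use (&$log) {
    $log[] = 'a1'; Fiber::suspend(); $log[] = 'a2';
});
$sched->spawn(function () use (&$log) {
    $log[] = 'b1'; Fiber::suspend(); $log[] = 'b2';
});
$sched->run();
// $log is now ['a1', 'b1', 'a2', 'b2']: the two tasks interleaved
// cooperatively inside one process, with no OS threads involved.
```

Swoole does essentially this at the extension level, but it suspends a coroutine automatically whenever it performs I/O, so your code never calls suspend by hand.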

Concurrent I/O models

Concurrent I/O models describe how an application deals with load. As a server or service, one or multiple processes or threads on one or multiple CPU cores respond to a large number of requests. The requests may come in different patterns. As with most computer problems, no single answer fits all situations. You have to understand the tradeoffs of these different approaches and choose the concurrent I/O model based on your traffic pattern.

Single process blocking I/O

This is how very old computers worked. The CPU is not shared by multiple tasks; all tasks are processed one by one, even when I/O is involved and the CPU could be free. This I/O model cannot serve real-time requests in a client-server setup, but we might use it to run one PHP script that pulls data from a message queue and processes it, or to execute a PHP script with GNU/Linux cron.

Multiple processes blocking I/O

This is how PHP-FPM and most CGI scripts work. The server creates one process for each request and makes blocking calls. Some servers like PHP-FPM or Apache may optimize this by reusing processes to save the process-creation cost. The benefit is that it is straightforward to understand and manage. The disadvantage: think about what happens when 1,000 requests come in concurrently and the server has to burst to 1,000 processes. Because of the large cost of launching processes, the context-switch cost across a large number of processes, and the memory cost of each process, this approach is not scalable under high concurrency. This model is used by most CGI scripts, such as Ruby or Python ones; even AWS Lambda follows this pattern.

Multiple threads and blocking I/O

This is how most Java applications work. Threads are lightweight compared with processes and share the memory and state of the application. Each request runs on an independent thread, and threads are usually managed with a pool to reduce the creation cost. Java has had non-blocking I/O in the language since version 1.4, but most applications and servers still run in blocking mode because of the overhead of writing against the non-blocking I/O API, so the I/O operations are still blocking. When there is a large number of concurrent connections, we still pay the context-switch cost. The benefit of this approach is that we can save and reuse state, which is part of why Java applications feel faster than PHP ones.

Single thread non-blocking I/O

When Node.js was launched back in 2009, it surprised a lot of people: we can hold state in memory while also performing non-blocking I/O in an event-driven, callback style. This is brilliant for most I/O-bound applications. Each time you need to do some I/O, you make a request and pass a callback function that will run once the I/O operation is done. Under the hood, tasks (including I/O) are put into a queue, and the process handles the tasks and calls your callback when each one completes. This execution cycle is called the event loop. The downside of this approach is that everything is executed within the same cycle. Even though the non-blocking enqueue calls are very fast, there is logic within the cycle that performs CPU work, so one request can block the others in the same cycle. The core assumption of this approach is that I/O operations are the slowest part of a request: compared with the thread model, you won’t see any advantage for requests that are pure CPU calculation. The more connections being processed, the slower the response time for each request. And if a single request is very heavy, you cannot spread it across multiple CPU cores to utilize their power.
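The reactor pattern behind this model can be sketched in plain PHP with `stream_select()`. This is only a minimal illustration of the idea (register a callback per stream, block until something is ready, dispatch), not how Node.js or Swoole actually implement their loops.

```php
<?php
// A minimal reactor-style event loop in plain PHP using stream_select():
// register a callback per stream, wait for readiness, dispatch.
[$a, $b] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

$watched   = [];  // stream id => stream being watched
$callbacks = [];  // stream id => callback to run when readable
$received  = null;

// Register: "when $b becomes readable, run this callback".
$watched[(int) $b]   = $b;
$callbacks[(int) $b] = function ($stream) use (&$received, &$watched) {
    $received = fread($stream, 1024);  // I/O is ready, so this won't block
    unset($watched[(int) $stream]);    // one-shot: stop watching
};

fwrite($a, "hello"); // makes $b readable

// The event loop: block until some stream is ready, then dispatch.
while ($watched) {
    $read = array_values($watched);
    $write = $except = null;
    if (stream_select($read, $write, $except, 1) > 0) {
        foreach ($read as $stream) {
            $callbacks[(int) $stream]($stream);
        }
    }
}

echo $received, "\n"; // prints "hello"
```

Everything runs on one thread: the loop never blocks on a read, only on readiness, which is exactly why a long CPU-bound callback would stall every other connection.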

Multiple threads and coroutines non-blocking I/O

This is how the Go programming language works. It creates user-land lightweight threads, goroutines, to perform I/O operations. When an I/O operation starts, the scheduler puts the goroutine to sleep, then wakes it up once the I/O operation is done. Compared with Node.js, Go lets you map each I/O operation or concurrent task to a lightweight thread and automatically schedules these threads in userland. In practice, the goroutines are also mapped onto multiple OS threads to use all the CPU cores. You end up with the best of both worlds: non-blocking I/O, while your code looks blocking and is easy to understand and maintain.

Swoole

People have implemented and tried several I/O models to solve the concurrency problem, as we have seen in this section. Swoole PHP supports multiple I/O models, and you can choose one based on your project; we will cover the I/O models implemented in Swoole in more detail in other sections.

Swoole PHP coroutines can be understood as userland threads. No Linux OS process context switch occurs when a coroutine context switch happens. Coroutine context switches are much cheaper than OS-level context switches: only the local stack of the coroutine is switched, and the heap remains the same. The local stack of a coroutine can be as small as 8KB, compared with OS-level stacks of several MB.

There are several reasons why Swoole is delivered as a PHP extension. The first is performance: although you can implement async I/O and event loops with pure PHP code and some event-loop extensions, that approach means you have to reimplement all the client libraries in PHP, and if client libraries such as the MySQL client are implemented in PHP, performance may degrade significantly. Secondly, with such frameworks you can’t use the native PHP client-side functions such as curl; you have to rewrite the code with a new syntax to manage the I/O on the event loop. Swoole can transparently replace these functions, which are used by lots of third-party libraries, at the extension level. This makes Swoole more practical than the other async PHP frameworks: you don’t have to build up a whole new PHP ecosystem and rewrite all the PHP packages. The third reason is coroutine support: you can use Swoole’s coroutine syntax to create lightweight coroutines and manage concurrent I/O.
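Swoole’s coroutine syntax looks like this sketch, which requires the Swoole extension to be installed (and assumes the default `swoole.use_shortname` setting, which enables the `go()` helper). Two coroutines sleep concurrently, so the whole block takes about one second instead of two:

```php
<?php
// Sketch only: requires ext-swoole (e.g. php -d extension=swoole demo.php).
$start = microtime(true);

Co\run(function () {
    go(function () {
        Co::sleep(1); // non-blocking sleep: yields to Swoole's scheduler
        echo "coroutine A done\n";
    });
    go(function () {
        Co::sleep(1);
        echo "coroutine B done\n";
    });
});

// Both sleeps overlapped, so elapsed time is ~1s rather than ~2s.
printf("elapsed: %.1fs\n", microtime(true) - $start);
```

The code reads as blocking, Go-style code, but Swoole suspends each coroutine during `Co::sleep()` (and during any supported I/O call) and resumes it when the wait is over.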

Comparison with the lifecycle of PHP-FPM

Unlike PHP-FPM’s shared-nothing design, Swoole reuses the context across requests, so you don’t have to bootstrap your application framework’s kernel context for every request. This is a large cost saving for any PHP-FPM application.

The lifecycle of a PHP-FPM request:

1- Receive the request
2- Load and compile the PHP files and codes
3- Initialize the context objects and variables
4- Execute functions
5- Send the response
6- Recycle the resources

All six steps above are executed for every request. The lifecycle of a Swoole request:

1- Load and compile the PHP files and codes
2- Initialize the context objects and variables
3- Receive the request
4- Execute functions
5- Send the response
6- Recycle the resources

With Swoole, steps 1 and 2 run only once, when the server starts; only steps 3 to 6 repeat for each request.

https://blog.resellerspanel.com/hepsia-control-panel/swoole-async-php-network-framework-enabled.html

When PHP-FPM gets a new request, it has to build new state for that request. Each request is isolated and shares nothing with the others.
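A minimal Swoole HTTP server makes the lifecycle difference visible in code. This is a sketch (it requires the Swoole extension and runs as a long-lived process): the expensive bootstrap belongs in the worker-start hook, which runs once per worker, while the request handler runs for every request.

```php
<?php
// Sketch only: requires ext-swoole; this starts a long-running server.
use Swoole\Http\Server;
use Swoole\Http\Request;
use Swoole\Http\Response;

$server = new Server('0.0.0.0', 9501);

$server->on('workerStart', function () {
    // Steps 1-2 of the lifecycle: load/compile code, build the framework
    // kernel, warm caches. Runs once per worker, not per request.
});

$server->on('request', function (Request $request, Response $response) {
    // Steps 3-6 of the lifecycle: only per-request work happens here.
    $response->header('Content-Type', 'text/plain');
    $response->end("Hello from a persistent worker\n");
});

$server->start();
```

Anything created in the bootstrap stays in memory between requests, which is exactly the state reuse that PHP-FPM’s shared-nothing model cannot offer.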

https://www.amazon.com/Mastering-Swoole-PHP-performance-concurrent-ebook/dp/B0881B227S

Event Loop

Swoole creates multiple threads to handle I/O events in PROCESS mode. The single-threaded event loop model is used by Node.js and Redis, so why does Swoole use multiple threads? Because the single-thread model is not scalable in some cases. Picture streaming a 10GB file over a connection: the single thread will be busy for a while, which may affect the other connections. A common issue with saturated Redis servers is that only one CPU core is busy; the workload is not evenly distributed across all the cores. Recent Redis versions addressed this issue by also adopting a multi-threaded I/O model. There is also an event loop handling I/O in stateless PHP-FPM: PHP-FPM is a server that can handle multiple concurrent TCP connections, and when a request is ready, it passes the request to a PHP process that executes it in a blocking, stateless way. So there are always multiple PHP processes launched and managed by PHP-FPM.

Conclusion

I hope I was able to change your mind about Swoole and async functions in the PHP world in a positive way. Adopting Swoole and these features in your company or team can greatly increase the performance, reliability, and scalability of your apps. Compared with other programming languages, we can improve a PHP system from two different directions: I/O-intensive operations and CPU-intensive operations. With the release of the PHP 8 JIT, we can anticipate a major improvement on the CPU side. Swoole is a mature solution for optimizing I/O-intensive operations. So we can expect a major improvement in the PHP ecosystem by combining the power of Swoole’s async I/O and coroutines with the PHP 8 JIT, and we can tackle more and more use cases we have never seen before. If you have additional questions, please feel free to reach out to me. Thanks for reading!


Mert Simsek
Beyn Technology

I’m a software developer who wants to learn more. First of all, I’m interested in building, testing, and deploying automatically and autonomously.