Consuming AMQP messages in PHP

Have you ever tried to set up an AMQP consumer for a PHP application? If you have, you probably know how complicated it can be.

There are two popular approaches to running an AMQP consumer for a PHP application. The first is to run a pure PHP process that receives messages directly from the AMQP server, manages the AMQP connection, and processes the messages. The second is to run a separate application that consumes messages from AMQP and then runs php-cli with a particular PHP script to process each of them.

Pure PHP consumer

This was my first attempt at building an AMQP consumer, and it proved unreliable quite quickly.

Implementing an AMQP consumer in pure PHP is very easy: you pick one of the AMQP libraries for PHP, write a small PHP script that starts your consumer, and then set up supervisor to make sure the script is always running.

There are a few libraries that implement the AMQP protocol in pure PHP: php-amqplib, bunny, and even my own butter-amqp. There is also php-amqp, a PHP extension that adds AMQP protocol support.

Memory usage and shared state

Most PHP code runs in the scope of a single request: every request starts from scratch, and everything allocated while processing it is destroyed afterwards. Many PHP libraries and frameworks are written with this in mind and manage memory accordingly. You can find plenty of PHP code that uses global or static variables to keep a short-lived cache or request-specific data, or that has small memory leaks which stay invisible unless the code runs for more than a few minutes. If you run such code in a long-running process, you will quickly see your RAM filling up and your application misbehaving because of the shared state.

I believe this issue can be avoided if you use very few third-party libraries, probably no framework at all, and manage memory carefully. Realistically, though, most code bases use web frameworks, ORMs, etc., which are not designed to run in a long-running process. So you may end up restarting your consumer every few messages just to free some memory.
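The restart workaround usually boils down to a loop that handles a bounded number of messages and then lets the process exit, so that supervisor starts a fresh one. Below is a minimal sketch of that loop, written in Python for brevity rather than PHP. The `run_worker` name and the limit of 100 messages are hypothetical, and the message source is a plain iterable standing in for a real AMQP library.

```python
from typing import Callable, Iterable

# Hypothetical limit; tune it to how quickly your process accumulates memory.
MAX_MESSAGES_PER_PROCESS = 100


def run_worker(
    messages: Iterable[bytes],
    handle: Callable[[bytes], None],
    max_messages: int = MAX_MESSAGES_PER_PROCESS,
) -> int:
    """Handle at most `max_messages` messages and return how many were handled.

    The caller is expected to exit the process afterwards; a process manager
    such as supervisor then starts a fresh one, which throws away any leaked
    memory and static/global state accumulated so far.
    """
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    handled = 0
    for body in messages:
        handle(body)
        handled += 1
        if handled >= max_messages:
            # Stop pulling messages; in a real consumer the rest stay in the
            # queue for the next process.
            break
    return handled
```

In a real consumer script this would be called once, after which the script exits and supervisor's `autorestart` option takes care of starting the next process.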

Parallel processing

A pure PHP consumer can process only one message at a time, because PHP runs in a single thread. Unfortunately, there is no way around this: I tried to build something with pthreads, but never got it working. So the only way to process more than one message at a time is to run more than one PHP consumer on the same server, and this is where memory usage bites again. A single, freshly started PHP process with a popular framework can already consume a few hundred MB of RAM. Run 10 of them and you need a few GB; combine that with the previous issue and your cloud computing bill will skyrocket. Worst of all, memory usage stays high even when there are no messages in the queue.

Broken pipe

Networks are unreliable. Sometimes a connection breaks without the client being properly notified, and when that happens your consumer simply hangs in memory, waiting for the next frame from the AMQP server, which never arrives. The AMQP protocol mitigates this with a heartbeat mechanism: the client and the server negotiate an interval and send each other heartbeat frames, and if a peer receives no traffic for about two intervals, it considers the connection broken. Unfortunately, heartbeat support in PHP libraries is quite limited, so it does not always work.

This issue can be solved with a proper heartbeat configuration or, at the very least, with a cron job of some sort that restarts the consumer periodically.
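To make the heartbeat rule concrete, here is a minimal sketch of the client-side check, in Python rather than PHP. It assumes the common rule that a peer is treated as dead after two heartbeat intervals without any incoming traffic; the `HeartbeatMonitor` class is hypothetical and not taken from any of the libraries above, and sending the client's own heartbeat frames is not shown. Note that a single-threaded consumer can only run such a check if its blocking socket read has a timeout, which is part of why heartbeats are hard to get right in PHP.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Decides whether an AMQP connection should be treated as dead.

    Any incoming frame (a regular frame or a heartbeat frame) counts as proof
    that the server is alive. The connection is considered broken once nothing
    has arrived for `missed_limit` heartbeat intervals.
    """

    def __init__(
        self,
        interval_s: float,
        missed_limit: int = 2,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.interval_s = interval_s
        self.missed_limit = missed_limit
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_received = clock()

    def on_frame_received(self) -> None:
        # Call this whenever a read returns any data from the server.
        self.last_received = self.clock()

    def is_dead(self) -> bool:
        silence = self.clock() - self.last_received
        return silence >= self.interval_s * self.missed_limit
```

A consumer loop would call `on_frame_received()` whenever a read returns data and check `is_dead()` each time the read times out, reconnecting (or exiting so that supervisor restarts it) once it returns True.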

rabbit-cli-consumer

After fighting with pure PHP consumers for a while, I found rabbit-cli-consumer. It is an application written in Go that consumes messages from AMQP, runs a CLI command for each message, and then acknowledges the message based on the command's exit code. This approach has been working for me for a few years now. At some point I had to write my own version of the CLI consumer to implement some project-specific features, but the idea of processing messages via php-cli remained the same.
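The per-message step of this approach can be sketched roughly as follows, in Python for brevity. This is not rabbit-cli-consumer's actual code: the `handle_message` name, passing the body on stdin, and the 60-second timeout are assumptions made for the sketch, and the real tool may hand the message to the script differently.

```python
import subprocess
import sys
from typing import Sequence


def handle_message(command: Sequence[str], body: bytes, timeout_s: float = 60.0) -> bool:
    """Run `command` once for one message and return True if it should be acked.

    Assumption: the message body is written to the command's stdin.
    """
    try:
        result = subprocess.run(
            list(command),
            input=body,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # a hung script is treated as a failure
    # Exit code 0: the script handled the message, so ack it.
    # Any other exit code: nack/reject it (possibly requeueing it).
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in for something like `php consume.php`, so the sketch runs without PHP.
    script = "import sys; sys.exit(0 if sys.stdin.read() else 1)"
    print(handle_message([sys.executable, "-c", script], b'{"id": 1}'))  # prints True
    print(handle_message([sys.executable, "-c", script], b""))  # prints False
```

Because each message gets its own process, a memory leak or leftover static state in the PHP script cannot outlive a single message.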

This approach solves all the issues listed above: it provides just the right amount of isolation between processes, allows parallel processing, and fully supports heartbeats. Its only drawback is the CLI overhead. Every message needs to start and initialize the PHP interpreter, which lowers message throughput, especially on large code bases.
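Parallelism in this model comes almost for free: the consumer keeps several CLI processes running at once, and each one releases all of its memory when it exits. Here is a minimal sketch with a thread pool, again in Python. The `process_batch` name and the default concurrency of 4 are hypothetical, and a real consumer would typically also cap the number of unacknowledged messages with the AMQP prefetch count (`basic.qos`).

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def run_once(command: Sequence[str], body: bytes) -> bool:
    # One short-lived process per message: interpreter start-up is paid every
    # time (the CLI overhead), but all memory is released when it exits.
    result = subprocess.run(
        list(command),
        input=body,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def process_batch(command: Sequence[str], bodies: List[bytes], concurrency: int = 4) -> List[bool]:
    """Process messages with at most `concurrency` child processes alive at once.

    Returns one ack/nack decision per message, in the original order.
    """
    # The threads only wait on child processes, so the GIL is not a bottleneck.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda body: run_once(command, body), bodies))


if __name__ == "__main__":
    # Stand-in for a PHP script: succeeds only for the payload "ok".
    script = "import sys; sys.exit(0 if sys.stdin.read() == 'ok' else 1)"
    print(process_batch([sys.executable, "-c", script], [b"ok", b"bad", b"ok"]))  # [True, False, True]
```

Unlike a pool of long-running PHP consumers, idle capacity here costs almost no memory: when the queue is empty, no PHP processes are running at all.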

Any other options?

After hitting a few walls with pure PHP consumers, I had to give up on the idea of long-running processes in PHP. At the time, everybody was telling me it was a bad idea and just not how PHP code runs. But, apparently, I had to prove it to myself 😅

PHP web applications typically have a web server that handles the network side and a FastCGI backend (PHP-FPM) that keeps a pool of PHP workers to process each request. So I figured, why not use the same approach for AMQP messages? PHP-FPM can process messages in an isolated and efficient way! All we need is an application that consumes messages from the AMQP server and makes a FastCGI request to PHP-FPM for each one. So I built amqp-cgi-bridge to do exactly that.
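To give an idea of what such a bridge has to send, here is a minimal sketch that encodes one FastCGI request per message, following the FastCGI 1.0 record format: 8-byte record headers, a BEGIN_REQUEST record, then PARAMS and STDIN streams, each terminated by an empty record. It is written in Python for brevity and is not amqp-cgi-bridge's code; sending the message body as the request body, the `application/json` content type, and the script path `/srv/app/consumer.php` are assumptions for illustration.

```python
import struct
from typing import Dict

FCGI_VERSION_1 = 1
FCGI_BEGIN_REQUEST = 1
FCGI_PARAMS = 4
FCGI_STDIN = 5
FCGI_RESPONDER = 1
MAX_CONTENT = 0xFFFF  # a record's contentLength is a 16-bit field


def record(rec_type: int, request_id: int, content: bytes = b"") -> bytes:
    """Encode one record: version, type, requestId, contentLength, padding, reserved."""
    header = struct.pack(">BBHHBB", FCGI_VERSION_1, rec_type, request_id, len(content), 0, 0)
    return header + content


def encode_length(n: int) -> bytes:
    # Lengths under 128 fit in one byte; longer ones use 4 bytes with the top bit set.
    return bytes([n]) if n < 128 else struct.pack(">I", n | 0x80000000)


def encode_params(params: Dict[str, str]) -> bytes:
    out = bytearray()
    for name, value in params.items():
        n, v = name.encode(), value.encode()
        out += encode_length(len(n)) + encode_length(len(v)) + n + v
    return bytes(out)


def stream(rec_type: int, request_id: int, data: bytes) -> bytes:
    # Split a stream into records of at most 65535 bytes, then end it with an empty record.
    chunks = [data[i:i + MAX_CONTENT] for i in range(0, len(data), MAX_CONTENT)]
    return b"".join(record(rec_type, request_id, c) for c in chunks) + record(rec_type, request_id)


def build_request(script: str, body: bytes, request_id: int = 1) -> bytes:
    # BEGIN_REQUEST body: role, flags (0 = close connection afterwards), 5 reserved bytes.
    begin = struct.pack(">HB5x", FCGI_RESPONDER, 0)
    params = encode_params({
        "SCRIPT_FILENAME": script,  # hypothetical path to the PHP message handler
        "REQUEST_METHOD": "POST",
        "CONTENT_TYPE": "application/json",
        "CONTENT_LENGTH": str(len(body)),
    })
    return (record(FCGI_BEGIN_REQUEST, request_id, begin)
            + stream(FCGI_PARAMS, request_id, params)
            + stream(FCGI_STDIN, request_id, body))


request_bytes = build_request("/srv/app/consumer.php", b'{"id": 1}')
```

A bridge would then write these bytes to PHP-FPM's socket (for example a TCP listener on 127.0.0.1:9000 or a Unix socket), read back the FCGI_STDOUT and FCGI_END_REQUEST records, and ack or reject the AMQP message based on the result. PHP-FPM keeps its workers warm between requests and recycles them on its own schedule, which is what removes the per-message start-up cost.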

I ran some benchmarks, but they did not show anything interesting for a small test script. Since message processing is done by PHP in every approach, the performance of the PHP script defines the performance of the whole system. As expected, rabbit-cli-consumer was a bit slower because of the CLI overhead, although with my small test script the difference was modest: throughput was only about 9% lower. Pure PHP and amqp-cgi-bridge performed pretty much the same.

Let me know what you think!

If you use some other approach to run AMQP consumers for PHP, feel free to share your experience. I would also really appreciate any feedback on amqp-cgi-bridge.