Gearman In Your Sockets

Creating reactive applications is hard. As a community, we PHP developers have been writing blocking code behind web servers like Apache and Nginx, for ages. We’re just not used to using the event-loop and reducing/removing blocking IO.

When we design our scripts to be volatile, we don’t have to think [as much] about high memory and CPU usage. Sure, we still need to optimise the odd bottleneck, but it’s nothing compared to running our own web server.

High memory consumption is one of the main arguments against building reactive applications. Yet other platforms, like Node, suffer the same problems. They’ve just had them for longer. The ecosystem is more mature and stable.

Fortunately, there are tools to help us reduce memory and CPU usage where it matters. We already have React nor Ratchet to give us non-blocking HTTP and socket operations. We can combine these with threaded/concurrent processing extensions.

You can find the code on Github. You can find the discussion on Reddit. You should already have PHP 5.4+ and Composer installed.

Trying Gearman

Gearman is a mature extension of deferring intensive processing to workers. These workers execute in parallel threads to the server/manager that spawns them.

This is quite like the HTML5 Web Worker specification. One script acts as a worker while another acts as a manager/client, creating new workers on demand.

As I’m working on a Macbook, I need to run this command to install Gearman:

→ brew install php56-gearman

This installs a PHP extension, and the Gearman daemon as a dependency.

Installing extensions and daemons can be tricky. I don’t have a Linux or Windows PC to test these instructions on, so you’ll have to do a bit of research.

Before we look at any Gearman code, let’s think about a situation in which it would be useful. Imagine we wanted to ping a remote server, once a second, for 5 seconds. We might do something like:

print "starting\n";

for ($i = 1; $i < 6; $i++) {
print "ping {$i}\n";

// ping the server...

sleep(1);
}

print "complete\n";

This code prints ping x every second. There’s an artificial delay of 1 second, and it’s not even pinging an actual server. This process is blocking because nothing can happen while the sleep() function is running. Imagine how much worse it would be if the ping operation took longer than a second.

This is the simplest example I could think of, to illustrate a blocking task. If you prefer, think of some blocking file IO or image manipulation. The point is we are going to move intensive, blocking tasks away from the HTTP/socket layer.

So let’s move this processing outside of the main script. We need to split the code into 2 files; gearman-client.php and gearman-worker.php. First the client:

$client = new GearmanClient();
$client->addServer();


$client->setDataCallback(
function (GearmanTask $task) {
print "data: {$task->data()}\n";
}
);


print "sending\n";

$client->addTask("ping", "noop");
$client->runTasks();

We begin by creating a GearmanClient. This connects to the local Gearman service, thanks to the addServer() method. We attach a data callback, since we need a way for the worker to communicate with the server.

Then we add a ping task and call the runTasks() method. This sends the task to any running Gearman workers. The worker is slightly more complicated:

$worker = new GearmanWorker();
$worker->addServer();


$worker->addFunction(
"ping",
function (GearmanJob $job) {

print "received: {$job->handle()}\n";

for ($x = 1; $x < 6; $x++) {
print "sending data: ping {$x}\n";

$job->sendData("ping {$x}");
sleep(1);
}

print "waiting\n";
}
);


print "waiting\n";

while ($worker->work());

We create a new GearmanWorker and add the local server, just like we did for the client. Then we add a ping function, which receives a job. The callback is like the blocking code we had before, except that we also call the sendData() method. This triggers the data callback we added in gearman-client.php.

We need to run both of these scripts, at the same time:

tab 1 → php gearman-worker.php
tab 2 → php gearman-client.php

You should see the these messages, in tab 1:

waiting
received: H:[network id]:[task id]
sending data: ping 1
sending data: ping 2
sending data: ping 3
sending data: ping 4
sending data: ping 5
waiting

…and these messages, in tab 2:

sending
data: ping 1
data: ping 2
data: ping 3
data: ping 4
data: ping 5
If you’re getting errors, at this point, it might be that the Gearman daemon isn’t running. For example, I had to start it up by running gearmand in another tab.

Since we’re using a data callback, this process could be asynchronous. If we print something extra, we’ll see a problem:

print "sending\n";

$client->addTask("ping", "noop");
$client->runTasks();

print "before the 'received' message\n";

The gearman-client.php output will be:

sending
data: ping 1
data: ping 2
data: ping 3
data: ping 4
data: ping 5
before the 'received' message

Callback or not, this process is blocking. Gearman does background tasks, but it’s harder to track their progress. We need to introduce another (albeit temporary) loop:

print "sending\n";

$job = $client->doBackground("ping", "noop");

print "before the 'received' message\n";

do {
sleep(1);


print "checking\n";

$status = $client->jobStatus($job);

if ($status[0] === false) {
break;
}
} while (true);

This time gearman-client.php will output:

sending
before the 'received' message
checking
checking
checking
checking
checking
checking

Ok, so this is now kind of asynchronous, but still blocking. We’ll remove the loop later. First we need to find a way to fix/replace the data callback…

Sending Data

Turns out it’s not possible (eek!) to send data from a background worker task to the client that initiated that task. Fortunately we already know an elegant way to communicate with these background tasks: sockets.

Let’s recap some basic structure, beginning with creating a server.php file:

require "vendor/autoload.php";

use Formativ\Server;
use Ratchet\Http\HttpServer;
use Ratchet\Server\IoServer;
use Ratchet\WebSocket\WsServer;

$server = IoServer::factory(
new HttpServer(
new WsServer(
new Server()
)
),
8080,
"127.0.0.1"
);

$server->run();

If this is unfamiliar to you, check out the previous socket post, where I explain how it all works. We also need a stripped-down version of the server:

namespace Formativ;

use Exception;
use Ratchet\ConnectionInterface as Connection;
use Ratchet\MessageComponentInterface;
use SplObjectStorage;

class Server implements MessageComponentInterface
{
/**
* @var SplObjectStorage
*/
protected $connections;

public function __construct()
{
$this->connections = new SplObjectStorage();
}

/**
* @param Connection $connection
*/
public function onOpen(Connection $connection)
{
$this->connections->attach($connection);
}

/**
* @param Connection $connection
* @param string $message
*/
public function onMessage(Connection $connection, $message)
{
// TODO
}

/**
* @param Connection $connection
*/
public function onClose(Connection $connection)
{
$this->connections->detach($connection);
}

/**
* @param Connection $connection
* @param Exception $exception
*/
public function onError(
Connection $connection,
Exception $exception
)
{
$connection->close();
}
}

If we want to use sockets, for communicating with the workers, we need to store their status and/or completion values:

/**
* @var array
*/
protected $values = [];

/**
* @param Connection $connection
* @param string $message
*/
public function onMessage(Connection $connection, $message)
{
$message = json_decode($message, true);

if ($message["type"] === "worker.complete") {
$this->values[$message["id"]] = $message["value"];
$connection->send("got it!");
}

}

We can simulate the worker, with the following console JS:

var socket = new WebSocket("ws://127.0.0.1:8080");
socket.addEventListener("message", function(e) {
console.log(e.data)
});
socket.send(JSON.stringify({
type: "worker.complete",
id: 1,
value: "foo"
}));
// results in "got it!"

Now we need to plug the gearman-client.php code into the server. First we need to hold onto the client instance:

/**
* @var SplObjectStorage
*/
protected $connections;

/**
* @var GearmanClient
*/
protected $client;

public function __construct()
{
$this->connections = new SplObjectStorage();

$this->client = new GearmanClient();
$this->client->addServer();

}

Then we need to add a new message type:

/**
* @param Connection $connection
* @param string $message
*/
public function onMessage(Connection $connection, $message)
{
$message = json_decode($message, true);

if ($message["type"] === "ping") {
$job = $this->client->doBackground("ping", "noop");

$connection->send("before the ‘received’ message\n");

do {
sleep(1);

$connection->send("checking\n");

$status = $this->client->jobStatus($job);

if ($status[0] === false) {
break;
}
} while (true);

}

if ($message["type"] === "worker.complete") {
$this->values[$message["id"]] = $message["value"];
$connection->send("got it!");
}
}

You’ll immediately notice that this just as blocking as it was before. If you connect to the socket (from console) and send a ping message type; you’ll wait 5–6 seconds for any feedback. That’s because the sending is blocking for all that time.

We’re also not seeing what the final value is. That is until we change the worker to communicate the final value, through a socket connection. We’ll use a custom web socket client:

{
"require": {
"cboden/ratchet": "0.*",
"textalk/websocket": "1.*"
},
"autoload": {
"psr-4": {
"Formativ\\": "src"
}
}
}
There is a Ratchet web socket client, called Pawl, but the one I’ve used here requires slightly less code. It’s also blocking, where Pawl is not. That’s ok in the workers though — they’re meant to do blocking stuff!

Then we need to change gearman-worker.php to send completion messages to the socket server:

use WebSocket\Client;

$client = new Client("ws://127.0.0.1:8080");

$worker->addFunction(
"ping",
function (GearmanJob $job) use ($client) {
print "received: {$job->handle()}\n";

for ($x = 1; $x < 6; $x++) {
print "ping {$x}\n";
sleep(1);
}

$client->send(
json_encode(
[
"type" => "worker.complete",
"id" => $job->handle(),
"value" => "done"
]
)
);

print "response: {$client->receive()}\n";


print "waiting\n";
}
);
We’re not returning anything useful from the worker, but the point is that we could if we wanted to!

The worker script should now output:

waiting
received: H:[network id]:[task id]
ping 1
ping 2
ping 3
ping 4
ping 5
response: got it!
waiting

The worker is running the loop, connecting to the web socket to send a completion value. It’s also returning the confirmation message. Next, we need to remove the blocking loop from the onMessage() method. We need to store the event loop:

/**
* @var LoopInterface
*/
protected $loop;

/**
* @param LoopInterface $loop
*/
public function setLoop(LoopInterface $loop)
{
$this->loop = $loop;
}

We’ll need to provide this, back in server.php:

$custom = new Server();

$server = IoServer::factory(
$http = new HttpServer(
new WsServer($custom)
),
8080,
"127.0.0.1"
);

$custom->setLoop($server->loop);

$server->run();

Then we can add timers, which are like the setTimeout() function from JS:

/**
* @param Connection $connection
* @param string $message
*/
public function onMessage(Connection $connection, $message)
{
$message = json_decode($message, true);

if ($message["type"] === "ping") {
$job = $this->client->doBackground("ping", "noop");

$connection->send("before the 'received' message");

$timer = $this->loop->addPeriodicTimer(
0.1,
function () use ($connection, $job, &$timer) {
$status = $this->client->jobStatus($job);

if ($status[0] === false) {
$value = $this->values[$job];

$connection->send("job: {$job}");
$connection->send("value: {$value}");

$timer->cancel();
}
}
);

}

if ($message["type"] === "worker.complete") {
$this->values[$message["id"]] = $message["value"];
$connection->send("got it!");
}
}

Restart all the things, and send another ping message (from the console). You should see before the ‘received’ message. Then nothing for 5 seconds. Then job: H:[network id]:[task id] followed by value: done. How exciting is that?!

Wrapping Up

There’s a lot more we can do to streamline this process. We could abstract much of the worker logic, so creating new worker classes required less boilerplate. We could abstract the timer logic so deferring tasks to Gearman would require less boilerplate. But I think we’ll leave it there, for now.

If you’re stuck with any of this, send me a tweet or make a Github issue. I’ll be glad to help!

Chris Boden also pointed me to React-Gearman — check it out if this stuff interests you as much as it does me!