Monitor your application using Pinba

At Dailymotion, we use a lot of monitoring tools to watch, analyse and optimize our applications. We use statsd, graphite, collectd and Datadog, but also a lesser-known tool: Pinba (which stands for PHP Is Not A Bottleneck Anymore).

We’re currently working on the migration of an old monolithic PHP application with a bunch of legacy features, to a modern scalable set of services. For this, we’re using Pinba to gather technical metrics about:

  • our current PHP application: it’s helping us for maintenance and rollouts (detecting releases that adversely affect performance) and also to find and understand bottlenecks. It allows us to establish which parts of the framework will give the greatest performance improvement by an early migration. In other words, identifying low-hanging fruit.
  • our new microservices, using different clients according to the language we used to develop the service (Python and Go for the time being).

What is Pinba?

According to its creator tony2001, Pinba is a statistics server using MySQL as an interface. It accumulates and processes data sent over UDP by multiple PHP processes and displays statistics in a nice human-readable form of simple “reports”, also providing a read-only interface to the raw data in order to make possible generation of more sophisticated reports.

How does it work?

The Pinba PHP extension is deployed on all the servers in our web farm. For each PHP thread, it collects technical metrics, and sends these metrics to a central Pinba server over UDP after the response generation is complete (using the Google Protobuf data interchange format). On the central server, the Pinba engine aggregates the contents of these messages, and bulk inserts them periodically into a read-only MySQL database, providing a simple and well-known exploration tool: a MySQL client. Simple and efficient.

Pinba is neither, as is often heard, a debugging tool nor a graphing tool. It is simply an exploration tool, providing a picture of what’s going on in your application on production, in real time, for the last minute or two. This is very powerful:

  • it provides live metrics in production with no impact on performance,
  • it’s a good tool to detect bottlenecks,
  • if you use another tool to consolidate Pinba metrics, you can build some very useful time series graphs.

Pinba is not a magical tool that allows you to do anything with it. A typical mistake is to use it to gather exact data (such as a counter: how many time did an event occur in the last minute). Pinba isn’t designed to perform this kind of task (as we saw before, protobuf messages are sent via UDP for performance reasons, so it’s not impossible for some packets to be lost in transit). Use it to observe tendencies, not to measure actions with precision.

Our architecture

What metrics can you get from it?

Out of the box, the technical metrics gathered in all PHP processes are already very extensive, including:

  • doc_size: response size
  • mem_peak_usage: memory allocation (maximum)
  • request_time: time spent to generate the process (microsend resolution)
  • ru_utime/ru_stime: statistics about resource usage on the server (user and system)
  • status: HTTP response code
  • memory_footprint: size of the process

The engine stores these metrics per process in the request table, so in this table we have one row for every PHP response produced during last pinba_stats_history seconds (60 seconds in our case).

mysql> SELECT script_name, doc_size, mem_peak_usage, status, memory_footprint, hostname FROM request LIMIT 10;

As the request table exposes metrics for every PHP process, access to the raw data can be pretty cumbersome if you have a lot of traffic:

mysql> SELECT COUNT(1) from request;

In other words, 1197502 php responses were generated during a given sixty seconds on our entire web farm.

Fortunately, the engine aggregates on the fly, storing consolidated metrics in report_* tables simultaneously. For instance, metrics grouped by script_name are available in report_by_script_name table, so we can explore metrics for a specific script_name. For instance, if we want statistics about the ten most frequently called routes:

mysql> SELECT script_name, req_count, req_per_sec, memory_footprint_total, req_time_median FROM report_by_script_name ORDER BY req_count DESC LIMIT 10;

Here we can calculate the mean memory footprint, just fetching memory_footprint_total/req_count AS avg_mem_footprint

Out of the box, the engine consolidates metrics in a number of other report tables, which you can explore easily:

  • report_by_server_name
  • report_by_hostname
  • report_by_server_and_script
  • report_by_hostname_and_server
  • report_by_hostname_server_and_script

You can drill down merely by using a table that consolidates data across more dimensions, for instance most used scripts per host and server:

mysql> SELECT hostname, server_name, script_name, req_count, req_per_sec, req_time_median FROM report_by_hostname_server_and_script ORDER BY req_count DESC LIMIT 2;

Collecting your own metrics

As shown, Pinba provides a lot of useful data out of the box, but you can go further, really! Besides report tables, you can use tag_report_* tables. Metrics are consolidated in these tables according to the values of tags you define in your codebase.

For instance, in the old Dailymotion codebase, we have two kinds of tags :

  • I/O: all external access, blocking the execution of the process (principly access to back-ends like MySQL, Elastic, Redis, memcached, … or external APIs called server to server such as Facebook or Twitter).
  • algo: algorithms we want to monitor

As an example, this is how we tag Memcache:

Client code

$pinbaTimer = $this->pinbaService->startTimer([ 'group' => 'memcached', 'memcached' => $this->pinba_pool, 'method' => 'get', 'namespace' => 'dm_cache' ]); $value = $this->memcache->get($this->prefix . $key); $this->pinbaService->stopTimer($pinbaTimer);

Pinba service

public function startTimer(array $tags) { if (!$this->isPinbaInstalled) { return null; } $timerId = 't' . ++$this->incr; $this->timers[$timerId] = pinba_timer_start($tags); return $timerId; } public function stopTimer($timerId) { if ($this->isPinbaInstalled && isset($this->timers[$timerId])) { pinba_timer_stop($this->timers[$timerId]); unset($this->timers[$timerId]); } }

We then define custom tables in the Pinba database, telling to the engine what to store in it using table comments.

For instance, the table to store metrics about memcached usage is as follows:

CREATE TABLE `tag_report_memcached_method_namespace` ( `script_name` varchar(128) DEFAULT NULL, `memcached` varchar(64) DEFAULT NULL, `method` varchar(64) DEFAULT NULL, `namespace` varchar(64) DEFAULT NULL, `req_count` int(11) DEFAULT NULL, `req_per_sec` float DEFAULT NULL, `hit_count` int(11) DEFAULT NULL, `hit_per_sec` float DEFAULT NULL, `timer_value` float DEFAULT NULL, `timer_median` float DEFAULT NULL, `index_value` varchar(256) DEFAULT NULL ) ENGINE=PINBA DEFAULT CHARSET=latin1 COMMENT='tagN_report:memcached,method,namespace';

And a table to store metrics about application I/O:

CREATE TABLE `tag_report_group_method` ( `script_name` varchar(128) DEFAULT NULL, `group` varchar(64) DEFAULT NULL, `method` varchar(64) DEFAULT NULL, `req_count` int(11) DEFAULT NULL, `req_per_sec` float DEFAULT NULL, `hit_count` int(11) DEFAULT NULL, `hit_per_sec` float DEFAULT NULL, `timer_value` float DEFAULT NULL, `timer_median` float DEFAULT NULL, `index_value` varchar(256) DEFAULT NULL ) ENGINE=PINBA DEFAULT CHARSET=latin1 COMMENT='tag2_report:group,method';

Note to self: next time, don’t use SQL reserved words for column names, like group...

Using these tables, we can obtain fine-grained statistics. For instance, statistics about all I/O activity on a specific route:

mysql> SELECT `group`, SUM(req_count) AS req, AVG(timer_median) AS timer_median, SUM(timer_value)/SUM(hit_count) AS tph, SUM(timer_value)/SUM(req_count) AS tpr, SUM(timer_value) AS timer_total FROM tag_report_group_method WHERE script_name='[PROD]video_item' AND `group` != "int_api" GROUP BY `group` ORDER BY timer_total DESC;

Specific details about MySQL SELECTs :

mysql> SELECT mysql, method, timer_value, timer_value/req_count AS avg_timer_value, hit_count/req_count AS avg_op_count, timer_value/hit_count AS avg_op_value, hit_count, req_count FROM tag_report_mysql_method WHERE method='select' AND script_name='[PROD]video_item' AND req_count > 200 ORDER BY avg_op_count DESC LIMIT 4;

We keep an eye on these types of queries: an unexpectedly high number of selects probably indicates insufficient caching.

Going further

We’re using another great feature provided by Pinba: request tags. You use them to differentiate requests of different types. The request is flagged with values of the tags you defined. For instance we use request tag to flag queries for:

  • format: response format (html, rss, atom, ...)
  • site_content: localization of the site content (en, de, fr, jp, ru, ...)
  • bot: is the client a bot? (yes/no)
  • auth: is the user authenticated (logged in)? (yes/no)
  • provider: used to see variations between different php versions: native PHP versus HHVM, ...

We then create tables, setting comment directives for the engine, so it knows what to store where. For instance, there is a table based on report_by_script_name, where the engine will store data only for PHP processes with value 'no' for request tag 'bot', and 'no' for request tag 'auth' (in other words, unlogged real users - the usual kind):

CREATE TABLE `auth_no_bot_no_report_by_script_name` ( `req_count` int(11) DEFAULT NULL, `req_per_sec` float DEFAULT NULL, ... `req_time_median` float DEFAULT NULL, `index_value` varchar(256) DEFAULT NULL ) ENGINE=PINBA DEFAULT CHARSET=latin1 COMMENT='report1::tag.auth=no,tag.bot=no';

Building graphs

If you want to build graphs from data gathered in Pinba, you must to consolidate it outside (as you store metrics for a minute or two before discarding them). For this, we use collectd coupled to a Mysql plugin. It runs queries against the Pinba database, and stores results in Whisper, and then we query Whisper using Graphite and Tessera or Grafana.

Conclusion

Pinba usage in our old PHP application allowed us to:

  • detect bottlenecks in the application
  • find bugs (excessive backend usage in some routes, …)
  • build graphs and plug alerting to detect regressions after a release

We’re also using Pinba in our Python projects, using Pynba, written by our old friend (and now colleague) johnnoone.

One thing we have noticed with our installation is that the housekeeping thread is not always perfect at evacuating old data: sometimes report tables contain phantom results of activity that took place long before the previous minute. This can be a problem when you’re using Pinba to determine that something is not happening (usually after landing new code in production). This is one of the few cases where, for once, the bug is not in your code.

We have found that the simplest way to fix this problem is to restart the Pinba engine. This produces a brief glitch in time series graphs based on Pinba, but effectively cleans out phantom data. (Note: you can’t use TRUNCATE TABLE, because it's not really a Mysql engine behind the curtain).

Essential Pinba Resources


Originally published at engineering.dailymotion.com on November 16, 2015.