Formats
Let’s have a tabs vs. spaces discussion. I’m kidding. But now that Phil has left, I want to look at different configuration file formats. Specifically JSON, INI and raw PHP arrays.
Focus
Before we deal with specifics, it’s important that you understand why I’ve chosen these formats to focus on. I was thinking about how to improve the configuration system we use in SilverStripe, and it already uses YAML.
The trouble is YAML configuration depends on non-native parsers. PHP supports JSON, INI and PHP files out of the box. It’s not too useful to compare the performance of native code to that of third-party implementations. It would be useful to compare different YAML parsers, but that’s a story for another time.
Performance
As we’ll see, performance differences between these are a non-issue. Given the following fixtures:
[APP]
ENV = "local"
DEBUG = "true"
LOCALE = "en"
[DB]
CONNECTION = "mysql"
HOST = "localhost"
DATABASE = "homestead"
USERNAME = "homestead"
PASSWORD = "secret"
[CACHE]
DRIVER = "memcached"
[SESSION]
DRIVER = "memcached"
[QUEUE]
DRIVER = "database"

{
    "APP" : {
        "ENV" : "local",
        "DEBUG" : "true",
        "LOCALE" : "en"
    },
    "DB" : {
        "CONNECTION" : "mysql",
        "HOST" : "localhost",
        "DATABASE" : "homestead",
        "USERNAME" : "homestead",
        "PASSWORD" : "secret"
    },
    "CACHE" : {
        "DRIVER" : "memcached"
    },
    "SESSION" : {
        "DRIVER" : "memcached"
    },
    "QUEUE" : {
        "DRIVER" : "memcached"
    }
}

<?php
return [
    "APP" => [
        "ENV" => "local",
        "DEBUG" => "true",
        "LOCALE" => "en",
    ],
    "DB" => [
        "CONNECTION" => "mysql",
        "HOST" => "localhost",
        "DATABASE" => "homestead",
        "USERNAME" => "homestead",
        "PASSWORD" => "secret",
    ],
    "CACHE" => [
        "DRIVER" => "memcached",
    ],
    "SESSION" => [
        "DRIVER" => "memcached",
    ],
    "QUEUE" => [
        "DRIVER" => "memcached",
    ],
];
…and the following benchmarks:
$start = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    $content = parse_ini_file(
        __DIR__ . "/fixture.ini", true
    );
}
$end = microtime(true);

$start = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    $content = json_decode(
        file_get_contents(__DIR__ . "/fixture.json"), true
    );
}
$end = microtime(true);

$start = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    $content = include(
        __DIR__ . "/fixture.php"
    );
}
$end = microtime(true);
…which do you think performed the best?
Before I made this comparison, I believed including PHP files would be the most efficient. It makes a certain kind of sense. The PHP interpreter is loading other PHP files. The interpreter is already warm. There shouldn’t be much work to do, when compared with other formats.
I imagined JSON and INI files would take roughly the same amount of time to load. Perhaps JSON would take longer, because it supports different types and deeply-nested structures.
What I now think I know is that PHP and INI files take roughly the same amount of time to load and parse 100,000 times: around 2.4–2.6 seconds. JSON takes around 1.8–1.9 seconds for the same 100,000 iterations.
What JSON gains in speed, it loses in memory consumption. PHP and INI benchmarks use around 256 KB (per 100,000 iterations), while JSON uses around 512 KB.
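The loops above record start and end times, but they don't show how the numbers are reported or how memory is measured. Here is a minimal sketch of one way to collect both. The benchmark() helper is hypothetical, and reading memory with memory_get_peak_usage() is my assumption, not necessarily how the figures above were produced.

```php
<?php
// Hypothetical helper: runs $loader $iterations times and reports the
// elapsed wall-clock time plus the growth in peak memory usage.
function benchmark(callable $loader, int $iterations = 100000): array
{
    $memoryBefore = memory_get_peak_usage();
    $start = microtime(true);

    for ($i = 0; $i < $iterations; $i++) {
        $content = $loader();
    }

    $end = microtime(true);

    return [
        "seconds" => $end - $start,
        "peak_bytes" => memory_get_peak_usage() - $memoryBefore,
    ];
}

// Usage, assuming fixture.json sits next to this script:
// $result = benchmark(function () {
//     return json_decode(file_get_contents(__DIR__ . "/fixture.json"), true);
// });
// printf("%.2fs, %d bytes\n", $result["seconds"], $result["peak_bytes"]);
```

Running each format through the same helper keeps the measurement code identical, so any difference comes from the loader itself.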
Features
INI files are the easiest for non-developers to use. They’re also the most fault-tolerant. Syntax error on line 3? Ignore line 3.
On the other hand, INI files limit the kinds of data types you can use. By default, every value comes back as a string. JSON files are slightly better because you can use numbers, strings, booleans and null, along with nested arrays and objects.
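To make the type difference concrete, here is a small sketch comparing what the two parsers hand back. The keys and values are made up for illustration. It also shows INI_SCANNER_TYPED, a scanner mode in newer PHP versions (5.6.1 and later, as far as I know) that softens the limitation for unquoted values.

```php
<?php
// Unquoted values, so the scanner is free to interpret them.
$ini = "DEBUG = true\nPORT = 3306";

// Default mode: values come back as strings ("true" becomes "1").
$normal = parse_ini_string($ini, false, INI_SCANNER_NORMAL);
var_dump($normal["DEBUG"]); // string "1"
var_dump($normal["PORT"]);  // string "3306"

// Typed mode: booleans and integers are preserved where possible.
$typed = parse_ini_string($ini, false, INI_SCANNER_TYPED);
var_dump($typed["DEBUG"]);  // bool true
var_dump($typed["PORT"]);   // int 3306

// JSON carries booleans, numbers and null natively.
$json = json_decode('{"DEBUG": true, "PORT": 3306, "SOCKET": null}', true);
var_dump($json["DEBUG"]);   // bool true
var_dump($json["PORT"]);    // int 3306
var_dump($json["SOCKET"]);  // NULL
```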
PHP is the most versatile in this area. With the right imports, you can use complex data types right in your configuration files. Other, smarter people have come to this conclusion already. Sometimes PHP is the best way to configure PHP…
Validation
It’s good to know when your configuration files are invalid, and how this will affect the environment. json_decode and parse_ini_file return null and false, respectively, when they encounter invalid data.
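Those return values are easy to wrap. The sketch below uses two hypothetical helpers, parseJsonConfig() and parseIniConfig(), which work on strings rather than files to keep the example short. Both return null for invalid input, so callers only have one failure value to check.

```php
<?php
// Hypothetical helper: returns the decoded array, or null if the JSON is invalid.
function parseJsonConfig(string $json): ?array
{
    $data = json_decode($json, true);

    // json_decode() also returns null for the valid document "null",
    // so check the error code as well as the type.
    if (json_last_error() !== JSON_ERROR_NONE || !is_array($data)) {
        return null;
    }

    return $data;
}

// Hypothetical helper: returns the parsed sections, or null on a syntax error.
function parseIniConfig(string $ini): ?array
{
    // parse_ini_string() raises a warning and returns false on a syntax
    // error; @ silences the warning so the caller can handle the failure.
    $data = @parse_ini_string($ini, true);

    return $data === false ? null : $data;
}

// Usage:
// parseJsonConfig('{"APP": {"ENV": "local"}}'); // array
// parseJsonConfig('{"APP": ');                  // null
// parseIniConfig("[APP]\nENV = (local");        // null
```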
Include a faulty PHP file and your environment blows up. I thought about this and came up with the following validation function:
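function valid($file) {
    // A missing file can never be valid.
    if (file_exists($file)) {
        // Escape the path so it is safe to pass to the shell.
        $escaped = escapeshellarg($file);
        // Lint the file with the running PHP binary, discarding error output.
        $command = PHP_BINARY . " -l {$escaped} 2> /dev/null";
        ob_start();
        exec($command, $result);
        ob_end_clean();
        // The linter prints its verdict on the first line of output.
        if (count($result) > 0) {
            if (stripos($result[0], "No syntax errors") === 0) {
                return true;
            }
        }
    }
    return false;
}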
This function feels horrible, but it’s fairly solid. We check if a PHP file exists. If so, we construct a shell command. This command runs the file through the PHP linter, in the same way as we could run:
$ php -l file.php 2> /dev/null
The bit following the file name redirects error output to /dev/null. The command checks if file.php contains any syntax errors. If the file is valid, the linter prints the message No syntax errors detected in file.php.
We check for that message, and if we see it we return true. For anything else we return false.
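For comparison, here is a different technique, not the one described above. On PHP 7 and later, a syntax error in an included file is thrown as a ParseError, which can be caught. The loadPhpConfig() helper is hypothetical, and the sketch assumes a PHP 7+ runtime. Unlike the linter, it executes the file, and other fatal errors inside the file would still bring the environment down.

```php
<?php
// Hypothetical alternative to valid(): include the file directly and
// treat a ParseError (thrown by PHP 7+) as "invalid configuration".
function loadPhpConfig(string $file): ?array
{
    if (!is_file($file)) {
        return null;
    }

    try {
        $config = include $file;
    } catch (ParseError $error) {
        // The file contains a syntax error.
        return null;
    }

    // A config file is expected to return an array.
    return is_array($config) ? $config : null;
}

// Usage:
// $config = loadPhpConfig(__DIR__ . "/fixture.php");
// if ($config === null) { /* fall back or report the problem */ }
```

The linter approach costs a process spawn per file but never runs the code. This one is cheaper but only suits files you already trust enough to execute.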
Implications
How can we use these things to our benefit? SilverStripe has a set of build tools. One of the things these tools do is to convert these YAML files into intermediate PHP files, after some processing and filtering. A cache of sorts. When code looks up configuration variables, we use these intermediate files.
Even if we used one of the “native” formats, in place of YAML, we’d probably still have a build step. There are some inheritance and environmental features to account for…
So perhaps we could make things load faster if we used JSON as the intermediate format. That would reduce load time but increase memory usage, though both differences are negligible after the first database query.
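To show what swapping the intermediate format might involve, here is a rough sketch of writing and reading a cache file in both formats. All four function names are hypothetical. This is not how SilverStripe's build tools work; it only illustrates that the change is small.

```php
<?php
// Hypothetical build-step helpers: write processed configuration to an
// intermediate file in either format, and read it back.
function writeJsonCache(array $config, string $path): void
{
    file_put_contents($path, json_encode($config));
}

function writePhpCache(array $config, string $path): void
{
    // var_export() produces valid PHP source for arrays of scalars.
    $source = "<?php\nreturn " . var_export($config, true) . ";\n";
    file_put_contents($path, $source);
}

function readJsonCache(string $path): array
{
    return json_decode(file_get_contents($path), true);
}

function readPhpCache(string $path): array
{
    return include $path;
}

// Usage, with $config being the array produced by the build step:
// writeJsonCache($config, __DIR__ . "/config.cache.json");
// $config = readJsonCache(__DIR__ . "/config.cache.json");
```

Keeping the readers and writers behind matching functions would make it easy to benchmark both formats against real configuration before committing to one.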
Perhaps we could increase the complexity of configuration option types by using JSON or PHP. Perhaps we could reduce the complexity of configuration by using PHP files exclusively.
If anything, we can learn the truth behind beliefs like “PHP files would be faster than JSON or INI files”, and that small amounts of thought, testing and measurement are crucial to effecting real change.