Improving WordPress with Static Analysis

When PHP 5.2.4 was released in August 2007, the world was reading the last instalment of the Harry Potter series, watching Transformers in theatres, and following a junior senator from Illinois as he embarked on his outsider bid for the US presidency.

Now it’s 2018. Harry Potter’s son is ending his first year at Hogwarts, the Transformers have saved the world four more times, and that junior senator is now enjoying his third term as President of the United States. [Editor’s note: unable to verify]

But most importantly, the current version of WordPress, the world’s most popular PHP application, still runs perfectly well on PHP 5.2.4.


It’s not necessarily a weakness to have your code work on an 11-year-old version of PHP, but it is a reflection of the WordPress ecosystem’s incredibly conservative approach to change, and its commitment to run under the harshest of circumstances.

In a sense, WordPress is the quintessential PHP app: it gets the job done, and seldom makes backwards-incompatible API changes, while remaining easy to install and extend. But to many modern PHP developers, WordPress is a reminder of how you used to have to write PHP, and how you thankfully don’t anymore.

PHP the ancient way

The main headache of early PHP codebases is this: any time you used a symbol (a function, class or constant) in a file, you’d first have to either define it or include an explicit reference to the file that defined it. So if you wanted to use a class ArticleLoader defined in classes/ArticleLoader.php and a class Formatter defined in classes/Formatter.php, you’d have to include those files explicitly:

<?php
require_once('classes/ArticleLoader.php');
require_once('classes/Formatter.php');
?>
<title>
<?php
echo Formatter::escape(
    ArticleLoader::get($_GET['id'])->title
);
?>
</title>

If your file used n classes, and each class was defined in its own file, that meant writing n require_once statements (usually at the top of the file).

Because this quickly got unwieldy, people would create a single file with many unrelated functions and classes, or create proxy files containing nothing more than multiple require_once expressions (which would then be required by the initial files). In a sufficiently large app, it became almost impossible to know which functions and classes were available in which files, leading to lots of unnecessary require calls.

PHP the modern way

Those restrictions on loading classes disappeared when PHP 5.1 came out: it introduced a new function, spl_autoload_register, that lets developers register code which PHP runs whenever a class is used before it has been defined, so that code can load the right file on demand.
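To make that concrete, here is a minimal sketch of an autoloader. The temporary directory and the body of Formatter are assumptions made up for the demo, and the closure syntax needs PHP 5.3 or later, so this is an illustration of the idea rather than code that would have run on PHP 5.1.

```php
<?php
// Hypothetical demo: write a class file into a temp directory, then let an
// autoloader find it on first use instead of calling require_once by hand.
$dir = sys_get_temp_dir() . '/autoload_demo_' . uniqid();
mkdir($dir);
file_put_contents(
    $dir . '/Formatter.php',
    '<?php class Formatter { public static function escape($s) { return htmlspecialchars($s); } }'
);

// PHP calls this function with the class name whenever an unknown class is used.
spl_autoload_register(function ($class) use ($dir) {
    $file = $dir . '/' . $class . '.php';
    if (is_file($file)) {
        require $file;
    }
});

// No explicit require for Formatter.php: the autoloader loads it here.
echo Formatter::escape('<b>Hello</b>'), "\n"; // &lt;b&gt;Hello&lt;/b&gt;
```

The calling code never mentions a file path; only the autoloader knows where classes live.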

A couple of developers built a fantastic dependency management system on top of that new functionality, and named it Composer. Composer is now the default dependency management system used by the PHP ecosystem, and it’s relieved countless headaches.

Rewritten with Composer in mind, the example above would look something like this:

<?php require('vendor/autoload.php'); // path to Composer ?>
<title>
<?php
echo Formatter::escape(
    Acme\Article\Loader::get($_GET['id'])->title
);
?>
</title>

Behind the scenes, Composer uses a config file called composer.json to figure out where the file that defines the class Acme\Article\Loader lives (customarily somewhere like src/Acme/Article/Loader.php).

You only have to think about the location of that single vendor/autoload.php file — Composer takes care of everything else.
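The mapping from class name to file path can be sketched in a few lines. The helper below, class_to_path, is a hypothetical name invented for illustration, and it assumes every namespace maps directly onto a directory under src. Composer's real autoloader is driven by composer.json and is considerably more involved.

```php
<?php
// Hypothetical helper (not part of Composer): map a fully qualified class
// name to a file path under a base directory.
function class_to_path($class, $baseDir = 'src')
{
    // Namespace separators become directory separators.
    return $baseDir . '/' . str_replace('\\', '/', ltrim($class, '\\')) . '.php';
}

echo class_to_path('Acme\\Article\\Loader'), "\n"; // src/Acme/Article/Loader.php
```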

It’s much easier to type-check PHP code that uses autoloaders. Consequently, a number of tools have sprung up to do just that.

PHP the WordPress way

WordPress has always been designed to work on old versions of PHP. Though it currently supports PHP 5.2.4 at a minimum, much of its core architecture was built to support PHP 4, and WordPress’s dedication to backwards-compatibility ensures that a majority of that original architecture is still in use today.

WordPress also harks back to a time when PHP was primarily a templating engine: people could easily mix static HTML with server-rendered content, whereas more recently created PHP content management systems typically implement their own rendering engines. By doing things the old-fashioned way, WordPress makes it really easy for non-experts to create templates, and that’s a big reason for its continued success.

Between the old-style templating and the support for legacy versions of PHP, WordPress is tremendously hard to analyse. Looking at just the default installation, the main entry point (index.php) can load in over 300 different files via require. Few of those files can be examined individually.

To see why, here are three (perfectly valid) PHP files:

file1.php

<?php
require('file3.php');
require('file2.php');

file2.php

<?php
echo get_title();

file3.php

<?php
function get_title() { ... }

The contents of file2.php can only be understood in the context of their inclusion next to file3.php; otherwise we have a call to a function get_title() that we cannot resolve. So to understand this code, we need a system that’s capable of analysing function calls in the context of some root file.
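As a toy illustration of why context matters (this is not how Psalm works internally), the sketch below scans PHP source for function declarations and checks whether a call can be resolved. The helper names declared_functions and can_resolve are invented here, and the body given to get_title() is a placeholder so the snippet can run.

```php
<?php
// Collect the names of functions declared in a piece of PHP source.
// Toy version: it also picks up method names, which is fine for this demo.
function declared_functions($source)
{
    $names = array();
    $tokens = token_get_all($source);
    foreach ($tokens as $i => $token) {
        if (!is_array($token) || $token[0] !== T_FUNCTION) {
            continue;
        }
        // Look past any whitespace for the function's name.
        for ($j = $i + 1; isset($tokens[$j]); $j++) {
            $next = $tokens[$j];
            if (is_array($next) && $next[0] === T_WHITESPACE) {
                continue;
            }
            if (is_array($next) && $next[0] === T_STRING) {
                $names[] = $next[1];
            }
            break; // anonymous functions have no name token here
        }
    }
    return $names;
}

// A call resolves only if some source in the analysed set declares it.
function can_resolve($call, array $sources)
{
    foreach ($sources as $source) {
        if (in_array($call, declared_functions($source), true)) {
            return true;
        }
    }
    return false;
}

$file2 = '<?php echo get_title();';
$file3 = '<?php function get_title() { return "Hello"; }'; // placeholder body

var_dump(can_resolve('get_title', array($file2)));         // bool(false)
var_dump(can_resolve('get_title', array($file2, $file3))); // bool(true)
```

Looked at alone, file2.php contains an unresolvable call; looked at alongside file3.php, the same call is fine.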

Does WordPress need type checking?

You could well argue that it doesn’t.

However, type checking brings one massive benefit to old codebases: it shines a light on easy-to-overlook parts of the code. That’s vitally important when refactoring or updating code and trying not to break anything, and especially important for WordPress, with its focus on backwards compatibility.

It also provides hints for refactoring — if a decent automated tool can’t understand what you’re trying to do, chances are many humans can’t, either, so rewriting code to please a tool can also make it easier to read.

Type checking also helps establish a standard by which new developers must abide: you can say “I care about these sorts of issues in these particular files”, and then prevent anyone from inadvertently triggering those issues in those files in the future, even if those same issues are still present in other parts of the codebase.

Running Psalm on WordPress’s codebase

Wayne Gretzky’s father once told him “Skate to where the puck is going, not where it has been”, and most PHP type checkers agree — they’re designed to check PHP as it’s written today.

I’m generally more interested in the places the puck has been, and I’ve built a type checker called Psalm with that ethos. Psalm was originally designed to understand an older PHP codebase at Vimeo, which means that it’s able to cope with some of PHP’s more archaic development patterns. Getting it working on WordPress’s codebase required only a few minor fixes.

I wrote some additional project-specific code to assist in its analysis of the WordPress codebase — a stub file that helps Psalm understand the effects of the functions is_wp_error() and wp_slash(), and a plugin that provides different return types for get_post(), get_comment() and a few others depending on the arguments passed in the given function calls. Knowing more about those functions helps Psalm avoid hundreds of false positives in its output.
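For a rough idea of what such a stub can look like, here is a sketch for is_wp_error(). It assumes Psalm's @psalm-assert-if-true docblock tag; whether the actual stub file uses that tag is a guess, and the empty WP_Error class is only a stand-in so the snippet runs on its own.

```php
<?php
// Minimal stand-in for WordPress's WP_Error class, so the sketch runs alone.
class WP_Error
{
}

/**
 * Tells a type checker that a true result means $thing is a WP_Error.
 *
 * @param mixed $thing
 * @return bool
 * @psalm-assert-if-true WP_Error $thing
 */
function is_wp_error($thing)
{
    return $thing instanceof WP_Error;
}

$result = new WP_Error();
if (is_wp_error($result)) {
    // Inside this branch an analyser can treat $result as a WP_Error.
    echo "error\n";
}
```

Without the annotation, a checker only knows the function returns a bool and learns nothing about $thing in either branch.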

The analysis also uses a config option to hoist all defined constants to the top of the file. This behaviour overcomes a weakness in Psalm when analysing functions: it assumes all functions can be called right after they’re defined, which is sometimes not the case (such as in this code sample).

Psalm has two entry points into the WordPress codebase: index.php and wp-admin/index.php. In each case it recursively analyses included files to understand what functions are callable.

What Psalm says

The project is generally sound, but a lot of the docblocks need work. As of publication date, Psalm analyses 339 files, infers types for 79 percent of their contents on average, and finds 924 errors (it can find many more, but the current config disregards some of its stricter checks).

Example errors it found:

About 5 percent of the errors suggested by Psalm are false positives, caused by things like:

  • wpdb::query() cannot ever return true if the underlying SQL it’s executing begins with “DELETE …”, but Psalm sees the general return type of int|bool and so complains when wpdb::delete() calls the query() method and declares a return type of int|false.
  • Psalm thinks get_comment() can return null here, but the logic above that line makes it a practical impossibility.
  • The getid3::analyze() method uses a class getid3_lib that only becomes available inside getid3’s constructor, and Psalm cannot guarantee that analyze is called after the constructor has completed executing.

All of these false positives (or failures in Psalm’s analysis) can be solved by making small improvements to WordPress to better accommodate static analysis tools.
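As a sketch of the kind of small change meant here, take the first false positive. The functions run_query and delete_rows below are hypothetical stand-ins, not WordPress code; they only mirror the int|bool versus int|false mismatch described above.

```php
<?php
/**
 * Hypothetical stand-in for a general-purpose query method.
 *
 * @param string $sql
 * @return int|bool Rows affected for DELETE, true for other statements, false on failure.
 */
function run_query($sql)
{
    if ($sql === '') {
        return false;
    }
    if (stripos($sql, 'DELETE') === 0) {
        return 3; // pretend three rows were deleted
    }
    return true;
}

/**
 * @param string $sql
 * @return int|false
 */
function delete_rows($sql)
{
    $result = run_query($sql);
    // Make the "never true" assumption explicit so a type checker can
    // confirm the declared int|false return type.
    return is_int($result) ? $result : false;
}

var_dump(delete_rows('DELETE FROM wp_posts WHERE ID = 1')); // int(3)
var_dump(delete_rows(''));                                  // bool(false)
```

The runtime behaviour for DELETE statements is unchanged; the extra check simply states in code what was previously only true by convention.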

See for yourself

If you have git and Composer installed, run the following:

git clone https://github.com/muglug/WordPressAnalysis
cd WordPressAnalysis
git clone https://github.com/wordpress/wordpress src
cp src/wp-config-sample.php src/wp-config.php
composer install
./vendor/bin/psalm

You’ll have to wait about two and a half minutes for analysis to run, depending on the speed of your computer. In the meantime, why not boil the kettle and treat yourself to a cup of tea? You’ve earned it.

On your return, you’ll see a mountain of issues Psalm has found. Feel free to peruse at your leisure.

Next steps

With these results in hand, there are a bunch of options for improving WordPress, from easiest to hardest:

  • Do nothing — usually my preferred course of action.
  • Create a very specific psalm.xml config that allows all the existing errors (for now), but prevents new ones (no code changes necessary).
  • Fix all the docblock issues, which means no potentially backwards-incompatible code changes, and allow all other existing errors in psalm.xml.
  • Fix all Psalm errors found with the current config, including some potentially backwards-incompatible changes after long discussions with WordPress contributors.
  • Fix the 6,220 issues Psalm finds on the strictest possible config (requires infinite free time and God-like control over the WordPress source code).
  • Rewrite WordPress in Haskell.

WordPress may be over 15 years old, but there’s no reason it can’t benefit from the same tools that newly-developed PHP apps use. And if static analysis tools like Psalm can help ease WordPress into more modern development patterns, the whole PHP ecosystem benefits.

Otherwise, there’s always Haskell.