You may have heard that Zerodium, an exploit acquisition platform famous for offering huge payouts for top-tier vulnerabilities, is paying up to $300,000 for remote code execution flaws in WordPress. If you’re anything like me, you may be reacting with some level of surprise, perhaps even some incredulity. Websites built on top of WordPress aren’t typically known for astounding security.
On the contrary, PHP applications in general seem to be plagued by a perception that they’re insecure, partly because of the platform they’re built upon and partly because of the generally poor programming practices that surround it.
The $300,000 price tag here though implies that this perception is incorrect — after all, unless we’re looking at a severe disconnect between the market and reality, the flaws should be very hard to find and exploit. Perhaps even as difficult to find and exploit as those in security hardened systems such as browsers or mobile sandboxes.
Recently I decided to dive deep into the WordPress internals to see why the price tag is so high. What I found has made me take a second look at some of my preconceptions about what secure systems look like, but not for the reasons you may expect. Here are my findings.
When you first start reviewing Wordpress you’ll be immediately hit by what looks like weak system design and poor programming practices.
Writing SQL queries in WordPress involves, in many cases, string concatenation, and it relies on developers remembering to sanitize their inputs.
The following is an example in the class WP_Term_Query, which is used to query the WordPress taxonomy, a feature that lets WordPress developers create custom categories to describe posts. This pattern is extremely prevalent in the WordPress codebase. In the example below you’ll see that a LIMIT SQL clause is being constructed using string concatenation:
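Since the original screenshot isn't reproduced here, the following is a simplified sketch of the pattern rather than the verbatim WordPress source. The function name build_limit_clause is my own, used purely for illustration.

```php
<?php
// Simplified sketch of the concatenation pattern, NOT the verbatim WordPress source.
// build_limit_clause() is a hypothetical name used for illustration.
function build_limit_clause($number, $offset)
{
    if ($number) {
        if ($offset) {
            // The values are spliced straight into the SQL text.
            return 'LIMIT ' . $offset . ',' . $number;
        }
        return 'LIMIT ' . $number;
    }
    return '';
}

// With integers the output is harmless.
echo build_limit_clause(10, 20), PHP_EOL; // LIMIT 20,10

// With an unvalidated string, the caller controls the SQL text.
echo build_limit_clause('10; DROP TABLE wp_terms', 0), PHP_EOL; // LIMIT 10; DROP TABLE wp_terms
```

Nothing in the clause builder itself protects the query, so its safety depends entirely on what has happened to the inputs before they arrive.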
Eventually the various clauses are combined further down in the file to form the full SQL statement:
The next question is how inputs such as $number are validated. They’re validated further up the class in the parse_query function, which converts the input into an integer. parse_query is called within the main entrypoint get_terms. This means that the validation will always run, no matter how a developer integrates with this class.
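To make the validation step concrete, here is a minimal sketch of integer coercion happening in one parse step that every caller passes through. The helper names are hypothetical and the logic is simplified; I'm assuming an absint-style "cast to int, drop the sign" conversion.

```php
<?php
// Hypothetical stand-in for an absint()-style helper: cast to int, drop the sign.
function sketch_absint($value)
{
    return abs((int) $value);
}

// Hypothetical, simplified parse step. The assumption is that every query
// passes through here before any SQL is built.
function sketch_parse_query(array $query)
{
    $query += ['number' => 0, 'offset' => 0];
    $query['number'] = sketch_absint($query['number']);
    $query['offset'] = sketch_absint($query['offset']);
    return $query;
}

// The SQL fragment is discarded by the integer cast.
$parsed = sketch_parse_query(['number' => '10; DROP TABLE wp_terms']);
var_dump($parsed['number']); // int(10)
```

The coercion is effective, but note that it lives in a different place from the code that builds the SQL string, so the two have to be kept in step by hand.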
So what’s the problem with this code then? We’ve established that validation works in this case and can’t be bypassed, everything is ok — right?
What is a Secure System?
Well, the answer to that question gets to the core of what it means to design a secure system. Perhaps we can define a system as secure if security researchers are not uncovering vulnerabilities. It looks like the last publicly disclosed instance of SQL injection in the WordPress core was in 2017. So in this sense, perhaps WordPress is sufficiently secure against SQL injection vulnerabilities.
That doesn’t mean a security flaw will never emerge, though. Systems are constantly added to and removed from, so a vulnerability that doesn’t exist today may exist tomorrow, and vice versa. Perhaps a better definition of a secure system is one where the emergence of a flaw is less likely to lead to complete compromise of the system, i.e. there are sufficient mitigations in place. Under this definition, WordPress, in my opinion, isn’t sufficiently secure against SQL injection. For every endpoint, any mistake in type validation could potentially lead to a serious vulnerability.
A secure system is one where the emergence of a flaw is unlikely to lead to complete compromise of the system.
The bigger issue with how Wordpress implements SQL queries is that developers and review processes are fallible. By relying on string concatenation and userland validation, every line of query code becomes security critical — since only one small mistake, one oversight, is all that will be needed for complete compromise of the database.
Other platforms and frameworks generally isolate this functionality into a specific place, a core abstraction that can be used by developers to implement queries. The result is that only one area of code must be tested and secured. For example, PHP’s Symfony framework is typically paired with an ORM (Doctrine). The limit clause creation would look something like this:
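$entities = $em
    ->createQuery( '...' )
    ->setMaxResults( $limit ) // assumed continuation: the limit is passed as a value, not spliced into the query string
    ->getResult();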
In this case, even if a developer forgets to convert the string input into a number, perhaps the request will fail but no vulnerability arises.
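The idea of a single audited choke point can be sketched without any framework. The class below is entirely hypothetical and is not Doctrine's API; it only illustrates how a forgotten conversion becomes a failed request rather than an injection.

```php
<?php
// Hypothetical minimal query object, for illustration only. This is NOT Doctrine's API.
final class SketchQuery
{
    private $sql;
    private $limit = null;

    public function __construct(string $sql)
    {
        $this->sql = $sql;
    }

    // The only place a limit can enter the query, so the only place to audit.
    public function setMaxResults($limit): self
    {
        $valid = filter_var($limit, FILTER_VALIDATE_INT, ['options' => ['min_range' => 0]]);
        if ($valid === false) {
            throw new InvalidArgumentException('limit must be a non-negative integer');
        }
        $this->limit = $valid;
        return $this;
    }

    public function toSql(): string
    {
        return $this->limit === null ? $this->sql : $this->sql . ' LIMIT ' . $this->limit;
    }
}

// A numeric string is accepted and normalised.
echo (new SketchQuery('SELECT name FROM terms'))->setMaxResults('10')->toSql(), PHP_EOL;

// A string carrying SQL is rejected before it gets anywhere near the query.
try {
    (new SketchQuery('SELECT name FROM terms'))->setMaxResults('10; DROP TABLE terms');
} catch (InvalidArgumentException $e) {
    echo 'rejected: ', $e->getMessage(), PHP_EOL;
}
```

The calling code can be as careless as it likes; the worst outcome is an exception.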
All of this so far sounds fairly damning and critical of the WordPress developers, but before going on I think it’s important to reevaluate our own views and preconceptions in light of our findings. After all, my review did not uncover any SQL injection vulnerabilities, nor have there been any publicly disclosed in quite some time. Perhaps the reality is that these flaws are of such a trivial nature, are so easy to review for, grep for and fix, that the risk is just theoretical. Perhaps the lack of any core abstractions, i.e. an ORM, makes the code easier to implement and review, and therefore critical flaws have nowhere to hide except in plain sight.
The argument sounds reasonable, but am I convinced? Not really. Are you? I’d love to hear your thoughts below.
A Brief Aside on Deserialization
I explored many other options in the time I allocated to WordPress. In particular, I looked at how values are serialized and deserialized in the WordPress database. You’ll find that PHP serialize is used in a number of key places in the core application. Some notable examples of this are the various _meta tables and the options/transients API, which is used for managing settings and cached data.
When reviewing how metadata values are retrieved, I found the code below interesting — why would a value only “maybe” be unserialized?
The answer is that when data is written, it’s only serialized if the value is an array or an object, or if it passes the is_serialized check. In other words, WordPress wants to support writing scalar strings into the database, possibly to avoid the overhead of deserializing every entry in the metadata/options/transients tables.
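The write and read sides described above can be sketched as a pair of functions. This is a simplified illustration with hypothetical names, and the "looks serialized" check here is my own stand-in, not WordPress's actual is_serialized logic.

```php
<?php
// Hypothetical stand-in for an is_serialized()-style check. NOT WordPress's real logic.
function sketch_looks_serialized($value): bool
{
    if (!is_string($value)) {
        return false;
    }
    $value = trim($value);
    if ($value === 'b:0;') {
        return true; // serialized false, which unserialize() cannot distinguish from failure
    }
    return @unserialize($value, ['allowed_classes' => false]) !== false;
}

// Write side: arrays, objects, and strings that already look serialized get serialized.
function sketch_maybe_serialize($data)
{
    if (is_array($data) || is_object($data) || sketch_looks_serialized($data)) {
        return serialize($data);
    }
    return $data;
}

// Read side: only values that look serialized are unserialized.
function sketch_maybe_unserialize($stored)
{
    if (sketch_looks_serialized($stored)) {
        return unserialize(trim($stored));
    }
    return $stored;
}

// A string that looks like a serialized object is wrapped on write,
// so it comes back out as the same harmless string on read.
$payload = 'O:8:"stdClass":0:{}';
$stored  = sketch_maybe_serialize($payload);
var_dump(sketch_maybe_unserialize($stored) === $payload); // bool(true)
```

The important property is that the same check guards both directions. If the two sides ever disagreed about what "looks serialized" means, a plain string could be stored as-is and later revived as an object.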
If the alarm bells are ringing at this point, you’d be thinking along the same lines as me — is it possible we can find a string that will be written as a string but deserialized as a PHP object?
The answer was no, since the logic is symmetrical enough to avoid exploitation. Even with complete control over the string value being written to a metadata/transients/options table, this is currently unexploitable. Interestingly, the WordPress developers did once accidentally ship an arbitrary deserialization flaw in this area; it was fixed in WordPress 3.6.1, back in 2013. This is ancient history in terms of security developments, so I don’t regard this as something to hold against the WordPress developers.
So why am I telling you about this if it is not a flaw? It’s notable for two reasons. The first is that it gives us a target for database writes if we find an SQL injection vulnerability. If we can combine this with the right deserialization gadgets, it could potentially be leveraged to achieve RCE.
The second reason is that PHP’s native serialization is inherently risky to use. It doesn’t implement any functionality that validates the serialized value was issued by the application or another known actor. This means that if an attacker can write something that looks like a serialized PHP value, it will be deserialized without any verification. Since most Wordpress applications include third party plugins, some of which will have a poor security posture, it’s certainly possible that attractive deserialization gadgets exist in many installations. Any developments in this space will be crucial to keep an eye on.
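For contrast, here is a minimal sketch of what "validating that the serialized value was issued by the application" could look like, using an HMAC over the payload. The function names and blob layout are hypothetical, and to be clear, WordPress core does not store values this way.

```php
<?php
// Hypothetical helpers for illustration. WordPress core does NOT sign stored values like this.

// Prefix the serialized payload with a 64-character hex HMAC-SHA256 tag.
function sketch_sign_serialized($value, string $key): string
{
    $payload = serialize($value);
    return hash_hmac('sha256', $payload, $key) . $payload;
}

// Refuse to unserialize anything whose tag does not match.
function sketch_verify_and_unserialize(string $blob, string $key)
{
    $mac     = substr($blob, 0, 64);
    $payload = substr($blob, 64);
    if (!hash_equals(hash_hmac('sha256', $payload, $key), $mac)) {
        throw new RuntimeException('serialized value was not issued by this application');
    }
    // Even for trusted payloads, do not instantiate arbitrary classes.
    return unserialize($payload, ['allowed_classes' => false]);
}

$key  = 'example-secret-key'; // assumption: a per-site secret kept outside the database
$blob = sketch_sign_serialized(['theme' => 'dark'], $key);
var_dump(sketch_verify_and_unserialize($blob, $key));

// An attacker who can write to the database but does not know the key is rejected.
try {
    sketch_verify_and_unserialize(str_repeat('0', 64) . 'O:8:"stdClass":0:{}', $key);
} catch (RuntimeException $e) {
    echo 'rejected: ', $e->getMessage(), PHP_EOL;
}
```

With a scheme like this, an SQL injection that can write rows would no longer be enough on its own to reach a deserialization gadget, because the attacker would also need the signing key.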
Anyone who knows anything about WordPress will have heard about WordPress plugins. WordPress is built around the concept that third-party developers can write extensions to the WordPress core. These plugins can then be installed by non-technical website administrators via the WordPress admin UI, if it is enabled. PHP plugins are a well-worn trope for achieving RCE in both the real world and CTFs.
However, we’re not interested in admin functionality since WordPress administrators are already extremely highly privileged. What I found interesting is that WordPress has a secondary mechanism for loading plugins, named must-use plugins, which are loaded on boot regardless of whether the plugin has been enabled by an administrator. The bootstrap code for this lives in wp-settings.php, which is included in all WordPress pages.
The only requirement is that a PHP file is dropped in the WPMU_PLUGIN_DIR folder, which is the wp-content/mu-plugins folder by default.
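The loading behaviour can be sketched roughly as follows. This is a simplified illustration and not the verbatim WordPress bootstrap; the function name is hypothetical, and a throwaway temporary directory stands in for wp-content/mu-plugins.

```php
<?php
// Simplified sketch, NOT the verbatim WordPress source. The function name is hypothetical.
function sketch_get_mu_plugins(string $dir): array
{
    $plugins = [];
    if (!is_dir($dir)) {
        return $plugins;
    }
    foreach (scandir($dir) as $file) {
        // Assumption: only top-level files ending in .php are picked up.
        if (substr($file, -4) === '.php' && is_file($dir . '/' . $file)) {
            $plugins[] = $dir . '/' . $file;
        }
    }
    sort($plugins);
    return $plugins;
}

// Demo against a throwaway directory standing in for wp-content/mu-plugins.
$dir = sys_get_temp_dir() . '/mu-plugins-' . uniqid();
mkdir($dir);
file_put_contents($dir . '/dropped.php', '<?php $GLOBALS["mu_loaded"] = true;');
file_put_contents($dir . '/notes.txt', 'ignored');

foreach (sketch_get_mu_plugins($dir) as $plugin) {
    include_once $plugin; // no enable/disable check: presence in the folder is enough
}
var_dump($GLOBALS['mu_loaded'] ?? false); // bool(true)

// Clean up the throwaway directory.
unlink($dir . '/dropped.php');
unlink($dir . '/notes.txt');
rmdir($dir);
```

The point of the sketch is that there is no registry or activation state to tamper with: a file's presence in the folder is the whole condition.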
Whilst this alone isn’t a vulnerability, since as an unauthenticated user we have no mechanism for writing to the WPMU_PLUGIN_DIR directory, this directory does offer an attractive and relatively stealthy alternative target for WordPress installations with plugins installed. If a situation arises in which you have an arbitrary filesystem write, but can’t write to the /var/www directory, can’t get the web server to execute your script as PHP, or need persistence, this could be a potentially useful alternative.
The Traditional Path to RCE
The analysis I’ve done so far has looked at a very traditional route to RCE. Putting it all together, if a bug does emerge that gives us SQL Injection, the path to RCE could look like one of these scenarios:
- SQL Injection -> Auth Bypass -> WP Admin -> Admin Bugs/File Upload
- SQL Injection -> Deserialization Attack -> Secondary Flaws
What is the $300k Price Tag Really For?
I do truly believe that WordPress developers are fallible and that their review process could fail at some point. It is totally possible that one of their developers commits some code that doesn’t sanitize some input correctly and suddenly we can take complete control of the database… or some new functionality is added which offers attackers an opportunity to deserialize arbitrary PHP objects… but the simplicity of the system, combined with the number of eyes on pull requests, potentially makes it less likely. The price tag assigned to WordPress RCE right now also creates a scenario in which there are many security researchers looking at the WordPress core, which could mean that bugs of a trivial nature won’t remain hidden for very long.
The key stipulation in Zerodium’s $300k offer is that they are explicitly looking for preauthentication exploits, which in the case of Wordpress creates an incredibly narrow attack surface. However, this doesn’t mean the story has to end— there are other paths we can consider. Even with a very narrow attack surface there exists one large target I haven’t mentioned yet:
The final area which I think is potentially a fruitful direction to focus research efforts is libxml2. It’s a widely used C library that PHP integrates with in order to parse and query XML documents. In the case of WordPress, it’s used in many places throughout the application, but the xmlrpc integration is the relevant one for preauthentication exploits.
libxml2 is a software library for parsing XML documents.
So why is libxml2 notable? Parsers are notoriously difficult to implement, and even more so in a language like C without memory safety. I invite you to browse the code in libxml2 to get a sense of how it’s implemented, but at a glance it seems complex. For example, the bulk of the parsing logic exists in a 10,000 line long file. Is it easy for the developers working on libxml2 to reason about and understand the consequences of all changes?
In 2021 alone we’ve seen a use-after-free and an out-of-bounds read, so memory corruption flaws do not exist solely in the realm of imagination. However, the reality is that RCE via memory corruption today is complex: there are many countermeasures that must be bypassed, which means that specific gadgets are needed, and those could simply not exist. Furthermore, just because libxml2 is complex and written in C doesn’t mean it’s definitely insecure… but the risk is certainly non-zero.
Overall is $300k a sufficient incentive to cover the research time involved in finding and building an exploit chain for libxml2? It is a large commitment in terms of time and potentially a fruitless one, but overall my opinion is that it probably is.
At some point in the future I’d love to devote some time to reviewing libxml2 in detail, but it’s beyond my capabilities at the moment. However, a determined researcher with sufficient time could potentially uncover useful gadgets here and you never know — maybe the set needed for RCE currently exists, maybe not.
Overall my takeaway from reviewing WordPress is that simplicity is a strong defence. Security flaws thrive in complexity, and when the requirements and the implementation are both simple, the number of attack vectors is small. If WordPress substantially grows in complexity or new large features are added, I do predict we’ll see severe vulnerabilities due to the lack of a secure-by-design core. However, if WordPress remains much the same as it currently is, perhaps the simple approach taken by the developers is ok, at least for now.
I hope you enjoyed my article.
All the best,