RFC for a Secure Unserialization Mechanism in PHP

PHP serialization/unserialization has several drawbacks ^1.

On the serialization side, the Serializable interface:

  • breaks references inside serialized data structures;
  • delegates the responsibility for the serialization format to its implementations, to the detriment of optimized formats such as the one igbinary provides.

On the unserialization side:

  • security exploits have been demonstrated when using unserialize() on user-submitted data;
  • serialized strings that reference missing classes create placeholder objects of type __PHP_Incomplete_Class, which behave in an unusual manner and, most importantly, break the semantics of the original structure.
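
To make the placeholder behavior concrete, here is a minimal sketch. The class name Missing is a hypothetical one that is deliberately left undefined, and the sketch assumes the default configuration where no unserialize callback is set.

```php
<?php
// "Missing" is a hypothetical class name that is intentionally not defined.
$payload = 'O:7:"Missing":1:{s:5:"value";i:42;}';

$object = unserialize($payload);

// The engine silently substitutes a placeholder for the unknown class.
var_dump($object instanceof __PHP_Incomplete_Class); // bool(true)

// The original class name survives only as a special entry that becomes
// visible when the placeholder is cast to an array.
$properties = (array) $object;
var_dump($properties['__PHP_Incomplete_Class_Name']); // string(7) "Missing"
```

Code that expected a Missing instance now holds an object of another type, which is how the semantics of the original structure get lost.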

The root of these security issues is that creating objects from serialized strings can lead to code execution: of the callable defined by the unserialize_callback_func ini setting, and of the __wakeup(), Serializable::unserialize() and __destruct() methods. The first three are part of the typical unserialization lifecycle, so a security issue caused by them is the responsibility of their authors. But __destruct() is nastier: authors usually don't think of it as an attack vector and thus fail to implement the needed safety measures (which could e.g. consist of throwing an exception in a __wakeup() method).
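
The following sketch illustrates the __destruct() vector and the __wakeup() guard mentioned above. TempFile, GuardedTempFile and craftPayload() are hypothetical names invented for this illustration; the payload is built the way an attacker could build it, without ever calling the class constructor.

```php
<?php
// Hypothetical class: its destructor removes the file named by a property.
class TempFile
{
    public $path;

    public function __destruct()
    {
        // Written with trusted values in mind, yet it also runs for
        // objects that unserialize() created from attacker-chosen data.
        if (is_string($this->path) && is_file($this->path)) {
            unlink($this->path);
        }
    }
}

// Hypothetical guarded variant: refuses to be unserialized at all.
class GuardedTempFile extends TempFile
{
    public function __wakeup()
    {
        throw new UnexpectedValueException('GuardedTempFile cannot be unserialized.');
    }
}

// Builds the payload an attacker could submit for the given class and path.
function craftPayload(string $class, string $path): string
{
    return sprintf('O:%d:"%s":1:{s:4:"path";s:%d:"%s";}', strlen($class), $class, strlen($path), $path);
}

$victim = tempnam(sys_get_temp_dir(), 'rfc');

// The return value is discarded, so the new object is destroyed right away
// and its destructor deletes the file.
unserialize(craftPayload('TempFile', $victim));
$deleted = !file_exists($victim);
var_dump($deleted); // bool(true)

$kept = tempnam(sys_get_temp_dir(), 'rfc');
try {
    unserialize(craftPayload('GuardedTempFile', $kept));
    $rejected = false;
} catch (UnexpectedValueException $e) {
    $rejected = true; // the guard surfaced to the caller of unserialize()
}
var_dump($rejected); // bool(true)

@unlink($kept); // clean up whatever is left of the second file
```

The guard only helps when the author remembered to write it, which is the gap this RFC aims to close.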

To mitigate these security issues, the unserialize() function has supported an allowed_classes option since PHP 7.0. Thanks to it, implementations of Serializable can filter the allowed classes in the subgraph they manage. This feature is only a mitigation because not every use case knows all the possible classes beforehand.
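
As a short illustration of the existing option, the sketch below uses two hypothetical classes, Allowed and Forbidden. Only the first one is whitelisted, so the second comes back as a placeholder object.

```php
<?php
// Hypothetical classes used only for this illustration.
class Allowed
{
    public $n = 1;
}

class Forbidden
{
    public $n = 2;
}

$payload = serialize([new Allowed(), new Forbidden()]);

// Only classes listed in allowed_classes are instantiated.
[$first, $second] = unserialize($payload, ['allowed_classes' => [Allowed::class]]);

var_dump($first instanceof Allowed);                 // bool(true)
var_dump($second instanceof __PHP_Incomplete_Class); // bool(true)
```

Passing 'allowed_classes' => false rejects every class, which suits payloads that should contain only scalars and arrays.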

Proposal

  • handle a new __serialize(): array method, replacing __sleep() and Serializable::serialize() when implemented;
  • serialize the returned array using a new S: type (e.g. for an object of class Foo whose __serialize() method returns [123]: S:3:"Foo":a:1:{i:0;i:123;});
  • forbid using C: or O: for classes implementing __serialize();
  • handle a new __unserialize(array $data, array $nested_objects): void method, replacing __wakeup() and Serializable::unserialize() when implemented;
  • have $data set to the unserialized value;
  • for validation purposes, have $nested_objects contain the list of all objects in $data, excluding those already inspected by nested implementations of __unserialize();
  • have the unserialize() function handle a new validation_callback option that would accept a $nested_objects argument with the same semantics as above;
  • have the PHP engine disable any destructors found in the unserialized value whenever the unserialize() function throws any Throwable or terminates the script execution (alternatively, if disabling destructors is not technically possible, the engine should empty all properties of unserialized objects).
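
The proposed encoding can be approximated in userland to see what it would look like. This is only a sketch under stated assumptions: exportState() and importState() are hypothetical stand-ins for the proposed __serialize() and __unserialize() methods, encodeProposed() and decodeProposed() are invented helpers, and the decoder handles the Foo class only.

```php
<?php
declare(strict_types=1);

// Hypothetical class; exportState()/importState() stand in for the
// proposed __serialize()/__unserialize() magic methods.
final class Foo
{
    private $value;

    public function __construct(int $value)
    {
        $this->value = $value;
    }

    public function exportState(): array
    {
        return [$this->value];
    }

    public function importState(array $data, array $nestedObjects): void
    {
        $this->value = $data[0];
    }

    public function value(): int
    {
        return $this->value;
    }
}

// Emulates the proposed S: encoding: class name, then the serialized array.
function encodeProposed(Foo $object): string
{
    $class = get_class($object);

    return sprintf('S:%d:"%s":%s', strlen($class), $class, serialize($object->exportState()));
}

// Emulates decoding for Foo only. Objects are refused inside the array,
// so the list of nested objects is always empty in this sketch.
function decodeProposed(string $payload): Foo
{
    if (!preg_match('/^S:\d+:"Foo":(.+)$/s', $payload, $matches)) {
        throw new UnexpectedValueException('Not an S: payload for Foo.');
    }

    $data = unserialize($matches[1], ['allowed_classes' => false]);
    if (!is_array($data)) {
        throw new UnexpectedValueException('Malformed state array.');
    }

    // The constructor is skipped, as it is during regular unserialization.
    $object = (new ReflectionClass(Foo::class))->newInstanceWithoutConstructor();
    $object->importState($data, []);

    return $object;
}

$payload = encodeProposed(new Foo(123));
echo $payload, PHP_EOL;                          // S:3:"Foo":a:1:{i:0;i:123;}
echo decodeProposed($payload)->value(), PHP_EOL; // 123
```

Because the class only returns and receives a plain array, the wire format stays in the hands of the serializer, which is what would let an extension swap in a more compact encoding.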

Expected benefits

  • fixing compatibility with soft and hard references;
  • moving the responsibility for the serialization format out of the userland serialization steps;
  • equal or better validation of the unserialized objects/classes;
  • ability to reject __PHP_Incomplete_Class instances independently of the unserialize_callback_func ini setting;
  • higher security by not calling destructors on any early termination of unserialize().
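
The validation benefit can be approximated with today's unserialize(). In the sketch below, collectObjects() and unserializeValidated() are hypothetical helpers, not part of PHP. The emulation runs after the objects have been created, so unlike the proposal it cannot prevent destructors from running, and it does not exclude objects already inspected by nested implementations.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: gathers every object reachable from $value.
// Self-referencing arrays are not guarded against in this sketch.
function collectObjects($value, array &$seen): void
{
    if (is_object($value)) {
        $id = spl_object_id($value);
        if (isset($seen[$id])) {
            return; // already visited, avoids looping on object cycles
        }
        $seen[$id] = $value;
        $value = (array) $value; // exposes private and protected properties too
    }

    if (is_array($value)) {
        foreach ($value as $child) {
            collectObjects($child, $seen);
        }
    }
}

// Hypothetical userland stand-in for the proposed validation_callback option.
function unserializeValidated(string $payload, callable $validationCallback)
{
    $value = unserialize($payload);

    $seen = [];
    collectObjects($value, $seen);
    $validationCallback(array_values($seen));

    return $value;
}

// Rejects placeholders created for missing classes.
$rejectIncomplete = static function (array $nestedObjects): void {
    foreach ($nestedObjects as $object) {
        if ($object instanceof __PHP_Incomplete_Class) {
            throw new UnexpectedValueException('Payload references an unknown class.');
        }
    }
};

$accepted = unserializeValidated(serialize([new stdClass()]), $rejectIncomplete);

try {
    // "Missing" is a hypothetical class name that is deliberately undefined.
    unserializeValidated('a:1:{i:0;O:7:"Missing":0:{}}', $rejectIncomplete);
    $rejected = false;
} catch (UnexpectedValueException $e) {
    $rejected = true;
}
var_dump($rejected); // bool(true)
```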

Extra considerations

The global unserialize_callback_func ini setting and the related __PHP_Incomplete_Class objects could be left unchanged. But we could also take this RFC as an opportunity to make enabling the validation_callback option disable them and always throw a specific type of Throwable instead.

As described before ^2, having __serialize() and __unserialize() be magic methods has a distinct backward compatibility advantage. For this reason, this RFC doesn't mention any new interface that implementations should use.

Instead, the PHP engine should have a rule that checks that both methods are defined at the same time (implementing only one of them would make no sense) and that they have the expected signature.
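
The pairing rule could look roughly like this if expressed in userland. hasConsistentSerializationMethods() is a hypothetical helper that checks only the presence of both methods; the signature check is left out of this sketch.

```php
<?php
// Hypothetical userland version of the pairing rule: a class must define
// both methods or neither of them. Signatures are not inspected here.
function hasConsistentSerializationMethods(string $class): bool
{
    $reflection = new ReflectionClass($class);

    return $reflection->hasMethod('__serialize') === $reflection->hasMethod('__unserialize');
}

// Hypothetical classes used only for this illustration.
class Neither
{
}

class OnlySerialize
{
    public function __serialize(): array
    {
        return [];
    }
}

var_dump(hasConsistentSerializationMethods(Neither::class));       // bool(true)
var_dump(hasConsistentSerializationMethods(OnlySerialize::class)); // bool(false)
```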