This discussion has indirectly resulted in a new serialization mechanism added to PHP 7.4. This post is now obsolete.
PHP serialization/unserialization has several drawbacks ^1.
On the serialization side, the
- breaks references inside serialized data structures;
- delegates the responsibility of the serialization format to its implementations, to the detriment of optimized formats that e.g.
On the unserialization side:
- security exploits have been demonstrated when using unserialize() on user-submitted data;
- serialized strings referencing missing classes create placeholder objects of type __PHP_Incomplete_Class, which behave in an unusual manner and, most importantly, break the semantics of the original structure.
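The placeholder behavior can be observed with a minimal sketch. The class name Missing and the payload are made up for the example, and no class by that name is declared anywhere in the script:

```php
<?php
// "Missing" is deliberately never declared, and no autoloader is registered.
$payload = 'O:7:"Missing":1:{s:3:"foo";s:3:"bar";}';

$obj = unserialize($payload);

// PHP substitutes a placeholder object for the unknown class.
echo get_class($obj), "\n";          // __PHP_Incomplete_Class

// Type checks against the original class no longer hold.
var_dump($obj instanceof Missing);   // bool(false)

// Re-serializing writes the original class name back, so the data survives
// the round trip, but the object is unusable in between.
echo serialize($obj), "\n";
```

Any code that type-checks or calls methods on such a value fails, which is how the original structure's semantics get lost.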
The root of these security issues is that creating objects out of serialized strings can lead to code execution, namely of the callable defined by the unserialize_callback_func ini setting and/or of the __destruct() methods. The first three are part of the typical unserialization lifecycle: a security issue caused by them would be the responsibility of their authors. But __destruct() is much nastier: authors usually don't think of it as an attack vector and thus fail to implement the needed safety measures (which could e.g. consist of throwing an exception in a
To mitigate these security issues, the unserialize() function has supported an allowed_classes option since PHP 7.0. Thanks to it,
Serializable allows filtering the allowed classes in the subgraph managed by objects that implement it. This feature is only a mitigation because not all use cases know all the possible classes beforehand.
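A minimal sketch of the __destruct() attack vector and of the allowed_classes mitigation. TempFile, its static $deleted log and the payload are hypothetical; a real gadget would perform an actual side effect, such as deleting the file, instead of logging it:

```php
<?php
// Hypothetical class standing in for any class whose destructor has a side effect.
class TempFile
{
    public $path;

    /** @var string[] paths the destructor "deleted" (logged instead of unlink()ed) */
    public static $deleted = [];

    public function __destruct()
    {
        // A real implementation might call unlink($this->path) here.
        self::$deleted[] = $this->path;
    }
}

// Hypothetical attacker-controlled input choosing an arbitrary path.
$payload = 'O:8:"TempFile":1:{s:4:"path";s:11:"/etc/passwd";}';

// Unfiltered: the object is instantiated, and its destructor runs as soon as
// the unused return value is freed, without any method being called on it.
unserialize($payload);
$deletedWithoutFilter = TempFile::$deleted;
print_r($deletedWithoutFilter);

// Filtered: TempFile is not instantiated; a __PHP_Incomplete_Class
// placeholder is created instead, and destroying it has no side effect.
TempFile::$deleted = [];
$safe = unserialize($payload, ['allowed_classes' => false]);
echo get_class($safe), "\n";   // __PHP_Incomplete_Class
print_r(TempFile::$deleted);   // still empty
```

The filter only helps when the list of legitimate classes is known up front, which is exactly the limitation noted above.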
- handle a new __serialize(): array method, replacing
- serialize the returned array using a new S: type (e.g. for an object of class
- forbid using O: for classes implementing
- handle a new __unserialize(array $data, array $nested_objects): void method, replacing
$data set to the unserialized value;
- for validation purposes, have $nested_objects contain the list of all objects in $data, excluding those already inspected by nested implementations of
- have the unserialize() function handle a new validation_callback option that would accept a
$nested_objects argument with the same semantics as above;
- have the PHP engine disable any destructors found in the unserialized value whenever the unserialize() function throws any Throwable or terminates the script execution (alternatively, if disabling destructors is not technically possible, the engine should empty all properties of unserialized objects).
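For comparison, PHP 7.4 shipped a simplified form of the first steps above: __serialize(): array and __unserialize(array $data): void, without the $nested_objects argument and without the validation_callback option. The sketch below assumes PHP 7.4 or later; the Connection class and its validation rule are hypothetical:

```php
<?php
// Assumes PHP >= 7.4. Connection and its validation rule are hypothetical.
final class Connection
{
    private $dsn;

    public function __construct(string $dsn)
    {
        $this->dsn = $dsn;
    }

    public function getDsn(): string
    {
        return $this->dsn;
    }

    // Returns plain data only; encoding it is left to the engine, not the class.
    public function __serialize(): array
    {
        return ['dsn' => $this->dsn];
    }

    // Receives the already-decoded array and validates it before use.
    public function __unserialize(array $data): void
    {
        if (!isset($data['dsn']) || !is_string($data['dsn'])) {
            throw new UnexpectedValueException('Invalid serialized Connection data.');
        }
        $this->dsn = $data['dsn'];
    }
}

$shared = new Connection('sqlite::memory:');
$copy = unserialize(serialize([$shared, $shared]));

// The engine tracks object identity across the round trip:
// both entries point to the same unserialized instance.
var_dump($copy[0] === $copy[1]);   // bool(true)
echo $copy[0]->getDsn(), "\n";     // sqlite::memory:

// Invalid input is rejected by the exception thrown from __unserialize().
try {
    unserialize('O:10:"Connection":0:{}');
} catch (UnexpectedValueException $e) {
    echo $e->getMessage(), "\n";
}
```

Because the class only returns and receives arrays, the byte format stays under the engine's control, in line with the goal of moving the format out of userland.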
- fixing compatibility with soft and hard references;
- moving the responsibility of the serialization format to the outside of the userland serialization steps;
- same or higher validation capabilities of the unserialized objects/classes;
- ability to reject __PHP_Incomplete_Class instances independently from the
- higher security by not calling destructors on any early termination of
unserialize_callback_func ini setting and the related __PHP_Incomplete_Class objects could be left unchanged. But we could also take this RFC as an opportunity to make enabling the validation_callback option also disable them and always throw a specific type of
As described before ^2, having __unserialize() be magic methods has a distinct backward-compatibility advantage. For this reason, this RFC doesn't mention any new interface that implementations should use.
Instead, the PHP engine should have a rule that checks that both methods are defined at the same time (implementing only one of them would make no sense) and that they have the expected signature.
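The pairing rule can be approximated in userland with reflection. The helper hasValidSerializationPair() and the demo classes are hypothetical; the helper checks the signatures that shipped in PHP 7.4 (a single array $data parameter) rather than the two-parameter form proposed above:

```php
<?php
// Hypothetical userland approximation of the proposed engine rule:
// __serialize() and __unserialize() must be declared together, with the
// expected signatures (PHP 7.4 form, without $nested_objects).
function hasValidSerializationPair(string $class): bool
{
    $r = new ReflectionClass($class);
    $hasSerialize = $r->hasMethod('__serialize');
    $hasUnserialize = $r->hasMethod('__unserialize');

    if (!$hasSerialize && !$hasUnserialize) {
        return true; // neither method is defined: nothing to check
    }
    if ($hasSerialize !== $hasUnserialize) {
        return false; // only one half of the pair is defined
    }

    $returnType = $r->getMethod('__serialize')->getReturnType();
    $params = $r->getMethod('__unserialize')->getParameters();
    $paramType = count($params) === 1 ? $params[0]->getType() : null;

    // __serialize() must declare ": array"; __unserialize() must take one array.
    return $returnType instanceof ReflectionNamedType
        && $returnType->getName() === 'array'
        && $paramType instanceof ReflectionNamedType
        && $paramType->getName() === 'array';
}

// Hypothetical demo classes.
final class Both
{
    public function __serialize(): array { return []; }
    public function __unserialize(array $data): void {}
}

final class OnlyOne
{
    public function __serialize(): array { return []; }
}

var_dump(hasValidSerializationPair(Both::class));    // bool(true)
var_dump(hasValidSerializationPair(OnlyOne::class)); // bool(false)
```

Doing this check in the engine rather than in userland would turn a silently half-implemented pair into an error at class declaration time.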
Originally published at gist.github.com.
Retweets welcomed at https://twitter.com/nicolasgrekas/status/1033043739671977985