Objectively comparing ORM / DAL libraries

For many of us, developers, pattern of accessing external data (SQL) is a thing of preference:

$user->age = 20;
// or
query("update user set age []", 20);
// or
$user['age'] = 20;

We get used to different syntaxes and subjectively endorse it. By looking past the individual preferences, I wanted to create a “comparison criteria” that would help PHP community to objectively evaluate data persistence frameworks based on features. As I looked around I haven’t found anything, so I started to work on my own list:

https://socialcompare.com/en/comparison/php-data-access-libraries-orm-activerecord-persistence. I welcome my readers to help me populate and use the table. In this article I explain various features which I’ve used as comparison criteria.


Sections and Scope

Object Relation Mapper or more generally Persistence Framework introduces patterns in the way the application stores or retrieves the data. Often that is to separate physical implementation from the business logic.

I have defined list of “features” starting from very basic to the most advanced and grouped them up like this:

  • Support of Persistence Engines (SQL, NoSQL, API, Files etc)
  • Basic single-record operations (C.R.U.D)
  • Criterias and Conditions.
  • Custom SQL extensions — Query Building
  • Data fetching (iterating, arrays, hashes)
  • Single record loading/saving features
  • OOP patterns (Entity Hierarchy)
  • Relations / References
  • Cross-persistence joins, aggregation, reports
  • Behaviours/Hooks
  • Auditing and Event Sourcing
  • Integration and Meta Information

Evaluation

Each frameworks may be rated to have 3 levels of support for each feature:

  • Yes (green). Feature is natively supported and documented. Easy to use.
  • Partial (yellow). Can be hacked together. May requires add-on or some hacks. Limited documentation.
  • No (red). No documentation or support. Requires a work-around.

To illustrate, here are 2 examples and I’ll try to evaluate “MySQL” and “MongoDB”.

“Increment of single record value”.

  • MySQL — Partial. No dedicated method but can be done through expresison: set a=a+1
  • MongoDB — Yes. incr(a).

“Incremental of multiple records”.

  • MySQL — Partial. Still need an expression: set a=a+1 where ..
  • MongoDB — No. Must increment each row individually.

Support of Persistence Engines.

(NOTE: bold text correspond with the criteria in this comparison chart)

When we talk in general terms, Persistence can be pretty much anything. SQL databases are often chosen as a “persistence” and PDO support is a requirement, although some DAL can go beyond that and provide Transparent support for NoSQL. This means that the rest of your application does not have to change if you swap SQL storage for something else.

Vendor-specific extensions are possible, which means your application can explicitly ask for some feature that it knows to exist in selected persistence. For instance, the code may look like this:

if ($persistnce->supports('multi-row-increment')) {
$multi_record_set->incr('a');
} else {
foreach($record in $multi_record_set) {
$record->incr('a');
}
}

More on vendor-specific features later. Next, lets look if DAL can store data in Array. Although this may seem unnecessary, it is actually quite invaluable feature for Unit tests. If you can specify “database state” through an Array instead of importing it into your database, it can be a significant improvement in performance and clarity of your test scripts.

Ability to support RestAPI as persistence is another important factor which is very important for your modern microservices-based app. Your frontend-PHP may have to switch from using SQL directly to using middleware through a RestAPI. Will your DAL be able to accommodate such change without application refactoring?

Using JSON string/resource as persistence is quite useful, because it enables “nested” and “contained” objects, for instance you may load single record representing “UserBasket” from your database which will then contain list of added items serialized as JSON or stored as array data by the database server within your record. With the popularity of JSON-object NoSQL databases this is a major requirement.

Finally, with all the different persistences, DAL should be able to re-use your Model(or Entity) definition across multiple persistnces. Can you load your “Basket” contents from SQL, Array or MemCache without creating new entity class for each one?


Basic Single-record Operations

A bread and butter of ORMs is ability to load individual records, work with them and then save them back into persistence. Although a logical Record seem to map into a Record inside your database, the job of DALs is to enable various transformations.

Here is example of RedBeanPHP code:

$book = R::dispense( 'book' );
$book->rating = 10;
$id = R::store( $book );

Majority of the business logic code will happen with the $book record. How far can $book structure distance itself from the database structure? If I rename field “rating” into “book_rating” in SQL will my business logic code have to change too? What if I want to move “book_rating” into a different table? Can this all be done transparently?

RedBeanPHP by design does not support any advonced mapping at all, it is just syntactic sugar around your SQL table. A more powerful DAL will allow you to define various ways to re-map things, which is very important for safe database refactoring in larger projects.

Table Name Mapping is ability to link Object/Class representing your Entity with physical table or collection inside a database. With this, you can change name of your table in the database without affecting your application code. Similar type of mapping is needed at a field level. If you rename firstname to first_name inside your database, how many lines of code changes in your PHP app? Hopefully just one.

Since we talk about mapping, sometimes a physical field of SQL can be substituted by expression or a sub-query. For example, your “BasketItem” may have “cost” changed between expression “(qty*price)” and a physical field. Will your DAL be able to facilitate such a change transparently for the rest of the application? What about sub-query? Your “Basket” will have field “total” which may either be expressed as sum(cost) or be a physical field. Will decision to change how “total” calculates in your database affect your application code in more than a single place?

Type mapping allows user of a DAL library not to be concerned about how to handle DateTime format. In PHP we use DateTime for storing Date, Time and Timestamps which must be mapped into native time storage transparently by DAL, while handling timezones, daylight saving time and more:

$basket->created = \DateTime();

Another data type that DAL must handle is “money” or “currency”. In many cases applications will want to describe their own formats with a specific routine for storing and loading values of those user-defined types. Actually there are two ways to implement this. This can be either implemented within a Entity itself (e.g. setCreated(..)) in which case the implementation is not re-usable across multiple Models. The other, more preferred approach is by using mutator or typecasting, which enable global use of our new type.

When we load and store data, sometimes we would want a certain field/property to be stored inside a different table (or collection). Mapping field to related Entity is not same as Mapping field to related table, because Entity is defined at “Domain Model” while table is a persistence detail. Next example defines field country as a mapping to “country_id->name”. The class Country() can further map the “name” field into something else. Here is how Agile Data imports fields on Domain Model level:

$user->hasOne('country_id', new Country())
->addField('country', 'name');

It’s more commonly for DALs to map fields into a physical table/field or only support “calculated” fields in PHP (which qualify only as a Partial support).

Entity’s field can also be mapped into a sub-query of related entity. Again, this is different to mapping to a physical resource, because technically aggregate functions and aggregate field can abstract complexity:

$basket->hasMany('Items', new BasketItem())
->addField('total', ['aggregate'=>'sum', 'field'=>'cost']);

A single Model/Entity can contain fields from multiple physical tables (or collections) if DAL can define entity to join multiple tables. This is also called “associations”. The important requirement with this type of mapping is to be able to support basic operations, so if you add a new record, DAL must automatically distribute it into 2, 3 or more physical tables and link them up together as per your database logic.


Criteria, Scope, Condition

So far I’ve looked into property/field mapping, but there is also row-based mapping. The best example of this is ‘soft-delete’ a technique where instead of physically deleting a record, it will be flagged as deleted. DAL implements this in such a way where deleted records become ‘inaccessible’ by the business logic:

$book->load(10); // record exist with "deleted=1"
// throws exception NotFound (404)

But there are more uses for scoping. For instance if your application works with multiple user data, you may want to scope your Models to only be able to access records of a currently logged-in user.

$order = new Order($db);
$order->load(10); // exist, but belongs to other user
// throw exception NotFound (404)

Despite the fact that many DALs support soft-delete, very few would support user-access scoping. It becomes part of business logic to verify if “Order” belongs to currently logged-in user and therefore a poteential for a human error.

There are two ways how Model-level criteria (or scope) is implemented. It can be set statically by a constructor of a model (which works with soft-delete) or it can be done dynamically when Model is associated with persistence (which can enforce “current-user access” restriction).

Defining global criteria inside persistence is a valid and a very powerful pattern. This way your entire application can access database only as a specific user only.

Next area to look into is enforceability of criterias. DAL may restrict your application from accessing records of another user, but what about adding new record?

class LoggedUserOrder extends Order {
function init() {
parent::init();
$this->addCondition('user_id', $this->app->user->id);
}
}
// elsewhere:
$order = new LoggedUserOrder($db);
$order->load(10); // record exist, but belong to other user.
// throw exception
// or 
$order['user_id'] = 10;
$order->save(); // not the ID of user that is currently logged
// throw exception

Can your DAL apply restrictions in such a way so that Business Logic will be prevented from loading, saving, adding or deleting records that fall outside of the scope? If yes, I call that criteria compliance.

Domain-level criteria means that the field inside condition is not a physical field but rather a “domain-model” field. Using our cost that is defind using expression, here is example:

$basket_item->addCondition('cost', '>', 10);
$basket_item['qty'] = 2;
$basket_item['price'] = 3;
$basket_item->save();
// Throws exception, does not meet criteria compliance

Technically field that is used inside scope criteria could be mapped into anything that DAL supports in the previous section.

Further from here, as a vendor-specific function of a SQL expression, can we use that as a scope condition?

$user->addCondition('[name] = [surname]');
$user->load(10); // user exist, but his name does not match surname
// throw exception.

Finally, dynamic criteria would be “Yes” if it can be applied dynamically, when you start using your model/entity by relying on some application state (e.g. currently logged-in user) or even defined inside your busineess logic, like I was using in my last few examples. (as opposed to defining it statically in your YAML).


Query Building

Quite often, even though your application uses cross-persistence DAL, you know for certain that you are dealing with SQL so you want to make use of some of it’s wonderful features, such as multi-row update:

ItemTable::where('item_type_id', '=', 1)
->update(['colour' => 'black']);

It this code method where() responds with a Query object, which can be manipulated, but it will emit SQL query at the end. Even though this can cause problems (for example bypass validaton on “colour”), most modern ORMs will give user ability to make use of Query builder which originates from the Model.

The benefit of emitting query from a Model itself has some benefits. For one, it implies name of the table. It may also use field mapping inside arbitrary raw expressions:

$query = $basket_item->action('update');
$query->set('cached_cost', $query->expr('[cost]*1.2'));
$query->update();

This approach is used by Agile Data to combine power of SQL expressions with pattern of Persistence separation.

A similar code for Eloquent also supports raw expression although it cannot reference domain model field in a same way as Agile Data.

ItemTable::withUserCriteria($user_id)
->update([
"cached_cost" => DB::raw("`cost`*1.2"),
]);

Further looking into nested queries — can query emitted by one model be used inside another as a nested query?

$cost_expr = $basket_item->action('field', 'cost');
$query = $stats->action('update');
$query->where('id', 123);
$query->set('lowest_cost', $stat->expr(
'min([])',
[$cost_expr]
));
$field->set('highest_cost', $stat->expr(
'max([])',
[$cost_expr]
));
$query->update();

This code, expressed in Domain Logic instructs DAL (in this case — Agile Data) to calculate highest and lowest cost from $basket_item and store them into stats model — all in one SQL query.

Next — what parts of query can be modified by a Query Builder: “field”, “where”, “limit”, “order”. SQL language has extensions. Will query builder allow you to set partition property or another vendor-specific attribute?

In my last example I used action('update') which emits a different type of query. Can Model emit update, delete and even insert queries? What about query types that may be extensions of a specific SQL vendor such as truncate or replace?

There are slight dip into schema-related features such as ALTER or DESCRIBE. Although those may not be needed in regular operations, does DAL provide similar abstraction for table alterations? In other words — migration aside, can I create a query through a DAL which would change type of my SQL table column? This can be useful if my database have to change from inside business logic. (e.g. create a new table for a new user).

The next and quite challenging feature is a full support of “OR” conditions. While “AND” is additive and you can write code like this:

$query->where('gender', 'M');
$query->where('age', '<', 20);

if you want to express the same thing using OR query, it becomes difficult. There are two challenges with the OR queries.

  1. Transposition. If you are to re-arrange the order in which you have applied “OR” conditions, will this affect scope?
  2. Nested. What if you want to include a or (b and c) can this be achieved?

There are two ways to implement full support of OR conditions. First is by expressions:

$query->where('[gender]=[] or [age]<[]', ['M', 20]);

The challenge here is to be able to specify Domain Model fields and inject values. The other option is the support for nested “or” expression objects:

$q->where(
$q
->orExpr()
->where('a', 1)
->where('b', 1)
->where(
$q->andExpr()
->where('a', 2)
->where('b', 2)
)
);

As long as application of OR conditions can be transposed and they can be nested, it qualifies as a “Yes”.

The final criteria — use of domain logic in multi-record operation is ability to inherit mappings and Condition defined in the model down to the emitted Query. Assuming that we use “scoping” to restrict access of DAL to only those records that are owned by currently logged user, will this code comply with the rule:

$query = $basket_item->action('update');
$query->set('cached_cost', $query->expr('[cost]*1.2'));
$query->update();
   // only updates record of currently-logged user

Data Fetching

Ability to retrieve data from the database is a classy feature. Suppose you want to export data into CSV, you would need your data as 2-dimensional array. Or, perhaps you want to have id=>row[] as associative array?

As the data comes from the database, it may reed some type-casting. For instance “SQL” stores boolean values as 1/0, but in PHP they are expressed as true and false. If DAL supports typecasting, then you will get native PHP types inside your array.

As we are dealing with large data amounts here, ability for DAL to only query for the fields that you need, is important.

If you were able to export data like this, would there be a similar action of Importing data where you can feed exactly same 2-dimensional array and have your data populated in identical way? Finally, if you do perform export / import like this, will it respect all the criteria and mapping rules which you have defined previously on the Domain Level?

Next I’m going to look at the ability to fetch individual horizontal row of data. What about fetching one column only? How about only fetching a single value of a specific row and a specific column? The important requirement for DAL here is to only query for the information you need. Fetching all the data then filtering one value does not qualify.

Iterator is another important requirement for DAL, fetch data row-by-row and yield it so that other code can pick it up without storing any extra results in the memory. Finally, all the persistence mapping rules are good, but if you seek performance, can you bypass the mapping if you wanted?


Single-record operations

The nature of ORMs is that data will reside inside a PHP object at least some time before it’s sent back into persistence. Let’s review how data is loaded and stored.

The most fundamental requirement is to load the data by ID of that record. This implies that DAL knows which field you consider “id” and that you have means to map that ID field into a different column, such as “item_id”. Additionally DAL must support numeric and alphanumeric IDs.

Other option, is to load by other field. In this case “id” will be extracted from the loaded row anyway and would be used to save the data back to avoid overwriting multiple records.

In some cases, you may have a table which does not have a primary key. Can DAL still operate with such a table? Can DAL operate in read-only mode where it can’t change individual records but can still be used for iterating?

Since some models may have large number of fields, is it possible for business logic to specify which fields must be loaded? This implies that DAL is smart enough to recognize that some fields are “important” always and load them anyway — for example “id” field must be loaded at all times, or we wouldn’t be able to save changes back.

When changing fields in memory and then saving the back, DAL should recognize which fields have changed values (dirty fields) and only update those. This minimize database traffic and helps you not to overwrite other fields.

Final feature in this seciton is guarded fields. This prevents business logic from accidentally setting “important” fields such as user_id or is_admin or id by a code like this:

$model->set($_GET);

OOP Patterns (Hierarchy)

Object Oriented principle and specifically “inheritance” is something most developers are familiar with. But conventional SQL databases are not hierarchical. By mapping tables into Classes, there is an opportunity to supplement your database with OOP features.

Here is example:

class Admin extends User {
function() {
parent::init();
        $this->addField('admin_level');
        $this->addCondition('is_admin', true);
}
}

Here a new Model “Admin” will inherit all the field definitions and table associations from model “User” but will change it in 2 ways:

  • Scope of Admin is tighter, so it will only load records where is_admin=1
  • Admin model has access to one extra field: admin_level.

Having Admin model can be very helpful in the business logic, since it’s much harder to make a mistake that may result in security vulnerability if you forget to apply “is_admin” condition.

The next pattern is called Disjoint Subtypes, which can be used to extend your database capabilities. Here is example:

class Document extends Model {
public $table='document';
function init() {
parent::init();
        $this->addField('ref_no');
$this->addField('type', ['enum'=>['invoice', 'payment']]);
}
}

Nice, now you can work with documents, but what if depending on the “type” you want different rules to be applied? Those are “sub-types” of your Document and can be defined like this:

class Payment extends Document {
function init() {
parent::init();
        $this->addCondition('type', 'payment');
$j = $this->join('payment.document_id');
$j->addField('payment_type', ['enum'=>['cash', 'card']]);
}
}

Next, if we use “Payment”, it will inherit fields from “Document” but will add more fields. This time, however, we do not wish to modify structure of “document” table, so we join it with supplimentary table “payment” and store “payment_type” field there.

If DAL support disjoin subtypes pattern, it can be applied again and again on top of your existing data structure and objects without modifying them, which is often a requirement in large projects.

The final pattern is extending Model with table substitution, which can be used for added security:

class Admin extends User {
public $table='admin';
}

In this scenario a “Admin” will be identical to User in many ways, except that it will store data in a different table. This pattern, once again, can be used to avoid modification of “un-toucheable” code or tables.


Relations and References

This is a very exciting area for ORMs, because it introduces a lot of elegancy and much more intuitive code compared to SQL queries. I will be using the following scenario:

  • User
  • Basket
  • BasketItem
  • Item

Clearly, those business entities are related to each-other in some ways. User can have only one basket and basket belongs to one user only. Basket can have multiple Items and Items can be placed in multiple Baskets. BasketItem is an intermediate entity designed to connect Basket to Item, but it will also contain “qty” field.

DAL may allow developer to define relationships between entities. Typically that would be “hasOne” and “hasMany”.

When defining User and it’s hasOne(Basket) relation we would also want to specify that “User” entity would store Basket.id in “basket_id” field.

This is different for Basket, because it wouldn’t have “user_id” field, instead it should look for User with basket_id = id when you attempt to traverse it. Traversal strategy defines various ways how relationship needs to be traversed, specifying which fields must match.

Next pattern is about traversing referencs and “pre-fetching” data for related entities before you traverse them. Here is example of eager-loading feature:

foreach(User::with('basket')->where('is_vip', true) as $user) {
echo $user->basket->item_count;
}
// instead of n+1 without eager loading, will only perform 2 queries
// but may use more memory, depending on implementation

The other alternative to getting baskets for all users is Many-to-many traversal:

$user = new User($db);
foreach(
$user->addCondition('is_vip', true)->ref('basket_id') as $basket) {
echo $basket->item_count;
}
// will execute 1 query

In this scenario, we do not load individual $user value, instead we say that we want to get many baskets of many users.

The final form is a Deep traversal which can give us names of the items that specific user user have in their basket:

$user = new User($db);
$user->withID(10);
foreach($user
->ref('basket_id')
->ref('BasketItem')
->ref('item_id') as $item
) {
echo $item->name;
}
// will execute 1 query

Ability to traverse multiple references without pulling “id”s and feeding them as condition into a related entity, but rather relying on sub-query expression significantly saves number of executed SQL queries by DAL.

The final requirement for DAL is to still be able to execute above traversal code even if “BasketItem” is persisted outside of SQL. In practical scenario, you may want to use Redis cache to store basket contents for a busy site in order to minimize number of SQL database changes.

DAL should allow you to use the same code without ever exposing the fact that in order to arrive at the list of $item it had to jump over to Redis and then back into SQL. This feature is called cross-persistence traversal.


Cross-persistence joins, aggregation, reports

Here I describe some of the most advanced features DAL could implement and their practical examples. I am not aware of any DAL that can handle these requirements fully.

To explain Cross-persistence join imagine a site that implements a “message board”. On the Index page you want to list recently posted messages and their authors. Next to the name of the author, you want to indicate when author was “last seen online”.

Technically — you will use your main SQL persistence to store Messages, so your query would be pulling data from the SQL. The information about when user was online is stored in Redis. DAL should be able to pull information from both persistences and join them — in this case SQL first, then query only relevant records from Redis, preferably in a single operation.

The other example would be to show most-popular posts, where the popularity statistics is extracted from Redis, and then SQL is queried with a specific IDs to fetch the rest of information.

In order to support Cross-persistence join DAL must do the heavy-lifting and simply provide you with properly-ordered data stream.

Next, lets look at the UNION of existing models. In SQL UNION lets you query data from different tables then join them into one result set. The benefit of this is ability to us multi-threading and prepare data from multiple sources inside database much quicker than when using temporary table while also avoiding any memory or CPU spikes inPHP.

DAL must support ability to invoke UNION while translating all the Domain Model logic into queries. It should let you specify multiple models and combine their output:

$report = new UnionModel();
$report->addNestedModel(new Sale());
$report->addNestedModel(new Expense());
$report->addFields(['date', 'client_id', 'amount']);
$report->join('country', 'country_id')
->addField('country', 'name');
foreach($report as $row) {
.. mixed Sales and Expenses
}

It’s cool to see if Union can also be joined with tables or models.

Aggregation is a next step. Once the data is combined, DAL should be able to aggregate it by grouping on some columns and apply aggregation functions on others.

$aggregate = $report->groupBy(
['date', 'country'],
['amount'=>'sum([])']
);

The result of a grouping would be entirely new Model, it will have no “id” column and will be read-only but still something that can be iterated, contain fields, perform type-casting, etc.

Multi-persistence UNION would allow to simulate UNION in PHP memory if their sub-models are associated with a different persistences. Transparently.

Model to Model join is similar to UNION but instead of joining rows vertically it will join them horizontally. Unlike Joining table for the purpose of storing data, Model-to-Model join combines sub-queries on certain key side-by-side. DAL must implement this join without compromising consistency of each model data.


Behaviours, Hooks and changing default operations

Going back to our basic CRUD operations, developer should be able to affect how DAL is executing them. Ability to specify hooks before/after operation can influence action in various ways:

  • Prevent default operation (delete) and substitute it with another (update).
  • Compliment operation (update) with another one (write audit log)
  • Affect operation (insert) by modifying persistence flags (insert ignore)
  • or Apply (ACL) restrictions based on which fields were modified.

Finally since we are talking about Hooks, it should be possible for Developer to add more hooks, such as “beforeArchive” then use them.


Auditing and Event Sourcing

When user opens your web application or calls API you may keep an Access Log consisting of requests, times and query parameters.

DAL has opportunity to introduce transparent logging of it’s own, registering all the operations that will affect the database. The first requirements is for DAL to enable global audit of all database access for a specified persistance.

If produced log contains enough information to fully re-create your database in either direction: (REDO and UNDO) from the stored old/new values, then it also implements a pattern of Event Sourcing. It’s important for a DAL to handle type conversions properly in order not to loose any values.

Another important quality for log is to track which actions were triggered by business logic and which were triggered as a response to other actions. I’m calling those reactive actions. For example action of deleting “Basket” will recursively delete “BasketItem” records. This may result in multiple entries in the AuditLog (each basket item + basket), but ability for DAL to associate reactive actions with the main one through a hierarchy will help to keep the log clean (by hiding reactive actions).

DAL should allow AuditLog to be stored in any persistence, not only dump CSV into the log file and offer ability to add additional fields into log and even perform logging of custom events. Once log is created, DAL can give you ability to retrieve historical records that correspond to your existing record of a specific model. This gives application user to see “modification history” of their data.

Another use of Auditing is adding columns into your tables such as “created_dts” or “modified_dts” which will be automatically and transparently updated by DAL. Again — through global application and association with application, developer can also record user_id/IP of who performed the database change transparently.

Last type of auditing is creating “revisions” from records. E.g. when you update your “DeliveryAddress”, it creates a new record, archiving the old one but keeping it in the same table. This way other records will relate to a “versioned” historical data and new records will use most recent delivery address.


Integration and Meta Information

DAL knows a lot about your business entities. For example, it knows which table they are stored in. It also knows which fields are there in your business entity.

But it’s possible for DAL to know even more about each individual field. For example, type, caption, serialization settings and other meta-informaiton can be provided for each field by DAL. Doctrine describes this information through annotations while other ORMs use hints and properties. Meta-information is useful in integrations, for example, if you are building generic UI or generic API routines.

DAL may have some UI extensions, such as admin systems, CRUDs or Forms that pick up the field meta-information and use it to create UI out of the box.

Similarly API extensions are libraries that are designed specifically to work with DAL as a gateway by using meta-information and populating API structure with minimum amount of code.


There may be other features/criteria which I have missed that can be evaluated on when comparing Persistence Mappers. If you have any suggestions, I encourage you to contribute to my comparison:

https://socialcompare.com/en/comparison/php-data-access-libraries-orm-activerecord-persistence

but $_SESSION, file or S3 can also be viewed as persistence.

  • Implementation of PHP’s PDO comes as a requirement for most Data Persistence frameworks.
  • cannot store data in Session or Amazon S3, but some DAL go above PDO.