Pragmatic approach to reinventing ORM

It’s been over a year since I posted my first highly debated post on Reddit:

Reinventing the faulty ORM concept.

The information I gathered helped me design and implement an alternative pattern to ORM with many important advantages, and that has reinforced my belief that ORM is faulty. A few words about me:

  • I have actually implemented the proposed alternative in PHP and released it under the MIT license. The resulting framework is increasingly used in production projects by me and by 3rd parties.
  • My initial concept dates back to 2011 and it has gone through multiple iterations and refinements.
  • I believe in progress and choice. Agile Data is much better at expressing database interactions on a high level while ORM implementations are better at… being familiar.
  • As the author, I can’t help being slightly biased :).

In this article, I’d like to look at:

  1. Why is ORM important?
  2. Why is ORM broken?
  3. How does Agile Data differ in approach?
  4. How does Agile Data qualify for production/enterprise use?
  5. Notes on architectural decisions in Agile Data (OOP vs Decoupling)

What significant features ORM brings to your project?

Unlike Java developers, who wholeheartedly accept patterns such as ORM or Event Sourcing, PHP attracts a different developer audience. For many, the goal is to “get the job done”. Quick and simple solutions are preferred over big, enterprise-style architectural designs.

Although I’m not a fan of ORMs, I think they bring some important qualities to applications, so before dismissing the ORM pattern I’ll extract some of its benefits:

  1. A project that uses ORM has better separation of concerns: database; business logic; presentation. In teams, different developers can focus on data loading/storing/caching/optimizing; business logic/unit/integration tests; and presentation (UI and API interfaces).
  2. ORM helps you refactor. Legacy software sometimes has a terrible database design, often because “new features” were added on top of software which must not be touched. After years of maintenance, even renaming a table column becomes a significant refactoring project. Splitting an SQL table into two separate tables or moving it into NoSQL might be impossible within a reasonable budget. With ORM, any of that can be done easily without major risk to the project.
  3. The existence of ORM introduces “order” into a project, which makes it possible for add-ons and extensions to add features such as audit or soft-delete in a plug-and-play way.

So for any Database Access Library to provide a comparable alternative to ORM, it should at least:

  • be database agnostic;
  • separate persistence logic;
  • be systematic and extensible;
  • allow database refactoring;
  • simplify use and help prevent errors.

Many PHP alternatives to ORM that I’ve looked at do not even offer these basic features. For Agile Data to qualify, I had to make sure that it ticks all the mandatory boxes first.


Why ORM is broken?

I believe there are 4 major areas in which all ORM implementations have significant problems:

  1. ORMs suck at aggregating or building report data. Many projects that have to follow a very strict dependence on ORM would still use stored procedures or raw queries to extract aggregated report data. DQL is also only a partial solution to this problem.
  2. ORM levels the ground for the databases. If your database has a unique feature, such as the ability to perform full-text or geo search, join tables or use a built-in expression language, ORM does not offer ways to use those features. There are workarounds, such as “query” builders that integrate into the ORM, but they disregard isolation of persistence logic. They are also not part of the ORM itself and cannot be fed back into it.
  3. Performance of ORMs is quite poor, not because of technical implementation but because of logical design. Problems such as (n+1) queries and huge arrays of “id”s are waiting to blow up under the hood of your application and require you to tinker around to find and fix them. Similarly, ORM does NOT help to reduce the number of queries or the amount of data sent to or retrieved from the database.
  4. Top-level integration of ORM is quite bad. I haven’t seen a piece of PHP code that would work well with an arbitrary ORM model / entity. For example, a sign-up form implementation in a web app may have to spell out all the fields and even work hard on providing data for drop-downs / auto-complete callbacks.

The problems that I have listed cannot simply be “fixed”. The author of a pattern chooses which concerns are abstracted away from the developer and where the developer must pay close attention.

By design, ORM tries to abstract “database” operations with code like this:

function getBasketTotal(User $user) {
    $basket = $user->getBasket();
    $total = 0;
    foreach ($basket->getItems() as $item) {
        $total += $item->cost;
    }
    return $total;
}

This code would work fine with a zero-latency database, but in reality queries take time, and fetching and processing unnecessary data takes its toll on the CPU and memory of your application.

For comparison, Agile Data defines the pattern differently, enabling code like this:

function getBasketTotal(User $user) {
    return $user
        ->ref('basket')
        ->ref('Items')
        ->action('sum', 'cost')
        ->getOne();
}

Capable databases will implement ‘sum’ aggregation over a specified field while making sure not to drop you out of Domain Model logic.
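To make the difference concrete, here is a minimal sketch that does not use Agile Data at all. FakeItemTable is a hypothetical in-memory stand-in for a database table, and its only job is to count how many rows leave the "database" under each strategy. The class, its methods and the row layout are my assumptions for illustration.

```php
<?php
// Hypothetical in-memory stand-in for a database table. It only counts
// how many rows are handed to the application, so two strategies can be compared.
final class FakeItemTable
{
    public int $rowsTransferred = 0;

    /** @param list<array{basket_id: int, cost: float}> $rows */
    public function __construct(private array $rows)
    {
    }

    /** Entity-style access: every matching row is sent to the application. */
    public function fetchItems(int $basketId): array
    {
        $items = array_values(array_filter(
            $this->rows,
            fn (array $row): bool => $row['basket_id'] === $basketId
        ));
        $this->rowsTransferred += count($items);

        return $items;
    }

    /** Aggregate-style access: the "database" sums and returns one value. */
    public function sumCost(int $basketId): float
    {
        $total = 0.0;
        foreach ($this->rows as $row) {
            if ($row['basket_id'] === $basketId) {
                $total += $row['cost'];
            }
        }
        $this->rowsTransferred += 1; // a single scalar result row

        return $total;
    }
}

/** Sums in PHP, the way the entity loop does. */
function totalByLoop(FakeItemTable $table, int $basketId): float
{
    $total = 0.0;
    foreach ($table->fetchItems($basketId) as $item) {
        $total += $item['cost'];
    }

    return $total;
}
```

Both strategies return the same total, but the loop transfers one row per item while the aggregate transfers a single row however large the basket is. With a real database the gap also includes network latency and the memory needed to hydrate each entity.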

A special mention to Query-builders:

$user->getBasket()->sum('cost');

Although this code looks similar, it’s defined through a “Query Builder” which, for example, implies that ‘cost’ is a physical field inside a table. Agile Data does not require the developer to know the specifics of the ‘cost’ field, which can be defined through a “join” or an “expression”.
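The distinction can be sketched with a hypothetical field map that translates a logical field name into whatever SQL fragment backs it. This is not Agile Data's API: buildSumQuery, the map layout and the price and quantity columns are assumptions made for the example, and the fragments are trusted, developer-defined strings rather than user input.

```php
<?php
/**
 * Build a SUM query for a logical field.
 *
 * @param array<string, string> $fieldMap logical field name => SQL fragment
 */
function buildSumQuery(string $table, array $fieldMap, string $field): string
{
    if (!isset($fieldMap[$field])) {
        throw new InvalidArgumentException("Unknown field: {$field}");
    }

    // The caller only names the logical field; the map decides what it means.
    return sprintf('SELECT SUM(%s) FROM %s', $fieldMap[$field], $table);
}

// "cost" stored as a physical column.
$physical = ['cost' => 'cost'];

// "cost" defined as an expression over two other columns.
$expression = ['cost' => 'price * quantity'];

echo buildSumQuery('item', $physical, 'cost'), PHP_EOL;
echo buildSumQuery('item', $expression, 'cost'), PHP_EOL;
```

The calling code is identical in both cases. Only the field definition changes, and that is the property a plain query builder does not give you.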


How Agile Data uses different patterns to avoid ORM problems?

The entire system of Agile Data works together to offer a great database abstraction experience, but when I’m asked for the major differences, I offer this list:

  1. Agile Data introduces a Rich Field object. This allows Model fields to be smart and to perform mapping to expressions, type-casting, join-mapping and selective field loading, and to support dirty field values. On top of this, each field contains meta-information, making it possible for developers to create generic routines that work with any Model regardless of persistence or defined fields.
  2. The Model class in Agile Data is very different. A standard ORM entity object maps to a single record. A Model object maps to a set of records. This natively enables many-to-many referencing, multi-record updates, aggregation and expressions through references.
  3. Actions, although similar to “Query Builder” objects, can be re-supplied back into ORM. This allows, for example, automatically creating sub-queries for INSERT records to look up an ID by a non-primary key.
  4. The simple and straightforward design of Agile Data is not hampered by compatibility or legacy decisions and can be easily learned by someone who is new to programming. What’s more important is that many students who learn Agile Data do so without even knowing SQL.
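As an illustration of the first point, here is a minimal sketch of a generic routine driven purely by field meta-information. SketchField is a hypothetical, heavily simplified stand-in and not the Agile Data field class. The type names and the mapping to input widgets are assumptions as well.

```php
<?php
// Hypothetical, simplified field object carrying meta-information.
final class SketchField
{
    public function __construct(
        public string $name,
        public string $type = 'string',
        public ?string $caption = null,
        public bool $editable = true
    ) {
    }
}

/**
 * Generic routine: derive form inputs from any list of fields,
 * without knowing which model they belong to.
 *
 * @param list<SketchField> $fields
 * @return list<array{name: string, label: string, input: string}>
 */
function describeForm(array $fields): array
{
    $inputs = ['boolean' => 'checkbox', 'date' => 'date', 'integer' => 'number'];
    $form = [];

    foreach ($fields as $field) {
        if (!$field->editable) {
            continue; // e.g. primary keys never appear in the form
        }
        $form[] = [
            'name'  => $field->name,
            // Fall back to a caption derived from the field name.
            'label' => $field->caption ?? ucwords(str_replace('_', ' ', $field->name)),
            'input' => $inputs[$field->type] ?? 'text',
        ];
    }

    return $form;
}
```

Because the routine reads only meta-information, the same function can describe a sign-up form, a CRUD editor or an API schema for any model that exposes its fields this way.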

As an example, suppose we want to define a relation between the “Contact” and “Country” tables. The syntax is very simple:

$contact->hasOne('country_id', new Country())
    ->withTitle()->addField('country_code', 'code');

Many things happen under the hood:

  • Contact now has the fields country and country_code, which will be automatically retrieved from the Country table without extra queries or cache.
  • Field mapping takes place.
  • Field country_code can be specified instead of country_id when creating a new contact. This will not result in additional queries.
  • Additional logical rules (such as soft-delete) will be applied to the country transparently.

In the end, the developer does not need to add any “hacks” to make this work:

$contact->insert(['name'=>'John', 'country_code'=>'UK']);

This transparent implementation is possible thanks to the Smart Field class and DataSet mapping in the Model.
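One way such an insert can avoid an extra round trip is to resolve the foreign key inside the INSERT statement itself. The sketch below shows that idea only. It is not the SQL that Agile Data actually emits; the buildInsertWithLookups function, the lookup structure and the table and column names are assumptions.

```php
<?php
/**
 * Build a parameterised INSERT in which some values are resolved by a
 * sub-query rather than by a separate lookup query.
 *
 * @param array<string, mixed> $values plain column => value pairs
 * @param array<string, array{table: string, match: string, value: mixed}> $lookups
 *        target column => where to find its id (hypothetical structure)
 * @return array{0: string, 1: array<string, mixed>} SQL text and bound parameters
 */
function buildInsertWithLookups(string $table, array $values, array $lookups): array
{
    $columns = [];
    $placeholders = [];
    $params = [];

    foreach ($values as $column => $value) {
        $columns[] = $column;
        $placeholders[] = ':' . $column;
        $params[':' . $column] = $value;
    }

    foreach ($lookups as $column => $lookup) {
        $columns[] = $column;
        // The id is looked up by a non-primary key inside the same statement.
        $placeholders[] = sprintf(
            '(SELECT id FROM %s WHERE %s = :%s_lookup)',
            $lookup['table'],
            $lookup['match'],
            $column
        );
        $params[':' . $column . '_lookup'] = $lookup['value'];
    }

    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        implode(', ', $columns),
        implode(', ', $placeholders)
    );

    return [$sql, $params];
}

[$sql, $params] = buildInsertWithLookups(
    'contact',
    ['name' => 'John'],
    ['country_id' => ['table' => 'country', 'match' => 'code', 'value' => 'UK']]
);
echo $sql, PHP_EOL;
```

The returned SQL and parameters could be passed to PDO::prepare() and PDOStatement::execute(). The country is resolved by the database in the same statement, so the application never needs to fetch its id first.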


Using Agile Data in heavy production apps

Agile Data 1.0 was released in 2016. Subsequently, version 1.1 added strict type-casting support, and since then no major change has been made to Agile Data. All new additions and enhancements are backwards compatible and in most cases require no refactoring.

Instead, I have focused on coding add-ons:

  1. Audit: https://github.com/atk4/audit. Unlike similar add-ons for ORMs, my implementation is easy to integrate with the entire project and enables “Event Sourcing” support and “UNDO” and “REDO” functions that can be applied to any database operation, even if it has caused nested operations. Audit is entirely transparent, works with any persistence and can be tweaked or extended. Available under the MIT license.
  2. Report: https://github.com/atk4/report. It has no similar implementation for ORMs. This is your Domain Model aggregation engine. Perform grouping, define aggregation functions, use UNION to join multiple DataSets into a new DataSet. As a result, your report data can be generated without relying on SQL or a semi-SQL language, simply by expressing aggregation actions. It is also important that the “report” module carries out calculation logic on the server and not in PHP. Available under the MIT license.
  3. Chart: https://github.com/atk4/chart. This illustrates how a 3rd party JavaScript chart library can be integrated with arbitrary data. Combining this with the report add-on gives you a very handy solution for your dashboards. Available under the MIT license.
  4. For interactivity, Agile UI comes with tons of widgets such as CRUD, Table or Form that can work with an arbitrary model and are entirely database-agnostic. Also available under the MIT license.

There are more add-ons in the works. For instance, I wrote in my previous article about “password field encryption”.


But what about decoupling?

Frameworks follow different practices and they do so for a reason. If you are a Symfony developer, then you design and create almost everything through interfaces. Agile Toolkit relies on more fundamental patterns of object-oriented programming.

So instead of having “Audit” implement a certain interface, inheritance is used. This is by design and not because of poor coding decisions. I have made all the possible considerations and designed architecture with the highest efficiency, simplicity and extensibility as a goal.

I often hear developers raise a concern that this design would make their software “dependent on Agile Data”. Some other developers said they wanted to use the “persistence” part but didn’t want their entities to inherit the Model class.

While I could go into details and pros and cons, this boils down to a question of choice. There are a lot of brilliant PHP developers who keep adding new decoupled extensions for Doctrine, but for the goals of Agile Data internal decoupling simply adds no benefit. In the end, I didn’t want to sacrifice the experience and comfort of existing developers just to win an argument with those who are not going to use Agile Data anyway.

The Agile Data library has only 2 dependencies and can be easily plugged into any framework/project.

I am always open to suggestions for Agile Data. My choice of the MIT license ensures that the framework will remain available and growing. The team that I have put together and trained offers commercial consultancy for Agile Data.

I hope that my work will be useful to PHP developers and will continue to grow and evolve into an even better product.