Model for large applications

In the previous article I wrote about the problems of the model in large applications. In the following parts I will describe how to solve these problems and what such a model could look like.

This model takes inspiration from its bigger and older brothers, such as the Entity Data Model (.NET) and the Java Persistence API.

Principles

The basic principles of this model come from widely accepted theories and patterns. Its foundation stones are the SOLID principles, Domain-Driven Design (DDD) and basic design patterns such as Factory, Strategy, Adapter and others.

This model is suitable for large, scalable applications, but it is not a good fit for small ones. For those I suggest using Active Record as the easiest option.

Structure of model

Entity

An entity is a domain object that is defined by its identity, not by its attributes. For example, a User is defined by its identity: if a user changes their name, they are still the same person, and two persons with the same name are not the same person. The opposite of this is a Value Object, which is described below.

Domain Object

An entity is the basic domain object, so it carries important data and contains the logic to manipulate it. An entity can manipulate only its own properties or the entities linked to it through relations. Nothing else.

Every domain object must live in a bounded context, which can be represented by a namespace and/or a micro-service.

For example, we have an entity Article and other entities of type Comment. Article can have a method publish() to modify its publish date and a method deleteAllComments() to manipulate its child entities. But don’t forget that the deletion itself has to be done by each Comment.

Creation

Entities must have a public constructor without arguments. Passing an array of data to the constructor is not allowed. This is because a very good practice tells us that “only arguments without which the object cannot exist should be passed to the constructor”, and an entity can exist without data.

In the schema we can also see the Entity Factory, which should be a global object. It is there to create entities with data the right way. For example, EntityFactory can have a method create which receives two arguments: the first is the class name of the entity and the second is the data. This method instantiates the entity and fills it with data. It can also use the Object Repository; this depends on the implementation.
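A minimal Python sketch of such a factory could look like this. The Article entity, the set_<name> setter convention and the error handling are assumptions made for illustration, and Value Objects are left out for brevity.

```python
class Article:
    """Illustrative entity with a public constructor without arguments."""

    def __init__(self):
        self._title = ""
        self._body = ""

    def get_title(self):
        return self._title

    def set_title(self, title):
        self._title = str(title)

    def get_body(self):
        return self._body

    def set_body(self, body):
        self._body = str(body)


class EntityFactory:
    """Creates an entity and fills it through its setters (assumed convention)."""

    def create(self, entity_class, data):
        entity = entity_class()  # the entity can exist without data
        for name, value in data.items():
            setter = getattr(entity, f"set_{name}", None)
            if not callable(setter):
                raise AttributeError(
                    f"{entity_class.__name__} has no setter for '{name}'"
                )
            setter(value)
        return entity
```

Because the data goes through the setters, every rule that applies to a manually filled entity also applies to one created by the factory.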

Properties

The properties of an entity must be private or protected and accessible only through getters and setters. This is necessary for validating the correct type. It is also required by the Open/Closed Principle from SOLID.

All properties of an entity must be represented by Value Objects, because validation, creation, import and export should be driven by these objects. For example, we have an entity User with an email property. The entity knows only that the property is an instance of the Email Value Object. Checking that it is in a valid format is the job of the object itself. This drastically simplifies entities, correctly follows the Single Responsibility Principle and helps keep the code DRY. It also helps if, for example, several entities have a date property, because they can all use the same Date Value Object.

Null is not allowed as a property value. If a property can be null, then the Value Object must support it. In many cases you will have two types, such as Date and NullableDate.

Exporting and importing properties has to be driven by interfaces. For example, the Date Value Object mentioned above should implement the Importable and Exportable interfaces. The import method should be able to handle all possible formats. The export method can receive the requested type as an argument, but there should be a default.
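As a rough sketch, the two interfaces and a Date Value Object could look like this in Python. The method names import_value and export, the set of accepted input formats and the default export type are assumptions, not part of the model itself.

```python
from abc import ABC, abstractmethod
from datetime import date, datetime, timezone


class Importable(ABC):
    @classmethod
    @abstractmethod
    def import_value(cls, raw):
        """Build the Value Object from any supported raw format."""


class Exportable(ABC):
    @abstractmethod
    def export(self, as_type=str):
        """Return the value in the requested type (string by default)."""


class Date(Importable, Exportable):
    def __init__(self, value):
        self._value = value

    @classmethod
    def import_value(cls, raw):
        if isinstance(raw, Date):
            return cls(raw._value)
        if isinstance(raw, datetime):  # checked before date: datetime subclasses date
            return cls(raw.date())
        if isinstance(raw, date):
            return cls(raw)
        if isinstance(raw, str):  # ISO 8601 date such as "2024-01-31"
            return cls(date.fromisoformat(raw))
        if isinstance(raw, (int, float)):  # Unix timestamp, interpreted as UTC
            return cls(datetime.fromtimestamp(raw, tz=timezone.utc).date())
        raise TypeError(f"cannot import a Date from {type(raw).__name__}")

    def export(self, as_type=str):
        if as_type is str:
            return self._value.isoformat()
        if as_type is date:
            return self._value
        raise TypeError(f"cannot export a Date as {as_type.__name__}")
```

The entity never sees the raw formats; it only asks the Value Object to import or export itself.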

Identity

Because an entity, unlike a value object, must have an identity, there has to be a method for getting it, for example getIdentity, which returns the value object of some id property.

In scalable applications we cannot depend on a sequence or anything similar, because it is hard to synchronize between multiple instances. This means that we have to use something like a UUID as the identity.
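The following Python sketch shows one possible shape of this, using random (version 4) UUIDs from the standard library. The Identity and User classes and their method names are illustrative assumptions.

```python
import uuid


class Identity:
    """Value Object wrapping a UUID; a new one is generated by default."""

    def __init__(self, value=None):
        self._value = uuid.uuid4() if value is None else uuid.UUID(str(value))

    def __eq__(self, other):
        return isinstance(other, Identity) and self._value == other._value

    def __hash__(self):
        return hash(self._value)

    def __str__(self):
        return str(self._value)


class User:
    def __init__(self):
        self._id = Identity()  # created in the application, not by the storage
        self._name = ""

    def get_identity(self):
        return self._id

    def get_name(self):
        return self._name

    def set_name(self, name):
        self._name = str(name)

    # Entities are compared by identity, never by attributes.
    def __eq__(self, other):
        return isinstance(other, User) and self._id == other.get_identity()

    def __hash__(self):
        return hash(self._id)
```

Two users with the same name get different identities and are therefore not equal, while a renamed user stays equal to itself.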

Changes from outside of application

Nothing outside the application may change the properties of objects. It is therefore forbidden to set anything like an ID or a timestamp in the database. This must be done by a default value in the Value Object.

If it is necessary to use database stored procedures, the result must be returned to the application, which then stores it. This is because the database must not know where to store it.

Relations

Basically, relations are just another type of property. In the previously mentioned example, an Article has many Comments. The Comment entity has the properties ArticleId and Article, the latter of type belongsTo. Article has one property, “Comments”, which is of type hasMany. All relation properties must be nullable and should be lazy loaded automatically on first use.

All of them should be bidirectional, meaning that the relation definitions should be on both sides. This makes querying much easier.

There also has to be a way to load a relation together with the parent entity when it is requested. This allows the merging (joining) of data to be moved to the database, which reduces the number of queries significantly.

Definitions of relations must be minimal and cannot contain, for example, the reference id column; that has to be handled by a global mask. For example, hasOne receives only one argument, the class name of the remote entity.
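To make the idea of a minimal definition and a global mask more concrete, here is a small Python sketch of a lazy loaded hasMany relation. Everything in it is an assumption made for illustration: the mask "{entity}_id", the in-memory STORE that stands in for a real storage, the QUERY_LOG used to show laziness, and the plain public attributes, which are used only to keep the example short.

```python
import re

REFERENCE_MASK = "{entity}_id"  # hypothetical global mask for reference ids
STORE = {}      # hypothetical in-memory storage: entity class -> list of entities
QUERY_LOG = []  # records every relation lookup, to make lazy loading visible


def reference_name(entity_class):
    """Derive the reference property from the class name, e.g. Article -> article_id."""
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", entity_class.__name__).lower()
    return REFERENCE_MASK.format(entity=snake)


class HasMany:
    """Relation defined only by the remote class; loaded on first access."""

    def __init__(self, remote_class):
        self.remote_class = remote_class

    def __set_name__(self, owner, name):
        self.cache_key = "_loaded_" + name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        if self.cache_key not in instance.__dict__:
            ref = reference_name(owner)
            QUERY_LOG.append((self.remote_class.__name__, ref))
            instance.__dict__[self.cache_key] = [
                entity
                for entity in STORE.get(self.remote_class, [])
                if getattr(entity, ref) == instance.id
            ]
        return instance.__dict__[self.cache_key]


class Comment:
    def __init__(self):
        self.id = None
        self.article_id = None


class Article:
    comments = HasMany(Comment)  # no reference column in the definition

    def __init__(self):
        self.id = None
```

The storage is touched only when article.comments is read for the first time; later reads use the already loaded list.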

Relation types

HasOne [1:1]

The reference id is in the other entity, the same as in HasMany. Its inverse is BelongsTo.

HasMany [1:N]

As with HasOne, the reference id is in the other entity; the difference is that in this relation there can be more than one entity on the other side. Its inverse is also BelongsTo.

BelongsTo [N:1]

The inverse of HasOne and HasMany. The reference id is in the current entity.

BelongsToMany [N:M]

It is used on both sides of a many-to-many relation. In this relation there are multiple reference ids in the middle. When additional data is attached to the relation, it becomes a new entity with two many-to-one relations.

How the joining tables are stored is not strictly defined. When both entities have the same handler, they will mostly be stored in the database. If not, I suggest using a key-value store with a combined key.

Polymorphic Relations

Polymorphic relations are used quite often and entities have to support them. Because it is quite a wide topic, I will describe them in a later article. Until then you can see the basic concept and a PHP example in the Laravel documentation. But be aware of all the arguments these relations can receive there; as we said, in this model most of them are not allowed.

The keywords for these relations are morphTo, morphOne, morphMany, morphToMany and morphedByMany.

Expiration

All entities must have an expiration, and it should be controlled by an Expirable interface. This is very important for optimization and caching.

How components handle this value is their responsibility. A gateway can use it as an Expires header, a cache mapper as the cache expiration, and so on.

State

Every entity must have a state to allow consistent manipulation. There are four states, modeled on the entity lifecycle states in JPA.

  • new — entity is created, but wasn’t stored
  • synced — entity was persisted and has not been modified since then
  • detached — entity was modified since its persistence
  • removed — entity was set as removed

When an entity has been changed, so that its state is “detached”, there should be protected information about which properties have been changed, to prevent store conflicts.
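The state and the list of changed properties can be sketched in Python as follows. The base class Entity, the method names and the decision that the persistence layer calls mark_synced after a successful store are assumptions for illustration.

```python
from enum import Enum


class State(Enum):
    NEW = "new"
    SYNCED = "synced"
    DETACHED = "detached"
    REMOVED = "removed"


class Entity:
    def __init__(self):
        self._state = State.NEW
        self._changed = set()  # names of properties changed since the last sync

    def get_state(self):
        return self._state

    def get_changed_properties(self):
        return frozenset(self._changed)  # read-only view for other components

    def _mark_changed(self, name):
        self._changed.add(name)
        if self._state is State.SYNCED:
            self._state = State.DETACHED

    def mark_synced(self):
        """Assumed to be called by the persistence layer after a successful store."""
        self._changed.clear()
        self._state = State.SYNCED

    def mark_removed(self):
        self._state = State.REMOVED


class Article(Entity):
    def __init__(self):
        super().__init__()
        self._title = ""

    def get_title(self):
        return self._title

    def set_title(self, title):
        if title != self._title:
            self._title = title
            self._mark_changed("title")
```

A new entity stays “new” until it is stored; only an entity that was already synced moves to “detached” when one of its setters changes a value.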

Manager

The Manager is responsible for selecting the correct handler for a given request and for merging the results if the request needs multiple handlers. To do this, the Manager needs a definition in its constructor of which handler handles which entity.

The Manager doesn’t have any CRUD methods or the like; it just forwards these requests to handlers.

Because all requests (even reads) must run in a transaction, it is also responsible for telling the Transaction Manager to open all necessary transactions.

Like all other components that manipulate and carry data, the Manager must be queryable by the Query Language.
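A very reduced Python sketch of the Manager follows. It covers only the single-handler case; merging results from several handlers is left out. The TransactionManager stand-in, the EchoHandler and the query method name are all assumptions for illustration.

```python
from contextlib import contextmanager


class TransactionManager:
    """Hypothetical stand-in that only records which handlers got a transaction."""

    def __init__(self):
        self.opened = []

    @contextmanager
    def transaction(self, handler):
        self.opened.append(handler)
        yield  # a real implementation would commit or roll back around this point


class Manager:
    def __init__(self, handlers, transactions):
        self._handlers = dict(handlers)  # entity class -> handler
        self._transactions = transactions

    def query(self, entity_class, query):
        handler = self._handlers.get(entity_class)
        if handler is None:
            raise LookupError(f"no handler registered for {entity_class.__name__}")
        # Even reads run inside a transaction.
        with self._transactions.transaction(handler):
            return handler.query(query)  # the Manager only forwards the request


class EchoHandler:
    """Illustrative handler that just reports what it received."""

    def __init__(self, name):
        self.name = name

    def query(self, query):
        return f"{self.name} handled {query}"


class Article:
    pass


class Comment:
    pass
```

The Manager itself contains no CRUD logic; its only decisions are which handler to use and which transactions to open.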

Handler

Its job is to build the right queries, which the strategies can use to query mappers. So a handler has methods like get, find, save, etc.

It must have exactly one Read Strategy and one Write Strategy.

A handler can also contain scope aliases such as latest(), which sets the ordering to CreatedAt and the limit to 1.

Read Strategy

It dispatches received read queries to all the mappers it has loaded, in the right order. For example, when we have three mappers (memory, cache, database), it first queries memory, then cache and finally the database.

But this order must be dynamic and must use the mappers’ query cost calculators, as described below.

As part of the result there can be a suggestion for the handler about where it would be good to save the result. For example, when the Read Strategy finds the data only in the last mapper, it can suggest writing the value to the previous ones.

One important thing is that the Read Strategy can have different mappers than the Write Strategy, because some mappers can be read-only or write-only.

Neither the Read Strategy nor the Write Strategy has any query methods of its own; each just forwards the queries it receives to its mappers.
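The dispatching described in this section can be sketched in Python like this. The DictMapper with a fixed cost, the use of a plain key as the query and the shape of the return value (result plus a list of suggested mapper names) are assumptions for illustration.

```python
class DictMapper:
    """Illustrative mapper backed by a dict, with a fixed query cost."""

    def __init__(self, name, cost, data=None):
        self.name = name
        self._cost = cost
        self._data = dict(data or {})

    def cost(self, query):
        return self._cost

    def read(self, query):
        return self._data.get(query)  # None means "not found here"


class ReadStrategy:
    def __init__(self, mappers):
        self._mappers = list(mappers)

    def read(self, query):
        missed = []
        # Cheapest mapper first; the order is recomputed for every query.
        for mapper in sorted(self._mappers, key=lambda m: m.cost(query)):
            result = mapper.read(query)
            if result is not None:
                # Suggest saving the result to the cheaper mappers that missed.
                return result, missed
            missed.append(mapper.name)
        return None, []
```

With a memory, a cache and a database mapper, a value found only in the database comes back together with the suggestion to write it to memory and cache.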

Write Strategy

It solves quite different problems than the Read Strategy (RS), which is why it is separate. Like the RS, it has a bunch of mappers to write to. But all writes should be done asynchronously and in the background. This is possible only because we know that the storage cannot change our objects.

It also doesn’t need to know the costs, because the writing happens in the background.

As part of the request there can be a list of suggested mappers to write to. If the Write Strategy doesn’t have such a mapper, it skips the suggestion.
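One way to sketch the background writes in Python is a thread pool from the standard library. The DictStore mapper, the write signature and the choice of threads (rather than a queue or another process) are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class DictStore:
    """Illustrative write mapper backed by a dict."""

    def __init__(self):
        self.data = {}

    def write(self, key, entity):
        self.data[key] = entity


class WriteStrategy:
    def __init__(self, mappers, workers=2):
        self._mappers = dict(mappers)  # mapper name -> mapper
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def write(self, key, entity, suggestions=None):
        # Without suggestions, write to every mapper this strategy has.
        names = list(self._mappers) if suggestions is None else suggestions
        futures = []
        for name in names:
            mapper = self._mappers.get(name)
            if mapper is None:
                continue  # unknown suggestion: skip it
            futures.append(self._pool.submit(mapper.write, key, entity))
        return futures  # the caller does not have to wait for these

    def close(self):
        self._pool.shutdown(wait=True)
```

The caller gets control back immediately; the returned futures are there only for code that needs to wait for the writes or inspect their errors.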

Mapper

It is there to unify access to storages. The input and output of all mappers must be the same. Under the hood, the query is transformed for the storage by a Transformer. Conflicts are handled by a Conflict Strategy, which ensures that only correct data is written. And so on.

The mapper has to manage all of these things. Taking getByPrimary as an example: first the query keys are translated by the Transformer. Then the mapper queries the storage. Then the result is sent back to the Transformer, which transforms it from raw data into an Entity, and finally the mapper returns the Entity.
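The getByPrimary flow above can be sketched in Python as follows. The dict used as storage, the key format "article:42" and the transformer method names are assumptions for illustration; a real Transformer would create the entity through the Entity Factory.

```python
class Article:
    def __init__(self):
        self._title = ""

    def get_title(self):
        return self._title

    def set_title(self, title):
        self._title = str(title)


class ArticleTransformer:
    """Illustrative transformer for a key-value storage."""

    def to_storage_key(self, entity_class, identity):
        return f"{entity_class.__name__.lower()}:{identity}"

    def to_entity(self, entity_class, raw):
        entity = entity_class()  # stand-in for a call to the Entity Factory
        entity.set_title(raw["title"])
        return entity


class Mapper:
    def __init__(self, storage, transformer):
        self._storage = storage
        self._transformer = transformer

    def get_by_primary(self, entity_class, identity):
        key = self._transformer.to_storage_key(entity_class, identity)  # 1. translate keys
        raw = self._storage.get(key)                                    # 2. query storage
        if raw is None:
            return None
        return self._transformer.to_entity(entity_class, raw)          # 3. raw data -> Entity
```

The caller of the mapper only ever sees entity classes, identities and entities, whatever the storage behind it looks like.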

Query cost calculator

Every mapper must have a method that says how costly it is to run a given query. This is important because different storages have different strengths. A key-value store is fastest on a primary key but slow on other conditions; a search engine is fast on full-text queries, and so on.
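As a toy Python illustration, two mappers could estimate their costs like this. The query shape (a dict with "where" and "full_text" keys) and the cost numbers are invented for the example; real costs would have to be measured or estimated per storage.

```python
class KeyValueMapper:
    name = "key-value"

    def cost(self, query):
        if query.get("full_text"):
            return 1000  # would have to scan every value
        fields = set(query.get("where", {}))
        return 1 if fields == {"id"} else 100  # cheap only on the primary key


class SearchMapper:
    name = "search"

    def cost(self, query):
        return 5 if query.get("full_text") else 50


def cheapest(mappers, query):
    """Pick the mapper that reports the lowest cost for this query."""
    return min(mappers, key=lambda mapper: mapper.cost(query))
```

A lookup by id goes to the key-value mapper, while a full-text query or a filter on another property goes to the search mapper.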

Transformer

It is responsible for transforming queries sent to storages and responses received from them. For an RDBMS it can transform CamelCase property names into snake_case column names, cast properties to the storage format, and so on.

Because it is called when a mapper needs to transform raw data into an entity, it has to use the Entity Factory to create these entities.
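The name translation for an RDBMS can be sketched in a few lines of Python. The function names are assumptions, and the simple rule used here does not handle acronyms (a name like HTTPCode would be split letter by letter).

```python
import re


def camel_to_snake(name):
    """CreatedAt -> created_at"""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def snake_to_camel(name):
    """created_at -> CreatedAt"""
    return "".join(part.capitalize() for part in name.split("_"))


def to_row(properties):
    """Translate entity property names to column names before a write."""
    return {camel_to_snake(name): value for name, value in properties.items()}


def to_properties(row):
    """Translate column names back to entity property names after a read."""
    return {snake_to_camel(name): value for name, value in row.items()}
```

The values pass through unchanged here; casting them to and from the storage format would happen in the same place.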

Conflict Strategy

In most applications conflicts are not handled at all and the result is “last write wins”. But this is not suitable for very large applications. The responsibility of the Conflict Strategy is to prevent conflicts in the first place and, when they do appear, to resolve them.

So in the update case it should, for example, filter out the synced (unchanged) properties of the entity. This prevents collisions when another instance has changed something else in the same entity.
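The filtering step can be shown with a short Python sketch. The function name changed_only and the dict used as a stored row are assumptions for illustration.

```python
def changed_only(properties, changed):
    """Keep only the properties that were modified since the last sync."""
    return {name: value for name, value in properties.items() if name in changed}


# Two application instances loaded the same article and each changed one property.
stored_row = {"title": "Old title", "body": "Old body"}

instance_a = {"title": "New title", "body": "Old body"}  # changed: title
instance_b = {"title": "Old title", "body": "New body"}  # changed: body

stored_row.update(changed_only(instance_a, {"title"}))
stored_row.update(changed_only(instance_b, {"body"}))
# stored_row now holds both changes; writing the full entities instead would
# have let instance B overwrite the new title with its stale copy.
```

This prevents collisions only when the instances touch different properties; two writes to the same property still need the detection discussed next.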

On the other hand, there should be some mechanism to detect these collisions and handle them. And yes, because this can be costly, the strategy does not have to do it and can simply let the last write win. But we have to think about what is best in the given scenario.

Storage

Storage is mostly implemented as a database connector or some abstraction library. In most cases this layer does not need a single line of code from the application developer.

But that doesn’t mean we can ignore it. This layer has an important influence on overall performance.

Extra

Value Object

It is a simple object defined by its value. In it we should encapsulate all the logic for handling the value. For example, when we have an Email and we change it, it is another email. The validation and transformation must be encapsulated inside the Value Object. More examples of this can be found here.
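A minimal Python sketch of an Email Value Object could look like this. The validation pattern is deliberately simplistic and the normalization (trimming and lowercasing) is an assumption, not a complete treatment of email addresses.

```python
import re
from dataclasses import dataclass

# Intentionally loose check: something@something.something, no whitespace.
_EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass(frozen=True)
class Email:
    value: str

    def __post_init__(self):
        normalized = self.value.strip().lower()
        if not _EMAIL_PATTERN.match(normalized):
            raise ValueError(f"invalid email address: {self.value!r}")
        # The dataclass is frozen, so the normalized value is set explicitly.
        object.__setattr__(self, "value", normalized)
```

Two Email objects with the same value are equal, and because the object is immutable, changing an email always means creating a new one.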

Events

Events handled by some emitter should be placed at almost every corner of the whole application, including the model. Events could look like manager.request.received, manager.handler.found …. Where it makes sense, an event can change variables; for example, the manager.handler.found event can override the handler.
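A small emitter in which listeners can override a value might look like this in Python. The Emitter class, its on and emit methods and the convention that returning None means “leave the value unchanged” are assumptions for illustration.

```python
from collections import defaultdict


class Emitter:
    def __init__(self):
        self._listeners = defaultdict(list)  # event name -> listeners

    def on(self, event, listener):
        self._listeners[event].append(listener)

    def emit(self, event, value=None):
        """Call every listener in order; a non-None return value replaces the value."""
        for listener in self._listeners[event]:
            result = listener(value)
            if result is not None:
                value = result
        return value
```

For example, the Manager could call emitter.emit("manager.handler.found", handler) and continue with whatever comes back, so that a listener is able to swap the handler.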

Query Language

In this type of application it is necessary to have one simple “language” for querying all data sources. These data sources can be Collections, the Manager, Handlers, Mappers and so on. Of course, all these components should implement an interface such as Queryable.

This language has to contain filtering (WHERE), ordering (ORDER BY), limiting (LIMIT) and embedding (JOIN in SQL).

Embedding has to be handled with caution, because not all components can handle all entities. This is why all top-level queries must be made against the Manager.

It also has to be completely storage agnostic. So it can be inspired by SQL, but it must not be tied to it.
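To give a feeling for such a language, here is a Python sketch of a storage-agnostic query object and an in-memory Collection that can execute it. The class and method names are assumptions, the items are plain dicts to keep the example short, and embedding is left out.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Query:
    filters: dict = field(default_factory=dict)
    order: Optional[Tuple[str, bool]] = None  # (property, descending)
    max_results: Optional[int] = None

    def where(self, **conditions):
        self.filters.update(conditions)
        return self

    def order_by(self, prop, descending=False):
        self.order = (prop, descending)
        return self

    def limit(self, count):
        self.max_results = count
        return self


class Collection:
    """In-memory data source implementing the assumed Queryable contract."""

    def __init__(self, items):
        self._items = list(items)

    def query(self, q):
        rows = [
            item for item in self._items
            if all(item.get(name) == value for name, value in q.filters.items())
        ]
        if q.order is not None:
            prop, descending = q.order
            rows.sort(key=lambda item: item[prop], reverse=descending)
        if q.max_results is not None:
            rows = rows[: q.max_results]
        return rows
```

The same Query object could be handed to a Manager, a Handler or a Mapper; only the component that finally executes it needs to know how to translate it for its storage.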

Final

I believe that this model will help to design better and faster models, and also protect new products from dead ends.