Large Application Model issues


In every application, we developers face very similar challenges. One of the most important is the model layer, in the MVC sense. Many patterns try to solve it, but not all of them are suitable for large, scalable applications. Most of them come from lightweight languages such as Python, Ruby, PHP or JavaScript (Node.js).

Designing a large, scalable application is very different from designing a small one. For example, when we build a medium-sized website with 20k views per day and about 20 model objects (entities), the complexity is low and a common framework is enough. But for an API that serves 15k requests per minute and has over 200 entities, the complexity rises sharply.

And if you succeed, the request rate can multiply and bring your system down. To be ready for that, software architects have to change the way they think.

Issues to tackle

In large, scalable applications we have to solve completely different issues than in small apps. In small ones, the priority is fast and easy development by an army of one, or a few people at most. Large applications have many developers with different skill levels, so we have to structure and standardize more.

Below are the most important issues we have to solve in these situations.

Object (Entity) persistence separation

In large-scale apps that have outgrown a single huge, overcomplicated block, we have to separate the entity from its persistence. For example, if we are building an application with a lot of logic, we should think about microservices. This architecture pattern lets us build smaller services that are easier to develop and maintain, each with just one responsibility.

In this scenario we create an entity in one service and work with it in others. Those services must not know how the entity is stored, how it was created, or anything like that. They only know that it came from the origin service.
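
A minimal sketch of this separation, assuming Python: the names User, UserRepository, InMemoryUserRepository and greeting are hypothetical and only illustrate the idea. The entity carries data only, persistence sits behind a separate contract, and consuming code depends on the entity alone.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class User:
    """Plain entity: holds data only and knows nothing about storage."""
    id: str
    name: str


class UserRepository(Protocol):
    """Hypothetical persistence contract, owned by the origin service."""
    def save(self, user: User) -> None: ...
    def find(self, user_id: str) -> Optional[User]: ...


class InMemoryUserRepository:
    """One possible implementation; a real one could wrap any database."""
    def __init__(self) -> None:
        self._rows: Dict[str, User] = {}

    def save(self, user: User) -> None:
        self._rows[user.id] = user

    def find(self, user_id: str) -> Optional[User]:
        return self._rows.get(user_id)


def greeting(user: User) -> str:
    """Code in another service: it works with the entity, never with storage."""
    return f"Hello, {user.name}"
```

Swapping InMemoryUserRepository for another implementation would not change User or greeting, which is the point of the separation.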

Hundreds of entities

Really large applications have a lot of entities, maybe even hundreds. Telling each of them how it is stored is a huge overhead and leads to an inconsistent system. We have to define one or a few principles for how entities are stored and follow them.

Multiple storages

If we have many different entities, we will want to take advantage of different storage types. For example, we may want all instances of the User entity stored with all their relations in an RDBMS, active users stored by primary key in a key-value store such as Redis, currently logged-in users kept in memory, and all users indexed in Elasticsearch for full-text name search. This can turn into a hell of unmaintainable, divergent and detached instances.

This has to be solved by the same principles as described above.
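
One way to picture "one principle for many storages" is a single router that decides which backends receive which kind of entity. This is only a sketch under assumptions: Storage, DictStorage and StorageRouter are hypothetical names, and DictStorage is a stand-in for real backends such as an RDBMS, Redis or Elasticsearch.

```python
from typing import Any, Dict, List, Protocol


class Storage(Protocol):
    """Hypothetical contract that every backend adapter would implement."""
    def put(self, kind: str, key: str, data: Dict[str, Any]) -> None: ...


class DictStorage:
    """Stand-in for a real backend; keeps rows in a plain dict."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, Dict[str, Any]] = {}

    def put(self, kind: str, key: str, data: Dict[str, Any]) -> None:
        self.rows[f"{kind}:{key}"] = dict(data)


class StorageRouter:
    """The single place that knows which backends store which entity kind."""
    def __init__(self) -> None:
        self._routes: Dict[str, List[Storage]] = {}

    def route(self, kind: str, *storages: Storage) -> None:
        self._routes[kind] = list(storages)

    def save(self, kind: str, key: str, data: Dict[str, Any]) -> None:
        # Every registered backend receives the same data under the same key.
        for storage in self._routes.get(kind, []):
            storage.put(kind, key, data)
```

With this shape, adding a storage for an entity kind is one route() call instead of a change to every entity.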

Single Responsibility Principle

As the definition says, every class should have a single responsibility, and that responsibility should be entirely encapsulated by the class. So we have to split logic into single-purpose pieces.

For example, when an entity has an updatedAt property, it should be represented by a value object that encapsulates its logic and validation, or by an external validator if the validation becomes more complicated.
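
As a rough illustration, an updatedAt value object might look like the sketch below. The class name UpdatedAt and the two rules it enforces (timezone-aware, not in the future) are assumptions chosen for the example, not rules stated above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UpdatedAt:
    """Immutable value object that owns the rules for an update timestamp."""
    value: datetime

    def __post_init__(self) -> None:
        # Assumed rule 1: naive datetimes are rejected to avoid timezone bugs.
        if self.value.tzinfo is None:
            raise ValueError("updatedAt must be timezone-aware")
        # Assumed rule 2: an update cannot have happened in the future.
        if self.value > datetime.now(timezone.utc):
            raise ValueError("updatedAt cannot be in the future")
```

The entity then holds an UpdatedAt instead of a raw datetime, so it never has to repeat these checks.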

Publishing entities

Almost all current services have an API, so they have to publish entities. At some point we should start thinking about multiple communication formats. For example, we may decide that MessagePack is much faster than JSON and want to support it. At that moment we need one robust concept for publishing entities. Methods like toJson() break SRP, because the entity itself should not know anything about how it is published.

We will also need similar import logic to recreate these entities.
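
A hedged sketch of publishing and importing kept outside the entity: UserFormat and JsonUserFormat are hypothetical names, and only JSON is shown because it needs nothing beyond the standard library. A MessagePack variant would implement the same two methods using a third-party library.

```python
import json
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class User:
    """The entity has no toJson(): it does not know how it is published."""
    id: str
    name: str


class UserFormat(Protocol):
    """Hypothetical contract shared by every wire format."""
    def export(self, user: User) -> bytes: ...
    def load(self, payload: bytes) -> User: ...


class JsonUserFormat:
    """JSON implementation of the contract, covering export and import."""
    def export(self, user: User) -> bytes:
        return json.dumps({"id": user.id, "name": user.name}).encode("utf-8")

    def load(self, payload: bytes) -> User:
        data = json.loads(payload.decode("utf-8"))
        return User(id=data["id"], name=data["name"])
```

Supporting a new format then means adding a class, not touching the entity.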

Validation

All properties of an entity should be validated, and if we don’t want to break the Single Responsibility Principle, each property should have its own validation method or validator.
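
An external validator for a single property could be as small as the following sketch. NameValidator, its 50-character limit and its error messages are all hypothetical.

```python
from typing import List


class NameValidator:
    """Hypothetical validator responsible for one property only."""
    max_length = 50  # assumed limit, chosen for the example

    def errors(self, value: str) -> List[str]:
        """Return every rule violation instead of stopping at the first."""
        found: List[str] = []
        if not value.strip():
            found.append("name must not be empty")
        if len(value) > self.max_length:
            found.append(f"name must be at most {self.max_length} characters")
        return found
```

Returning a list of errors rather than raising lets an API report all problems with an entity in one response.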

Consistency of logic across the system

If we use a lot of services with multiple, completely different storages, we have to encapsulate behavior in one place. In most apps we have built, we used database sequences and default values to define properties of objects. In a system like this, that is totally wrong.

We need consistent behavior across the whole system. We cannot have objects without an id, or let other systems modify our objects without any rules.

So, to ensure consistent behavior, all defaults must live in the application itself. When an entity has a createdAt property, it has to be a value object that saves the current time when it is created.

Primary keys must be UUIDs, because with multiple application instances or lazily synced databases we cannot rely on a sequence of integers.
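
Both defaults can be generated in the application rather than in the database. The sketch below assumes Python's standard uuid module and version 4 (random) UUIDs; the Order entity and the CreatedAt value object are hypothetical names.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass(frozen=True)
class CreatedAt:
    """Value object that records the current time when it is created."""
    value: datetime = field(default_factory=_utc_now)


@dataclass(frozen=True)
class Order:
    """Entity whose id and creation time never depend on a database."""
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    created_at: CreatedAt = field(default_factory=CreatedAt)
```

Two application instances can create Order objects at the same time without coordinating through a shared integer sequence.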

Existing Patterns

Active Record

Since Ruby on Rails, this pattern has become very popular, and most of us have used it for applications where we should not have.

What’s wrong with Active Record?
Everything and nothing.

Active Record is the simplest pattern and is good for most small applications, where one central component rules everything. That suits small applications but not large ones, because it breaks SRP, and once the application grows much larger and we have to scale it, that becomes basically impossible.

In large apps we have to focus more on the overall architecture and think about microservices or other techniques. In these architecture patterns we need objects (entities) separated from their persistence, because the entity lives through a whole system of independent services. That is not possible with Active Record.
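
To make the coupling concrete, here is a minimal sketch of an Active Record style class. UserRecord is hypothetical and not taken from Rails or any other framework, and the class-level dict only stands in for a database table.

```python
from typing import ClassVar, Dict, Optional


class UserRecord:
    """Active Record style: data and persistence live in the same class."""
    # Stand-in for a database table shared by all instances.
    _table: ClassVar[Dict[str, "UserRecord"]] = {}

    def __init__(self, id: str, name: str) -> None:
        self.id = id
        self.name = name

    def save(self) -> None:
        # The entity itself decides where and how it is stored.
        UserRecord._table[self.id] = self

    @classmethod
    def find(cls, id: str) -> Optional["UserRecord"]:
        return cls._table.get(id)
```

Any service that receives a UserRecord also receives save() and find(), and with them a dependency on the storage behind them.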

To summarize: Active Record is great if you are building a website or a small app, but forget it for large ones.

Data Mapper

Data Mapper, as it is usually understood, goes the right way and tries to separate entities from persistence. But it stops halfway. Yes, there are defined entities and a manager that handles their storage, but the entities themselves tell the manager how they should be stored (e.g. they have a table property). This complicates having multiple storages and defining global storage rules. And of course it breaks SRP.
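
One possible way out, sketched under assumptions, is to keep the storage metadata in the mapper layer instead of on the entity. The names User, TABLES and to_row are hypothetical and do not correspond to any particular library.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Tuple, Type


@dataclass(frozen=True)
class User:
    """Entity without a table property or any other storage hint."""
    id: str
    name: str


# Storage metadata lives with the mapper, so global rules sit in one place.
TABLES: Dict[Type[Any], str] = {User: "users"}


def to_row(entity: Any) -> Tuple[str, Dict[str, Any]]:
    """Return the target table and the column values for an entity."""
    return TABLES[type(entity)], asdict(entity)
```

A second storage would get its own mapping table, and User would stay unchanged.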

JPA (Java Persistence API)

This is a really tough guy on our playground. He is very experienced and has a lot of fans, but he is not perfect.

JPA is the most widely used persistence approach in Java applications and has implementations such as Hibernate. Even lightweight languages such as PHP have libraries modeled on it, such as the (not as good) Doctrine 2.

JPA is quite similar to Data Mapper and has the same issues. On the other hand, it promotes many good principles, such as the open-closed principle and requiring no-argument constructors.

Entity Data Model

The Entity Data Model defined in .NET does almost everything right. Its biggest problem is vendor lock-in: for example, it uses XML definitions that do not fit well on other platforms and languages. But basically, if you can, use it. You will not go wrong.

The Right Way?

A well-designed model should be aware of all these issues and solve them. So I created Model for large applications, which I recommend to everyone who is looking for a thoughtful theoretical model that is proven in production.