Demystifying the DDD Aggregate

Gary Blair
Published in CodeX · 7 min read · Jun 4, 2023

“Other objects have longer lives, not all of which are spent in active memory. They have complex interdependencies with other objects. They go through changes of state to which invariants apply.” Eric Evans

Software engineering is synonymous with complexity, and Domain-Driven Design claims to tackle it at its very heart. But there are a number of light-bulb moments people must go through before they can apply it effectively. Just a few of these are: what constitutes enough complexity to adopt DDD? What does complex even mean? And what actually is an aggregate?

Domain Model

An aggregate is a model of something in the real world. More specifically, it relates to the problem domain that the solution is being developed to solve.

There are certain realities of these things that inform the characteristics of the model. They exist over a lifetime. They are composed of multiple interacting parts. They adapt.

When Martin Fowler documented the growing use of domain models in the early 2000s in his seminal book “Patterns of Enterprise Application Architecture”, he summarised them as:

“An object model of the domain that incorporates both behaviour and data”

This close association with objects is hardly surprising given the predominance of object-oriented development at the time. He also made reference to an exciting new book then in the offing by Eric Evans, which would explain domain models in much more detail: what we now, of course, refer to as the original Domain-Driven Design “Blue Book”.

Object-Orientation

In contrast to the simpler CRUD data model, which is a data-first design, aggregate design focuses on behaviour. Although it does not need to be implemented in an object-oriented language, it is very much informed by the ideas and benefits of objects, such as cohesion of code and data, and encapsulation of that data and lower-level code.

One of the main challenges in adopting DDD remains a grasp of good object-oriented design, often referred to in the DDD world as the conflict between rich domain models and anaemic domain models.

The rich domain model has high encapsulation with only high-level behaviour (i.e. the aggregate root) exposed to external interaction. The anaemic domain model has low encapsulation with lots of getters and setters exposing the underlying data. This is effectively reverting to CRUD and will not give you the benefits of using an aggregate.
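The contrast can be sketched in a few lines of Python. This is a minimal, hypothetical example (the `Account` name and its rules are illustrative, not from the article): the anaemic version exposes bare data for callers to mutate, while the rich version keeps the data private and exposes only behaviour.

```python
from dataclasses import dataclass


# Anaemic style: bare data, business logic lives somewhere else.
@dataclass
class AnaemicAccount:
    balance: int = 0  # callers read and write this directly


# Rich style: the rules travel with the data they protect.
class Account:
    def __init__(self) -> None:
        self._balance = 0  # encapsulated; no public setter

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def withdraw(self, amount: int) -> None:
        # The business rule is enforced inside the object.
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount

    @property
    def balance(self) -> int:
        return self._balance
```

With the rich model, it is impossible to put the account into an invalid state from the outside; with the anaemic one, every caller has to remember the rules.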

I remember a similar conflict over 20 years ago when I was an inexperienced C++ developer who thought in terms of objects. I was working with more experienced colleagues coming from the procedural paradigm of C. They were creating classes that looked more like glorified structs, often with public data with little to no cohesion or encapsulation.

However in retrospect, although my classes were more cohesive, I often liberally used getters and setters so I was far from perfect myself. I never knew about the principles of the Law of Demeter or Tell Don’t Ask. DDD adoption still wrestles with these age-old misapprehensions.

CQRS is one way to ensure that your aggregate focuses on behaviour and avoids preoccupation with state retrieval. It delineates the responsibility of the object into separate read and write models. The read model(s) are focused on inferring the state of the aggregate from different perspectives (i.e. projections). This frees the aggregate to take on the role of the write model, focusing solely on behaviours and state changes.
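The split can be sketched as follows. This is a minimal sketch, assuming an event-publishing style of CQRS; the `Cart`, `ItemAdded` and `CartTotals` names are hypothetical, not a prescribed API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemAdded:
    name: str
    price: int


class Cart:
    """Write model: accepts commands, enforces rules, publishes events.

    Note it exposes no getters at all."""

    def __init__(self, publish) -> None:
        self._publish = publish

    def add_item(self, name: str, price: int) -> None:
        if price <= 0:
            raise ValueError("price must be positive")
        self._publish(ItemAdded(name, price))


class CartTotals:
    """Read model: a projection that infers queryable state from events."""

    def __init__(self) -> None:
        self.items: list[str] = []
        self.total = 0

    def apply(self, event: ItemAdded) -> None:
        self.items.append(event.name)
        self.total += event.price
```

The write model decides; the projection answers questions. Several different projections can be built from the same stream of events.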

Aggregation

Some of the hallmarks of complexity are when you have interdependence and the emergence of higher-order scales. In this respect, the aggregate is a coherent whole aggregated from a hierarchy of interdependent parts.

These parts are entities or value objects.

The value object is an immutable data model that validates its own data.

The entity, in contrast, is mutable and more like an object in the traditional sense of object-oriented development: cohesive and encapsulated, with state that evolves across its lifetime. It also needs to be uniquely identifiable.

These are all traits which it shares with the aggregate.
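A minimal sketch of the two kinds of parts, with hypothetical `Money` and `Customer` examples: the value object is frozen, validates itself, and is equal to any other value object with the same data; the entity keeps a stable id while its state changes.

```python
import uuid
from dataclasses import dataclass


# Value object: immutable, self-validating, compared by value.
@dataclass(frozen=True)
class Money:
    amount: int
    currency: str

    def __post_init__(self) -> None:
        if self.amount < 0:
            raise ValueError("amount cannot be negative")


# Entity: mutable, compared by identity rather than by its attributes.
class Customer:
    def __init__(self, name: str) -> None:
        self.id = uuid.uuid4()  # identity outlives any state change
        self.name = name

    def rename(self, new_name: str) -> None:
        self.name = new_name  # the state changes; the identity does not
```

Two `Money(10, "GBP")` instances are interchangeable; two customers named Alice are not.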

Adaptation

The challenges of complexity do not come from interdependence alone. There also needs to be change.

Take a legacy codebase — where there is almost certainly a high degree of coupling. A big ball of mud!

On the plus side, it might represent a very successful product. If you can get away without updating it much and it continues to be lucrative then happy days.

But the luxury of not changing is not an option for most software code bases. Market needs and society continuously evolve. And the superpower of software development is to enable solutions to adapt to these changing needs. This is where it becomes problematic. High interdependence and volatility lead to many unexpected consequences.

The aggregate is about interdependence. But it is also very much about change. It is often referred to as a “write model”. So in CRUD terms, this is equivalent to the Update.

The type of change that is required can indicate whether the investment in DDD will be a benefit over CRUD.

Let’s first consider how a CRUD data model changes.

After creation, there is a series of updates, each relatively unconstrained apart from some data validation.

When things become complex, there is more adaptation. Frequent changes in behaviour and form can occur to respond to changing contexts. Think lifecycle or workflow. Although the progression may also be non-linear.

So there is data validation like CRUD.

But the form must also be validated depending on the current state (i.e. different attributes apply in different states). Which states you can change to next is also constrained by the current state. And the change to any of these valid states is only possible under certain conditions.

This might sound a lot like a familiar software pattern: the state machine. You have a current state, and a state handler for each possible state that represents its behaviour in response to the external events it is exposed to in that context. Within state handlers, there can be transitions to other states.
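The constraints above can be sketched as a small state machine. This is a hypothetical `Shipment` lifecycle (the states and transitions are illustrative): a table records which next states are valid from each current state, and any other transition is rejected.

```python
class Shipment:
    # Which next states are reachable from each current state.
    TRANSITIONS = {
        "pending": {"dispatched", "cancelled"},
        "dispatched": {"delivered"},
        "delivered": set(),   # terminal state
        "cancelled": set(),   # terminal state
    }

    def __init__(self) -> None:
        self.state = "pending"

    def transition_to(self, new_state: str) -> None:
        # The current state constrains what can happen next.
        if new_state not in self.TRANSITIONS[self.state]:
            raise ValueError(f"cannot move from {self.state} to {new_state}")
        self.state = new_state
```

Guard conditions (the "only under certain conditions" part) would be extra checks inside `transition_to` before the state is changed.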

Identity

Adaptation poses a challenge for identification.

If you meet a work colleague in a social setting, you will probably recognise them. But what about a picture of them as a child? Or a baby?

In systems, we need more certainty. We assign unique IDs to things to ensure a stable mechanism to identify them through change. Passport numbers. National Insurance numbers. Credit card numbers.

Aggregates are the same. A popular approach is to generate a GUID, but it can be anything as long as you can guarantee uniqueness within its context.
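In Python, for example, the standard library's `uuid` module does this in one line:

```python
import uuid

# A version-4 UUID carries 122 random bits, making collisions
# vanishingly unlikely; the id stays fixed for the aggregate's lifetime.
order_id = uuid.uuid4()
another_id = uuid.uuid4()
```

Whatever changes the aggregate goes through, this id is what persistence, references and equality checks hang on to.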

Self-similarity

Another common feature in complex systems is self-similarity. This is the idea that a higher scale is similar to the scale below it. For example, broccoli looks like a bigger version of one of its florets.

One of the reasons that the aggregate has so much commonality with an entity is it is an entity itself. This is called the aggregate root and it coordinates behaviour across the hierarchy below it. The interface to this entity is the only way for anything to interact with the aggregate. Thus reaping the benefits of encapsulation.

But unlike other entities, there is another responsibility of the aggregate. Its primary purpose.

Consistency Boundary

“Invariant: An assertion about some design element that must be true at all times, except during specifically transient situations such as the middle of the execution of a method, or the middle of an uncommitted database transaction.” Eric Evans

One of the great challenges of software development, particularly with concurrency, is the consistency of data.

Imagine implementing an array in a multi-threaded environment. It has two pieces of data: the stored items and the count of items.

Consider an insert on an instance of this array. After changing the contents but before updating the count, the running thread is interrupted. Now another thread reading this object will get an incoherent response causing unintended consequences.

The data is inconsistent and the contract of the class has been broken: specifically, that it must always be possible to retrieve a stored item from any valid index in the array. In this instance that is not true. This is an example of an invariant, a condition that must always be true.

In order to correct this, all the operations carried out to insert an item need to be treated as one atomic change, something that can be achieved with locking.
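A minimal sketch of the fix, using Python's `threading.Lock` (the `SafeArray` name is hypothetical): both updates happen inside one critical section, so no other thread can observe the half-done insert.

```python
import threading


class SafeArray:
    """Contents and count must change together; a lock makes insert atomic."""

    def __init__(self) -> None:
        self._items = []
        self._count = 0
        self._lock = threading.Lock()

    def insert(self, item) -> None:
        with self._lock:  # readers and writers are excluded until both updates land
            self._items.append(item)
            self._count += 1

    def snapshot(self):
        # Reads take the same lock, so they always see a consistent pair.
        with self._lock:
            return list(self._items), self._count
```

Any thread calling `snapshot()` now sees a count that matches the contents, never the in-between state.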

The same problem occurs with data persistence. For many years the relational database was the go-to solution for this. Its ACID transactions guaranteed consistency and atomicity. In fact, it solved the problem so well that we almost did not need to think about it.

But like everything in our complex world, there are no silver bullets. Only tradeoffs.

In recent years, the limitations of centralising everything in a relational database have been recognised. It is still very useful in many contexts. But with scale and complexity, it can become a single point of failure, a performance bottleneck and a technical distraction from the real needs of the domain. This has led to the growth of NoSQL databases.

DDD and the aggregate predate NoSQL. But there is a natural symbiosis between the two (although the aggregate will also work well with a relational database). In part, this has helped to make DDD popular in recent years.

Remember that an aggregate models a domain object which exists over a lifetime. This means it will need to be persisted, because working memory is scarce and lost on power failure.

An aggregate is designed as a highly cohesive domain model where data consistency and atomicity are required across different parts of the object to make key business decisions. In a nutshell, aggregates are designed by identifying invariants.
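Putting the pieces together, here is a minimal, hypothetical aggregate (the `Order`, `OrderLine` and credit-limit rule are illustrative): the root is the only entry point, and it checks a cross-part invariant before any state changes, so the whole object can be persisted as one consistent unit.

```python
class OrderLine:
    def __init__(self, sku: str, price: int) -> None:
        self.sku = sku
        self.price = price


class Order:
    """Aggregate root: guards the invariant that the order total
    never exceeds the credit limit."""

    CREDIT_LIMIT = 100

    def __init__(self) -> None:
        self._lines = []  # internal parts, reachable only through the root

    def add_line(self, sku: str, price: int) -> None:
        # Check the invariant before mutating, so the aggregate is
        # always saved as one consistent, atomic unit.
        if self.total() + price > self.CREDIT_LIMIT:
            raise ValueError("credit limit exceeded")
        self._lines.append(OrderLine(sku, price))

    def total(self) -> int:
        return sum(line.price for line in self._lines)
```

Because the invariant only spans data inside this one aggregate, a transaction (or a single-document write in a NoSQL store) over the aggregate alone is enough to keep it consistent.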

This allows them to trade off data consistency where it is needed against decomposed and decentralised persistence requirements.

Thus enabling the potential for production scale and resiliency, as well as a basis for independent database ownership by different teams: one of the critical missing ingredients in many a failed microservices adoption.

Summary

An aggregate exposes the behaviours that change a complex domain object. Whilst encapsulating the complexity of that change. Complexity from an object composed of other objects, and the messy dynamics of their interactions. Complexity from adaptation to different contexts through a lifecycle. And through that changing lifecycle always the ability to be uniquely identified.

But most importantly aggregates are designed with a careful and nuanced view of data consistency. Their boundaries are drawn around invariants, characteristics of the domain object that must always be true.
