Clean DDD lessons: modeling identity

Published in

Technical blog from UNIL engineering teams

5 min read · Jun 9, 2023

This is the first part of a series of small targeted lessons supplemented with clear examples intended to introduce different aspects of Clean Domain-Driven Design. In this article we focus on different aspects of identity modeling.

Whether we are talking about entities, value objects, or aggregates, the question of identity is a fundamental one. A common rule of thumb in DDD is that having an identifier (commonly referred to as an ID) is what differentiates an entity from a value object. Once set, the identifier never changes during the life of the entity, even though all other attributes of the entity may change. Let's examine some interesting aspects of identity modeling in more detail.
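The distinction can be sketched in a few lines of Java. This is a minimal illustration, not code from any of the sample domains: the names Address, CustomerId, and Customer are hypothetical, and records are used here only to keep the example free of dependencies.

```java
import java.util.Objects;

// Value object: no identifier; two instances with equal attributes are interchangeable.
record Address(String street, String city) {}

// Minimal ID wrapper so that the entity does not hold a raw String.
record CustomerId(String value) {}

// Entity: equality is based on the identifier alone.
final class Customer {
    private final CustomerId id; // set once, never changes
    private String name;         // free to change over the entity's lifetime

    Customer(CustomerId id, String name) {
        this.id = Objects.requireNonNull(id, "id");
        this.name = name;
    }

    void rename(String newName) { this.name = newName; }

    CustomerId id() { return id; }
    String name() { return name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Customer other && id.equals(other.id);
    }

    @Override
    public int hashCode() { return id.hashCode(); }
}

class IdentityDemo {
    public static void main(String[] args) {
        Customer before = new Customer(new CustomerId("c-1"), "Alice");
        Customer after = new Customer(new CustomerId("c-1"), "Alice");
        after.rename("Alice Smith");

        // Same identifier, different attributes: still the same entity.
        System.out.println(before.equals(after)); // true

        // Value objects compare by their attributes.
        System.out.println(new Address("Main St 1", "Lausanne")
                .equals(new Address("Main St 1", "Lausanne"))); // true
    }
}
```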

Natural identifiers

Just like everything else in our model, identifiers must strive to correspond as closely as possible to notions or attributes that actually exist in the real world. The notion of identity is extremely prevalent in the world and has been for a long time. A carefully thought-out entity will most likely correspond to a real-world entity with a natural identifier. This is simply because telling instances apart has always been an essential task in human activity. We invented, cataloged, and persisted in various forms all kinds of identifiers for persons and things long before computers existed.

Prefer a natural identifier, if one exists, when modeling the identifier of an entity.

Natural identifiers, by definition, closely follow the Ubiquitous Language. By their very nature, they provide reasonable uniqueness within the bounded context of our application. And, of course, they are readily understandable by the users of the application. These are all good reasons to prefer them as the identifiers for our entities. Here are some examples of natural identifiers that we have seen already.

In the Cargo tracking domain, we have the UN/LOCODE identifier for the Location entity.
In the Library domain, we have the well-known ISBN identifying CatalogEntry entities.
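To make the ISBN case concrete, here is a minimal sketch of how a natural identifier might be attached to an entity. The Isbn record and the shape of CatalogEntry are assumptions made for illustration; only the ISBN-13 check-digit rule (digits weighted alternately by 1 and 3 must sum to a multiple of 10) is taken from the ISBN standard.

```java
import java.util.Objects;

// Hypothetical wrapper for an ISBN-13 natural identifier.
record Isbn(String value) {
    Isbn {
        if (value == null) {
            throw new IllegalArgumentException("ISBN must not be null");
        }
        value = value.replace("-", ""); // store the identifier without hyphens
        if (!value.matches("\\d{13}") || !hasValidCheckDigit(value)) {
            throw new IllegalArgumentException("Invalid ISBN-13: " + value);
        }
    }

    // Digits are weighted alternately by 1 and 3; the total must be a multiple of 10.
    private static boolean hasValidCheckDigit(String digits) {
        int sum = 0;
        for (int i = 0; i < digits.length(); i++) {
            int d = digits.charAt(i) - '0';
            sum += (i % 2 == 0) ? d : 3 * d;
        }
        return sum % 10 == 0;
    }
}

// Hypothetical entity identified by its ISBN.
final class CatalogEntry {
    private final Isbn isbn;
    private String title;

    CatalogEntry(Isbn isbn, String title) {
        this.isbn = Objects.requireNonNull(isbn, "isbn");
        this.title = title;
    }

    Isbn isbn() { return isbn; }
    String title() { return title; }
}

class IsbnDemo {
    public static void main(String[] args) {
        CatalogEntry entry = new CatalogEntry(new Isbn("978-0-306-40615-7"), "Some title");
        System.out.println(entry.isbn().value()); // 9780306406157
    }
}
```

The identifier is known before the entity is created, and it means the same thing to a librarian as it does to the code.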

Generated identifiers

Modeling a complex domain may require conceptualizing entities for which no convenient natural identifier exists. In these cases, a scheme for generating identifiers needs to be invented. When designing such schemes we need to keep certain caveats in mind. Generated identifiers must be reasonably unique, at least within the bounded context surrounding the entity. UUIDs (also known as GUIDs) are widely accepted identifiers of this kind, with implementations available for most programming languages. Another popular approach is to rely on autoincremented numeric values assigned on demand by the relational database used to persist the entity after creation.

Avoid using autoincremented values assigned by an RDBMS as identifiers for domain model entities.

The main reason to avoid using autoincremented values assigned by an RDBMS as identifiers for entities is that they are usually not available at the time we create new entities. It is only after we have persisted an entity (with a null identifier) that we actually get an (autoincremented) value from the database. That means we cannot construct a valid entity with a valid (non-null) identifier right from the start, while performing business logic in the Entities layer of our application.
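By contrast, an identifier generated by the application itself is available before anything is persisted. The following sketch assumes a hypothetical Order entity and OrderId wrapper and uses java.util.UUID from the standard library; it is one possible arrangement, not a prescribed one.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical ID value object backed by a random UUID.
record OrderId(UUID value) {
    OrderId {
        Objects.requireNonNull(value, "value");
    }

    // Generated in the application: no database round trip is needed.
    static OrderId generate() {
        return new OrderId(UUID.randomUUID());
    }
}

// Hypothetical entity that cannot exist without an identifier.
final class Order {
    private final OrderId id;
    private String status = "NEW";

    Order(OrderId id) {
        this.id = Objects.requireNonNull(id, "id");
    }

    OrderId id() { return id; }
    String status() { return status; }

    void confirm() { status = "CONFIRMED"; }
}

class OrderDemo {
    public static void main(String[] args) {
        // The entity is valid from the very first line, before any persistence happens.
        Order order = new Order(OrderId.generate());
        order.confirm();
        System.out.println(order.id().value() + " " + order.status());
    }
}
```

With a database-assigned value, the constructor above would have to accept null and the entity would be only half-valid until it was saved.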

Generated identifiers, such as UUID or GUID, have one significant disadvantage: they are unwieldy and hard to read, both for developers and for users of the application. There are ways to design identifiers that are more user-friendly (shorter and more readable) at a reasonable cost to the uniqueness of the generated values. Nano ID is one such example. It generates random identifiers of a fixed length using a provided set of unique characters, called the “alphabet”. The larger the alphabet and the longer the identifiers, the lower the chance of two randomly generated identifiers colliding. The chance of a collision also depends greatly on the average rate at which the application generates new identifiers. A collision calculator for Nano ID is available; it estimates the chance of a collision for a given alphabet, identifier length, and rate of generation.
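The trade-off between alphabet size, length, and collision risk can be explored with a small hand-rolled sketch. Note that this is not the Nano ID library: the ShortIds class and its methods are hypothetical, and the probability uses the standard birthday-bound approximation 1 - exp(-n(n-1)/(2N)), where n is the number of identifiers generated and N is the number of possible identifiers.

```java
import java.security.SecureRandom;

// Hand-rolled illustration of the idea behind Nano ID; not the library itself.
final class ShortIds {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Draws `length` characters uniformly at random from `alphabet`.
    static String random(String alphabet, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(alphabet.charAt(RANDOM.nextInt(alphabet.length())));
        }
        return sb.toString();
    }

    // Approximate probability that at least two of `count` random identifiers
    // collide, given N = alphabetSize^length possible values.
    static double collisionProbability(int alphabetSize, int length, long count) {
        double space = Math.pow(alphabetSize, length);
        // 1 - e^x computed as -expm1(x) to stay accurate for tiny probabilities
        return -Math.expm1(-(double) count * (count - 1) / (2.0 * space));
    }
}

class ShortIdsDemo {
    public static void main(String[] args) {
        String alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
        System.out.println(ShortIds.random(alphabet, 6));

        // Longer identifiers lower the collision risk for the same number of IDs.
        System.out.println(ShortIds.collisionProbability(alphabet.length(), 6, 100_000));
        System.out.println(ShortIds.collisionProbability(alphabet.length(), 12, 100_000));
    }
}
```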

Another simple technique to improve the readability of generated identifiers is to prefix each (randomly) generated value with a short, well-known prefix, usually consisting of a few letters. The prefix is unique for each entity type in the bounded context and readily understandable by a developer or a user. For example, we may assign the letter c as a prefix for all identifiers of the Customer entity and the letter p for the Product entity. Then, using the Nano ID generator, for example, we may produce the following identifiers for Customer:

// Customer IDs always start with "c"

cTapdqM
cS30e03
cZNRdbK

And a set of identifiers for Product:

// Product IDs always start with "p"

pbKddNP
pndu8zR
pEvzuZW

This technique greatly improves the readability of identifiers while, at the same time, eliminating the chance of collision between identifiers of different entity types (such a collision is impossible since the prefixes differ and the length of the random part is fixed).
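A generator for such prefixed identifiers might look like the sketch below. The PrefixedIds class, its alphabet, and the random length of six characters are assumptions chosen to resemble the samples above; a real application could delegate the random part to a Nano ID implementation instead.

```java
import java.security.SecureRandom;

// Hypothetical generator: one fixed prefix per entity type plus a fixed-length random part.
final class PrefixedIds {
    // Assumed alphabet: digits plus upper- and lower-case ASCII letters.
    private static final String ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    private static final int RANDOM_LENGTH = 6;
    private static final SecureRandom RANDOM = new SecureRandom();

    static String next(char prefix) {
        StringBuilder sb = new StringBuilder(RANDOM_LENGTH + 1).append(prefix);
        for (int i = 0; i < RANDOM_LENGTH; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    static String nextCustomerId() { return next('c'); }

    static String nextProductId() { return next('p'); }
}

class PrefixedIdsDemo {
    public static void main(String[] args) {
        System.out.println(PrefixedIds.nextCustomerId()); // starts with "c"
        System.out.println(PrefixedIds.nextProductId());  // starts with "p"
    }
}
```

Because every identifier has the same length and its first character depends only on the entity type, a Customer identifier can never equal a Product identifier.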

Value Object wrappers for IDs

As we have seen, there are many considerations governing the choice of actual values for identifiers. But when it comes to the implementation aspect, there is little room for ambiguity.

Never use primitives (integers, longs, etc.) or Strings as IDs of entities. Always wrap them in a corresponding ID value object.

We should not use primitives (int, long, etc.), their boxed counterparts (Integer, Long, etc.), or even String values directly as identifiers of entities. Instead, we should wrap these values in value objects and use instances of these objects as IDs. Here is an example of such a value object, slightly modified from the original Cargo tracking domain (by DDDSample). It wraps UN/LOCODE identifiers for the Location entity.

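// Assumption: notNull is the Apache Commons Lang helper; the article does not show its origin.
import static org.apache.commons.lang3.Validate.notNull;

import lombok.Builder;
import lombok.Value;

/**
 * Modeled after original "se.citerus.dddsample.domain.model.location.UnLocode".
 * InvalidDomainObjectError is the application's own exception type (not shown here).
 */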
@Value
public class UnLocode {

    String code;

    @Builder
    public UnLocode(String code) {

        // Must not be null and must conform to UN location code format
        if (!notNull(code).matches("^[a-zA-Z]{2}[a-zA-Z2-9]{3}$")) {
            throw new InvalidDomainObjectError("Invalid UN/LOCODE: <%s>".formatted(code));
        }

        this.code = code.toUpperCase();
    }

    public static UnLocode of(String code) {
        return UnLocode.builder().code(code).build();
    }

    @Override
    public String toString() {
        return code;
    }
}

Several things are notable with this implementation.

The popular Lombok library is used for a concise and transparent implementation of the Value Object pattern.
A helpful static method UnLocode.of() is provided for the convenience of the caller.
A single point of entry (the constructor) ensures that no invalid UN/LOCODE String (i.e., null or not matching the authorized regular expression) can be used to create an UnLocode instance.

Using wrapper objects for IDs greatly facilitates comprehension of the overall codebase, since it promotes the use of the Ubiquitous Language. The typed nature of these value objects also helps catch many errors at compile time.
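The compile-time benefit is easiest to see when two identifiers of different entities appear in the same method signature. The sketch below uses hypothetical CustomerId and ProductId records and a hypothetical OrderService; records stand in for Lombok only to keep the example dependency-free.

```java
import java.util.Objects;

// Hypothetical typed identifiers for two different entities.
record CustomerId(String value) {
    CustomerId { Objects.requireNonNull(value, "value"); }
}

record ProductId(String value) {
    ProductId { Objects.requireNonNull(value, "value"); }
}

final class OrderService {
    // With a (String, String) signature the two arguments could be swapped silently.
    static String describeOrder(CustomerId customer, ProductId product) {
        return "customer=" + customer.value() + ", product=" + product.value();
    }
}

class TypedIdDemo {
    public static void main(String[] args) {
        CustomerId customer = new CustomerId("cTapdqM");
        ProductId product = new ProductId("pbKddNP");

        System.out.println(OrderService.describeOrder(customer, product));
        // prints "customer=cTapdqM, product=pbKddNP"

        // OrderService.describeOrder(product, customer); // does not compile: incompatible types
    }
}
```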

Conclusion

We have looked at some aspects of modeling identity in DDD: natural identifiers, generated identifiers, and how to use the Value Object pattern to wrap identifier values in a (typed) object.

Written by George