Creating Coding Excellence with Domain-driven Design

Published in

The Startup

13 min read · Oct 16, 2018

How Alibaba’s Hema team uses domain-driven design (DDD) to create an efficient, flexible, and evolution-ready coding architecture that meets the intensive business needs of a new kind of supermarket.

Allow me to introduce myself. My name is Qunhui, and I am a Senior Staff Engineer at Hema (盒马), Alibaba’s digitally connected offline supermarket. My career as a programmer has spanned many years, and during that time I have both seen and written many lines of code. How to structure code so that it delivers the best quality software is a constant topic of conversation between me and other programmers. Recently, I find that these conversations turn to domain-driven design (DDD) sooner or later.

Top programmers hold mixed views on DDD. It is one of many programming approaches, it is far from perfect, and it does not enjoy overwhelming advantages over the others. Personally, I care more about whether design itself receives due attention. In my opinion, a good design is a good design, regardless of the approach used.

Most of the code I come across is not DDD-based. Even by a looser standard, one that only asks whether the code reflects deliberate design under any approach, the pool of well-designed code does not grow much. Most of it appears to be “spaghetti code” that completes an operation by going straight from the entry point to the database. Most designs center on the database, sometimes without even a proper database design, leaving a heap of fields that make you wonder why they are there.

Coding should not mean simply piling up code; code should be simple and elegant. Strong testing capacity ensures our software products are externally excellent (hats off to the Hema team’s stress testing staff!). However, the internal quality of those products is continuously ignored. All resources are invested in completing projects with tight deadlines, so internal quality falls further and further behind where it should be.

As a full-scale supermarket, Hema’s services are heavily business-oriented. The entire chain from supply to distribution is complicated and tightly coupled. It is impossible to keep everything organized without sorting out the different relationships. Design is extremely important in this domain. A poor design can become a huge burden that whoever takes over the subsequent work will find difficult to tackle.

For my module, we’ve implemented a complete DDD cycle for building the entire system. Our own individual thoughts and tweaks are part of this implementation. I would like to share them with you all here. Hopefully, you will find our experience useful.

Domain Models: Object-oriented vs. Database

There are two commonly-used approaches for design in DDD:

1. Database design: Data is abstracted, and its relationships are defined as databases (also known as data modeling).

2. Object-oriented design: Data is abstracted, and its relationships are defined as objects (also known as object modeling).

Most coding architects prefer database design over object-oriented design in the early stages of designing a software system. While both approaches are equally important and non-conflicting, the final state of the system can be vastly different depending on which approach is used.

Database design

Previously known as the “data dictionary,” the domain model (data model) for this design mode is commonly used by experienced coding architects. The clarity of the domain model determines the intrinsic quality of software products. Products built with a properly structured domain model have a clear-cut structure that allows for easy modification and affordable future evolution. The architect plays an instrumental role in the development team by defining the software structure, which ultimately determines the future readability, scalability, and evolutionary capacity of the software. Generally, the architect designs the domain model, and the developers use it as the structure for writing their code. The domain model is essentially the foundation of database design.

Architects continuously evolve the domain model based on requirements discussions. Some designers write the domain model as SQL statements, which develop in a process comparable to the growth of a rose bush:

1. A single table is generated (a seed is planted)

2. Multiple tables are generated (the seed takes root and sprouts)

3. Design faults occur (the plant sprouts excess shoots)

4. The design is corrected (a gardener trims the excess shoots)

5. Final launch (the rose blooms)

In conventional programming, the coding architect typically produces pages of architectural design that combine dense text with database table designs defined by the domain. All subsequent iterations are developed from the domain model to create an architecture similar to the following figure:

To explain this mode in an abstract sense, we will create a coding architecture using a hypothetical theme of a father’s conscience motivating him to scold his naughty son.

· On the service layer, a preferred manager is set to manage a majority of the corresponding logic.

· The POJO, or blood loss domain model (explained in a subsequent section), serves as the data and is constantly modified and combined by the manager (the father’s conscience).

· The heavy service layer serves as a huge data processing plant that completes operations logic based on the database.

Using “Father” and “Son” tables, the generated POJO is:

public class Father{…}
public class Son{
	private String fatherId;//The fatherId column is a foreign key referencing the id of the Father table
	public String getFatherId(){
		return fatherId;
	}
	……
}

Now, let’s say our hypothetical “Son” does something naughty and his “Father” has to scold him, leaving both the “Son” and the “Father” in emotional pain. The manager acts as the “Father’s” conscience and guides him to scold the “Son” using the following process:

public class SomeManager{
	public void fatherSlapSon(Father father, Son son){
		//Please be understanding if the logic does not make sense
		father.setPainOnHand();
		son.setPainOnFace();//Assume that the pain is stored as a database field
	}
}

Object-oriented design

When working with DDD, it helps to start from the assumption that your machine never fails and has infinite memory. Under that assumption, persistence, and therefore a database, is not required. Alibaba Hema experts recommend designing software under this philosophy, which they refer to as “persistence ignorance.”

Without databases, the domain model must be designed on the basis of the program itself, which presents expert coding architects with a fantastic opportunity to demonstrate their full range of skills.

In the opinion of Alibaba Hema experts, object-oriented modeling is the ideal approach when compared with process-oriented and function-oriented modeling. Classes and tables play somewhat similar roles, as do objects and rows, so some experts feel that the two pairs correspond to each other.

However, experts like Hema’s HuiJin strongly oppose such a correspondence because they feel it makes software design meaningless. Classes and tables differ in several major ways, and these differences make the two far from equal in how expressively they can model a domain. Encapsulation, inheritance, and polymorphism allow for much livelier domain model expressions and stricter compliance with SOLID principles.

The following are explanations of important aspects of domain expressions:

· Reference:

Relational databases represent a many-to-many relationship through a third (join) table, so the relationship is not directly visible in the domain model, and business staff find this representation hard to understand.

· Encapsulation:

Classes can define methods. Data alone does not give a full picture of the domain model. For example, a data table can record a person’s physical measurements, but cannot tell whether the person is running or walking.

· Inheritance/polymorphism:

Classes can be polymorphic. From data alone, humans and pigs differ only in their measurements, not in their behavior. A data table does not know that people and pigs move in different ways.
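The last two points can be illustrated with a minimal Java sketch. The class names (Creature, Person, Pig) and the move() method are assumptions made for illustration and do not come from Hema’s code base. The height field is the kind of data a table row could hold, while move() is behavior that only a class can express:

```java
// Hypothetical classes for illustration only.
abstract class Creature {
    // Plain data: the part that a table row could also store.
    private final double heightCm;

    Creature(double heightCm) {
        this.heightCm = heightCm;
    }

    double getHeightCm() {
        return heightCm;
    }

    // Behavior lives next to the data; each subclass decides how it moves.
    abstract String move();
}

class Person extends Creature {
    Person(double heightCm) {
        super(heightCm);
    }

    @Override
    String move() {
        return "walks on two legs";
    }
}

class Pig extends Creature {
    Pig(double heightCm) {
        super(heightCm);
    }

    @Override
    String move() {
        return "trots on four legs";
    }
}

class MoveDemo {
    public static void main(String[] args) {
        Creature[] creatures = { new Person(175.0), new Pig(70.0) };
        for (Creature creature : creatures) {
            // The same call dispatches to different behavior per subclass.
            System.out.println(creature.getClass().getSimpleName() + " " + creature.move());
        }
    }
}
```

Two rows holding 175.0 and 70.0 would look alike in a table; only the class hierarchy records that the two creatures behave differently.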

Let’s return to the example of the angry father scolding his son:

public class Father{
	//The father scolds his son on his own, regardless of whether his conscience (the manager) offers help
	public void slapSon(Son son){
		this.setPainOnHand();
		son.setPainOnFace();
	}
}

Following this approach, we gradually design lively domain models in an object-oriented environment. The service layer becomes a collection of business operations built on these models; it grows thinner and leaves many actions to the domain objects. Domain models do not complete business operations themselves. Each domain object does only what it is meant to do (single responsibility).

Let’s consider another hypothetical example of a person running. “Person.run” is an action unrelated to operations but the manager or service can, by calling on person.run, complete a 100m race, or deliver a take-out food order, among other operations. The resulting architecture would be similar to the following figure:
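In code, the person.run idea might look like the following minimal sketch. RaceService, DeliveryService, and every method signature here are hypothetical names chosen for illustration. The point is that run() knows nothing about races or deliveries, while each thin service composes it into an operation:

```java
// Hypothetical sketch: all names are illustrative.
class Person {
    private final String name;
    private double metersCovered;

    Person(String name) {
        this.name = name;
    }

    // Domain behavior that is unaware of any particular business operation.
    void run(double meters) {
        if (meters < 0) {
            throw new IllegalArgumentException("meters must not be negative");
        }
        metersCovered += meters;
    }

    String getName() {
        return name;
    }

    double getMetersCovered() {
        return metersCovered;
    }
}

// Thin services: each operation is a short script over domain behavior.
class RaceService {
    String run100mRace(Person runner) {
        runner.run(100);
        return runner.getName() + " finished the 100m race";
    }
}

class DeliveryService {
    String deliverTakeOut(Person courier, double distanceMeters) {
        courier.run(distanceMeters);
        return courier.getName() + " delivered the order";
    }
}

class RunDemo {
    public static void main(String[] args) {
        Person ann = new Person("Ann");
        System.out.println(new RaceService().run100mRace(ann));
        System.out.println(new DeliveryService().deliverTakeOut(ann, 250));
        System.out.println("Total meters covered: " + ann.getMetersCovered());
    }
}
```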

Now, let’s drop the assumption that your machine never fails and has infinite memory. In reality, such machines do not exist. Without this assumption, we need a database, but it no longer carries the heavy burden of the domain model. Instead, the database plays its original role of persistence:

· [Storing] Storing object data in a persistent storage medium.

· [Getting] Efficiently returning query results to memory.

Once unburdened from carrying the domain model, database design can go in whatever direction the developers choose. Measures that accelerate storage and search can be added. We can add extra columns, use document databases, or design elaborate staging tables to serve big data queries. In short, database design focuses on efficient access, rather than the perfect expression of domain models.

Now, let’s look at the architecture again:

The following concepts should be emphasized:

· Domain models are used for domain operations. They can also serve reads, but not without a price. Under this precondition, an aggregate may hold data that supports lookups such as getById, but it is not suited to general queries. Serving queries is not DDD’s original purpose.

· Queries are based on databases. Complicated queries should steer clear of the domain layer and directly interact with databases.

In short: domain operations work on objects, while data queries work on table rows.
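A minimal sketch of this split is shown below. It assumes a hypothetical Product aggregate and uses an in-memory map as a stand-in for a database table; ProductRepository, ProductQueries, and all method names are illustrative rather than Hema’s actual API. The write path loads a domain object and calls its behavior, while the read path answers a query straight from the rows:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical domain object: the stock rule lives on the object itself.
class Product {
    private final String id;
    private int stock;

    Product(String id, int stock) {
        this.id = id;
        this.stock = stock;
    }

    String getId() { return id; }
    int getStock() { return stock; }

    void ship(int quantity) {
        if (quantity <= 0 || quantity > stock) {
            throw new IllegalArgumentException("invalid quantity: " + quantity);
        }
        stock -= quantity;
    }
}

// Stand-in for a database table: product id -> stock.
class ProductTable {
    final Map<String, Integer> stockById = new HashMap<>();
}

// Write path: rows are turned into domain objects and back.
class ProductRepository {
    private final ProductTable table;

    ProductRepository(ProductTable table) { this.table = table; }

    Product getById(String id) {
        Integer stock = table.stockById.get(id);
        if (stock == null) {
            throw new IllegalArgumentException("unknown product: " + id);
        }
        return new Product(id, stock);
    }

    void save(Product product) {
        table.stockById.put(product.getId(), product.getStock());
    }
}

// Read path: queries go straight to the rows and never build domain objects.
class ProductQueries {
    private final ProductTable table;

    ProductQueries(ProductTable table) { this.table = table; }

    List<String> idsWithStockBelow(int threshold) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Integer> row : table.stockById.entrySet()) {
            if (row.getValue() < threshold) {
                ids.add(row.getKey());
            }
        }
        Collections.sort(ids); // stable order for display
        return ids;
    }
}

class ReadWriteSplitDemo {
    public static void main(String[] args) {
        ProductTable table = new ProductTable();
        table.stockById.put("apple", 10);
        table.stockById.put("pear", 2);

        ProductRepository repository = new ProductRepository(table);
        Product apple = repository.getById("apple");
        apple.ship(9);          // domain operation on an object
        repository.save(apple);

        System.out.println(new ProductQueries(table).idsWithStockBelow(3)); // query on rows
    }
}
```

In a real system the query side would typically be SQL against tables tuned for reading; the map here only keeps the sketch self-contained.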

Blood Loss, Anemic & Rich Domain Models

The blood loss, anemic, rich, and bloated categories classify domain models by their richness, building on the anemic domain model described by Martin Fowler. We will skip bloated domain models here due to their excessive size.

Blood loss domain model

The database-oriented design is a typical example of a blood loss model. In Java, the POJOs offer only simple, field-based getter and setter methods. Relationships between POJOs are hidden in object IDs and interpreted by an external manager. In our earlier son.fatherId example, the “Son” does not know of his relationship with the “Father”; the manager finds the “Father” through son.fatherId.

Anemic domain model

To explain the anemic domain model, let’s return to our father-son example. If the son does not know who his father is, does he find him by DNA verification (son.fatherId) through an intermediary (the manager) every time? That is hardly reasonable. The domain model can be made richer by tweaking the “Son” class as follows:

public class Son{
	private Father father;
	public Father getFather(){return this.father;}
}

The “Son” class is now richer. However, another inconvenience remains: there is no way to reach the “Son” through the “Father.” Why does the father have no idea who his son is? To address this issue, we add the following attribute to the “Father”:

public class Father{
	private Son son;
	public Son getSon(){return this.son;}
}

Now, these two classes are more robust and have developed into what we call the anemic domain model. This model has established a decent “family” in which the “Father” and “Son” know each other. But a closer look at these two classes reveals a problem. An object is typically obtained from a repository (a database query) or a factory (a new in-memory object), for example:

Son someSon = sonRepo.getById(12345);

This method extracts a “Son” object from the database. To build a complete “Son” object, sonRepo needs a fatherRepo to build a “Father” to assign to son.father. Similarly, to build a complete “Father”, fatherRepo needs sonRepo to build a “Son” to assign to father.son. As a result, the object graph contains a cycle. This recursive call problem can be solved, but only at the cost of a makeshift domain model, which is intolerable for developers pursuing pure domain models.

For Alibaba’s Hema team, the target is the opposite: an object graph that is directed and acyclic. The question becomes: to prevent this recursive call, can we drop one of the two references between the “Father” and “Son” classes? The “Father” class is modified as follows:

public class Father{
	private String id;//Assumed to match the String fatherId stored on Son
	//private Son son; Delete this reference
	private SonRepository sonRepo;//Add a repository for Son
	public Son getSon(){return sonRepo.getByFatherId(this.id);}
}

In this way, building a “Father” does not build another unwanted “Son.” However, the price we pay here is that a SonRepository is introduced into the “Father” class. In other words, a persistence operation is referenced in a domain object. We call this the rich domain model, which is introduced in the following section.

Rich domain model

Rich domain models cost domain models their purity. Such an object is no longer a pure in-memory object; it hides a database operation, which does not bode well for testing, because a unit test should run quickly without connecting to a database (explained in detail later). Even so, rich domain models are sometimes required to keep the model complete.

Let’s use another hypothetical scenario: A Hema store has thousands of items on the shelf, each with hundreds of attributes. If all these items had to be loaded whenever a Shop object is built, efficiency would be poor, so the list is fetched through a repository on demand:

public class Shop{
	//private List<Product> products; This list of items being built is too large
	private ProductRepository productRepo;
	public List<Product> getProducts(){
		//return this.products;
		return productRepo.getShopProducts(this.id);
	}
}

At this point, it is also prudent to provide a short description of dependency injection:

· In Spring, injected objects are singletons at runtime by default. Only objects within Spring’s component-scanning scope (@Component) can receive dependencies through annotations (@Autowired). Objects created with new cannot receive dependencies this way.

· Alibaba Hema expert, Hui Zi, recommends constructor dependency injection, which is testing friendly, builds perfectly complete objects, and explicitly tells the programmer which objects they must mock/stub.

Now, let’s go back to the rich domain model and revisit our father-son example:

public class Father{
	private String id;//Assumed identifier field
	private SonRepository sonRepo;
	public Son getSon(){return sonRepo.getByFatherId(this.id);}
	public Father(SonRepository sonRepo){this.sonRepo = sonRepo;}
}

In this scenario, creating a new “Father” requires passing in a SonRepository, which makes the calling code cumbersome. Can SonRepository be supplied by dependency injection instead? No: the “Father” is not a singleton but an object created afresh in both the New and Query scenarios, so SonRepository cannot be injected while building it. This is where the factory pattern, which some consider useless, demonstrates its value:

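import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class FatherFactory{
	private final SonRepository sonRepo;
	@Autowired
	public FatherFactory(SonRepository sonRepo){this.sonRepo = sonRepo;}//Keep the injected repository
	public Father createFather(){
		return new Father(sonRepo);
	}
}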

As FatherFactory is a singleton object created by the framework, SonRepository can be injected into the factory naturally. The createFather method hides the injected sonRepo, so creating a new “Father” stays clean.

Testing-friendly domain models

Blood loss and anemic models are pure in-memory objects and intrinsically testing-friendly. But in practice, rich domain models do exist. To remove rich domain models, the domain object must be split apart, which leaves it fragmented. Frankly, the debate between anemic and rich domain models never ceases. In rich domain models, objects carry the property of persistence and thus become dependent on databases. The basic requirement is then to mock/stub these dependencies. Let’s look at our “Father” example again:

public class Father{
	private String id;//Assumed identifier field
	private SonRepository sonRepo;//=new SonRepository() Do not construct the repository here
	public Son getSon(){return sonRepo.getByFatherId(this.id);}
	//Pass the repository in through the constructor instead
	public Father(SonRepository sonRepo){this.sonRepo = sonRepo;}
}

Passing SonRepository in through the constructor makes the class testing-friendly: the unit test runs smoothly once the repository is mocked or stubbed.
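As a rough illustration, the sketch below stubs the repository by hand instead of using a mocking library. It assumes SonRepository is an interface with a single getByFatherId method and that Father also takes its id in the constructor; both are assumptions made so the example is self-contained, not details confirmed by the code above:

```java
// Hypothetical sketch: a hand-written stub replaces the database-backed repository.
class Son {
    private final String name;

    Son(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Assumed to be an interface so that a test can supply its own implementation.
interface SonRepository {
    Son getByFatherId(String fatherId);
}

class Father {
    private final String id;
    private final SonRepository sonRepo;

    // The dependency is explicit, so a test knows exactly what it must stub.
    Father(String id, SonRepository sonRepo) {
        this.id = id;
        this.sonRepo = sonRepo;
    }

    Son getSon() {
        return sonRepo.getByFatherId(id);
    }
}

class FatherTestSketch {
    public static void main(String[] args) {
        // The stub answers from memory and records which id it was asked for.
        String[] askedFor = new String[1];
        SonRepository stub = fatherId -> {
            askedFor[0] = fatherId;
            return new Son("stub-son");
        };

        Father father = new Father("f-1", stub);
        Son son = father.getSon();

        if (!"stub-son".equals(son.getName())) {
            throw new AssertionError("unexpected son: " + son.getName());
        }
        if (!"f-1".equals(askedFor[0])) {
            throw new AssertionError("repository was asked for the wrong id");
        }
        System.out.println("Father.getSon() verified without a database");
    }
}
```

A mocking library would serve the same purpose; the hand-written lambda simply keeps the example free of external dependencies.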

How Hema implements repository

When using the object-oriented approach, the domain model exists in memory objects that eventually end up in a database. Removing restrictions for domain models allows for flexible and variable database design. Let’s look at how domain objects find their way into Hema databases:

At Hema, we designed Tunnel, an interface that gives access to domain objects stored in different types of databases. A Repository does not perform persistence directly; it converts domain objects into POJOs and passes them to a Tunnel for persistence. A Tunnel can be implemented in any package. In this way, the domain (domain objects + repositories) and persistence (Tunnels) are separated completely, and domain packages become a pure set of in-memory objects.
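The article does not show Tunnel’s actual interface, so the following is only a guess at its general shape. Item, ItemDO, ItemTunnel, InMemoryItemTunnel, and ItemRepository are all hypothetical names. The sketch shows a repository that converts a domain object into a persistence POJO and hands it to a tunnel, whose implementation could live in a separate package:

```java
import java.util.HashMap;
import java.util.Map;

// Domain object: behavior plus state, with no knowledge of storage.
class Item {
    private final String id;
    private String name;

    Item(String id, String name) {
        this.id = id;
        this.name = name;
    }

    void rename(String newName) {
        if (newName == null || newName.isEmpty()) {
            throw new IllegalArgumentException("name must not be empty");
        }
        this.name = newName;
    }

    String getId() { return id; }
    String getName() { return name; }
}

// Persistence-side POJO: fields only, shaped for storage.
class ItemDO {
    String id;
    String name;
}

// Hypothetical tunnel interface: the only thing the repository knows about storage.
interface ItemTunnel {
    void save(ItemDO row);
    ItemDO findById(String id);
}

// One possible implementation; a relational or document store could replace it.
class InMemoryItemTunnel implements ItemTunnel {
    private final Map<String, ItemDO> rows = new HashMap<>();

    @Override
    public void save(ItemDO row) {
        rows.put(row.id, row);
    }

    @Override
    public ItemDO findById(String id) {
        return rows.get(id);
    }
}

// The repository converts between domain objects and POJOs and delegates to the tunnel.
class ItemRepository {
    private final ItemTunnel tunnel;

    ItemRepository(ItemTunnel tunnel) {
        this.tunnel = tunnel;
    }

    void save(Item item) {
        ItemDO row = new ItemDO();
        row.id = item.getId();
        row.name = item.getName();
        tunnel.save(row);
    }

    Item getById(String id) {
        ItemDO row = tunnel.findById(id);
        return row == null ? null : new Item(row.id, row.name);
    }
}

class TunnelDemo {
    public static void main(String[] args) {
        ItemRepository repository = new ItemRepository(new InMemoryItemTunnel());
        Item item = new Item("i-1", "Apple");
        item.rename("Green apple");
        repository.save(item);
        System.out.println(repository.getById("i-1").getName());
    }
}
```

Swapping InMemoryItemTunnel for a database-backed implementation would not touch Item or ItemRepository, which is the separation described above.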

Deployment architecture in domain models

Hema operations are heavily interlinked. From purchasing from suppliers to delivering products to customers, the relationships between objects are clear. In principle, a single large, all-inclusive domain model could be used. Another option is to use bounded contexts to divide the domain into sub-domains, as in the illustration below by DDD expert Martin Fowler:

For Hema experts like Huizi, the ideal deployment structure is:

Summary

In summary, through DDD, Hema is exploring the vast possibilities of coding architecture design. Hema’s brand new 2B+Internet business model offers many details worthy of in-depth research. DDD has demonstrated solid initial performance for Hema by standing up to real-world challenges in business scalability and system reliability. With the help of DDD, the Hema team is carefully constructing an Internet-based distributed workflow engine (Noble) and a fully Internet-based graphic drafting engine (Ivy). More unique designs are expected from Hema engineers in the future.

(Original article by Zhang Qunhui张群辉)

Alibaba Tech
