Domain-first vs. Schema-first Architecting

Published in That’s What I’m Talking About

Aug 16, 2016

I have spent a lot of time talking about a “domain-first” approach to designing and building software. As the title of this publication suggests, “That’s What I’m Talking About.” This has led me to advocate for static types and, in particular, for modeling your entire application domain as pure Java interfaces. There is a counterargument (held dearly by my friend and editor Nathan Dintenfass) that a well-designed data model will do a more than adequate job of explaining what the application is. Furthermore, if the data model is carefully designed and built, any data stored within it is guaranteed to be “correct”: the referential integrity checks built into the schema give us a structurally imposed guarantee that the data being added conforms to the schema, regardless of which application sits atop the database. Hence, we are free to begin coding other applications against the same schema, confident that the data will conform. If we code a prototype REST API on top of this schema in Ruby and later decide we want to switch it over to Java, no problem!

It’s true. Modeling applications this way, using relational databases as the store, and using referential integrity constraints in those stores are great ways to guarantee at a very low layer of the application that bad stuff can’t get in, regardless of the language choices higher up in the stack.

The “Schema-first” approach: the database IS the domain.

But I have a problem with the base assumption that I can completely model my domain in a relational data store. So let me be clear about a couple of things:

I do not see domain modeling and data modeling as the same thing.
I strongly believe that domain modeling should be done using a statically typed, abstract construct such as a Java interface. Other languages have similar constructs, but I prefer Java interfaces because they are simple, they can’t contain implementation code (Java 8 introduced `default` and static methods in interfaces, but I would suggest using those sparingly when defining your domain), and they can be “mixed in” (via the `implements` keyword) rather than forcing themselves into the object hierarchy (via the `extends` keyword).
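To make the “mixed in” point concrete, here is a minimal sketch. `AuditedRecord` is a hypothetical base class standing in for whatever a persistence framework might ask you to extend, and `Retailer` and `RetailerRecord` are illustrative names; the types are nested in one class only so the sketch fits in a single file. Because the domain type is an interface, the implementing class can still spend its single `extends` slot on the framework.

```java
public class MixinSketch {

    // Domain model: a pure interface with no implementation code.
    interface Retailer {
        String getName();
    }

    // Hypothetical base class that a persistence context might require.
    abstract static class AuditedRecord {
        private final long id;

        AuditedRecord(long id) {
            this.id = id;
        }

        long getId() {
            return id;
        }
    }

    // The persistence context uses its one "extends" for the framework base
    // class and still mixes in the domain type via "implements".
    static final class RetailerRecord extends AuditedRecord implements Retailer {
        private final String name;

        RetailerRecord(long id, String name) {
            super(id);
            this.name = name;
        }

        @Override
        public String getName() {
            return name;
        }
    }

    // Code in any other context sees only the domain interface.
    static String describe(Retailer retailer) {
        return "Retailer: " + retailer.getName();
    }

    public static void main(String[] args) {
        Retailer retailer = new RetailerRecord(1L, "Corner Market");
        System.out.println(describe(retailer));
    }
}
```

Had `Retailer` been an abstract class instead, `RetailerRecord` would have to choose between the domain type and the framework base class.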

Schema-first vs. Domain-first

If we are modeling a domain using a DDL to define our tables and constraints, we can certainly define all of the attributes that are present on each object in the domain. We can easily define various types of relationships among the objects in the domain. We can even overlay integrity constraints on those objects and relationships to make sure that an error is raised when a program violates those constraints.

I don’t have a problem with any of this per se. The problem I have is starting with it. I believe that the analysis and design of the problem domain should not be burdened by decisions about how the data will be stored. Mixing the two is confusing, and it distracts us from the task at hand, which is creating a robust foundation for this application and, hopefully, for related “adjacent” applications in the same domain. Remember, what I am advocating here is a layer of declarative code that essentially teaches the compiler to understand and verify our problem domain.

I am making another implicit assumption here: that we are going to want to use an Object Oriented language to express our problem domain. Despite recent tirades against OO’s usefulness, I am not ready to close the book on Object Orientation as an indispensable tool for creating valuable software, especially business and consumer applications. But if we use an Object Oriented language and store our data in a relational database, things can get complicated. Whether or not you believe in the so-called ORM impedance mismatch, it is undeniable that if you model a system where inheritance is an obvious design choice, you will have to decide, during the domain modeling phase, whether to use table-per-subclass or table-per-hierarchy as your relational mapping strategy. If you are going to use a relational database, you’ll have to make this choice at some point anyway. But if you have to consider it while modeling the domain, you may begin (possibly inadvertently) to make modeling decisions based on schema preferences and limitations, which is, as they say, putting the cart before the horse.

If, on the other hand, you are just modeling the domain using Java interfaces, simply declare the base type, then the sub-type that extends it, and move on. There is an obvious example of this in the Pirc.com application (which, I promise, we’ll actually start building one of these weeks). In that domain, we have the notion of a `Deal`, which encapsulates a name, a description, start and end dates, and deal terms. There are also deal sub-types: a `Sale` is a `Deal` that is available at a particular retailer, and a `Coupon` is a `Deal` that is tied to a particular product rather than a retailer and may be matched up with a `Sale`. Done. We haven’t made any assumptions about how these things will be stored, nor have we forced a type of persistence onto them; we have simply said that a core type in our system is `Deal`, and that there are different sub-types of deals that need to be supported.
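A minimal sketch of those declarations, assuming `java.time.LocalDate` for the dates. The method names and the `Retailer`, `Product`, and `DealTerms` helper types are illustrative guesses, not the actual Pirc.com model; the point is that nothing here says anything about storage.

```java
import java.time.LocalDate;
import java.util.Optional;

public class DealDomainSketch {

    // Supporting domain types (illustrative placeholders).
    interface Retailer { String getName(); }
    interface Product { String getName(); }
    interface DealTerms { String getSummary(); }

    // Core domain type: every deal has these attributes, however it is stored.
    interface Deal {
        String getName();
        String getDescription();
        LocalDate getStartDate();
        LocalDate getEndDate();
        DealTerms getTerms();
    }

    // A Sale is a Deal available at a particular retailer.
    interface Sale extends Deal {
        Retailer getRetailer();
    }

    // A Coupon is a Deal tied to a product rather than a retailer,
    // and it may optionally be matched up with a Sale.
    interface Coupon extends Deal {
        Product getProduct();
        Optional<Sale> getMatchingSale();
    }

    public static void main(String[] args) {
        // Only the type relationships are declared; no storage decisions yet.
        System.out.println(Deal.class.isAssignableFrom(Sale.class));   // true
        System.out.println(Deal.class.isAssignableFrom(Coupon.class)); // true
    }
}
```

Whether `Sale` and `Coupon` eventually map to table-per-subclass or table-per-hierarchy is left entirely to whichever context handles persistence.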

The “Domain-first” approach: a declarative layer of code allows the various applications to make no assumptions about where or in what format the data is stored.

Domain Objects That Are Not in the Database

There is a more important case to account for, though. What about an object that is clearly in your domain but won’t necessarily be in your database? Consider, for example, payment information and credit card processing that we intend to push out to a provider like Stripe. If we accept payment on our website, we need to model that somehow. That modeling will likely include a data component on our side, but there may be other interactions we need to model that go beyond the simple attributes and relationships a relational schema can express. Another example is transactional e-mail, where formulating and sending a notification may involve objects in our domain while the actual delivery is handled by a transactional e-mail provider like MailChimp or SendGrid. You could choose to consider this “out-of-band” information that doesn’t need to be modeled, but I like to include it in my domain model as an indication that part of our application is concerned with sending notifications based on certain types of events.
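One way such objects might appear in the domain, sketched under assumptions: `PaymentGateway`, `ChargeResult`, `Notification`, and `NotificationSender` are hypothetical names, and none of this is the Stripe, MailChimp, or SendGrid API. The domain only states what the application needs; a separate context would adapt a provider’s own client library to these interfaces.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

public class ExternalDomainSketch {

    // Outcome of a charge attempt, as the domain understands it.
    interface ChargeResult {
        boolean isApproved();
        String getConfirmationCode();
    }

    // Payment processing lives with a provider, not in our database,
    // but it is still part of the domain.
    interface PaymentGateway {
        ChargeResult charge(String customerToken, BigDecimal amount, String currency);
    }

    // A notification the domain wants to send in response to an event.
    interface Notification {
        String getRecipient();
        String getSubject();
        String getBody();
    }

    interface NotificationSender {
        void send(Notification notification);
    }

    public static void main(String[] args) {
        // A test or prototype context can satisfy the domain with an in-memory stand-in.
        List<String> sentSubjects = new ArrayList<>();
        NotificationSender sender = n -> sentSubjects.add(n.getSubject());
        sender.send(new Notification() {
            public String getRecipient() { return "shopper@example.com"; }
            public String getSubject() { return "Your coupon is ready"; }
            public String getBody() { return "A coupon matches a current sale."; }
        });
        System.out.println(sentSubjects);
    }
}
```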

Exceptions

One other item that I will be getting to in a future post is the idea of including fine-grained exception types in a sub-package of your domain model. These exception types, of course, will not be interfaces. They will be concrete classes, and they are clearly an aspect of the domain model that can’t be captured in a pure schema-first approach. Making exceptions part of your domain model makes the Java package describing the domain even more robust by giving visibility into the types of errors that may occur in the domain. I’ll save the checked vs. unchecked exception debate for another post, but suffice it to say that exceptions are an underutilized aspect of the Java language in my opinion, and they can and should have a place in any domain modeling exercise.
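As a sketch of what such a sub-package might contain: `DomainException`, `DealExpiredException`, and `CouponMismatchException` are hypothetical examples, nested here only to keep the sketch in one file (in a real project they would be top-level classes in the exceptions sub-package). Making them unchecked by extending `RuntimeException` is an arbitrary choice for illustration, since the checked vs. unchecked question is deliberately left open above. A shared base type lets callers catch any domain error in one place.

```java
import java.time.LocalDate;

public class DomainExceptionSketch {

    // Base type for every error condition that is intrinsic to the domain.
    static class DomainException extends RuntimeException {
        DomainException(String message) {
            super(message);
        }
    }

    // Raised when someone tries to use a deal after its end date.
    static class DealExpiredException extends DomainException {
        private final LocalDate endDate;

        DealExpiredException(String dealName, LocalDate endDate) {
            super("Deal '" + dealName + "' ended on " + endDate);
            this.endDate = endDate;
        }

        LocalDate getEndDate() {
            return endDate;
        }
    }

    // Raised when a coupon is matched with a sale it does not apply to.
    static class CouponMismatchException extends DomainException {
        CouponMismatchException(String couponName, String saleName) {
            super("Coupon '" + couponName + "' cannot be applied to sale '" + saleName + "'");
        }
    }

    // Hypothetical domain rule: a deal may not be used after its end date.
    static void checkNotExpired(String dealName, LocalDate endDate, LocalDate today) {
        if (today.isAfter(endDate)) {
            throw new DealExpiredException(dealName, endDate);
        }
    }

    public static void main(String[] args) {
        try {
            checkNotExpired("Spring Sale", LocalDate.of(2016, 4, 30), LocalDate.of(2016, 8, 16));
        } catch (DomainException e) {
            System.out.println(e.getMessage());
        }
    }
}
```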

Crossing Execution Context Boundaries

Let’s step back for a moment and recall the source of the debate here. Whether we are building schema-first or domain-first, the point is that we want to have a reliable foundation on which to build the next layer. Let’s say for the sake of argument that the next layer up from either the schema or the domain is a set of controllers that defines a REST API for our domain. These controllers will either depend on the data access layer exposed by the underlying persistence mechanism (be it an ORM solution, straight SQL, or some other approach) or they will depend on the domain interfaces that we have defined. Another application, such as a unit test, a batch process, or perhaps another web application, would need to depend on the same schema or domain in order to accomplish its task.

I would like to introduce at this point the notion of an “Execution Context” as an architectural concept. The term has been used many times before in various languages to refer to the environment or scope in which a particular piece of code is running. Environment and scope are pretty implementation-specific ideas, so perhaps we should instead call these “Architectural Contexts.” In the current discussion, I would like to consider the following areas as separate and distinct Architectural Contexts:

The persistence layer
The web layer that contains the controllers for providing our REST API
Unit tests
Another web application, such as a back-office administrative app.
Batch jobs that create/modify our domain objects in bulk.

These different Architectural Contexts don’t depend on one another, and each may exist completely independently. For any of these applications to be of use, they will need to be able to cross an architectural boundary. The simplest example of this is the REST API controllers interacting with the persistence layer. This idea is very similar in concept to Jeffrey Palermo’s Onion Architecture, with a different basic structure: rather than concentric layers emanating from the central domain, I see all of these Architectural Contexts as free-floating cells whose outer membrane may only be crossed by using domain objects.

In the Domain-first approach, Architectural Contexts are self-contained cells whose outer membrane may only be crossed by domain object instances.

Looking at things this way, where persistence is no longer a layer but an Architectural Context, we finally see a fundamental difference between the schema-first and domain-first approaches. In the domain-first approach, code that knows how to persist the domain objects is not a compile-time dependency of the REST API code. Practically speaking, though, the REST API won’t be all that useful without the persistence layer. The aforementioned Jeffrey Palermo faced the same situation in his Onion Architecture and has this to say about it:

The database is not the center. It is external. Externalizing the database can be quite a change for some people used to thinking about applications as “database applications”. With Onion Architecture, there are no database applications. There are applications that might use a database as a storage service but only through some external infrastructure code that implements an interface which makes sense to the application core. Decoupling the application from the database, file system, etc, lowers the cost of maintenance for the life of the application.
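A minimal sketch of that decoupling, with hypothetical names throughout: `DealCatalog` is a domain interface, `DealController` stands in for the web context (no real web framework is used), and `InMemoryDealCatalog` stands in for whatever persistence context eventually implements the interface. The controller compiles against the domain alone; only the wiring in `main` knows which implementation is in play.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class ContextBoundarySketch {

    // Domain: a minimal deal type and the operations the application needs.
    interface Deal {
        String getName();
    }

    interface DealCatalog {
        List<Deal> findAll();
        Optional<Deal> findByName(String name);
    }

    // Web context: depends only on the domain interfaces above.
    static final class DealController {
        private final DealCatalog catalog;

        DealController(DealCatalog catalog) {
            this.catalog = catalog;
        }

        // Stand-in for a GET /deals handler; returns a plain-text listing.
        String listDeals() {
            StringBuilder body = new StringBuilder();
            for (Deal deal : catalog.findAll()) {
                body.append(deal.getName()).append('\n');
            }
            return body.toString();
        }
    }

    // Persistence context stand-in: could be replaced by a JDBC or ORM version
    // without recompiling DealController.
    static final class InMemoryDealCatalog implements DealCatalog {
        private final List<Deal> deals = new ArrayList<>();

        void add(Deal deal) {
            deals.add(deal);
        }

        @Override
        public List<Deal> findAll() {
            return new ArrayList<>(deals);
        }

        @Override
        public Optional<Deal> findByName(String name) {
            return deals.stream().filter(d -> d.getName().equals(name)).findFirst();
        }
    }

    public static void main(String[] args) {
        // Composition root: the only place that names a concrete implementation.
        InMemoryDealCatalog catalog = new InMemoryDealCatalog();
        catalog.add(() -> "Back to School Sale");
        catalog.add(() -> "Free Shipping Coupon");
        System.out.print(new DealController(catalog).listDeals());
    }
}
```

The same `InMemoryDealCatalog` could double as the unit-test context, which is one reason to keep the domain interfaces small.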

Integrating External Services as Architectural Contexts

Another place where this architecture becomes particularly useful is code that integrates with external services. Such services usually have their own client libraries, which need to be incorporated into the source code and build process as dependencies. Rather than having any of our Architectural Contexts rely on a third-party library directly, we code a context that exposes the service through methods in the domain model, enabling any of our other Architectural Contexts to access this new functionality. This approach came in handy for me on a project where we changed transactional email providers in the middle of development. Because application notifications were modeled in the domain and the web Architectural Context accessed only the portion of the domain model that dealt with transactional email, it was fairly easy to swap out the old provider without affecting other parts of the code.
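A sketch of the provider swap described above. `TransactionalEmail` is a hypothetical domain interface, and the two adapter classes only imitate what a provider-specific context would do; they deliberately call no real SDK, since each provider’s actual client library and method names are out of scope here. Because the web-context code holds only a `TransactionalEmail`, changing providers touches just the wiring.

```java
import java.util.ArrayList;
import java.util.List;

public class EmailProviderSwapSketch {

    // Domain interface: what the application needs, not how a provider does it.
    interface TransactionalEmail {
        void sendWelcome(String recipient);
    }

    // Hypothetical adapter for the original provider. A real one would wrap
    // that provider's client library; this one just records what it would send.
    static final class OldProviderEmail implements TransactionalEmail {
        final List<String> sent = new ArrayList<>();

        @Override
        public void sendWelcome(String recipient) {
            sent.add("old:" + recipient);
        }
    }

    // Hypothetical adapter for the replacement provider.
    static final class NewProviderEmail implements TransactionalEmail {
        final List<String> sent = new ArrayList<>();

        @Override
        public void sendWelcome(String recipient) {
            sent.add("new:" + recipient);
        }
    }

    // Web-context code: unchanged when the provider changes.
    static final class SignupHandler {
        private final TransactionalEmail email;

        SignupHandler(TransactionalEmail email) {
            this.email = email;
        }

        void onSignup(String recipient) {
            // Account creation through the domain would happen here; then notify.
            email.sendWelcome(recipient);
        }
    }

    public static void main(String[] args) {
        // Swapping providers only changes which adapter is constructed here.
        NewProviderEmail provider = new NewProviderEmail();
        new SignupHandler(provider).onSignup("shopper@example.com");
        System.out.println(provider.sent);
    }
}
```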

External Services can also be mixed into our Domain Stew and exposed to other contexts via the appropriate domain interfaces.

Packaging Architectural Contexts

One thing I want to make clear is that I’m not making any assumptions about how these contexts are packaged, either. A diagram such as the “domain stew” shown above frequently implies something about the runtime deployment of each Execution Context. For example, in an Actor-based system the diagram may imply that each context should be packaged as an actor. In a microservices-based system, each context might be its own JVM or Docker container. Determining the best way to deploy these components is not the goal of the domain-first approach. In fact, when first starting out, these contexts may just be different jar files or even different packages in the same build unit. But as long as all of the Architectural Contexts can only access the domain interfaces and not the underlying implementations, none of them will care how the others are deployed: use the specified domain interface as instructed and everything else will just work.

Finally, More Specificity on What the Domain Model Actually Is…

If you’ve read my previous posts on this topic, you may have noticed that they are a bit thin on the details of what is actually contained in a domain model. Hopefully this post has given a more complete idea of what I view as the critical components of the domain model. Those components are:

Interface definitions for all of the objects in the system. It is expected that these definitions will grow and change over time. As they do, the compiler will help us find the places in our architectural contexts that need to change as well.
Interface definitions for objects that don’t live in the system but still need to be a part of the domain. The very fact that they are in our domain reinforces “location independence”: users of the domain model can’t know, and shouldn’t care, where the modeled objects actually live.
Exceptions for describing error conditions in the domain. Why stop at specifying attributes and relationships when we can also specify the exception cases that are intrinsic to the domain?

Putting all these components together into a single package or family of packages that in total defines our domain will go a lot further toward the ultimate goal here of building reliable software that can grow and change as our product matures.