A large-scale redesign journey using Domain-Driven Design techniques

Published in Brex Tech Blog, Jun 13, 2022 (8 min read)

Brex has offered a spend management product with a rich feature set for almost 2 years. Its features include paying bills and exporting to accounting software, submitting and processing reimbursements, and reviewing expenses through a chain of approvals. In gathering feedback from existing and potential customers, we saw the opportunity to build an even more impactful product: a software platform that empowers companies to move faster by reducing overhead across all Brex products, starting with spend management.

This article shares the journey our engineering organization undertook to build this new software platform, Empower, from the ground up. I’ll outline how we used proven Domain-Driven Design techniques to build a solid foundation for our future products.

Refactoring breakthroughs

Our product architecture had evolved to power the complex features of our expense management offering. As with any system that evolves fast, tech debt accrues. We realized that disparate services, owned by different teams, were implementing similar behaviors. We decided it was time to gather all of our learnings, form a group of key engineers who worked on all these various systems, and take a fresh look at our expense management offering to ask: what if we were to build it again, today? What would we do differently? More than an expense management product, what if we could build an extensible platform?

Platform: For backend services, we consider a system to be a platform if it provides features that other services use to operate product features (or other platform features). Platforms are extensible and generic, and do not make product assumptions.

This initiative led to multiple refactoring breakthroughs, a concept familiar to Domain-Driven Design practitioners, as described in Eric Evans’ excellent and timeless book, Domain-Driven Design: Tackling Complexity in the Heart of Software.

Refactoring Breakthrough: A deeper insight into the Domain that leads to a significant opportunity to rearchitect systems, restructure code, and redesign APIs.

Thinking in terms of Fundamental Entities

Our backend infrastructure heavily relies on Protobuf, a typed language for defining APIs, for synchronous (gRPC) and asynchronous (Events Infra) communications. Backend features are exposed to our frontend using GraphQL. The initial goal of our group was to define the set of backend APIs, described in Protobuf, that would power this new platform. These APIs would capture the main entities and their relationships.

In other words, we adopted an API-First design approach: if we get the APIs right, the rest will follow.

We needed to develop the Entities and Operations that would support the platform. Within the context of a microservices architecture, designing APIs is equivalent to modeling the Domain. Guided by product insight, we set out to identify the most important Entities that, combined, would provide all the required behaviors and become the basis for the extensible platform. These Entities would have corresponding data models, be exposed as API Resources in our backend (Resource-oriented design), and be surfaced all the way to our frontend through GraphQL objects. These Fundamental Entities would become the key constructs that customers would interact with: a minimum set of concepts to understand in order to leverage a powerful and diverse set of features.

Entity: In the context of Domain-Driven Design, an object defined by a thread of continuity and identity. Entities generally have attributes associated with them and expose an Identifier.
Operation: A function taking some input and generating an output, generally with side effects (e.g., mutating data in a database, calling another service’s API, etc.)

The first part of our analysis looked at the different features we offered: card expenses, reimbursements, bill pay. These features share many data fields: a user initiator, an amount, a counterparty (a merchant or a vendor), dates, and an associated payment. They also have similar behaviors: they all go through review processes, they can be approved or rejected, and they can be exported to accounting software or added to reports by financial controllers. These observations led to the first breakthrough: all these features are ultimately subtypes of a common Fundamental Entity, an Expense.

Expense: A mutable abstraction of a payment, either in the past (e.g., a credit card purchase) or in the future (e.g., a reimbursement). It implements a state machine that captures the lifecycle of an expense: created, submitted for review, approved/rejected, etc.
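
To make the Expense definition above more concrete, here is a minimal Python sketch of an entity with an identifier, the shared data fields, and a small state machine. It is an illustration only, not Brex's actual model: the class names, field names, and the exact set of states and transitions are assumptions (the real lifecycle has more states than the four shown here).

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class ExpenseKind(Enum):
    # The three feature types named in the article.
    CARD = "card"
    REIMBURSEMENT = "reimbursement"
    BILL_PAY = "bill_pay"


class ExpenseState(Enum):
    CREATED = "created"
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"


# Assumed transition table: approved and rejected are treated as terminal here.
TRANSITIONS = {
    ExpenseState.CREATED: {ExpenseState.SUBMITTED},
    ExpenseState.SUBMITTED: {ExpenseState.APPROVED, ExpenseState.REJECTED},
    ExpenseState.APPROVED: set(),
    ExpenseState.REJECTED: set(),
}


@dataclass
class Expense:
    id: str                # the Entity's identifier
    kind: ExpenseKind      # subtype of the common Expense entity
    initiator_id: str      # the user who initiated the expense
    amount: Decimal
    counterparty: str      # a merchant or a vendor
    state: ExpenseState = ExpenseState.CREATED

    def transition(self, new_state: ExpenseState) -> None:
        """Move to new_state, rejecting transitions the table does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"cannot move from {self.state.value} to {new_state.value}"
            )
        self.state = new_state
```

Keeping the transitions in a single table means a card expense, a reimbursement, and a bill payment all share the same lifecycle rules, which is the point of treating them as subtypes of one entity.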

Now that we had our first Entity, we looked at how users in different roles at a customer company would interact with it: an employee, spending the company’s money; a manager, reviewing employees’ expenses; and a financial controller, managing the finances of the company. An employee would spend money in different contexts: a daily stipend, an offsite, a sales dinner, a trip to visit another office. A manager would review their team’s expenses, ensuring they follow the company’s expense policy. The financial controllers would design the expense policy, with different rules depending on the spending context. They would create budgets for events and for teams. These budgets would generally be tied to a specific spending context, and managers may be responsible for them. We had another breakthrough: the need for a Budget entity.

Budget: A budget provides control and visibility of spend within an organization by associating an amount with a spending context. Expenses are associated with budgets, and their accrued amounts take credit away from the budget’s allocated amount. Budgets are financial instrument-agnostic: they capture card expenses, reimbursements, and cash transactions. They can be organized by department, location, or a group of individuals, and can be nested. Nesting allows managers and controllers to delegate responsibilities to their reports.
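
The Budget definition above can be sketched in a few lines of Python. This is a hedged illustration rather than the actual data model: the `Budget` class, its fields, and in particular the choice to roll spend up to every parent budget are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Budget:
    name: str
    allocated: Decimal                 # amount tied to the spending context
    parent: Optional[Budget] = None    # budgets can be nested
    spent: Decimal = Decimal("0")

    def remaining(self) -> Decimal:
        """Credit left after accrued expenses are taken away."""
        return self.allocated - self.spent

    def record_expense(self, amount: Decimal) -> None:
        # Assumption: an expense counts against its budget and all ancestors.
        node: Optional[Budget] = self
        while node is not None:
            node.spent += amount
            node = node.parent


engineering = Budget("Engineering", Decimal("10000"))
offsite = Budget("Team offsite", Decimal("2000"), parent=engineering)
offsite.record_expense(Decimal("150.50"))
```

Note that `record_expense` takes only an amount, not a payment method, which mirrors the idea that budgets are financial instrument-agnostic.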

Throughout our discussions, another term kept coming up: policies. What expenses are allowed or not and who needs to review — all this is captured in an expense policy. An example of a simple expense policy is requiring a memo for all card transactions, and a receipt for transactions over $50. However, policies become complex because the context matters. For example, some companies will offer a daily stipend during a remote offsite, a work-from-home stipend may not have the same receipt requirements, or certain people would be allowed to spend more depending on their title. Besides this, we also realized there were other types of requests that would go through a similar approval process — such as an employee requesting a credit card limit increase, or a manager requesting a budget for an offsite. This brought us to our third breakthrough: there exists a generic concept of a policy that can apply to different types of requests.

Request: An entity that tracks the progress of a user requesting something: a reimbursement, a card expense, a new budget, etc. It is an abstract type that is implemented by many product-specific types: expense request, new card request, budget request, or user limit increase. As Requests may need to prompt for complex actions to be taken, they may nest request actions.
Policy: A set of rules that applies to a type of request in the expense management platform. Each rule has a set of conditions that evaluate some attributes of the request against specific values, and generates a list of actions when they match. There are many types of actions: require receipt, require memo, require review, etc.
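
As a rough sketch of the Policy definition above, the following Python models rules as conditions plus actions and encodes the simple example policy mentioned earlier (a memo for all card transactions, a receipt for transactions over $50). All names here are hypothetical, and reading the receipt rule as applying to card transactions only is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Callable, List, Tuple


class Action(Enum):
    REQUIRE_MEMO = "require_memo"
    REQUIRE_RECEIPT = "require_receipt"
    REQUIRE_REVIEW = "require_review"


@dataclass(frozen=True)
class ExpenseRequest:
    kind: str        # e.g. "card" or "reimbursement"
    amount: Decimal


Condition = Callable[[ExpenseRequest], bool]


@dataclass(frozen=True)
class Rule:
    conditions: Tuple[Condition, ...]
    actions: Tuple[Action, ...]

    def matches(self, request: ExpenseRequest) -> bool:
        # A rule applies only when every one of its conditions holds.
        return all(condition(request) for condition in self.conditions)


@dataclass(frozen=True)
class Policy:
    request_type: str
    rules: Tuple[Rule, ...]

    def evaluate(self, request: ExpenseRequest) -> List[Action]:
        """Collect the actions of all matching rules, without duplicates."""
        actions: List[Action] = []
        for rule in self.rules:
            if rule.matches(request):
                for action in rule.actions:
                    if action not in actions:
                        actions.append(action)
        return actions


def is_card(request: ExpenseRequest) -> bool:
    return request.kind == "card"


def over_50(request: ExpenseRequest) -> bool:
    return request.amount > Decimal("50")


policy = Policy(
    request_type="expense",
    rules=(
        Rule(conditions=(is_card,), actions=(Action.REQUIRE_MEMO,)),
        Rule(conditions=(is_card, over_50), actions=(Action.REQUIRE_RECEIPT,)),
    ),
)
```

Because conditions are plain predicates over request attributes, context-dependent rules (a stipend, an offsite, a title-based limit) would be added as extra rules rather than as changes to the evaluation logic.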

These 4 Fundamental Entities, combined, unlock an extremely rich array of functionalities. They are deeply composable and extensible, which are key properties of a great platform, and we felt confident we could continue our initiative with this solid foundation.

The benefits of an API-First approach

Equipped with the key models of our domain, we prioritized defining the APIs our backend systems should provide, through deep design discussions. We went through multiple iterations, stress testing them at every step to ensure they would support our current and future functionalities. The bulk of our work now was to sequence the Operations that would define our expense management features.

Some key operations include:

Evaluate a Request: when an expense is submitted, either through a reimbursement or a card transaction, a corresponding expense request is created and processed by the requests service. A similar flow happens for budget requests. Under the hood, the requests service calls the Policy to evaluate the request against its rules, which yields a list of actions the users must take.
Submit an Expense: this key Expense operation happens when a user submits a reimbursement, or when a card transaction is processed by our system and generates an expense. It is the state transition that initiates request processing.
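
The two operations above can be sequenced as in the following sketch. It is a simplified, in-process stand-in for what the article describes as separate services: `RequestsService`, `submit_expense`, and the trivial `review_everything` policy function are hypothetical names, and a direct function call replaces whatever RPC or event actually connects the services.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Expense:
    id: str
    kind: str
    amount_cents: int
    state: str = "created"


@dataclass
class Request:
    id: str
    expense_id: str
    pending_actions: List[str] = field(default_factory=list)


# Stand-in for the policies service: maps an expense to required actions.
PolicyFn = Callable[[Expense], List[str]]


class RequestsService:
    def __init__(self, policy: PolicyFn) -> None:
        self._policy = policy
        self.requests: Dict[str, Request] = {}

    def evaluate(self, expense: Expense) -> Request:
        """Create the expense request and evaluate it against the policy."""
        request = Request(
            id=f"req_{expense.id}",
            expense_id=expense.id,
            pending_actions=self._policy(expense),
        )
        self.requests[request.id] = request
        return request


def submit_expense(expense: Expense, requests: RequestsService) -> Request:
    """State transition that kicks off request processing."""
    if expense.state != "created":
        raise ValueError(f"expense {expense.id} was already submitted")
    expense.state = "submitted"
    return requests.evaluate(expense)


def review_everything(expense: Expense) -> List[str]:
    # Placeholder policy for the example: every expense needs a review.
    return ["require_review"]


service = RequestsService(policy=review_everything)
expense = Expense(id="exp_1", kind="card", amount_cents=7500)
request = submit_expense(expense, service)
```

After this runs, `expense.state` is `"submitted"` and `request.pending_actions` holds the actions the user still has to take.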

The designed APIs revolved around the Fundamental Entities, and led to surfacing 4 distinct logical services: expenses, budgets, requests, and policies. Below is a very simplified diagram of the services and their main interactions (either through synchronous RPC or asynchronous events). Notice how they are layered: expenses and budgets form the top layer, which communicates down to requests. The requests service only communicates with the upper layer through events.

Simplified diagram of how the Fundamental Entities interact

Once the Protobuf APIs were defined, a group of product engineers (full-stack and front-end) was assembled with the goal to design the GraphQL APIs we would expose to the front end. Using the backend Protobuf APIs on one end, and the product and design wireframes on the other end, they worked to define frontend-friendly interfaces, carefully stress testing all the flows and validating the GraphQL APIs would support them.

After stress testing the Protobuf and GraphQL APIs, and separating each logical service into its own server, we were able to create different workstreams for each backend server and frontend component, and execute in parallel. Frontend teams were able to rely on the GraphQL APIs and mock data to iterate as the servers were being built; backend teams were able to leverage the defined Protobuf APIs similarly. The beauty of this API-first approach was that very little coordination overhead was required at this point. We did iterate on the initial APIs multiple times as we learned new things building the different services, which required cross-team collaboration. But overall, each team was able to execute independently using the APIs as the interfaces between teams: their executions were decoupled.
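
The decoupling described above can be illustrated with a small Python sketch: once an interface is agreed on, a consumer can be written and tested against mock data before the real server exists. The `ExpensesApi` interface, its method, and the fixture shape are invented for this example and do not reflect the actual Protobuf or GraphQL schemas.

```python
from typing import Dict, List, Protocol


class ExpensesApi(Protocol):
    """The agreed contract; both the real client and the mock satisfy it."""

    def list_expenses(self, user_id: str) -> List[Dict[str, str]]: ...


class MockExpensesApi:
    """In-memory implementation backed by fixture data."""

    def __init__(self, fixtures: List[Dict[str, str]]) -> None:
        self._fixtures = fixtures

    def list_expenses(self, user_id: str) -> List[Dict[str, str]]:
        return [e for e in self._fixtures if e["initiator_id"] == user_id]


def expense_titles(api: ExpensesApi, user_id: str) -> List[str]:
    # Consumer code depends only on the contract, not on an implementation.
    return [
        f'{e["counterparty"]}: {e["amount"]}'
        for e in api.list_expenses(user_id)
    ]


mock_api = MockExpensesApi(
    [
        {"initiator_id": "user_1", "counterparty": "Coffee Shop", "amount": "4.50"},
        {"initiator_id": "user_2", "counterparty": "Airline", "amount": "320.00"},
    ]
)
titles = expense_titles(mock_api, "user_1")
```

Swapping `MockExpensesApi` for a real client later requires no change to `expense_titles`, which is what lets teams work in parallel.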

Key learnings

This has been a herculean effort, made successful by an incredible team of engineers, product managers and designers working together. A refactoring of this magnitude is extremely complex. We are now in the process of migrating customers to the new systems, an initiative that is no simpler, as it involves migrating many data models.

One interesting aspect of our endeavor was that we hadn’t fully defined all the product features of our future platform when we started defining the Fundamental Entities. We went back to first principles, and tried to extract the key concepts on which to build our future products. In the end, the APIs we defined ended up influencing the product decisions and our current roadmap.

In conclusion, here are some of our key learnings:

Think thoroughly about the Fundamental Entities of your Domain: the ones that transcend layers as the most important concepts for your customers to understand.
Design architecture in layers as it helps identify the dependency model between services.
Continue refactoring as new insights emerge from what you learn from your customers.
API-First design is a very powerful methodology to move fast and allow teams to be decoupled.

Many thanks to Amin Ariana for a very thorough review, Jarrod Ruhland for his great feedback, and everyone else who reviewed this post: Feng Zhao, Cosmin Nicolaescu, Pedro Franceschi.

Interested in building financial software using the principles described here? Come join us at Brex!

Written by Thomas Césaré-Herriau