Microservices and Persistent Data
Database per Service?
How valid or useful is the “database per service” microservices architectural pattern? Given the pattern’s crippling impact on managing distributed data with microservices, that is a question worth answering.
The database per service pattern results from a misunderstanding or misapplication of principles described by Werner Vogels, CTO at Amazon. Vogels is a very smart guy and one of the creators of cloud computing as we know it today. When Vogels began building the software that helped make Amazon into the giant it is today, he articulated a set of principles upon which that software was designed. Among those principles are:
- Software components should be built as independent stateless services.
- All business logic in a service should be encapsulated with the data upon which it acts.
- There should be no direct access to a database from outside a service. Any and all access to a database should be accomplished by invoking a service specifically implemented to do so.
- Each service should publish an interface that enables access to its data and functionality by other services.
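The third and fourth principles can be illustrated with a minimal sketch. All names here are hypothetical, and an in-memory SQLite database stands in for real persistent storage; the point is only that the database handle is private to the service and every caller goes through the service's published interface:

```python
import sqlite3

class CustomerService:
    """Encapsulates all access to customer data: callers use this
    API; they never touch the underlying database directly."""

    def __init__(self):
        # The database handle is private to the service.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def create(self, name: str) -> int:
        cur = self._db.execute(
            "INSERT INTO customers (name) VALUES (?)", (name,)
        )
        self._db.commit()
        return cur.lastrowid

    def get(self, customer_id: int):
        row = self._db.execute(
            "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return {"id": row[0], "name": row[1]} if row else None

# Callers invoke the service interface, never the database:
service = CustomerService()
cid = service.create("Ada")
print(service.get(cid))  # {'id': 1, 'name': 'Ada'}
```

Nothing in this sketch requires that the database be unique to one microservice; it requires only that access to it flow through the service.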
Inferring that these principles mandate a database per service is an illogical extension of the concept and runs counter to many of the reasons for which we use microservices.
What Is the “Database per Service” Pattern?
The database per service pattern requires that any microservice needing persistent data have its own unique database, accessible only through that microservice, as if it is assumed that an application will have only one instance of that specific microservice. A Microservices Architecture blog gives this description of the pattern: “Keep each microservice’s persistent data private to that service and accessible only via its API. A service’s transactions only involve its database.” The article’s analysis of the database per service pattern is as follows:
Using a database per service has the following benefits:
1. Helps ensure that the services are loosely coupled. Changes to one service’s database do not impact any other services.
2. Each service can use the type of database that is best suited to its needs. For example, a service that does text searches could use Elasticsearch. A service that manipulates a social graph could use Neo4j.
Using a database per service has the following drawbacks:
1. Implementing business transactions that span multiple services is not straightforward. Distributed transactions are best avoided because of the CAP theorem. Moreover, many modern (NoSQL) databases don’t support them.
2. Implementing queries that join data that is now in multiple databases is challenging.
3. There is added complexity in managing multiple SQL and NoSQL databases.
You will notice that there are many other ways to achieve the two benefits described, and that the resulting three drawbacks make managing distributed data with microservices virtually impossible. From a purely architectural perspective, that seems to be a very poor trade-off.
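To make drawback 2 concrete, here is a toy sketch of the application-side join the pattern forces. The dictionaries are hypothetical stand-ins for two service-private databases reached through their APIs; with no shared database there is no SQL JOIN, so the join must be reassembled in application code:

```python
# Hypothetical per-service data, each private to its own service and database.
customers_service = {1: {"id": 1, "name": "Ada"}}               # customer service
orders_service = [{"id": 10, "customer_id": 1, "total": 99.5}]  # order service

def orders_with_customer_names():
    """With a database per service there is no cross-service JOIN;
    the join is rebuilt in application code from two API calls."""
    result = []
    for order in orders_service:                      # "query" the order service
        customer = customers_service.get(order["customer_id"])  # "query" the customer service
        result.append(
            {**order, "customer_name": customer["name"] if customer else None}
        )
    return result

print(orders_with_customer_names())
```

Even in this trivial two-service case, the application inherits work (lookup, null handling, eventually pagination and consistency) that a DBMS join would otherwise do.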
What’s Wrong with the “Database per Service” Pattern?
“Keep each microservice’s persistent data private to that service and accessible only via its API” is absolutely correct and is a viable and important microservice constraint. Taking that constraint to the illogical extreme of a unique database per service is not. Let’s look at some of the reasons why:
- One of the primary purposes for which we use microservices is to exploit the scaling, reliability, and failover capabilities available through cloud containers and container orchestration. That requires that we be able to deploy multiple instances of an individual microservice. In most cases, that implies that the state of each deployed instance’s persistent data must be identical (which is one reason why we try to keep microservices stateless). That is, in all practical terms, impossible to do with the database per service pattern.
- To implement effective failover strategies — made necessary by possible software, hardware, or network disruptions — duplicate datastores mirrored on multiple servers in multiple locations are required. Not only does the database per service pattern fail to acknowledge that requirement, it makes it difficult or impossible to satisfy.
- To implement effective scaling strategies — in order to respond to dynamically shifting workloads — duplicate datastores mirrored on different clusters are often required. Not only does the database per service pattern fail to acknowledge that requirement, it makes it difficult or impossible to satisfy.
- A well-designed microservice represents only one instance of a thing at any single point in time. For example, a customer microservice should represent an individual customer, not all customers. Typically, all the logic in a customer service acts on a single customer instance, not on all customers simultaneously. In that regard, database per service should really be row per service, an even less practical solution. This is a totally unnecessary impedance mismatch: a software impedance mismatch is a gap between what you have and what you want. If you need a service that acts on an array of customers, the correct microservice pattern is an aggregator, that is, a customers microservice that uses the customer microservice to act on individual customers.
- The database per service pattern short-circuits the concurrency and consistency protections of modern DBMSs, the relationship and referential integrity protections of relational DBMSs, and the data structuring capabilities of NoSQL document DBMSs, as well as all DBMS multi-table joins, for benefits more easily achieved by other means.
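The aggregator pattern mentioned above can be sketched in a few lines. This is a hypothetical illustration, with in-process classes standing in for deployed microservices and a dict standing in for the customer service's persistent data:

```python
class CustomerService:
    """Acts on a single customer instance at a time."""
    def __init__(self, store: dict):
        self._store = store  # stand-in for the service's persistent data

    def apply_discount(self, customer_id: int, pct: float) -> float:
        self._store[customer_id]["balance"] *= (1 - pct)
        return self._store[customer_id]["balance"]

class CustomersAggregator:
    """Aggregator pattern: acts on a set of customers by delegating
    each individual customer to the single-instance customer service."""
    def __init__(self, customer_service: CustomerService):
        self._svc = customer_service

    def apply_discount_to_all(self, customer_ids, pct):
        return {cid: self._svc.apply_discount(cid, pct) for cid in customer_ids}

store = {1: {"balance": 100.0}, 2: {"balance": 200.0}}
agg = CustomersAggregator(CustomerService(store))
result = agg.apply_discount_to_all([1, 2], 0.5)
print(result)  # {1: 50.0, 2: 100.0}
```

All of the single-customer logic stays in one place; the customers aggregator adds only the fan-out.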
What Should We Do Instead?
Database per service, and its variants schema per service and table per service, do not work for any but the simplest microservice requirements. They don’t work because they misunderstand the constraint they are attempting to enforce: “There should be no direct access to a database from outside a service. Any and all access to a database should be accomplished by invoking a service specifically implemented to do so.”
So how can we realize the benefits of this constraint without incurring the unacceptable drawbacks of database per service? The Cloud Actor Model of microservice design has patterns designed to do just that. The model implements a set of concepts that make implementing cloud-native microservices with distributed data more straightforward and manageable. The Cloud Actor Model uses:
- Actor model microservices as LEGOs® to build distributed cloud applications because they’re simple, make sense, and can easily be snapped together to build bigger things. That simplicity and intuitiveness make it easier to design and build complex applications with actors as the building blocks. Actors are small, independently testable, deployable, executable units of code. Actor instances are reentrant and thread-safe. Actors communicate via message passing. Everything that an actor instance needs to do its job is either in the message to which it is reacting or in the persistent resources to which it connects. Actors do not use locks, so they cannot become deadlocked. There is no separate framework. All server-side components are microservices and are deployed only when needed.
- The concept of actor roles. Among their many roles, actors can be plain microservices, microservice clients, event publishers, event handlers, message brokers, distributed loggers, error handlers, repository handlers, resource handlers, and Web servers. Each role has its specialized mission, defined behaviors and constraints, and its own base programming template to make development easier.
- The repository handler actor role as the application interface for creating, reading, writing, and deleting data through logical data views. Repository handlers can work with a single resource handler or multiple resource handlers. They present a logical data model to application actor instances and interact through a REST API with resource handlers to physically map, store, and retrieve data in the physical data model. Repository handlers and their associated resource handlers do the heavy lifting of distributed data management (failover, scaling, replication, consistency) for the rest of an application’s actors.
- The resource handler actor role as an application adapter to the physical data model. Resource handlers are used by repository handlers to map resources to and from persistent storage, much as an Object-Relational Mapper (ORM) such as Hibernate maps objects to and from relational databases. Resource handlers are accessed through a REST API. Resources are things that reside in non-volatile system storage, such as files, key-value stores, and databases.
- Self-organizing broker actors as the glue that connects individual actors by organizing messaging among them and by acting as circuit breakers to mitigate cascading error conditions. Brokers are the only stateful actors and manage the failover, scaling, and self-organizing capabilities of the Cloud Actor Model. When a broker is running, it broadcasts its presence to all other reachable brokers. Brokers are federated across cloud clusters and share state information with each other. A small broker proxy lives as a sidecar in every actor pod to facilitate actor registration and message passing using the optimal broker. Brokers take in messages addressed to a specific actor type and route them to the physical address of the optimum instance of that actor type. A mailbox is paired with each individual actor instance to buffer incoming messages for the actor and to send messages and communicate with broker proxies on behalf of the actor instance.
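As an illustrative sketch only, and not the Cloud Actor Model's actual implementation, the mailbox-and-message-passing idea above can be modeled with a thread-safe queue and a single worker thread per actor:

```python
import queue
import threading

class Actor:
    """Minimal actor sketch: a private mailbox buffers incoming
    messages, and one worker thread reacts to them one at a time,
    so the actor's own behavior needs no locks."""

    def __init__(self, handler):
        self._mailbox = queue.Queue()   # the actor's paired mailbox
        self._handler = handler         # the reaction to each message
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # Communication is message passing only; no shared state is exposed.
        self._mailbox.put(message)

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is None:          # sentinel: stop the actor
                break
            self._handler(message)

    def stop(self):
        self._mailbox.put(None)
        self._thread.join()

# Usage: a distributed-logger-style actor that reacts to each message.
received = []
logger = Actor(handler=received.append)
logger.send("service started")
logger.send("request handled")
logger.stop()
print(received)  # ['service started', 'request handled']
```

A real deployment would route `send` through a broker proxy to the optimal actor instance; here delivery is in-process, but the FIFO mailbox and one-message-at-a-time reaction are the essential properties.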
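The repository handler / resource handler split can also be sketched. All names below are hypothetical, a dict stands in for real non-volatile storage, and the REST API between the two roles is elided; the point is that application actors see only the logical data view, while the physical mapping lives behind the resource handler:

```python
class KeyValueResourceHandler:
    """Resource handler sketch: adapts the physical data model
    (here a plain dict standing in for a key-value store)."""
    def __init__(self):
        self._kv = {}  # stand-in for non-volatile storage

    def put(self, key: str, value: dict):
        self._kv[key] = value

    def get(self, key: str):
        return self._kv.get(key)

class CustomerRepositoryHandler:
    """Repository handler sketch: presents a logical data view
    ('a customer') and maps it onto the resource handler's physical
    model, hiding storage details from application actors."""
    def __init__(self, resource: KeyValueResourceHandler):
        self._resource = resource

    def save_customer(self, customer: dict):
        # The physical key scheme is a private mapping decision.
        self._resource.put(f"customer:{customer['id']}", customer)

    def load_customer(self, customer_id: int):
        return self._resource.get(f"customer:{customer_id}")

repo = CustomerRepositoryHandler(KeyValueResourceHandler())
repo.save_customer({"id": 7, "name": "Grace"})
print(repo.load_customer(7))  # {'id': 7, 'name': 'Grace'}
```

Because replication, failover, and scaling live behind the repository handler's logical view, the storage layout can change, or be duplicated across clusters, without touching the application actors.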