At Weave, we knew that how we stored data would have a big impact on everything we did, so we focused on it from the beginning. We weren’t always right, but we learned and made changes as we grew. This article covers the path we took to data storage, the trade-offs we made, and where we are today.
The first step Weave made in moving from a monolith to microservices was deciding how to store data. One of the main advantages of microservices is that each one owns the behavior for a set of actions. This isolation makes changes easy because other services are shielded from side effects. However, storing data is hard, especially data that must be consistent, highly available, and partition tolerant, as the CAP theorem points out. So our first major decision was a trade-off: encapsulate behavior and state inside each microservice, or concentrate the expertise of maintaining data in one specialized service.
We decided that maintaining data correctly was more important than encapsulation, and built the DataService. The DataService wrapped the database with a REST API and managed persistent data for every service. So assuming we were childish enough to name our services after Simpsons characters, this is how our stack looked.
Initially, this worked great. We had the slight overhead of writing both SQL queries in the DataService and REST calls in our services, but we built libraries that mitigated this cost. As we grew, we discovered a lot about how to improve database performance, security, and redundancy. Life was good, until we wanted to make changes to a service’s data structure.
Because each service had direct access to the data for every service, we developed a practice of extracting all data from the DataService directly, as opposed to asking the service that owned the data. For instance, if Mojo wanted to know about Elder Moleman’s cataract, instead of asking Elder Moleman, it just grabbed it from the DataService, creating the service communication pattern below.
In retrospect, it is obvious this would happen. All the services already had access to the data, and building up the infrastructure to talk to other services was just extra work. Then one day Elder Moleman wanted to change how it stored data about its cataracts. That is when we realized the depth of our contamination and the paralyzing impact it had on making data storage changes. No service could change its underlying data without an upgrade path for every service that accessed that data. This led to upgrade dependencies between services that took ages to resolve. Mojo couldn’t update until Grimes updated, which couldn’t update until Furious George updated, which couldn’t update until Mojo updated: deadlock.
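To make the deadlock concrete, here is a minimal sketch that treats "can't update until" as edges in a graph and looks for a cycle with a depth-first search. The function name `find_cycle` and the `blocked_on` mapping are illustrative assumptions, not tooling we actually ran.

```python
def find_cycle(blocked_on):
    """Return one upgrade-dependency cycle as a list of names, or None.

    `blocked_on` maps a service to the services it must wait for.
    """
    visiting = set()   # services on the current DFS path
    done = set()       # services already proven cycle-free
    path = []          # current DFS path, in order

    def visit(service):
        if service in done:
            return None
        if service in visiting:
            # We walked back onto our own path: that slice is the cycle.
            return path[path.index(service):] + [service]
        visiting.add(service)
        path.append(service)
        for dependency in blocked_on.get(service, []):
            cycle = visit(dependency)
            if cycle:
                return cycle
        path.pop()
        visiting.discard(service)
        done.add(service)
        return None

    for service in blocked_on:
        cycle = visit(service)
        if cycle:
            return cycle
    return None


# The situation described above, expressed as data.
blocked_on = {
    "Mojo": ["Grimes"],
    "Grimes": ["Furious George"],
    "Furious George": ["Mojo"],
}
print(find_cycle(blocked_on))
```

Any non-empty result means no ordering of upgrades exists until one of the services breaks its dependency on shared tables.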
To prevent this data access contamination, Weave introduced API keys. Each service was assigned a key, and the DataService restricted access to tables based on the key provided with each request. This forced services to get all information from the service that owned it, resulting in the pattern below.
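The idea behind key-based table restrictions can be sketched in a few lines. This is an assumption-laden illustration rather than the DataService's real code: the key strings, table names, and the `authorize` and `handle_query` helpers are all hypothetical.

```python
# Hypothetical grants: which tables each service's API key may touch.
TABLE_GRANTS = {
    "key-mojo": {"mojo_tasks"},
    "key-moleman": {"moleman_cataracts"},
}


class AccessDenied(Exception):
    """Raised when an API key may not access the requested table."""


def authorize(api_key, table):
    """Reject the request unless the key was granted the table."""
    allowed = TABLE_GRANTS.get(api_key, set())
    if table not in allowed:
        raise AccessDenied(f"key is not allowed to access table {table!r}")


def handle_query(api_key, table, rows_by_table):
    """Stand-in for a request handler: check the key, then read rows."""
    authorize(api_key, table)
    return list(rows_by_table.get(table, []))
```

With a check like this in front of every request, Mojo's key can no longer read Elder Moleman's tables, so Mojo has to ask Elder Moleman instead.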
By partitioning the data and restricting access, we changed our habits and over time removed the data contamination. Some changes were harder than others, particularly when data that could previously be combined with a SQL join now required an external API call and a merge. In the end, though, those problems are far easier to deal with than the deadlock we had before.
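As a rough sketch of what "external API call and merge" means in practice, the function below performs the equivalent of a SQL inner join in application code. The row shapes and the `patient_id` field are invented for illustration, and the second list stands in for a response fetched from the owning service.

```python
def merge_by_key(left_rows, right_rows, key):
    """Inner-join two lists of dicts on `key`, as a SQL INNER JOIN would."""
    # Index the remote rows once so the merge is linear, not quadratic.
    right_index = {}
    for row in right_rows:
        right_index.setdefault(row[key], []).append(row)

    merged = []
    for left in left_rows:
        for right in right_index.get(left[key], []):
            merged.append({**left, **right})
    return merged


# Rows the calling service owns (hypothetical).
appointments = [
    {"patient_id": 1, "time": "09:00"},
    {"patient_id": 2, "time": "10:00"},
]
# Rows that would come back from the owning service's API (hypothetical).
cataracts = [{"patient_id": 1, "severity": "mild"}]

print(merge_by_key(appointments, cataracts, "patient_id"))
```

The cost is an extra network round trip and some merge code, but the owning service stays free to change how it stores its data.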
Production using microservices with API keys and the DataService was amazing. For a long time I thought we had found the perfect solution. As Weave continued to grow we learned even more about database scaling and protection. Every service benefited from the learning by being connected to the DataService. But eventually this architecture developed its pain points as well.
Weave’s DataService had become a full-scale monolith, with all the troubles that come with one. It was a legacy code base with multiple ways to do similar things. It required a fair amount of tribal knowledge to work on. Developers were scared to touch it because everything in the stack depended on it. It was difficult to scale horizontally. Malformed queries affected the performance of every service. I could go on and on. None of these issues was too dramatic, but together they slowed us down and drove us to find another solution.
Schemas + Connect Package:
Based on what we learned with the DataService, Weave decided to revisit our first decision and let microservices manage their own data. Weave set up multiple database servers with a PostgreSQL schema for each service. Backups and other maintenance support are still restricted to a small number of servers. Services can manage their data structures directly in a way that requires little tribal knowledge, performance is less coupled across services, and Weave provides a consistent, safe, easy way to manage data. This allows us to create the communication structure below, while still benefiting from the expertise of a small team across all services.
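For readers unfamiliar with the schema-per-service layout, here is a minimal sketch of the PostgreSQL statements such a setup could use. The `schema_setup_sql` helper, the `_svc` role suffix, and the choice to give each service its own login role are assumptions made for illustration, not a description of our actual provisioning.

```python
import re

# Only allow simple lowercase identifiers, since the name is
# interpolated directly into SQL text.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def schema_setup_sql(service):
    """Build the statements that give one service its own schema."""
    if not _IDENTIFIER.match(service):
        raise ValueError(f"unsafe service name: {service!r}")
    role = f"{service}_svc"  # hypothetical naming convention
    return [
        # A dedicated login role for the service.
        f"CREATE ROLE {role} LOGIN;",
        # A schema owned by that role, so it can manage its own tables.
        f"CREATE SCHEMA IF NOT EXISTS {service} AUTHORIZATION {role};",
        # Unqualified table names resolve inside the service's schema.
        f"ALTER ROLE {role} SET search_path = {service};",
    ]


for statement in schema_setup_sql("mojo"):
    print(statement)
```

Because every service lives in its own schema on shared servers, a small team can still handle backups and maintenance in one place.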
Not all data is equal. Some data is more sensitive and has to comply with government and internal security standards. Weave uses a hybrid structure for all sensitive data. In cases where stored data must comply with these higher standards, the data is still managed through the DataService.
By choosing to start with a DataService model, Weave had an excellent proving ground to learn how to manage data. It would have been difficult to determine best practices and deliver a stable product with each team learning how to manage data independently. Now the information provided by our shared database package makes it easy for many teams to follow best practices without being shackled to one service. And with this knowledge in hand, the hybrid approach ensures that any sensitive data is properly maintained, while giving individual teams the ability to choose whatever datastore fits their needs and move fast.