In a system composed of multiple microservices and third-party apps, fetching data from these services can lead to duplicate calls across the system and increased coupling. One strategy to overcome this is to create an aggregator cache microservice that caches the current data state of the system and serves the rest of the system through a single query endpoint.
In Wix CI we have many microservices and third-party applications, such as TeamCity build servers, the Git source repository and JFrog's Artifactory binary repository. Multiple microservices in the CI need to call these third-party applications, as well as other microservices in the CI system, for information about the system's current state. This creates tangled coupling between the microservices. Our solution was to write an aggregator cache microservice that aggregates the data from the different microservices and stores it as a blob in the database. In Wix CI the basic unit of information is a Build. A build comprises its metadata, artifacts, owners, release candidate versions, etc.
The aggregator microservice calls the different third-party apps and microservices and consolidates their data into a Build object. Microservices in the CI system that need this data can now call a single endpoint to get the current consolidated Build data, which couples them only to the aggregator cache microservice.
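To make the consolidation step concrete, here is a minimal Python sketch. The Build fields follow the list above, but the source names (`teamcity`, `artifactory`, `owners_service`), the `AggregatorCache` class and its `refresh`/`get_build` methods are hypothetical. Real source clients would make HTTP or SDK calls; here they are plain callables, and a JSON string in a dict stands in for the database blob.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Build:
    """Consolidated view of one build; field names are illustrative."""
    build_id: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    artifacts: List[str] = field(default_factory=list)
    owners: List[str] = field(default_factory=list)
    rc_versions: List[str] = field(default_factory=list)


# A hypothetical source client: given a build id, return some Build fields.
Source = Callable[[str], Dict[str, Any]]


class AggregatorCache:
    def __init__(self, sources: Dict[str, Source]):
        self._sources = sources
        self._blobs: Dict[str, str] = {}  # stands in for the database blob column

    def refresh(self, build_id: str) -> None:
        """Call every source and consolidate the answers into one Build."""
        build = Build(build_id)
        for fetch in self._sources.values():
            for key, value in fetch(build_id).items():
                setattr(build, key, value)
        self._blobs[build_id] = json.dumps(asdict(build))

    def get_build(self, build_id: str) -> Build:
        """The single query endpoint other services call."""
        return Build(**json.loads(self._blobs[build_id]))


cache = AggregatorCache({
    "teamcity": lambda b: {"metadata": {"status": "SUCCESS"}},
    "artifactory": lambda b: {"artifacts": [f"{b}.jar"]},
    "owners_service": lambda b: {"owners": ["alice"]},
})
cache.refresh("build-42")
print(cache.get_build("build-42").owners)  # ['alice']
```

Consumers only ever call `get_build`, so they never need to know which source supplied which field.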
A crucial design question in this architecture is when the aggregator cache refreshes its data and which strategy it uses to do so. In Wix CI we use Kafka events as triggers to “move” the system. The aggregator cache consumes events from other microservices and third parties that indicate a change in state, and then calls them to get the data for their new state. There are two possible strategies for refreshing the state. The first is to refresh only the data from the microservice or third party that produced the Kafka event. The second is to call all the data endpoints and refresh all the data (even if nothing has changed in the other microservices). The second strategy gives the aggregator cache a single refresh flow, while the first requires multiple flows, one for each data point. Once the aggregator cache microservice has finished refreshing to the new state, it produces a Kafka event to notify the other microservices in the CI ecosystem that consume these events, and they then call the aggregator cache to get the new state.
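The two refresh strategies can be sketched as one event handler with a switch. This is an illustrative assumption, not Wix's code: `ChangeEvent`, `EventDrivenCache` and `on_event` are hypothetical names, sources are plain callables, and the `publish` callback stands in for a real Kafka producer (no Kafka client library is used).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Fetch = Callable[[str], Dict[str, Any]]


@dataclass
class ChangeEvent:
    source: str    # name of the service whose state changed
    build_id: str


class EventDrivenCache:
    def __init__(self, sources: Dict[str, Fetch],
                 publish: Callable[[str], None], full_refresh: bool = True):
        self._sources = sources
        self._publish = publish          # stands in for a Kafka producer
        self._full_refresh = full_refresh
        self.state: Dict[str, Dict[str, Any]] = {}
        self.queried: List[str] = []     # which sources the refreshes called

    def on_event(self, event: ChangeEvent) -> None:
        """Handler a Kafka consumer loop would call for each change event."""
        if self._full_refresh:
            names = list(self._sources)  # single flow: re-query every source
        else:
            names = [event.source]       # per-source flow: only the producer
        data = self.state.setdefault(event.build_id, {})
        for name in names:
            self.queried.append(name)
            data.update(self._sources[name](event.build_id))
        # Announce that the new state is ready for downstream services to read.
        self._publish(event.build_id)


sources = {
    "teamcity": lambda b: {"status": "SUCCESS"},
    "owners_service": lambda b: {"owners": ["alice"]},
}
full_events: List[str] = []
full = EventDrivenCache(sources, full_events.append)
full.on_event(ChangeEvent("owners_service", "build-42"))
print(full.queried)     # ['teamcity', 'owners_service']

partial = EventDrivenCache(sources, lambda b: None, full_refresh=False)
partial.on_event(ChangeEvent("owners_service", "build-42"))
print(partial.queried)  # ['owners_service']
```

The full-refresh variant queries every source on every event, trading extra calls for a single code path; the partial variant touches only the producer, but in a real system each source would then need its own handling.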
Strict Updates vs Best Effort
We chose to do a full update on every Kafka event. This posed a dilemma: whether to roll back the data on an error (full data integrity) or to use a best-effort approach. The best-effort strategy collects the current state of the system even if there is an error getting data from some of the services. This can leave the aggregated state out of sync with the actual state of the system. In some systems this would be unacceptable, but in ours we decided to opt for best effort.
The following example illustrates how the best-effort approach works in practice.
In Wix CI we have a microservice that maps artifacts to their developers. Our CI dashboard displays a warning message if a developer tries to deploy a microservice they are not a developer of. When a new developer is added to the mapping service, the service produces an event that makes the aggregator cache refresh its data. If, during that refresh, the aggregator cache fails to retrieve data from another microservice, the refresh as a whole does not fail: the new developer is still added to the cache, and the dashboard allows that developer to deploy.
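The difference between strict updates and best effort can be sketched with this developer-mapping scenario. Everything here is hypothetical: the `refresh` and `should_warn` functions, the two source stubs (`developer_mapping`, `release_service`) and the dict-based state are assumptions made for illustration. The same full refresh runs once in strict mode, which rolls back to the last snapshot on any failure, and once in best-effort mode, which keeps whatever succeeded.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

Fetch = Callable[[str], Dict[str, Any]]


def refresh(build_id: str, sources: Dict[str, Fetch],
            cached: Dict[str, Any], strict: bool) -> Tuple[Dict[str, Any], List[str]]:
    """Full refresh of one build. Returns (state_to_store, failed_source_names)."""
    new_state = copy.deepcopy(cached)  # work on a copy so a rollback is trivial
    failed: List[str] = []
    for name, fetch in sources.items():
        try:
            new_state.update(fetch(build_id))
        except Exception:
            failed.append(name)        # a real service would log or alert here
    if strict and failed:
        return cached, failed          # strict: keep the last consistent snapshot
    return new_state, failed           # best effort: keep whatever succeeded


def should_warn(state: Dict[str, Any], developer: str) -> bool:
    """Dashboard check: warn when the developer does not own the artifact."""
    return developer not in state.get("owners", [])


# Hypothetical sources: the mapping service now lists "bob",
# while another service is currently failing.
def developer_mapping(build_id: str) -> Dict[str, Any]:
    return {"owners": ["alice", "bob"]}


def release_service(build_id: str) -> Dict[str, Any]:
    raise ConnectionError("release service unavailable")


SOURCES = {"developer_mapping": developer_mapping, "release_service": release_service}
CACHED = {"owners": ["alice"], "rc_versions": ["1.0.0"]}

best_effort, failed = refresh("build-42", SOURCES, CACHED, strict=False)
strict_state, _ = refresh("build-42", SOURCES, CACHED, strict=True)
print(failed)                            # ['release_service']
print(should_warn(best_effort, "bob"))   # False: bob can deploy
print(should_warn(strict_state, "bob"))  # True: bob still gets the warning
```

In best-effort mode the fields owned by the failing source (`rc_versions` here) simply stay stale until a later refresh succeeds, which is the inconsistency described above.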
Single Critical Failure Point
Using an aggregator cache microservice creates a single critical point of failure. Because the entire system is coupled to the aggregator cache, a bug or outage in it can degrade or even shut down the entire system. This is the main downside of the approach.