Building a reporting service in a microservice architecture

Muneeb Ahmed
4 min read · Apr 17, 2019


We were told that sharing is caring, but not in the case of microservices, where we have one database per service. One service cannot access another service's data by connecting to its database directly; instead, we need some other mechanism (such as REST APIs) to fulfill the need. Generating reports from these distributed databases requires a lot of effort, and most of the time we end up implementing a microservices anti-pattern.

Let's illustrate with an example. Say we have two microservices: one holds order data, while the other holds customer data. We want to fulfill a minimal business need by generating a report of all orders with their customer information.

The use case is pretty simple and a valid business need. In a monolithic system, a simple join is good enough to solve it. The problem with reporting is two-fold: how do you obtain reporting data in a timely manner while still maintaining the bounded context between each service and its data? Let's discuss some typical ways of handling reporting in microservices and build a reporting service accordingly.
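For contrast, here is what the monolithic case looks like: everything lives in one database, so the report is a single JOIN. This is an illustrative sketch using SQLite with hypothetical orders and customers tables:

```python
import sqlite3

# In a monolith, orders and customers live in the same database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (100, 1, 25.0), (101, 2, 40.0);
""")

# The whole "reporting service" is one query.
report = conn.execute("""
    SELECT o.id, c.name, o.total
    FROM orders o JOIN customers c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()
print(report)  # [(100, 'Alice', 25.0), (101, 'Bob', 40.0)]
```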

Connecting with Database Directly

In this approach, the reporting service retrieves order and customer details by connecting directly to the orders and customers databases. The technique is simple and fulfills the business need, but it violates the bounded context of each microservice: even a minor change to either database schema requires a modification to the reporting service.
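A minimal sketch of this direct-connection approach, with two in-memory SQLite databases standing in for the two services' databases (the schemas are assumptions for illustration). Since a cross-database JOIN is no longer possible, the join happens in application code:

```python
import sqlite3

# Two separate databases, one per service (illustrative schemas).
orders_db = sqlite3.connect(":memory:")
orders_db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO orders VALUES (100, 1, 25.0), (101, 2, 40.0);
""")
customers_db = sqlite3.connect(":memory:")
customers_db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
""")

# Join in application code: any schema change in either service breaks this.
names = dict(customers_db.execute("SELECT id, name FROM customers"))
report = [(oid, names.get(cid), total)
          for oid, cid, total in
          orders_db.execute("SELECT id, customer_id, total FROM orders ORDER BY id")]
print(report)  # [(100, 'Alice', 25.0), (101, 'Bob', 40.0)]
```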

Directly connecting with DB

To preserve the business bounded context, we can instead use the HTTP pull method: an aggregation service.

Aggregation Service

This service makes REST API calls to both the orders and customers services and returns the combined result. Circuit breakers are essential with this technique; otherwise we are likely to see 504s and performance issues. While this fully satisfies microservices principles, it is immensely slow: it is a simple HTTP pull, which hurts performance by adding load to three services. First we get all orders, then all corresponding customers, then aggregate the information in the reporting/aggregation service and return the results. Performance degrades further as data scales up, since more data has to be carried over the network. The aggregation service is just a facade over the other services, gathering their data and returning the desired result.
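A minimal sketch of the aggregation facade, assuming hypothetical orders and customers endpoints. The fetcher functions stand in for the REST calls (which in production would be HTTP requests wrapped in circuit breakers), so the join logic can be shown on its own:

```python
def aggregate_report(fetch_orders, fetch_customers):
    """Facade: pull from both services, join in memory, return the report."""
    orders = fetch_orders()  # stands in for e.g. GET /orders on the orders service
    customer_ids = {o["customer_id"] for o in orders}
    # stands in for e.g. GET /customers?ids=... on the customers service
    customers = {c["id"]: c for c in fetch_customers(customer_ids)}
    return [{**o, "customer": customers.get(o["customer_id"])} for o in orders]

# Stub fetchers in place of real HTTP calls.
orders = [{"id": 100, "customer_id": 1, "total": 25.0}]
customers = [{"id": 1, "name": "Alice"}]
report = aggregate_report(
    lambda: orders,
    lambda ids: [c for c in customers if c["id"] in ids],
)
print(report[0]["customer"]["name"])  # Alice
```

Note that every report request fans out to both upstream services, which is exactly the load problem described above.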

Aggregating data with different services

With the above two techniques, your data team will not be happy: they are still joining across the different microservice databases and pushing the results to the data warehouse, and every additional database adds more complexity to the ETL pipelines. These two approaches lead us to the conclusion that the reporting service needs a dedicated database.

Batch Pull with Dedicated Database

Introducing a dedicated database for the reporting service solves the problem of reading from different databases, but how do we populate it in a timely fashion? In the batch pull model, information is pulled in bulk from each microservice's database and written to the reporting database. Setting up a job that runs, say, every hour to pull all updated records sounds like a good option, but it still breaks the business bounded context: a small change to any schema can break the batch job, which then has to be changed each time. Reading on an hourly basis and trying to find the right updated records is not efficient either; the more the data grows, the longer the job takes.
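A sketch of one such batch job, under assumed schemas: it pulls rows updated since the last watermark from a source service's database and upserts them into the reporting database. The `updated_at` column and watermark logic are illustrative, and the fragility is visible — the job is coupled to the source schema:

```python
import sqlite3

# Source service's database (illustrative schema with an updated_at column).
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL, updated_at INTEGER);
    INSERT INTO orders VALUES (100, 25.0, 1000), (101, 40.0, 2000);
""")

# Dedicated reporting database.
reporting = sqlite3.connect(":memory:")
reporting.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")

def run_batch(last_watermark):
    """Pull rows changed since last_watermark and upsert into reporting DB."""
    rows = source.execute(
        "SELECT id, total, updated_at FROM orders WHERE updated_at > ?",
        (last_watermark,)).fetchall()
    for oid, total, _ in rows:
        reporting.execute(
            "INSERT INTO orders VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET total = excluded.total",
            (oid, total))
    # Return the new watermark for the next scheduled run (e.g. in an hour).
    return max((r[2] for r in rows), default=last_watermark)

watermark = run_batch(0)  # first run copies everything
print(watermark)          # 2000
```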

Batch Job to upload data

All three techniques above come to mind initially when building a reporting service, but each has flaws and is an anti-pattern for microservices. Finally, we come to the event push model with a dedicated database.

Event Push Model with Dedicated Database

In terms of the CAP theorem, a reporting service is never bound to strictly consistent data; it leans toward availability. In a well-designed microservice architecture, whenever the state of an application changes, an event is published to an event processor (such as Kafka) and consumed by multiple applications. The first and foremost task is to design the reporting database correctly, and then write listeners that read from each topic in Kafka and update the reporting database accordingly. These listeners are simply workers that consume messages from different topics and write the right information to the database. As already discussed, the reporting database requires availability rather than consistency: the information becomes consistent after some time, depending on the workers' throughput. The higher the worker throughput, the fresher the reporting database, and this can be tuned based on business needs.
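A minimal sketch of one such listener, assuming hypothetical order and customer change events. In a real deployment the loop would consume from Kafka (for example via a Kafka consumer client subscribed to the two topics); here an in-memory event list and a dict-based reporting store stand in so the worker logic is self-contained:

```python
# Dict-based stand-in for the dedicated reporting database.
reporting_db = {"orders": {}, "customers": {}}

def handle_event(event):
    """Worker: apply one change event to the reporting database."""
    topic, payload = event["topic"], event["payload"]
    if topic == "orders":
        reporting_db["orders"][payload["id"]] = payload
    elif topic == "customers":
        reporting_db["customers"][payload["id"]] = payload
    # Unknown topics are ignored; a real worker would log or dead-letter them.

# In production this would be a consumer loop over Kafka topics;
# the event shape here is an assumption for illustration.
events = [
    {"topic": "customers", "payload": {"id": 1, "name": "Alice"}},
    {"topic": "orders", "payload": {"id": 100, "customer_id": 1, "total": 25.0}},
]
for event in events:
    handle_event(event)

print(reporting_db["orders"][100]["total"])  # 25.0
```

The reporting database lags the sources by however long events sit unprocessed, which is exactly the eventual consistency / throughput trade-off described above.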

Push Model

With this last technique, we maintain eventual consistency of the data without violating microservices principles. ETL pipelines can easily be integrated, and as a company we have a single source of information for all data.

Conclusion:

Use a dedicated database for the reporting service.

Read messages from all required topics in Kafka (or any other stream processor).

Increase or decrease worker throughput based on how consistent the data needs to be.

For further details, read Microservices Antipatterns and Pitfalls.

