How to build an asynchronous, scalable, idempotent and highly testable microservice
This article walks through the process and experience of building a microservice, from planning to shipping, based on a real case at QuintoAndar.
INTRODUCTION
In this article, I will try to show how we created a fully asynchronous, scalable, idempotent microservice with high test coverage. To understand the steps and decisions that we’ve made during implementation, I’ll first explain a little of the context that motivated the creation of this microservice.
A few months ago, we started working on a feature that gives users exclusivity in the renting process of a house; we call this reserving a house.
With QuintoAndar’s business model, everything happens very fast. This is great for the tenant who wants to rent without bureaucracy and for the owner who wants to rent out as quickly as possible, but it can also be frustrating if you really liked an apartment and someone rented it before you did. That’s why we decided to create a house reservation feature to improve our users’ experience: if you really liked an apartment, you can make a deposit and gain exclusivity to close the deal.
In order to charge the deposit, we decided to integrate with a credit card payment provider. We took this opportunity to create a financial integration HUB that can be used by other services in the future.
Given the need to create this HUB, we started the project planning phase. If you want to know a little more about the methodology we use in all our projects, I recommend reading Adilson’s article, which explains how we work and how we try to apply Agile principles in the best way possible in our team.
STARTING THE PROJECT
We started by setting up a Java application with Spring, ready for unit, integration, and API testing, with CI/CD configured and several other tools we use to maintain good code quality and monitoring, such as SonarQube and Sentry. In the future, we may write a new article about how this process went and how we streamlined the bootstrapping of new microservices here at QuintoAndar.
With the application bootstrap ready, we began to study how we would integrate with the payment gateway. Defining the information flow was the easy part, because the payment provider’s API is very well documented and we were able to abstract most of it. But we didn’t want our HUB to be coupled to that particular gateway, in case we decide to switch to another one or use multiple gateways for fallback or for different payment methods.
DECOUPLING FROM THE PAYMENT GATEWAY
This was our first challenge: how to create a service that can work with several gateways without exposing this complexity to the client or requiring refactoring when integrating with other providers. We decided to use the facade design pattern in our implementation, as shown in the diagram below:
Each time a request reaches our API, the controllers communicate only with our Facade, requesting an operation to be performed. It doesn’t matter to the client which gateway we will use: the client only requests the operation, which is passed to the Facade, and the Facade decides which gateway to use.
The facade is responsible for communicating with service methods that are agnostic to third-party gateways, i.e., the part of the business rules that exists no matter which gateway is being used. It also communicates with the gateway-specific methods that, for example, actually charge the credit card. So, every time a new gateway is added, all that’s necessary is to implement the specific rules of that new gateway and plug it into the facade, along with the business rule that determines when it should be used.
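A minimal Java sketch of this facade arrangement might look like the following. All class and method names here are hypothetical illustrations, not QuintoAndar’s actual code, and the routing rule (primary gateway with a fallback) is just one example of a business rule the facade could apply:

```java
// Gateway-agnostic contract: the facade and controllers only see this interface.
interface PaymentGateway {
    String charge(String cardToken, long amountInCents);
}

// Two illustrative providers; the real ones would call the providers' APIs.
class GatewayA implements PaymentGateway {
    public String charge(String cardToken, long amountInCents) {
        return "gateway-a:" + amountInCents;
    }
}

class GatewayB implements PaymentGateway {
    public String charge(String cardToken, long amountInCents) {
        return "gateway-b:" + amountInCents;
    }
}

class PaymentFacade {
    private final PaymentGateway primary;
    private final PaymentGateway fallback;

    PaymentFacade(PaymentGateway primary, PaymentGateway fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    // Gateway-agnostic business rules live here; callers never learn
    // which concrete gateway handled the charge.
    String charge(String cardToken, long amountInCents) {
        try {
            return primary.charge(cardToken, amountInCents);
        } catch (RuntimeException e) {
            return fallback.charge(cardToken, amountInCents);
        }
    }
}
```

Adding a third provider then means writing one more `PaymentGateway` implementation and extending the facade’s selection rule; nothing above the facade changes.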
With this, we decoupled our HUB from specific payment gateways, and integrating with new ones became much easier.
ASYNCHRONOUS MICROSERVICE
If you have ever made an online purchase, you may have noticed that sometimes you enter your card and the payment is instantly approved, while at other times you receive a message like “Awaiting payment confirmation”. This happens because a charge isn’t always synchronous. When you make a purchase on a website, the checkout system communicates with the payment gateway, which communicates with an acquirer (Stone, Cielo), which communicates with the card brand, which in turn communicates with the bank to find out whether you have enough limit available to make the purchase.
Now imagine this happening on a Black Friday, when millions of transactions occur at the same time throughout the country. You can imagine how many possible points of failure there are, making it impossible for all of that flow to be synchronous. How can we ensure that, when you reserve a house on the QuintoAndar website, this whole process works correctly?
The payment gateway documentation says that most requests take less than 5 seconds. But how do we guarantee that, during spikes, our clients won’t receive a timeout because the gateway’s response is taking too long? To guarantee this, we decided to implement a fully asynchronous payment HUB.
No request other than a GET is performed synchronously in our microservice. When a reservation request reaches it, we post the request in a queue and return an HTTP 202 (Accepted) response to the client; the request is then processed asynchronously by our service. As you can see in the diagram below, a request that reaches the API is passed to the facade, the facade posts a message to Amazon Simple Queue Service (SQS), and a consumer of that queue picks the message up to perform the charge request to the acquirer.
This ensures that our microservice can scale easily: no HTTP request stays pending for long, we avoid timeouts, and we can run as many consumer instances as necessary to process incoming charge requests. With SQS, we also get a transparent retry policy: any exception thrown while processing a message causes it to return to the queue and be processed again, according to the criteria configured for that particular queue.
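The enqueue-and-return-202 flow can be sketched as below. This is an assumption-laden illustration: an in-memory `BlockingQueue` stands in for SQS, and all class names are hypothetical, not the HUB’s real code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Message posted by the API and consumed later by a worker.
class ChargeRequest {
    final String chargeId;
    final long amountInCents;
    ChargeRequest(String chargeId, long amountInCents) {
        this.chargeId = chargeId;
        this.amountInCents = amountInCents;
    }
}

class AsyncChargeApi {
    // Stand-in for the SQS queue; in production this would be an SQS client.
    private final BlockingQueue<ChargeRequest> queue = new LinkedBlockingQueue<>();

    // The controller only enqueues the request and immediately answers 202.
    int accept(ChargeRequest request) {
        queue.add(request);
        return 202; // HTTP 202 (Accepted)
    }

    // One of possibly many consumer instances picks a message up and
    // performs the slow call to the gateway/acquirer.
    String consumeOne() {
        try {
            ChargeRequest request = queue.take();
            return "charged:" + request.chargeId; // gateway call would go here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Because consumers are decoupled from the HTTP layer, scaling out is just a matter of running more consumer instances against the same queue.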
When the request has been processed, we send the result using webhooks. That’s also done asynchronously, through SQS.
This ensures that our system won’t return timeouts or have trouble scaling as the volume of payment requests increases. However, a new problem arises: if everything is asynchronous, how do we ensure that the same request, made multiple times because of duplicate client calls or any other reason that generates duplicate calls to the HUB, isn’t charged multiple times?
MAKING THE SERVICE IDEMPOTENT
We modeled the problem we’re trying to solve as a state machine. A state machine can only be in one state at a time, and from the current state we can know the whole history of changes since the request entered the service. Also, a transition to the next state can only occur if a pre-established condition is satisfied.
The diagram below represents the state machine we’ve created for charges in the Payments HUB.
With this state machine, we can validate in an SQS consumer whether a charge is still pending processing before sending the charge request to the gateway. This means that even if we receive 2 or more similar requests, we’ll only perform the charge once.
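As an illustration, a charge state machine with transition validation might look like this in Java. The states and allowed transitions below are assumptions for the sake of example; the real states are the ones in the article’s diagram:

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Illustrative charge lifecycle; real states may differ.
enum ChargeState {
    PENDING, PROCESSING, PAID, FAILED;

    // Which transitions are legal from each state.
    private static final Map<ChargeState, Set<ChargeState>> ALLOWED = Map.of(
        PENDING,    EnumSet.of(PROCESSING),
        PROCESSING, EnumSet.of(PAID, FAILED),
        PAID,       EnumSet.noneOf(ChargeState.class),
        FAILED,     EnumSet.noneOf(ChargeState.class)
    );

    boolean canTransitionTo(ChargeState next) {
        return ALLOWED.get(this).contains(next);
    }
}

class ChargeConsumer {
    // Idempotency check: only proceed if the charge can still move to
    // PROCESSING; duplicate messages for an already-processed charge
    // are simply dropped.
    boolean shouldCharge(ChargeState current) {
        return current.canTransitionTo(ChargeState.PROCESSING);
    }
}
```

The key point is that the transition check, not the message itself, decides whether the gateway gets called, so duplicate queue messages become harmless no-ops.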
AUDIT AND AUTHORIZATION
In many systems, it is necessary to maintain traceability of how events occurred, in what order they occurred and by whom these actions were executed or requested. Thinking about a Financial Integration Hub, this becomes even more important as any undue action can impact the company as well as our customers.
Our payment HUB is not a service called directly by the user on the website; there is always a microservice interfacing between the user and the HUB, since the HUB has no logic about when or how much to charge: it only performs the requested charge. We therefore adopted a 2-level authentication in which both the interfacing service and the user must be authenticated.
When a request reaches the API, it calls our authentication service to validate the identities making the call. If they are valid, they are added to Spring’s security context, and for each operation that changes some data, the users of the application in that context are saved in the audit log.
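A hedged sketch of what such a 2-level audit entry could look like, with both identities recorded per state-changing operation (all class and field names here are hypothetical, not the HUB’s actual model):

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Each entry records the calling service, the end user, the action,
// and when it happened, so the trail answers "who did what, when".
class AuditLog {
    static class Entry {
        final String service;   // identity of the interfacing microservice
        final String user;      // identity of the end user
        final String action;
        final Instant at;
        Entry(String service, String user, String action, Instant at) {
            this.service = service;
            this.user = user;
            this.action = action;
            this.at = at;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void record(String service, String user, String action) {
        entries.add(new Entry(service, user, action, Instant.now()));
    }

    List<Entry> entries() {
        return Collections.unmodifiableList(entries);
    }
}
```

In a Spring application, the two identities would typically be read from the security context at the moment of the write rather than passed explicitly as they are here.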
KEEPING A HIGH CODE QUALITY
As discussed earlier, during the application setup we added tools and plugins to maintain good code quality, including JaCoCo, a plugin that calculates the test coverage of the code. We established a minimum of 95% coverage for this microservice, and in addition to unit tests we also added API integration tests, giving us greater confidence that our service will work as expected.
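For reference, a coverage floor like this can be enforced at build time with the jacoco-maven-plugin’s `check` goal. The snippet below is a sketch (the plugin version is illustrative, and the exact rule should match your own threshold):

```xml
<plugin>
  <groupId>org.jacoco</groupId>
  <artifactId>jacoco-maven-plugin</artifactId>
  <version>0.8.11</version>
  <executions>
    <execution>
      <goals><goal>prepare-agent</goal></goals>
    </execution>
    <execution>
      <id>check</id>
      <goals><goal>check</goal></goals>
      <configuration>
        <rules>
          <rule>
            <element>BUNDLE</element>
            <limits>
              <limit>
                <counter>LINE</counter>
                <value>COVEREDRATIO</value>
                <minimum>0.95</minimum>
              </limit>
            </limits>
          </rule>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this rule in place, a build whose line coverage drops below 95% fails, so the threshold is enforced by CI rather than by convention.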
We also added SonarQube, a static code analyzer that generates several very interesting metrics to help us maintain code quality. It produces coverage reports, indicates possible bugs, points out code smells, and all of this can be easily configured.
CONCLUSION
During development, we faced some changes in the payment gateway’s API. Thanks to the facade architecture, adapting required very little rework, and our clients didn’t have to worry about the interface specified in our API, since every necessary change was confined to the service related to the acquirer.
Another interesting point is that maintaining high test coverage also directly influences code quality. When a new developer implements something, the project already has several examples of how to test, how the code is organized, and other patterns and conventions. If something changes by accident or has an unexpected side effect, some test will fail, preventing the issue from making its way to production.
In the end, we can see that planning before starting the implementation, along with the mindset of building a project with quality from scratch, can produce a surprising result. It shows that all the time invested in planning, designing, discussing, listening, and learning throughout the process of creating this service really paid off.
If you liked this article and want to help us change the way people live: