How We Made a Microservice Perform 20 Times Better

Arifin
Published in Inside Bukalapak
7 min read · Dec 15, 2018

A story about system migration

A jet. Photo by DON JACKSON-WYATT on Unsplash

Bukalapak is one of the biggest online marketplaces in Indonesia. Today we serve millions of customers doing business on our platform, from uploading products and making transactions to communicating via chat and writing product reviews.

This article tells the story of how we pulled one of our features out of the big main web service into a smaller, independent microservice, and got a splendid performance result by doing so: a service response time 20 times faster.

Our web server was a single monolithic application built on top of Ruby on Rails. In the early days, everyone committed to a single repository that was responsible for serving the whole business. Everything was good then: everyone made changes, code was deployed, everybody was happy. Who doesn’t love Ruby anyway, especially when it’s packaged in Rails? MINASWAN! 😙

Time went by. Our customer numbers grew, along with our traffic, and many new engineers joined us. This is the good part; this is what we wanted. But along with it, our codebase grew bigger as well, and we were quite sure that this was not the good part.

Our code got so big that it was hard to make a change without worrying that it would affect some other part of the business. There was a time when our web server had to go down for a while because a very slow, unfamiliar database query suddenly appeared in the system. We restored the system and examined the query, only to realize that it came from legacy code that was not an important part of the core business.

That shouldn’t happen. An error in one business process shouldn’t affect the whole system. We want our buyers to still be able to buy products from our sellers while, say, our train ticketing system is crashing, and vice versa.

Microservices come to the rescue.

Eventually, our CTO directed the engineers to build each new feature in a separate code repository, with a separate database and a separate service, or simply put, in a separate subsystem. The engineers were also expected to gradually pull each feature already running in the main web service out into an independent service.

One of those features is Product Review.

This is the appearance of our product reviews feature in Bukalapak:

Reviews of a product in Bukalapak

And the journey begins.

Initial state

Back when our product reviews feature still lived inside the big main Bukalapak web service repository, we were serving hundreds of product review HTTP requests per second. Each response returned a list of product reviews matching certain search parameters, such as a particular `product_id` or `user_id`. Our monitoring tools showed that each request was served in about 100 ms.
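
To make the shape of that endpoint concrete, here is a minimal sketch of the kind of lookup it performs, written in Go (the language we later chose for the new service). It is an illustration only, not code from either service: the `Review` and `Filter` types, the `Search` function, and the in-memory slice standing in for the database are all assumptions.

```go
package main

import "fmt"

// Review is a hypothetical shape for one product review record.
type Review struct {
	ID        int64
	ProductID int64
	UserID    int64
	Body      string
}

// Filter holds the optional search parameters; a zero value means "not set".
type Filter struct {
	ProductID int64
	UserID    int64
}

// Search returns the reviews that match every parameter set in f.
// A real service would push this filter down to the database or search engine.
func Search(all []Review, f Filter) []Review {
	var out []Review
	for _, r := range all {
		if f.ProductID != 0 && r.ProductID != f.ProductID {
			continue
		}
		if f.UserID != 0 && r.UserID != f.UserID {
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	reviews := []Review{
		{ID: 1, ProductID: 10, UserID: 7, Body: "Fast delivery"},
		{ID: 2, ProductID: 10, UserID: 8, Body: "As described"},
		{ID: 3, ProductID: 11, UserID: 7, Body: "Good seller"},
	}
	// All reviews of product 10.
	fmt.Println(len(Search(reviews, Filter{ProductID: 10})))
	// Reviews written by user 7 on product 11.
	fmt.Println(len(Search(reviews, Filter{ProductID: 11, UserID: 7})))
}
```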

Response time of product reviews endpoint.

Our monitoring tools also gave us a breakdown showing that most of the response time, about 60 ms, was spent in the language itself. At that time, the product reviews service was running on Ruby on Rails.

That response time is actually reasonably fast for serving user requests. No one complained that viewing the reviews of a product was a slow experience.

But there was one thing to be concerned about. The product reviews feature, serving hundreds of requests per second with each response containing 5 to 20 review records on average, was still using the same database as our product and transaction records.

We already had millions of reviews, and that number would only keep growing, putting more and more load on the main database where other, truly core business transactions reside.

It would be such a relief if we could pull this abundance of traffic and queries out of the main service. Then, even if reviews grew to billions of records, we wouldn’t have to worry about our core functionality being directly affected.

What’s more, by moving it out of the main service, my team, which is dedicated to developing this feature, would have more flexibility to change any flow or database schema without worrying about affecting the core business.

So we decided to move the product reviews backend into a new service, with a new database that stands independently from the main web service.

Technology that we used in the new service

We wanted it to be fast, even when it grows to a vast number of records. After discussions with engineers who are experts in various fields, we decided to use the following stack in our new service:

  • Golang as the programming language.
    We already have services written in Golang. They show much better performance and a smaller footprint than services written in Ruby. Golang also has decent support for concurrency. No offense, we love Ruby, but when it comes to microservices, where the one thing we pursue is speed, Golang comes into play.
  • MongoDB as the main storage.
    We had a pretty unpleasant experience trying to scale our MySQL database horizontally, since that is not something it emphasized as a feature from the start.
    While we find transactional database features immensely useful in certain cases, we didn’t think we needed them for this particular case of product reviews. Horizontal scalability was more important, so we went with MongoDB.
  • Elasticsearch as the search engine.
    We already use Elasticsearch for our product search feature and have had an excellent experience with it, so we went with it.
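
To illustrate the concurrency point from the Golang item above, here is a minimal sketch, not taken from the service itself, that loads reviews for several products in parallel with goroutines and a `sync.WaitGroup`. The `fetchReviews` function is a hypothetical stand-in for a real storage or network call, and `FetchAll` is a name made up for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchReviews is a hypothetical stand-in for a storage or network call.
func fetchReviews(productID int) []string {
	return []string{fmt.Sprintf("review for product %d", productID)}
}

// FetchAll loads reviews for every product ID concurrently,
// using one goroutine per ID and a mutex to protect the shared map.
func FetchAll(productIDs []int) map[int][]string {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[int][]string, len(productIDs))
	)
	for _, id := range productIDs {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			reviews := fetchReviews(id)
			mu.Lock()
			results[id] = reviews
			mu.Unlock()
		}(id)
	}
	wg.Wait() // block until every goroutine has stored its result
	return results
}

func main() {
	got := FetchAll([]int{1, 2, 3})
	fmt.Println(len(got), got[2])
}
```

When each call spends most of its time waiting on I/O, the total time is roughly that of the slowest call rather than the sum of all calls.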

Migration Process

We built the project and wrote the code, wrapping the database to match the existing logic and making sure that every piece of functionality worked as it did before. Engineers worked hard together with Quality Assurance to make sure that everything would run as expected once the service was deployed.

The project was finished. QA tested and confirmed that, from our consumers’ point of view, the new service could properly replace the old one. The time had come to deploy the service.

We planned the migration process and carried out the migration by following it. Here is the plan:

  1. Deploy the new service, along with all of its storage dependencies (MongoDB and Elasticsearch). At this stage, the new service must not yet receive any public requests. All requests must still be handled by the old service.
  2. Once the new service is deployed, the old main web service must inform the new service of every update to product reviews. The new service saves each record with the same ID as in the old service. Make sure every record sent to the new service is indexed in Elasticsearch as well.
  3. Monitor how much traffic flows from the old service to the new one. Validate the data integrity between them. Make sure that every product review newly created or updated in the old service is also recorded in the new service.
  4. Execute the data migration. Data migration should happen from service to service, not database to database, since the database engines are different. As in step 2, make sure that every record submitted is indexed in the search engine.
  5. After all data has been copied to the new service and validated to be accurate, move public traffic to the new service. In this step, we do not yet give 100 percent of the traffic to the new service. We want to make sure that the new service can handle the same amount of traffic with the same (hopefully better) response time as the old one. So we put a load balancer in front of the services to split the traffic between the old and the new one.
  6. Gradually increase the traffic received by the new service until it reaches 100 percent.
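
The traffic split in steps 5 and 6 can be sketched as a simple weighting rule. This is a minimal illustration in Go, not the configuration of our actual load balancer: `Splitter`, `SetPercentNew`, and `Route` are hypothetical names, and the rule used here (the first N of every 100 requests go to the new service) is an assumption chosen to keep the example deterministic. Real load balancers typically use random or hash-based weighting instead.

```go
package main

import (
	"fmt"
	"sync"
)

// Splitter decides, per request, whether the old or the new service handles it.
// It is a hypothetical stand-in for a load balancer's weighting rule.
type Splitter struct {
	mu         sync.Mutex
	percentNew int // share of traffic sent to the new service, 0 to 100
	seen       int // number of requests routed so far
}

// SetPercentNew changes the share of traffic for the new service,
// clamping the value to the range 0..100.
func (s *Splitter) SetPercentNew(p int) {
	if p < 0 {
		p = 0
	}
	if p > 100 {
		p = 100
	}
	s.mu.Lock()
	s.percentNew = p
	s.mu.Unlock()
}

// Route returns "new" or "old" for the next request.
// Out of every 100 consecutive requests, the first percentNew go to "new".
func (s *Splitter) Route() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	slot := s.seen % 100
	s.seen++
	if slot < s.percentNew {
		return "new"
	}
	return "old"
}

func main() {
	s := &Splitter{}
	// Step 6: raise the share in stages, watching metrics between stages.
	for _, pct := range []int{10, 50, 100} {
		s.SetPercentNew(pct)
		toNew := 0
		for i := 0; i < 1000; i++ {
			if s.Route() == "new" {
				toNew++
			}
		}
		fmt.Printf("%d%% setting: %d of 1000 requests went to the new service\n", pct, toNew)
	}
}
```

Keeping the share adjustable at runtime also makes rollback cheap: if response times degrade at some stage, setting the value back to 0 returns all traffic to the old service.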

We followed the plan. Fortunately, everything worked well. Some interruptions occurred in the middle of the migration, such as unexpected errors and network failures, but with a good strategy they were handled well.
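
One general approach that fits this plan, shown here as an illustration rather than a record of our exact implementation, is to simply retry failed writes. Because step 2 stores each review under the same ID as in the old service, sending a record twice overwrites it instead of duplicating it, so retries are safe. In the Go sketch below, `ReviewStore`, `memStore`, `flakyStore`, and `Migrate` are hypothetical names, and the in-memory map stands in for the new service.

```go
package main

import (
	"errors"
	"fmt"
)

// Review is a hypothetical, reduced review record.
type Review struct {
	ID   int64
	Body string
}

// ReviewStore is a hypothetical write API exposed by the new service.
type ReviewStore interface {
	Upsert(r Review) error
}

// memStore keeps reviews keyed by ID, so writing the same ID twice overwrites.
type memStore struct {
	data map[int64]Review
}

func (m *memStore) Upsert(r Review) error {
	m.data[r.ID] = r
	return nil
}

// flakyStore fails every third call to simulate a network failure.
type flakyStore struct {
	inner ReviewStore
	calls int
}

func (f *flakyStore) Upsert(r Review) error {
	f.calls++
	if f.calls%3 == 0 {
		return errors.New("simulated network failure")
	}
	return f.inner.Upsert(r)
}

// Migrate copies reviews one by one, trying each up to maxAttempts times.
// It returns the IDs that still failed so they can be re-run later.
// A production version would also wait (back off) between attempts.
func Migrate(src []Review, dst ReviewStore, maxAttempts int) []int64 {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var failed []int64
	for _, r := range src {
		var err error
		for attempt := 1; attempt <= maxAttempts; attempt++ {
			if err = dst.Upsert(r); err == nil {
				break
			}
		}
		if err != nil {
			failed = append(failed, r.ID)
		}
	}
	return failed
}

func main() {
	src := []Review{
		{ID: 1, Body: "a"}, {ID: 2, Body: "b"},
		{ID: 3, Body: "c"}, {ID: 4, Body: "d"},
	}
	mem := &memStore{data: map[int64]Review{}}
	failed := Migrate(src, &flakyStore{inner: mem}, 3)
	fmt.Println("stored:", len(mem.data), "failed:", len(failed))
}
```

Running the whole migration a second time against the same store leaves the record count unchanged, which is what makes a partially failed run safe to repeat.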

As we said in the second paragraph of this article, we got a pretty decent result. Monitoring metrics showed that the response time for the product review endpoint dropped drastically, from 100 ms to 5 ms, which is 20 times faster. We tried it manually on a device, and the faster response really affects how the product feels. Hey, we noticed the difference.

The following image is a screenshot of the response time before and after the traffic was forwarded to the new service.

10x better performance after it’s pulled to a microservice

Hard work paid off. Time for some snacks, cheers.

Oh yeah, this is the team that worked hard to make this project work. I am one of them. Thanks for the teamwork, guys.

Customer Review squad, celebrating the birthday of our Product Manager.

Not present in the picture: we got a lot of help from our System Reliability Engineers, System Engineers, and the Core Team, who guided us in building such a good service. Thank you very much for the very supportive environment in this company.

Thank you for reading.
