Our Journey from a Single Monolithic System to a Microservices-Based Architecture. How did we do it? Why did we do it?

Arpan Jain
Published in Rooter Engineering · Dec 13, 2018 · 7 min read

When we started our journey at Rooter, the term “microservices” was already a hot topic in tech circles, but at the time we decided against it for multiple reasons:

  1. Time constraints for the launch of the MVP
  2. Not enough distinct concerns that needed to be built and scaled in isolation
  3. We could not justify the data duplication we would have ended up with had we given each microservice its own private data source to achieve high cohesion

Thus, we built Rooter Tech with a single monolithic service at the heart of it. It was going well for us: numbers were going up, the product was garnering attention, we launched our iOS and Web clients after Android, and our monolithic system was scaling linearly with the users. We were getting comfortable, overconfident (and fat).

Then came the grinches.

One Bug to Kill them all

It was one of those nights of “IPL 2017” when all our metrics were improving. We had just launched a new feature that provided a much faster and simpler scorecard than the ones the market had to offer. Suddenly, all of it went down: servers were crashing on a particular request, the uninstall rate touched a new high, and users switched to competitors within a fraction of a second. Panicked and scared, we rolled back our codebase, debugged and debugged, and found a silly bug in one of the add-on features of the new scorecard module because of which the entire monolith had crashed.

Fail slowly and get left out

As we started succeeding, people started to replicate us, and suddenly there was a plethora of live prediction platforms in the market. To keep our advantage and to grow exponentially, we had to conduct experiments at breakneck speed as part of our fail-fast strategy. Now our only bottleneck was the single system, which had to be refactored after every major addition to the codebase and was getting messier and more complex with each new feature launch.

Bigger the team, higher the coupling

The team size grew and so did the number of coding styles, even though we tried to enforce a common set of coding principles. Because of the monolith, there was no clear set of ownership boundaries.

It’s all about the money

As we scaled, our AWS bill almost doubled: to cater to a higher RPM on the prediction feature, the entire monolithic service had to be scaled up.

After months of procrastination, we decided to swallow the pill. We decided to solve the above-mentioned bottlenecks by adopting a microservices-based architecture.

  1. With multiple loosely coupled microservices, we would eliminate the single point of failure. Even if one of the services went down, it would not take down the entire system.
  2. With independent codebases for all the microservices, development and deployment times would be reduced multifold.
  3. Independent codebases also promised mutually exclusive ownership boundaries.
  4. The biggest advantage of microservices in our case was the reduction of scaling costs. With separate services, we would scale only the overloaded services, not the entire system.

However, the implementation was easier said than done. Experiments could not be stopped, our competition was gaining on us, and the shortage of manpower was the biggest pain.

So, we decided to do it service by service, instead of breaking the monolith all in one go.

Rooter Notification Service

The first service we segregated was our notification delivery system. For a product that provides live engagement around sports events, notifications are obviously crucial.

Earlier, we had a module in our monolith which called the GCM/APNS service directly with the notification payload. While segregating the notification delivery system, we also decided to build a logging and monitoring system through which we could monitor notification delivery to each user device at a granular level.

In the new service, instead of sending the payload directly to GCM/APNS, we use AWS SNS, which also writes delivery logs to CloudWatch; CloudWatch in turn triggers a Lambda function that logs the delivery report to our database. Apart from this, the client also sends an acknowledgement request back to our server on successful receipt of a notification. This level of monitoring helped us raise our notification delivery rate from ~50% to ~70%, which in turn improved our retention numbers.
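
As a rough sketch of the logging leg of this pipeline (not our production code), here is what a Lambda subscribed to the CloudWatch log group for SNS delivery status could look like. saveDeliveryReport is a placeholder for the database write, and the exact fields depend on SNS's delivery-status log format.

```javascript
const zlib = require('zlib');

exports.handler = async (event) => {
  // CloudWatch Logs subscriptions deliver a base64-encoded, gzipped payload.
  const payload = JSON.parse(
    zlib.gunzipSync(Buffer.from(event.awslogs.data, 'base64')).toString('utf8')
  );

  for (const logEvent of payload.logEvents) {
    const report = JSON.parse(logEvent.message);      // one SNS delivery-status entry
    await saveDeliveryReport({
      messageId: report.notification && report.notification.messageId,
      status: report.status,                          // e.g. SUCCESS / FAILURE
      dwellTimeMs: report.delivery && report.delivery.dwellTimeMs,
    });
  }
};

// Placeholder for the write into our delivery-log table.
async function saveDeliveryReport(row) { /* omitted */ }
```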

Rooter Stats Receiver

Rooter receives sports data streams from different sports data providers (Opta, Cricket API, Sports Interactive), which power our various features (scoreboards, the live fantasy game, the prediction game). Earlier, these feeds were received by the monolith itself, but the monolith was written in Node.js, which is not best suited for the synchronous XML processing being done on the sports feeds. So we segregated this service and wrote it from scratch in PHP 7.0 running on an Nginx server.

Rooter Leaderboard Service

Leaderboards are at the heart of gaming on the Rooter app. Earlier, leaderboards were built in MySQL itself with a caching layer in between, but the MySQL calculations were killing our systems. In the new service, we moved our leaderboards to Redis sorted sets. We maintain consistency by propagating MySQL changelogs to Redis through a data pipeline (AWS Kinesis). Segregating this service had the greatest impact on our resource usage costs.

Subsequently, we also moved our MySQL unique keys to Redis hashes using the same ETL service, which increased write throughput multifold.
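
To make the idea concrete, here is a minimal sketch (assuming the ioredis client and made-up key names, not our actual pipeline code) of how a changelog event arriving from Kinesis maps onto a sorted-set leaderboard, and how the unique-key lookups map onto hashes:

```javascript
const Redis = require('ioredis');                     // assumed client
const redis = new Redis();

// Applied for each decoded changelog record, e.g. a score update for a user.
async function applyScoreChange(matchId, userId, score) {
  // The sorted set keeps members ordered by score, so ranking is cheap.
  await redis.zadd(`leaderboard:${matchId}`, score, userId);
}

// The top-N read that used to be an expensive MySQL ORDER BY.
async function topN(matchId, n = 10) {
  return redis.zrevrange(`leaderboard:${matchId}`, 0, n - 1, 'WITHSCORES');
}

// Unique keys moved from MySQL to a Redis hash via the same ETL pipeline.
async function mapUniqueKey(handle, userId) {
  await redis.hset('unique:handles', handle, userId);
}
```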

Rooter Live Fantasy Service

This service powers the Rooter Live Fantasy game: an auto-scaling fleet of EC2 machines running Node.js servers that act as the game's internet-facing API layer.

Rooter Live Fantasy User Flow

Our Live Fantasy Service handles all the heavy lifting of the Rooter Live Fantasy game, providing a seamless experience to an average of 40k concurrent users during a live match. The service incorporates a Redis cache, MySQL persistent storage, and a fleet of auto-scalable EC2 servers optimized for high network I/O and concurrent connections behind an Elastic Load Balancer.
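
Purely as an illustration of how such an API node can lean on the Redis cache in front of MySQL during a live match, here is a hedged read-through sketch; Express, ioredis and mysql2 are assumptions for this example, and the endpoint and table names are invented:

```javascript
const express = require('express');
const Redis = require('ioredis');
const mysql = require('mysql2/promise');

const app = express();
const cache = new Redis();
const db = mysql.createPool({ host: 'localhost', user: 'app', database: 'fantasy' });

app.get('/contest/:id', async (req, res) => {
  const key = `contest:${req.params.id}`;

  // Hot path during a live match: most reads are served from Redis.
  const cached = await cache.get(key);
  if (cached) return res.json(JSON.parse(cached));

  // Cache miss: fall through to MySQL and repopulate with a short TTL.
  const [rows] = await db.query('SELECT * FROM contests WHERE id = ?', [req.params.id]);
  await cache.set(key, JSON.stringify(rows[0]), 'EX', 5);
  res.json(rows[0]);
});

app.listen(3000);
```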

Rooter Scoring Service

This service provides real-time (~100 ms latency) scoring for users in a LIVE fantasy game. When a user has selected and powered up a player in the live fantasy game (let's say Lionel Messi) and Messi scores a goal (which he does consistently), then, the game being a “LIVE” fantasy, all the users who have this player (which can be in the order of 100k) need to be awarded the points, timelines need to be updated, and leaderboards need to reflect the event. All of this has to happen within milliseconds of the goal being scored at Camp Nou, for the perfect user experience that is the trademark of Rooter.

For this, we use AWS Kinesis to propagate MySQL changelogs, Drools to write the game rules, and a non-indexed, write-intensive version of MySQL running in a container to update the scores, in order to achieve low latency and high scalability.
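
The sketch below shows, under simplifying assumptions (aws-sdk v2, a single shard, a plain JavaScript point table standing in for the Drools rule base, and a hypothetical awardPoints helper), how a consumer of the changelog stream can turn a stat event into fantasy points:

```javascript
const AWS = require('aws-sdk');                       // assumes aws-sdk v2
const kinesis = new AWS.Kinesis({ region: 'ap-south-1' });

// Plain point table standing in for the Drools rule base.
const POINTS = { GOAL: 100, ASSIST: 60 };             // illustrative values only

async function consume(streamName) {
  const { StreamDescription } = await kinesis
    .describeStream({ StreamName: streamName }).promise();

  let { ShardIterator } = await kinesis.getShardIterator({
    StreamName: streamName,
    ShardId: StreamDescription.Shards[0].ShardId,     // single shard for simplicity
    ShardIteratorType: 'LATEST',
  }).promise();

  while (ShardIterator) {
    const batch = await kinesis.getRecords({ ShardIterator, Limit: 100 }).promise();
    for (const record of batch.Records) {
      // Each record is a MySQL changelog row describing a stat event.
      const event = JSON.parse(Buffer.from(record.Data).toString('utf8'));
      const points = POINTS[event.type] || 0;
      if (points) await awardPoints(event.playerId, points);
    }
    ShardIterator = batch.NextShardIterator;
  }
}

// Placeholder: fan the points out to every user holding this player and write
// them to the write-optimized MySQL instance.
async function awardPoints(playerId, points) { /* omitted */ }
```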

Rooter Scoreboard

Rooter provides the fastest scoreboards for eight sports (and counting). We use Google Cloud Firestore collections for this, which in turn update the client scoreboards via sockets.

We update the Firestore collections after listening to the MySQL changelogs of the stats relational DB, which in turn updates the scoreboards on all clients subscribed to that socket.
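
A minimal sketch of that flow, assuming the firebase-admin SDK on the server and invented collection and field names:

```javascript
const admin = require('firebase-admin');
admin.initializeApp();                                // assumes credentials are configured
const db = admin.firestore();

// Server side: called whenever a stats changelog event arrives for a match.
async function publishScore(matchId, scoreboard) {
  await db.collection('scoreboards').doc(String(matchId)).set(
    { ...scoreboard, updatedAt: Date.now() },
    { merge: true }                                   // only overwrite changed fields
  );
}

// Client side (web SDK), shown as a comment: every subscribed scoreboard
// re-renders as soon as the document changes.
// firebase.firestore().collection('scoreboards').doc(matchId)
//   .onSnapshot((doc) => renderScoreboard(doc.data()));
```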

Rooter Profile Service

Gamification is an essential part of our user experience, and a good gamification system can do wonders for user retention. By a good gamification system, we mean one where all the profile data points and accomplishments are updated in real time (<100 ms) for a user base that is now touching the million mark.

The choice of DB was of utmost importance here. We could easily have used a non-relational database, but since large data sets had to be crunched in real time with minimum latency and a high write-to-read ratio, we decided to use a carefully indexed MySQL DB until we move to a more suitable (but more expensive) engine such as Presto.
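
As an illustration of the write path this implies (not our schema; table and column names are invented, and mysql2 is assumed), each real-time profile update is a single-row write keyed on an indexed column:

```javascript
const mysql = require('mysql2/promise');              // assumed client
const pool = mysql.createPool({ host: 'localhost', user: 'app', database: 'profiles' });

// Bumping a profile data point is a single-row UPDATE keyed on the indexed
// user_id column, so writes stay cheap even at a high write-to-read ratio.
async function addCoins(userId, delta) {
  await pool.query(
    'UPDATE user_profile SET coins = coins + ?, updated_at = NOW() WHERE user_id = ?',
    [delta, userId]
  );
}
```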

Rooter Gratification Service

This service handles real-time gratifications for the users when they redeem their Rooter coins for the variety of coupons available on our app, provided by our numerous partner brands.

Here is a detailed diagram of our overall microservices-based architecture.

Currently, all our services communicate over HTTP using HMAC authentication inside a VPC-restricted security group. In the future, we will move from HTTP-based communication to a faster RPC framework, e.g. gRPC.
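
As a minimal sketch of what HMAC-authenticated service-to-service calls look like (the header layout and string-to-sign here are illustrative, not our exact scheme):

```javascript
const crypto = require('crypto');

// Caller: signs the method, path and body with the shared secret and sends the
// digest in a custom header (e.g. X-Signature) along with the request.
function sign(secret, method, path, body = '') {
  return crypto.createHmac('sha256', secret)
    .update(`${method}\n${path}\n${body}`)
    .digest('hex');
}

// Callee: recomputes the signature and compares it in constant time.
function verify(secret, method, path, body, signature) {
  const expected = sign(secret, method, path, body);
  if (signature.length !== expected.length) return false;
  return crypto.timingSafeEqual(Buffer.from(expected), Buffer.from(signature));
}
```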

If you are a passionate techie and a sports fan and want to be a part of Rooter Tech as we enter the next level of scale, drop a mail to arpan@rooter.io or akshat@rooter.io with your GitHub profile link.
