Fiverr’s Microservices Evolution — Part 1
From a monolith to a chimera and beyond. The journey to architect our backend microservices.
Microservices @ Fiverr are a fascinating thing. They truly are!
Spending quite a long time on the engineering team here @ Fiverr has allowed me to witness, more than a couple of times, a full architecture being born, maturing, and being replaced by a better one with its own set of tools and paradigms!
This series of blog posts will revolve around how our journey began, how we structured our services and databases, what challenges we faced, and what set of guidelines drove us each time to a better, faster, more decoupled and resilient solution. It will also discuss general ideas around the love-hate relationship between microservices and data, event sourcing, CQRS, and how they integrate into a fully functioning platform here @ Fiverr.
In mid-2010, Fiverr.com was a website built as a monolithic Ruby on Rails application backed by a MySQL database.
As the site gained popularity and traffic increased, an additional layer of caching had to be added in the form of memcached and Rails Action caching, and to avoid real-time calculation of data for search and catalog browsing, some cron jobs were added to the mix to warm up the cache with prefetched information.
Like any fast-paced startup, Fiverr wanted to move fast, with high cadence, and, like many other companies back then such as Airbnb, Etsy, GitHub & Shopify, adopted Rails as its main platform.
But growing and scaling have their cost, and over the course of the next 3 years, the increase in traffic began stressing the original architecture.
The monolithic nature of RoR meant that any bug or crash could bring the whole site down, while having a single database meant that, as data grew and table joins started taking longer and longer for an increasing number of requests, the system became slower over time, with each user waiting synchronously for fetches, updates, and ever-longer page loads. A mess.
In addition to the scaling problems we faced, the MVC paradigm of Rails, while fast and efficient, does not promote a clean code structure. When strict rules and good design patterns are not applied, maintaining a fairly large platform in Rails can be tough.
Add to that a growing engineering organization with a constantly increasing number of teams, and you get a very hard-to-maintain codebase.
Alongside the great hype surrounding microservices, we started our baby steps in the journey to break the monolith.
When designing our first microservice, we had some important notions we wanted to promote:
- Ease of initial development — our backend engineering team spoke Ruby, and we loved it for its simplicity and the developer happiness it created. We wanted to preserve Ruby as our language of choice (which later evolved into adding Golang to our stack, but more on that a bit later).
- A viral solution — a scaling effort starts with the engineering team, and we had to make sure that our platform team created a solution that is profoundly stable, robust and scalable, but also viral, meaning that its adoption across the engineering team would be easy.
- Asynchronous messaging between services — as our traffic increased, the need for a fast, responsive site emerged, and we understood that letting users wait for the synchronous completion of their updates not only slowed their sessions but also slowed the entire platform, since fewer requests could be handled while connections were held open.
Enter The Chimera.
Inspired by the mythological two-headed beast, our chimera is a Ruby microservice template combining:
- READ SIDE Grape API — for getting data from the bounded context represented by the service, be it user information, the pricing of a gig, or the analytics of a specific order.
- WRITE SIDE RabbitMQ consumer — listening to a messaging topic and asynchronously executing updates to the model of the service’s bounded context.
The Chimera also enjoys the following two things:
- A Shared Core — since both “heads” reside in the same repository, they enjoy code reuse and share the domain business logic, utils, etc.
- A Shared Set of DB Connectors — the move to microservices initiated the usage of service-specific databases such as MongoDB or Redis (which we now use extensively). While keeping our relational MySQL cluster as the source of truth, we wanted each service to access its own domain-optimized database, and we needed utils that kept it simple to connect to all of those different data stores.
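To make the shared-core idea concrete, here is a minimal, self-contained Ruby sketch. All class and module names are invented for illustration; the real chimera uses Grape for the read side and a RabbitMQ consumer for the write side, with a real database behind the core.

```ruby
# Shared core: domain logic and persistence used by BOTH heads.
# A toy in-memory hash stands in for the service's own database
# (MongoDB, Redis, or MySQL in the real setup).
module OrderCore
  STORE = {}

  def self.valid?(attrs)
    !attrs[:id].nil? && !attrs[:gig_id].nil?
  end

  def self.persist(attrs)
    STORE[attrs[:id]] = attrs
  end

  def self.find(id)
    STORE[id]
  end
end

# READ SIDE head: in the real chimera, a Grape API endpoint.
class OrdersApi
  def show(id)
    OrderCore.find(id)
  end
end

# WRITE SIDE head: in the real chimera, a RabbitMQ consumer.
class OrdersWorker
  def handle(message)
    OrderCore.persist(message) if OrderCore.valid?(message)
  end
end
```

Because both heads call into `OrderCore`, validation and persistence rules are written once and cannot drift between the API and the worker.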
A typical asynchronous update request would look like this using a chimera:
- The user would perform a POST to create an order
- The REST API would validate the request within the session, without saving anything to the database yet. This is very fast since the validation is shallow: it includes basic input validations and might also include some basic data-integrity checks against the DB, also executed without any heavy processing.
- At this point, if the validation passes, a message is sent to the messaging broker (RabbitMQ) with a routing key pointing back to the chimera’s own worker. The message contains all the information passed in the POST request and will be executed asynchronously, without the user having to wait for it.
- The user is then returned a quick 201 status and can continue using the platform immediately.
- The message is consumed by the chimera’s RabbitMQ consumer (AKA worker), and since it has access to the same domain model, it can use it to perform the command and persist the order in the DB.
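The steps above can be sketched end to end in plain Ruby. This is only an illustration: a thread-safe `Queue` stands in for RabbitMQ, and the routing key and store names are invented.

```ruby
require 'json'

BROKER = Queue.new  # stands in for RabbitMQ
ORDERS = {}         # stands in for the service's database

# API side: shallow validation, publish a command event, return 201 at once.
def create_order(params)
  return 422 unless params[:gig_id] && params[:buyer_id]  # shallow validation only
  BROKER << JSON.generate(routing_key: 'orders.create', payload: params)
  201  # the user continues immediately; the write happens asynchronously
end

# Worker side: consume the command and persist via the shared domain model.
def consume_one
  message = JSON.parse(BROKER.pop, symbolize_names: true)
  order = message[:payload]
  ORDERS[order[:gig_id]] = order.merge(status: 'created')
end

status = create_order(gig_id: 42, buyer_id: 7)
consume_one
```

The user sees the 201 as soon as the message is enqueued; the actual persistence happens whenever the worker gets to the message.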
Following the introduction of the Chimera, which is our variation of the microservice, our updated backend architecture now looked something like this:
In reality, we spawned more than 100 chimeras over the course of the last 2 years, each owned and maintained by a different team, creating better decoupling and autonomy in the engineering department.
As pictured in this illustration, our new messaging bus was used for two types of events:
- Command Events — the events sent by a chimera internally to its worker (RabbitMQ consumer) in order to process updates asynchronously.
- Domain Events — the events sent by a chimera in a pub/sub pattern to any other chimera registered to listen to them, informing the system that something has happened inside the bounded context of the publishing chimera.
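The distinction can be illustrated with a tiny in-process pub/sub bus. The topic names and handlers here are hypothetical; in production this role is played by RabbitMQ topic exchanges.

```ruby
# A tiny in-process pub/sub bus standing in for RabbitMQ.
class Bus
  def initialize
    @subscribers = Hash.new { |h, k| h[k] = [] }
  end

  def subscribe(topic, &handler)
    @subscribers[topic] << handler
  end

  def publish(topic, event)
    @subscribers[topic].each { |h| h.call(event) }
  end
end

bus = Bus.new
log = []

# Command event: consumed only by the orders chimera's own worker.
bus.subscribe('orders.commands') { |e| log << [:orders_worker, e] }

# Domain event: any interested chimera can register; here two of them do.
bus.subscribe('orders.domain_events') { |e| log << [:analytics, e] }
bus.subscribe('orders.domain_events') { |e| log << [:notifications, e] }

bus.publish('orders.commands', action: 'create_order')
bus.publish('orders.domain_events', event: 'order_created')
```

One command message reaches exactly one worker, while one domain event fans out to every subscribed chimera.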
An interesting pattern we started implementing heavily on top of domain events is CQRS — Command Query Responsibility Segregation.
At its core, CQRS is intended to create read-optimized views to improve the performance of applications.
Let’s look at one CQRS usage scenario that came in very handy for us @ Fiverr — analytics! A simple example to showcase would be the seller orders dashboard:
Getting the number of orders in their different stages would require a simple GROUP BY in SQL. But at a large scale, we don’t want to perform such a heavy query for each of our sellers every time they refresh their “Manage Sales” dashboard. CQRS to the rescue!
CQRS allows us to update the orders bounded context and persist the order status change in our relational MySQL orders table; then, using a domain event that informs any interested component in our system about this change, we capture the change in a totally different bounded context (the analytics chimera) and update our read-optimized MongoDB document by incrementing a specific bucket value.
Fetching the order stats is now reduced from O(n) to O(1). Nice!
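Here is a small Ruby sketch of that read-side update. The names and event shape are invented; in the real system the counters live in a MongoDB document and each increment would be a `$inc` update.

```ruby
# Read-optimized view: one counters hash per seller, mirroring what a
# MongoDB document would hold ({ active: 3, delivered: 5, ... }).
STATS = Hash.new { |h, seller| h[seller] = Hash.new(0) }

# Handler for a hypothetical 'order_status_changed' domain event in the
# analytics chimera: O(1) bucket increments instead of a GROUP BY over
# the whole orders table.
def on_order_status_changed(seller_id:, from:, to:)
  STATS[seller_id][from] -= 1 if from # order leaves its old bucket...
  STATS[seller_id][to]   += 1         # ...and enters the new one
end

on_order_status_changed(seller_id: 1, from: nil,     to: :active)
on_order_status_changed(seller_id: 1, from: nil,     to: :active)
on_order_status_changed(seller_id: 1, from: :active, to: :delivered)

# Reading the dashboard is now a single document fetch:
STATS[1]  # => { active: 1, delivered: 1 }
```

The write side stays authoritative in MySQL; the analytics view is merely a projection that can always be rebuilt by replaying the domain events.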
As I’ve mentioned, this pattern went on to define many of the scaling and performance optimizations we have conducted throughout our journey in the microservices world.
In the next chapter, we will discuss the various pains and challenges we faced using the architecture described above, and how, out of those pains, the next generation of our microservices architecture evolved. Stay tuned!