Replacing a Car Engine While Driving It

Saar Arbel
Published in Fiverr Tech
6 min read · Apr 28, 2022

How we replaced our inbox system at Fiverr

Prologue

Have you ever rewritten a legacy system into a new one? If you haven’t experienced it yet, lucky you! Keep reading. The inbox app is a mission-critical component of Fiverr’s online marketplace: it’s the channel through which sellers and buyers communicate and order services from each other.

In this blog post, I’ll share some insights I gained from rewriting our inbox app for its next generation: how we migrated existing features from the legacy system into the new one.

Motivation

The inbox is a very complex system. Multiple clients are involved, and it’s one of Fiverr’s most traffic-heavy components, with over 150 million API calls a day and over 50 million produced events (Kafka, RabbitMQ, BigQuery) a day.

As a legacy system, the inbox has a few significant disadvantages:

  1. It was designed 10 years ago to support communications on a much smaller scale than today.
  2. You never know where you’ll end up when a production issue occurs; mapping complicated legacy flows under that kind of pressure is very hard.
  3. It’s challenging to develop new features. The legacy code is barely covered with tests, yet it serves millions of users every day, which makes changes scary.
  4. We can’t help other product teams at Fiverr embed messaging-related components into their products, because our legacy inbox system is not generic enough to be used as infrastructure for other products. (In other words, we can’t share our messaging knowledge and experience within the organization.)

How we developed existing features in the new system

We needed a defined strategy for developing, testing, and deploying the features that exist in the current system — create messages, create conversations, mark conversations with different settings, update messages, etc.

We couldn’t afford to develop all the inbox features and deploy the new system at once: with so many moving parts in the architecture, such a release is unpredictable and likely to fail.

Additionally, we wanted to work in an Agile fashion rather than a waterfall, so we chose an iterative approach: develop a single feature at a time, test it, and deploy it to production to see how it behaves. Then, hopefully, we could repeat that flow for all of our important and sensitive features. To better understand our workflow, let’s review how we approached the message creation feature:

current message create flow

The core of our idea is duplication of the flow: let the legacy flow (which works well) keep serving production requests, while gradually duplicating the traffic to the new system.
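The duplication can be sketched roughly as follows. This is a minimal Python sketch, not Fiverr’s actual code: the function names, the request shape, and the percentage-based rollout gate are all illustrative assumptions.

```python
import hashlib

def in_rollout(conversation_id: str, percentage: int) -> bool:
    """Deterministically bucket a conversation into [0, 100) by hashing its id,
    so the same conversation always gets the same duplication decision."""
    digest = hashlib.sha256(conversation_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percentage

def create_message(request, legacy_system, new_system, duplicate_pct: int):
    # The legacy flow always serves the production request.
    response = legacy_system.create_message(request)
    # A gradually increasing share of traffic is also sent to the new system.
    if in_rollout(request["conversation_id"], duplicate_pct):
        new_system.create_message(request)  # fire-and-forget in practice
    return response
```

Hashing the conversation id (rather than picking randomly per request) keeps a whole conversation on one side of the rollout, which makes the later comparison between systems cleaner.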

How do we verify that the flow in the new system is good?

At Fiverr, we use a process called Data Quality Assessment (DQA). The idea is pretty simple: we measure the data persisted by each system and compare it, which helps us validate that both flows behave the same way.

If the message was (1) created with the same text body, (2) related to the same conversation identifier, and (3) created at the same time in both systems (in practice we had many more criteria), we can be confident that the new message creation flow is well constructed.
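A per-message comparison on these criteria might look like this (a sketch; the record fields and the one-second timestamp tolerance are assumptions, not the real criteria set):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class MessageRecord:
    body: str
    conversation_id: str
    created_at: datetime

def records_match(legacy: MessageRecord, new: MessageRecord,
                  tolerance: timedelta = timedelta(seconds=1)) -> bool:
    """Compare the data persisted by both systems on the DQA criteria:
    same text body, same conversation, and (roughly) the same creation time."""
    return (legacy.body == new.body
            and legacy.conversation_id == new.conversation_id
            and abs(legacy.created_at - new.created_at) <= tolerance)
```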

At the end of each flow, we wrote the message and its details (just as it is stored in DB) to a shared BigQuery table.

message creation — duplication flow

We wrote a script that generates a detailed report every hour on the gaps between the two systems’ reporting. It looks like this:

example of the message creation DQA report. M1: legacy system; M2: new system

The report was generated by querying the dedicated BigQuery table that holds the reporting from both systems.

example of successful reporting from both systems on the same message with BigQuery
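In practice this was a scheduled query over the shared BigQuery table, but its core logic can be sketched in plain Python (the row shapes and field names here are illustrative, not the real schema):

```python
def dqa_report(legacy_rows: dict, new_rows: dict) -> dict:
    """Given {message_id: persisted_record} maps reported by each system,
    summarize the gaps between them."""
    legacy_ids, new_ids = set(legacy_rows), set(new_rows)
    shared = legacy_ids & new_ids
    # A message counts as accurate only if it exists in both systems
    # and the persisted records are identical.
    mismatched = [mid for mid in shared if legacy_rows[mid] != new_rows[mid]]
    matched = len(shared) - len(mismatched)
    total = len(legacy_ids | new_ids)
    return {
        "missing_in_new": sorted(legacy_ids - new_ids),
        "missing_in_legacy": sorted(new_ids - legacy_ids),
        "mismatched": sorted(mismatched),
        "accuracy_pct": round(100 * matched / total, 2) if total else 100.0,
    }
```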

Why did we have gaps?

Every message sent in the inbox goes through a complex validation pipeline that confirms the following:

  1. The sender and recipient of the message are allowed to speak
  2. The message is not spam
  3. The sender and recipient of the message are not blocked users

And many more…
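Conceptually, the pipeline can be modeled as a chain of independent checks that all have to pass before a message is created. The rules below are made up for illustration; the real validations are far richer:

```python
def allowed_to_speak(msg: dict) -> bool:
    # Illustrative rule: the recipient hasn't muted the sender.
    return msg["from"] not in msg.get("muted_by", [])

def not_spam(msg: dict) -> bool:
    # Placeholder spam heuristic, standing in for a real spam service.
    return "free money" not in msg["body"].lower()

def not_blocked(msg: dict) -> bool:
    return msg["from"] not in msg.get("blocked_users", [])

VALIDATIONS = [allowed_to_speak, not_spam, not_blocked]  # ...and many more

def validate_message(msg: dict) -> bool:
    """The message is created only if every validation in the pipeline passes."""
    return all(check(msg) for check in VALIDATIONS)
```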

To perform these validations, the legacy system connects directly to several crucial databases and tables across the organization, such as Users, Orders, Gigs, and more (leftovers from our monolith days).

The new system is a completely isolated micro-service, which means that it does not connect to databases other than its own dedicated one. To use the relevant data for the validations, we needed to persist it in our micro-service database using the event sourcing pattern.

“The fundamental idea of Event Sourcing is that of ensuring every change to the state of an application is captured in an event object, and that these event objects are themselves stored in the sequence they were applied for the same lifetime as the application state itself.” — Martin Fowler

Since we didn’t just “rewrite” the flow but also changed our architecture (moving from a monolith to a micro-service with event sourcing), we could not predict what the reason for any given gap would be: too many moving parts.

Examples of gaps that we faced:

  1. There were 5% more created messages in the legacy system.
  2. Spam messages were not created at all in the new system.
  3. 0.1% of the messages were created in the new system but not in the legacy system.

Summary

The process described in this blog post can be broken down into these steps:

  1. Develop the feature in the new system
  2. Create a dedicated table in BigQuery to store reports from legacy and new systems
  3. Report the persisted data at the end of the flow from each system
  4. Generate a detailed report of the gaps between systems in a particular time period
  5. Review the gaps report:
    - If you are satisfied with the results, stop 🥳
    - Otherwise, identify why there are gaps and issue a fix for them
  6. Wait a particular time period and return to step 4

Most of the effort and focus went into step 5. This process enabled the team to solve specific bugs in each iteration, deploy the fixes, and review the DQA results once again until the accuracy goal was achieved.

When we deployed our first version of the message creation feature, the DQA process showed 92.15% accuracy (not bad!), meaning that 92.15% of the messages were successfully persisted in the new system and were identical to the corresponding messages in the legacy system.

After a few short iterations, we reached a 99.995% accuracy rate, so we decided that was good enough 🙂.

At that point, the feature behaved identically to the current feature in production.

We used this procedure to develop the rest of the features in the system, achieving both stability and a rapid development pace.

Fiverr is hiring in Tel Aviv and Kyiv. Learn more about us here.
