Testing the Finance Platform at Scale with Shadow Testing

The Finance Platform at Agoda consumes financial data from internal sources, such as booking and payment systems, as well as data from third parties, such as banks and payment gateways, before reconciling and processing the data to provide output reports and dashboards for Finance users.

At the time of this post, the platform consists of over 30 applications, with more than 100 jobs running daily, processing more than 15 million transactions, and generating more than 50 reports.

To make sure we can scale the platform to support such a large amount of data, the platform uses Hadoop for data storage. Our applications run on Apache Spark, with Scala as the main programming language, alongside in-house ETL and data tools that pull data from various sources.

Challenges we Faced

There have been challenges in maintaining such a big and complex system:

  • Agoda has hundreds of products across over a hundred systems, and each product generates financial data that the platform needs to process. This results in thousands of different combinations of products and features; it is inefficient, and sometimes impossible, to develop test cases that cover them all.
  • The calculation process can be complicated with many variables and conditions that come from different sources linked together. Even a small change in one formula can lead to unexpected impacts on parts that, at first glance, seem unrelated.
  • Many developers and QAs work on the platform, each with a different level of understanding and experience. Knowing what to test, and how, requires a complete understanding of all upstream systems and how they work end-to-end. QAs with such knowledge are very rare.

All of our applications follow the test-pyramid strategy: we have good coverage from unit, integration, and end-to-end testing. However, even with those, it was still not enough to ensure the quality of our platform. Over the years, we found a number of bugs caused by unexpected impacts from changes.

To solve the problems, our engineers devised an additional regression method: Shadow Testing.

What is Shadow Testing?

Shadow testing, also known as traffic mirroring or A/B regression testing, is a testing approach that sends production traffic to a mirrored service (usually a pre-release version of the service) and then compares its outputs with production's to find any unexpected differences introduced by the changes.

Shadow Testing has gained popularity in the testing community for effectively helping with regression testing of large-scale services, providing excellent coverage at a low maintenance cost. Open-source tools like Diffy are also used to feed production traffic to a test system.

[Figure: Shadow Testing]

However, Shadow Testing for the Finance Platform works differently from shadow testing for APIs, because most parts of the platform are data pipelines. Most applications in the platform simply read data from source tables, process it, and save the results to destination tables.

Therefore, the test application needs to deal with data instead of live traffic, which makes the implementation much simpler.
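The pipeline shape this relies on can be sketched in a few lines. Below is a plain-Python illustration with made-up table names and a toy calculation; the real jobs run on Spark/Scala:

```python
# A pipeline job reads source tables, transforms the rows, and writes the
# results to a destination table. The shadow run reuses the same code and the
# same input, but points the destination at a separate test database.
# (All names here are hypothetical.)

tables = {"prod.bookings": [100, 250, 75]}   # stand-in for source data

def run_job(source: str, destination: str, transform) -> None:
    rows = tables[source]                    # read input data
    tables[destination] = [transform(r) for r in rows]

def fee(amount):                             # toy "calculation" under test
    return round(amount * 0.03, 2)

run_job("prod.bookings", "prod.fees", fee)     # production run
run_job("prod.bookings", "shadow.fees", fee)   # shadow run, separate output table
```

Because the job is just read-transform-write, shadowing it is only a matter of redirecting the destination.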

Shadow Testing on Finance Platform — How it Works

[Figure: Shadow Testing for the Finance Platform]

Here is how it works: once a developer commits their changes, our CI builds and deploys the test version, which reads production or snapshot data and writes its output to its own database. A separate job then detects differences between the test and live outputs. If there are any, it posts the feedback to Git. The developer then reviews the output manually and decides whether the commit is good to go.
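Stitched together, that flow reads as a short function. This is a hedged sketch in which every step is a stand-in callable, not our real CI API:

```python
# Orchestration sketch: build the test version, run it against production (or
# snapshot) input, diff its output against the live output, and post feedback
# to Git when differences are found. All callables are hypothetical stubs.

def shadow_test(commit, build, run_job, diff_outputs, post_feedback):
    test_app = build(commit)                 # CI builds and deploys the test version
    run_job(test_app, output_db="shadow")    # writes results into its own database
    diffs = diff_outputs("prod", "shadow")   # detect test-vs-live differences
    if diffs:
        post_feedback(commit, diffs)         # developer reviews and decides
    return diffs
```

Note that the final decision is still human: the function only surfaces the differences, it does not judge them.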

The following steps are key elements of shadow testing implementation:

Duplicating production data

A crucial part of shadow testing is testing with production data or data that are as close to the production data as possible.

There are two methods the Finance Platform uses to obtain the data:

  • Direct read-only access to actual production data: This method is fast, but it has a limitation: it is read-only and only provides the latest data, which might differ from the data at the time the production run happened. We use this method for read-only, immutable input, or for master and configuration data that rarely changes.
  • Access to a snapshot of production data: This method makes the test much slower but provides stable, writable data. It is used for mutable data or for services that write back into their source tables; we snapshot those to ensure they stay as close to the production data as possible.
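The choice between the two methods can be driven by a small per-table mapping. The sketch below uses hypothetical table names and path conventions:

```python
# Map each input table to an access method: "direct" for immutable/master data
# (fast, read-only, latest state) and "snapshot" for mutable data (slower, but
# stable and writable). Names and paths are illustrative only.

INPUTS = {
    "exchange_rates": "direct",    # configuration data that rarely changes
    "booking_ledger": "snapshot",  # mutable data the job may write back to
}

def prepare_input(table: str, snapshot_date: str) -> str:
    method = INPUTS[table]
    if method == "direct":
        return f"prod.{table}"                      # read-only production table
    return f"snapshots.{table}_{snapshot_date}"     # copied, stable, writable
```

Keeping this mapping in configuration, rather than code, makes it easy to promote a table from direct access to snapshot if its mutability changes.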

Running Tests and Saving Outputs

Once we have the input data, the next step is to build and run the test. The CI pipeline checks out the code, compiles it, and deploys it. Agoda has an in-house framework built on top of Spark to deploy and run the test.

There are a couple of points to note:

  • We needed to ensure that the actual production tables, both input and output, are protected and not writable by the Shadow Testing process. We did this by using different user credentials.
  • We made sure that the database names for input and output tables are configurable and point to separate locations.
  • You will also need to call the Git API to get metadata for the Git feedback, such as the commit hash.
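Put together, those safeguards amount to a small piece of run configuration. All names below are made up for illustration:

```python
# The shadow run shares the production input but uses its own credentials and
# its own output database, so production tables can never be written by the
# test. (Hypothetical values throughout.)

PROD_RUN = {
    "input_db":  "finance_prod",
    "output_db": "finance_prod",
    "user":      "finance_app",       # has write access to production tables
}

SHADOW_RUN = {
    "input_db":  "finance_prod",      # or a snapshot database for mutable data
    "output_db": "finance_shadow",    # separate, disposable output location
    "user":      "shadow_readonly",   # cannot write to production tables
    "commit":    "a1b2c3d",           # from the Git API, for posting feedback
}

# Sanity checks the deployment step could enforce before running:
assert SHADOW_RUN["output_db"] != PROD_RUN["output_db"]
assert SHADOW_RUN["user"] != PROD_RUN["user"]
```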

Comparing The Differences

Now that we have the test results written to tables in a separate database, alongside the results of the production version, we need to find the differences.

We have a Spark job for comparing results and saving differences into another database.

We look for the following differences:

  • Aggregated data (e.g., number of rows and total amounts)
  • Data in the columns
  • Change in Schema and Errors

Most tables should be compared using a wildcard (e.g., SELECT *). This ensures it takes less effort to keep up with schema changes in the future. However, we found that it is sometimes also beneficial to be specific, to provide on-point and easy-to-understand feedback.
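The comparison logic can be sketched without Spark. The function below is a plain-Python stand-in for our Spark job, treating each table as a list of row dicts and reporting the three kinds of differences listed above:

```python
# Diff the live and test versions of a table by row count, column values
# (matched on a key column), and schema. A stand-in for the real Spark job.

def compare_tables(live, test, key):
    live_cols = set(live[0]) if live else set()
    test_cols = set(test[0]) if test else set()
    diffs = {
        "row_count": None if len(live) == len(test) else (len(live), len(test)),
        "schema": None if live_cols == test_cols else {
            "added": sorted(test_cols - live_cols),
            "removed": sorted(live_cols - test_cols),
        },
        "columns": [],
    }
    test_by_key = {row[key]: row for row in test}
    for row in live:
        other = test_by_key.get(row[key])
        if other is None:
            continue                    # missing rows already show in row_count
        for col in sorted(live_cols & test_cols):
            if row[col] != other[col]:
                diffs["columns"].append((row[key], col, row[col], other[col]))
    return diffs
```

In the real job, the differences are saved into another database for later review rather than returned to the caller.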

Providing Feedback

Once we have recorded the differences in the tables, the last step is to post feedback to Git so developers and QAs can take action.

The CI/CD system should provide an API for posting this feedback. At Agoda, we use the GitLab API to provide feedback to contributors.

Having differences does not hard-block the merge, as there can be expected differences from the intended changes. Still, the team and the code reviewers agree that all unexplained differences need to be investigated before the change is merged.
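Building the feedback comment itself is straightforward. The sketch below only assembles the comment body (posting it would be, for example, a POST to GitLab's merge-request notes endpoint); all field names are hypothetical:

```python
# Turn recorded differences into a Markdown comment for the merge request.
# Each difference is a (table, column, live_value, test_value) tuple.

def build_feedback(commit: str, diffs) -> str:
    if not diffs:
        return f"Shadow test for {commit}: no differences found."
    lines = [f"Shadow test for {commit}: {len(diffs)} difference(s) found:"]
    for table, column, live, test in diffs:
        lines.append(f"- `{table}.{column}`: live={live!r}, test={test!r}")
    lines.append("Unexplained differences must be investigated before merging.")
    return "\n".join(lines)
```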

[Figure: Feedback from Shadow Testing]

The Benefits of Shadow Testing

After running Shadow Testing for more than a year, we have found many benefits:

  • Great test coverage from using production data. Shadow tests can detect problems that even developers or QAs are unaware of. Since we began using it, most of the bugs caused by unexpected impacts have been caught by this process.
  • Unlike unit, integration, or system tests, you don't need to update or add test cases and scenarios for every new change. The shadow test rarely needs updates, even when you make major logic or schema changes.
  • Compared to system integration testing, shadow testing is generally much easier to maintain and much more stable, since you test only one component with no other dependencies.
  • It helps with learning and improving knowledge of the service. Not all code contributors know everything about the service when they make changes. By exercising all possible inputs and requiring contributors to investigate and understand the differences, it gives people more insight into what the service does and helps them write better-quality code in the future.

Cons of Using Shadow Testing

Like all things, shadow testing is not without limitations:

  • It cannot be used to test new features or new services; Shadow Testing is mainly for regression testing, and we have unit tests and integration tests for that purpose. Shadow Testing is therefore NOT a replacement for unit and integration testing, but an additional test step that can significantly improve overall quality.
  • It still needs manual work to review the differences. Reviewers also need some level of knowledge to identify whether the changes were expected. Early on, this can take time, but it gets better as people gain more knowledge of the application.
  • There can be a lot of noise for big changes.
  • The pipeline needs to be well-designed to do shadow testing properly. It is harder to set up shadow testing for stateful systems with many dependencies, or for systems with mutable data.

Additional Work

There are things we have done to make Shadow Testing even better:

  • Performance Testing: Since you are testing with the actual data, you can kill two birds with one stone by measuring performance and throughput.
    [Figure: Performance testing result]
  • Making the Test Suite Configurable and Generic: The very first shadow test a team builds should be specific to its application. However, once you start applying it to more services, a pattern will sooner or later emerge, and it is better to make the suite generic so it is faster to apply elsewhere. Right now, it takes a day or less to create shadow testing for a new finance application.
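As a hedged illustration of that generic direction, the per-application setup can collapse into a small config object that a shared suite consumes. All fields and names here are assumptions, not our actual framework:

```python
from dataclasses import dataclass, field

# One shared shadow-testing suite, parameterized per application by a small
# config: which inputs to read (and which to snapshot), which outputs to
# compare, and which key column to join on for row-level diffs.

@dataclass
class ShadowTestConfig:
    app_name: str
    input_tables: list                   # tables the job reads
    output_tables: list                  # tables to diff against production
    compare_keys: dict                   # output table -> key column for diffs
    snapshot_tables: list = field(default_factory=list)  # mutable inputs to copy

fx_report = ShadowTestConfig(
    app_name="fx-report",
    input_tables=["bookings", "exchange_rates"],
    output_tables=["fx_report"],
    compare_keys={"fx_report": "booking_id"},
    snapshot_tables=["bookings"],        # mutable input, so snapshot it
)
```

With a config like this, onboarding a new application is mostly filling in table names rather than writing new test code.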


The Finance Platform at Agoda uses shadow testing to perform better regression tests and detect unexpected changes. It is now a key part of the platform, one that has helped us catch several issues over the years and gives us confidence in our releases.


Shadow testing: https://microsoft.github.io/code-with-engineering-playbook/automated-testing/shadow-testing/

Diffy: https://github.com/opendiffy/diffy


