Stop Wasting Time on Unit Testing: How Tokopedia Achieved 8X Faster Results
Learn How Tokopedia Achieved 8X Faster and More Efficient Unit Testing Results
Unit testing is an essential part of software development, especially for large-scale applications that serve millions of users. Developers must ensure that every feature is tested thoroughly and that new features do not introduce bugs into existing ones, since regressions can cause significant losses for both developers and users.
While having unit tests is important, maintaining and running the tests can be time-consuming and resource-intensive, especially on a large-scale app that has tens of thousands or more test cases.
When one cycle of building and unit testing takes too long, it decreases the developers' own productivity. This is especially true on a busy CI/CD system where only a few machines serve dozens of developers.
In our case, we have 9 CI/CD machines serving ~65 developers daily. Each developer usually works on multiple branches, and each branch triggers both a build job and a unit test job. Can you imagine if a single unit test job took 30 minutes?
In this article, we’ll go through our journey to optimize our daily unit testing process in the Tokopedia iOS team, from how we used to do unit testing to where we are today.
The Pain Points
By optimizing our unit test performance, we addressed some pain points our team had:
“OMG! This fix needs to be merged ASAP, but there are 10 idle queues on the CI machine waiting to be executed :(”
“I just edited our GitHub code owners file; the PR has a single changed file with two new lines. What did you say? I need to run all the unit tests, really?”
“Ok, my PR should be ready, let’s trigger the CI to run all unit tests.”
*30 minutes later*
“What? How could my unit test fail? Oh, I forgot to commit my latest changes. 30 mins retrigger here we go, let’s do the housekeeping while waiting.”
The Physical Education teacher way: let’s run all unit tests!
In Tokopedia, we adopted modular app architecture using Bazel as our build system (read more about it here). Therefore, all explanations will be based on Bazel, not Xcode’s native build system.
Our first unit testing process queried all unit test targets using bazel query, then ran tests on each target from the query results. Note that with a modular app architecture, our app consists of dozens of modules (targets), each feature with its own test target.
The process is fairly simple, but running tests on each target individually takes a really long time. This is because every time Bazel runs a build or test, it goes through its own build process: dependency analysis, validation, and linking, all of which slow down the overall testing time.
While this approach is not necessarily bad, it does not work well for our large-scale app. We have hundreds of test targets with tens of thousands of test cases, and running them individually takes a really long time.
On the other hand, if we have a simple app with a few test cases, the time difference is negligible, so improving the unit test process may not be worth the effort.
Remember the Bazel build process I mentioned in the last paragraph? Yep, in the next section, we will talk about how we optimized it to make testing faster.
The Marriage Way: where all unit tests unite into one
After using the Physical Education (PE) teacher way for a while, we realized that Bazel’s build process takes a toll on our unit test pipeline performance. With hundreds of test targets, the repeated dependency analysis and the overhead of starting and stopping the XCTest runner made our unit test time vary from ~20 minutes to ~35 minutes.
So we tried to unify all unit test targets into a single target. The idea is to avoid the repeated dependency analysis and runner overhead caused by running each test target one by one. A rough idea of the process is in the following pic:
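To make this more concrete, here is a hypothetical sketch of what such a unified target might look like with rules_apple. The target names, paths, and deps are illustrative assumptions, not our actual BUILD files:

```python
# BUILD.bazel (Starlark) — hypothetical sketch, not Tokopedia's real configuration.
load("@build_bazel_rules_apple//apple:ios.bzl", "ios_unit_test")

# One test bundle that links every feature module's test library,
# so Bazel analyzes dependencies and launches XCTest only once.
ios_unit_test(
    name = "UnifiedUnitTests",
    minimum_os_version = "13.0",
    deps = [
        "//FeatureA:FeatureATestsLib",
        "//FeatureB:FeatureBTestsLib",
        # ...one entry per feature module's test library
    ],
)
```

The trade-off is exactly the one described above: a single bundle means a single cache entry, so Bazel can no longer skip individual test targets whose code didn’t change.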
The downside of this approach is that it doesn’t utilize Bazel’s caching to its fullest, whereas the old approach caches results for test targets whose code didn’t change.
After trying it for about two months, we found that the new approach runs faster than the old one despite this downside. It pays Bazel’s build cost (analyzing packages, dependencies, etc.) only once, whereas the old approach repeats it for every test target.
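The saving can be sketched with back-of-the-envelope arithmetic. All numbers below are hypothetical, purely to illustrate why per-invocation overhead dominates at this scale:

```python
# Illustrative only: these numbers are made up, not Tokopedia's real figures.
PER_INVOCATION_OVERHEAD_S = 5   # dependency analysis + XCTest runner start/stop
NUM_TEST_TARGETS = 300          # hypothetical count of test targets
ACTUAL_TEST_WORK_S = 600        # time spent actually executing test cases

# Running each target individually pays the overhead once per target.
per_target_total = NUM_TEST_TARGETS * PER_INVOCATION_OVERHEAD_S + ACTUAL_TEST_WORK_S

# A single unified target pays the overhead exactly once.
unified_total = PER_INVOCATION_OVERHEAD_S + ACTUAL_TEST_WORK_S

print(per_target_total, unified_total)  # 2100 vs 605 seconds
```

Even with a modest 5-second overhead per invocation, hundreds of invocations add up to more time than the tests themselves take.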
The plot above illustrates that the single unified test target gives us more stable unit testing times, with an average duration of approximately 450 seconds.
Conversely, the traditional per-target approach produces significant spikes in unit testing time, with its highest value 50% greater than the highest value of the single unit test approach, making testing times less predictable.
Lastly, the single unit test approach also decreased our median unit test time from ~1772s to ~563s, a faster and more efficient testing process that ultimately saves time and cost.
Our unit test journey doesn’t end here, we keep trying to find ways to improve our unit test performance to make our workflow more efficient.
The Buffet Way: where only selected unit tests matter
Here comes our latest approach, let’s use an analogy to make it easier to understand.
“If you spilled something on your work table, should you wipe all your co-workers’ tables on the floor?”
“No, are you crazy? I would only wipe my own table and the surrounding area hit by the splashes.”
“Yep, that’s how you should do unit tests too.”
Do you get it? Our latest approach is the selective test, where only the related unit tests are run. For example, let’s say we have the following structure for our unit tests:
Let me give some context about what the arrows and circles mean. Each circle represents a module, and each arrow represents a dependency. In the diagram above, modules A and E point to module C, which means that A and E depend on C. In other words, A and E are the reverse dependencies of C.
Based on the previous analogy, if module C has changes (got milk spilled on it), then the directly affected module is C itself, and the indirectly affected modules are A and E. So back to our unit test discussion: when we have changes in module C, we should test modules C (obviously), A, and E, as illustrated below.
Why A and E, you may ask? Because modules A and E depend on module C, so every change in C can indirectly affect them. We don’t need to test the rest of the modules because they don’t depend on C, directly or transitively, so a change in C cannot affect them.
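The affected set can be sketched as a small graph walk. The toy dependency table below follows the article’s example, with a hypothetical extra module D that C itself depends on:

```python
# Toy model of the module diagram: an entry "A": {"C"} means "A depends on C".
deps = {
    "A": {"C"},
    "E": {"C"},
    "C": {"D"},  # hypothetical: C has its own dependency D
}

def reverse_deps(changed: str) -> set:
    """All modules that depend on `changed`, directly or transitively
    (conceptually what Bazel's `allrdeps` query computes)."""
    affected = set()
    frontier = {changed}
    while frontier:
        # Modules whose dependencies touch the current frontier are affected too.
        frontier = {m for m, ds in deps.items() if ds & frontier and m not in affected}
        affected |= frontier
    return affected

# A change in C affects A and E; C's own dependency D is untouched.
print(sorted(reverse_deps("C") | {"C"}))  # ['A', 'C', 'E']
```

Note that the walk is transitive: a change in D would pull in C, and then A and E through C.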
Enough with the long boring chit-chat, let’s jump into the flowchart of our selective test implementation!
With the selective test, we implement these three main steps:
Step 1 — File changes comparison
Usually, when developers start building a feature, they create a new branch from the main branch, where all changes are eventually combined. This step compares the file changes between their branch and the main branch. The output is a list of paths, one for each updated file.
Step 2 — Target path extraction
The output from Step 1 is only file paths; we need to find the corresponding target name for each of them before proceeding to the next step. In this step, we recursively find the nearest Bazel BUILD file for each changed file’s directory.
For example, given a changed file under parent/childOne/childTwo/childThree, we recursively search for a Bazel BUILD file starting from parent/childOne/childTwo/childThree and walking up through its parent directories. Once a BUILD file is found, we save its path and stop the recursion.
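A minimal sketch of that walk-up search, with the set of BUILD-containing directories hard-coded for illustration (the real pipeline would discover them on the filesystem):

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical: directories known to contain a Bazel BUILD file.
dirs_with_build = {"parent/childOne", "parent"}

def owning_build_dir(changed_file: str) -> Optional[str]:
    """Walk up from the changed file's directory until a BUILD file is found."""
    d = PurePosixPath(changed_file).parent
    while True:
        if str(d) in dirs_with_build:
            return str(d)   # BUILD found: this directory defines the target
        if d == d.parent:   # reached the top without finding a BUILD file
            return None
        d = d.parent

print(owning_build_dir("parent/childOne/childTwo/childThree/File.swift"))
# -> parent/childOne
```

Because the search stops at the first match, the deepest enclosing package wins, which matches "save its path and break the recursion" above.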
Step 3 — Reverse dependencies query
After finding the changed targets, we find their reverse dependencies using Bazel’s Sky Query allrdeps function. Then we combine all the reverse dependencies into one target, just like in approach number two above, and test it.
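Putting the three steps together, here is a toy end-to-end run. Every name and mapping below is made up for illustration; the real pipeline shells out to git and bazel query instead of using hard-coded dictionaries:

```python
# Step 1: file changes comparison (pretend output of `git diff --name-only`)
changed_files = ["FeatureC/Sources/Api.swift"]

# Step 2: target path extraction (pretend result of the BUILD file lookup)
module_to_target = {"FeatureC": "//FeatureC"}

# Step 3: reverse dependencies (pretend result of an `allrdeps` query)
reverse_deps_of = {"//FeatureC": {"//FeatureA", "//FeatureE"}}

targets_to_test = set()
for path in changed_files:
    module = path.split("/")[0]
    target = module_to_target[module]
    targets_to_test.add(target)                          # the changed target itself
    targets_to_test |= reverse_deps_of.get(target, set())  # plus everything above it

# Everything is combined into one invocation, like the unified approach:
print("bazel test " + " ".join(sorted(targets_to_test)))
# -> bazel test //FeatureA //FeatureC //FeatureE
```

If `changed_files` ends up empty, or no changed path maps to a target, the set stays empty and there is nothing to test, which is the "zero unit test time" case described below.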
After selective test went live, we evaluated it for about three months in a real development environment. We found a significant improvement in unit test speed, especially when the file changes only affect small modules that few other modules depend on.
Now let’s get back to the comparison plot. As you can see, the main difference in the distribution between selective test ON and OFF is at the lower end of the data. Since selective test focuses on running the minimum amount of tests while staying correct, we expected to see an increased number of low (fast) unit testing times. The median of our unit testing duration also decreased from ~500s to ~250s.
The main advantage of selective test is that we can even reach zero unit test time (yep, the test is directly marked as passed because there is nothing to be tested!). This happens when the file changes don’t affect any main code, such as an edit to GitHub’s code owners file. The evidence is the number of data points near zero seconds when selective test is activated in the plot above.
In the graph above, we can see that the selective test halved our unit test time, thanks to the possibility of an instant unit test run when there is nothing to be tested. Do note that if the file changes touch many modules, the unit test time will be more or less the same as the single unified test from the second approach.
Unit testing is one of the most important steps in our engineering process for verifying whether a change has an impact, at least on the feature logic side. Therefore, it’s essential to improve unit testing time, because unit testing can take a very long time, especially in a large-scale app.
Our unit testing improvement journey wasn’t as smooth as this article may suggest; it was full of ups, downs, and many experiments that led to where we are today. But as we always say in Tokopedia — Make it happen, make it better.
Last but not least, enormous thanks to Wendy Liga, who started this odyssey with the single unit test and involved me in our latest improvement — Selective Test — which in total made our unit testing 8X FASTER compared to the first iteration.
Also thanks to our CI/CD machines for working with us; they are in a happier state now 😆