Pocket Gems hit its 10th year in 2019, and our game server now has 2,756 unit tests. Three years ago, it took 30 minutes to run the full unit test suite on a 2016 i7 MacBook Pro. It currently takes 23 minutes to run the full suite on a 2018 MacBook Pro (2.9 GHz Intel Core i9, 32GB RAM, 1TB SSD) with 4 parallel processes (user CPU time is > 90 minutes).
This is a pain point in our deployment process, and we have horror stories from times when engineers skipped running unit tests before deploying. To accommodate our growing number of unit tests, we created a tool that runs the full unit test suite in less than 3 minutes.
Stateless Unit Tests
Each unit test can be run independently of the others. In theory, you can run every unit test in parallel at the same time. The total run time is then limited only by the number of CPU cores available or by your single slowest test.
We can put this into practice by spawning multiple server nodes that run the unit tests. It sounds easy, but in reality, many things can go wrong. What would you do in the case of node failure? How would you monitor the node’s health? Do you need an auto-scaling policy? How much will it cost?
Nowadays, technology has moved in the direction of serverless computing. Examples of such services are AWS Lambda and Google Cloud Functions. The good thing about these services is that they charge you only when you are using them. You also don’t need to monitor node health, because the system is usually short-lived, which means it only runs when you need it.
In general, serverless architecture usage falls into two use cases:
- Webserver with API gateway
- Short-lived event-driven function
We had the idea of using this parallelism to run unit tests. However, a unit test run is not a webserver, and it is not exactly short-lived either.
We ended up getting creative: we wrote a simple function that takes two parameters, the shard number and the total number of shards, and runs a different set of unit tests based on those values. We also created a unit test plugin that shards the unit tests into these separate sets.
The plugin allows us to run only a subset of the unit tests for the given parameters. To allow parallel execution of different versions of the source code, each engineer owns one or more functions in Lambda. Every time we run the full unit test suite, the tool uploads the current codebase into the function, and we can then execute hundreds of shards in parallel.
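As an illustration of the sharding idea, here is a minimal sketch of how a list of tests could be split by shard number and total shards. The post does not describe the plugin's actual partitioning rule, so the sorted round-robin scheme and the name `select_shard` are assumptions made for this example only.

```python
def select_shard(test_ids, shard_number, total_shards):
    """Return the tests assigned to one shard.

    Assumed scheme (not confirmed by the post): sort the test IDs so that
    every worker sees the same order, then take every total_shards-th test
    starting at shard_number.
    """
    if total_shards < 1:
        raise ValueError("total_shards must be at least 1")
    if not 0 <= shard_number < total_shards:
        raise ValueError("shard_number must be in [0, total_shards)")
    ordered = sorted(test_ids)
    return ordered[shard_number::total_shards]


# Hypothetical test IDs, split into 3 shards.
tests = ["test_%d" % i for i in range(10)]
shards = [select_shard(tests, i, 3) for i in range(3)]
```

Because every worker sorts the same list, each test lands in exactly one shard and shard sizes differ by at most one. Balancing by expected test duration rather than by count would be a natural refinement.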
1. Upload Codebase
2. Send Request
request = [TestRequest(idx, n_shard).start() for idx in xrange(n_shard)]
3. Combine Output
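The send-and-combine steps above could look roughly like the following sketch. It is written in Python 3 with a thread pool, whereas the snippet in the post uses Python 2's `xrange`. The `invoke_shard` callable and the result format (`passed` count plus a list of `failed` test IDs) are hypothetical stand-ins for the real remote call and its response, which the post does not show.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all_shards(invoke_shard, total_shards, max_workers=32):
    """Invoke every shard concurrently and merge the per-shard results.

    invoke_shard(shard_number, total_shards) is a hypothetical callable that
    runs one shard remotely and returns {"passed": int, "failed": [test ids]}.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(invoke_shard, idx, total_shards)
            for idx in range(total_shards)
        ]
        # result() blocks until that shard finishes and re-raises its errors.
        results = [f.result() for f in futures]
    return {
        "passed": sum(r["passed"] for r in results),
        "failed": sorted(t for r in results for t in r["failed"]),
    }


def fake_invoke(shard_number, total_shards):
    # Local stand-in for the remote call: shard 1 reports one failure.
    failed = ["test_example"] if shard_number == 1 else []
    return {"passed": 5 - len(failed), "failed": failed}


summary = run_all_shards(fake_invoke, total_shards=4)
```

Threads are enough here because each worker spends its time waiting on a remote call rather than using the local CPU.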
When implementing this architecture, we encountered multiple issues:
- Source Code Packaging
Our codebase is not pure Python, and our first try failed because of incompatible binary dependencies. We had to re-compile the binaries on the platform that the Lambda function runs on and store them as a dependencies zip file.
- Lambda's maximum package size limit
We were uploading our full virtualenv folder, and that caused the package size to bloat. We now filter out non-essential files before zipping the code together with the dependencies zip.
- Concurrent function updates
If one user updates the function while someone else is running unit tests on it, the results can be inconsistent. We handled this by having a unique function per user.
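To make the package-size fix more concrete, here is a minimal sketch of zipping a source tree while skipping files the tests do not need. The skip lists (`__pycache__`, `.git`, compiled `.pyc`/`.pyo` files) and the name `build_package` are assumptions for illustration. The post does not say which files were actually filtered out.

```python
import os
import zipfile

# Assumed examples of non-essential content; the real filter is not given.
SKIP_DIRS = {"__pycache__", ".git"}
SKIP_SUFFIXES = (".pyc", ".pyo")


def build_package(source_dir, zip_path):
    """Zip source_dir into zip_path, skipping non-essential files.

    zip_path should live outside source_dir so the archive does not try to
    include itself. Returns the list of archived relative paths.
    """
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(source_dir):
            # Prune in place so os.walk never descends into skipped dirs.
            dirs[:] = sorted(d for d in dirs if d not in SKIP_DIRS)
            for name in sorted(files):
                if name.endswith(SKIP_SUFFIXES):
                    continue
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, source_dir)
                zf.write(full_path, arcname)
                archived.append(arcname)
    return archived
```

Returning the archived names makes it easy to log what went into the package and to spot unexpectedly large directories.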
We are considering open-sourcing this tool, but the requirements and circumstances that make it work for us are fairly specific:
- Pure Python integration testing
- Small 3rd-party dependencies (rare updates of the dependencies)
- Deployments happen multiple times a day, and running the whole unit test suite takes forever
If you are in a similar situation, comment below to let us know, so that we can consider open-sourcing this tool.
This tool saves us a lot of engineering time. To date, it has been used by over 70 different engineers, with roughly 30 executions a day. Assuming the situation described at the beginning stays constant and each run saves ~25 minutes of running time, we save 12.5 hours/day of engineering time (roughly $600/day, > $10k/month), while the tool costs us only ~$100/month. The result is profit!
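The arithmetic behind these figures can be checked in a few lines. The run count, minutes saved, daily dollar figure, and tool cost come from the paragraph above. The hourly rate is simply what those figures imply, and the 20 working days per month is an assumption of this sketch.

```python
runs_per_day = 30
minutes_saved_per_run = 25
daily_saving_dollars = 600   # figure quoted in the post
tool_cost_per_month = 100    # figure quoted in the post
working_days_per_month = 20  # assumption, not stated in the post

hours_saved_per_day = runs_per_day * minutes_saved_per_run / 60.0
implied_hourly_rate = daily_saving_dollars / hours_saved_per_day
monthly_saving = daily_saving_dollars * working_days_per_month
net_monthly_saving = monthly_saving - tool_cost_per_month

print(hours_saved_per_day)   # 12.5
print(implied_hourly_rate)   # 48.0
print(monthly_saving)        # 12000
print(net_monthly_saving)    # 11900
```

With these inputs, the monthly saving is consistent with the "> $10k/month" estimate.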
Personally, what I like the most about this tool is that my laptop's resources are not used at all. I don’t have to listen to the CPU fan spinning while the whole unit test suite runs. I also don’t have the frustration of waiting for the unit tests to finish before being able to do something else.
We, as a company, are continuously iterating and improving our products. As part of that, we are also improving our quality of life, including building productivity tools that remove our pain points like this.
If you are interested in working in a highly productive team, we are hiring.