Nupokati, or contract-based CI/CD of mobile apps
Hi, my name is Dmitry, and I am a release engineer at Avito’s CI/CD Speed team. For several years now, we have been responsible for everything involved in releasing our mobile apps — and not only that.
I want to show how our CI/CD system has evolved from a set of scripts and Teamcity builds into a full-fledged mobile release service allowing us to manage the entire release process of mobile apps via a user-friendly interface.
A bit of context
The Avito mobile app is:
- Dozens of product teams.
- 20+ developers on each platform.
- Thousands of UI tests.
- Tens of thousands of unit tests.
- Hundreds of thousands of lines of code.
- Weekly releases.
The entire release process consists of the following phases:
- Creating a release branch and tagging it in git.
- Running all automatic code checks and all types of tests.
- Building a release candidate.
- Uploading a release candidate to AppStore/GooglePlay and the internal artifact storage.
- Sending information to monitoring systems.
- Loading data into the feature toggle management system.
- Compiling what’s new for QA and editors.
- Preparing Jira artifacts — setting a version in the tasks, creating tasks for editors, QA, and release engineers.
- Notifying all stakeholders that the release candidate is ready.
- Regression testing.
- Starting staged rollout and notifying thereof.
- Releasing the app for all users and notifying thereof.
In early 2019, all this was successfully supported by several dozen scripts in different languages and complex chains of Teamcity builds. Every Sunday, cron launched the Teamcity start configuration, and the scripts and builds handled the first nine phases listed above, everything up to notifying stakeholders that the release candidate is ready.
If everything went well, the product teams analyzed the autotesting results and did some manual checks. Our editors wrote beautiful texts, while the release engineers kept an eye on both and waited eagerly for the moment when the coveted “release” button could be clicked. It looked like an idyll: automation had triumphed over routine work.
But behind this perfect storefront, several problems were hiding.
Problem 1: complex build chains in Teamcity
The release process includes many stages. Each stage depends on one or more of the preceding ones. Add to this scores of dependencies on external systems.
Such a system based on Teamcity works great and gets the job done, but with increasing complexity and speed, one inevitably faces a number of problems:
- Multiple builds and complex links between them.
- Supporting and modifying the system is challenging.
- Testing the entire process, or even parts of it, is hard.
- It is difficult or impossible to start the process from the point of failure.
For example, when Build 1 is shutting down, Build 3 is waiting for the output from Builds 1 and 4, some parameters have been modified in Build 7, and after an hour the entire system collapses. You end up having to figure out what needs to be fixed or restart the whole process from scratch — a lot of time and energy is wasted, and sometimes what you get is actually “manual automation.”
This was aggravated by another problem, or rather a peculiarity of how our work was organized.
Problem 2: areas of responsibility
Historically, our large team consists of two fairly independent smaller teams. These are us, the CI/CD team, and our colleagues at the Testing team. We are responsible for the entire general part of the release, or CD — taking the appropriate code and delivering it to users. The guys at the Testing team are responsible for the entire platform-specific part — building the app, running the necessary tests, and delivering it all to us.
Accordingly, the release chain of builds contained both our builds (feature freeze, Jira tasks, notifications, preparation of artifacts for manual testing) and our colleagues’ builds. The builds were interdependent, which prevented the rapid development of our systems and processes, and those of our colleagues, because any indirect change could potentially mess the whole process up.
We tried to partially solve this problem with the help of nightly test releases. Every night, the whole process was run on test data, and in the morning we could see the system status. But there was one more problem.
Problem 3: people
We have to deal with a part of the release process that involves people. Some are directly involved in it: testers, editors, release engineers. Others are involved indirectly, but have an interest in ensuring that users get the app: product managers, developers, marketers, analysts. Previously, all communication was carried out through Slack channels, and the actual status of the release was scattered across multiple places (Jira, Slack), so only the release engineer was aware of it. The release engineer had to spend a lot of time answering questions such as “when will the release be rolled out to all users?”, “can we start testing?”, and “when will the next release start?”
We thought that it was time to approach the issue in a revolutionary way, rather than an evolutionary one, and deal with all the problems at once by taking the best of what we already had.
Delineating responsibility
As mentioned earlier, our large team consists of two smaller teams, each responsible for either the CI or the CD portion of the entire process.
Let’s see how we define these concepts.
CD:
- creating a release branch in git
- tagging in git
- launching CI
- preparing the release artifacts (Jira tasks, release notes)
- preparing the regression artifacts
- notifying of release stages
- rollout
CI:
- running all tests
- building the app
- building platform-specific artifacts
- uploading the app to the market
We see that the areas of responsibility are clearly delineated at both the process level and the organizational level. But in the general release process at the Teamcity level, everything was mixed, which caused many problems.
At the same time, CI is almost entirely unaware of CD. CD runs the CI portion with the required parameters and waits for the builds to complete to get the necessary artifacts. It turns out that the tight link between builds and the complex many-to-many dependencies were not justified. We decided to draw a distinction between CI and CD, establish a single point of interaction between the two, and bind them together with a “contract”.
A contract in its essence is a pair of JSON files, one of which is passed by CD to the CI portion, while the other is expected as an output from CI.
In this concept, CD is a process manager — it completely manages the release, configures CI, and expects certain outputs. CI simply does the work ordered by CD, and both parts can exist and be developed completely autonomously, as long as the contract is complied with.
An example of the contract’s input file config.json:
{
  "schema_version": 1,
  "project": "avito",
  "release_version": "777.5",
  "output_descriptor": {
    "path": "http://artifactory.ru/releases/avito_android/777.5_1/output.json",
    "skip_upload": false
  },
  "deployments": [
    {
      "type": "google-play",
      "artifact_type": "bundle",
      "build_variant": "release",
      "track": "beta"
    }
  ]
}
Here we tell the CI portion that we want to build release 777.5 of the Avito project and that we expect the resulting output file to be uploaded to the path defined in output_descriptor. We also specify which artifacts should be built, in what form, and where they should be uploaded afterwards.
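To make the input side of the contract more concrete, here is a minimal Python sketch of how a CD service might assemble config.json. The field names and values come from the example above; the class and function names (ReleaseConfig, OutputDescriptor, Deployment, render_config) are hypothetical and are not taken from the actual Nupokati code.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Deployment:
    type: str
    artifact_type: str
    build_variant: str
    track: str


@dataclass
class OutputDescriptor:
    path: str
    skip_upload: bool = False


@dataclass
class ReleaseConfig:
    # Hypothetical container for the input half of the contract.
    project: str
    release_version: str
    output_descriptor: OutputDescriptor
    deployments: list = field(default_factory=list)
    schema_version: int = 1


def render_config(config: ReleaseConfig) -> str:
    """Serialize the input half of the contract to JSON text."""
    # asdict() converts nested dataclasses (including those in lists) to dicts.
    return json.dumps(asdict(config), indent=2)


config = ReleaseConfig(
    project="avito",
    release_version="777.5",
    output_descriptor=OutputDescriptor(
        path="http://artifactory.ru/releases/avito_android/777.5_1/output.json"
    ),
    deployments=[Deployment("google-play", "bundle", "release", "beta")],
)
config_json = render_config(config)
```

Keeping the contract in typed structures like these means a missing field is caught when the object is constructed, before anything is handed to CI.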
An example of the contract’s output file output.json:
{
  "schema_version": 1,
  "teamcity_build_url": "https://tmct.ru/viewLog.html?buildId=17317583",
  "build_number": "777",
  "release_version": "777.5",
  "git_branch": {
    "name": "release-avito/777.5",
    "commit_hash": "2c54c50c220bf91bc1a6ca10b34f53a540c80551"
  },
  "test_results": {
    "report_id": "5f3e94fd23d67bf434e5c1b8",
    "report_url": "https://tests.avito.ru/report/AvitoAndroid/FunctionalTests/2c54c50c220bf91",
    "report_coordinates": {
      "plan_slug": "AvitoAndroid",
      "job_slug": "FunctionalTests",
      "run_id": "2c54c50c220bf91"
    }
  },
  "artifacts": [
    {
      "type": "apk",
      "name": "avito-777.5-777-release.apk",
      "uri": "http://example.com/artifactory/android/avito/777.5-777/avito-777.5-777-release.apk",
      "build_variant": "release"
    }
  ]
}
It contains CI results and all data relevant to the downstream process, such as links to artifacts and test results.
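On the receiving side, CD has to check that what CI returned actually satisfies the contract before it continues. The sketch below shows one way this could look in Python. The set of required keys, the ContractError class, and the helper names are assumptions made for illustration; the article does not describe how the real service validates output.json.

```python
import json

SUPPORTED_SCHEMA_VERSION = 1
# Assumption: these top-level keys are treated as mandatory.
REQUIRED_KEYS = ("schema_version", "release_version", "git_branch",
                 "test_results", "artifacts")


class ContractError(ValueError):
    """Raised when output.json does not satisfy the contract."""


def parse_output(raw: str, expected_version: str) -> dict:
    """Parse output.json and verify the parts of the contract CD relies on."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ContractError(f"missing keys: {', '.join(missing)}")
    if data["schema_version"] != SUPPORTED_SCHEMA_VERSION:
        raise ContractError(f"unsupported schema_version: {data['schema_version']}")
    if data["release_version"] != expected_version:
        raise ContractError(
            f"expected release {expected_version}, got {data['release_version']}"
        )
    return data


def artifact_uris(output: dict, artifact_type: str) -> list:
    """Collect download links for all artifacts of the given type."""
    return [a["uri"] for a in output["artifacts"] if a["type"] == artifact_type]


# A shortened version of the output.json example above.
raw_output = json.dumps({
    "schema_version": 1,
    "release_version": "777.5",
    "git_branch": {"name": "release-avito/777.5"},
    "test_results": {"report_id": "5f3e94fd23d67bf434e5c1b8"},
    "artifacts": [{
        "type": "apk",
        "name": "avito-777.5-777-release.apk",
        "uri": "http://example.com/artifactory/android/avito/777.5-777/avito-777.5-777-release.apk",
        "build_variant": "release",
    }],
})
output = parse_output(raw_output, expected_version="777.5")
apk_uris = artifact_uris(output, "apk")
```

Failing fast on a schema or version mismatch keeps a broken CI result from reaching the later stages, such as Jira tasks and notifications.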
Nupokati: a mobile app release service
Thanks to the contract, we solved the problem of the tight links and complex dependencies between different parts of our release pipeline. But this did not solve the problem of transparency, maintainability, and manageability of the entire process. We were still dealing with a tangle of Teamcity builds on our side of the pipeline.
Therefore, we decided to abandon Teamcity in CD and implement an in-house mobile app release service.
What did we expect from the new service?
- No complex links and implicit dependencies.
- Restarting the release from the point of failure.
- Transparency of the release process for everyone involved.
- Simple support, customization, and testing.
- Compatibility with the company’s various mobile platform projects.
This is how the mobile release service Nupokati emerged. The name started as a working title and ended up sticking.
It consists of a Python CD management service and a mobile release dashboard.
Using cron, the CD service checks the release calendar, starts the required release, launches CI, performs all the necessary interactions with external services, and notifies all stage stakeholders.
The main control entity in the CD service is the Release.
It is built of steps.
Here’s an example of a small part of a release:
This allows us to keep modularity and flexibility and quickly connect new projects to the service, simply by building the desired pipeline of steps. Moreover, each release has its explicit state, which allows us to restart the release from any point of the release pipeline and get a predictable result.
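The idea of a release built of steps with an explicit state can be illustrated with a small Python sketch. This is not the Nupokati implementation: the Step, Release, and StepStatus names are hypothetical, the state is kept in memory (a real service would presumably persist it), and the two steps are toy stand-ins for real ones.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class StepStatus(Enum):
    DONE = "done"
    FAILED = "failed"


@dataclass
class Step:
    name: str
    action: Callable[[dict], None]


@dataclass
class Release:
    version: str
    steps: List[Step]
    context: dict = field(default_factory=dict)
    state: Dict[str, StepStatus] = field(default_factory=dict)

    def run(self) -> bool:
        """Run steps in order, skipping finished ones; stop at the first failure."""
        for step in self.steps:
            if self.state.get(step.name) is StepStatus.DONE:
                continue  # already completed in an earlier run
            try:
                step.action(self.context)
            except Exception:
                self.state[step.name] = StepStatus.FAILED
                return False
            self.state[step.name] = StepStatus.DONE
        return True


calls = []
attempts = {"launch_ci": 0}


def create_branch(ctx: dict) -> None:
    calls.append("create_branch")
    ctx["branch"] = f"release-avito/{ctx['version']}"


def launch_ci(ctx: dict) -> None:
    # Fails on the first attempt to imitate an unavailable external system.
    attempts["launch_ci"] += 1
    if attempts["launch_ci"] == 1:
        raise RuntimeError("CI is unavailable")
    calls.append("launch_ci")


release = Release(
    version="777.5",
    steps=[Step("create_branch", create_branch), Step("launch_ci", launch_ci)],
    context={"version": "777.5"},
)
first_run = release.run()   # stops at launch_ci
second_run = release.run()  # resumes from launch_ci; create_branch is not repeated
```

Because every step records its own status, rerunning the release picks up at the failed step instead of starting from scratch.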
The dashboard serves as a control panel for the entire process and shows a particular release’s current status:
Here one can find all information about the release and links to artifacts:
All release management is also done from here:
The current status of the release train is also displayed:
In addition, on the dashboard, information about past releases and a calendar with upcoming ones are available:
Conclusions
Such a seemingly simple thing as a contract between CI and CD turned out to be revolutionary. It allowed us to get a convenient system for mobile app release development, support, and management. The contract eliminated all the existing problems without adding new ones in the process.
It is important to understand that reinventing the wheel in every difficult situation is not a universal approach, and there are multiple solutions to any given problem. But in our case, the development and the time it took paid off in terms of ease of use and support and hundreds of saved hours of work. Our solution just works and performs its function, making everyone involved in the release process happy.