Magical bugs and where to find them
By Gabor Bakos
Everybody wants to be as fast as possible. Develop features then ship, pronto, while at the same time obtaining feedback, gathering findings and improving everything. We in Skyscanner’s mobile development team also like to operate this way, but as Ringo Starr once sang, “You know it don’t come easy.”
Our codebase is growing apace (now 496,682 lines of iOS code without external dependencies and 526,076 for Android without the XMLs). And, as I’m sure you know, bigger codebases don’t support fast iteration. So our release cycle got slower and it took more time to create a stable Release Candidate build after separating from the develop branch
Finding a solution
There are two possible ways to construct a static code analysis system.
- The passive way: when you’re gathering information then attempting to act on your findings. It’s an ad-hoc method, however, and doesn’t always make for complete integrity as it simply analyses the project. It is useful as a way to provide an overview of the system, but if there are too many issues or the size of the project is too big, the data can be overwhelming and ultimately useless.
- The active way: when you’re deliberately looking for issues and if there is any violation, you break the build. But it must be integrated into the build process, otherwise the developer might ignore it. If, for example, an issue occurs during integration time when the pull request is created, it is too late. It is vital that a clear and immediate signal is given.
In an ideal world these two methods can and should coexist. In other words, scan the overall health and break the build if there is a violation. We tried this, first scanning, then analysing the project but ultimately there were too many issues and we weren’t able to formulate action points. It wasn’t even obvious where to start, so we dropped this approach and adopted an active static code analysis system. You can find the exact flow of the active approach below.
First we focused only on iOS and Objective C. We chose Clang as our framework from the available options like OCLint, because, this is the compiler and it already indicated issues out of the box. Clang comes with an extensive Diagnostic framework and has an extensive list of warning flags. In order to gain leverage of this, compiler flags needed to be modified for the project. There is a predefined warning set that is already switched on but we can enable new warnings or warning groups and ultimately we can elevate these warnings into errors.
Fixing our own domain was the scope so we excluded 3rd party libs and focused only on our internal components, resulting in 43 modules that needed to be configured with the same ruleset. We created a simple script to modify the “Other C Flags” part of xcconfig files for those modules’ target configurations with an associated array of warnings that we wanted to elevate into errors. It can be found it in Xcode at the Targets/Build Settings/Other C Flags route if it ever needs to be changed manually. With this approach the developers were able to focus on the error flags and it wasn’t necessary to know how the script is working. Every error should look like the following pattern -Werror=<warning flag>. So these strings needed to be added to the config. Technically we iterated over the relevant build targets and added the constructed warning flags to the build config programmatically. With that approach the Xcode project file was consistent across targets and modules.
As I already mentioned, we had more than 9000 warnings in the app and when we enabled the ‘every warning is an error’ flag (-Wall), the result was overwhelming. Obviously we weren’t able to handle this amount of errors. So we created a list of warnings that the compiler had already indicated — this was the first set of rules that we would sort out. Like the Clang deprecated-declarations warning, which indicates C-style parameters in Objective-C method declarations which is deprecated. Or the Android ObsoleteLayoutParam which indicates the issue if the given layout_param is not defined for the given layout, meaning it has no effect.
Five teams volunteered to fix all of them, so we organized a Warning Elimination session every week with one person from each team for four hours. The process changed a lot during this time. We picked 2–3 errors, and enabled and fixed them one by one. Obviously there are cases when a fix is not an option, so in these cases we suppressed the error, but it was the very last option that we took. The biggest issue was fixing those issues that came in the footsteps of the previous fixes, to stop a domino effect.
Here I would like to emphasise the importance of the feedback loop. We experimented with various ways of error visualisation like enabling check in of just one build type, or just in CI time, or specific sets of rules in specific build time. Ultimately everything enabled in build time won, as the developer received the feedback in time and it wasn’t troublesome at all (even though we thought it might be). Also, it is not enough to check the code during development time. It has to be checked during CI time as well. We were fortunate in having the diagnostic system built into the compiler — both came free — however the CI time error visualization needed a tool. For that we used the Danger plugin, an awesome tool for automating various operations whenever the Merge Request is created. Technically, we created a log about the errors with the compiler and then Danger parsed it and posted back a message about the build health, so we were able to see the existing issues on the merge request as a comment.
Once we had rock solid implementation of the framework for iOS, we shifted towards our Android app. Fortunately, Android comes with a straightforward static code analysis tool called Android Lint. It is part of the development tool chain and is available as a command line tool, as well as integrated within the Android Studio. Basically there is a lint.xml where you can modify the severity of an issue and all you need to do is initiate the lint checker from the gradle build script.
The tool generates a highly detailed report about what the issue is, where to find it and what its severity. The analysis of the issues were faster and easier as the description and the meta data clarified the problem very cleanly. Also as there are issue priorities, it was easier to know what to fix first and what to leave until later. It is possible to calculate a warning entropy and configure the build script to brake if this entropy got higher. With this feature it is possible to eventually eliminate every warning naturally. However as we focused on the active warning removal as we didn’t need this feature.
After six weeks of hard work, we enabled 54 warning flags as errors which covered all of our 9000 Objective-C compiler warnings. As it was an effort of multiple teams and it had an impact on the company culture as well. Developers started to add flags for newly found issues and fixed those backwards by themselves. Initially, it wasn’t a popular task to fix flaws and refactor the code. There was a case, for example, when complete restructure of a module was required. So we realized that keeping up the team spirit was of equal importance to the system itself. We tried not just to facilitate the progress but to get fully get involved in every iteration.
With Android we followed the previous process. So we created a list of issues, then elevated them one by one up to error level, then fixed them. There were far less warnings by default in the project so we fixed 1658 in 5 weeks and it led us into 52 issue type usage. Android Studio uses its own lint ruleset which can be configured separately. It is important to sync these linters with each other to see the new ruleset during development time and not just build time.
From our previous twelve releases, five ended on time; they were rolled out within the targeted one week. With some process changes backed by static code analyses in place, five out of six were on time, so our release time improved a lot as the issues during stability period dropped a lot.
We won’t stop here. We’re planning to add new flags to the system and we intend to remove the remaining non Objective C warnings as well reach the zero state. Once we get there we’ll experiment with the “zero warning policy”, where no pull request can be merged with warnings alongside the errors. Also there are other tools to experiment with, like infer from Facebook. Not to mention that symbolic execution is a vast new field to discover. o we’ll continue our journey to find those pesky flaws and make our application easier and safer.
We have high hopes for the future.
Now, why not join the team?
Skyscanner is a great place to work. I’ve found it to be a really positive, supportive atmosphere, with many clever and kind people of varied interests and specialisms. We’ve got a few roles available* so please have a look if you think you would like it:
- Product Tribe Lead (London)
- Technical Product Manager (Barcelona)
- Senior Principal Engineer(Barcelona, Edinburgh, Glasgow, London)
- Senior Software Development Managers (London)
- Senior Product Security Engineer (Barcelona, Edinburgh)
- Technical Apps Manager (Shenzhen)
*At time of publishing these positions were open, but if they are no longer open do check out the other Skyscanner Engineering, Design and Marketing job roles.
Lastly, if you want to find out more about engineering at Skyscanner and hear about vacancies first, we’ll soon be launching a Skyscanner Engineering newsletter, for which you can sign up right now.
About the author
My name is Gabor Bakos, a mobile developer working in the Apps Enablement Squad in Budapest. We’re trying to make our developers’ lives easier and code quality is an area which has been close to my heart for many years. Outside of work, I’m a Crossfit enthusiast who is really fond of gaming and a wannabe dog trainer of my 3 year old sheltie called Merlin.