Development Best Practices

Saving Time and Effort with Static Analysis

Static analysis has become an indispensable tool in software development — especially for agile teams.

Abhishek Joshi
Zendrive Engineering
7 min read · Jul 12, 2019

Early detection of bugs, quick identification of potential security threats, and thorough enforcement of internal coding practices are just a few of the benefits of adopting a static analyzer.

tl;dr — See the Workflow of the new CI process section below.

Why CI/CD?

At Zendrive, things move fast. New features and patches roll out every week to keep up with changing requirements, and developers share and submit changes several times a day. This process, known as continuous integration (CI), centers on a main repository where each new submission is verified by an automated build followed by a run of the test suites.¹

Traditionally, the gap between successive code check-ins was long, making bug resolution a painful and arduous task. CI fixes that: developers detect bugs early on, spend less time debugging, and the whole process becomes more cost-efficient.

To meet requirements on the customer’s end, the code must not only be checked in to the development repository but also deployed to the production environment. For this reason, CI is almost invariably followed by a continuous delivery (CD) mechanism that deploys every successful build to the production environment.²

Since CD aims to build, test, and deploy software with greater speed and frequency, a straightforward and repeatable process is instrumental in reducing the cost, time, and risk of delivery.

CI machinery feeding into the CD mechanism forms, unsurprisingly, the CI/CD pipeline.

Code review and the need for static analysis

Code review is a software quality assurance activity in which humans inspect an author’s source code.³ It typically involves several developers manually inspecting the code for adherence to accepted coding standards, design defects, and mistakes, followed by basic sanity checks. A typical code review cycle can take anywhere from 3 to 10 iterations.

It is an arduous process that leaves many wondering: could review cycles be faster? Deviations from coding standards, type mismatches, and other trivial mistakes born of human error certainly don’t help. But what if some of these could be detected automatically?

Enter static analysis.

Static analysis software picks up on these problem patterns, helping programmers eliminate all kinds of bugs and vulnerabilities. Code duplication, security threats, and compiler warnings are tracked too. Even the most inconspicuous mistakes, the ones that easily sneak past reviewers, are detected on the go. As a result, codebases are less prone to errors and review cycles are (finally) faster.

Formally, static analysis is the analysis of a codebase performed without actually running the program.⁴ It can operate on the source code, by constructing its abstract syntax tree, or in some cases on the object code; hence static analysis is typically integrated with the build task of the codebase. Most of the mistakes committed by programmers fall into a wide but well-understood range of categories, which map naturally to a set of programming patterns. Static analysis tools identify these patterns to detect bugs.

Here’s an example of how a static analysis tool detects bugs.
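
The original post showed these snippets as screenshots. Below is a minimal sketch in the same spirit, in which BuildNumbers() is a hypothetical helper that may return null:

    import java.util.List;

    public class Example {

        // Hypothetical helper: may return null, e.g. when a lookup fails.
        static List<Integer> BuildNumbers() {
            return null;
        }

        public static void main(String[] args) {
            // Bug: BuildNumbers() can return null, so this loop can throw
            // an unchecked NullPointerException at runtime.
            for (int n : BuildNumbers()) {
                System.out.println(n);
            }
        }
    }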

In this example, the bug is an unchecked NullPointerException. In the first snippet, the function BuildNumbers() might return a nullable object, so when its result is used in the for loop, the program might throw an unchecked exception. With a static analysis tool, however, developers can easily detect and manage such bugs thanks to the rules embedded within it.
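
And a sketch of the second snippet, again with hypothetical names, in which the null case is surfaced as a checked exception instead of being left to blow up at runtime:

    import java.util.List;

    public class ExampleFixed {

        // Hypothetical checked exception for the unavailable case.
        static class NumbersUnavailableException extends Exception {}

        // Same hypothetical helper as above.
        static List<Integer> BuildNumbers() {
            return null;
        }

        static void printNumbers() throws NumbersUnavailableException {
            List<Integer> numbers = BuildNumbers();
            if (numbers == null) {
                // Surface the failure as a checked exception that callers
                // must handle, rather than an unchecked NullPointerException.
                throw new NumbersUnavailableException();
            }
            for (int n : numbers) {
                System.out.println(n);
            }
        }
    }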

The second snippet shows a way to handle this case, one that is easily overlooked by programmers and reviewers: the correct approach is to surface the failure as a checked exception. An automated tool that detects such vulnerabilities is therefore a necessity.

In my time as an intern on Zendrive’s Android SDK team, I’ve come to deeply appreciate the benefits of static analysis, specifically in situations like the one described. Although my specific needs differ from those of most developers, the same fundamental principles carry over to almost any issue and can easily be replicated in any other CI/CD pipeline.

SDK Team’s CI pipeline

At Zendrive, we use Git repositories to version-control our codebase, hosted on Gerrit, an open-source web-based code review tool,⁵ for distributed collaboration. Reviewers can add comments and assign integer review labels from -2 to +2 to each change list (CL); the higher the label, the greater the approval, and only a +2 review label allows the code to be merged into the main codebase. Gerrit also boasts clean REST and SSH APIs, allowing users to customize and integrate their own applications.

The CD mechanism is built on a Jenkins server, which is commonly used to automate aspects of a CI/CD pipeline.⁶ At Zendrive, Jenkins automates CL verification, unit testing, and SDK releases, among other things. It is highly customizable and hosts a myriad of plugins for a huge number of tasks. The CL verification job uses the Gerrit Trigger plugin to fetch the latest uploaded patch set, builds the codebase with those changes applied, and posts back a review label depending on the success of the build. The figure below summarizes the CI dataflow in a simplified manner.

Zendrive’s CI architecture at a high level of abstraction

We now have all the background needed to understand how to integrate static analysis into a CI/CD pipeline. For the convenience of developers, I added functionality that leverages Gerrit’s API to post the errors generated by the static analyzer as inline robot comments on the respective patch sets.

Choosing a static analyzer

The first task was to choose a suitable static analysis tool that would integrate easily with the build tasks of Zendrive’s SDK. We kept the following criteria in mind:

  • The tool should support the languages our codebase is written in (in our case, Java and Kotlin).
  • It should integrate easily with Android’s build task, preferably as a Gradle plugin.
  • It should have a robust warning suppression mechanism to prevent false positives.
  • It should support flexible rule creation for detection of custom bugs and enforcement of internal coding practices.
  • The tool should demonstrate strong bug-detection performance, verified with proof-of-concept tests.

We conducted the following two proof-of-concept tests:

  1. Checking whether the tool effectively detects known bugs from an earlier build.
  2. Verifying the tool on other, similar codebases; if it performs well there, that is a strong point in favor of its reliability.

After a thorough survey and rigorous testing, we found that Google’s error-prone best suited our codebase according to the criteria listed above. We integrated it through an easy-to-use Gradle plugin written by Thomas Broyer. The next challenge was to turn the error report dumped by error-prone into appropriate comments on the relevant patch sets.

Error-prone challenges

A major challenge we faced was parsing the error report dumped by error-prone, because at first glance the format of the dump looked arbitrary. To tackle this, we wrote a basic grammar covering every format in which errors were printed for our codebase. This grammar was pure conjecture: it worked for all the formats we had seen, but there was no way to know whether it covered all the formats that exist.

To gain confidence, we ran error-prone on all sorts of open-source Java repositories, checked whether our grammar parsed the dumped error reports, and iteratively added the cases that weren’t parsed. This was the best approach we could take, and it has worked well for us so far.
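
As an illustration, here is a minimal sketch of such a parser, assuming the javac-style diagnostic format that error-prone commonly prints (path:line: severity: [CheckName] message). The type and method names are ours, and the real grammar grew well beyond this:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class ErrorReportParser {

        // One parsed diagnostic: file path, line number, check name, message.
        record Diagnostic(String path, int line, String check, String message) {}

        // Matches lines such as:
        //   /src/Example.java:12: warning: [SomeCheck] message text
        private static final Pattern LINE = Pattern.compile(
                "^(.+?):(\\d+): (?:error|warning): \\[(\\w+)\\] (.*)$");

        static List<Diagnostic> parse(List<String> reportLines) {
            List<Diagnostic> out = new ArrayList<>();
            for (String line : reportLines) {
                Matcher m = LINE.matcher(line);
                if (m.matches()) {
                    out.add(new Diagnostic(m.group(1), Integer.parseInt(m.group(2)),
                            m.group(3), m.group(4)));
                }
                // Non-matching lines (code excerpts, carets, notes) are skipped;
                // the real grammar handled more shapes than this one pattern.
            }
            return out;
        }
    }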

Pro tip: Try to find a tool that dumps its error report in a uniform format like JSON or XML.

Once the error report was parsed, all that was left was to post these errors to Gerrit. For this purpose, we wrote scripts that leverage Gerrit’s REST APIs, and the whole process was deployed to the Jenkins CL verification job mentioned earlier.
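
As a sketch of what those scripts do, the snippet below posts a ReviewInput payload containing robot comments to Gerrit’s set-review REST endpoint. The host, change number, revision, credentials, and hand-written JSON are all illustrative placeholders; real code should build the payload with a JSON library:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.Base64;

    public class GerritReviewPoster {

        public static void main(String[] args) throws Exception {
            // Placeholder host, change number, and revision id.
            String url = "https://gerrit.example.com/a/changes/12345/revisions/current/review";

            // ReviewInput payload with a single inline robot comment; a real
            // implementation would serialize the parsed diagnostics here.
            String body = """
                {
                  "message": "Static analysis found 1 issue.",
                  "robot_comments": {
                    "src/Example.java": [
                      {
                        "robot_id": "error-prone",
                        "robot_run_id": "build-42",
                        "line": 12,
                        "message": "BuildNumbers() may return null; guard before iterating."
                      }
                    ]
                  }
                }""";

            // Placeholder HTTP credentials for Gerrit's authenticated (/a/) API.
            String auth = Base64.getEncoder()
                    .encodeToString("ci-user:http-password".getBytes());

            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .header("Authorization", "Basic " + auth)
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Gerrit replied: " + response.statusCode());
        }
    }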

Workflow of the new CI process

The whole workflow of the new CI process is summarized in the diagram below. Changes to the source code are made on the local machine and pushed to Gerrit for review. A Gerrit trigger is invoked on the Jenkins machine, fetching the pushed changes along with the required metadata for that particular patch set.

The fetched source code is built with the new changes applied. Since error-prone runs as a Gradle plugin, it is invoked after the JavaCompileDebug tasks, and the error report is redirected from the console to a suitable file.

A parser reads the error report and converts it into comments that fit the payload of Gerrit’s API. Errors in files not belonging to that CL are filtered out, and the rest are posted to Gerrit as inline robot comments on the relevant patch set.
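
A sketch of that filtering step, assuming the set of files touched by the CL has already been fetched from Gerrit (which exposes it through a list-files endpoint); the types and names here are illustrative:

    import java.util.List;
    import java.util.Set;

    public class CommentFilter {

        record Diagnostic(String path, int line, String message) {}

        // Keep only diagnostics for files actually touched by this CL, so that
        // pre-existing warnings in unrelated files are not posted as comments.
        static List<Diagnostic> onlyFilesInCl(List<Diagnostic> all, Set<String> filesInCl) {
            return all.stream()
                      .filter(d -> filesInCl.contains(d.path()))
                      .toList();
        }
    }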

This cycle continues until there are no more errors and the CL is checked in after a complete peer review.

CI architecture after integrating static analysis functionality and comment posting utility

Conclusion

The ideas discussed in this blog post have been generalized so that you can easily replicate them. There is a significant up-front investment in setting up this pipeline, but the payoffs are worth it in the long run.

About the Author

Abhishek Joshi worked at Zendrive as an intern in the first half of 2019. He is a student at BITS Pilani. This article represents a portion of his learning at Zendrive, and we wish him the best with his future endeavors.

References

[1] https://en.wikipedia.org/wiki/Continuous_integration
[2] https://en.wikipedia.org/wiki/Continuous_delivery
[3] https://en.wikipedia.org/wiki/Code_review
[4] https://en.wikipedia.org/wiki/Static_program_analysis
[5] https://en.wikipedia.org/wiki/Gerrit_(software)
[6] https://en.wikipedia.org/wiki/Jenkins_(software)
