Automating Code Quality: Integrating Static Analysis for Java and Android (Even on a mature code base)

Published in

Tile Engineering

9 min readOct 15, 2018

Background

When I started at Tile three years ago, my predecessors had already written and released a functional Android application. We had decent unit test code coverage and thousands of lines of written code. As the team grew, we discovered a bottleneck in our efficiency was our code review process. Around the time we came to this realization, Tile’s engineering team held a hackathon and I used that time to investigate how to automate code quality. This is my first post in my series on the subject of automating code quality at Tile for Android.

Tile Android’s Environment

At Tile we have a vanilla Android application written in java. We use trunk based development and create pull requests on Github from our branch to our main line. On each open pull request we use our Jenkins instance to run all the unit tests and all our static analysis tools: Android Lint, PMD, Checkstyle and Findbugs. We enforce a zero high priority issue policy, which means you cannot merge if any of the static analysis tools find a high priority issue.

Static Analysis

Java is a mature language and fortunately has free to use open source tools to check for common Java mistakes. A quick search on Google will point you to a wikipedia page dedicated to static analysis, and on there you will find a list of tools you can use for Java. I encourage you to choose a couple and read about what the tool will capture, how it can be installed, and how it can be configured. The configuration is important because some tools are subjective, such as those related to code style. You want a tool that fits the needs of your team. Also some tools are easily integrated into Gradle and can be run as a gradle task.

For Android, Google provides an open source static analysis tool called Android Lint.

Integrating Static Analysis

Follow the instructions for each respective tool and install it in your project and on your automation server. We want the static analysis to run at least once before merging, but I recommend you find what works best for your team and timeline.

Depending on your automation server, it may have plugins that will read the outputs of the static analysis tools and pass or fail the build based on the results.

Static Analysis Enforcement Policy Dilemmas

Integrating into a new code base is ideal because you can enforce zero errors, but sometimes, well most of the time, it is not realistic. So, what if you are trying to add to an existing code base….

Issue #1: You run for the first time on your code base and have an explosion of high priority issues

Ideal case you have time to fix all the issue, but I would recommend you read through and use your best judgement to prioritize what should be fixed immediately and what should be added to the backlog.

Issue #2: You know you have errors, but want to prevent future errors

As I stated earlier, some static analysis tools have plugins for your Automation server. For example, Checkstyle has a plugin for Jenkins. One of the options in that plugin is to have thresholds.

The issue here is it is not automated…Someone would have to be responsible for adjusting the thresholds after a issue is addressed. Also you could get into the case where a developer fixes one issue, but introduces another issue in the same commit. The delta is no change and the build would pass.

Issue #2 Continued

An observant reader might point out the checkbox underneath the delta table. “Compute new warnings (based on the last successful build unless another reference build is chosen below)”. Unfortunately, for the use case where you have this analysis run on each open pr, this will not work as it is will pass or fail depending on the history of previous runs.

Allow me to illustrate:

Policy: Check the compute new warnings box, which effectively is fail the build if the delta of errors is higher than 1.

Assume we have three branches. Mainline, the branch we cut stable releases off of, and two feature branches.

In Branch B, the developer decided to fix one of the high priority errors, while in Branch A, the developer focused on their feature. If the build order is:

Build Branch A on automation server
Build Branch B on automation server

Then the build will be successful, but if the order is

Build Branch B on automation server
Build Branch A on automation server

Then the build for Branch A would fail because Branch A does not have the fix on Branch B, and as a result, qualifies as introducing a new error. However, once these two branch are merged, the error count will go back down to 48. It not feasible to require Branch A to have the fix in it because it is unnecessary overhead and wasted work.

Workaround

Create a solution to enforce zero errors on files touched in a pull request.

At Tile our static analysis pipeline is as follows:

Preprocessing

We are going to be using Github’s Rest Api V3. I am going to give examples for pull requests that are coming from the same fork and for pull requests across forks.

If the pull request branch is on the same fork as the destination:

Choose your favorite scripting language and from the command line run:

curl https://api.github.com/repos/[owner]/[repo]/pulls/[pr_number]

Note: you may need to attach a -u with your user credentials if your repository is private

2. Next pipe the output of that command to your scripting language of choice. In the response your are looking for two SHA commit hashes: head and base.

In the above curl call’s response you should see the following:

"head": {
"ref": "transition",
"sha": "SHA_HEAD",
...}

and

"base": {
"ref": "master",
"sha": "SHA_BASE",...}

3. Take the two SHA’s and run another terminal command:

git diff --name-status SHA_BASE...SHA_HEAD

4. Parse the output to get the file types you care about. Look for additions, modifications, renames and copies. For the latter two you will want to parse the output line to get the new file name and location. You may want to exclude tests as well.

Example output:

M app/build.gradleM app/src/androidTest/java/com/apps/AFileTest.javaM app/src/main/java/com/apps/AFile.java

Possible status letters:

A: addition of a file
C: copy of a file into a new one
D: deletion of a file
M: modification of the contents or mode of a file
R: renaming of a file
T: change in the type of the file
U: file is unmerged (you must complete the merge before it can be committed)
X: "unknown" change type (most probably a bug, please report it)

5. Write to an external file the full file path to the source file by combining the output of the git command plus the path to the directory where the source code lives.

Example:

I will be using a pull request from Google’s Gson library.

curl https://api.github.com/repos/google/gson/pulls/649

In response I extract the following SHAs

"head": {
    "label": "google:duplicate_keys_in_map",
    "sha": "0b46657468afe0f089fd382293539734cc6e66a3",
    ...
},
....
"base": {
    "label": "google:master",
    "sha": "09c3a95b142640e132f24311266a504d7f505c21",
    ...
}

I then run git diff on the two SHAs

git diff --name-status 09c3a95b142640e132f24311266a504d7f505c21...0b46657468afe0f089fd382293539734cc6e66a3A .gitignoreA gson/.gitignoreM 
gson/src/main/java/com/google/gson/internal/bind/MapTypeAdapterFactory.javaM   gson/src/test/java/com/google/gson/functional/MapAsArrayTypeAdapterTest.javaM gson/src/test/java/com/google/gson/functional/MapTest.java

I then write to my output file only the files ending in .java and are not tests. Assuming I have the repository on my machine at /Users/eric/localGson/, then my file would only have a single entry.

/Users/eric/localGson/gson/src/main/java/com/google/gson/internal/bind/MapTypeAdapterFactory.java

If the pull request branch is on another branch:

Parsing the files changed across two forks without checking out the other fork requires more work because the output is not as clean.

Choose your favorite scripting language and from the command line run:

curl https://api.github.com/repos/[owner]/[repo]/pulls/[pr_number]

Note: you may need to attach a -u with your user credentials if your repository is private

2. Next pipe the output of that command to your scripting language of choice. In the response you are looking for a field called diff_url.

In the above curl call’s response you should see the following:

"diff_url": "https://github.com/[owner]/[repo]/pull/[pr_number].diff"

3. Take that diff_url and run:

curl -s -L https://github.com/[owner]/[repo]/pull/[pr_number].diff | git apply --stat -

4. In the output it will list out the files changed:

.../samples/apps/directory/ClassA.kt    |    3 +--.../samples/apps/directory/ClassB.kt    |    2 +-2 files changed, 2 insertions(+), 3 deletions(-)

5. Parse out the class you care about and write the full file path to an external file.

Example

I will be using a pull request from Google’s Gson library.

curl https://api.github.com/repos/google/gson/pulls/942

In the json response I parse out

"diff_url": "https://github.com/google/gson/pull/942.diff"

I run a curl on the diff_url

curl -s -L https://github.com/google/gson/pull/942.diff | git apply --stat -gson/src/main/java/com/google/gson/JsonObject.java |    9 +++++++++.../test/java/com/google/gson/JsonObjectTest.java  |   15 +++++++++++++++2 files changed, 24 insertions(+)

I then write to an output file only java files that are not tests. The output file should have a single entry. The repository lives at /Users/eric/localGson/ .

/User/eric/localGson/gson/src/main/java/com/google/gson/JsonObject.java

Running Static Analysis

From the preprocessing output file run static analysis on each of those files. The mechanism to get to this work with each build system and tool will vary.

If you are using Gradle as your build system, the goal is to change the include and exclude set of each gradle task to only be the files changed. I cannot do a comprehensive review because each project’s setup will introduce new challenges. I will give a few pointers.

Some pointers for getting it to work with:

In our build.gradle we have defined new tasks that extend each of the tool tasks, and we have included a helper function:

task checkstyleChangedFilesOnly(type: Checkstyle) {
// ours is empty here because we dynamically apply rules to our
// checkstyle, such as the rule configuration file's location. 
// However, It is far easier to set it in xml because some
// properties are not dynamic
}// helper functionprivate void addSourceForJavaClasses(String taskName, Task theTask) {
    if (taskName.contains("ChangedFilesOnly".toString())) {
        theTask.setProperty("source", getFilesChanged())
    } else {
        theTask.setProperty("source", fileTree('src'))
    }
}

You will have to configure the reports to output in a file format that you can parse. I recommend setting them to output .XML because it is easier to parse
To get FindBugs to run on only the files changed, we use the following snippet:

tasks.whenTaskAdded { theTask ->
    def taskName = theTask.name.toString()
    if (taskName.contains("ChangedFilesOnly".toString())) {
        theTask.setProperty("classes", fileTree(dir:  
        "${buildDir}/intermediates/classes/", includes: getIntermediates()))
    } else {
        theTask.setProperty("classes",    
        fileTree("${buildDir}/intermediates/classes/"))
    }
}

Caveats: There may be files that have more static analysis errors than lines of code, and for these I recommend you create a black list file. Files in this blacklist will not cause a build to fail.

Parsing the output

Refer to the tools documentation to get the format of the output file. Then using any programming language take the output file and parse out the errors. For example for Checkstyle we parse the xml in python with the following snippet:

def runAnalysis(self, xmlFile):
    # parse through the xml
    e = xml.etree.ElementTree.parse(xmlFile).getroot()
    violations = 0
    for atype in e.iter('error'):
        violations += 1
        print atype

    print "Violations: " + str(violation)
    return violations

Publishing the result

For publishing we use Github’s status to annotate the pull request. Refer to the following documentation to see how.

What a successful Pull Request output looks like for our Android team. Note: We are not concerned as much about style

Conclusion

This is a very brief overview of how Tile’s Android team helps automate code quality through the use of static analysis. I briefly went over how to enforce no new errors, but it will take some trial and error to get it working. In my opinion, getting the build tools to only run on your files changed file will take the longest.