Eliminating Technical Debt using Control Flow Graph Analysis

Abhisek Datta
Engineering @ Chargebee
Jun 10, 2022

At Chargebee, we have been using CodeQL for a while for security variant analysis, that is, finding all variants of a given vulnerability. The same approach can, however, also be applied to an important engineering problem: reducing technical debt and eliminating dead code.

Technical Debt Accumulation

Any moderately complex piece of software is an aggregation of the code that we write and the open source libraries that we include in our build. It is a continuous choice between building from scratch and adopting open source. Either way, as the code base evolves, technical debt accumulates in the form of:

  1. Deprecated or unused lines of code
  2. Libraries included in build but no longer used
  3. Dependency on legacy or unmaintained libraries

The complexity increases further if you are using a legacy build system that relies on unmanaged jars scattered across your Git repositories.

Challenges in Dead Code Elimination

Build tools like Gradle or Maven provide out-of-the-box support for identifying dependencies. However, older Ant-based build systems cannot readily use such features. Even modern build tools are not capable of detecting unused code blocks or dependencies, especially in the case of transitive dependencies.

Using CodeQL to Identify Unused External Libraries

CodeQL is the code analysis platform used by security researchers to automate variant analysis.

We are looking at adopting CodeQL for identifying different variants of a vulnerability, whether found internally or reported by external security vendors. As part of the evaluation, we wrote context-specific CodeQL classes modelling our controller class (Java) that can be used to write queries for common vulnerabilities.

We also ran an internal experiment to leverage CodeQL to identify unused libraries in a sample application. The general idea is as follows:

  1. Build a CodeQL database for a sample application. This gives us a Control Flow Graph (CFG) to query.
  2. Write a CodeQL query to identify every cross-package MethodCall, i.e. a call where the caller is defined in com.example.sampleApp and the callee is NOT in the same package. To reduce false positives, we filter out the java.* packages as well.
  3. Create a sorted list of all GAV (group, artifact, version) coordinates based on the existing jar manifests.
  4. Any library (jar) for which we do not have at least one cross-package MethodCall is potentially unused and can be removed.
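
Steps 3 and 4 boil down to a set comparison, which can be sketched in a few lines of Python. This is only a minimal sketch under some assumptions: the function names `packages_in_jar` and `potentially_unused` are hypothetical, jars are identified by file name rather than by GAV, each jar's packages are derived from its `.class` entries, and `called_packages` stands in for the package names exported from the result of the query in step 2.

```python
import zipfile
from pathlib import PurePosixPath


def packages_in_jar(jar_path):
    """Return the set of Java package names that contain a .class file in the jar.

    Simplification: entries under META-INF/ (for example multi-release
    classes) and classes in the default package are ignored.
    """
    packages = set()
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if not name.endswith(".class"):
                continue
            if name.startswith("META-INF/") or "/" not in name:
                continue
            # "org/a/Foo.class" -> "org.a"
            packages.add(str(PurePosixPath(name).parent).replace("/", "."))
    return packages


def potentially_unused(jar_packages, called_packages):
    """Return the sorted jar names for which no package is ever called.

    jar_packages: dict mapping jar name -> set of package names (hypothetical input)
    called_packages: set of callee package names reported by the query
    """
    return sorted(
        jar for jar, pkgs in jar_packages.items()
        if not (pkgs & called_packages)
    )
```

A jar reported by `potentially_unused` is only a candidate for removal: a library that is loaded reflectively, or needed only by another library, would also show up here.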

The example control flow graph (CFG) below visualises the idea: we need to capture the method calls that cross package boundaries.

Figure: An example control flow graph demonstrating a cross-package method call

An example CodeQL query for step 2 would look like this:

Figure: Example CodeQL query for listing external packages
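
To make the filtering rule concrete outside of CodeQL, here is a small Python sketch of the same predicate applied to a list of call edges. It is not the CodeQL query itself: the `(caller_package, callee_package)` edge format and the names `in_package` and `external_packages` are hypothetical, and treating sub-packages of com.example.sampleApp as part of the application is an assumption on our part.

```python
APP_PACKAGE = "com.example.sampleApp"
IGNORED_PREFIXES = ("java.",)  # standard library packages filtered out as noise


def in_package(package, root):
    """True if package is root itself or one of its sub-packages (assumption)."""
    return package == root or package.startswith(root + ".")


def external_packages(call_edges):
    """Return the sorted callee packages reached by cross-package calls.

    call_edges: iterable of (caller_package, callee_package) pairs,
    a hypothetical flattened view of the method calls in the code base.
    """
    external = set()
    for caller_pkg, callee_pkg in call_edges:
        if not in_package(caller_pkg, APP_PACKAGE):
            continue  # the caller must be our own code
        if in_package(callee_pkg, APP_PACKAGE):
            continue  # the call stays inside the application
        if callee_pkg.startswith(IGNORED_PREFIXES):
            continue  # skip java.* to reduce false positives
        external.add(callee_pkg)
    return sorted(external)


edges = [
    ("com.example.sampleApp", "com.example.sampleApp.util"),
    ("com.example.sampleApp.web", "org.apache.commons.lang3"),
    ("com.example.sampleApp", "java.util"),
    ("org.other", "org.json"),
]
print(external_packages(edges))  # ['org.apache.commons.lang3']
```

Note that the last edge is dropped because its caller is not application code, which is exactly why a library used only by another library never appears in the result.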

This approach can also be used to identify unused blocks of code, such as a class, a method or a package, with minimal customisation of the above query.

Challenges and Constraints

The approach presented in this post works in general cases but fails to handle dynamically resolved or transitive dependencies. For example, suppose external-lib-1 depends on external-lib-2. Because our query only looks at calls made from the application's own package, external-lib-2 is never seen as a callee and would be reported as potentially unused. We did not attempt to solve this problem, as we believe an application should only manage its immediate dependencies and let the build tool take care of transitive dependencies. Controlling external dependencies, including transitive ones, for security or quality gate requirements can be implemented using a private repository manager and is outside the scope of this problem.

For comments or feedback, you can get in touch with me over Twitter 😀

If you are interested in our work and want to solve complex problems in SaaS products, platform and cloud infrastructure engineering, we are hiring!
