Highwheel-Modules: Filling the Gap (Part 2)

In my previous post, I described programming languages as tools that translate low-level abstractions into machine-executable code, and architecture diagrams as tools that turn concepts into high-level abstractions.

I claimed that there is a gap between high-level and low-level abstractions that no tool has addressed yet. Highwheel-Modules is a personal project that I’ve written to fill this gap and to automate the verification of the structural properties of software via static code analysis.

In this post, I will present the tool from its conception to its internal details.

Genesis

Highwheel-Modules started as a fork of Highwheel by our very own Henry Coles. Quoting the description of the project:

“Highwheel is a tool to detect and visualise class and package cycles in Java code”.

The idea behind the project is simple: given a collection of compiled byte-code, produce an output that depicts the complete dependency graph of the collection. Represent classes as vertices and dependencies as edges, highlighting cycles whenever present.
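
To make the idea concrete, here is a minimal sketch of cycle detection on a dependency graph, written in Python purely for brevity. It is a generic depth-first search and not Highwheel's actual algorithm; the class names and the `find_cycle` helper are hypothetical and only serve the illustration.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of vertices, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    colour = {vertex: WHITE for vertex in graph}
    path: List[str] = []

    def visit(vertex: str) -> Optional[List[str]]:
        colour[vertex] = GREY
        path.append(vertex)
        for dependency in graph.get(vertex, []):
            state = colour.get(dependency, WHITE)
            if state == GREY:
                # The dependency is already on the current path: a cycle is closed.
                return path[path.index(dependency):] + [dependency]
            if state == WHITE:
                found = visit(dependency)
                if found:
                    return found
        path.pop()
        colour[vertex] = BLACK
        return None

    for vertex in graph:
        if colour[vertex] == WHITE:
            found = visit(vertex)
            if found:
                return found
    return None


# Hypothetical classes: vertices are classes, edges are "depends on".
cyclic = {
    "Order": ["Customer"],
    "Customer": ["Invoice"],
    "Invoice": ["Order"],
    "Report": ["Order"],
}
acyclic = {"A": ["B"], "B": []}

print(find_cycle(cyclic))
print(find_cycle(acyclic))
```

The sketch reports only the first cycle it meets; a visualisation tool would typically want all of them.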

I highly recommend you try it out because it’s a remarkable tool. It enables you to have a snapshot of your projects and to identify structural properties that would be otherwise very difficult to find.

The really ingenious part of Highwheel is that, instead of relying on a direct interpretation of the source code, it uses the byte-code as the source of truth and the ASM byte-code manipulation framework to build the dependency graph through a visitor pattern. These features are important for two reasons:

  1. The project works seamlessly on byte-code generated from any language that compiles to it, hence all the major JVM languages like Java, Scala, Groovy and Kotlin are covered.
  2. Using byte-code as the source of truth makes the analyser very precise and allows it to find dependencies that are not explicitly declared with import statements in the preamble of the source code. For example, if a class uses another class that lives in the same package, Highwheel is able to detect the usage because the fully qualified class name is always recorded in the byte-code.

Figure 1. Example of dependency graph between classes

While useful, the representation of the dependency graph at the class level is not ideal because it is at a very low level of abstraction. Big projects with thousands of classes cannot be depicted with Highwheel in meaningful ways due to the huge amount of information that the graph contains.

However, the abstraction issue is reasonably simple to address: packages can be used as nodes instead of classes, and the edges between packages can be computed from the dependencies between the classes they contain.

Figure 2. Example of dependency graph between packages

Assuming that classes A, B, C, D from Figure 1 are in package x, class E is in package y and classes F, G, H are in package z, the dependency graph between packages x, y, z is the one depicted in Figure 2.

The package dependency graph building rules are straightforward:

  1. Create one node for every package of interest.
  2. Ignore every dependency between two classes X, Y which are in the same package.
  3. Given two packages foo and bar, if there is a class X in foo that has a dependency towards a class Y in bar, then create an edge from foo to bar.

In the example, the dependencies A -> C, B -> C and D -> C in package x are discarded because they don’t cross the package boundary. The dependency C -> E produces an edge from package x to package y because it crosses the boundary of package x, and so on for the remaining dependencies.
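
The three rules can be sketched in a few lines of Python. This is an illustration of the rules only, not the tool's code: the `package_graph` helper is hypothetical, and the edge list contains just the class dependencies named in this post (A -> C, B -> C, D -> C, C -> E and E -> G) rather than the complete content of Figure 1.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def package_graph(class_edges: Iterable[Edge],
                  package_of: Dict[str, str]) -> Tuple[Set[str], Set[Edge]]:
    """Collapse a class-level dependency graph into a package-level one."""
    # Rule 1: one node for every package of interest.
    nodes = set(package_of.values())
    edges: Set[Edge] = set()
    for source, target in class_edges:
        source_package = package_of[source]
        target_package = package_of[target]
        # Rule 2: ignore dependencies that stay inside one package.
        if source_package == target_package:
            continue
        # Rule 3: a dependency crossing the boundary yields a package edge.
        edges.add((source_package, target_package))
    return nodes, edges


# Class-to-package assignment taken from the example in the text.
package_of = {
    "A": "x", "B": "x", "C": "x", "D": "x",
    "E": "y",
    "F": "z", "G": "z", "H": "z",
}
# Only the class dependencies mentioned in the post (assumed subset of Figure 1).
class_edges = [("A", "C"), ("B", "C"), ("D", "C"), ("C", "E"), ("E", "G")]

nodes, edges = package_graph(class_edges, package_of)
print(sorted(nodes))
print(sorted(edges))
```

Using a set for the edges means that many class dependencies between the same two packages still produce a single package edge.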


Highwheel-Modules

Highwheel-Modules is nothing more than the implementation of the idea described above using the core Highwheel algorithms to carry out the analysis.

There are a few key differences between Highwheel and Highwheel-Modules:

  1. The focus of the project is not visualisation but verification. Highwheel-Modules consumes a module specification and tries to match it against the real dependency graph of a project.
  2. The definition of the modules of interest needs to be provided as part of the specification.
  3. A formal language is used to build the specification to be fed to the analysis.

An example specification can be taken directly from the Highwheel-Modules code-base itself:

modules: 
Utils = "com.github.fburato.highwheelmodules.utils.*"
Core = "com.github.fburato.highwheelmodules.core.*"
Cli = "com.github.fburato.highwheelmodules.cli.*"
MavenPlugin = "com.github.fburato.highwheelmodules.maven.*"
rules:
MavenPlugin -> Core
Cli -> Core
Core -> Utils
MavenPlugin -> Utils
Cli -> Utils

As you can see, every Highwheel-Modules specification is made up of two sections: the modules section and the rules section.

The modules section defines which packages constitute each module. Every module is named with a symbolic identifier and is associated with a list of patterns in glob syntax that are used to match fully qualified class names.

The rules section describes which dependencies between the modules defined in the modules section are allowed (written with ->) and which are disallowed (written with -/->).
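
As a rough illustration of how such a file could be read, the Python sketch below parses the two sections and matches class names against the module patterns. It is not the tool's parser: `parse_spec` and `modules_of` are hypothetical helpers, the grammar is assumed to be exactly the one-definition-per-line layout shown above, and the glob semantics are those of Python's `fnmatchcase`, which may differ from the semantics Highwheel-Modules applies.

```python
import re
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

Rule = Tuple[str, str, bool]  # (source module, target module, allowed?)

SPEC = '''\
modules:
Utils = "com.github.fburato.highwheelmodules.utils.*"
Core = "com.github.fburato.highwheelmodules.core.*"
Cli = "com.github.fburato.highwheelmodules.cli.*"
MavenPlugin = "com.github.fburato.highwheelmodules.maven.*"
rules:
MavenPlugin -> Core
Cli -> Core
Core -> Utils
MavenPlugin -> Utils
Cli -> Utils
'''


def parse_spec(text: str) -> Tuple[Dict[str, List[str]], List[Rule]]:
    """Parse the simplified layout assumed here: a modules section, then rules."""
    modules: Dict[str, List[str]] = {}
    rules: List[Rule] = []
    section = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line in ("modules:", "rules:"):
            section = line[:-1]
        elif section == "modules":
            name, patterns = line.split("=", 1)
            # A module may list several quoted patterns separated by commas.
            modules[name.strip()] = re.findall(r'"([^"]*)"', patterns)
        elif section == "rules":
            allowed = "-/->" not in line
            source, target = re.split(r"\s*-(?:/-)?>\s*", line)
            rules.append((source, target, allowed))
        else:
            raise ValueError(f"line outside of any section: {line!r}")
    return modules, rules


def modules_of(class_name: str, modules: Dict[str, List[str]]) -> List[str]:
    """Names of the modules whose patterns match a fully qualified class name."""
    return [name for name, patterns in modules.items()
            if any(fnmatchcase(class_name, pattern) for pattern in patterns)]


modules, rules = parse_spec(SPEC)
print(rules[0])
print(modules_of("com.github.fburato.highwheelmodules.core.AnalyserFacade", modules))
print(modules_of("java.lang.String", modules))
```

Classes that match no module, such as `java.lang.String` here, simply fall outside the specification.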

Fractal design

The specification language allows the dependencies between modules to be defined at any granularity (down to individual classes). For example, the specification above describes the entire Highwheel-Modules project, but every module can be further broken up and specified at a lower level of abstraction.

The following is the specification for the “Core” module of Highwheel-Modules:

modules:
Algorithms = "com.github.fburato.highwheelmodules.core.algorithms.*"
ExternalAdapters = "com.github.fburato.highwheelmodules.core.externaladapters.*"
Model = "com.github.fburato.highwheelmodules.core.model.*"
Specification = "com.github.fburato.highwheelmodules.core.specification.*"
ModuleAnalyser = "com.github.fburato.highwheelmodules.core.ModuleAnalyser", "com.github.fburato.highwheelmodules.core.AnalyserException", "com.github.fburato.highwheelmodules.core.AnalyserModel"
Facade = "com.github.fburato.highwheelmodules.core.AnalyserFacade"
rules:
Algorithms -> Model
ExternalAdapters -> Model
Specification -> Model
ModuleAnalyser -> Algorithms
ModuleAnalyser -> Model
ModuleAnalyser -> ExternalAdapters
Facade -> ModuleAnalyser
Facade -> Model
Facade -/-> Algorithms
Facade -> Specification

Every sub-module in a multi-module project built with Maven can have its own specification at a lower level of abstraction. The parent project should instead use the highest level of abstraction to connect the individual sub-modules together.

The Toolkit

Highwheel-Modules comes in three different flavours:

  • A command line tool: accepts as arguments a list of directories or JARs and runs the analysis on all *.class files available in the directories and JARs. By default it attempts to read a file named spec.hwm in the current working directory, but a path to a different file containing the specification can be passed as an option.
  • A Maven Plugin: runs the analysis on all *.class files available in the target/classes directory of the project using the spec.hwm file in the base directory as specification.
  • An SBT Plugin: part of a separate project but based on the core module of the main Java project, runs the analysis on all *.class files available in the build/classes directory of the project using the spec.hwm file in the base directory as specification.

The plugins and the command line tool are all highly configurable with sensible defaults, and I recommend you check the project’s README to see what the additional capabilities are.

However, there are some key features that I think already make them useful for production applications:

  1. The Maven and SBT plugins are designed to fail the build when the analysis identifies an unexpected dependency that violates the specification. This feature makes it possible to run the analysis as part of your regular build and have immediate feedback on structural errors in the dependencies between modules.
  2. The command line tool returns a non-zero exit code when the analysis fails, giving it semantics similar to the plugins.
  3. Both the plugins and the command line tool output architectural measurements such as the fan-in and fan-out of the modules. These metrics can be used to reason about the stability and abstractness of modules.
  4. When a specification violation is detected, all the tools output the real path between classes that makes the analysis fail. For example, if one of the rules in the specification for the example in Figure 2 is x -/-> z, then the analysis fails. The tools report that x -> y -> z and provide examples for the dependency x -> y (in this case C -> E) and for the dependency y -> z (in this case E -> G).
  5. The analysis considers dependencies outside the scope of the specification in order to identify unknown transitive dependencies. This means that if the specification covers only modules A and B and requires that A -/-> B, and there is a class C that is not part of A or B such that A -> C -> B, the transitive dependency between A and B will still be detected and the analysis will fail (thanks to Henry Coles for the feature suggestion).
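
Points 4 and 5 can be sketched as a reachability check on the dependency graph, shown here in Python. This is one simple way to obtain the behaviour described and not necessarily how Highwheel-Modules implements it: `find_path` and `forbidden_violations` are hypothetical helpers, the search is a plain breadth-first search, and elements outside the specification are assumed to stay in the graph as ordinary nodes.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def find_path(graph: Dict[str, List[str]],
              source: str, target: str) -> Optional[List[str]]:
    """Shortest dependency path from source to a different target, or None."""
    parents: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for dependency in graph.get(node, []):
            if dependency in parents:
                continue
            parents[dependency] = node
            if dependency == target:
                # Walk back through the parents to rebuild the path.
                path = [dependency]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(dependency)
    return None


def forbidden_violations(graph: Dict[str, List[str]],
                         forbidden: List[Tuple[str, str]]) -> List[str]:
    """Describe every 'source -/-> target' rule broken by a direct or transitive path."""
    violations = []
    for source, target in forbidden:
        path = find_path(graph, source, target)
        if path is not None:
            violations.append(" -> ".join(path))
    return violations


# Package graph of Figure 2 checked against the rule x -/-> z.
figure_2 = {"x": ["y"], "y": ["z"], "z": []}
print(forbidden_violations(figure_2, [("x", "z")]))

# C is outside the specification, yet it still connects A to B.
out_of_scope = {"A": ["C"], "C": ["B"], "B": []}
print(forbidden_violations(out_of_scope, [("A", "B")]))
```

A command line wrapper around such a check could then exit with a non-zero status whenever the list of violations is not empty, which matches the behaviour described in point 2.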

Conclusions

I think that Highwheel-Modules can help software architects to define the structural properties of software systems and have them verified as part of the normal build process of every byte-code compiled project.

As far as I’m aware, it is the first tool of its kind. It has the potential to make apparent and concrete design decisions that until this point have only been confined to diagrams and never verified in the code.

The decision to make the specification part of the build and of the code-base is not accidental: I would like the spec.hwm files of projects to be treated as code, and as such they should be subject to code review, discussion and modification as projects evolve.

Specifications should never be considered the ultimate and definitive decisions imposed by the architects; they should be living documents that are corrected whenever necessary and that provide a high-level representation of the system.

At NCR Edinburgh, we are already using it as part of our builds and I would love to receive feedback about the tool usage and contributions on GitHub.

Until then, I’ll go back to my ivory tower.