Measuring Modularity

Creating Objective Measures of Modularity in Python

Published in

Ro Engineering Blog

Aug 6, 2020

I’ve spent most of my career in monolithic codebases written in C, C++, Python, and JavaScript. None of them is safe from our temptation to paint outside the lines by mixing disparate areas of code.

When working in a monolith, it is essential to maintain modularity. Keeping modules small and well-isolated, with clearly defined APIs, reduces the chance of introducing bugs and shortens development time. A fully modular, or polylithic, project combines the flexibility of modular design with the transactions and synchronous execution inherent in monolithic development. Let’s look at how we might structure and validate such a polylith in Python.

Defining Modularity

Modularity is a measure of interdependence within modules and independence between modules. Measuring it comes down to counting the dependencies between files, both inside each module and across module boundaries.

It is vital that modularity be objectively measurable. One way to achieve this is to treat modularity as the inverse of invalid interdependency, i.e., dependencies between modules that bypass the defined interfaces. We can then represent modularity as a number between 0 and 1:

modularity = valid_dependencies / total_dependencies

For example, let us say that Module A depends on Module B. There are several valid dependencies in which Module A is calling interface functions of Module B. However, there is one invalid dependency in which Module A calls the internal logic of Module B directly, bypassing that module’s interface.
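The formula is simple enough to compute directly. The sketch below is a minimal illustration; the function name `modularity` and the counts (four interface calls, one core call) are hypothetical, and scoring a project with zero dependencies as 1.0 is an assumption, since the formula is undefined there.

```python
def modularity(valid_dependencies: int, total_dependencies: int) -> float:
    """Return the share of dependencies that respect module interfaces (0.0 to 1.0)."""
    if not 0 <= valid_dependencies <= total_dependencies:
        raise ValueError("need 0 <= valid_dependencies <= total_dependencies")
    if total_dependencies == 0:
        # Assumption: with no dependencies there is nothing invalid, so score it 1.0.
        return 1.0
    return valid_dependencies / total_dependencies


# Hypothetical counts for the Module A -> Module B example:
# four calls go through B's interface, one reaches into B's core.
valid = 4
invalid = 1
score = modularity(valid, valid + invalid)
print(f"modularity = {score:.2f}")  # modularity = 0.80
```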

The more invalid interdependencies, the lower the modularity score. To achieve a polylithic structure, we need to detect and remove these invalid dependencies.

Modularity and Python

In the simplest structure, modules — as we define them here — have at least two parts: an interface and a core. The interface should be the only file depended on by any file outside of the module.

In this structure, an invalid dependency would look like this:

module_b/core.py

# Module B core logic - do not access outside Module B
def module_b_core_function_one():
    pass

def module_b_core_function_two():
    pass

module_b/interface.py

# Module B public interface
from module_b.core import module_b_core_function_one
from module_b.core import module_b_core_function_two

def do_some_work():
    module_b_core_function_one()
    module_b_core_function_two()

module_a/logic.py

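# Import the submodules explicitly; "import module_b" alone does not load them
import module_b.core
import module_b.interface

module_b.interface.do_some_work()  # ok, it uses the interface
module_b.core.module_b_core_function_one()  # interface violation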
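Python itself does nothing to stop this. As a quick check, the sketch below writes the three example files into a temporary directory and imports `module_a.logic`; both calls, including the one into `module_b.core`, run without complaint. It assumes Python 3.3+ namespace packages (the example has no `__init__.py` files), and the `FILES` dictionary is just an illustrative container for the snippets above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The three example files, written into a throwaway directory.
FILES = {
    "module_b/core.py": (
        "def module_b_core_function_one():\n    pass\n\n"
        "def module_b_core_function_two():\n    pass\n"
    ),
    "module_b/interface.py": (
        "from module_b.core import module_b_core_function_one\n"
        "from module_b.core import module_b_core_function_two\n\n"
        "def do_some_work():\n"
        "    module_b_core_function_one()\n"
        "    module_b_core_function_two()\n"
    ),
    "module_a/logic.py": (
        "import module_b.core\n"
        "import module_b.interface\n\n"
        "module_b.interface.do_some_work()\n"
        "module_b.core.module_b_core_function_one()  # interface violation\n"
    ),
}

with tempfile.TemporaryDirectory() as root:
    for rel_path, source in FILES.items():
        path = Path(root, rel_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        # No __init__.py files: module_a and module_b load as namespace packages.
        importlib.import_module("module_a.logic")
    finally:
        sys.path.remove(root)

print("module_a.logic imported; nothing objected to the core access")
```

Because the language will not enforce the boundary, it has to be checked by tooling instead.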

Detecting Invalid Dependencies

Invalid interdependencies can be detected using static analysis (parsing and evaluating code without running it). Using a tool like SciTools Understand, we can parse and extract entities and relationships. I will use Understand in this example, but anything that can parse and analyze Python should work. Detecting dependencies happens in four steps:

1. Extract a list of all files used in the system.
2. Extract a list of which entities (classes, functions, vars, etc.) are “owned” by which files.
3. Use entity relationships to infer file relationships.
4. Validate that any relationships between modules use only interface files.
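To see why static analysis is enough to catch the violation, the sketch below uses Python's built-in `ast` module instead of Understand. It parses the `module_a/logic.py` example as a string, so nothing is imported or executed, and lists the dotted names of every call. The helper `dotted_name` is hypothetical, and the rule that flags the violation is hard-coded for this one example rather than driven by a module definition.

```python
import ast
from typing import Optional

# The module_a/logic.py example held as a string: ast.parse never imports or runs it.
LOGIC_SOURCE = """
import module_b.core
import module_b.interface

module_b.interface.do_some_work()  # ok, it uses the interface
module_b.core.module_b_core_function_one()  # interface violation
"""


def dotted_name(node: ast.AST) -> Optional[str]:
    """Rebuild an attribute chain such as module_b.core.f into 'module_b.core.f'."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. a call on a subscript or on another call's result


tree = ast.parse(LOGIC_SOURCE)
called = [dotted_name(node.func) for node in ast.walk(tree) if isinstance(node, ast.Call)]

# Hard-coded rule for this example only: module_b is reachable only via its interface.
violations = [
    name for name in called
    if name and name.startswith("module_b.") and not name.startswith("module_b.interface.")
]
print(violations)  # ['module_b.core.module_b_core_function_one']
```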

Step One — Extract Files

Everything is an entity to Understand. Entities can be classes, files, functions, variables, and more. For this step, we need to extract every file entity in the system into a list.
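Without Understand, a rough stand-in for this step is to treat every `.py` file under a project root as a file entity. The sketch below does that with `pathlib`; the function name `extract_files` is hypothetical, and it deliberately ignores anything Understand would add (such as non-Python files or generated sources).

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def extract_files(root: str | Path) -> list[str]:
    """Return every .py file under root as a sorted, '/'-separated path relative to root."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py") if p.is_file())


# Usage: build a tiny tree matching the example layout and list it.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("module_a/logic.py", "module_b/core.py", "module_b/interface.py"):
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    files = extract_files(tmp)

print(files)  # ['module_a/logic.py', 'module_b/core.py', 'module_b/interface.py']
```

Relative, forward-slash paths keep the output stable across operating systems and match the patterns used later in the module definitions.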

Step Two — Extract Ownership

Understand does not have a clean way to identify which functions, classes, etc. are defined inside a particular file; in some languages, a single definition can even span multiple files (for example, partial classes in C#). For Python, we can instead look at the references between file entities and other entities whose kind is “Declare” or “Define”. Those references tell us which file owns which entities.
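In pure Python, ownership can be approximated from each file's top-level definitions. The sketch below swaps Understand's “Define” references for `ast`: it maps the qualified name of every top-level function, class, and simple assignment to the file that defines it. The helpers `module_name` and `extract_ownership` are hypothetical, the sources are passed in as a dictionary to keep the example self-contained, and nested definitions and dynamically created names are not tracked.

```python
from __future__ import annotations

import ast


def module_name(rel_path: str) -> str:
    """'module_b/core.py' -> 'module_b.core'; a package's __init__.py maps to the package."""
    name = rel_path[: -len(".py")].replace("/", ".")
    return name[: -len(".__init__")] if name.endswith(".__init__") else name


def extract_ownership(sources: dict[str, str]) -> dict[str, str]:
    """Map each top-level definition's qualified name to the file that defines it."""
    owner: dict[str, str] = {}
    for rel_path, source in sources.items():
        prefix = module_name(rel_path)
        for node in ast.parse(source).body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                owner[f"{prefix}.{node.name}"] = rel_path
            elif isinstance(node, (ast.Assign, ast.AnnAssign)):
                targets = node.targets if isinstance(node, ast.Assign) else [node.target]
                for target in targets:
                    if isinstance(target, ast.Name):  # skip tuple and attribute targets
                        owner[f"{prefix}.{target.id}"] = rel_path
    return owner


sources = {
    "module_b/core.py": "def module_b_core_function_one():\n    pass\n\nLIMIT = 3\n",
    "module_b/interface.py": "def do_some_work():\n    pass\n",
}
owner = extract_ownership(sources)
print(owner)
```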

Step Three — Get File Relationships

Now that we have relationships between entities (refs) and file entity ownership, we can infer which files are related to other files by examining how the entities are used across them. Note that for this step, we ignore “import” references and only look at real usage.
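A stdlib approximation of this step, assuming an ownership map like the one from Step Two: import statements are used only to resolve what a local name refers to, while the dependencies themselves come from names and attribute chains that the code actually reads. The function `file_dependencies` is hypothetical, relative imports are skipped, and names re-exported through another module are not followed.

```python
from __future__ import annotations

import ast
from typing import Optional


def dotted_name(node: ast.AST) -> Optional[str]:
    """Turn a Name/Attribute chain such as module_b.core.f into 'module_b.core.f'."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def file_dependencies(rel_path: str, source: str, owner: dict[str, str]) -> set[tuple[str, str]]:
    """Return (rel_path, other_file) pairs for every owned entity the file actually uses."""
    tree = ast.parse(source)

    # Imports only resolve local names; they are not counted as dependencies.
    aliases: dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    aliases[alias.asname] = alias.name  # import x.y as z: z means x.y
                else:
                    top = alias.name.split(".")[0]
                    aliases[top] = top  # import x.y binds the name x
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            for alias in node.names:
                aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"

    deps: set[tuple[str, str]] = set()
    for node in ast.walk(tree):
        # Only names that are read count as real usage.
        if isinstance(node, (ast.Name, ast.Attribute)) and isinstance(node.ctx, ast.Load):
            name = dotted_name(node)
            if name is None:
                continue
            head, _, rest = name.partition(".")
            if head not in aliases:
                continue
            qualified = aliases[head] + (f".{rest}" if rest else "")
            target = owner.get(qualified)
            if target is not None and target != rel_path:
                deps.add((rel_path, target))
    return deps


owner = {
    "module_b.core.module_b_core_function_one": "module_b/core.py",
    "module_b.core.module_b_core_function_two": "module_b/core.py",
    "module_b.interface.do_some_work": "module_b/interface.py",
}
logic = (
    "import module_b.core\n"
    "import module_b.interface\n"
    "module_b.interface.do_some_work()\n"
    "module_b.core.module_b_core_function_one()\n"
)
logic_deps = file_dependencies("module_a/logic.py", logic, owner)
print(sorted(logic_deps))
```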

A quick-and-dirty alternative is to look only at import statements. However, a file can import something it never uses, and it can use something it never imports directly (such as a function reached through an imported module but defined in another module entirely).
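For comparison, the import-only shortcut takes just a few lines with `ast`, and it shows the first weakness immediately: an import that is never used is still reported as a dependency. The function `imported_modules` is hypothetical, and relative imports are skipped for simplicity.

```python
from __future__ import annotations

import ast


def imported_modules(source: str) -> set[str]:
    """Quick and dirty: collect the modules a file imports, whether or not it uses them."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module)
    return found


source = """
import module_b.core        # imported but never used
import module_b.interface

module_b.interface.do_some_work()
"""
print(sorted(imported_modules(source)))  # ['module_b.core', 'module_b.interface']
```

Here `module_b.core` shows up even though the file never touches it, which is exactly the false positive that usage-based analysis avoids.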

Step Four — Validate Module Definitions

In the final step, we validate that the relationships between files match the defined interfaces. To do this, we need to describe which files belong to each module and which of those files are its interfaces. Any format works (YAML, JSON, etc.), but I will use JSON here as an example:

[{
    "name": "module_a",
    "files": ["module_a/*"],
    "interfaces": ["module_a/interface.py"]
},
{
    "name": "module_b",
    "files": ["module_b/*"],
    "interfaces": ["module_b/interface.py"]
}]

The above defines two modules. For each module, we check that every reference into it from a file outside the module targets only the files marked as interfaces. Any violations can then be listed, and Understand can also report the exact file path and line number of each one.
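The check can be sketched with the standard `json` and `fnmatch` modules, assuming file-to-file dependencies like those produced in Step Three. The function names `owning_module` and `validate` are hypothetical. A few assumptions are baked in: `fnmatchcase`'s `*` also matches `/`, so `module_b/*` covers nested files; a file that belongs to no module is still held to a module's interfaces when it reaches into that module; and the modularity score counts every file dependency, treating those within a module as valid.

```python
from __future__ import annotations

import json
from fnmatch import fnmatchcase

# The module definitions from above, as valid JSON.
CONFIG = """
[{"name": "module_a", "files": ["module_a/*"], "interfaces": ["module_a/interface.py"]},
 {"name": "module_b", "files": ["module_b/*"], "interfaces": ["module_b/interface.py"]}]
"""


def owning_module(path: str, modules: list[dict]) -> dict | None:
    """Return the first module whose file patterns match path, or None."""
    for module in modules:
        if any(fnmatchcase(path, pattern) for pattern in module["files"]):
            return module
    return None


def validate(dependencies: set[tuple[str, str]], modules: list[dict]) -> tuple[list[tuple[str, str]], float]:
    """Return (violations, modularity) for a set of (source_file, target_file) pairs."""
    violations = []
    for source, target in sorted(dependencies):
        source_module = owning_module(source, modules)
        target_module = owning_module(target, modules)
        # A dependency crosses a boundary when its target belongs to a different module.
        crosses = target_module is not None and source_module is not target_module
        if crosses and not any(fnmatchcase(target, p) for p in target_module["interfaces"]):
            violations.append((source, target))
    total = len(dependencies)
    score = 1.0 if total == 0 else (total - len(violations)) / total
    return violations, score


modules = json.loads(CONFIG)
deps = {
    ("module_a/logic.py", "module_b/interface.py"),
    ("module_a/logic.py", "module_b/core.py"),
    ("module_b/interface.py", "module_b/core.py"),
}
violations, score = validate(deps, modules)
print(violations)  # [('module_a/logic.py', 'module_b/core.py')]
print(f"modularity = {score:.2f}")  # modularity = 0.67
```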

Conclusion

When we think of clean code, we often focus on entities inside a single file: classes, functions, etc. In monolithic development, it is essential to also look at the overall hidden structure of the code and how files relate to each other. By measuring modularity and identifying interface violations automatically, we can ensure that our code stays in well-isolated and easily digestible modules.