Cohesion in software: Two perspectives

Nikola Luburić
Published in Clean CaDET
7 min read · Feb 22, 2021

Cohesion in software is an interesting concept that helps us write better code. Highly cohesive modules help us create designs that are more focused and easier to maintain.

Before we begin, let’s define some basic terminology to avoid confusion. We’ll use the term module as a generic term to encompass software constructs that encapsulate some logic — this includes functions, classes, packages, assemblies... We’ll use the term elements of a module to define its parts — a class consists of members (e.g., fields, methods), while a package consists of classes. Now, on with the show.

Cohesion determines the degree to which a part of a codebase forms a meaningful atomic module.

The elements of a highly cohesive module work together towards a common, well-defined goal and have a clear (single) responsibility. This responsibility is defined by the module’s name and described by its interface, which sets its inputs and outputs. For a function, these are the parameters and return values. For a cohesive package, this would be the coordinator class that presents an entry point for the package.

Let’s look at a few classes to determine if they have a well-defined goal.

The left class is playing with a few responsibilities, more than its name suggests anyway…

It’s worth looking at the field distribution in the previous classes. In this article, we will be looking at two categories of cohesion, where the left PDFReporter violates both of them. We will first look at the easier-to-define structural cohesion and then move on to the more vague semantic cohesion.

Structural cohesion is a metric that is calculated based on the number of connections between a module’s elements.

Let’s look at some code to gain an intuitive understanding of this metric.

We could argue that the StoreApplication class has a well-defined goal. It provides most of the logic required to support a very simple digital inventory — the methods work towards this high-level goal. However, if the features required of this application expand, the StoreApplication will quickly become a God Class [1].

We can utilize structural cohesion to discover suitable ways of refactoring our class and separating its lower-level responsibilities. One type of connection between a class’s elements appears as access to a field from a method. Looking at the code, we see that the 1st method accesses the 1st field, the 2nd method accesses all fields, while the 3rd and 4th methods access the 3rd field. Without knowing the semantics, we can propose a candidate refactoring solution. The 1st and 2nd fields can be grouped with the 1st and 2nd methods into a ProductRepository class. The 3rd field and the 3rd and 4th methods can remain where they are or move to some form of ProductService or maybe ProductCache.
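The class the paragraph above walks through is not reproduced in this text, so here is a minimal Python sketch with the same access pattern. Every name in it (storage_path, delimiter, products, the four methods, and the Product type) is an assumption made for illustration, not the article's original code.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Product:
    name: str
    price: float


class StoreApplication:
    """Hypothetical stand-in for the article's example; all names are assumed."""

    def __init__(self, storage_path: Path, delimiter: str = ";"):
        self.storage_path = storage_path  # 1st field
        self.delimiter = delimiter        # 2nd field
        self.products = []                # 3rd field (a list of Product)

    def storage_exists(self) -> bool:
        # 1st method: accesses only the 1st field.
        return self.storage_path.exists()

    def load_products(self) -> None:
        # 2nd method: accesses all three fields.
        self.products = []
        for line in self.storage_path.read_text().splitlines():
            name, price = line.split(self.delimiter)
            self.products.append(Product(name, float(price)))

    def find_product(self, name: str) -> Optional[Product]:
        # 3rd method: accesses only the 3rd field.
        return next((p for p in self.products if p.name == name), None)

    def total_value(self) -> float:
        # 4th method: accesses only the 3rd field.
        return sum(p.price for p in self.products)
```

In this sketch, storage_path, delimiter, storage_exists, and load_products are the candidates for a ProductRepository, with load_products returning the list instead of writing to a shared field. The products field, find_product, and total_value would stay behind or move to a service or cache class.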

While the above example is simple enough for a manual analysis of the connections, how do we utilize structural cohesion in larger classes? Many formulas exist for calculating a class’s structural cohesion [2]. A common basic formula is illustrated below, along with a simple example.

How does this formula hold for data transfer object classes? What about classes without fields?

A metric like structural cohesion can indicate that a class mixes multiple concerns. For the StoreApplication class, its structural cohesion is 6/12 = 0.5: the four methods make 6 field accesses (1 + 3 + 1 + 1) out of the 12 possible ones (4 methods × 3 fields). This number would decrease with each new method that works with the file system or the Product list. Low cohesion should prompt us to consider Extract Class, Move Method, or similar refactorings that increase the structural cohesion.
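The 6/12 figure can be checked with a few lines of Python. The sketch below assumes the ratio that the example implies, namely method-to-field accesses divided by (methods × fields). The function name and the placeholder method and field names are hypothetical.

```python
def structural_cohesion(accesses, fields):
    """Assumed formula: field accesses / (number of methods * number of fields).

    `accesses` maps each method name to the set of fields it uses.
    """
    if not accesses or not fields:
        raise ValueError("undefined for a class without methods or without fields")
    connections = sum(len(used & fields) for used in accesses.values())
    return connections / (len(accesses) * len(fields))


# Access pattern described for StoreApplication (placeholder names).
FIELDS = {"field_1", "field_2", "field_3"}
ACCESSES = {
    "method_1": {"field_1"},
    "method_2": {"field_1", "field_2", "field_3"},
    "method_3": {"field_3"},
    "method_4": {"field_3"},
}

print(structural_cohesion(ACCESSES, FIELDS))  # 0.5

# A fifth method that only touches the product list lowers the value (7/15).
WITH_EXTRA_METHOD = {**ACCESSES, "method_5": {"field_3"}}
print(structural_cohesion(WITH_EXTRA_METHOD, FIELDS) < 0.5)  # True
```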

Notice that this formula doesn’t consider connections between methods. It also doesn’t apply to classes without fields (i.e., functional classes). These issues stem from how we define connections. For a functional class, structural cohesion can instead define a connection as a method invocation, where one method calls another. This addition does not have to be limited to functional classes.
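The article does not give a formula for invocation-based connections, so the following is only one possible sketch. It assumes cohesion is the share of method pairs in which one method calls the other. The function name and the example call graph are hypothetical.

```python
from itertools import combinations


def invocation_cohesion(calls):
    """Assumed variant: connected method pairs / all method pairs.

    `calls` maps each method name to the set of methods it invokes.
    """
    methods = sorted(calls)
    if len(methods) < 2:
        raise ValueError("need at least two methods to form a pair")
    pairs = list(combinations(methods, 2))
    connected = sum(1 for a, b in pairs if b in calls[a] or a in calls[b])
    return connected / len(pairs)


# Hypothetical functional class: parse() calls two helpers,
# while format_report() is unrelated to the other three methods.
CALLS = {
    "parse": {"tokenize", "validate"},
    "tokenize": set(),
    "validate": set(),
    "format_report": set(),
}

print(invocation_cohesion(CALLS))  # 2 connected pairs out of 6
```

Under this assumption, the isolated format_report method drags the value down, which hints that it may belong in a different module.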

Next, notice that structural cohesion does not apply to classes without methods (i.e., data classes with public fields) and gives a low number for data classes that contain only getters and setters. In both cases the metric says little, because data classes (with or without getters and setters) have no structural connections between their fields.
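To see why getters and setters yield a low number, assume the ratio implied by the 6/12 example (field accesses divided by methods × fields) and take a data class with n fields, each with one getter and one setter. There are 2n methods, and each accesses exactly one field:

```latex
\[
\text{cohesion}
  = \frac{\text{method-to-field accesses}}{\text{methods} \times \text{fields}}
  = \frac{2n}{2n \cdot n}
  = \frac{1}{n}
\]
```

A data class with five fields would score 0.2 under this assumption, and the score shrinks as fields are added, even though nothing is wrong with the class.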

Before moving on to semantic cohesion, calculate the structural cohesion of the following code as an exercise (the solution is 0.75).

Our original definition of cohesion focuses on responsibilities and goals, which are semantic concepts. They are mostly defined by the name and intent we have for a module and its elements.

Semantic cohesion determines the degree to which the elements of a module are semantically related.

Semantic cohesion is neatly explained in [3], though the idea predates the article and is closely related to the Single Responsibility Principle [4]. To illustrate semantic cohesion, examine the following video. Try to mentally map the abstract features to code and determine what the ExpandInventory function looks like when placed in either module.

Looking at the formula defined in [3], we define semantic cohesion as:

It takes some time to wrap your head around it, but it is quite simple.

To test your understanding of the formula, apply it to the following examples against the blue and violet key.

For the answers, check out [3].

The problem with semantics is that they are ambiguous. A lot of the burden is placed on the engineer’s ability to accurately understand and express the meaning in the code. This is further complicated by juxtaposing business concerns (i.e., behavior that exists without the software) and application concerns (i.e., behavior that exists because of the software). Domain-Driven Design [5] studies how to overcome these issues, which are rooted in communication and language.

Much of software design involves the ongoing question “where should this code go?” The goal is to constantly look for the best way to organize code to make it easier to write, easier to understand, and easier to change later. While conventions and patterns can tell us where to place a Controller or Repository class, it is the semantics and meaning of a module that help us create truly cohesive structures.

Through code, we talk with the computer. Often we have many simultaneous conversations. We talk about how to securely transfer information, how data is cached, and what makes a valid entity.

Taking an analogy from [6], we can view a system with low-cohesion modules like a loud party with many different conversations going on at once. It is difficult to focus on one conversation in such a scenario, and words often get misheard. A system consisting of highly cohesive modules is like a well-designed online discussion group. Each area in the online group is focused on one topic, so the discussion is easy to follow. If you are looking for thoughts on a particular subject, there is only one room to visit.

Another long-term issue with low-cohesion modules is their tendency to be adhesive, as illustrated with a dung beetle.

It’s easy to add new stuff to an adhesive module — way easier than considering structural and semantic cohesion.

While most of the article examined the cohesion of classes, it is worth doing some mental gymnastics to map these concepts to other types of modules. For example, highly cohesive methods do one thing, which they describe through their name. Methods with low cohesion have multiple code regions, each often starting with its own set of variables and doing its own thing(s). In the worst case, such regions are intertwined and tightly coupled, making them difficult to understand and refactor. Highly cohesive packages can perform a meaningful part of the business or application logic with little support from the rest of the packages. When they are also loosely coupled to other packages, it should be feasible to replace them, move them to another application, or promote them to a microservice.
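The difference at the method level can be sketched in Python. The order data and all function names below are hypothetical and serve only to contrast one method with several regions against the same logic split into focused functions.

```python
def summarize_orders_low_cohesion(orders):
    # Region 1: its own variable, computes the total revenue.
    total = 0.0
    for order in orders:
        total += order["price"] * order["quantity"]

    # Region 2: its own variable, finds the order with the largest quantity.
    largest = None
    for order in orders:
        if largest is None or order["quantity"] > largest["quantity"]:
            largest = order

    # Region 3: formats the text from both results above.
    name = largest["name"] if largest else "n/a"
    return f"total={total:.2f}; largest={name}"


# The same behavior as focused functions, each doing what its name says.
def total_revenue(orders):
    return sum(order["price"] * order["quantity"] for order in orders)


def largest_order(orders):
    return max(orders, key=lambda order: order["quantity"], default=None)


def format_summary(total, largest):
    name = largest["name"] if largest else "n/a"
    return f"total={total:.2f}; largest={name}"


def summarize_orders(orders):
    return format_summary(total_revenue(orders), largest_order(orders))


ORDERS = [
    {"name": "pen", "price": 1.5, "quantity": 4},
    {"name": "book", "price": 10.0, "quantity": 2},
]
print(summarize_orders(ORDERS))  # total=26.00; largest=pen
```

Each small function can be named, tested, and moved on its own, while the regions of the first version can only be reused by copying them.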

Finally, it’s worth pointing out that cohesion is closely related to concepts such as coupling [3][6], the Single Responsibility Principle [4], and orthogonality [7]. It is worth exploring these concepts to fully understand the value of cohesion.

This article was developed as part of the Clean Code and Design Educational Tool (Clean CaDET) project. Educational content like the one presented here represents an essential pillar of our Smart Tutor module. This content is used to train future software engineers at the Faculty of Technical Sciences, Novi Sad, Serbia. Through posts like this and the Clean CaDET project, we hope to expand our reach to the global community. Therefore, we appreciate all feedback that can help us improve and better train the next generation of software engineers.

[1] https://en.wikipedia.org/wiki/God_object
[2] Izadkhah, H. and Hooshyar, M., 2017. Class cohesion metrics for software engineering: A critical review. Computer Science Journal of Moldova, 73(1), pp.44–74.
[3] https://ttulka.medium.com/how-cohesion-and-coupling-correlate-dd1716ca04fa
[4] Martin, R.C. and Martin, M., 2006. Agile principles, patterns, and practices in C#. Prentice Hall PTR.
[5] Evans, E., 2004. Domain-driven design: tackling complexity in the heart of software. Addison-Wesley Professional.
[6] https://docs.microsoft.com/en-us/archive/msdn-magazine/2008/october/patterns-in-practice-cohesion-and-coupling
[7] Thomas, D. and Hunt, A., 2019. The Pragmatic Programmer: your journey to mastery. Addison-Wesley Professional.

Nikola Luburić, Clean CaDET: a student, researcher, and teacher devoted to software engineering and educational technologies.