Understanding Cohesion in Software Engineering

Jul 8, 2024

Cohesion is a crucial concept in software engineering, representing how closely related and focused the responsibilities of a single module or class are. High cohesion within a module or class leads to better maintainability, understandability, and reusability of code. In this blog post, we will delve into what cohesion is, how it can be measured, and why it matters, illustrated with simple and real-world examples.

What is Cohesion?

Cohesion refers to the degree to which the elements inside a module or class belong together. High cohesion means that the elements within a module are highly related and work together to perform a specific task, while low cohesion indicates that the elements have little in common and perform many unrelated tasks.

Why is Cohesion Important?

Maintainability: High cohesion makes code easier to maintain because changes are likely to be localized within a module.
Understandability: When a module performs a single task, it is easier to understand.
Reusability: Modules with high cohesion are often more reusable because they perform a single, well-defined task.

How to Measure Cohesion?

Measuring cohesion can be done both qualitatively and quantitatively. Let’s explore both approaches.

Qualitative Assessment

Expert Judgment: Experienced developers can often assess the cohesion of a module by reviewing its code and responsibilities. If a module has a single responsibility and all its elements are working towards fulfilling that responsibility, it is said to have high cohesion.

Quantitative Metrics

Several metrics can be used to quantify cohesion. Here are some of the most common:

Lack of Cohesion in Methods (LCOM)

LCOM metrics measure the dissimilarity of methods within a class based on their access to instance variables.

LCOM1: Takes the number of method pairs that do not share any field, subtracts the number of method pairs that do share a field, and halves the difference. Negative results are set to 0.

LCOM1 = ((number of method pairs not sharing a field) − (number of method pairs sharing a field)) / 2

A higher LCOM1 value indicates lower cohesion.

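LCOM2: Measures the number of connected components in a graph where methods are nodes and an edge joins two methods that share an attribute. The more disconnected the graph, the lower the cohesion.

LCOM3: Defined as the number of disjoint sets of methods, that is, the connected components of that same graph. The numbering of LCOM variants differs between papers and tools, so check which definition a given tool reports.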

Example of LCOM3 Calculation

Consider a class with four methods: M1, M2, M3, and M4. Suppose M1 and M2 share an attribute, and M3 and M4 share another attribute. This results in two disjoint sets of methods: {M1, M2} and {M3, M4}.

LCOM3 = number of disjoint sets = 2

A high LCOM3 value indicates low cohesion because it implies many disjoint sets of methods within the class.
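The count of disjoint sets can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name lcom3, the input shape (a dict from method name to the set of attributes it uses), and the attribute names a1 and a2 are all assumptions made for the example.

```python
from itertools import combinations


def lcom3(method_fields):
    """Count connected components of the method graph.

    method_fields maps each method name to the set of attributes it uses.
    Two methods are linked when their attribute sets overlap.
    """
    # Union-find parent table: every method starts in its own set.
    parent = {m: m for m in method_fields}

    def find(m):
        # Follow parent links to the representative, shortening the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in combinations(method_fields, 2):
        if method_fields[a] & method_fields[b]:
            parent[find(a)] = find(b)  # merge the two sets

    return len({find(m) for m in method_fields})


# The four-method example; a1 and a2 are hypothetical attribute names.
usage = {
    "M1": {"a1"},
    "M2": {"a1"},
    "M3": {"a2"},
    "M4": {"a2"},
}
print(lcom3(usage))  # 2
```

A result of 1 would mean every method is reachable from every other through shared attributes.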

Example of LCOM1 Calculation

Consider a class with three methods and three instance variables:

class Example:
    def __init__(self):
        self.var1 = 0
        self.var2 = 0
        self.var3 = 0

    def method1(self):
        return self.var1

    def method2(self):
        return self.var1 + self.var2

    def method3(self):
        return self.var2 + self.var3

To calculate LCOM1:

Method pairs that share fields: (method1, method2) and (method2, method3)
Method pairs that do not share fields: (method1, method3)

LCOM1 is calculated as:

LCOM1 = (1−2)/2 = −0.5

Since negative LCOM1 values are adjusted to 0, the final value is 0, which indicates that the class has high cohesion.
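The same pair counting can be automated. The sketch below follows the formula as used in this post (difference of the two pair counts, halved, with negative values set to 0). The function name lcom1 and the dict-of-sets input are assumptions made for illustration.

```python
from itertools import combinations


def lcom1(method_fields):
    """LCOM1 as defined in this post: (not sharing - sharing) / 2, floored at 0."""
    sharing = 0
    not_sharing = 0
    for a, b in combinations(method_fields, 2):
        if method_fields[a] & method_fields[b]:
            sharing += 1      # the pair has at least one field in common
        else:
            not_sharing += 1  # the pair has no field in common
    return max((not_sharing - sharing) / 2, 0)


# Field usage of the Example class above.
example = {
    "method1": {"var1"},
    "method2": {"var1", "var2"},
    "method3": {"var2", "var3"},
}
print(lcom1(example))  # 0
```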

Cohesion of Methods (CoM)

Measures the fraction of all possible method-attribute combinations that actually occur in the class, that is, how many of the class attributes each method uses on average. High CoM values suggest high cohesion.

CoM = (number of method-attribute pairs) / (no. of methods × no. of attributes)

Where:

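Method-attribute pairs: The number of distinct (method, attribute) combinations in which the method accesses or uses the attribute. Repeated accesses to the same attribute within one method count once.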
Number of methods: The total number of methods in the class.
Number of attributes: The total number of attributes in the class.

Example Calculation of CoM

class Example:
    def __init__(self):
        self.var1 = 0
        self.var2 = 0
        self.var3 = 0

    def method1(self):
        return self.var1

    def method2(self):
        return self.var1 + self.var2

    def method3(self):
        return self.var2 + self.var3

In this Example class:

method1 uses var1
method2 uses var1 and var2
method3 uses var2 and var3

Let’s break down the calculation step-by-step:

Count the number of method-attribute pairs:

method1 uses var1 (1 pair)
method2 uses var1 and var2 (2 pairs)
method3 uses var2 and var3 (2 pairs)

Total number of method-attribute pairs = 1 + 2 + 2 = 5

Count the number of methods:

There are 3 methods: method1, method2, method3

Count the number of attributes:

There are 3 attributes: var1, var2, var3

Calculate CoM:

CoM = (number of method-attribute pairs) / (number of methods × number of attributes)

= 5 / (3 × 3) = 5/9 ≈ 0.56

A CoM value of 0.56 indicates a moderate level of cohesion. This means that while the methods in the class are somewhat related, there is still room for improvement in terms of making the class more focused and cohesive.
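The calculation above can be reproduced with a short function. This is a sketch under the assumption that attribute usage is already known and supplied as a dict of sets; the name com and the zero-size guard are choices made for this example.

```python
def com(method_fields, attributes):
    """Fraction of possible (method, attribute) pairs that actually occur."""
    if not method_fields or not attributes:
        return 0.0  # convention chosen here for an empty class
    # Each method contributes one pair per distinct attribute it uses.
    pairs = sum(len(fields & attributes) for fields in method_fields.values())
    return pairs / (len(method_fields) * len(attributes))


attributes = {"var1", "var2", "var3"}
example = {
    "method1": {"var1"},
    "method2": {"var1", "var2"},
    "method3": {"var2", "var3"},
}
print(round(com(example, attributes), 2))  # 0.56
```

A class in which every method uses every attribute would score 1.0 under this definition.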

Interpretation of CoM Values

High CoM Value: Indicates high cohesion, meaning methods frequently share attributes and the class is likely well-focused on a specific task.
Moderate CoM Value: Suggests moderate cohesion. Methods share some attributes, but the class could be further refined for better focus and cohesion.
Low CoM Value: Indicates low cohesion, meaning methods rarely share attributes and the class may be handling too many unrelated tasks.

Why is CoM Important?

CoM is important because it provides a quantitative measure of how well a class or module adheres to the principle of cohesion. High cohesion within a class ensures that the class has a single, well-defined purpose and makes the code easier to maintain, understand, and reuse.

Structural Metrics

Data Dependency: Evaluates how data is shared among methods within a class. High cohesion is indicated by frequent data sharing among methods.
Graph-Based Metrics: Uses graphs where nodes represent methods and edges represent shared data or interactions. Cohesion is measured by the density of connections within the module.
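As one possible reading of "density of connections", the sketch below divides the number of method pairs that share data by the number of all possible method pairs. The function name method_graph_density and the treatment of classes with fewer than two methods are assumptions, not an established definition.

```python
from itertools import combinations


def method_graph_density(method_fields):
    """Edges present in the method graph divided by the maximum possible edges."""
    n = len(method_fields)
    if n < 2:
        return 1.0  # convention chosen here: nothing can be disconnected
    edges = sum(
        1
        for a, b in combinations(method_fields, 2)
        if method_fields[a] & method_fields[b]  # an edge exists when data is shared
    )
    return edges / (n * (n - 1) / 2)


example = {
    "method1": {"var1"},
    "method2": {"var1", "var2"},
    "method3": {"var2", "var3"},
}
print(method_graph_density(example))  # 2 of 3 possible edges are present
```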

Conceptual Metrics

Similarity-Based Measures: Uses natural language processing (NLP) to assess the similarity of names and comments within methods of a module. Higher similarity suggests higher cohesion.
Semantic Analysis: Analyzes the semantics of the code to ensure that methods and variables serve a common purpose.
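Real conceptual metrics rely on proper NLP techniques. As a deliberately crude stand-in, the sketch below compares only the words in method names using Jaccard similarity. The function names, the tokenizing rule, and the "focused" method names are all hypothetical.

```python
import re
from itertools import combinations


def tokens(identifier):
    """Split snake_case or simple camelCase identifiers into lowercase words."""
    return {part.lower() for part in re.findall(r"[A-Za-z][a-z]*", identifier)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def name_similarity(names):
    """Average pairwise Jaccard similarity of the words in the given names."""
    pairs = list(combinations(names, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(tokens(a), tokens(b)) for a, b in pairs) / len(pairs)


mixed = ["process_payment", "ship_order", "notify_customer", "calculate_total_price"]
focused = ["add_item", "remove_item", "find_item"]  # hypothetical method names

print(name_similarity(mixed))    # no words in common between any two names
print(name_similarity(focused))  # every pair shares the word "item"
```

Names alone are a weak signal, so a score like this is best read next to the structural metrics rather than instead of them.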

Tools and Automation

Several tools can automate the calculation of cohesion metrics, providing objective insights into the cohesion of your codebase:

SonarQube: A popular static analysis tool that measures various code quality metrics. Whether cohesion metrics such as LCOM are reported depends on the version and plugins in use.
Understand: A static analysis tool that provides detailed metrics on code quality.
CodeMR: A tool specifically designed for measuring and visualizing code metrics, including cohesion.

Real-World Example: E-commerce Application

Imagine an e-commerce application where we have different modules like Product, Order, and User. We'll focus on the Order module to demonstrate cohesion.

Low Cohesion Example:

In a low cohesion scenario, the Order class might handle too many unrelated tasks, like processing payments, managing shipping, and handling customer notifications.

class Order:
    def __init__(self, order_id, customer, product_list):
        self.order_id = order_id
        self.customer = customer
        self.product_list = product_list
        self.payment_status = "Pending"
        self.shipping_status = "Not Shipped"

    def process_payment(self, payment_info):
        self.payment_status = "Paid"

    def ship_order(self):
        self.shipping_status = "Shipped"

    def notify_customer(self):
        print(f"Order {self.order_id} for {self.customer} is {self.payment_status} and {self.shipping_status}")

    def calculate_total_price(self):
        total_price = sum(product['price'] for product in self.product_list)
        return total_price

Here, the Order class has low cohesion because it handles payment processing, shipping, customer notifications, and price calculations, which are unrelated tasks.

High Cohesion Example:

To improve cohesion, we can split the responsibilities into separate classes:

class Order:
    def __init__(self, order_id, customer, product_list):
        self.order_id = order_id
        self.customer = customer
        self.product_list = product_list

    def calculate_total_price(self):
        total_price = sum(product['price'] for product in self.product_list)
        return total_price

class Payment:
    def __init__(self):
        self.payment_status = "Pending"

    def process_payment(self, payment_info):
        self.payment_status = "Paid"

class Shipping:
    def __init__(self):
        self.shipping_status = "Not Shipped"

    def ship_order(self):
        self.shipping_status = "Shipped"

class Notification:
    @staticmethod
    def notify_customer(order_id, customer, payment_status, shipping_status):
        print(f"Order {order_id} for {customer} is {payment_status} and {shipping_status}")

In this improved version, each class has a single responsibility:

Order handles order details and price calculation.
Payment handles payment processing.
Shipping handles order shipping.
Notification handles customer notifications.
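To show how the separated classes might collaborate, here is a small usage sketch. The classes are repeated so the snippet runs on its own. The fulfil_order function, the sample customer, and the shape of payment_info are hypothetical and not part of the original design.

```python
class Order:
    def __init__(self, order_id, customer, product_list):
        self.order_id = order_id
        self.customer = customer
        self.product_list = product_list

    def calculate_total_price(self):
        return sum(product['price'] for product in self.product_list)


class Payment:
    def __init__(self):
        self.payment_status = "Pending"

    def process_payment(self, payment_info):
        self.payment_status = "Paid"


class Shipping:
    def __init__(self):
        self.shipping_status = "Not Shipped"

    def ship_order(self):
        self.shipping_status = "Shipped"


class Notification:
    @staticmethod
    def notify_customer(order_id, customer, payment_status, shipping_status):
        print(f"Order {order_id} for {customer} is {payment_status} and {shipping_status}")


def fulfil_order(order, payment, shipping, payment_info):
    """Hypothetical coordinator: each step is delegated to the class that owns it."""
    payment.process_payment(payment_info)
    shipping.ship_order()
    Notification.notify_customer(
        order.order_id, order.customer, payment.payment_status, shipping.shipping_status
    )
    return order.calculate_total_price()


order = Order(1, "Alice", [{"price": 10}, {"price": 5}])
payment, shipping = Payment(), Shipping()
total = fulfil_order(order, payment, shipping, payment_info={"method": "card"})
print(total)  # 15
```

The coordinating logic lives outside the four classes, so a change to how payments work touches only Payment.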

Calculating Cohesion Metrics

Example for LCOM (Lack of Cohesion in Methods)

Let’s calculate LCOM1 for the original Order class:

Attributes: order_id, customer, product_list, payment_status, shipping_status
Methods: process_payment, ship_order, notify_customer, calculate_total_price

Pairs of methods sharing attributes:

process_payment and notify_customer share payment_status
ship_order and notify_customer share shipping_status

Pairs of methods not sharing attributes:

process_payment and ship_order
process_payment and calculate_total_price
ship_order and calculate_total_price
notify_customer and calculate_total_price

__init__ is not in the list of methods above, so it is left out of the pairing. Four methods give 6 pairs in total: 2 share an attribute and 4 do not.

LCOM1 = (number of method pairs not sharing a field − number of method pairs sharing a field) / 2

= (4 − 2) / 2 = 1

An LCOM1 value above 0 signals a lack of cohesion: most method pairs in Order have no attribute in common, and calculate_total_price shares nothing with any other method. In more complex scenarios, calculating LCOM can reveal low cohesion more clearly.
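Counting pairs by hand is error-prone, so it can be worth deriving them mechanically. The sketch below is one possible approach using Python's standard ast module. It assumes that a method "uses" a field whenever self.<name> appears in its body and that __init__ is left out of the pairing; the helper names are illustrative.

```python
import ast
import textwrap
from itertools import combinations

# The low-cohesion Order class from this post, kept as text so it can be parsed.
SOURCE = textwrap.dedent('''
    class Order:
        def __init__(self, order_id, customer, product_list):
            self.order_id = order_id
            self.customer = customer
            self.product_list = product_list
            self.payment_status = "Pending"
            self.shipping_status = "Not Shipped"

        def process_payment(self, payment_info):
            self.payment_status = "Paid"

        def ship_order(self):
            self.shipping_status = "Shipped"

        def notify_customer(self):
            print(f"Order {self.order_id} for {self.customer} is {self.payment_status} and {self.shipping_status}")

        def calculate_total_price(self):
            total_price = sum(product['price'] for product in self.product_list)
            return total_price
''')


def fields_used(func):
    """Collect X for every `self.X` found anywhere in a method body."""
    return {
        node.attr
        for node in ast.walk(func)
        if isinstance(node, ast.Attribute)
        and isinstance(node.value, ast.Name)
        and node.value.id == "self"
    }


def method_fields(source):
    """Map each method of the first class in `source` (except __init__) to its fields."""
    cls = ast.parse(source).body[0]
    return {
        item.name: fields_used(item)
        for item in cls.body
        if isinstance(item, ast.FunctionDef) and item.name != "__init__"
    }


usage = method_fields(SOURCE)
all_pairs = list(combinations(usage, 2))
sharing = sum(1 for a, b in all_pairs if usage[a] & usage[b])
not_sharing = len(all_pairs) - sharing
lcom1 = max((not_sharing - sharing) / 2, 0)  # formula as used in this post

print(usage)
print(sharing, not_sharing, lcom1)
```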

Example for CoM (Cohesion of Methods)

For the original Order class:

process_payment uses payment_status
ship_order uses shipping_status
notify_customer uses order_id, customer, payment_status, and shipping_status
calculate_total_price uses product_list

Number of method-attribute pairs:

process_payment: 1
ship_order: 1
notify_customer: 4
calculate_total_price: 1

Total method-attribute pairs: 7

Total methods: 4

Total attributes: 5

CoM = (number of method-attribute pairs) / (no. of methods × no. of attributes)

= 7 / (4 × 5) = 7/20 = 0.35

A CoM value of 0.35 is on the low side: on average, each method touches only about a third of the class attributes, so there is clear room for improvement.
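The same total can also be reached from the attribute side by counting how many methods use each attribute. The sketch below hard-codes the usage listed above; the constant names are illustrative.

```python
from collections import Counter

ATTRIBUTES = ["order_id", "customer", "product_list", "payment_status", "shipping_status"]

# Which attributes each method of the original Order class uses.
USAGE = {
    "process_payment": {"payment_status"},
    "ship_order": {"shipping_status"},
    "notify_customer": {"order_id", "customer", "payment_status", "shipping_status"},
    "calculate_total_price": {"product_list"},
}

# For every attribute, count the methods that use it.
methods_per_attribute = Counter(
    field for fields in USAGE.values() for field in fields
)

pairs = sum(methods_per_attribute.values())
com = pairs / (len(USAGE) * len(ATTRIBUTES))

for name in ATTRIBUTES:
    print(name, methods_per_attribute[name])
print(pairs, round(com, 2))  # 7 0.35
```

No attribute is used by more than two of the four methods, which is another way of seeing why the score is low.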

Tips for Improving Cohesion

Single Responsibility Principle: Ensure each module or class has only one reason to change. This keeps responsibilities focused.
Modular Design: Break down larger classes or modules into smaller, more focused ones.
Clear Interfaces: Define clear and specific interfaces for classes and modules.
Encapsulation: Encapsulate related data and methods together, keeping unrelated ones separate.
Refactoring: Regularly refactor code to improve cohesion by identifying and separating unrelated responsibilities.

Summary

Cohesion is a fundamental principle in software engineering that affects the maintainability, understandability, and reusability of code. By aiming for high cohesion, we can create more modular and robust systems. Using a combination of qualitative assessments and quantitative metrics, such as LCOM and CoM, we can measure and improve the cohesion of our code. Remember, a highly cohesive module is one where the elements work together to perform a single task effectively. This can be achieved by clearly separating responsibilities, as demonstrated in the improved example of the e-commerce application’s Order module.

Conclusion

By understanding and measuring cohesion, software developers can create code that is easier to maintain, understand, and reuse. High cohesion leads to better software quality, reduced complexity, and more robust systems. Use the tips and techniques discussed in this blog to assess and improve the cohesion of your own codebases. Happy coding!