Understanding Cohesion in Software Engineering
Cohesion is a crucial concept in software engineering, representing how closely related and focused the responsibilities of a single module or class are. High cohesion within a module or class leads to better maintainability, understandability, and reusability of code. In this blog post, we will delve into what cohesion is, how it can be measured, and why it matters, illustrated with simple and real-world examples.
What is Cohesion?
Cohesion refers to the degree to which the elements inside a module or class belong together. High cohesion means that the elements within a module are highly related and work together to perform a specific task, while low cohesion indicates that the elements have little in common and perform many unrelated tasks.
Why is Cohesion Important?
- Maintainability: High cohesion makes code easier to maintain because changes are likely to be localized within a module.
- Understandability: When a module performs a single task, it is easier to understand.
- Reusability: Modules with high cohesion are often more reusable because they perform a single, well-defined task.
How to Measure Cohesion?
Measuring cohesion can be done both qualitatively and quantitatively. Let’s explore both approaches.
Qualitative Assessment
- Expert Judgment: Experienced developers can often assess the cohesion of a module by reviewing its code and responsibilities. If a module has a single responsibility and all its elements are working towards fulfilling that responsibility, it is said to have high cohesion.
Quantitative Metrics
Several metrics can be used to quantify cohesion. Here are some of the most common:
Lack of Cohesion in Methods (LCOM)
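LCOM metrics measure the dissimilarity of methods within a class based on their access to instance variables. The numbering of the LCOM variants differs between authors and tools, so check which definition a given tool reports.
LCOM1: Takes the number of pairs of methods that do not share any fields, subtracts the number of pairs of methods that do share fields, and halves the result. (The original Chidamber and Kemerer formulation uses the plain difference without halving it.)
LCOM1 = (number of method pairs not sharing a field − number of method pairs sharing a field) / 2
A higher LCOM1 value indicates lower cohesion. Negative values are treated as 0.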
LCOM2: Measures the number of connected components in a graph where methods are nodes and edges represent shared attributes. The more disconnected the graph, the lower the cohesion.
LCOM3: Defined as the number of disjoint sets of methods (disconnected graph components).
Example of LCOM3 Calculation
Consider a class with four methods: M1, M2, M3, and M4. Suppose M1 and M2 share an attribute, and M3 and M4 share another attribute. This results in two disjoint sets of methods: {M1, M2} and {M3, M4}.
LCOM3 = number of disjoint sets = 2
A high LCOM3 value indicates low cohesion because it implies many disjoint sets of methods within the class.
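The counting above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name count_method_components and the attribute names x and y are assumptions made for the example, and two methods are treated as connected only when they share at least one attribute.

```python
from itertools import combinations


def count_method_components(method_attrs):
    """Count groups of methods that are linked through shared attributes.

    method_attrs maps a method name to the set of attributes it uses.
    """
    # Union-find: every method starts in its own group.
    parent = {method: method for method in method_attrs}

    def find(method):
        # Follow parent pointers to the group's representative.
        while parent[method] != method:
            parent[method] = parent[parent[method]]  # path halving
            method = parent[method]
        return method

    # Merge the groups of any two methods that share an attribute.
    for a, b in combinations(method_attrs, 2):
        if method_attrs[a] & method_attrs[b]:
            parent[find(a)] = find(b)

    return len({find(method) for method in method_attrs})


# Hypothetical attribute names: M1 and M2 share "x", M3 and M4 share "y".
usage = {"M1": {"x"}, "M2": {"x"}, "M3": {"y"}, "M4": {"y"}}
lcom3 = count_method_components(usage)
print(lcom3)  # 2
```

A result of 1 would mean every method is reachable from every other through shared attributes.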
Example of LCOM1 Calculation
Consider a class with three methods and three instance variables:
class Example:
    def __init__(self):
        self.var1 = 0
        self.var2 = 0
        self.var3 = 0

    def method1(self):
        return self.var1

    def method2(self):
        return self.var1 + self.var2

    def method3(self):
        return self.var2 + self.var3
To calculate LCOM1:
- Method pairs that share fields: (method1, method2) and (method2, method3)
- Method pairs that do not share fields: (method1, method3)
LCOM1 is calculated as:
LCOM1 = (1−2)/2 = −0.5
Since negative LCOM1 values are adjusted to 0, this indicates the class has high cohesion.
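The same calculation can be written as a small Python function. This is a sketch that follows the formula as stated in this post (including the division by 2 and the adjustment of negative values to 0); the function name lcom1 and the dictionary layout are assumptions made for illustration.

```python
from itertools import combinations


def lcom1(method_attrs):
    """LCOM1 as defined in this post: (not sharing - sharing) / 2, floored at 0."""
    pairs = list(combinations(method_attrs.values(), 2))
    # A pair "shares" when the two attribute sets overlap.
    sharing = sum(1 for a, b in pairs if a & b)
    not_sharing = len(pairs) - sharing
    return max(0.0, (not_sharing - sharing) / 2)


# Attribute usage of the Example class, excluding __init__.
example = {
    "method1": {"var1"},
    "method2": {"var1", "var2"},
    "method3": {"var2", "var3"},
}
print(lcom1(example))  # 0.0
```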
Cohesion of Methods (CoM)
CoM measures how many of the possible method-attribute combinations actually occur in a class: the number of method-attribute pairs in use, divided by the number of methods times the number of attributes. High CoM values suggest high cohesion.
CoM = (number of method-attribute pairs) / (number of methods × number of attributes)
Where:
- Method-attribute pairs: The number of distinct (method, attribute) combinations in which the method accesses the attribute.
- Number of methods: The total number of methods in the class.
- Number of attributes: The total number of attributes in the class.
Example Calculation of CoM
class Example:
    def __init__(self):
        self.var1 = 0
        self.var2 = 0
        self.var3 = 0

    def method1(self):
        return self.var1

    def method2(self):
        return self.var1 + self.var2

    def method3(self):
        return self.var2 + self.var3
In this Example class:
- method1 uses var1
- method2 uses var1 and var2
- method3 uses var2 and var3
Let’s break down the calculation step-by-step:
Count the number of method-attribute pairs:
- method1 uses var1 (1 pair)
- method2 uses var1 and var2 (2 pairs)
- method3 uses var2 and var3 (2 pairs)
Total number of method-attribute pairs = 1 + 2 + 2 = 5
Count the number of methods:
- There are 3 methods: method1, method2, method3
Count the number of attributes:
- There are 3 attributes: var1, var2, var3
Calculate CoM:
CoM = number of method-attribute pairs / (number of methods × number of attributes)
= 5 / (3 × 3) = 5/9 ≈ 0.56
A CoM value of 0.56 indicates a moderate level of cohesion. This means that while the methods in the class are somewhat related, there is still room for improvement in terms of making the class more focused and cohesive.
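For reference, the CoM arithmetic above can be reproduced with a short Python sketch. The function name cohesion_of_methods and the way usage is passed in (a dictionary of attribute sets) are assumptions made for this example.

```python
def cohesion_of_methods(method_attrs, attributes):
    """CoM = method-attribute pairs / (number of methods * number of attributes)."""
    if not method_attrs or not attributes:
        raise ValueError("need at least one method and one attribute")
    # Count each (method, attribute) combination once, ignoring unknown names.
    pairs = sum(len(used & attributes) for used in method_attrs.values())
    return pairs / (len(method_attrs) * len(attributes))


attributes = {"var1", "var2", "var3"}
example = {
    "method1": {"var1"},
    "method2": {"var1", "var2"},
    "method3": {"var2", "var3"},
}
com = cohesion_of_methods(example, attributes)
print(round(com, 2))  # 0.56
```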
Interpretation of CoM Values
- High CoM Value: Indicates high cohesion, meaning methods frequently share attributes and the class is likely well-focused on a specific task.
- Moderate CoM Value: Suggests moderate cohesion. Methods share some attributes, but the class could be further refined for better focus and cohesion.
- Low CoM Value: Indicates low cohesion, meaning methods rarely share attributes and the class may be handling too many unrelated tasks.
Why is CoM Important?
CoM is important because it provides a quantitative measure of how well a class or module adheres to the principle of cohesion. High cohesion within a class ensures that the class has a single, well-defined purpose and makes the code easier to maintain, understand, and reuse.
Structural Metrics
- Data Dependency: Evaluates how data is shared among methods within a class. High cohesion is indicated by frequent data sharing among methods.
- Graph-Based Metrics: Uses graphs where nodes represent methods and edges represent shared data or interactions. Cohesion is measured by the density of connections within the module.
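One simple way to read "density of connections" is the share of method pairs that are linked by at least one common attribute. The sketch below assumes that interpretation; the function name sharing_density is hypothetical, and other graph-based metrics define edges and density differently.

```python
from itertools import combinations


def sharing_density(method_attrs):
    """Fraction of method pairs connected by at least one shared attribute."""
    methods = list(method_attrs)
    possible = len(methods) * (len(methods) - 1) // 2
    if possible == 0:
        # Assumption: a class with fewer than two methods counts as fully connected.
        return 1.0
    edges = sum(
        1 for a, b in combinations(methods, 2) if method_attrs[a] & method_attrs[b]
    )
    return edges / possible


example = {
    "method1": {"var1"},
    "method2": {"var1", "var2"},
    "method3": {"var2", "var3"},
}
density = sharing_density(example)
print(round(density, 2))  # 0.67
```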
Conceptual Metrics
- Similarity-Based Measures: Uses natural language processing (NLP) to assess the similarity of names and comments within methods of a module. Higher similarity suggests higher cohesion.
- Semantic Analysis: Analyzes the semantics of the code to ensure that methods and variables serve a common purpose.
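As a very rough illustration of the similarity-based idea, the sketch below compares method names by the words they contain. Plain token overlap (Jaccard similarity) stands in for real NLP here, and the method names refund_payment and validate_payment are hypothetical; production tools typically use richer text models and also look at comments.

```python
import re
from itertools import combinations


def name_tokens(identifier):
    """Split a snake_case or camelCase identifier into lowercase word tokens."""
    return {token.lower() for token in re.findall(r"[A-Za-z][a-z]*", identifier)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_name_similarity(method_names):
    """Mean pairwise token overlap between method names."""
    pairs = list(combinations(method_names, 2))
    if not pairs:
        return 1.0  # assumption: a single method is trivially similar to itself
    total = sum(jaccard(name_tokens(a), name_tokens(b)) for a, b in pairs)
    return total / len(pairs)


focused = ["process_payment", "refund_payment", "validate_payment"]
mixed = ["process_payment", "ship_order", "notify_customer"]
print(round(average_name_similarity(focused), 2))  # 0.33
print(average_name_similarity(mixed))  # 0.0
```

The absolute numbers mean little on their own; the comparison between the two lists is what matters.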
Tools and Automation
Several tools can automate the calculation of cohesion metrics, providing objective insights into the cohesion of your codebase:
- SonarQube: A popular static analysis tool that measures various code quality metrics. Cohesion metrics are not reported by every version, so check what your installation and plugins provide.
- Understand: A static analysis tool that provides detailed metrics on code quality.
- CodeMR: A tool specifically designed for measuring and visualizing code metrics, including cohesion.
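To give a feel for what such automation involves, here is a minimal sketch that extracts which attributes each method touches using Python's standard ast module. It does not reproduce how any of the tools above work; the function name attribute_usage is an assumption, and the approach is deliberately naive (it only recognizes self.<name> accesses and would also count calls such as self.helper()).

```python
import ast
import textwrap


def attribute_usage(class_source):
    """Map each method name to the set of self.<attr> names it touches."""
    tree = ast.parse(textwrap.dedent(class_source))
    # Take the first class definition found in the source.
    class_node = next(n for n in ast.walk(tree) if isinstance(n, ast.ClassDef))
    usage = {}
    for item in class_node.body:
        # Skip the constructor, as the worked examples in this post do.
        if isinstance(item, ast.FunctionDef) and item.name != "__init__":
            usage[item.name] = {
                node.attr
                for node in ast.walk(item)
                if isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id == "self"
            }
    return usage


SOURCE = """
class Example:
    def __init__(self):
        self.var1 = 0
        self.var2 = 0
        self.var3 = 0

    def method1(self):
        return self.var1

    def method2(self):
        return self.var1 + self.var2

    def method3(self):
        return self.var2 + self.var3
"""

usage = attribute_usage(SOURCE)
print(usage)
```

The resulting dictionary has the shape needed to compute metrics such as LCOM or CoM.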
Real-World Example: E-commerce Application
Imagine an e-commerce application where we have different modules like Product, Order, and User. We'll focus on the Order module to demonstrate cohesion.
Low Cohesion Example:
In a low cohesion scenario, the Order class might handle too many unrelated tasks, like processing payments, managing shipping, and handling customer notifications.
class Order:
    def __init__(self, order_id, customer, product_list):
        self.order_id = order_id
        self.customer = customer
        self.product_list = product_list
        self.payment_status = "Pending"
        self.shipping_status = "Not Shipped"

    def process_payment(self, payment_info):
        self.payment_status = "Paid"

    def ship_order(self):
        self.shipping_status = "Shipped"

    def notify_customer(self):
        print(f"Order {self.order_id} for {self.customer} is {self.payment_status} and {self.shipping_status}")

    def calculate_total_price(self):
        total_price = sum(product['price'] for product in self.product_list)
        return total_price
Here, the Order class has low cohesion because it handles payment processing, shipping, customer notifications, and price calculations, which are unrelated tasks.
High Cohesion Example:
To improve cohesion, we can split the responsibilities into separate classes:
class Order:
    def __init__(self, order_id, customer, product_list):
        self.order_id = order_id
        self.customer = customer
        self.product_list = product_list

    def calculate_total_price(self):
        total_price = sum(product['price'] for product in self.product_list)
        return total_price


class Payment:
    def __init__(self):
        self.payment_status = "Pending"

    def process_payment(self, payment_info):
        self.payment_status = "Paid"


class Shipping:
    def __init__(self):
        self.shipping_status = "Not Shipped"

    def ship_order(self):
        self.shipping_status = "Shipped"


class Notification:
    @staticmethod
    def notify_customer(order_id, customer, payment_status, shipping_status):
        print(f"Order {order_id} for {customer} is {payment_status} and {shipping_status}")
In this improved version, each class has a single responsibility:
- Order handles order details and price calculation.
- Payment handles payment processing.
- Shipping handles order shipping.
- Notification handles customer notifications.
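Splitting the class raises the question of how the pieces work together. One possible arrangement is a small coordinating function, sketched below. The fulfil_order function, the customer name, and the payment_info contents are hypothetical and not part of the original design; the class definitions are repeated in compact form only so the sketch runs on its own.

```python
class Order:
    def __init__(self, order_id, customer, product_list):
        self.order_id = order_id
        self.customer = customer
        self.product_list = product_list

    def calculate_total_price(self):
        return sum(product["price"] for product in self.product_list)


class Payment:
    def __init__(self):
        self.payment_status = "Pending"

    def process_payment(self, payment_info):
        self.payment_status = "Paid"


class Shipping:
    def __init__(self):
        self.shipping_status = "Not Shipped"

    def ship_order(self):
        self.shipping_status = "Shipped"


class Notification:
    @staticmethod
    def notify_customer(order_id, customer, payment_status, shipping_status):
        print(f"Order {order_id} for {customer} is {payment_status} and {shipping_status}")


def fulfil_order(order, payment, shipping, payment_info):
    """Hypothetical coordinator: each step is delegated to the class that owns it."""
    total = order.calculate_total_price()
    payment.process_payment(payment_info)
    shipping.ship_order()
    Notification.notify_customer(
        order.order_id, order.customer, payment.payment_status, shipping.shipping_status
    )
    return total


order = Order(1, "Alice", [{"price": 10.0}, {"price": 5.5}])
payment = Payment()
shipping = Shipping()
total = fulfil_order(order, payment, shipping, payment_info={"method": "card"})
print(total)  # 15.5
```

Keeping the coordination in one place means a change to, say, shipping rules touches only the Shipping class.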
Calculating Cohesion Metrics
Example for LCOM (Lack of Cohesion in Methods)
Let’s calculate LCOM1 for the original Order class:
- Attributes: order_id, customer, product_list, payment_status, shipping_status
- Methods: process_payment, ship_order, notify_customer, calculate_total_price
Pairs of methods sharing attributes:
- process_payment and notify_customer share payment_status
- ship_order and notify_customer share shipping_status
Pairs of methods not sharing attributes:
- process_payment and ship_order
- process_payment and calculate_total_price
- ship_order and calculate_total_price
- notify_customer and calculate_total_price
Note that calculate_total_price shares product_list only with __init__, which is not counted as a method here.
With four methods there are 6 pairs in total: 2 pairs sharing attributes and 4 pairs not sharing attributes.
LCOM1 = (number of method pairs not sharing a field − number of method pairs sharing a field) / 2
= (4 − 2) / 2 = 1
An LCOM1 value above 0 means that more method pairs are unrelated than related, which points to low cohesion. This is a small example; in larger classes that mix responsibilities, the gap is usually more pronounced.
Example for CoM (Cohesion of Methods)
For the original Order class:
- process_payment uses payment_status
- ship_order uses shipping_status
- notify_customer uses order_id, customer, payment_status, and shipping_status
- calculate_total_price uses product_list
Number of method-attribute pairs:
- process_payment: 1
- ship_order: 1
- notify_customer: 4
- calculate_total_price: 1
Total method-attribute pairs: 7
Total methods: 4
Total attributes: 5
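CoM = number of method-attribute pairs / (number of methods × number of attributes)
= 7 / (4 × 5) = 7/20 = 0.35
A CoM value of 0.35 means that, on average, each method uses only about a third of the attributes in the class, which indicates fairly low cohesion and clear room for improvement.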
Tips for Improving Cohesion
- Single Responsibility Principle: Ensure each module or class has only one reason to change. This keeps responsibilities focused.
- Modular Design: Break down larger classes or modules into smaller, more focused ones.
- Clear Interfaces: Define clear and specific interfaces for classes and modules.
- Encapsulation: Encapsulate related data and methods together, keeping unrelated ones separate.
- Refactoring: Regularly refactor code to improve cohesion by identifying and separating unrelated responsibilities.
Summary
Cohesion is a fundamental principle in software engineering that affects the maintainability, understandability, and reusability of code. By aiming for high cohesion, we can create more modular and robust systems. Using a combination of qualitative assessments and quantitative metrics, such as LCOM and CoM, we can measure and improve the cohesion of our code. Remember, a highly cohesive module is one where the elements work together to perform a single task effectively. This can be achieved by clearly separating responsibilities, as demonstrated in the improved example of the e-commerce application’s Order module.
Conclusion
By understanding and measuring cohesion, software developers can create code that is easier to maintain, understand, and reuse. High cohesion leads to better software quality, reduced complexity, and more robust systems. Use the tips and techniques discussed in this blog to assess and improve the cohesion of your own codebases. Happy coding!