Understanding Encapsulation in Object-Oriented Programming with Python

Python for AI, data science and machine learning Day 4

Published in Data Bistrot · 10 min read · Mar 26, 2024

As we have seen in Understanding Object-Oriented Programming in Python, object-oriented programming (OOP) is a paradigm that uses “objects” to design applications and computer programs. It relies on several key concepts, including encapsulation, inheritance, and polymorphism, to increase the modularity and flexibility of code. In this article, we’ll focus on encapsulation, a fundamental aspect of OOP that helps in achieving data hiding and abstraction.

What is Encapsulation?

Encapsulation is the mechanism of bundling the data (attributes) and methods (functions) that operate on the data into a single unit, known as an object. It restricts direct access to some of an object’s components, which can prevent the accidental modification of data. To understand encapsulation, let’s break down its key features:

Data Hiding: An object’s internal state is hidden from the outside world. This is also referred to as information hiding.
Access Restrictions: External code cannot directly access the object’s internal state. Instead, it must use specific methods provided by the object (like getters and setters) to read or modify its state.
Simplification: By interacting with an object through a well-defined interface, the complexity of the system is reduced, making it easier to understand and maintain.
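The three features above can be sketched with a small class. The Thermostat class, its method names, and the 5 to 35 °C range are illustrative assumptions rather than part of the original article, and the leading underscore on the attribute is the naming convention discussed later on.

```python
class Thermostat:
    """Hypothetical example: bundles a temperature with the methods that manage it."""

    def __init__(self, target_celsius):
        # Internal state, marked as non-public by the leading underscore
        self._target_celsius = target_celsius

    def get_target(self):
        """Read the internal state through the public interface."""
        return self._target_celsius

    def set_target(self, value):
        """Change the internal state only after validating the new value."""
        if not 5 <= value <= 35:
            raise ValueError("Target must be between 5 and 35 °C.")
        self._target_celsius = value


thermostat = Thermostat(20)
thermostat.set_target(22)
print(thermostat.get_target())  # 22
```

Callers only need get_target and set_target; how the value is stored and checked stays inside the class.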

Encapsulation in Python

Python’s approach to encapsulation is somewhat unique. Unlike languages such as C++ or Java, Python does not have keywords like public, private, or protected to explicitly enforce access restrictions.

Python adopts a more open approach to class data and methods, essentially treating all as public. This design choice might seem unconventional to those accustomed to the strict access controls of other languages, but it embodies a core Python philosophy: “we’re all consenting adults.”

The principle “we’re all consenting adults” extends beyond mere code structure; it reflects the ethos of the Python community at large. It suggests that developers should trust one another to use class attributes and methods responsibly, rather than enforcing strict barriers through access modifiers like private or protected, as seen in languages like Java.

However, Python supports a convention to achieve a similar effect.

Naming Conventions in Python

Single Leading Underscore: `_`

The first and most widely recognized convention is the use of a single leading underscore (_) to denote attributes or methods that are not intended to be part of the public interface of a class. These are internal implementations that are subject to change and should not be relied upon directly by outside code.

class MyClass:
    def public_method(self):
        print("This is a public method")
    
    def _internal_method(self):
        print("This is an internal method")

In this example, _internal_method is meant to be used internally within MyClass or by subclasses, indicating that it's not part of the stable public interface. While nothing in Python prevents access to _internal_method, the underscore serves as a clear signal to other developers that it should be used with caution.
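As a quick check of that last point, the snippet below repeats the class so it runs on its own and calls both methods from outside. The variable name obj is just an illustration.

```python
class MyClass:
    def public_method(self):
        print("This is a public method")

    def _internal_method(self):
        print("This is an internal method")


obj = MyClass()
obj.public_method()     # intended usage
obj._internal_method()  # also runs: the underscore is a convention, not a barrier
```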

Single Underscore (`_`) Use Cases

In the following example, _connect_to_database is an internal method of DatabaseConnector meant to be used by other methods within the class, like connect, and not directly accessed from outside the class.

class DatabaseConnector:
    def connect(self):
        self._connect_to_database()  # Internal use indicated by single underscore

    def _connect_to_database(self):
        print("Connecting to the database")

In Python, a single underscore is also used to indicate that a variable is temporary or insignificant. This is commonly seen in loops or unpacking expressions where one or more values are intentionally unused.

for _ in range(5):
    print("Repeat action")

_, value = (1, 2)  # Only interested in the second value

Here, _ is used to ignore the loop variable and the first value in the tuple, focusing on the repetition and the value of interest, respectively.

Double Leading Underscore: `__`

Python also uses a naming convention involving a double leading underscore (__) to create a stronger indication of privacy. This triggers a feature known as name mangling, where the interpreter changes the name of the variable in a way that makes it harder (but not impossible) to access from outside the class.

class MyClass:
    def __init__(self):
        self.__private_var = "This is a private variable"

    def __private_method(self):
        print("This is a private method")

With name mangling, __private_var and __private_method are internally renamed to _MyClass__private_var and _MyClass__private_method, respectively. This mechanism doesn't make the attribute or method truly private, but it signals a stronger intent of non-public access and deters direct access from outside the class, which helps protect the object's internal state and behavior.
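The renaming can be observed directly by inspecting an instance. This is a minimal sketch that repeats the class above, with one assumed change: the private method returns its message instead of printing it, so the result is easier to show.

```python
class MyClass:
    def __init__(self):
        self.__private_var = "This is a private variable"

    def __private_method(self):
        return "This is a private method"


obj = MyClass()
instance_attrs = list(vars(obj))
print(instance_attrs)                         # ['_MyClass__private_var']
print(hasattr(obj, "__private_var"))          # False
print(hasattr(obj, "_MyClass__private_var"))  # True
print(obj._MyClass__private_method())         # This is a private method
```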

Double Underscore (`__`) Use Cases

In this example, __balance is intended to be accessed only through the methods of the Account class, like deposit, safeguarding the integrity of the account balance.

class Account:
    def __init__(self, balance):
        self.__balance = balance  # Name mangling to make it harder to access directly

    def deposit(self, amount):
        if amount > 0:
            self.__balance += amount  # Accessing the mangled name internally

account = Account(100)
# Reading 'account.__balance' here raises AttributeError; Python stores the attribute as '_Account__balance'

Double underscores can also be used to avoid naming conflicts in subclasses that might inadvertently override base class attributes or methods.

class Base:
    def __init__(self):
        self.__hidden = 'Base'

class Derived(Base):
    def __init__(self):
        super().__init__()
        self.__hidden = 'Derived'  # Does not actually override 'Base.__hidden'

base = Base()
derived = Derived()
# 'Base.__hidden' and 'Derived.__hidden' are two different attributes due to name mangling.

Here, __hidden in both Base and Derived classes are mangled differently, ensuring that the Derived class's __hidden attribute doesn't override the Base class's __hidden attribute, preventing unintended side effects.
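To see both attributes side by side, one can inspect the instance dictionary of a Derived object. The sketch below repeats the two classes and adds two helper methods, base_hidden and derived_hidden, which are hypothetical additions for illustration only.

```python
class Base:
    def __init__(self):
        self.__hidden = 'Base'

    def base_hidden(self):
        return self.__hidden  # compiled as self._Base__hidden


class Derived(Base):
    def __init__(self):
        super().__init__()
        self.__hidden = 'Derived'

    def derived_hidden(self):
        return self.__hidden  # compiled as self._Derived__hidden


derived = Derived()
print(vars(derived))             # {'_Base__hidden': 'Base', '_Derived__hidden': 'Derived'}
print(derived.base_hidden())     # Base
print(derived.derived_hidden())  # Derived
```

Each class sees its own __hidden, because the mangled name depends on the class in which the code is written.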

In conclusion, the use of single (_) and double (__) underscores in Python serves different purposes. A single underscore is a hint for internal use or to ignore values, fostering a convention-based approach to indicate the intended scope of use. The double underscore, through name mangling, provides a stronger barrier to indicate "private" members and avoid naming conflicts, aligning with the principle of "we're all consenting adults" while still protecting the class's internal workings.

Let’s delve into some other examples to better understand how encapsulation works in Python.

class Test:
    def __private_symbol(self):
        print("This is a private method.")

    def normal_symbol(self):
        print("This is a normal method.")

# Accessing the public method
t = Test()
t.normal_symbol()  # Outputs: This is a normal method.

# Attempting to access the private method directly will result in an AttributeError
# t.__private_symbol()  # Uncommenting this line will raise an error

# However, the private method can still be accessed through its mangled name
t._Test__private_symbol()  # Outputs: This is a private method.

For a method or variable named __private_symbol in a class Test, Python mangles the name to _Test__private_symbol. This is why, when you inspect the directory of the class using dir(Test), you see the mangled name _Test__private_symbol instead of __private_symbol.
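The dir(Test) inspection mentioned above can be sketched as follows. The class is repeated so the snippet is self-contained, and the filter on "symbol" is only there to keep the output short.

```python
class Test:
    def __private_symbol(self):
        print("This is a private method.")

    def normal_symbol(self):
        print("This is a normal method.")


# Keep only the two names defined in the class body
names = [name for name in dir(Test) if "symbol" in name]
print(names)                            # ['_Test__private_symbol', 'normal_symbol']
print("__private_symbol" in dir(Test))  # False
```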

Understanding the Convention

Private Members: Should be considered as non-public parts of the API. They are meant to be internal to the class and subject to change without notice. Therefore, external use of these members is discouraged to maintain code compatibility and prevent accidental misuse.
Name Mangling: This mechanism is not intended as a way to securely hide information. Instead, it’s a way to help avoid naming conflicts in subclasses that might inadvertently overwrite private members of superclasses.

Consider a simple class BankAccount that represents a bank account:

class BankAccount:
    def __init__(self, account_number, balance=0):
        self.__account_number = account_number  # private attribute
        self.__balance = balance  # private attribute

    def deposit(self, amount):
        if amount > 0:
            self.__balance += amount
            print(f"Amount {amount} deposited successfully.")
        else:
            print("Deposit amount must be positive.")

    def withdraw(self, amount):
        if amount > 0 and amount <= self.__balance:
            self.__balance -= amount
            print(f"Amount {amount} withdrawn successfully.")
        else:
            print("Insufficient balance or invalid withdrawal amount.")

    def get_balance(self):
        return self.__balance

    def set_balance(self, balance):
        if balance >= 0:
            self.__balance = balance
        else:
            print("Balance cannot be negative.")

In this example, the BankAccount class encapsulates the __account_number and __balance attributes by giving them double-underscore names. Name mangling discourages direct access from outside the class (the attributes remain reachable as _BankAccount__account_number and _BankAccount__balance), so callers are steered toward the public interface. The class provides public methods like deposit() and withdraw(), plus getter/setter methods for __balance, to interact with these private attributes.
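A short usage sketch may help here. It repeats a condensed copy of BankAccount (without set_balance) so that it runs on its own; the account number "ACC-001" is a made-up value. It also illustrates a common pitfall: assigning to account.__balance from outside the class does not change the real balance.

```python
class BankAccount:
    """Condensed copy of the class above, repeated so the snippet runs on its own."""

    def __init__(self, account_number, balance=0):
        self.__account_number = account_number
        self.__balance = balance

    def deposit(self, amount):
        if amount > 0:
            self.__balance += amount
            print(f"Amount {amount} deposited successfully.")
        else:
            print("Deposit amount must be positive.")

    def withdraw(self, amount):
        if amount > 0 and amount <= self.__balance:
            self.__balance -= amount
            print(f"Amount {amount} withdrawn successfully.")
        else:
            print("Insufficient balance or invalid withdrawal amount.")

    def get_balance(self):
        return self.__balance


account = BankAccount("ACC-001", balance=100)  # hypothetical account number
account.deposit(50)
account.withdraw(500)         # rejected: prints the "insufficient balance" message
print(account.get_balance())  # 150

# Outside the class body no mangling happens, so this creates a new,
# unrelated attribute literally named '__balance' on the instance.
account.__balance = 1_000_000
print(account.get_balance())  # still 150
```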

Encapsulation in a Data Science Context: A Python Example

In data science, encapsulation can be particularly useful for managing and manipulating datasets in a secure and structured manner. Let’s consider a practical example where encapsulation is applied in a data science project.

Imagine we’re working with a dataset that tracks user engagement metrics for a website. Our goal is to encapsulate the dataset and its manipulation methods within a class, providing a clear and simple interface for data operations while hiding the complexity of data processing.

Example: User Engagement Metrics Class

We’ll create a class UserEngagementData that encapsulates user engagement data for a website. This class will offer methods to add new data, calculate average engagement metrics, and retrieve specific data points, all while keeping the raw data private.

import pandas as pd

class UserEngagementData:
    def __init__(self):
        # Initialize a private DataFrame to store user engagement data
        self.__data = pd.DataFrame(columns=['user_id', 'page_views', 'time_spent'])

    def add_engagement_data(self, user_id, page_views, time_spent):
        """Add a new record of user engagement data."""
        new_data = {'user_id': user_id, 'page_views': page_views, 'time_spent': time_spent}
        # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
        self.__data = pd.concat([self.__data, pd.DataFrame([new_data])], ignore_index=True)

    def average_engagement(self):
        """Calculate average page views and time spent across all users."""
        if not self.__data.empty:
            return {
                'average_page_views': self.__data['page_views'].mean(),
                'average_time_spent': self.__data['time_spent'].mean()
            }
        else:
            return {'average_page_views': 0, 'average_time_spent': 0}

    def get_user_engagement(self, user_id):
        """Retrieve engagement data for a specific user."""
        user_data = self.__data[self.__data['user_id'] == user_id]
        if not user_data.empty:
            return user_data.iloc[0].to_dict()
        else:
            return "User data not found."

# Example usage
engagement_data = UserEngagementData()
engagement_data.add_engagement_data(user_id=1, page_views=5, time_spent=120)
engagement_data.add_engagement_data(user_id=2, page_views=3, time_spent=80)

print(engagement_data.average_engagement())
print(engagement_data.get_user_engagement(1))

In this example, the UserEngagementData class encapsulates the user engagement dataset (stored as a pandas DataFrame) and provides public methods to interact with this data.

This design has several advantages:

Data Integrity: By restricting direct access to the dataset, we prevent accidental modifications from external code, ensuring the dataset’s integrity.
Simplicity: Users of the class do not need to know about the underlying data structure (pandas DataFrame) or the logic used to calculate averages. They interact with the dataset through simple, intuitive methods.
Flexibility: If we decide to change the internal implementation (e.g., switch to a different data storage mechanism), we can do so without affecting the code that uses this class.
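The flexibility point can be illustrated with a hypothetical variant of the class that stores its records in a plain list of dictionaries instead of a DataFrame. This is only a sketch of the idea, not the author's implementation; the attribute name __records is an assumption, while the public method names and return shapes mirror the pandas-based version.

```python
from statistics import mean


class UserEngagementData:
    """Hypothetical variant: same public methods, but rows are kept in a list."""

    def __init__(self):
        self.__records = []  # internal storage: a plain list of dicts

    def add_engagement_data(self, user_id, page_views, time_spent):
        """Add a new record of user engagement data."""
        self.__records.append(
            {'user_id': user_id, 'page_views': page_views, 'time_spent': time_spent}
        )

    def average_engagement(self):
        """Calculate average page views and time spent across all users."""
        if not self.__records:
            return {'average_page_views': 0, 'average_time_spent': 0}
        return {
            'average_page_views': mean(r['page_views'] for r in self.__records),
            'average_time_spent': mean(r['time_spent'] for r in self.__records),
        }

    def get_user_engagement(self, user_id):
        """Retrieve engagement data for a specific user."""
        for record in self.__records:
            if record['user_id'] == user_id:
                return dict(record)  # a copy, so callers cannot mutate internal state
        return "User data not found."


# The calling code is identical to the pandas-based example
engagement_data = UserEngagementData()
engagement_data.add_engagement_data(user_id=1, page_views=5, time_spent=120)
engagement_data.add_engagement_data(user_id=2, page_views=3, time_spent=80)

print(engagement_data.average_engagement())
print(engagement_data.get_user_engagement(1))
```

Because external code only ever used the three public methods, it does not need to change when the storage changes.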

Leveraging Property Decorators and @method_name.setter for Encapsulation in Python

In Python, the @property decorator is a built-in decorator that allows us to define methods in a class that can be accessed like attributes. This feature is particularly useful for implementing encapsulation, offering a Pythonic way to use getters and setters for managing access to private variables.

Understanding the Property Decorator

The @property decorator turns a method into a property. This means that the method can be accessed as if it were an attribute, without the need to call it like a function. This is particularly useful for two main reasons:

Getter Method: By decorating a method with @property, you can access a private attribute without exposing it directly.
Setter Method: By using the @<property_name>.setter decorator (for example, @age.setter for a property named age), you can define a setter method that allows you to set the value of a property. This method can include logic for validation or modification of the data being set, providing control over how attributes are modified.
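The getter and setter roles can be sketched with a small, hypothetical Circle class (the class and its attribute names are illustrative assumptions). It also shows a related consequence: a property defined without a setter is read-only.

```python
import math


class Circle:
    """Hypothetical example: radius is read/write, area is read-only."""

    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        """Getter: exposes the internal value without exposing the attribute."""
        return self._radius

    @radius.setter
    def radius(self, value):
        """Setter: validates before storing."""
        if value <= 0:
            raise ValueError("Radius must be positive.")
        self._radius = value

    @property
    def area(self):
        """Computed on access; no setter is defined, so it cannot be assigned."""
        return math.pi * self._radius ** 2


circle = Circle(2)
circle.radius = 3     # calls the setter
print(circle.radius)  # 3

try:
    circle.area = 10  # no setter defined for area
except AttributeError:
    print("area is read-only")
```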

Example: Using Property Decorators for User Profiles

Let’s create a class UserProfile that uses property decorators to manage access to a user's age, ensuring that only valid ages can be assigned.

class UserProfile:
    def __init__(self, name, age):
        self.name = name
        self.__age = age  # Private attribute

    @property
    def age(self):
        """Getter method for age."""
        return self.__age

    @age.setter
    def age(self, value):
        """Setter method for age with validation."""
        if isinstance(value, int) and 0 < value < 150:
            self.__age = value
        else:
            raise ValueError("Age must be an integer between 1 and 149.")

# Example usage
user = UserProfile("John Doe", 25)
print(user.age)  # Accesses the getter method

user.age = 30  # Accesses the setter method
print(user.age)

# Trying to set an invalid age
try:
    user.age = -5
except ValueError as e:
    print(e)  # Prints the message of the ValueError raised by the setter

`@property` decorator and `@method_name.setter` code explanation

In this example, the UserProfile class has a private attribute __age that is encapsulated from the outside. The @property decorated age method acts as a getter, allowing us to retrieve the user's age without directly accessing the private variable.

The @age.setter method enables setting the value of __age, with the added benefit of including validation logic to ensure the age is within a reasonable range. The isinstance() function returns True if the specified object is of the specified type, so the first half of the condition checks that the value is an integer. The second half checks that it falls within the allowed age range.
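Note that in the example above, __init__ assigns __age directly, so the initial value is not validated. One possible variation, shown here as a sketch rather than as the author's code, is to assign through the property inside __init__ so that the same check applies at construction time. The name "Jane Doe" is a made-up value.

```python
class UserProfile:
    def __init__(self, name, age):
        self.name = name
        self.age = age  # goes through the setter, so the initial value is validated too

    @property
    def age(self):
        """Getter method for age."""
        return self.__age

    @age.setter
    def age(self, value):
        """Setter method for age with validation."""
        if isinstance(value, int) and 0 < value < 150:
            self.__age = value
        else:
            raise ValueError("Age must be an integer between 1 and 149.")


try:
    UserProfile("Jane Doe", -5)
except ValueError as e:
    print(e)  # Age must be an integer between 1 and 149.
```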

This use of property decorators and @method_name.setter for encapsulation not only protects the integrity of the data but also provides a clear and intuitive interface for interacting with class attributes. It encapsulates the inner workings of the class, exposing only what is necessary and safe to the outside world.

The @property decorator in Python offers an elegant and efficient way to implement encapsulation. By providing a mechanism for controlled access to private variables, it helps maintain the integrity of data while making classes easier to use and maintain. This approach to encapsulation aligns well with the principles of object-oriented programming, emphasizing data hiding and the importance of interfaces.

If you want to know more about the @property decorator:

Understanding the @property Decorator in Python

Python’s Most Powerful Decorator

Why Use Encapsulation?

Encapsulation offers several benefits:

Security: Sensitive data is hidden from external access, reducing the risk of unintended modifications.
Simplicity: Clients of an object do not need to understand its internal complexities. They interact with it through a simple interface.
Modularity: Encapsulating data and operations within objects makes the code more modular and easier to maintain.

Encapsulation is a powerful OOP concept that can significantly enhance the structure and security of data science projects. By encapsulating data and operations within classes, we create a more modular, maintainable, and easy-to-use codebase. This approach is particularly beneficial in data science, where managing complex datasets and operations is common.

Encapsulation is a powerful concept in object-oriented programming that helps in structuring the code in a way that bundles data and operations, restricts access to internal states, and provides a clear interface for interaction.

In Python, although encapsulation is not enforced by language-specific keywords, the convention of prefixing attribute names with double underscores, together with the property decorator, effectively serves this purpose. Understanding and implementing encapsulation can lead to more secure, modular, and maintainable codebases.

Complete series:

10 Days of Python OOP for AI, Data Science, and Machine Learning

Welcome to my series on Object-Oriented Programming (OOP) Python for AI, Data Science, and Machine Learning. Over…
