Advanced Python: Abstract Base Class & abc Module

Yuvrender Gill
Published in CodeX
3 min read · Jan 14, 2022

Python's standard library provides the abc module (short for Abstract Base Classes) to help you implement abstract classes. Abstract classes let developers define blueprints that force subclasses to provide certain behavior, so let's see how to do that with a use case from a common data engineering scenario.


Background Context

Suppose we are writing a converter that turns incoming data into a custom format of our choice for a machine learning ingestion system. Data systems are living organisms whose internal chemistry changes all the time, and there are tons of different data formats out there, so it's best to approach this problem the object-oriented way. So let us create a class for our converter.

As mentioned earlier, the two main design challenges are multiple data formats and possible changes in the input data. To deal with both, we want a base class that provides a blueprint of the must-have functionality for our different converter subclasses.

Now let's create a first version of the abstract class that defines our base converter's blueprint. The name of my custom format will be Naruto.

class NarutoConverter():
    def convert(self): pass
    def clean_string(self): pass

class JSONtoNaruto(NarutoConverter):
    def __init__(self, input_file):
        self.__input = input_file

Our converter has two methods: convert, which converts the input format to the Naruto format, and clean_string, which deals with the varying quality of the input strings.

Since we are creating an abstract class, we don't want anyone to create instances of it; it is just a blueprint for the converters to come. However, the way we wrote the class above doesn't prevent it from being instantiated.

The second issue with the code above is that it doesn't force subclasses to implement the convert and clean_string methods. The JSONtoNaruto code above will not throw an error even though neither method is implemented. Let's see how we can deal with these issues in the next section.
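Before moving on, both issues can be reproduced with a short sketch. It reuses the plain-class version of NarutoConverter and JSONtoNaruto from above; the file name "data.json" is a made-up placeholder, not something from a real pipeline.

```python
# Plain-class version: no abc involved yet
class NarutoConverter:
    def convert(self): pass
    def clean_string(self): pass


class JSONtoNaruto(NarutoConverter):
    def __init__(self, input_file):
        self.__input = input_file


# Issue 1: the "blueprint" itself can be instantiated without complaint
base = NarutoConverter()

# Issue 2: the subclass implements nothing, yet it instantiates fine
# and silently inherits the do-nothing methods from the base class
converter = JSONtoNaruto("data.json")  # hypothetical file name
result = converter.convert()
print(result)  # None
```

Nothing fails here, which is exactly the problem: a missing convert implementation only shows up later as an unexpected None somewhere downstream.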

Python’s abc Module

Now let us see how to declare NarutoConverter with the abc module and achieve our goal of abstraction.

from abc import ABC, abstractmethod

class NarutoConverter(ABC):
    @abstractmethod
    def convert(self): pass

    @abstractmethod
    def clean_string(self): pass

The basic recipe is to make the abstract class a subclass of ABC from the abc module and to put the @abstractmethod decorator before every abstract method declaration. This enforces the abstract contract on your custom subclasses. Now if we try to instantiate NarutoConverter directly, Python raises a TypeError and prevents instantiation of the abstract class. Now let's see the implementation of the subclass JSONtoNaruto.
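Before turning to the subclass, the instantiation check is easy to verify. The sketch below repeats the abstract NarutoConverter and tries to create an instance; the `instantiated` flag is just an illustrative helper and not part of the converter design.

```python
from abc import ABC, abstractmethod


class NarutoConverter(ABC):
    @abstractmethod
    def convert(self): pass

    @abstractmethod
    def clean_string(self): pass


try:
    NarutoConverter()  # direct instantiation of the abstract class
    instantiated = True
except TypeError as err:
    instantiated = False
    # The exact wording of the message depends on the Python version
    print(f"TypeError: {err}")

# ABC records which methods are still abstract on the class itself
print(sorted(NarutoConverter.__abstractmethods__))  # ['clean_string', 'convert']
```

The `__abstractmethods__` attribute is a handy way to inspect which methods a class still has to implement before it can be instantiated.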

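class JSONtoNaruto(NarutoConverter):
    def __init__(self, input_file):
        self.__input = input_file

    def convert(self):
        # Placeholder: a real converter would transform the input here
        naruto_file = self.__input
        return naruto_file

    def clean_string(self):
        # Placeholder cleanup: coerce the input to str and trim whitespace
        clean_str = str(self.__input).strip()
        return clean_str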

In the code above we have implemented both methods. If we leave one of them out and then try to instantiate JSONtoNaruto, Python raises one of the following errors (the exact wording varies between Python versions). This behavior is enforced by the @abstractmethod decorator, which prevents a class from being instantiated until every abstract method has an implementation.

TypeError: Can't instantiate abstract class JSONtoNaruto with abstract methods convert
TypeError: Can't instantiate abstract class JSONtoNaruto with abstract methods clean_string
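To see where the error actually fires, here is a minimal sketch with a subclass that implements convert but forgets clean_string. The class name IncompleteConverter and the file name "data.json" are hypothetical and only used for illustration.

```python
from abc import ABC, abstractmethod


class NarutoConverter(ABC):
    @abstractmethod
    def convert(self): pass

    @abstractmethod
    def clean_string(self): pass


# Hypothetical subclass that forgets to implement clean_string.
# Defining the class is fine; the check happens at instantiation.
class IncompleteConverter(NarutoConverter):
    def __init__(self, input_file):
        self._input = input_file

    def convert(self):
        return self._input


try:
    IncompleteConverter("data.json")  # hypothetical file name
    error_message = None
except TypeError as err:
    error_message = str(err)

print(error_message)
```

Note that the class definition itself succeeds. The TypeError is raised only when an instance is created, so a quick instantiation test for every converter is a cheap way to catch a missing method early.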

Conclusion

Data engineering deals with pipelines, and since a pipeline DAG has many moving parts, an object-oriented approach is very common when designing pipeline code. Designing class blueprints is something data engineers do on a regular basis (unless you are a functional programming person) to produce reusable pipeline components. Abstract classes are an easy way to define the design blueprint for the various components of a pipeline and to keep your code base clean, logical and manageable as it scales.

Make sure to follow and subscribe to get early access to new content.

About Me: I am a data engineer experienced in the GCP, SQL and Python stack, helping startups set up and scale their data infrastructure. Want to collaborate? Reach out to me on LinkedIn.
