Advanced Python: Abstract Base Class & abc Module
Python's standard library includes the abc module, short for Abstract Base Classes (ABC), to help you implement abstract classes. Abstract classes let developers define a blueprint that subclasses must follow, so let's see how that works with a use case from a common data engineering scenario.
Background Context
Suppose we are writing a data format converter that turns input files into a custom format of our choice for a machine learning ingestion system. Data systems are living organisms whose internal chemistry changes all the time, and there are tons of different data formats out there, so it's best to approach this problem the object-oriented way. So let us create a class for our converter.
As mentioned earlier, the two main design challenges are the multiple data formats and possible changes in the input data. To deal with them, we want a base class that provides a blueprint of the must-have functionality for our different converter subclasses.
Now let's write a first version of the base class that defines our converter's blueprint, together with a first subclass. My custom format will be called Naruto.
class NarutoConverter():
    def convert(self): pass
    def clean_string(self): pass

class JSONtoNaruto(NarutoConverter):
    def __init__(self, input_file):
        self.__input = input_file
Our converter has two methods: convert, which converts the input format to the Naruto format, and clean_string, which deals with the varying quality of the input strings.
Since this class is meant to be abstract, we don't want anyone to create instances of it; it is just a blueprint for the converters to come. However, the way we defined the class above doesn't prevent it from being instantiated.
The second issue with the above code is that it doesn't force subclasses to implement the convert and clean_string methods. The code above for JSONtoNaruto will not throw an error even though neither method is implemented. Let's see how we can deal with these issues in the next section.
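To make both problems concrete, here is a minimal sketch built on the plain classes described above. The file name "data.json" is only a placeholder for illustration; nothing is read from disk.

```python
class NarutoConverter:
    """Plain base class: nothing marks these methods as required."""

    def convert(self):
        pass

    def clean_string(self):
        pass


class JSONtoNaruto(NarutoConverter):
    """Subclass that never implements convert or clean_string."""

    def __init__(self, input_file):
        self.__input = input_file


# Problem 1: the "blueprint" itself can be instantiated.
base = NarutoConverter()

# Problem 2: the incomplete subclass is accepted, and the inherited
# placeholder methods silently return None instead of failing.
converter = JSONtoNaruto("data.json")  # hypothetical file name
result = converter.convert()
print(result)  # prints: None
```

Neither line raises an error, so a missing implementation would only surface later, when some downstream step receives None instead of converted data.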
Python’s abc Module
Now let us see how to declare NarutoConverter with the abc module and achieve our goal of abstraction.
from abc import ABC, abstractmethod

class NarutoConverter(ABC):
    @abstractmethod
    def convert(self): pass

    @abstractmethod
    def clean_string(self): pass
The basic recipe is to create the abstract class as a subclass of ABC from the abc module and to put the @abstractmethod decorator before every abstract method declaration. This forces every concrete subclass to implement those methods. Now if we try to instantiate the NarutoConverter class, Python raises a TypeError and prevents instantiation of the abstract class. Now let's see the implementation of the subclass JSONtoNaruto.
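class JSONtoNaruto(NarutoConverter):
    def __init__(self, input_file):
        self.__input = input_file

    def convert(self):
        # Placeholder: a real converter would transform the input here
        naruto_file = self.__input
        return naruto_file

    def clean_string(self):
        # Placeholder cleanup: coerce to str and trim surrounding whitespace
        clean_str = str(self.__input).strip()
        return clean_str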
In the code above we have implemented both methods. If we leave out either of them and then try to create a JSONtoNaruto object, Python raises one of the following errors (the exact wording varies between Python versions). This behavior is enforced by the @abstractmethod decorator, which prevents instantiation of any class that has not implemented every abstract method.
TypeError: Can't instantiate abstract class JSONtoNaruto with abstract methods convert
TypeError: Can't instantiate abstract class JSONtoNaruto with abstract methods clean_string
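The enforcement can be checked directly with a small, self-contained sketch. PartialConverter, FullConverter and the can_instantiate helper are hypothetical names introduced only for illustration. The sketch catches the TypeError rather than printing its message, since the wording depends on the Python version.

```python
from abc import ABC, abstractmethod


class NarutoConverter(ABC):
    @abstractmethod
    def convert(self):
        pass

    @abstractmethod
    def clean_string(self):
        pass


class PartialConverter(NarutoConverter):
    """Hypothetical subclass that implements only one abstract method."""

    def convert(self):
        return "converted"


class FullConverter(PartialConverter):
    """Hypothetical subclass that fills in the remaining method."""

    def clean_string(self):
        return "cleaned"


def can_instantiate(cls):
    """Return True if cls() succeeds, False if it raises TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True


# Defining an incomplete subclass is allowed; the check runs at instantiation.
print(can_instantiate(NarutoConverter))   # False
print(can_instantiate(PartialConverter))  # False
print(can_instantiate(FullConverter))     # True

# The names still missing an implementation are recorded on the class.
print(sorted(PartialConverter.__abstractmethods__))  # ['clean_string']
```

Note that the class statement for PartialConverter itself succeeds. The error only appears when an object is created, which is why a quick instantiation test is a cheap way to catch an incomplete converter early.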
Conclusion
Data engineering deals with pipelines, and since a pipeline DAG has many moving parts, object-oriented design is a very common approach when writing pipeline code. Designing class blueprints is something data engineers do on a regular basis (unless you are a functional programming person) to produce reusable pipeline components. Abstract classes offer an easy way to provide a design blueprint for the various components of a pipeline and keep your code base clean, logical and manageable as it scales.
About Me: I am a data engineer experienced in the GCP, SQL and Python stack, helping startups set up and scale their data infrastructure. Want to collaborate? Reach out to me on LinkedIn.