ON Data Engineering
Python’s Data Classes: a Data Engineer’s best friend
Data Engineering application of data classes
Data classes are a relatively new addition to Python, first released in Python 3.7. They provide an abstraction layer that leverages type annotations to define container objects for data. Compared to a normal Python class, data classes do away with much of the boilerplate needed for instantiation, and there are a number of areas where they can add value in data engineering.
Understanding Data Classes
Data classes
The dataclasses module introduces a lightweight way to define objects: it automatically generates methods such as __init__, __repr__ and __eq__ from the fields defined within the class, which remain accessible as plain attributes.
from dataclasses import dataclass

@dataclass
class CustomerDataClass:
    ...  # placeholder body; fields are declared here
As shown above, it relies on a decorator pattern to wrap around classes and enrich them with specific features.
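To make the effect of the decorator more concrete, here is a minimal sketch comparing a hand-written class with a decorated equivalent. The class names `PlainCustomer` and `Customer` and the single `customer_id` field are hypothetical, chosen only for illustration; the generated `__init__`, `__repr__` and `__eq__` methods are the defaults of the standard-library `dataclass` decorator.

```python
from dataclasses import dataclass, is_dataclass


class PlainCustomer:
    """Hand-written class (hypothetical example) with explicit boilerplate."""

    def __init__(self, customer_id: int):
        self.customer_id = customer_id

    def __repr__(self):
        return f"PlainCustomer(customer_id={self.customer_id!r})"

    def __eq__(self, other):
        if other.__class__ is not self.__class__:
            return NotImplemented
        return self.customer_id == other.customer_id


@dataclass
class Customer:
    """Comparable behaviour, with the methods generated by the decorator."""

    customer_id: int


a = Customer(customer_id=1)
b = Customer(customer_id=1)

print(a)                       # Customer(customer_id=1)
print(a == b)                  # True: compared field by field
print(is_dataclass(Customer))  # True
```

The decorated version declares only the field and its type; the constructor, representation and equality check come from the decorator rather than being written by hand.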
Data class and field definitions
A data class is built from a series of fields declared within the class, each with its Python type annotation.
@dataclass
class CustomerDataClass:
    customer_id: int