ON Data Engineering

Python’s Data Classes: a Data Engineer’s Best Friend

Data engineering applications of data classes

Julien Kervizic
Published in Hacking Analytics
5 min read · Jul 15, 2021

Photo by Nam Hoang on Unsplash

Data classes are a relatively new addition to Python, first released in Python 3.7. They provide an abstraction layer that leverages type annotations to define container objects for data. Compared to a normal Python class, data classes offer syntactic sugar that removes some of the boilerplate around instantiation, and there are a number of areas where data classes can add value to data engineering.

Understanding Data Classes

Data classes

The dataclasses module introduces a lightweight way to define objects, exposing the fields defined within a class as attributes that can be read and set directly.

from dataclasses import dataclass

@dataclass
class CustomerDataClass:
    pass  # fields are declared in the next section

As shown above, it relies on the decorator pattern: @dataclass wraps a class and enriches it with generated methods such as __init__, __repr__ and __eq__.
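
To make the decorator's effect more concrete, here is a minimal sketch comparing a hand-written class with a data class that behaves the same way. The names `CustomerPlain` and `CustomerData` and the sample value 42 are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, is_dataclass


class CustomerPlain:
    """Hand-written version: the boilerplate has to be spelled out."""

    def __init__(self, customer_id: int):
        self.customer_id = customer_id

    def __repr__(self) -> str:
        return f"CustomerPlain(customer_id={self.customer_id!r})"

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, CustomerPlain):
            return NotImplemented
        return self.customer_id == other.customer_id


@dataclass
class CustomerData:
    """Decorated version: __init__, __repr__ and __eq__ are generated."""

    customer_id: int


plain = CustomerPlain(42)
data = CustomerData(42)

print(plain)                        # uses the hand-written __repr__
print(data)                         # uses the generated __repr__
print(data == CustomerData(42))     # generated __eq__ compares field values
print(is_dataclass(CustomerData))   # the decorator marks the class as a data class
```

Both classes end up with equivalent behaviour; the data class simply gets there with a single annotated field.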

Data class and field definitions

A data class is built from a series of fields declared in the class body, each with its Python type annotation.

from dataclasses import dataclass

@dataclass
class CustomerDataClass:
    customer_id: int
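
As a rough sketch of how these annotated fields can be used, the example below extends the class with two extra fields and inspects it with `fields()` and `asdict()` from the standard library. The `name` and `is_active` fields and the sample values are assumptions added for illustration; they are not part of the original definition. Note that Python does not enforce the annotations at runtime.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class CustomerDataClass:
    customer_id: int
    # Hypothetical extra fields with defaults, added only for illustration.
    name: str = "unknown"
    is_active: bool = True


customer = CustomerDataClass(customer_id=1, name="Ada")

# fields() exposes each declared field together with its annotation.
schema = {f.name: f.type for f in fields(CustomerDataClass)}
print(schema)

# asdict() converts the instance into a plain dictionary.
record = asdict(customer)
print(record)

# Annotations are not checked at runtime, so this does not raise.
unchecked = CustomerDataClass(customer_id="not-an-int")
print(type(unchecked.customer_id).__name__)
```

Because the annotations are only metadata, any validation of incoming values has to be added separately, for example in a `__post_init__` method.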

Julien Kervizic
Hacking Analytics

Living at the interstice of business, data and technology | Head of Data at iptiQ by SwissRe | previously at Facebook, Amazon | julienkervizic@gmail.com