Functional Data & ML Engineering, Evolutionary Data Architectures; ThDPTh #13

Why functional data engineering is the right approach to batch ETL, why machine learning can use a functional approach as well, and how to build evolutionary data architectures.

--

Data will power every piece of our existence in the near future. I collect “Data Points” to help understand & shape this future.

If you want to support this, please share it on Twitter, LinkedIn, or Facebook.

🎄 (1) Functional Data Engineering

Two years ago, Maxime Beauchemin, the creator of both Apache Airflow and Superset, published an article about why the functional paradigm is as important in data engineering as it is in software engineering. I very much agree, and I feel the community has still not fully absorbed this idea. In fact, I think it carries over just as well to machine learning, where truly functional practices are still the exception rather than the rule.

Functional programming avoids state & mutable data; good functions are testable “wrappers”, unlike much of what happens in data pipelines. I particularly agree with the following quote:

“Thinking of partitions as immutable blocks of data and systematically overwriting partitions is the way to make your tasks functional. A pure task should always fully overwrite a partition as its output.”
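To make the quote concrete, here is a minimal Python sketch, assuming a hypothetical local layout of `<root>/ds=<date>/part-0.csv`. `transform` is a pure function of its input, and the hypothetical `overwrite_partition` helper always replaces the whole partition instead of appending to or patching it, so re-running the task for the same date gives the same result rather than duplicated rows.

```python
import csv
import os
import shutil
import tempfile
from pathlib import Path

# Hypothetical layout for illustration: <root>/ds=<YYYY-MM-DD>/part-0.csv


def transform(rows):
    """Pure function: the same input rows always yield the same output rows."""
    totals = {}
    for row in rows:
        totals[row["user"]] = totals.get(row["user"], 0) + int(row["amount"])
    # Sort so the output is deterministic regardless of input order
    return [{"user": u, "total": t} for u, t in sorted(totals.items())]


def overwrite_partition(root, ds, rows):
    """Fully replace partition ds=<ds> with rows; never append to it."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    final_dir = root / f"ds={ds}"
    # Write the complete new partition into a staging directory first
    staging = Path(tempfile.mkdtemp(prefix=f".ds={ds}.", dir=root))
    with open(staging / "part-0.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["user", "total"])
        writer.writeheader()
        writer.writerows(rows)
    # Drop the old partition entirely, then move the new one into place.
    # Note: rmtree + rename is NOT atomic; this only illustrates the pattern.
    if final_dir.exists():
        shutil.rmtree(final_dir)
    os.replace(staging, final_dir)
    return final_dir


def run_task(root, ds, input_rows):
    """A 'pure task': output partition depends only on ds and its input."""
    return overwrite_partition(root, ds, transform(input_rows))
```

Running `run_task` twice for the same `ds` leaves exactly one partition with identical contents, which is what makes backfills and retries safe. In a data warehouse the same idea is usually expressed with a statement such as Hive's `INSERT OVERWRITE TABLE ... PARTITION (...)`, which handles the swap for you.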

--

Sven Balnojan
Three Data Point Thursday MOVED => www.thdpth.com

Head of Marketing @ Arch | Data PM | “Data Mesh in Action” | Join my free data newsletters at http://thdpth.com/ and http://finishslime.com