ETL in the age of LLMs

Thibaut Gourdel
3 min read · Jul 11, 2024

ETL is decades old, and its core concepts have probably never faced a technological shift this significant. ETL has evolved over the years, moving from on-premises to cloud-based architectures and embracing ELT, but those changes largely relied on the same fundamentals. The advent of GenAI and LLMs is different: it affects both how ETL is developed, through new kinds of automation, and how ETL itself functions, with access to new intelligent capabilities. Let’s explore these two angles and examine what is evolving:

Automated and More Efficient Development

  • 🗣️ Natural language: The first and most obvious use case is using natural language to generate pipelines or pipeline components. Data and ETL pipelines often involve repetitive tasks, and LLMs are typically effective at generating working code for them. LLMs can also automate complex tasks such as data mapping by recognizing patterns and relationships between fields, something that was extremely difficult to automate before. Generating pipelines and other data tasks from natural language is clearly where LLMs have a strong impact today, and where the data space is headed.
  • 📖 Automated documentation: Documentation has always been a critical aspect of data pipeline development. The data that ETL processes extract and load is…
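
To make the data-mapping idea from the natural-language bullet more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the column names, the prompt wording, and the hard-coded `llm_response` string are all hypothetical, and no real model or vendor API is called. The point it shows is that a mapping proposed by an LLM can be checked against the known schemas before it is applied to any data.

```python
import json

# Hypothetical source and target schemas, invented for this illustration.
SOURCE_COLUMNS = ["cust_nm", "cust_email", "ord_dt", "amt"]
TARGET_COLUMNS = ["customer_name", "email", "order_date", "amount"]


def build_mapping_prompt(source_columns, target_columns):
    """Build the natural-language request that would be sent to an LLM."""
    return (
        "Map each source column to the best matching target column.\n"
        f"Source columns: {source_columns}\n"
        f"Target columns: {target_columns}\n"
        "Answer with a JSON object of the form {source: target}."
    )


def validate_mapping(raw_response, source_columns, target_columns):
    """Parse the model's JSON answer and reject names outside the known schemas."""
    mapping = json.loads(raw_response)
    unknown_sources = set(mapping) - set(source_columns)
    unknown_targets = set(mapping.values()) - set(target_columns)
    if unknown_sources or unknown_targets:
        raise ValueError(
            f"Unknown columns in mapping: {sorted(unknown_sources | unknown_targets)}"
        )
    return mapping


def apply_mapping(rows, mapping):
    """Rename the keys of each row according to the validated mapping."""
    return [
        {mapping[key]: value for key, value in row.items() if key in mapping}
        for row in rows
    ]


# Assumption: a stand-in for what a model might return for the prompt above.
llm_response = (
    '{"cust_nm": "customer_name", "cust_email": "email", '
    '"ord_dt": "order_date", "amt": "amount"}'
)

mapping = validate_mapping(llm_response, SOURCE_COLUMNS, TARGET_COLUMNS)
rows = [
    {"cust_nm": "Ada", "cust_email": "ada@example.com",
     "ord_dt": "2024-07-11", "amt": 42.0}
]
renamed = apply_mapping(rows, mapping)
print(renamed)
```

Keeping validation separate from the model call is a deliberate choice in this sketch: generated mappings can reference columns that do not exist, so the pipeline only trusts names it can verify against both schemas.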


Thibaut Gourdel

I write about data engineering and ETL. I'm building Amphi, a low-code, Python-based ETL for data manipulation and transformation.