TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


Introspecting Databases with Airflow and SQLAlchemy

6 min read · Apr 13, 2021


Photo by Paweł Bukowski on Unsplash

For a recent data migration project that utilized Airflow, I needed to connect to a database and automatically introspect its schemas and tables. One database that needed to be introspected had hundreds of schemas, and each schema held dozens of tables, most with different columns and column constraints.

In other words, we had database metadata made up of tens or hundreds of thousands of objects that no one would manually type out into config files or hard-coded Python lists. Knowing that Airflow uses the SQLAlchemy module to communicate with databases as dialect- and engine-agnostically as possible, I had a hunch I could leverage Airflow’s SQLAlchemy connection to the database to generate the lists of schemas, tables, columns, or whatever other database objects I needed to access.

My hunch was right! SQLAlchemy and Airflow’s DB API hook make it easy to get hold of the SQLAlchemy engine for any connected database.
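
To make that concrete, here is a minimal sketch of the pattern, not the exact code from this project. It assumes Airflow 2.x with the Postgres provider installed and a connection registered under the hypothetical ID my_postgres: the hook hands back a SQLAlchemy engine, and SQLAlchemy’s inspector can then enumerate schemas, tables, and columns.

# A minimal sketch: assumes Airflow 2.x with the Postgres provider
# and a connection registered under the hypothetical ID "my_postgres".
from airflow.providers.postgres.hooks.postgres import PostgresHook
from sqlalchemy import inspect

hook = PostgresHook(postgres_conn_id="my_postgres")
engine = hook.get_sqlalchemy_engine()  # SQLAlchemy Engine built from the Airflow connection
inspector = inspect(engine)            # SQLAlchemy runtime inspector

for schema in inspector.get_schema_names():
    for table in inspector.get_table_names(schema=schema):
        columns = inspector.get_columns(table, schema=schema)
        print(schema, table, [column["name"] for column in columns])

From there, the schema, table, and column names can feed dynamic task generation or configuration without anyone hand-writing a single list.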

Preliminaries

There are a couple of preliminary steps to follow along with this demonstration. We’ll use PostgreSQL, but just about any common database will do.

Install Airflow
