Modularize SQL in Jupyter Notebooks Using DuckDB
Enable Modular SQL in Jupyter Notebooks using DuckDB, JupySQL, and Parquet
In April 2022, Meta announced the release of its internal data tool, called SQL Notebooks, which combines the capabilities of Jupyter and SQL editors for data analytics (https://engineering.fb.com/2022/04/26/developer-tools/sql-notebooks/).
Integrating SQL into Jupyter Notebook is not new; it has existed for a long time, probably since the early days of Jupyter Notebook’s development. What impressed me was the modular SQL functionality. Instead of creating and maintaining overly complex SQL scripts that even the author may struggle to understand later, intermediate query results let you break the SQL into smaller, more manageable and understandable scripts.
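To make the idea of modular SQL concrete, here is a minimal sketch of the pattern: one short query is saved as a named intermediate result, and a second short query builds on it by name. It uses Python's built-in sqlite3 module purely so that it runs without installing anything; it is not the DuckDB and JupySQL setup this article is about, and the `orders` table, its rows, and the `customer_totals` view are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory database standing in for the notebook's SQL engine.
# Assumption: sqlite3 is used only because it ships with Python;
# the article itself targets DuckDB.
con = sqlite3.connect(":memory:")

# Hypothetical sample data for illustration.
con.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 120.0), ("bob", 35.5), ("alice", 80.0), ("carol", 15.0)],
)

# Step 1: a small query saved under a name (the intermediate result).
con.execute("""
    CREATE TEMP VIEW customer_totals AS
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
""")

# Step 2: a second small query that refers to step 1 by name,
# instead of nesting everything into one large statement.
big_spenders = con.execute("""
    SELECT customer, total
    FROM customer_totals
    WHERE total > 100
    ORDER BY total DESC
""").fetchall()

print(big_spenders)
```

Each step stays short enough to read and test on its own, which is the property that makes the notebook approach appealing: in a notebook, each step would typically live in its own cell.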
I was eager to incorporate this functionality into my daily data analytics process, so I decided to recreate it using a combination of tools, including Jupyter, DuckDB, JupySQL, and Parquet.
- Jupyter Notebook/Lab is a web-based IDE, mainly used for data analytics. For more information on Jupyter, please refer to https://jupyter.org/.
- DuckDB is an in-process SQL OLAP database engine. For more information on DuckDB, please refer to my article on the tool —…