Modularize SQL in Jupyter Notebooks Using DuckDB

Enable Modular SQL in Jupyter Notebooks using DuckDB, JupySQL, and Parquet

Sung Kim
Geek Culture

In April 2022, Meta announced its internal data tool, SQL Notebooks, which combines the capabilities of Jupyter and SQL editors for data analytics (https://engineering.fb.com/2022/04/26/developer-tools/sql-notebooks/).

SQL IDE functionality has been available inside Jupyter Notebook for a long time, probably since the early days of Jupyter Notebook’s development. What impressed me in Meta’s tool was the modular SQL functionality. Instead of creating and maintaining overly complex SQL scripts that even the author may struggle to understand later, you can use intermediate query results to break the SQL up into smaller, more manageable, and more understandable scripts.
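To make the idea of intermediate results concrete, here is a minimal sketch of splitting one analysis into two small named steps instead of a single nested query. It is not the notebook workflow itself: the table, column names, and sample rows are invented for illustration, and the code uses DuckDB's Python API if it is installed, falling back to the standard library's sqlite3 only so that the sketch runs anywhere.

```python
# Hypothetical example: the `orders` table and its rows are made up.
try:
    import duckdb  # the engine this article is about

    con = duckdb.connect()  # in-memory database
except ImportError:
    import sqlite3  # stand-in so the sketch runs without DuckDB

    con = sqlite3.connect(":memory:")

con.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 120), ("bob", 40), ("ann", 80), ("cat", 300), ("bob", 10)],
)

# Step 1: aggregate once and keep the result under a readable name.
con.execute("""
    CREATE TEMP TABLE customer_totals AS
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
""")

# Step 2: build on step 1 instead of nesting it as a subquery.
con.execute("""
    CREATE TEMP TABLE big_customers AS
    SELECT customer, total
    FROM customer_totals
    WHERE total >= 100
""")

# Each intermediate result can be inspected on its own.
totals = con.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
big = con.execute(
    "SELECT customer, total FROM big_customers ORDER BY total DESC"
).fetchall()
print(totals)
print(big)
```

Each step stays short enough to read on its own, and a later step refers to an earlier one by name, which is the property that makes the SQL modular.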

Photo by Elena G on Unsplash

I was eager to incorporate this functionality into my daily data analytics process, so I decided to recreate it using a combination of tools, including Jupyter, DuckDB, JupySQL, and Parquet.

  • Jupyter Notebook/Lab is a web-based IDE, mainly used for data analytics. For more information on Jupyter, please refer to https://jupyter.org/.
  • DuckDB is an in-process SQL OLAP database engine. For more information on DuckDB, please refer to my article on the tool —…


A business analyst at heart who dabbles in AI engineering, machine learning, data science, and data engineering. Threads: @sung.kim.mw