Trading Data Analytics — Part 1: First steps with DuckDB and Parquet Files

Xavier Escudero
Quant Factory
3 min read · Feb 9, 2024


Data Analytics: MinIO and DuckDB Integration

In this article we’ll integrate DuckDB to read the Parquet files stored previously in MinIO and execute our first queries.

We’ll use DuckDB as the analytical execution engine, which lets us run SQL queries directly on our existing datasets: Pandas DataFrames, Polars DataFrames, CSV files, Parquet files, or traditional databases such as Postgres. With that in mind, we can focus on writing SQL queries to extract the data we want.

Install the JupyterLab Desktop App for your operating system. We’ll use it for exploration and experimentation. After conducting experimentation and analysis within JupyterLab, we may transition to building applications, using, for example, Streamlit.

Start JupyterLab and select File > New > Notebook.

In a new cell, paste the following content:

import duckdb

# Bucket in MinIO where the Parquet files were stored in the previous article
bucket_name = 'screener'

conn = duckdb.connect()

# Install and load the extension that lets DuckDB read S3-compatible storage
conn.execute("INSTALL httpfs")
conn.execute("LOAD httpfs")

# Point DuckDB at the local MinIO server instead of AWS S3
conn.execute("SET s3_region='us-east-1'")
conn.execute("SET s3_url_style='path'")
conn.execute("SET s3_endpoint='server:9000'")
conn.execute("SET s3_use_ssl=false")
conn.execute("SET s3_access_key_id='minio_user'")
conn.execute("SET s3_secret_access_key='minio_password'")

conn.execute(f"CREATE TABLE screener AS SELECT *…
