Dani
5 min read · Jul 4, 2022

Read the full article for free at: https://www.arecadata.com.

High-performance open-source Data Lakehouse at home

Components of the Data Lakehouse

Ever wanted to deploy your own Data Lake and, on top of it, a so-called Lakehouse architecture? The good news is that it's now easier than ever with tools like Minio, Trino (with its multitude of connectors), and others. In this article we'll cover how these components actually fit together to form a “Data Lakehouse”, and we'll deploy an MVP version via Docker on our own machine to run some analytical queries.
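To give a feel for how the pieces line up, here is a minimal Docker Compose sketch wiring Minio (S3-compatible object storage) to Trino (the query engine). Service names, ports, and credentials are illustrative assumptions, not the exact setup from the linked repository; a real deployment also needs a metastore service (e.g. Hive Metastore) for the Iceberg connector, omitted here for brevity.

```yaml
# Sketch only: Minio as the object store, Trino as the query layer.
version: "3.8"
services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: minio        # illustrative credentials
      MINIO_ROOT_PASSWORD: minio123
    ports:
      - "9000:9000"   # S3 API endpoint
      - "9001:9001"   # web console
  trino:
    image: trinodb/trino
    ports:
      - "8080:8080"   # Trino coordinator / web UI
    volumes:
      # Catalog configs live here, e.g. an iceberg.properties
      # pointing the Iceberg connector at Minio.
      - ./etc/catalog:/etc/trino/catalog
```

The key design point is that storage (Minio) and compute (Trino) are completely separate containers talking over the S3 API, which is exactly the decoupling the Lakehouse architecture relies on.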

Code showcased is available here: https://github.com/danthelion/trino-minio-iceberg-example

Data Lake? Lake House? What the hell?

Different Data architectures (source: https://databricks.com/blog/2020/01/30/what-is-a-data-lakehouse.html)

The term “Data Lakehouse” was coined by Databricks, who define it as follows:

In short, a Data Lakehouse is an architecture that enables efficient and secure Artificial Intelligence (AI) and Business Intelligence (BI) directly on vast amounts of data stored in Data Lakes.

Basically, if you have a ton of files lying around in an object storage such as S3 and you would like to run complex analytical queries over them, a Lakehouse can help you achieve…
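As a taste of what those analytical queries look like in practice, here is a sketch of Trino SQL against an Iceberg catalog backed by object storage. The catalog, schema, table, and bucket names are assumptions for illustration, not taken from the repository above.

```sql
-- Assumes a Trino catalog named "iceberg" configured with the Iceberg
-- connector and backed by a Minio bucket (all names are illustrative).
CREATE SCHEMA IF NOT EXISTS iceberg.lakehouse
WITH (location = 's3a://lakehouse/');

CREATE TABLE iceberg.lakehouse.events (
    event_time timestamp,
    user_id    bigint,
    action     varchar
);

INSERT INTO iceberg.lakehouse.events
VALUES (current_timestamp, 1, 'login');

-- An analytical query over data that physically lives in object storage:
SELECT action, count(*) AS cnt
FROM iceberg.lakehouse.events
GROUP BY action;
```

Once the table is registered, anything that speaks SQL through Trino can query the files in the lake without copying them into a warehouse first.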
