What is a Data Lakehouse?

New Paradigm or just a Buzzword?

Christianlauer
Geek Culture

Photo by Luca Bravo on Unsplash

What are Data Lakehouses? Just another buzzword, or actually the successor to Data Lakes and Data Warehouses? To get the advantages of both, many companies have built a hybrid BI environment: they store raw data in a Data Lake and load parts of it into a Data Warehouse as needed. The Data Lakehouse is meant to combine the advantages of Data Lakes and Data Warehouses in one concept. The two systems are no longer operated side by side but are replaced by a single new system.

Data Warehouses vs. Data Lakes

Both Data Lakes and Data Warehouses are established terms when it comes to storing Big Data, but the two terms are not synonymous. A Data Lake is a large pool of raw data whose use has not yet been determined. A Data Warehouse, on the other hand, is a repository for structured, filtered data that has already been processed for a specific purpose [1].

While Data Warehouses use the classic ETL process together with structured data in a relational database, Data Lakes use paradigms such as ELT and schema on read, and often hold unstructured data [2].
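To make the difference a bit more tangible, here is a minimal Python sketch of the two loading styles. It is only an illustration under stated assumptions: the sample events, the two-column schema (user, amount) and all function names are made up for this example and do not come from any particular product.

```python
import json

# Hypothetical raw events as they might arrive from a source system.
RAW_EVENTS = [
    '{"user": "a", "amount": "19.99", "country": "DE"}',
    '{"user": "b", "amount": "5.00"}',
    '{"user": "c", "amount": "oops", "note": "free text"}',
]


def parse_row(line):
    """Apply the assumed schema (user: str, amount: float).

    Returns None if the record does not fit the schema.
    """
    record = json.loads(line)
    try:
        return {"user": str(record["user"]), "amount": float(record["amount"])}
    except (KeyError, ValueError):
        return None


def etl_load(lines):
    """Warehouse style (ETL): transform first, store only rows that fit."""
    table = []
    for line in lines:
        row = parse_row(line)
        if row is not None:
            table.append(row)
    return table


def elt_load(lines):
    """Lake style (ELT): store the raw lines unchanged."""
    return list(lines)


def read_with_schema(lake):
    """Schema on read: the schema is applied only when the data is queried."""
    rows = (parse_row(line) for line in lake)
    return [row for row in rows if row is not None]


warehouse = etl_load(RAW_EVENTS)  # cleaned rows only
lake = elt_load(RAW_EVENTS)       # every raw event is kept
```

In this sketch the warehouse drops the malformed record and the extra fields at load time, while the lake keeps everything and defers the same decision until `read_with_schema(lake)` is called. The lake can therefore still serve a different schema later, at the cost of doing the cleanup on every read.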
