Data Lakehouse vs. Data Lake
What the differences are and how they build on each other
As the classic Data Warehouse is replaced by modern, often cloud-based systems such as Data Lakes, certain problems arise. Because a Data Lake is a large container for all kinds of data, much of it still raw, that data is hard to use directly in, for example, self-service BI tools. This is where the Data Lakehouse comes into play: it combines elements of the Data Lake and the classic Data Warehouse.
Data Warehouses and Data Lakes
Data Lakes and Data Warehouses are both established terms when it comes to storing Big Data, but they are not synonymous. As mentioned above, a Data Lake is a large pool of raw data whose purpose has not yet been determined. A Data Warehouse, on the other hand, is a repository for structured, filtered data that has already been processed for a specific purpose [1].
While Data Warehouses use the classic ETL process (extract, transform, load) in combination with structured data in a relational database, a Data Lake typically follows the ELT paradigm (extract, load, transform) and applies a schema on read, often to unstructured data [2].
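The contrast between ETL with a fixed schema and ELT with schema on read can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample events, the two-field "order" schema, and the function names `parse_order`, `etl_load`, `elt_load`, and `read_with_schema` are hypothetical, and plain Python lists stand in for a real warehouse table and a real lake storage layer.

```python
import json

# Hypothetical raw events as they might arrive from a source system.
RAW_EVENTS = [
    '{"user": "anna", "amount": "19.90"}',
    '{"user": "ben", "amount": "5", "note": "extra field"}',
    'not even valid JSON',
]


def parse_order(line):
    """Apply the assumed schema (user: str, amount: float).

    Returns a typed record, or None if the line does not fit the schema.
    """
    try:
        data = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or "user" not in data or "amount" not in data:
        return None
    try:
        amount = float(data["amount"])
    except (TypeError, ValueError):
        return None
    return {"user": str(data["user"]), "amount": amount}


def etl_load(lines):
    """Warehouse style (ETL): transform first, store only conforming rows."""
    table = []
    for line in lines:
        record = parse_order(line)
        if record is not None:
            table.append(record)
    return table


def elt_load(lines):
    """Lake style (ELT): store every line unchanged, valid or not."""
    return list(lines)


def read_with_schema(lake):
    """Schema on read: the schema is applied only when the data is queried."""
    return [record for record in map(parse_order, lake) if record is not None]


warehouse = etl_load(RAW_EVENTS)
lake = elt_load(RAW_EVENTS)
```

In this sketch the warehouse ends up holding only the two rows that fit the schema, while the lake keeps all three raw lines, including the malformed one. Calling `read_with_schema(lake)` yields the same typed rows as the warehouse, but the raw input remains available in the lake for other, later interpretations.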