Data Lakes, Data Warehouses & Data Hubs

How to combine them?

Laurent Bel
Pernod Ricard Tech
Mar 15, 2020

Data Lakes, Data Warehouses and Data Hubs are key components of every data landscape. Knowing the strengths and weaknesses of each is important, and combining them in an agile way can be a challenge. This article proposes a simple way to combine them.

First, let’s briefly agree on the definition of each component.

The Data Lake

The Data Lake is a central repository that contains both structured and unstructured data. Data quality is usually low, because the data is stored raw, without any transformation.

“A gigantic shared folder with millions of files, more or less organized”

Data Lake icon that we will use
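
To make the “shared folder” idea concrete, here is a minimal sketch of landing data in a lake. It assumes a local temporary directory stands in for the lake storage; the `land_raw` helper, the source names and the sample payloads are all hypothetical and only illustrate that files are stored exactly as received, with no parsing or validation.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_root: Path, source: str, name: str, payload: bytes) -> Path:
    """Store a payload untouched under <lake_root>/<source>/<name> (hypothetical helper)."""
    target = lake_root / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)  # no parsing, no validation: raw data as received
    return target


# Assumption: a temporary local folder plays the role of the lake storage.
lake = Path(tempfile.mkdtemp(prefix="lake_"))

# Structured, semi-structured and unstructured content all land side by side.
land_raw(lake, "crm", "customers.json", json.dumps([{"id": 1, "name": "Ada"}]).encode())
land_raw(lake, "erp", "orders.csv", b"order_id;amount\n1;12,50\n2;\n")  # incomplete row kept as-is
land_raw(lake, "marketing", "banner.png", b"\x89PNG\r\n")  # placeholder bytes for a binary file

files = sorted(p.relative_to(lake).as_posix() for p in lake.rglob("*") if p.is_file())
print(files)
```

Note that the incomplete order row is accepted without complaint: the lake keeps everything, and quality is dealt with later by whoever reads the data.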

The Data Warehouse

The Data Warehouse is a central repository of structured data coming from multiple sources. Data quality is high, and the data is used for reporting and dashboarding. Data is typically loaded into the Data Warehouse through ETL (extract, transform, load) processes.

“A big database to power reports and dashboards”

Data Warehouse icon that we will use
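
The ETL loading mentioned above can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the warehouse, and the `RAW_ORDERS` sample, the `orders` table and the `extract`, `transform` and `load` functions are hypothetical names, not part of any specific product.

```python
import csv
import io
import sqlite3

# Assumption: raw export from a source system, with a decimal comma and one incomplete row.
RAW_ORDERS = "order_id;country;amount\n1;FR;12,50\n2;FR;\n3;US;7,25\n"


def extract(raw: str) -> list[dict]:
    """Extract: read the raw delimited text into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw), delimiter=";"))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: drop incomplete rows and cast values to proper types."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # reject rows with a missing amount to keep quality high
        amount = float(row["amount"].replace(",", "."))
        clean.append((int(row["order_id"]), row["country"], amount))
    return clean


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: insert the cleaned rows into a typed, constrained table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, country TEXT NOT NULL, amount REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse database
load(conn, transform(extract(RAW_ORDERS)))

# A typical reporting query: revenue per country.
report = conn.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
).fetchall()
print(report)
```

The quality checks live in the transform step, so that reports and dashboards only ever query clean, typed data.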

The Data Hub

The Data Hub is a central place to share data and facilitate data exchanges between applications. Data quality is high. Applications typically connect to the Data Hub through APIs.

“A database surrounded by APIs to facilitate exchanges with applications”

Data Hub icon that we will use
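
As a rough illustration of “a database surrounded by APIs”, the sketch below uses plain method calls as a stand-in for HTTP endpoints and a dictionary as a stand-in for the database. The `DataHub` and `Customer` classes, their fields and the validation rules are hypothetical and chosen only to show one application publishing a record and another reading it back.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    email: str


class DataHub:
    """Hypothetical hub: an in-memory store exposed only through API-like methods."""

    def __init__(self) -> None:
        self._customers: dict[int, Customer] = {}

    def put_customer(self, payload: dict) -> dict:
        """Stand-in for a PUT endpoint: normalize and validate before storing."""
        customer = Customer(
            customer_id=int(payload["customer_id"]),
            name=payload["name"].strip(),
            email=payload["email"].strip().lower(),
        )
        if not customer.name or "@" not in customer.email:
            raise ValueError("rejected: a name and a valid email are required")
        self._customers[customer.customer_id] = customer
        return asdict(customer)

    def get_customer(self, customer_id: int) -> dict:
        """Stand-in for a GET endpoint: return the shared, validated record."""
        return asdict(self._customers[customer_id])


hub = DataHub()

# One application (say, a CRM) publishes a customer...
hub.put_customer({"customer_id": 1, "name": " Ada ", "email": "Ada@Example.com"})

# ...and another application (say, billing) reads the same record through the API.
seen_by_billing = hub.get_customer(1)
print(seen_by_billing)
```

Because every write goes through the API, the hub can enforce quality rules at the entry point, which is what keeps the shared data trustworthy for all connected applications.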

Comparison

A basic summary of each component:

Combining them together

Since they all serve different purposes, you will quickly end up deploying all of them. Here is a proposal for how to position them in your landscape:

Conclusion

You now have the basics to position your Data Lake, Data Warehouse and Data Hub relative to one another.

Laurent Bel
Pernod Ricard Tech

Leading the IT Architecture & Innovation team at Pernod Ricard. Interested in IT technology in general.