Data Lakes, Data Warehouses & Data Hubs

How to combine them?

Published in

Pernod Ricard Tech

3 min read · Mar 15, 2020

Data Lakes, Data Warehouses and Data Hubs are key components of every data landscape. Knowing the strengths and weaknesses of each is important, and combining them in an agile way can be a challenge. In this article, you’ll find a proposal for combining them in a simple way.

First, let’s briefly agree on the definition of each component.

The Data Lake

The Data Lake is a central repository that contains both structured and unstructured data. Data quality is usually low, because the data is stored raw, without transformation.

“A gigantic shared folder with millions of files, more or less organized”

The Data Warehouse

The Data Warehouse is a central repository of structured data from multiple sources. Data quality is high, and the data is used for reporting and dashboarding. Data is typically loaded into the Data Warehouse through ETL (extract, transform, load).

“A big database to power reports and dashboards”
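As an illustration of the ETL loading step mentioned above, here is a minimal sketch. It assumes an in-memory SQLite database standing in for the warehouse and a hard-coded list standing in for raw source data; the `RAW_SALES` records, the `sales` table and all function names are hypothetical and chosen only for this example.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from source systems:
# untyped strings, inconsistent formatting, some invalid values.
RAW_SALES = [
    {"country": " fr ", "amount": "120.50"},
    {"country": "US", "amount": "80"},
    {"country": "FR", "amount": "not available"},  # invalid, dropped in transform
]


def extract():
    """Extract: read the raw records (here, a copy of an in-memory list)."""
    return list(RAW_SALES)


def transform(rows):
    """Transform: normalize country codes, cast amounts, drop invalid rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        clean.append((row["country"].strip().upper(), amount))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a structured table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(country TEXT NOT NULL, amount REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO sales (country, amount) VALUES (?, ?)", rows)
    conn.commit()


def sales_by_country(conn):
    """A reporting query of the kind a dashboard might run."""
    cur = conn.execute(
        "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(conn, transform(extract()))
report = sales_by_country(conn)
```

The point of the sketch is the ordering: data is cleaned and typed before it reaches the warehouse table, which is why reports built on top of it can rely on high data quality.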

The Data Hub

The Data Hub is a central place for sharing data and facilitating data exchanges between applications. Data quality is high. Applications typically connect to the Data Hub through APIs.

“A database surrounded by APIs to facilitate exchanges with applications”
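To make the idea of a store surrounded by APIs more concrete, here is a minimal in-memory sketch. The `DataHub` class and its `register_entity`, `publish` and `fetch` methods are hypothetical names invented for this example; a real hub would expose such operations over a network API and persist the data in a database.

```python
class DataHub:
    """Hypothetical in-memory hub: a record store plus a small API around it."""

    def __init__(self):
        self._required = {}  # entity name -> required field names
        self._records = {}   # (entity name, key) -> record

    def register_entity(self, entity, required_fields):
        """Declare which fields every record of this entity must carry."""
        self._required[entity] = tuple(required_fields)

    def publish(self, entity, key, record):
        """Accept a record from an application, rejecting incomplete ones."""
        missing = [
            field
            for field in self._required[entity]
            if record.get(field) in (None, "")
        ]
        if missing:
            raise ValueError(f"{entity} {key!r} is missing fields: {missing}")
        self._records[(entity, key)] = dict(record)

    def fetch(self, entity, key):
        """Return a copy of a record so callers cannot mutate the hub's state."""
        return dict(self._records[(entity, key)])


hub = DataHub()
hub.register_entity("customer", ["name", "country"])

# One application publishes a customer, another one reads it back.
hub.publish("customer", "C001", {"name": "Acme", "country": "FR"})
customer = hub.fetch("customer", "C001")
```

Validating records at the point of entry is one simple way to keep quality high: an incomplete record is rejected before any other application can consume it.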

Comparison

A basic summary of each component:

Combining them together

Since they all serve different purposes, you will quickly end up deploying all of them. Here is a proposal for how to position them in your landscape:

Conclusion

You now have the basics to position your Data Warehouse, Data Lake and Data Hub relative to one another.

Sources of inspiration

Here is a non-exhaustive list of articles that inspired this post.

How to differentiate a Data Hub, a Data Lake and a Data Warehouse

Data Hubs are getting more attention as many enterprises are looking at the different solutions in the market to build…

blog.semarchy.com

Data lakes, hubs and warehouses - when to use what

Often in any "technical" field (and I use that term very loosely), it can be quite hard to differentiate between the…

blogs.dxc.technology