What are the differences: Data warehouse, Database, Data mart and Data lake
If there is one thing I don’t wish to be asked about, it’s the difference between guarantee and warranty. Can anyone relate?
Words like guarantee and warranty are so interchangeably used that the subtle difference in their meanings gets lost, even though that is precisely what differentiates these words apart. Similarly, in a database management system, you might have heard jargon like “data warehouse”, “database”, “data mart” and “data lake” being thrown around. But how exactly do we distinguish these terms?
Consider that we have three tables, one of which contains a list of humanitarian organisations, another which includes the various sectors that each organisation contributes to, and a third which specifies the amount of funding invested by the organisations into each sector. The combination of these tables form a database, as illustrated by the Fund Transaction database below.
A database has a defined structure consisting of rows and columns. It is preferably used for transactional purposes, and therefore stores data that changes on a real time basis. This could include financial transactions or other activities like password resets.
In addition to Fund Transaction databases like the one above, a company might need several other databases to support its work. These could include a Humanitarian activity progress database or a Sub-sector database for example. When this is the case, a data warehouse collects the data from these various databases and stores it all in a single location.
What’s the purpose of having a data warehouse though?
Suppose a humanitarian organisation makes several financial contributions towards a sector. The transaction table in the Fund Transaction database will update the Amount column value to the latest amount contributed. However, this does not allow you to view the previous transactions as only the most recent value will be added to the table. The data warehouse is therefore used to store historical data, as opposed to a database which stores current data. It is used for analytical purposes, and is the source for reports and dashboards to extract insights.
Now that you have an idea of what databases and data warehouses are, let’s move on to data marts.
A data mart is very similar to a data warehouse, except for the fact that it focuses on a particular subject or department. If for instance you work in the department dedicated to the Finance Sector, you may wish to access only the data relevant to that sector without having to filter through anything else. That’s exactly what the data mart does: it saves processing time by storing only your essential data.
If all the above jargon is clear then let’s proceed to our final term: Data Lakes.
If a data mart is like a swimming pool with fixed boundaries and cleansed water, then a data lake is like flowing water with no limitations. A data lake stores data just like data marts and data warehouses do, but it serves a different purpose. Unlike the others, the data lake stores all forms of data — whether it is structured, semi-structured or unstructured. It is like a data dump with the intention that it might be used in the near future. While a data warehouse is used for repetitive and structured reporting like monthly fund transaction reports or analysing fund transfers by region, data lakes complement the warehouse by providing recommendations as to which organisation has been successfully working towards a particular sector depending on the progress of the humanitarian activity initiated. In doing so, data lakes help in accelerating analytics.
Though I still continue to not know the exact difference between warranty and guarantee, I hope this tech jargon now makes sense to you!
— — — — — — — — — — — — — — —
Did you like this article? Read other technical topics explained in simple language at https://www.fieldsdata.org/blogs
Fields Data is a humanitarian data-preparedness organisation leveraging local expertise to mitigate the effects of disasters.