Data Warehousing Basics For Food Junkies

Krishna Kanth
Beginner @ Data Science
4 min readOct 21, 2014

--

I took a long time, like more than a month, to come back to journaling about BI here in this collection. Now, why is that?

That’s because, all along this time, I’ve been taking a beginners tour in the land of Data Warehousing and Business Intelligence, and believe me, it is one-of-a-kind domain in the world of Information Technology.

Today, I shall briefly describe how I acquainted with Data Warehousing, and in turn I shall try to explain its basic definition in a practical way.

Data Warehouse:

We shall first take a look at the actual definition:

“A warehouse is a Subject-oriented, Integrated, Time-variant and Non-volatile collection of data in support of management’s decision making process.”

Ok, I know. That was a bit technical, so, below I tried to demonstrate it in a real life scenario, explaining each of its four main features.

A Data Warehouse (DWH, in short) is something that accepts data from multiple sources and in various formats. It stores all that data and helps us in achieving something with it in the future. So, in short, a Data Warehouse is like our refrigerator.

A refrigerator is used to store various kinds of food items that come from different sources – some would be vegetables, some would be meat, some would be eggs, some would be dairy products, some would be beverages, some would be chocolates, and some would be something else.

  1. Subject oriented:

In a fridge, we don’t mix up stuff; we seperate the items by type – vegetables, fruits, milk, meat and etc. We store them in different bowls and containers to differentiate and to quickly and easily access them.

That’s the Subject oriented feature of a DWH.

2. Integrated:

Food items would have different names in different languages, and so people would be addressing the same item differently according to their comfort and habit, but ultimately they all mean the same thing.

For example, consider the following data:

> Male and Female
> M and F

Although they appear different, they both represent gender information only. So, instead of using two different representations, you can generalize them with one specific notation.

So, that’s Integration in a DWH.

All the data stored inside a DWH would be generalized and is brought under one roof for all of its related types.

3. Time Variant:

All the items that you store in a refrigerator were placed there at a particular time – sometimes which goes back to years as well. If you want to take a look at how long has a product been in existence, you can just take a look at its manufactured date, or to look at how long can it be used without bad effects, you will only have to see for its expiration date. This is all possible only when you keep the item, and not modify it or throw it away.

So, the Time Variant feature of a DWH says that it stores data from the long gone past – data in a historical manner.

4. Non-Volatile:

And once something is placed inside a refrigerator, we don’t change it often. We keep it there until we pick it up and use it.

So, the Non-Volatile feature says that the data placed in a DWH is not modified. It is kept the way it came originally. We will refine it, yes, but we don’t change anything. We don’t change the identity of the product, neither its name nor its details. We leave the facts as they are, and we just use them as needed.

How is a DataWarehouse different from a Database:

Whenever we talk about data storage, we usually tend to think about a database. But do not confuse a Data Warehouse with a Database. There is a difference.

While a Data Warehouse is like a refrigerator, a Database would be something like a well-served dinner plate. It holds the items placed in a proper edible order, cooked and ready to be consumed. It is used for our daily meals.

In a DB you modify data very regularly, insert new records, and update the old and original ones. But in a DWH, you almost never modify the original data, you even store data with redundancies, where as in a DB environment redundancies are almost always avoided. As you would be knowing, a DB is filled with data that is normalized. But in a DWH, the data storage is always de-normalized. What more proof do you need to always remember that they both are different from eachother!

So, in a nutshell, a Database is mainly used for an organization’s daily operations and transactions (OLTP), where as a Data Warehouse collects all the information produced from such daily operations, and keeps it for future reference and analysis purposes (OLAP).

That’s a brief introduction I could come up with about Data Warehousing, in a practical manner. To know more, go explore!

And just in case, if all the above explanation seems absurd, forget it and go get something to eat from your refrigerator, and when you do, observe!

--

--