Part 1 — Modernising a Data platform & BigQuery concepts

Nikhil (Srikrishna) Challa
Jul 5, 2020


Modernisation is all about adopting the latest trends without deviating from the core principles

I often marvel at how the information on the Internet is so vast and yet so scattered, which makes life a little tougher for anyone keen to gather study material. Expecting practical examples in the available documents makes it tougher still.

Since I was one of you, facing similar challenges while learning, I thought it would be a good idea to bring together all the information that I gathered over months and years and apply my experience to it, in the hope that readers will be able to relate, visualise and, more importantly, learn and find it useful.

In this series, I intend to share my experience of modernising data warehouses. I believe that, with the advent of cloud computing, the next big thing in the world of data is “Modernisation”.

On a fine Sunday morning, when I started talking to my wife about some of these topics, my son asked me, “Daddy, what did you just say? ‘Modetation’!?” (Well, yes, the jargon had caught his attention. It didn’t take me long to realise that he meant “Modernisation”.)

I gave him a piece of chalk and a slate to practise writing the alphabet. He immediately said, “Why do I need them when I can use an iPad and do it with my hand?” I said, “Yes, you can, and that is called ‘modernisation’”: the process of adopting what is most recent rather than persisting with the remote past.

Applying the same principle to data, data warehouse modernisation is the process of adopting modern practices, which is all about making the lives of users and practitioners easier and making the technology accessible to people with less expertise. I would also say that democratisation and modernisation go hand in hand. (I am not getting into the details here, as that is best left for another day.)

Below is the list of contents for this series (there may be more parts than listed here, so stay tuned!):

Part 1 — Introduction to Data warehouse

Part 2 — Explaining Data Warehouse with a practical example

Part 3 — What does it take for a DWH to be modernised?

Part 4 — The Technology behind Modern Analytical Platforms

Part 5 — Key concepts of BigQuery — I

Part 6 — Key concepts of BigQuery — II

Part 7 — Key concepts of BigQuery — III

Part 8 — TBD

The data warehouse as a concept was first introduced in the 1980s, when the idea of data-driven decisions started becoming prominent.

An article published in the IBM Systems Journal, “An architecture for a business and information system”, introduced a new term: “business data warehouse”.

An excerpt from the article, below:

“The transaction-processing environment in which companies maintain their operational databases was the original target for computerisation and is now well understood. On the other hand, access to company information on a large scale by an end user for reporting and data analysis is relatively new. Within IBM, the computerisation of informational systems is progressing, driven by business needs and by the availability of improved tools for accessing the company data.”

A few key ideas have paved the way for the modernised DWH architecture that we see today, such as “processing environment”, “data on a large scale”, “compute” and “fast analytical processing”.

That is what today’s modern data warehouses are all about: fast processing, the ability to accommodate data whether it is large, fast-moving or “big”, and easier integration with upstream and downstream systems.

A simple representation of a DWH architecture

Three key aspects of a traditional DWH are:

· Data Collection

· Data Storage

· Data Analysis

This time, my son hasn’t asked me, but if he does, I would probably explain it to him with the following analogy:

He likes stories, as any kid would, for that matter. He especially likes stories that involve real-life characters, which make them more interesting.

In order to tell a story, I would first gather the names of all his friends and the people he interacts with on a regular basis. This is the Data Collection phase, where I gather data from the different incidents that he is a part of.

Normally I remember all those names (information/data), which means they are all in my head. But with everything else that I usually have going on, I might forget those names by the next round of story-telling, so I save them in a notepad. This phase is called Data Storage.

I then try to analyse the incidents to come up with an interesting plot or story, which may or may not satisfy him, but I can always refine it or add characters to make it more interesting. This is the Data Analysis phase.

In the process of story-telling, I mould the characteristics of the people in the story to fit the narrative. This is called data transformation.

Applying similar concepts to our original topic, organisational data is scattered across many places. Customer data is in data source 1, transactional data is in data source 2, call centre data is in data source 3, and so on. I need to bring them all together to get a 360-degree view of what is happening in my organisation. This is the core concept of a data warehouse.
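To make the idea of bringing scattered sources together a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three lists stand in for data sources 1, 2 and 3, and the field names, the sample values and the `build_customer_360` helper are hypothetical. A real warehouse would load from actual systems rather than in-memory lists.

```python
# Hypothetical rows from three separate source systems (all names and values are made up).
customers = [  # data source 1: customer data
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ben"},
]
transactions = [  # data source 2: transactional data
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 75.5},
]
calls = [  # data source 3: call centre data
    {"customer_id": 2, "reason": "billing query"},
]


def build_customer_360(customers, transactions, calls):
    """Combine the three sources into one record per customer.

    Assumes every transaction and call refers to a known customer_id.
    """
    view = {
        c["customer_id"]: {"name": c["name"], "total_spend": 0.0, "call_count": 0}
        for c in customers
    }
    for t in transactions:
        view[t["customer_id"]]["total_spend"] += t["amount"]
    for call in calls:
        view[call["customer_id"]]["call_count"] += 1
    return view


customer_360 = build_customer_360(customers, transactions, calls)
for customer_id, record in sorted(customer_360.items()):
    print(customer_id, record)
```

Each customer ends up with a single record that carries facts from all three sources, which is the “360-degree view” in miniature.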

A DWH is basically a three-tier architecture:

· Storage layer — To store the data

· Compute layer — To process the data

· Service layer — To present the insights/Tell the story
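The three layers above can be sketched roughly as three small Python functions. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the storage layer, and the `sales` table, its sample rows and the function names are hypothetical rather than part of any particular product.

```python
import sqlite3


def storage_layer():
    """Storage layer: keep the data (here, an in-memory SQLite table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        [("North", 100.0), ("South", 50.0), ("North", 25.0)],
    )
    return conn


def compute_layer(conn):
    """Compute layer: process the stored data (here, a simple aggregation)."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    return conn.execute(query).fetchall()


def service_layer(rows):
    """Service layer: present the insight in a readable form."""
    return [f"{region}: {total:.2f}" for region, total in rows]


conn = storage_layer()
report = service_layer(compute_layer(conn))
conn.close()
print("\n".join(report))
```

Keeping the three functions separate mirrors the point of the tiers: the way data is stored, the way it is processed and the way it is presented can each change without rewriting the other two.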

contd…
