Building Data Warehouse. Part 4: Using data warehouse

Sergei Stralenia
5 min readMay 23, 2023

--

On my previous article we stopped on setting up the data warehouse database.

Today I will explain how you can use it to get a value for company business. Also we will discuss data pipeline architecture and its variations.

https://unsplash.com/photos/hpjSkU2UYSU

This is the last article of a series about company data warehouse. In this series I explain:

Layers of pipeline architecture

During this articles I would like to introduce some terms that I skipped at the beginning of these series as I didn’t want to make things complicated for you. But as you are almost a data engineer let deep dive into some new terms that will help you to understand the end of this article.

First is Layer. Layer is a logical part of data pipeline. Usually data pipeline is divided into 3 layers:

  1. Data source layer
  2. Data storage layer
  3. Presentation layer
Regular datapipeline

As we discussed data source and storage layers in my previous articles, it’s time to get acquainted with presentation layer.

Presentation layer

Presentation layer is the last building and most important block in the architecture. Frankly speaking it’s a layer for what we built the entire data pipeline!

It is also called a business intelligence (BI) interface and gives access to our data for end users, that are usually data or business analysts that use a presentation layer to create dashboards with data visualisation using BI tools, build business-reports, use data for data science/machine learning, custom report tooling, etc.

Let’s discuss every tool in detail.

Business Intelligence tool

BI tool is an instrument that helps you work with data from a business perspective:

  • Exploring data
  • Aggregate data (e.g group sales by month and find average)
  • Visualizing data in any way: build charts, trend lines, pivot tables
  • Build dashboards with charts and metrics (e.g sales dashboard with sales by day, sales by month, top sales companies, pending sales, etc)
  • Set alerts on critical data parts (e.g if sales count is less than 100 at the end of the day — send me email of slack notification)

There are plenty of tools on the BI market that you can use: Tableau, Microsoft Power BI, Metabase, Apache Superset, Qlik, etc.

As usual the choice depends on different factors:

  • Price (there are great open source alternatives such as Metabase and Apache Superset)
  • Ecosystem you use (e.g. if your company uses Microsoft products Power BI looks good choice for you)
  • Provided set of functions (Tableau is very powerful and flexible)
  • etc

Business/custom reports

BI tool is a powerful instrument but, of course, it cannot cover all possible needs. For example, if your client wants to get a specific report every Monday on email it’s the case.

As you have cleaned and well-organized data in one place you can build any reporting tools or systems using in-house team or hire a freelancer for this purpose.

Data Science

If you have ever built data science systems you know that first and the most important part of building system is collecting, cleaning and organizing data. But wait, we have already done it during building the data warehouse. That’s why it is prefect for data science tasks.

Architecture variations

In my second article I said that I would explain the standard way to build data pipeline architecture but I can leave you without briefly explaining the others.

Tiers of pipeline architecture

There is another term that I want to share: N-tier architecture. It says about number of stages that you have in your data storage layer.

Let’s overview every architecture by examples.

Two-tier architecture

That’s what we built in these series of articles. This architecture is called two-tier because we have 2 databases in storage layers: data lake and data warehouse.

One-tier architecture

In this architecture you build only a data lake and connect the presentation layer to it.
There are bunch of reasons to select this architecture:

  • No need to create data warehouse (saves money, time and effort to build and support it)
  • Modern BI tools can work with unstructured data (for example Amazon Athena can query data in json files stored in S3 buckets)
  • ETL jobs can process unstructured data and store handled data back to data lake
One-tier architecture

This approach became very popular nowadays and has its own name Lakehouse.

Three-tier architecture

According to this architecture you should create the next stage in the data storage layer that is built from a data warehouse. This stage is called Data mart.

Three-tier architecture

Data Marts are created for a special subdomain of your product, for business function or group of people. For example, there might be a data mart that contains information only about product and its deliveries and another data mart with whole company information for the financial department.

On the image above I drew data mart as a separate database, but in reality data mart is usually created by using data warehouse views.

Summary

Presentation layer gives access to data for end users through BI tools, reporting systems and data science.
Data architecture might be one, two and three tiers depending on how much stages your data storage layer has.

My congratulations, you are a data engineer now! I’m joking. This rabbit whole is deeper than you think and we didn’t cover many nuances and pitfalls but I gave you strong fundamentals which help you to build data architecture at any scale.

Thank you for reading! 🙏

--

--

Sergei Stralenia

Developer Success Lead at Paralect/Tech consultant at goLance Inc