Building OLAP Dimensional Model in BigQuery, using dbt as a Data Transformation Tool.

Chisom Nnamani
Towards Data Engineering
5 min readDec 12, 2023

--

Data Engineering Project

Image design by author

This project is about building a dimensional data warehouse in BigQuery by transforming an OLTP system in MySQL into an OLAP system in BigQuery, using dbt as our data transformation tool.

The OLTP system contains the sales data of Northwind; a specialty food export-import company.

Aim

To bring Northwind’s data reporting up to speed by embracing Dimensional Modeling.

Introducing OLAP for Northwind OLTP Database

What’s the current setup or architecture?

  • Northwind traders are companies that buy and sell special foods worldwide.
  • This is a practice database made by Microsoft to showcase its product features and for learning purposes.
  • The current setup combines on-site and older systems.
  • They use MySQL for their main daily sales transactions.
  • MySQL is also used for creating and running reports, but it’s not efficient because analytical queries slow down the transaction system.

Why do we need a new setup or architecture?

  • To scale up more easily.
  • To make reports faster.
  • To ease the burden on day-to-day operations.
  • To enhance data security with better access control.

How do we set up a new system or architecture?

  • Northwind traders can shift their current database to GCP.
  • The on-site MySQL can be swapped with a fully managed cloud SQL.
  • To handle reports, we’ll create an OLAP data warehouse on GCP using BigQuery.
  • We’ll construct a Dimensional Data Warehouse on BigQuery following Kimball’s method, incorporating dim and fact tables.

Identifying Business Requirements

Throughout the interview process with the business and stakeholders, the following business processes were identified:

Sales Overview:

Overall sales reports to understand better, what is being sold to our customers, what sells the most, where, and what sells the least, the goal is to have a general overview of how the business is going.

Product Inventory:

Understand the current inventory levels and how to improve stock management, what suppliers we have, and how much is being purchased. This will allow Northwind to understand stock management and potentially land better deals with suppliers

Customer Reporting:

Allow customers to understand their purchase orders, and how much and when they are buying, empowering them to make data-driven decisions while Northwind utilizes this data in combination with its sales data.

This means the business is looking forward to getting insights on sales overview, product inventory, and customer reporting.

Identifying required tables from ERD

ERD of the MySQL OLTP system

From the above ERD diagram of the OLTP transactional system, we identify the following required tables that will enable us to meet the business requirements:

  • Customers —Customers who buy items from Northwind
  • Employees — Those who work for Northwind
  • Orders — Sales Order transactions taking place between the customers & Northwind
  • Order Details — Order Details for the Orders placed by the customer
  • Inventory Transaction — Transaction details of each inventory
  • Products — Current Northwind products that customers can purchase
  • Shippers — Shipped orders from Northwind to customers
  • Suppliers — Supplies Northwind with required items
  • Invoices — Invoice created for each order

Proposed Data Modelling Concepts

After a series of interviews with the business stakeholders and understanding their business requirements, we proposed the following data modeling concepts:

  • Conceptual Data Model
Data model design by author
  • Logical Data Model
Data model design by author
  • Physical Data Model
Data model design by author

Proposed Kimball Data Warehouse Architecture

Below is the proposed data warehouse architecture, which focuses on how the data flows from the old MySQL OLTP database to the new modernized dimensional data warehouse in BigQuery during the migration.

Image design by author

With the insights from the data modeling and the data warehouse architecture design, we go ahead to create the three layers (datasets) in Bigquery using dbt. These (staging, warehouse, and analytics_obt or reporting) layers are identified by the “dbt prefix”.

Data layers created by the author

The above three layers help to achieve the business requirements of Northwind and Sales Overview, Product Inventory, and Customer Reporting processes can now be carried out effectively to draw out insights.

Results

  • The new Data Warehouse uses Bigquery for analytics and Business Intelligence which is more efficient than the previous MySQL system.
  • The Reporting is derived from One Big Table denormalized from Dimensional models
  • Sales Overview, Product Inventory, and Customer Reporting processes can now be carried out effectively to draw out insights

New to dbt?

Check out the full project documentation on GitHub here, and there you will find the commands on how to install dbt, connect to BigQuery, and create the different data model layers.

Resources:

  • Learn more about dbt [in the docs]
  • Check out [Discourse] for commonly asked questions and answers
  • Join the [chat] on Slack for live discussions and support
  • Find [dbt events] near you
  • Check out [the blog] for the latest news on dbt’s development and best practices

I hope you found this article interesting. Follow me on X (formerly Twitter) and connect with me on Linkedin! 💡

--

--