From Design to Deployment: Data Warehousing with Azure Synapse Analytics (Part 2: Data Modeling and Schema Design)

Tahnik Ahmed
6 min readApr 21, 2024

--

This blogpost is a continuation of, From Design to Deployment: Data Warehousing with Azure Synapse Analytics (Part 1: Architecture)

Data Modeling and Schema Design for the Data Warehouse:

As I’ve mentioned earlier, the notion of this series of articles is to build data warehouse which supports the data consumers’ needs like Analytical Modeling, Ad hoc Analysis, Predictive Modeling, Machine Learning Model Training and Deployment etc. for companies with retailing operations. To achieve the state and maturity to support those needs, the data modeling and schema designing should be as important as prepping the whole infrastructure itself.

Conceptual Data Modeling:

What does an ideal data model for a retail company would look like? Fact and dimensions tables are key components to design relational data warehouse [1] [2]. Let’s begin with the conceptual data modeling for a retail company which has two sales channels:

Online Sales Channel and Reselling Sales Channel (Through Employees)

If we compare the context of this retail company with facts and dimensions, those channels above may have dimensions like, Products, Customers, Resellers, Employees, Stores, Sales Territory, Accounts, Currency, Date etc. [4] [6]. These dimensions may have clustered dimensions like Product Category, Product Subcategory etc. and dimension like Date may have more grains in terms of granularity. If we try to visualize this conceptual data model, we may have the following diagram:

“Conceptual Data Model for the Data Warehouse”

Physical Data Modeling:

Now let’s build the physical data model with the necessary fact and dimensions tables in the context of operations within a retail company. [5] The attributes along with their datatypes are mentioned in this model. The facts and dimensions tables for the data warehouse are as follows,

“Physical Data Model for the Data Warehouse”

Schema Design:

The design for the warehouse is basically built around snowflake schema [1] [2] [3]. In a snowflake schema,

  • Dimension tables are normalized into multiple related tables, creating a hierarchical or “snowflake” structure.
  • The central fact table is surrounded by a perimeter of dimensions, and at least one dimension keeps its levels separate.

In the context of retail businesses, data modeling plays a crucial role. Specifically, the snowflake schema is commonly used. It involves normalizing dimension tables by breaking them into related tables, creating a hierarchical structure. This approach helps optimize querying large amounts of data efficiently, especially when dealing with complex attribute hierarchies or shared attributes across dimensions.

Details of the Data Model and Schema:

All the tables mentioned in this blogpost can be found in this repository,

You can also find them in your Synapse Workspace if you have already used the ARM (Azure Resource Manager) template provided in this repository.

Fact Tables:

Fact tables store quantitative data related to business events or transactions. [5] They serve as the central repository for metrics and measurements. In a retail context, fact tables typically capture information about sales, orders, and inventory.

FactInternetSales:

  • This table likely contains details about online sales transactions.
  • Columns may include order IDs, product IDs, quantities sold, prices, and timestamps.
  • Useful for analyzing sales trends, revenue, and customer behavior in the online channel.

FactResellerSales:

  • Tracks sales made through resellers or distributors.
  • Similar to FactInternetSales but specific to reseller channels.
  • Helps evaluate reseller performance, inventory turnover, and profitability.

Dimension Tables:

Dimension tables provide descriptive context for data in fact tables. They contain attributes used for categorization and filtering.

DimAccount.fmt:

  • This table likely contains information related to financial accounts or ledger entries.
  • Attributes may include account IDs, account types (e.g., savings, checking), and balances.
  • Useful for financial reporting and analysis.

DimCurrency.fmt:

  • Stores details about different currencies used in transactions.
  • Includes currency codes (e.g., USD, EUR), exchange rates, and currency names.
  • Enables currency conversion and multi-currency reporting.

DimCustomer.fmt:

  • Contains customer-related data:
  • Customer IDs, names, and contact information.
  • Demographics (age, gender, location).
  • Purchase history and loyalty status.
  • Essential for customer segmentation, marketing, and personalized recommendations.

DimDate.fmt:

  • Represents dates and time periods:
  • Day, month, year, week, etc.
  • May include attributes like holidays, fiscal periods, and seasons.
  • Enables time-based analysis, trend tracking, and seasonality insights.

DimDepartmentGroup.fmt:

  • Likely describes organizational departments or functional groups.
  • Attributes could include department names, IDs, and hierarchical relationships.
  • Useful for organizational reporting and resource allocation.

DimEmployee.fmt:

  • Contains employee-related information:
  • Employee IDs, names, roles, and contact details.
  • Reporting structures (managers, subordinates).
  • Supports workforce analytics, performance evaluation, and HR management.

DimGeography.fmt:

  • Provides geographical context:
  • Regions, countries, cities, postal codes.
  • Latitude, longitude, and time zones.
  • Enables location-based analysis, market segmentation, and regional insights.

DimOrganization.fmt:

  • Describes organizational entities (e.g., companies, subsidiaries).
  • Attributes may include organization names, hierarchies, and ownership relationships.
  • Useful for organizational reporting and governance.

DimProduct.fmt:

  • Contains product-related details:
  • Product names, descriptions, SKUs.
  • Categories, subcategories, and attributes (color, size, etc.).
  • Essential for product analysis, inventory management, and sales tracking.

DimProductCategory.fmt:

  • Focuses specifically on product categories.
  • Hierarchical structure (e.g., electronics → smartphones → accessories).
  • Helps analyze sales by category and understand product groupings.

DimProductSubCategory.fmt:

  • Further refines product categorization.
  • Subcategories within broader categories (e.g., apparel → footwear → sneakers).
  • Useful for detailed product analysis and inventory classification.

DimPromotion.fmt:

  • Stores information about promotional campaigns:
  • Promotion IDs, names, descriptions.
  • Start and end dates, discount codes.
  • Supports analyzing the impact of promotions on sales and customer behavior.

DimReseller.fmt:

  • Likely related to resellers, distributors, or partners.
  • Contains details about reseller organizations:
  • Reseller IDs, names, contact information.
  • Geographic coverage and performance metrics.
  • Helps evaluate reseller effectiveness and sales channels.

DimSalesTerritory.fmt:

  • Describes sales territories or regions.
  • Attributes may include territory names, managers, and geographic boundaries.
  • Useful for sales performance analysis, target setting, and resource allocation.

So, how to design and build such data warehouses in Azure?

The third part of this blogpost discusses about the designing, data modeling, warehousing processes in practicality and how they are implemented thoroughly with Azure data services.

Please follow the next part of this article: From Design to Deployment: Data Warehousing with Azure Synapse Analytics (Part Three: Design and Creation)”, where I attempted to simulate the process of designing and building a data warehouse on Azure for a retail company.

Thumbnail: “From Design to Deployment: Data Warehousing with Azure Synapse Analytics (Part 3: Design and Creation)”

Appendix:

Thumbnail: “From Design to Deployment: Data Warehousing with Azure Synapse Analytics (Part 2: Data Modeling and Schema Design)”

--

--