Data Warehouse : Fact and Dimension Table

Kalpesh Pawar
3 min readJul 22, 2023

A data warehouse is a large, centralized repository of integrated, historical data from various sources, designed to support business intelligence (BI) and data analytics activities. It provides a structured and optimized environment for storing, managing, and querying data, making it easier for organizations to perform complex analysis and gain valuable insights from their data.

Key characteristics of a data warehouse include:

  1. Integration: Data warehouses integrate data from multiple sources, such as transactional databases, operational systems, spreadsheets, and external data feeds. The data is standardized and transformed into a consistent format for efficient analysis.
  2. Time-Variant Data: Data warehouses store historical data, allowing users to perform trend analysis, track changes over time, and make informed decisions based on historical patterns.
  3. Subject-Oriented: Data in a data warehouse is organized by subject areas or domains relevant to business operations. These subject areas are optimized for analytical queries rather than transactional processing.
  4. Non-Volatile Data: Once data is loaded into the data warehouse, it is considered non-volatile, meaning it is not frequently updated or changed. Instead, historical records are retained and new data is added over time.
  5. Denormalized Structure: Data in a data warehouse is often denormalized or structured in a way that simplifies and accelerates queries, improving the performance of analytical processes.
  6. High Performance: Data warehouses are designed for fast querying and reporting, enabling users to retrieve insights quickly even from massive datasets.
  7. Data Transformation: Data in a data warehouse undergoes an Extract, Transform, Load (ETL) process, where data is extracted from source systems, transformed into a consistent format, and loaded into the data warehouse.
  8. Business Intelligence and Analytics: Data warehouses serve as the foundation for business intelligence and data analytics tools, providing data analysts and decision-makers with the necessary data to generate reports, dashboards, and visualizations.

In a data warehouse, facts and dimension tables are essential components that facilitate the organization and analysis of data.

Fact Table:

  • A fact table stores quantitative and numerical data, also known as measures, related to specific business events or activities. These measures are typically numerical values, such as sales amounts, quantities sold, revenue, or counts.
  • Each row in the fact table represents a specific instance of the event or activity and contains foreign keys that link to dimension tables, providing context to the measures.
  • Fact tables are typically large and can contain millions or billions of records, making them the central data storage for data warehouses.
  • They are used to aggregate and analyze data across different dimensions, enabling users to perform complex queries and gain insights into business performance.
  • Fact tables often have date or time dimensions to support time-based analysis (e.g., sales by month, quarterly revenue).

Example of a Fact Table:Copy code

| Date       | ProductID | CustomerID | SalesAmount | QuantitySold |
|------------|-----------|------------|-------------|--------------|
| 2023-07-01 | 101 | 1001 | $500 | 10 |
| 2023-07-01 | 102 | 1002 | $250 | 5 |
| 2023-07-02 | 103 | 1003 | $800 | 15 |
| ... | ... | ... | ... | ... |

Dimension Table:

  • Dimension tables contain descriptive attributes that provide context to the measures in the fact table. They define the characteristics or perspectives by which data is analyzed.
  • Each row in a dimension table represents a unique instance of a specific dimension, and the primary key of the dimension table is linked to the foreign key in the fact table.
  • Dimension tables are relatively smaller than fact tables since they hold descriptive attributes that don’t change frequently.
  • Common examples of dimension tables include product dimensions (e.g., product name, category, brand), customer dimensions (e.g., customer name, location, age), and time dimensions (e.g., date, month, year)

Example of a Dimension Table (Product Dimension):

| ProductID | ProductName    | Category   | Brand      |
|-----------|----------------|------------|------------|
| 101 | Laptop | Electronics| ABC Corp |
| 102 | Smartphone | Electronics| XYZ Corp |
| 103 | Headphones | Electronics| Acme Corp |
| ... | ... | ... | ... |

--

--

Kalpesh Pawar

Data Engineering Intern at Celebal Technologies | 2x Azure | AWS | Web Developer | Medium✍ | IT undergraduate @DYPCOE