Facts and Dimensions — What are they?

Let’s decode!

Binayak Basu
12 min readJun 12, 2024

Facts

Facts represent the quantitative data in a data warehouse and are typically stored in fact tables. They are the measurable events or transactions that occur within a business process. Key characteristics of facts include:

  • Measures: These are the numeric values that provide quantitative data about a business process, such as sales amount, quantity sold, revenue, profit, etc.
  • Granularity: This refers to the level of detail of the data stored in the fact table. High granularity means more detailed data (e.g., individual transactions), while low granularity means aggregated data (e.g., daily or monthly totals).
  • Foreign Keys: Fact tables usually contain foreign keys that reference the primary keys in dimension tables. These keys help in linking facts to their associated dimensions.
  • Additive, Semi-additive, Non-additive: Measures can be additive (can be summed across dimensions), semi-additive (can be summed across some dimensions but not others), or non-additive (cannot be summed at all).

Dimensions

Dimensions provide the context and descriptive information about the facts. They are typically stored in dimension tables. Key characteristics of dimensions include:

  • Attributes: These are the descriptive fields that provide additional information about the dimensions, such as product name, customer name, location, time, etc.
  • Hierarchies: Dimensions often have hierarchical relationships that allow for data to be viewed at different levels of granularity. For example, a time dimension might have hierarchies like year > quarter > month > day.
  • Categories and Groupings: Dimensions help in categorizing and grouping facts for analysis and reporting purposes.
  • Slowly Changing Dimensions (SCDs): These are dimensions that change over time. Handling these changes is important for maintaining historical accuracy and integrity of the data.

Example Scenario

Consider a retail sales scenario:

Fact Table: Sales Fact Table

  • Measures: SalesAmount, QuantitySold, Discount
  • Foreign Keys: ProductID, CustomerID, StoreID, DateID

Dimension Tables:

  • Product Dimension Table: ProductID, ProductName, Category, Brand
  • Customer Dimension Table: CustomerID, CustomerName, CustomerAddress, CustomerSegment
  • Store Dimension Table: StoreID, StoreName, StoreLocation, StoreManager
  • Date Dimension Table: DateID, Date, Month, Quarter, Year

Usage in Analysis

  • Facts: Provide the numerical data to be analyzed (e.g., total sales amount).
  • Dimensions: Provide the descriptive data to analyze the facts in various ways (e.g., sales by product category, sales by region, sales by customer segment over time).
https://www.researchgate.net/figure/Sales-Dimensional-Model_fig10_314332952

Deep Dive into FACTS!

Additive Measures

Additive measures are those that can be summed across all dimensions without any restrictions. These are the simplest and most common types of measures used in fact tables.

Example:

Sales Amount: If you have a SalesAmount measure in a sales fact table, you can sum it across different dimensions such as time, product, and region to get total sales.

  • Sum of SalesAmount by product category.
  • Sum of SalesAmount by region.
  • Sum of SalesAmount by month.

This measure makes sense and is meaningful when aggregated across all dimensions.

Semi-additive Measures

Semi-additive measures are those that can be summed across some dimensions, but not others. These measures are partially additive, meaning aggregation works for certain dimensions but not for all.

Example:

Account Balance: If you have an AccountBalance measure in a banking fact table, you can sum it across dimensions like account type or customer, but not across time.

  • Sum of AccountBalance by account type.
  • Sum of AccountBalance by customer.
  • Not Summable by time: Adding account balances across multiple time periods (days, months) does not make sense because the balance is a snapshot at a specific point in time.

Non-additive Measures

Non-additive measures are those that cannot be summed across any dimension. Aggregation of these measures does not produce meaningful results.

Example:

Ratios and Averages: Measures like Profit Margin or Average Price are non-additive because summing these measures across any dimension does not make sense.

  • Profit Margin: The profit margin of multiple products or time periods cannot be summed to get a meaningful total profit margin.
  • Average Price: The average price of items over different periods or categories cannot be summed. Instead, weighted averages or other aggregations might be more appropriate.

Detailed Examples

Additive Measure Example

Imagine a retail sales fact table with SalesAmount as an additive measure:

### Additive Measure Example
Imagine a retail sales fact table with `SalesAmount` as an additive measure:

| Date | ProductID | StoreID | SalesAmount |
|------------|-----------|---------|-------------|
| 2024-06-01 | 101 | 1 | 200 |
| 2024-06-01 | 102 | 1 | 150 |
| 2024-06-01 | 101 | 2 | 300 |
| 2024-06-02 | 101 | 1 | 250 |

- **Sum by Date**:
- Total SalesAmount on 2024-06-01 = 200 + 150 + 300 = 650
- **Sum by ProductID**:
- Total SalesAmount for Product 101 = 200 + 300 + 250 = 750

Semi-additive Measure Example

Consider a banking fact table with AccountBalance as a semi-additive measure:

### Semi-additive Measure Example
Consider a banking fact table with `AccountBalance` as a semi-additive measure:

| Date | AccountID | CustomerID | AccountBalance |
|------------|-----------|------------|----------------|
| 2024-06-01 | 1 | 101 | 5000 |
| 2024-06-01 | 2 | 102 | 3000 |
| 2024-06-02 | 1 | 101 | 5500 |
| 2024-06-02 | 2 | 102 | 3100 |

- **Sum by CustomerID**:
- Total AccountBalance for Customer 101 = 5000
(Account 1 balance on 2024-06-01) + 5500 (Account 1 balance on 2024-06-02)
= Not meaningful
- **Sum by Date**:
- Total AccountBalance on 2024-06-01 = 5000 + 3000 = 8000
- Aggregation over time does not make sense because balances represent
snapshots.

Non-additive Measure Example

Consider a product performance fact table with ProfitMargin as a non-additive measure:

### Non-additive Measure Example
Consider a product performance fact table with `ProfitMargin` as a
non-additive measure:

| Date | ProductID | ProfitMargin |
|------------|-----------|--------------|
| 2024-06-01 | 101 | 0.20 |
| 2024-06-01 | 102 | 0.15 |
| 2024-06-02 | 101 | 0.22 |
| 2024-06-02 | 102 | 0.18 |

- **Sum by Date**:
- Summing ProfitMargin for 2024-06-01 = 0.20 + 0.15 = 0.35 (not meaningful)
- **Sum by ProductID**:
- Summing ProfitMargin for Product 101 = 0.20 + 0.22 = 0.42 (not meaningful)
- Instead, calculating average or weighted average profit margin would be
appropriate.

What is a Fact-less FACT?

A factless fact table is a type of fact table in a data warehouse that does not have any measurable facts or numeric data. Instead, it captures the occurrence of events or records relationships between dimensions. Factless fact tables are used primarily to model many-to-many relationships or to track events that happen over time, without needing to measure anything quantitatively.

Characteristics of Factless Fact Tables

  1. No Numeric Facts: These tables do not contain numeric measures. They consist solely of foreign keys that reference dimension tables.
  2. Captures Events: They are used to record the occurrence of events, such as attendance, enrollment, or log entries.
  3. Relational: They can also capture the relationships between dimensions that do not involve quantitative measures.

Types of Factless Fact Tables

  1. Event Tracking: Captures the occurrence of specific events.
  2. Coverage Tables: Capture all possible combinations of dimensions to ensure coverage.

Factless Fact Table: Event Tracking

Scenario: Tracking student attendance in classes.

Dimensions

Date Dimension:

CREATE TABLE DateDimension (
DateID INT PRIMARY KEY,
Date DATE
);

-- Example data
INSERT INTO DateDimension (DateID, Date) VALUES
(1, '2024-06-01'),
(2, '2024-06-02');

Student Dimension:

CREATE TABLE StudentDimension (
StudentID INT PRIMARY KEY,
StudentName VARCHAR(100)
);

-- Example data
INSERT INTO StudentDimension (StudentID, StudentName) VALUES
(1, 'John Doe'),
(2, 'Jane Smith');

Class Dimension:

CREATE TABLE ClassDimension (
ClassID INT PRIMARY KEY,
ClassName VARCHAR(100)
);

-- Example data
INSERT INTO ClassDimension (ClassID, ClassName) VALUES
(1, 'Mathematics'),
(2, 'Science');

Factless Fact Table: Attendance

CREATE TABLE AttendanceFact (
DateID INT,
StudentID INT,
ClassID INT,
PRIMARY KEY (DateID, StudentID, ClassID),
FOREIGN KEY (DateID) REFERENCES DateDimension(DateID),
FOREIGN KEY (StudentID) REFERENCES StudentDimension(StudentID),
FOREIGN KEY (ClassID) REFERENCES ClassDimension(ClassID)
);

-- Example data
INSERT INTO AttendanceFact (DateID, StudentID, ClassID) VALUES
(1, 1, 1),
(1, 2, 2),
(2, 1, 1),
(2, 2, 1);

Factless Fact Table: Coverage

Scenario: Ensuring all product promotions are recorded.

Dimensions

Date Dimension:

CREATE TABLE DateDimension (
DateID INT PRIMARY KEY,
Date DATE
);

-- Example data
INSERT INTO DateDimension (DateID, Date) VALUES
(1, '2024-06-01'),
(2, '2024-06-02');

Product Dimension:

CREATE TABLE ProductDimension (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(100)
);

-- Example data
INSERT INTO ProductDimension (ProductID, ProductName) VALUES
(1, 'Laptop'),
(2, 'Smartphone');

Promotion Dimension:

CREATE TABLE PromotionDimension (
PromotionID INT PRIMARY KEY,
PromotionName VARCHAR(100)
);

-- Example data
INSERT INTO PromotionDimension (PromotionID, PromotionName) VALUES
(1, 'Summer Sale'),
(2, 'Winter Sale');

Factless Fact Table: Product Promotions

CREATE TABLE ProductPromotionFact (
DateID INT,
ProductID INT,
PromotionID INT,
PRIMARY KEY (DateID, ProductID, PromotionID),
FOREIGN KEY (DateID) REFERENCES DateDimension(DateID),
FOREIGN KEY (ProductID) REFERENCES ProductDimension(ProductID),
FOREIGN KEY (PromotionID) REFERENCES PromotionDimension(PromotionID)
);

-- Example data
INSERT INTO ProductPromotionFact (DateID, ProductID, PromotionID) VALUES
(1, 1, 1),
(1, 2, 1),
(2, 1, 2),
(2, 2, 2);

Factless fact tables are useful for modeling events or relationships that do not have quantitative measures. The provided SQL snippets demonstrate how to create such tables for two scenarios: tracking student attendance and ensuring product promotional coverage. These tables focus on capturing the occurrence of events or the relationships between dimensions, providing valuable insights even in the absence of numeric data.

Dimensions — a DEEP dive!

In data warehousing and business intelligence, as mentioned above, dimensions are used to describe and categorize the facts and measures in a fact table. They provide context to the quantitative data, allowing for detailed and meaningful analysis. Here are the various types of dimensions with detailed explanations:

Types of Dimensions

  1. Conformed Dimensions
  2. Role-Playing Dimensions
  3. Slowly Changing Dimensions (SCD)
  4. Junk Dimensions
  5. Degenerate Dimensions
  6. Inferred Dimensions
  7. Shrunken Dimensions
  8. Static Dimensions

1. Conformed Dimensions

Definition: Conformed dimensions are dimensions that are shared across multiple fact tables or data marts in a data warehouse. They have the same structure, content, and meaning across different areas of the business.

Example: A Date dimension used in both sales and inventory fact tables, ensuring consistent reporting on time across different business processes.

CREATE TABLE DateDimension (
DateID INT PRIMARY KEY,
Date DATE,
Year INT,
Quarter INT,
Month INT,
Day INT
);

2. Role-Playing Dimensions

Definition: Role-playing dimensions are dimensions that can be used in different roles within the same database. They are essentially the same dimension table used multiple times in different contexts.

Example: A Date dimension can play different roles such as Order Date, Ship Date, and Delivery Date in an orders fact table.

CREATE TABLE DateDimension (
DateID INT PRIMARY KEY,
Date DATE,
Year INT,
Quarter INT,
Month INT,
Day INT
);

-- Using DateDimension as OrderDate, ShipDate, DeliveryDate in OrdersFact table
CREATE TABLE OrdersFact (
OrderID INT PRIMARY KEY,
OrderDateID INT,
ShipDateID INT,
DeliveryDateID INT,
Amount DECIMAL(10, 2),
FOREIGN KEY (OrderDateID) REFERENCES DateDimension(DateID),
FOREIGN KEY (ShipDateID) REFERENCES DateDimension(DateID),
FOREIGN KEY (DeliveryDateID) REFERENCES DateDimension(DateID)
);

3. Slowly Changing Dimensions (SCD)

Definition: Slowly Changing Dimensions handle the changes in dimension data over time. There are different types of SCDs, primarily Type 1, Type 2, and Type 3.

  • Type 1 (SCD1): Overwrites old data with new data.
  • Type 2 (SCD2): Keeps historical data by creating new rows with versioning.
  • Type 3 (SCD3): Keeps a limited history by adding new columns.

Example:

-- SCD Type 2 Example
CREATE TABLE CustomerDimension (
CustomerID INT,
CustomerName VARCHAR(100),
CustomerAddress VARCHAR(255),
EffectiveDate DATE,
ExpiryDate DATE,
CurrentFlag CHAR(1),
PRIMARY KEY (CustomerID, EffectiveDate)
);

4. Junk Dimensions

Definition: Junk dimensions are a collection of miscellaneous, often unrelated attributes, typically low-cardinality flags and indicators, combined into a single dimension table to reduce the clutter in the fact table.

Example:

CREATE TABLE JunkDimension (
JunkID INT PRIMARY KEY,
Flag1 CHAR(1),
Flag2 CHAR(1),
Indicator1 VARCHAR(50),
Indicator2 VARCHAR(50)
);

5. Degenerate Dimensions

Definition: Degenerate dimensions are dimensions that do not have their own dimension table. Instead, their values are stored in the fact table. These are typically unique identifiers of the transaction, like an order number or invoice number.

Example:

CREATE TABLE SalesFact (
SalesID INT PRIMARY KEY,
OrderNumber VARCHAR(20), -- Degenerate Dimension
DateID INT,
ProductID INT,
Amount DECIMAL(10, 2)
);

6. Inferred Dimensions

Definition: Inferred dimensions are placeholder dimensions created when a fact record arrives before the corresponding dimension data. These placeholders are updated with actual data later.

Example:

CREATE TABLE CustomerDimension (
CustomerID INT PRIMARY KEY,
CustomerName VARCHAR(100) DEFAULT 'Unknown',
CustomerAddress VARCHAR(255) DEFAULT 'Unknown'
);

-- Inserting inferred record
INSERT INTO CustomerDimension (CustomerID) VALUES (1);
-- Later updating with actual data
UPDATE CustomerDimension SET CustomerName = 'John Doe',
CustomerAddress = '123 Main St' WHERE CustomerID = 1;

7. Shrunken Dimensions

Definition: Shrunken dimensions are subsets of a main dimension, typically used for aggregation or summarization purposes in aggregate fact tables.

Example:

-- Main Date Dimension
CREATE TABLE DateDimension (
DateID INT PRIMARY KEY,
Date DATE,
Year INT,
Quarter INT,
Month INT,
Day INT
);

-- Shrunken Year Dimension
CREATE TABLE YearDimension (
YearID INT PRIMARY KEY,
Year INT
);

8. Static Dimensions

Definition: Static dimensions do not change over time. Their data remains constant and is often used for reference purposes.

Example:

CREATE TABLE CountryDimension (
CountryID INT PRIMARY KEY,
CountryName VARCHAR(100)
);

-- Data remains constant, e.g., country names and IDs
INSERT INTO CountryDimension (CountryID, CountryName) VALUES
(1, 'United States'),
(2, 'Canada'),
(3, 'Mexico');

Dimensions in data warehousing provide essential context and categorization for fact tables, allowing for detailed analysis and reporting. Different types of dimensions serve various purposes, from handling changes over time to combining miscellaneous attributes. Understanding these types helps in designing a robust and efficient data warehouse schema.

A DEEPER dive into SCD

Slowly Changing Dimensions (SCD)

Definition: Slowly Changing Dimensions handle changes in dimension data over time. They are crucial for accurately tracking historical data and understanding how data evolves. There are primarily three types of SCDs: Type 1, Type 2, and Type 3.

Type 1 (SCD1): Overwrites Old Data with New Data

Definition: Type 1 SCD handles changes by overwriting old data with new data. This method does not maintain any historical data. The dimension table only contains the current state of the data.

Use Case: Suitable when historical changes are not important, and only the latest information is needed.

Example: Updating a customer’s address in a customer dimension table.

CREATE TABLE CustomerDimension (
CustomerID INT PRIMARY KEY,
CustomerName VARCHAR(100),
CustomerAddress VARCHAR(255)
);

-- Initial data
INSERT INTO CustomerDimension (CustomerID, CustomerName, CustomerAddress)
VALUES
(1, 'John Doe', '123 Old Address');
-- Update address
UPDATE CustomerDimension
SET CustomerAddress = '456 New Address'
WHERE CustomerID = 1;
-- The table now contains:
-- | CustomerID | CustomerName | CustomerAddress |
-- |------------|--------------|-----------------|
-- | 1 | John Doe | 456 New Address |

Type 2 (SCD2): Keeps Historical Data by Creating New Rows with Versioning

Definition: Type 2 SCD tracks historical changes by creating new rows for each change. Each row has versioning information, often including effective dates, expiry dates, and a current flag to indicate the active record.

Use Case: Suitable when historical data needs to be preserved to track changes over time.

Example: Tracking changes in customer addresses with versioning.

CREATE TABLE CustomerDimension (
CustomerID INT,
CustomerName VARCHAR(100),
CustomerAddress VARCHAR(255),
EffectiveDate DATE,
ExpiryDate DATE,
CurrentFlag CHAR(1),
PRIMARY KEY (CustomerID, EffectiveDate)
);

-- Initial data
INSERT INTO CustomerDimension (CustomerID, CustomerName, CustomerAddress, EffectiveDate, ExpiryDate, CurrentFlag) VALUES
(1, 'John Doe', '123 Old Address', '2024-01-01', '9999-12-31', 'Y');
-- Update address, marking the old record as expired and adding a new record
UPDATE CustomerDimension
SET ExpiryDate = '2024-06-01', CurrentFlag = 'N'
WHERE CustomerID = 1 AND CurrentFlag = 'Y';
INSERT INTO CustomerDimension (CustomerID, CustomerName, CustomerAddress, EffectiveDate, ExpiryDate, CurrentFlag) VALUES
(1, 'John Doe', '456 New Address', '2024-06-01', '9999-12-31', 'Y');
-- The table now contains:
-- | CustomerID | CustomerName | CustomerAddress | EffectiveDate | ExpiryDate | CurrentFlag |
-- |------------|--------------|-----------------|---------------|-------------|-------------|
-- | 1 | John Doe | 123 Old Address | 2024-01-01 | 2024-06-01 | N |
-- | 1 | John Doe | 456 New Address | 2024-06-01 | 9999-12-31 | Y |

Type 3 (SCD3): Keeps a Limited History by Adding New Columns

Definition: Type 3 SCD tracks limited historical data by adding new columns to the dimension table. Typically, it adds columns to store previous values and change dates.

Use Case: Suitable when only a few historical changes need to be tracked, without the need for a full history.

Example: Tracking the current and previous addresses of a customer.

CREATE TABLE CustomerDimension (
CustomerID INT PRIMARY KEY,
CustomerName VARCHAR(100),
CurrentAddress VARCHAR(255),
PreviousAddress VARCHAR(255),
AddressChangeDate DATE
);

-- Initial data
INSERT INTO CustomerDimension (CustomerID, CustomerName, CurrentAddress) VALUES
(1, 'John Doe', '123 Old Address');
-- Update address, storing the old address in a new column
UPDATE CustomerDimension
SET PreviousAddress = CurrentAddress, CurrentAddress = '456 New Address', AddressChangeDate = '2024-06-01'
WHERE CustomerID = 1;
-- The table now contains:
-- | CustomerID | CustomerName | CurrentAddress | PreviousAddress | AddressChangeDate |
-- |------------|--------------|------------------|-----------------|-------------------|
-- | 1 | John Doe | 456 New Address | 123 Old Address | 2024-06-01 |

Thus,

  • Type 1 (SCD1): Overwrites old data with new data, losing historical information.
  • Type 2 (SCD2): Maintains a complete history of changes by creating new rows with versioning.
  • Type 3 (SCD3): Tracks limited history by adding new columns to store previous values.

Choosing the right SCD type depends on the business requirements for historical data tracking and the complexity of changes expected in the dimension data.

Conclusion —giving you a different approach!

Imagine facts and dimensions like your social media. Facts are like your main feed — those core posts and videos that get all the likes and comments (aka metrics). Dimensions are the tags, locations, and hashtags that give context to your posts, helping you and others find and categorize content.

  • Facts: The juicy content. Think likes, shares, views — these are the numbers that show how popular your post is.
  • Dimensions: The context. These are the tags, locations, and times that add meaning to your posts. They help you sort through all the content and understand it better.

Whether you’re tracking your favorite trends or analyzing your latest TikTok stats, knowing how to organize your data with facts and dimensions is key. It’s like having the perfect blend of content and context to keep your followers engaged and your analytics on point. So, keep your data lit and your insights fire! 🔥

--

--

Binayak Basu

Master's in Economics, pursuing BS in Data Science. Passionate about data analysis, ML, Java, SQL. Helping others learn and uncover meaningful insights.