Facts and Dimensions — What are they?
Let’s decode!
Facts
Facts represent the quantitative data in a data warehouse and are typically stored in fact tables. They are the measurable events or transactions that occur within a business process. Key characteristics of facts include:
- Measures: These are the numeric values that provide quantitative data about a business process, such as sales amount, quantity sold, revenue, profit, etc.
- Granularity: This refers to the level of detail of the data stored in the fact table. High granularity means more detailed data (e.g., individual transactions), while low granularity means aggregated data (e.g., daily or monthly totals).
- Foreign Keys: Fact tables usually contain foreign keys that reference the primary keys in dimension tables. These keys help in linking facts to their associated dimensions.
- Additive, Semi-additive, Non-additive: Measures can be additive (can be summed across dimensions), semi-additive (can be summed across some dimensions but not others), or non-additive (cannot be summed at all).
Dimensions
Dimensions provide the context and descriptive information about the facts. They are typically stored in dimension tables. Key characteristics of dimensions include:
- Attributes: These are the descriptive fields that provide additional information about the dimensions, such as product name, customer name, location, time, etc.
- Hierarchies: Dimensions often have hierarchical relationships that allow for data to be viewed at different levels of granularity. For example, a time dimension might have hierarchies like year > quarter > month > day.
- Categories and Groupings: Dimensions help in categorizing and grouping facts for analysis and reporting purposes.
- Slowly Changing Dimensions (SCDs): These are dimensions that change over time. Handling these changes is important for maintaining historical accuracy and integrity of the data.
Example Scenario
Consider a retail sales scenario:
Fact Table: Sales Fact Table
- Measures:
SalesAmount
,QuantitySold
,Discount
- Foreign Keys:
ProductID
,CustomerID
,StoreID
,DateID
Dimension Tables:
- Product Dimension Table:
ProductID
,ProductName
,Category
,Brand
- Customer Dimension Table:
CustomerID
,CustomerName
,CustomerAddress
,CustomerSegment
- Store Dimension Table:
StoreID
,StoreName
,StoreLocation
,StoreManager
- Date Dimension Table:
DateID
,Date
,Month
,Quarter
,Year
Usage in Analysis
- Facts: Provide the numerical data to be analyzed (e.g., total sales amount).
- Dimensions: Provide the descriptive data to analyze the facts in various ways (e.g., sales by product category, sales by region, sales by customer segment over time).
Deep Dive into FACTS!
Additive Measures
Additive measures are those that can be summed across all dimensions without any restrictions. These are the simplest and most common types of measures used in fact tables.
Example:
Sales Amount: If you have a SalesAmount
measure in a sales fact table, you can sum it across different dimensions such as time, product, and region to get total sales.
- Sum of
SalesAmount
by product category. - Sum of
SalesAmount
by region. - Sum of
SalesAmount
by month.
This measure makes sense and is meaningful when aggregated across all dimensions.
Semi-additive Measures
Semi-additive measures are those that can be summed across some dimensions, but not others. These measures are partially additive, meaning aggregation works for certain dimensions but not for all.
Example:
Account Balance: If you have an AccountBalance
measure in a banking fact table, you can sum it across dimensions like account type or customer, but not across time.
- Sum of
AccountBalance
by account type. - Sum of
AccountBalance
by customer. - Not Summable by time: Adding account balances across multiple time periods (days, months) does not make sense because the balance is a snapshot at a specific point in time.
Non-additive Measures
Non-additive measures are those that cannot be summed across any dimension. Aggregation of these measures does not produce meaningful results.
Example:
Ratios and Averages: Measures like Profit Margin
or Average Price
are non-additive because summing these measures across any dimension does not make sense.
- Profit Margin: The profit margin of multiple products or time periods cannot be summed to get a meaningful total profit margin.
- Average Price: The average price of items over different periods or categories cannot be summed. Instead, weighted averages or other aggregations might be more appropriate.
Detailed Examples
Additive Measure Example
Imagine a retail sales fact table with SalesAmount
as an additive measure:
### Additive Measure Example
Imagine a retail sales fact table with `SalesAmount` as an additive measure:
| Date | ProductID | StoreID | SalesAmount |
|------------|-----------|---------|-------------|
| 2024-06-01 | 101 | 1 | 200 |
| 2024-06-01 | 102 | 1 | 150 |
| 2024-06-01 | 101 | 2 | 300 |
| 2024-06-02 | 101 | 1 | 250 |
- **Sum by Date**:
- Total SalesAmount on 2024-06-01 = 200 + 150 + 300 = 650
- **Sum by ProductID**:
- Total SalesAmount for Product 101 = 200 + 300 + 250 = 750
Semi-additive Measure Example
Consider a banking fact table with AccountBalance
as a semi-additive measure:
### Semi-additive Measure Example
Consider a banking fact table with `AccountBalance` as a semi-additive measure:
| Date | AccountID | CustomerID | AccountBalance |
|------------|-----------|------------|----------------|
| 2024-06-01 | 1 | 101 | 5000 |
| 2024-06-01 | 2 | 102 | 3000 |
| 2024-06-02 | 1 | 101 | 5500 |
| 2024-06-02 | 2 | 102 | 3100 |
- **Sum by CustomerID**:
- Total AccountBalance for Customer 101 = 5000
(Account 1 balance on 2024-06-01) + 5500 (Account 1 balance on 2024-06-02)
= Not meaningful
- **Sum by Date**:
- Total AccountBalance on 2024-06-01 = 5000 + 3000 = 8000
- Aggregation over time does not make sense because balances represent
snapshots.
Non-additive Measure Example
Consider a product performance fact table with ProfitMargin
as a non-additive measure:
### Non-additive Measure Example
Consider a product performance fact table with `ProfitMargin` as a
non-additive measure:
| Date | ProductID | ProfitMargin |
|------------|-----------|--------------|
| 2024-06-01 | 101 | 0.20 |
| 2024-06-01 | 102 | 0.15 |
| 2024-06-02 | 101 | 0.22 |
| 2024-06-02 | 102 | 0.18 |
- **Sum by Date**:
- Summing ProfitMargin for 2024-06-01 = 0.20 + 0.15 = 0.35 (not meaningful)
- **Sum by ProductID**:
- Summing ProfitMargin for Product 101 = 0.20 + 0.22 = 0.42 (not meaningful)
- Instead, calculating average or weighted average profit margin would be
appropriate.
What is a Fact-less FACT?
A factless fact table is a type of fact table in a data warehouse that does not have any measurable facts or numeric data. Instead, it captures the occurrence of events or records relationships between dimensions. Factless fact tables are used primarily to model many-to-many relationships or to track events that happen over time, without needing to measure anything quantitatively.
Characteristics of Factless Fact Tables
- No Numeric Facts: These tables do not contain numeric measures. They consist solely of foreign keys that reference dimension tables.
- Captures Events: They are used to record the occurrence of events, such as attendance, enrollment, or log entries.
- Relational: They can also capture the relationships between dimensions that do not involve quantitative measures.
Types of Factless Fact Tables
- Event Tracking: Captures the occurrence of specific events.
- Coverage Tables: Capture all possible combinations of dimensions to ensure coverage.
Factless Fact Table: Event Tracking
Scenario: Tracking student attendance in classes.
Dimensions
Date Dimension:
CREATE TABLE DateDimension (
DateID INT PRIMARY KEY,
Date DATE
);
-- Example data
INSERT INTO DateDimension (DateID, Date) VALUES
(1, '2024-06-01'),
(2, '2024-06-02');
Student Dimension:
CREATE TABLE StudentDimension (
StudentID INT PRIMARY KEY,
StudentName VARCHAR(100)
);
-- Example data
INSERT INTO StudentDimension (StudentID, StudentName) VALUES
(1, 'John Doe'),
(2, 'Jane Smith');
Class Dimension:
CREATE TABLE ClassDimension (
ClassID INT PRIMARY KEY,
ClassName VARCHAR(100)
);
-- Example data
INSERT INTO ClassDimension (ClassID, ClassName) VALUES
(1, 'Mathematics'),
(2, 'Science');
Factless Fact Table: Attendance
CREATE TABLE AttendanceFact (
DateID INT,
StudentID INT,
ClassID INT,
PRIMARY KEY (DateID, StudentID, ClassID),
FOREIGN KEY (DateID) REFERENCES DateDimension(DateID),
FOREIGN KEY (StudentID) REFERENCES StudentDimension(StudentID),
FOREIGN KEY (ClassID) REFERENCES ClassDimension(ClassID)
);
-- Example data
INSERT INTO AttendanceFact (DateID, StudentID, ClassID) VALUES
(1, 1, 1),
(1, 2, 2),
(2, 1, 1),
(2, 2, 1);
Factless Fact Table: Coverage
Scenario: Ensuring all product promotions are recorded.
Dimensions
Date Dimension:
CREATE TABLE DateDimension (
DateID INT PRIMARY KEY,
Date DATE
);
-- Example data
INSERT INTO DateDimension (DateID, Date) VALUES
(1, '2024-06-01'),
(2, '2024-06-02');
Product Dimension:
CREATE TABLE ProductDimension (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(100)
);
-- Example data
INSERT INTO ProductDimension (ProductID, ProductName) VALUES
(1, 'Laptop'),
(2, 'Smartphone');
Promotion Dimension:
CREATE TABLE PromotionDimension (
PromotionID INT PRIMARY KEY,
PromotionName VARCHAR(100)
);
-- Example data
INSERT INTO PromotionDimension (PromotionID, PromotionName) VALUES
(1, 'Summer Sale'),
(2, 'Winter Sale');
Factless Fact Table: Product Promotions
CREATE TABLE ProductPromotionFact (
DateID INT,
ProductID INT,
PromotionID INT,
PRIMARY KEY (DateID, ProductID, PromotionID),
FOREIGN KEY (DateID) REFERENCES DateDimension(DateID),
FOREIGN KEY (ProductID) REFERENCES ProductDimension(ProductID),
FOREIGN KEY (PromotionID) REFERENCES PromotionDimension(PromotionID)
);
-- Example data
INSERT INTO ProductPromotionFact (DateID, ProductID, PromotionID) VALUES
(1, 1, 1),
(1, 2, 1),
(2, 1, 2),
(2, 2, 2);
Factless fact tables are useful for modeling events or relationships that do not have quantitative measures. The provided SQL snippets demonstrate how to create such tables for two scenarios: tracking student attendance and ensuring product promotional coverage. These tables focus on capturing the occurrence of events or the relationships between dimensions, providing valuable insights even in the absence of numeric data.
Dimensions — a DEEP dive!
In data warehousing and business intelligence, as mentioned above, dimensions are used to describe and categorize the facts and measures in a fact table. They provide context to the quantitative data, allowing for detailed and meaningful analysis. Here are the various types of dimensions with detailed explanations:
Types of Dimensions
- Conformed Dimensions
- Role-Playing Dimensions
- Slowly Changing Dimensions (SCD)
- Junk Dimensions
- Degenerate Dimensions
- Inferred Dimensions
- Shrunken Dimensions
- Static Dimensions
1. Conformed Dimensions
Definition: Conformed dimensions are dimensions that are shared across multiple fact tables or data marts in a data warehouse. They have the same structure, content, and meaning across different areas of the business.
Example: A Date
dimension used in both sales and inventory fact tables, ensuring consistent reporting on time across different business processes.
CREATE TABLE DateDimension (
DateID INT PRIMARY KEY,
Date DATE,
Year INT,
Quarter INT,
Month INT,
Day INT
);
2. Role-Playing Dimensions
Definition: Role-playing dimensions are dimensions that can be used in different roles within the same database. They are essentially the same dimension table used multiple times in different contexts.
Example: A Date
dimension can play different roles such as Order Date
, Ship Date
, and Delivery Date
in an orders fact table.
CREATE TABLE DateDimension (
DateID INT PRIMARY KEY,
Date DATE,
Year INT,
Quarter INT,
Month INT,
Day INT
);
-- Using DateDimension as OrderDate, ShipDate, DeliveryDate in OrdersFact table
CREATE TABLE OrdersFact (
OrderID INT PRIMARY KEY,
OrderDateID INT,
ShipDateID INT,
DeliveryDateID INT,
Amount DECIMAL(10, 2),
FOREIGN KEY (OrderDateID) REFERENCES DateDimension(DateID),
FOREIGN KEY (ShipDateID) REFERENCES DateDimension(DateID),
FOREIGN KEY (DeliveryDateID) REFERENCES DateDimension(DateID)
);
3. Slowly Changing Dimensions (SCD)
Definition: Slowly Changing Dimensions handle the changes in dimension data over time. There are different types of SCDs, primarily Type 1, Type 2, and Type 3.
- Type 1 (SCD1): Overwrites old data with new data.
- Type 2 (SCD2): Keeps historical data by creating new rows with versioning.
- Type 3 (SCD3): Keeps a limited history by adding new columns.
Example:
-- SCD Type 2 Example
CREATE TABLE CustomerDimension (
CustomerID INT,
CustomerName VARCHAR(100),
CustomerAddress VARCHAR(255),
EffectiveDate DATE,
ExpiryDate DATE,
CurrentFlag CHAR(1),
PRIMARY KEY (CustomerID, EffectiveDate)
);
4. Junk Dimensions
Definition: Junk dimensions are a collection of miscellaneous, often unrelated attributes, typically low-cardinality flags and indicators, combined into a single dimension table to reduce the clutter in the fact table.
Example:
CREATE TABLE JunkDimension (
JunkID INT PRIMARY KEY,
Flag1 CHAR(1),
Flag2 CHAR(1),
Indicator1 VARCHAR(50),
Indicator2 VARCHAR(50)
);
5. Degenerate Dimensions
Definition: Degenerate dimensions are dimensions that do not have their own dimension table. Instead, their values are stored in the fact table. These are typically unique identifiers of the transaction, like an order number or invoice number.
Example:
CREATE TABLE SalesFact (
SalesID INT PRIMARY KEY,
OrderNumber VARCHAR(20), -- Degenerate Dimension
DateID INT,
ProductID INT,
Amount DECIMAL(10, 2)
);
6. Inferred Dimensions
Definition: Inferred dimensions are placeholder dimensions created when a fact record arrives before the corresponding dimension data. These placeholders are updated with actual data later.
Example:
CREATE TABLE CustomerDimension (
CustomerID INT PRIMARY KEY,
CustomerName VARCHAR(100) DEFAULT 'Unknown',
CustomerAddress VARCHAR(255) DEFAULT 'Unknown'
);
-- Inserting inferred record
INSERT INTO CustomerDimension (CustomerID) VALUES (1);
-- Later updating with actual data
UPDATE CustomerDimension SET CustomerName = 'John Doe',
CustomerAddress = '123 Main St' WHERE CustomerID = 1;
7. Shrunken Dimensions
Definition: Shrunken dimensions are subsets of a main dimension, typically used for aggregation or summarization purposes in aggregate fact tables.
Example:
-- Main Date Dimension
CREATE TABLE DateDimension (
DateID INT PRIMARY KEY,
Date DATE,
Year INT,
Quarter INT,
Month INT,
Day INT
);
-- Shrunken Year Dimension
CREATE TABLE YearDimension (
YearID INT PRIMARY KEY,
Year INT
);
8. Static Dimensions
Definition: Static dimensions do not change over time. Their data remains constant and is often used for reference purposes.
Example:
CREATE TABLE CountryDimension (
CountryID INT PRIMARY KEY,
CountryName VARCHAR(100)
);
-- Data remains constant, e.g., country names and IDs
INSERT INTO CountryDimension (CountryID, CountryName) VALUES
(1, 'United States'),
(2, 'Canada'),
(3, 'Mexico');
Dimensions in data warehousing provide essential context and categorization for fact tables, allowing for detailed analysis and reporting. Different types of dimensions serve various purposes, from handling changes over time to combining miscellaneous attributes. Understanding these types helps in designing a robust and efficient data warehouse schema.
A DEEPER dive into SCD
Slowly Changing Dimensions (SCD)
Definition: Slowly Changing Dimensions handle changes in dimension data over time. They are crucial for accurately tracking historical data and understanding how data evolves. There are primarily three types of SCDs: Type 1, Type 2, and Type 3.
Type 1 (SCD1): Overwrites Old Data with New Data
Definition: Type 1 SCD handles changes by overwriting old data with new data. This method does not maintain any historical data. The dimension table only contains the current state of the data.
Use Case: Suitable when historical changes are not important, and only the latest information is needed.
Example: Updating a customer’s address in a customer dimension table.
CREATE TABLE CustomerDimension (
CustomerID INT PRIMARY KEY,
CustomerName VARCHAR(100),
CustomerAddress VARCHAR(255)
);
-- Initial data
INSERT INTO CustomerDimension (CustomerID, CustomerName, CustomerAddress)
VALUES
(1, 'John Doe', '123 Old Address');
-- Update address
UPDATE CustomerDimension
SET CustomerAddress = '456 New Address'
WHERE CustomerID = 1;
-- The table now contains:
-- | CustomerID | CustomerName | CustomerAddress |
-- |------------|--------------|-----------------|
-- | 1 | John Doe | 456 New Address |
Type 2 (SCD2): Keeps Historical Data by Creating New Rows with Versioning
Definition: Type 2 SCD tracks historical changes by creating new rows for each change. Each row has versioning information, often including effective dates, expiry dates, and a current flag to indicate the active record.
Use Case: Suitable when historical data needs to be preserved to track changes over time.
Example: Tracking changes in customer addresses with versioning.
CREATE TABLE CustomerDimension (
CustomerID INT,
CustomerName VARCHAR(100),
CustomerAddress VARCHAR(255),
EffectiveDate DATE,
ExpiryDate DATE,
CurrentFlag CHAR(1),
PRIMARY KEY (CustomerID, EffectiveDate)
);
-- Initial data
INSERT INTO CustomerDimension (CustomerID, CustomerName, CustomerAddress, EffectiveDate, ExpiryDate, CurrentFlag) VALUES
(1, 'John Doe', '123 Old Address', '2024-01-01', '9999-12-31', 'Y');
-- Update address, marking the old record as expired and adding a new record
UPDATE CustomerDimension
SET ExpiryDate = '2024-06-01', CurrentFlag = 'N'
WHERE CustomerID = 1 AND CurrentFlag = 'Y';
INSERT INTO CustomerDimension (CustomerID, CustomerName, CustomerAddress, EffectiveDate, ExpiryDate, CurrentFlag) VALUES
(1, 'John Doe', '456 New Address', '2024-06-01', '9999-12-31', 'Y');
-- The table now contains:
-- | CustomerID | CustomerName | CustomerAddress | EffectiveDate | ExpiryDate | CurrentFlag |
-- |------------|--------------|-----------------|---------------|-------------|-------------|
-- | 1 | John Doe | 123 Old Address | 2024-01-01 | 2024-06-01 | N |
-- | 1 | John Doe | 456 New Address | 2024-06-01 | 9999-12-31 | Y |
Type 3 (SCD3): Keeps a Limited History by Adding New Columns
Definition: Type 3 SCD tracks limited historical data by adding new columns to the dimension table. Typically, it adds columns to store previous values and change dates.
Use Case: Suitable when only a few historical changes need to be tracked, without the need for a full history.
Example: Tracking the current and previous addresses of a customer.
CREATE TABLE CustomerDimension (
CustomerID INT PRIMARY KEY,
CustomerName VARCHAR(100),
CurrentAddress VARCHAR(255),
PreviousAddress VARCHAR(255),
AddressChangeDate DATE
);
-- Initial data
INSERT INTO CustomerDimension (CustomerID, CustomerName, CurrentAddress) VALUES
(1, 'John Doe', '123 Old Address');
-- Update address, storing the old address in a new column
UPDATE CustomerDimension
SET PreviousAddress = CurrentAddress, CurrentAddress = '456 New Address', AddressChangeDate = '2024-06-01'
WHERE CustomerID = 1;
-- The table now contains:
-- | CustomerID | CustomerName | CurrentAddress | PreviousAddress | AddressChangeDate |
-- |------------|--------------|------------------|-----------------|-------------------|
-- | 1 | John Doe | 456 New Address | 123 Old Address | 2024-06-01 |
Thus,
- Type 1 (SCD1): Overwrites old data with new data, losing historical information.
- Type 2 (SCD2): Maintains a complete history of changes by creating new rows with versioning.
- Type 3 (SCD3): Tracks limited history by adding new columns to store previous values.
Choosing the right SCD type depends on the business requirements for historical data tracking and the complexity of changes expected in the dimension data.
Conclusion —giving you a different approach!
Imagine facts and dimensions like your social media. Facts are like your main feed — those core posts and videos that get all the likes and comments (aka metrics). Dimensions are the tags, locations, and hashtags that give context to your posts, helping you and others find and categorize content.
- Facts: The juicy content. Think likes, shares, views — these are the numbers that show how popular your post is.
- Dimensions: The context. These are the tags, locations, and times that add meaning to your posts. They help you sort through all the content and understand it better.
Whether you’re tracking your favorite trends or analyzing your latest TikTok stats, knowing how to organize your data with facts and dimensions is key. It’s like having the perfect blend of content and context to keep your followers engaged and your analytics on point. So, keep your data lit and your insights fire! 🔥
The contents of external submissions are not necessarily reflective of the opinions or work of Maven Analytics or any of its team members.
We believe in fostering lifelong learning and our intent is to provide a platform for the data community to share their work and seek feedback from the Maven Analytics data fam.
Submit your own writing here if you’d like to become a contributor.
Happy learning!
-Team Maven