Mastering Data Warehousing: Understanding SCD Types with Examples

Satyanarayank
5 min readJan 25, 2024

--

In the realm of data warehousing, maintaining the accuracy and consistency of historical data over time is paramount. Slowly Changing Dimensions (SCD) are methodologies adopted in data warehousing to manage and track changes in dimension data. SCDs are crucial for preserving historical data, accommodating recurring changes, and ensuring the data warehouse reflects real-world changes accurately. This article delves into the various SCD types, illustrating their concepts with examples to provide a comprehensive understanding of their application in data warehousing.

What are Slowly Changing Dimensions (SCD)?

Slowly Changing Dimensions (SCD) refers to common strategies for managing changes in dimension tables in a data warehouse. Dimensions are aspects of business data used for categorization, aggregation, and analysis, such as time, geography, products, and customers. As business evolves, so do the attributes of these dimensions, necessitating an efficient strategy to track these changes without losing historical context.

SCD Types Explained

Type 0 — Fixed Dimension

  • Concept: The dimension attribute never changes. Once data is stored, it remains constant, preserving its historical state.
  • Use Case Example: An employee’s birth date in an HR system remains unchanged, regardless of other changes in their employment status.

Type 1 — No History

  • Concept: When a change occurs, the existing record is updated, and no historical data is kept. This approach is simple but sacrifices historical accuracy.
  • Use Case Example: If a product’s price changes, the new price overwrites the old one, and no record of the previous price is maintained.

Type 2 — Row Versioning

  • Concept: This is the most common SCD type where a new record is added to the dimension table to represent significant changes, with timestamps or version numbers indicating the period of validity.
  • Use Case Example: An employee receives a promotion. The data warehouse retains the old record with the previous role and adds a new record with the new role, each marked with the relevant effective dates.

Type 3 — Previous Value Column

  • Concept: This method adds new columns to store previous values of important attributes, thus maintaining a limited history.
  • Use Case Example: A product’s category changes from ‘Electronics’ to ‘Accessories’. A new column ‘Previous_Category’ is added to store the old category ‘Electronics’, while the current category column is updated to ‘Accessories’.

Type 4 — History Table

  • Concept: Changes are tracked in a separate history table, keeping the current data in the original table unaltered. This method balances the need for historical data with the efficiency of current state queries.
  • Use Case Example: Customer address changes are tracked in a customer history table, while the main customer table always reflects the latest address.

Type 6 — Hybrid (1+2+3)

  • Concept: Also known as “Mini-Dimensions”, Type 6 combines techniques from Types 1, 2, and 3 to provide a versatile approach to managing dimension changes. It tracks complete row history, current values, and previous values within the same dimension table.
  • Use Case Example: In a sales database, a sales region might undergo changes that are significant enough to warrant a new record (Type 2), but for less critical attributes like sales target, the previous value might be stored in a new column within the same record (Type 3), and the most current data is always reflected in the dimension table (Type 1).

Scenario to illustrate: A customer named John Doe, initially from New York, moves to Los Angeles.

Type 0 — Fixed Dimension:

In SCD Type 0, historical data never changes. If John moves, City and State remain as 'New York' and 'NY', reflecting the state of data at the time of its initial entry.

Type 1 — No History:

SCD Type 1 updates the existing record with no history of previous values. After John moves:

Type 2 — Row Versioning:

SCD Type 2 adds a new record for significant changes, often using start and end dates or a version number to indicate the period of validity.

Type 3 — Previous Value Column:

In SCD Type 3, a new column is added to store the previous value of the changed attribute.

After John’s move, with Previous City and State Columns:

Type 4 — History Table:

SCD Type 4 maintains a separate history table for changes, keeping the original table for current data only.

Type 6 — Hybrid (Combining Types 1, 2, and 3):

SCD Type 6, a hybrid approach, maintains a complete history like Type 2, tracks current and previous values within the same record as Type 3, and always reflects the most current data.

In Type 6, ‘EndDate’ could be NULL or a future date, indicating the record is current, and ‘PreviousCity’ holds the last known value before the most recent change.

Choosing the Right SCD Type

Selecting the appropriate SCD type depends on business needs, the nature of the data, and how historical changes impact analyses. Type 1 is suitable for data where historical accuracy is not crucial, whereas Type 2 is preferred for comprehensive historical tracking. Type 3 and Type 4 offer compromises between complexity and historical preservation. Type 6’s hybrid approach provides flexibility but requires careful design to avoid confusion and maintain data integrity.

Conclusion

Understanding and implementing the right SCD strategies is essential for maintaining the integrity and usefulness of a data warehouse. Each SCD type serves specific use cases, and the choice among them should align with the business’s analytical requirements and data governance policies. By effectively managing slowly changing dimensions, organizations can ensure their data warehousing efforts support accurate, insightful, and historical data analysis, driving informed decision-making.

--

--