Understanding Slowly Changing Dimension (SCD) Type 2
SCD Type 2, known as Slowly Changing Dimension Type 2, is a strategy applied in data warehousing to manage changes in dimensional data effectively. In SCD Type 2, when changes occur in a dimensional attribute, rather than updating the existing record in the dimension table, a new record is inserted to represent the change. This approach preserves historical data by retaining both the old and updated versions of the information.
The main characteristics of SCD Type 2 include:
Historical Tracking: It allows for the monitoring of changes over time by creating new records for each modification, ensuring accurate historical data representation by having an IsActive or IsCurrentRow Boolean attribute.
Effective Date Range: Typically, each record in the dimension table contains effective date ranges, indicating the period of validity for that specific version of the data.
- Effective Start Date
- Effective End Date
Surrogate Keys: To ensure the uniqueness of each dimension version and maintain data integrity within the model, surrogate keys are often employed for identification purposes.
SCD Type 2 proves beneficial when maintaining a historical record is essential, especially for monitoring changes in aspects like customer addresses, product prices, or employee roles. However, it might result in increased storage requirements due to the creation of multiple records for each changing dimension.
This method facilitates historical analysis and reporting, offering a comprehensive view of how dimensional data evolves over time. It stands as a valuable tool in data warehousing, enabling informed decision-making and in-depth analysis for strategic purposes.
Example:
Source Table: Customer table schema from OLTP system
Target Table: dim_customer table schema from Data Warehouse (OLAP)
- Here customer_key is the surrogate key, which is unique and has an auto-incremental value.
- The effective start date is the record-inserted date.
- Effective end date is the record updated or expiry date.
- IsActive has a Boolean value, which tells us whether the record is current or old.
While doing data loading, in this SCD type, we have 3 scenarios.
Insert Scenario:
As of now, Source Customer Table has two active records.
In this scenario, within the target dim_customer table, the effective start date is set to the current timestamp, the effective end date remains null or holds a date with a more time gap, and the IsActive attribute should be 1 to indicate that the record is currently active.
Update Scenario:
Lets update customer name from Sai to Saikarthika, where customer ID =1
In this scenario, within the target dim_customer table, the record associated with customer ID 1 undergoes changes. The effective end date is updated to the current timestamp, and IsActive is set to 0, signifying the record’s inactivity.
Subsequently, a new record with updated information is inserted. For the new record, the effective start date is set to the current timestamp, the effective end date remains null, and IsActive is set to 1, indicating the record as active.
Delete Scenario:
Let's delete the record that has a customer ID equal to 2.
In this scenario, within the target dim_customer table, the record associated with customer ID 2 undergoes changes. The effective end date is updated to the current timestamp, and IsActive is set to 0, signifying the record’s inactivity.
This represents the fundamental concept of SCD Type 2. Some engineers also incorporate source- and target-hash codes to monitor changes. My upcoming blog will delve deeper into SCD Type 2. Your valuable feedback or comments are greatly appreciated.