Slowly Changing Dimension
I wanted to learn about Slowly Changing Dimension for a long time, but I couldn’t find a clear, concise blog post for anyone not familiar with the topic. I, therefore, give you my own offering, a quick introduction to Slowly Changing Dimensions or SCD in a data warehousing scenario.
Let’s take 2 tables
- Users (Dimension)
- Sales (Fact)
When organizing a data warehouse into a star schema, we need to relate fact records to dimension records to get its related attributes. There are scenarios where the information in the dimension might change. For instance, the user Adam might move to the United Kingdom. If he does, do we associate all his fact records with the new country? Or do we want to ignore the change in the country to keep historical accuracy? Or do we treat facts before the change in the country to those after?
It is this decision that determines whether to make the dimension a slowly changing one. There are different types of SCD depending on how you treat incoming change.
Types of SCD
Type 0 — Fixed Dimension
No changes are allowed here. In other words, the dimension never changes. In this case, we don’t change Adam’s country even if he moves to another one.
Type 1 — No History
Update the dimension directly. There is no track of the change in dimensions. We could only see the current state. In our case, Adam’s record would be modified to have United_Kingdom as Country. All his orders, which he placed when was in United_States, will now be pointing to United_Kingdom.
Type 2 — Row Versioning
Type 2 is the most common method of tracking change in data warehouses. Here, we track the changes with new records and additional columns such as the current flag and active dates.
New Columns
- ID — We add a new ID column since the existing user id will not be sufficient to identify the specific record we require.
- Current_Flag — A quick method of returning only the latest record of each user.
- Start_Date — The date from which the specific record is active.
- End_Date — The date to which the specific record is active
This method is very powerful. We maintain the history for the entire record and can easily perform change-over-time analysis.
Type 3
In this type, we add a new column instead of a record. In our case, we add a new column “Previous Country” to track the change. In case, if the user changes the country again, we have to add another column.
Type 4
We will simply update the record similar to Type 1 to accommodate the new change. However, we simultaneously maintain a history table which is similar to type2 to track the changes.
The dimension table after update will look like
The history table will have the following records.
Separating the history from the dimension makes the dimension table smaller and therefore helps in increasing the performance and reduces the complexity if the majority of the users only use the current value.
However, if you require historical values, this type adds complexity and performance overheads.
Type 1 and Type 2 are generally preferred than Type 4.
Got questions? Feel free to comment here.
If you liked this article, click the 👏 so other people will see it here on Medium.