How to handle changed fact data in the database via an ETL tool

Asib Kaisar
Brain Station 23
Published in
3 min readFeb 2, 2022

Scenario

In an ideal situation, a data warehouse is kept separate from an operational database. Once inserted, data within fact tables in the operational database is prone to change.

Suppose an operational database keeps transaction or sales data in a fact table on daily basis. This data is extracted and loaded to data warehouse early morning the next day. But there is an option to update or correct the previous day’s data in operational database by business users via web app/mobile app/etc. Once this happens, data warehouse needs to be updated accordingly on next day’s ETL. So, on next day’s ETL we need to detect & update previous day’s changed data properly. Also, we must insert new data. So using the SSIS tool how do we handle this situation in the data warehouse and make proper changes accordingly? (Please note that this blog is mostly about fact/transaction data — not about dimension data/SCD)

Example with sample data

Let,

· RefTableA = Fact table in transaction database (Source)

· RefTableB = Fact table in DWH (Destination)

· StgTableS = Stage table in DWH (Intermediate)

Before making any changes to destination table, we must identify the following –

· What are the new rows in source table?

o These rows will be inserted

· What are the rows from previous day that got updated?

o These rows must be updated in the destination table

· What are the rows from previous day that are unchanged?

o These rows must be kept unchanged in the destination table

For example,

On early morning December 02, data is extracted from transaction database to DWH.

On 02 December

· New rows are inserted (row 5,6) in source table

· Row 3 is updated (price is changed to 400) in source table

Conclusion

There are many powerful features in ETL tools to handle a situation where we need to detect and handle changed data. The techniques mentioned above are very handy to implement efficient, robust ETL which helps to build and maintain large-scale database systems.

--

--