Backfill Your SQL Tables Without Breakage Before Anyone Finds Out You Were Wrong
Reloading missing data is one of the least glamorous but most important tasks you will do as a SQL developer. Get it right.
Why You Need to Backfill Your SQL Tables
Ugh.
Whether I find out from an alerting system or directly from a stakeholder, “ugh” is my natural reaction when I learn that we have missing data.
As with many aspects of data-oriented work, context determines whether missing data is a minor headache or a three-alarm fire.
In any case, identifying and fixing missing data must be a priority for anyone who works directly with data used to guide organizational decision-makers, because missing, incomplete, or error-riddled data can distort both real-time and historical analysis.
To account for these gaps, SQL developers (typically data engineers) work through a sometimes-grueling process called backfilling.
If you’re unfamiliar, backfilling is a catch-all industry term for the CRUD (create, read, update, delete) operations involved in correcting incomplete or incorrect data after the point at which it should have been loaded.
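To make the definition concrete, here is a minimal sketch of one common backfill pattern: delete the affected date range from the target table, then re-insert it from the source, all inside a single transaction. This is an illustration under stated assumptions, not necessarily the approach used later in this article. The tables `raw_sales` and `daily_sales`, the function `backfill_daily_sales`, and the demo data are all hypothetical, and SQLite (via Python's built-in `sqlite3` module) stands in for whatever database you actually use.

```python
import sqlite3


def backfill_daily_sales(conn: sqlite3.Connection, start: str, end: str) -> int:
    """Rebuild the hypothetical daily_sales table for start..end (inclusive
    ISO dates) from the hypothetical raw_sales table.

    Delete-then-insert inside one transaction means re-running the backfill
    gives the same result (idempotent), and a failure rolls back instead of
    leaving a half-loaded range behind.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "DELETE FROM daily_sales WHERE sale_date BETWEEN ? AND ?",
            (start, end),
        )
        cur = conn.execute(
            """
            INSERT INTO daily_sales (sale_date, store_id, total)
            SELECT sale_date, store_id, SUM(amount)
            FROM raw_sales
            WHERE sale_date BETWEEN ? AND ?
            GROUP BY sale_date, store_id
            """,
            (start, end),
        )
    return cur.rowcount  # number of rows (re)inserted for the range


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical demo data: the 2024-01-02 load never reached daily_sales."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE raw_sales (sale_date TEXT, store_id INTEGER, amount REAL);
        CREATE TABLE daily_sales (sale_date TEXT, store_id INTEGER, total REAL);
        INSERT INTO raw_sales VALUES
            ('2024-01-01', 1, 10.0), ('2024-01-01', 1, 5.0),
            ('2024-01-02', 1, 7.5), ('2024-01-02', 2, 2.5);
        INSERT INTO daily_sales VALUES ('2024-01-01', 1, 15.0);
        """
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    backfill_daily_sales(conn, "2024-01-01", "2024-01-02")
    for row in conn.execute("SELECT * FROM daily_sales ORDER BY sale_date, store_id"):
        print(row)
```

Because the range is wiped before it is reloaded, running the same backfill twice does not double-count the 2024-01-01 rows that were already present. A plain `INSERT` without the `DELETE` would.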