Caught up in the whole
Last week a work colleague had to quickly compile two similar datasets in SQL. I watched him struggle with the problem for most of the day.

Let me give a little more context. I was in the middle of a major project and he asked me for some advice on the approach. I quickly dished out which tables to use and where to get the information from.
The challenge was that the information was at different levels of detail, so it could not just be added together. It first had to be summarised to a common level, and only then could the two sets be joined. Time was of the essence, and if the accuracy of the result was ever queried, it was acceptable for us to revise it later as required.
I believe his greatest challenge was that he was trying to solve the whole problem at once, i.e. compiling the two datasets with 100% accuracy.
It's not uncommon for data to be at different levels of detail.
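In SQL terms, "summarise then join" usually looks something like the sketch below. The table and column names are invented for illustration, and the syntax is Postgres-flavoured, so adjust for your own database.

    -- sales_detail is at transaction level, monthly_targets is at month level,
    -- so roll the detail up to months first, then join the two
    WITH detail_by_month AS (
        SELECT customer_id,
               DATE_TRUNC('month', sale_date) AS sale_month,
               SUM(amount)                    AS total_amount
        FROM   sales_detail
        GROUP  BY customer_id, DATE_TRUNC('month', sale_date)
    )
    SELECT t.customer_id,
           t.target_month,
           t.target_amount,
           d.total_amount
    FROM   monthly_targets t
    LEFT   JOIN detail_by_month d
           ON  d.customer_id = t.customer_id
           AND d.sale_month  = t.target_month;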
At the start of the day he found some outliers that didn't conform to the expected model of the data, and he spent a lot of time trying to resolve them. Later in the day I tested how often this was actually the case and showed that it affected only one record in his dataset. One record, and countless hours spent solving it.
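Checking the scale of a problem like that takes a couple of minutes. Something along the lines of the query below, again with made-up names and with the filter standing in for whatever "doesn't fit the model" means for your data, would have shown straight away that it was one record out of thousands:

    -- how many rows actually break the expected pattern, and what share of the total?
    SELECT COUNT(*) AS odd_rows,
           100.0 * COUNT(*) / NULLIF((SELECT COUNT(*) FROM sales_detail), 0) AS pct_of_rows
    FROM   sales_detail
    WHERE  amount < 0;   -- substitute the actual unexpected condition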
My approach is to start small. Start with what you know is right. Then expand this to incorporate more, and test whether the rest of the data conforms to your expectations. Deal with outliers as they crop up, but only if they cause major variances in your datasets. I find it a lot easier to solve ten small problems than one big overarching problem.
He was caught up in the whole and I have seen this happen a few times.
We were recently investigating a large number of variances. The general approach being used was to look at the variances one by one and find the cause of each.
My approach was different: look at the first one, find out what caused it, and then apply that rule to all of the other variances. You can then set those aside and move on to the next variance that hasn't yet been explained, and so on.
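In practice that can be as simple as tagging each variance with the first rule that explains it and filtering on whatever is left. The sketch below is illustrative only; the column names and cause rules are made up:

    -- tag each variance with the first known cause that explains it;
    -- whatever is still 'unexplained' is the next thing to investigate
    SELECT v.variance_id,
           v.amount,
           CASE
               WHEN v.source_amount IS NULL          THEN 'missing in source system'
               WHEN v.posted_date <> v.expected_date THEN 'timing difference'
               ELSE 'unexplained'
           END AS likely_cause
    FROM   variances v;

Each pass explains another slice of the variances, and the unexplained pile keeps shrinking.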
Compile what you know is right, then revise and apply what you learn to an ever-widening dataset. Test any problems against the data you are trying to produce, as they might not cause material variances.
At least that way you will have most of the dataset compiled, rather than being stuck for hours on the first apparent problem and producing no data at all.