This is really interesting!
Adam Bethke

Thanks! Full replication is ideal. If full replication is too costly, I would prefer sampling the data over replicating only the schema. See my response to Alexis ROLLAND for my high-level thoughts.

As far as our implementation is concerned, all of our SQL is defined in dbt as dbt models, so rebuilding the whole warehouse in a scratch schema is as simple as creating the BigQuery schema and running the dbt run command. dbt generates a DAG from the models' internal dependencies and recreates the tables in the proper order. I would definitely encourage you to check out dbt because it makes my life a lot easier!
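
To make the dependency-ordering idea concrete, here is a minimal Python sketch of how a build order can be derived from a dependency graph. This is only an illustration of the general technique (a topological sort), not how dbt is implemented internally. The model names and the `build_order` helper are hypothetical and do not come from our project.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models: each key maps to the set of models it depends on.
deps = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "daily_revenue": {"orders_enriched"},
}


def build_order(dependencies):
    """Return model names so that every model comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = build_order(deps)
print(order)
```

The exact order of independent models (here the two staging models) may vary, but a model never appears before anything it depends on. A circular dependency would raise `graphlib.CycleError` rather than produce an order.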

Could you prototype something with views? Your test schema could be view-only and point at the tables that hold the rawest form of your data. That essentially lets the query planner handle the dependencies between views, at the cost of some query performance.
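
A rough sketch of the view-only idea, using Python's built-in sqlite3 module as a stand-in so it runs anywhere. BigQuery's DDL and schema handling differ, so treat this purely as an illustration. The table name, view names, and the `build_view_schema` helper are all hypothetical.

```python
import sqlite3


def build_view_schema(conn):
    """Create one raw table plus a stack of views that depend on it."""
    # Stand-in for the rawest form of the data (hypothetical table).
    conn.execute(
        "CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(1, "a", 10.0), (2, "a", 5.0), (3, "b", 7.5)],
    )
    # First view reads directly from the raw table.
    conn.execute(
        "CREATE VIEW test_orders AS "
        "SELECT id, customer, amount FROM raw_orders WHERE amount > 0"
    )
    # Second view reads from the first view, so no build order is managed
    # by hand: the engine resolves the chain when the view is queried.
    conn.execute(
        "CREATE VIEW test_customer_totals AS "
        "SELECT customer, SUM(amount) AS total FROM test_orders GROUP BY customer"
    )


conn = sqlite3.connect(":memory:")
build_view_schema(conn)
totals = conn.execute(
    "SELECT customer, total FROM test_customer_totals ORDER BY customer"
).fetchall()
print(totals)
```

Nothing is materialized here, which is the trade-off mentioned above: every query against the last view recomputes the whole chain from the raw table.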