Auto-Tag Sensitive BigQuery Data & Never Touch The UI Again
Eliminate a tedious data governance chore after understanding privacy taxonomies and a simple BigQuery API implementation.
For all the hours you spend developing, testing and maintaining robust data pipelines, I bet you don’t think nearly as deeply about who your end user might be. This is excusable because data engineering isn’t a discipline with a significant focus on UX (user experience).
And, depending on the structure of your org, the next person you “hand off” your data source to might not even be your final stakeholder. For instance, I often build pipelines whose data sources become the domain of data analysts, who in turn build dashboards that stakeholders can access.
As the data you’re ingesting meanders from third-party vendor to your IDE to data warehouse to stakeholder access, many individuals will be able to view, query and manipulate it.
And this usually isn’t a good thing.
Data engineers, and anyone else with write access to your database, spend an inordinate amount of time thinking about how the resulting data should be presented, not about who might have undue access to it. This is why implementing a data governance policy at the organization level is integral to data security, especially for sensitive data, a.k.a. PII (personally identifiable information).
PaaS vendors like my go-to, Google Cloud Platform, allow you to define and enforce…