How to Check Data Quality in PySpark
Using Deequ to calculate metrics and set constraints on your big datasets
We have all heard it from our coworkers, our stakeholders, and sometimes even our customers: "What is going on with the data?"
What if, instead of hearing it from others, we could set up some checks and constraints and identify the problems before our data consumers see them? What if we could do that on…