How to Check Data Quality in PySpark

Using Deequ to calculate metrics and set constraints on your big datasets

Photo by Prateek Katyal on Unsplash

We have all heard it from our coworkers, our stakeholders, and sometimes even our customers — what is going on with the data?

What if, instead of hearing it from others, we could set up some checks and constraints and identify the problems before our data consumers see them? What if we could do that on…



