Published in CodeX

How to Check Data Quality in PySpark

Using deequ to calculate metrics and set constraints on your big datasets

Photo by Prateek Katyal on Unsplash

We have all heard it from our coworkers, our stakeholders, and sometimes even our customers: "What is going on with the data?"

What if, instead of hearing it from others, we could set up some checks and constraints and identify the problems before our data consumers see them? What if we could do that on…


