Published in Whispering Data

Guarantee Consistency in Your Delta Lake Table(s)

Learn how to integrate lakeFS hooks to validate data on commits.

  • Data mutations including deletes and “in-place” updates
  • Advanced partitioning and indexing abilities (with z-order)
  • CI/CD hooks can be used to validate data quality and even ensure referential integrity
  • Tables can be cloned in zero-copy fashion, without duplicating data

lakeFS & Delta in Action

To prove this point, we’ll demonstrate how to guarantee data quality in a Delta table by incorporating lakeFS branches and hooks into the workflow for adding new data.

  1. Check no payment is ever higher than the total amount of the loan
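As a rough illustration of the check above, here is a minimal sketch in plain Python. The column names `loan_amount` and `total_paid`, the `loan_id` key, and the sample rows are all hypothetical, made up for this example; in a real pipeline the same predicate would run against the Delta table itself rather than an in-memory list.

```python
from decimal import Decimal
from typing import Iterable, List, Mapping


def find_overpayments(loans: Iterable[Mapping]) -> List[Mapping]:
    """Return every loan whose payments exceed the loan's total amount.

    Values are converted through str() into Decimal so that float inputs
    such as 250.50 are compared as written, without binary rounding noise.
    """
    return [
        loan
        for loan in loans
        if Decimal(str(loan["total_paid"])) > Decimal(str(loan["loan_amount"]))
    ]


# Hypothetical sample rows; the column names are assumptions for illustration.
sample_loans = [
    {"loan_id": 1, "loan_amount": 1000, "total_paid": 250.50},
    {"loan_id": 2, "loan_amount": 500, "total_paid": 500},   # fully paid: allowed
    {"loan_id": 3, "loan_amount": 800, "total_paid": 950},   # overpaid: violation
]

violations = find_overpayments(sample_loans)
```

An empty `violations` list means the data passes. A payment exactly equal to the loan amount is treated as valid, since the rule only forbids payments that are higher.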

Automating Data Deployment with lakeFS Hooks

To provide this guarantee, we’ll configure the tests we created to run automatically before we expose new data to consumers.
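One way such an automatic gate could look is a small webhook service that the hook calls before new data is exposed. The sketch below assumes the hook is configured as a webhook that POSTs a JSON event and treats any non-2xx response as a failure; the `run_checks` placeholder, the `decide` helper, the `serve` function, and the port are hypothetical names chosen for this example, not part of lakeFS.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, List, Tuple


def run_checks(event: dict) -> List[str]:
    """Hypothetical placeholder for the data tests.

    A real implementation would read the table at the ref named in the
    event and return one message per failed test. Here nothing fails.
    """
    return []


def decide(event: dict, checks: Callable[[dict], List[str]] = run_checks) -> Tuple[int, Dict]:
    """Map test results to an HTTP status: 200 on success, 412 on failure."""
    failures = checks(event)
    if failures:
        return 412, {"passed": False, "failures": failures}
    return 200, {"passed": True, "failures": []}


class HookHandler(BaseHTTPRequestHandler):
    """Answers each POSTed event with the outcome of decide()."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length) if length else b""
        event = json.loads(raw) if raw else {}
        status, body = decide(event)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Start the webhook server (blocks until interrupted)."""
    HTTPServer(("0.0.0.0", port), HookHandler).serve_forever()
```

Keeping the pass/fail decision in `decide`, separate from the HTTP handler, means the gating logic can be unit-tested without starting a server. Calling `serve()` starts the listener.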

Wrapping Up

When using lakeFS together with Delta, we can introduce changes to data and schema safely, providing powerful guarantees about the data contained within.

Want to learn more?

To learn more about lakeFS and its benefits in data lake architectures, check out the lakeFS GitHub repo and say “Hi” in the Slack group!

Paul Singman

DevRel @lakeFS. Ex-ML Engineering Lead @Equinox. Whisperer of data and productivity wisdom. Standing on the shoulders of giants.