Are You Sure You Have Good Data?

Best practices for detecting bad data before it spreads

Brad Caffey
Apr 3 · 8 min read
Photo by Mika Baumeister on Unsplash / help by Dinosoft Labs from the Noun Project

Automated validations

Run your validations in your ETL

Have the right validations

Right Validation #1: Dataset Comparison
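As a minimal sketch of what a dataset comparison could look like, assuming it means checking that a target dataset holds the same rows as its source: the function name `compare_datasets`, the `unit` key, and the sample rows are all hypothetical, chosen only for illustration.

```python
def compare_datasets(source_rows, target_rows, key):
    """Summarize row-count and key differences between two datasets."""
    source_keys = {row[key] for row in source_rows}
    target_keys = {row[key] for row in target_rows}
    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        # Keys present in the source but lost on the way to the target.
        "missing_in_target": sorted(source_keys - target_keys),
        # Keys that appear in the target without a matching source row.
        "unexpected_in_target": sorted(target_keys - source_keys),
    }


# Hypothetical sample data: the row counts match, but the keys do not.
source = [{"unit": "111"}, {"unit": "222"}, {"unit": "333"}]
target = [{"unit": "111"}, {"unit": "333"}, {"unit": "444"}]
report = compare_datasets(source, target, "unit")
print(report["missing_in_target"])     # ['222']
print(report["unexpected_in_target"])  # ['444']
```

Comparing keys as well as counts matters here because the two datasets have the same number of rows yet still disagree.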

Right Validation #2: Snapshot Comparison

Example of how snapshot comparison works
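One way a snapshot comparison might be sketched, assuming it means comparing summary metrics from the current load against the previous load and flagging large swings. The name `compare_snapshots`, the 10% tolerance, and the metric values are illustrative assumptions.

```python
def compare_snapshots(previous, current, tolerance=0.10):
    """Return {metric: (old, new, relative_change)} for metrics that moved
    by more than `tolerance`, or that disappeared from the current snapshot."""
    flagged = {}
    for metric, old in previous.items():
        new = current.get(metric)
        if new is None:
            # The metric vanished entirely; report it with no change value.
            flagged[metric] = (old, None, None)
            continue
        if old == 0:
            change = float("inf") if new != 0 else 0.0
        else:
            change = abs(new - old) / abs(old)
        if change > tolerance:
            flagged[metric] = (old, new, change)
    return flagged


# Hypothetical snapshots of the same table on two consecutive days.
yesterday = {"row_count": 1000, "total_bookings": 11060}
today = {"row_count": 1010, "total_bookings": 8000}
flagged = compare_snapshots(yesterday, today)
print(sorted(flagged))  # ['total_bookings']
```

Returning the old value, the new value, and the size of the change gives whoever investigates the failure something concrete to start from.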

Right Validation #3: Trend Comparison
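A trend comparison could be sketched as follows, assuming it means checking the latest value of a metric against its recent history. The three-sigma rule, the name `is_off_trend`, and the sample history are assumptions made for illustration; other trend tests would fit the same slot.

```python
from statistics import mean, stdev


def is_off_trend(history, latest, max_sigma=3.0):
    """Return True when `latest` lies more than `max_sigma` sample standard
    deviations away from the mean of `history` (needs at least two points)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any different value is off trend.
        return latest != mu
    return abs(latest - mu) / sigma > max_sigma


# Hypothetical daily row counts for the previous five loads.
history = [1000, 1020, 980, 1010, 990]
print(is_off_trend(history, 1005))  # False
print(is_off_trend(history, 1200))  # True
```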

Right Validation #4: Unique Key Validation
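A unique key validation can be sketched in a few lines, assuming the goal is to confirm that no key value appears more than once in a dataset. The helper `find_duplicate_keys` and the sample rows are hypothetical.

```python
from collections import Counter


def find_duplicate_keys(rows, key_fields):
    """Return {key_tuple: occurrence_count} for every key seen more than once."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical rows; unit "222" is deliberately duplicated.
rows = [
    {"unit": "111", "booking": "1050"},
    {"unit": "222", "booking": "3000"},
    {"unit": "222", "booking": "3100"},
]
duplicates = find_duplicate_keys(rows, ["unit"])
print(duplicates)  # {('222',): 2}
```

Passing a list of fields lets the same check cover composite keys, and returning the offending keys rather than a bare pass/fail shows which rows to inspect.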

Right Validation #5: API Validation

[{"unit": "111", "booking": "1050"}, {"unit": "222", "booking": "3000"}, {"unit": "333", "booking": "5010"}, {"unit": "444", "booking": "2000"}]
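A payload like the one above could be checked against warehouse data along these lines. This is a sketch under assumptions: that API validation means comparing values returned by an API with the values stored for the same keys, that `booking` is an integer amount, and that the warehouse values can be loaded into a dictionary. The function `compare_with_api` and the `warehouse` figures are hypothetical.

```python
import json

API_RESPONSE = (
    '[{"unit": "111", "booking": "1050"}, {"unit": "222", "booking": "3000"}, '
    '{"unit": "333", "booking": "5010"}, {"unit": "444", "booking": "2000"}]'
)


def compare_with_api(api_json, warehouse_bookings):
    """Return (unit, api_value, warehouse_value) for every unit whose
    warehouse value is missing or differs from the API value."""
    mismatches = []
    for record in json.loads(api_json):
        unit = record["unit"]
        api_value = int(record["booking"])
        warehouse_value = warehouse_bookings.get(unit)
        if warehouse_value != api_value:
            mismatches.append((unit, api_value, warehouse_value))
    return mismatches


# Hypothetical warehouse values; unit "333" disagrees with the API.
warehouse = {"111": 1050, "222": 3000, "333": 5000, "444": 2000}
mismatches = compare_with_api(API_RESPONSE, warehouse)
print(mismatches)  # [('333', 5010, 5000)]
```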

Validations must include detail to research issues when they arise

Our validation engine in action

Conclusion

HomeAway Tech Blog

Software and data science revolutionizing vacations

Written by Brad Caffey, Big Data Developer at HomeAway.com.
