Data Quality at Airbnb

Part 1 — Rebuilding at Scale

Introduction

Background

The Data Quality Initiative

  1. Ensure clear ownership for all important datasets
  2. Ensure important data always meets SLAs
  3. Ensure pipelines are built to a high quality standard using best practices
  4. Ensure important data is trustworthy and routinely validated
  5. Ensure that data is well-documented and easily discoverable

Organization

Data Engineering Role

Org Structure

Community

  • Data Engineering Forum — Monthly all-hands meeting for data engineers intended for cascading context and gathering feedback from the broader community.
  • Data Architect Working Group — Composed of senior data engineers from across the company. Responsible for making major architectural decisions and for conducting reviews for Midas certification (see below).
  • Data Engineering Tooling Working Group — Composed of data engineers from across the company. Responsible for developing a vision for data engineering tooling and workflows.
  • Data Engineering Leadership Group — Composed of data engineering managers and our most senior Individual Contributors. Responsible for organizational and hiring decisions.

Hiring

Architecture and Best Practices

Data Model

  1. Tables must be normalized (within reason) and rely on as few dependencies as possible. Minerva does the heavy lifting to join across data models.
  2. Tables describing a similar domain are grouped into Subject Areas. Each Subject Area must have a single owner that naturally aligns with the scope of a single team. Ownership should be obvious.
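The two rules above imply a checkable invariant. Each Subject Area has exactly one owner, and each table belongs to exactly one Subject Area, so every table resolves to a single owning team. The sketch below is a minimal illustration under assumed inputs, not Airbnb's tooling. The `SubjectArea` type, the `validate_subject_areas` function, and all table and team names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SubjectArea:
    """Hypothetical record for a group of tables describing one domain."""
    name: str
    owner: str  # exactly one owning team, enforced by using a single field
    tables: list[str] = field(default_factory=list)


def validate_subject_areas(areas: list[SubjectArea]) -> list[str]:
    """Return human-readable problems; an empty list means ownership is unambiguous."""
    problems: list[str] = []
    table_to_area: dict[str, str] = {}
    for area in areas:
        # Rule: every Subject Area must name an owner.
        if not area.owner.strip():
            problems.append(f"subject area '{area.name}' has no owner")
        # Rule: a table may live in only one Subject Area, so its owner is obvious.
        for table in area.tables:
            previous = table_to_area.get(table)
            if previous is not None and previous != area.name:
                problems.append(
                    f"table '{table}' is in both '{previous}' and '{area.name}'"
                )
            else:
                table_to_area[table] = area.name
    return problems


if __name__ == "__main__":
    areas = [
        SubjectArea("listings", "team-listings", ["dim_listings", "fct_listing_views"]),
        SubjectArea("bookings", "", ["fct_bookings", "dim_listings"]),
    ]
    for problem in validate_subject_areas(areas):
        print(problem)
```

Storing the owner as a single field, rather than a list, makes the "single owner" rule structural instead of something that must be checked separately.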

Data Technology

Operations

Governance

Process

Diagram of the Midas certification process, described in detail below.

Accountability

Conclusion
