Tuning Whatnot’s Data Platform for Speed and Scale

Stephen Bailey
Whatnot Engineering
6 min read · May 3, 2022

City planners have a tough task. Given scarce space and scant knowledge of the future, they must lay out the zones, infrastructure, and codes to foster a vibrant urban ecosystem. Neglect the planning, and even with the best materials and economy, the city may wind up a convoluted mess.

Data platform teams face a similar challenge today. In a “modern data stack” world, the difference between building a data metropolis and a data wasteland is one of planning and execution, not technology.

At Whatnot, our core principle to “move uncomfortably fast” in support of our customers puts even more pressure on data systems. New features and new teams make data stale. Staleness creates instability. Instability breeds distrust.

To stay ahead of these problems, the data platform team adopted firmer principles around organization, ownership, and automation. Although we have made a few architectural changes since our last post (including adding real-time data systems), here I want to detail a few principles that guided the team as we hardened patterns in the platform.

Principle: Build Modules, Not Monoliths

Loosely coupled systems scale better. Allowing more people to develop data models in dbt Cloud should never break critical assets. Letting five teams orchestrate Python scripts on Dagster Cloud should not risk production pipelines. Adding new machine learning models to our MLOps platform should not make existing services suffer.

Stephen Bailey is a data engineer at Whatnot, the country’s fastest-growing online marketplace. He writes occasionally on his blog at stkbailey.substack.com.