Airbnb’s End-to-End ML Platform

Airbnb: Bighead

Vimarsh Karbhari
Acing AI

In 2016, Airbnb had only a few major ML models in production. Each of these models took 8 to 12 weeks on average to build, and there was no consistency among them. They were built using Aerosolve, Spark and Scala. Some of the best ML packages, such as TensorFlow, PyTorch and scikit-learn, were needed but not available. At the data level, the systems that used these models saw significant discrepancies between offline and online data. Bighead was built to solve these problems.

Bighead

Why?

Some of the issues highlighted above needed to be solved before more ML models and systems were built. In addition, a platform was required to let more users build ML products by reducing time, effort and friction. Such a platform would also make model evaluation easier.

How?

Zipline (ML Data Management Framework)

Defining features (especially windowed ones) with Hive was complicated and error-prone, and Zipline is the framework that helps with this. It also helps backfill training sets, which previously relied on inefficient Hive queries, and enables feature sharing. Using this framework, inconsistencies between offline and online datasets can also be resolved. Deploying it also resulted in ML data pipelines that have better data quality checks and…
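To make the idea of a windowed feature concrete, here is a minimal sketch in plain Python. It is not Zipline's API: the function `windowed_count`, the event tuples and the listing IDs are all hypothetical, chosen only for illustration. It shows how a single feature definition, evaluated "as of" a timestamp, can serve both a training backfill and an online lookup, which is one way the offline/online discrepancy described above can be avoided.

```python
from datetime import datetime, timedelta

def windowed_count(events, entity_id, as_of, window):
    """Count events for entity_id in the half-open window [as_of - window, as_of).

    Hypothetical helper, not part of Zipline. Events at or after `as_of`
    are excluded so that a training row never sees future data.
    """
    start = as_of - window
    return sum(
        1
        for event_entity, event_time in events
        if event_entity == entity_id and start <= event_time < as_of
    )

# Hypothetical event log: (entity id, event timestamp).
events = [
    ("listing_1", datetime(2018, 1, 1)),
    ("listing_1", datetime(2018, 1, 5)),
    ("listing_1", datetime(2018, 1, 9)),
    ("listing_2", datetime(2018, 1, 6)),
]

# Offline backfill: compute the feature as of each historical label time.
label_times = [
    ("listing_1", datetime(2018, 1, 2)),
    ("listing_1", datetime(2018, 1, 10)),
]
training_rows = [
    (entity, ts, windowed_count(events, entity, ts, timedelta(days=7)))
    for entity, ts in label_times
]

# Online serving: the same definition, evaluated at request time.
online_value = windowed_count(
    events, "listing_2", datetime(2018, 1, 7), timedelta(days=7)
)
```

Because both paths call the same function with an explicit `as_of` time, the training set and the serving path cannot drift apart in how the window is defined. A production framework would compute this incrementally rather than rescanning the event log.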
