Automated Machine Learning — A Paradigm Shift That Accelerates Data Scientist Productivity @ Airbnb

By Hamel Husain & Nick Handel

At Airbnb, we are always searching for ways to improve our data science workflow. A fair amount of our data science projects involve machine learning, and many parts of this workflow are repetitive. These repetitive tasks include, but are not limited to:

Exploratory Data Analysis: Visualizing data before embarking on a modeling exercise is a crucial step in machine learning. Automating tasks such as plotting all your variables against the target variable being predicted as well as computing summary statistics can save lots of time.

Feature Transformations: There are many choices in how you can encode categorical variables, impute missing values, encode sequences and text, etc. Many of these feature transformations are canonical such that they can be reliably applied to many problems.

Algorithm Selection & Hyper-parameter Tuning: There are a dizzying number of algorithms to choose from and related hyper-parameters that can be tuned. These tasks are very amenable to automation.

Model Diagnostics: Learning curves, partial dependence plots, feature importances, ROC and other diagnostics are extremely useful to generate automatically.

Enter Automated Machine Learning (AML)

There is a growing community around creating tools that automate the tasks outlined above, as well as other tasks that are part of the machine learning workflow. The paradigm that encapsulates this idea is often referred to as automated machine learning, which I will abbreviate as “AML” for the rest of this post.

There is no universally agreed upon scope of AML, however the folks who routinely organize the AML workshop at the annual ICML conference define a reasonable scope on their website, which includes automating all of the repetitive tasks defined above.

The scope of AML is ambitious, however, is it really effective? The answer is it depends on how you use it. Our view is that it is difficult to perform wholesale replacement of a data scientist with an AML framework, because most machine learning problems require domain knowledge and human judgement to set up correctly.

Also, we have found AML tools to be most useful for regression and classification problems involving tabular datasets, however the state of this area is quickly advancing. In summary, we believe that in certain cases AML can vastly increase a data scientist’s productivity, often by an order of magnitude.

We have leveraged AML at Airbnb in the following ways:


  • Unbiased presentation of challenger models: AML can quickly present a plethora of challenger models using the same training set as your incumbent model. This can aid the data scientist in choosing the best model family.

Diagnostics And Exploration

  • Detecting Target Leakage: because AML builds candidate models extremely fast in an automated way, we can detect data leakage earlier in the modeling lifecycle.
  • Diagnostics: As mentioned earlier, canonical diagnostics can be automatically generated such as learning curves, partial dependence plots, feature importances, etc.


  • Tasks like exploratory data analysis, pre-processing of data, hyper-parameter tuning, model selection and putting models into production can be automated to some some extent with an Automated Machine Learning framework.

Automated Machine Learning Tools

There is a wide array of commercial and open source tools that address the AML paradigm. We have experimented with the following tools:

Case Study: Competitive Benchmarks With Customer Lifetime Value Models

At Airbnb, we use machine learning to build customer lifetime value models (LTV) for guests and hosts. These models allow us to improve our decision making and interactions with our community at very granular levels (down to the user, level if we like).

LTV models are setup as a standard regression problem for guests, where the target variable is the spend of each guest over a time horizon. The features of this model include demographic, location and activity information from our web and mobile applications. There are many moving parts in this model that account for supply and demand elasticity, expected costs, and other variables.

During the course of building a model, it is important for a data scientist to stay objective with regards to their choice of algorithm. For example a complex model may only offer a small incremental benefit over a simple model and this trade-off should be made deliberately. For example, during the course of building the LTV model we succumbed to a bias towards one of our favorite algorithms, eXtreme gradient boosted trees (XGBoost). The reason for our biases were the following:

  • This algorithm performed well on closely related problems.
  • During model development we did ad-hoc cross validation and XGBoost seemed to do the best.
  • We had limited time to create this model, and spent most of our time on feature engineering, data cleaning and gluing our model into production systems — which left very little time for rigorous algorithm selection and tuning.

Being aware of our bias, we fed our raw training data through an AML platform to perform a sanity check and to benchmark our model’s error.

An illustration of these benchmarks are in the chart below. This chart displays the distribution of RMSE (Root Mean Squared Error) of a variety of models across out of time cross validation folds. The y-axis corresponds to distinct “blueprints” which are a combination of algorithms and feature engineering steps. While it is impossible to go into the details of each of these blueprints, the below chart gives you a sense of the breadth of exploration that modern AML systems are capable of.

Using AML, we quickly got an alternate point of view: linear models are very competitive for this problem. It turned out that the AML platform tested a plethora of alternate feature engineering steps as well as performed more rigorous hyper-parameter tuning that we did not have time to explore manually. Furthermore, these findings allowed us to make changes to our algorithm and reduce model error by over 5%, which translated into a material impact.


AML is a powerful set of techniques for faster data exploration as well as improving model accuracy through model tuning and better diagnostics. The above case study highlights AML’s capability to improve model accuracy, however we have realized AMLs other benefits as well. For problems that are amenable to AML, we view the use of this paradigm as good modeling hygiene as it is cheap to try once you have already composed your training data. AML is not guaranteed to improve your results, but we find that it often does if used skillfully.

Further Reading

Intrigued? We’re always looking for talented people to join our Data Science and Analytics team!

Special thanks to the following editors: Michael, Robert Chang, Krishna Puttaswamy