Comparison of AutoML solutions 2021

Learn about the current state of AutoML solutions, adoption, market size and which one suits you best

Aleix López Pascual
Analytics Vidhya
Aug 19, 2021

I recommend reading my first article about AutoML, where I introduce the subject, its benefits, and whether or not AutoML can replace data scientists.

Background

AutoML solutions have been around for quite some time. Early solutions such as AutoWeka originated in academia in 2013, followed by Auto-sklearn and TPOT. This triggered a new wave of machine learning tooling, and the following years saw many other AutoML solutions, including Auto-ml and Auto-Keras, hit the market. At the same time, startups like H2O.ai and DataRobot released their own automated solutions. More recently, companies like Amazon, Google, and Microsoft have also jumped on the bandwagon.

Some of these solutions, such as AutoWeka, Auto-sklearn, TPOT, and H2O AutoML, are fully open source, while DataRobot, Amazon SageMaker, Google Cloud AutoML, and H2O Driverless AI are enterprise products.

AutoML adoption

AutoML tools and their usage

These are the most well-known and used AutoML tools based on the results of Kaggle’s State of Data Science and Machine Learning 2020 survey:

Here we show the adoption of such tools among the respondents of the survey:

We also compare them with the results of Kaggle’s State of Data Science and Machine Learning 2019 survey:

We can highlight two key points:

  • AutoML tools saw higher adoption in 2020 than in 2019.
  • Adoption of open-source AutoML tools is higher than adoption of enterprise AutoML tools. Auto-sklearn showed the largest rise in adoption. Among enterprise tools, Google Cloud AutoML grew by about 11% in adoption and H2O Driverless AI by about 4%.

Social Media Analysis: Twitter and Google Trends

In addition, we analyze Twitter and Google Trends to get a clearer understanding of the general sentiment towards AutoML.

The data used for the analysis can be obtained from Google Trends. We observe an increasing trend since the beginning of 2017.

AutoML market size

The AutoML market generated revenue of $270 million in 2019 and is expected to reach $14,512 million by 2030, advancing at a CAGR of 43.7% during the forecast period (2020–2030). Given that, we believe AutoML has not yet peaked and that interest in it will continue to grow.
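
As a quick sanity check on these figures, the growth rate implied by the two revenue numbers can be recomputed. This is only a sketch: it assumes 11 years of compounding between the 2019 and 2030 figures, which may not be exactly how the original forecast was built, and the function names are illustrative.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value after compounding start_value at a fixed annual rate."""
    return start_value * (1 + rate) ** years


# Revenue figures in millions of dollars, taken from the text above.
# Assumption: growth compounds over the 11 years from 2019 to 2030.
years = 2030 - 2019
cagr = implied_cagr(270, 14_512, years)

print(f"Implied CAGR: {cagr:.2%}")
print(f"2030 revenue at a 43.7% CAGR: ${project(270, 0.437, years):,.0f} million")
```

Under that assumption, the implied rate lands very close to the quoted 43.7%, so the two revenue figures and the CAGR are consistent with each other.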

Other resources regarding AutoML adoption:

What can we expect from an AutoML tool?

AutoML solutions can target various stages of the machine learning process. Not all of them cover the same scope. These are the different steps that can be automated:

AutoML use cases

AutoML applications are found in many industries (healthcare, financial services, marketing…). Here you can find some AutoML use cases from H2O.ai.

Comparison of AutoML solutions

Open-source vs enterprise

Open-source and enterprise AutoML solutions are very different: open-source solutions can only automate algorithm selection and hyperparameter tuning, whereas enterprise solutions can do much more (see the section “What can we expect from an AutoML tool?”). In addition, the results achieved with open-source solutions are considerably worse than those of enterprise solutions.
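
To make “algorithm selection and hyperparameter tuning” concrete, here is a toy sketch in plain Python. It is not how any of the tools in this article is implemented: the data, the two candidate algorithms, and the parameter grids are all made up for illustration. It simply tries every algorithm with every hyperparameter value and ranks the results on a validation set.

```python
import random


def make_data(n, seed):
    """Toy 1-D binary data: label is 1 when x > 0.5, with 10% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < 0.1:
            label = 1 - label
        data.append((x, label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (k is odd)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)


def threshold_predict(train, x, threshold):
    """Predict 1 when x exceeds a fixed threshold (ignores the training data)."""
    return int(x > threshold)


def accuracy(predict, train, valid, **params):
    """Fraction of validation points that the candidate labels correctly."""
    hits = sum(predict(train, x, **params) == label for x, label in valid)
    return hits / len(valid)


# Search space: each candidate algorithm with a grid of hyperparameter values.
SEARCH_SPACE = [
    ("knn", knn_predict, [{"k": k} for k in (1, 3, 5, 9)]),
    ("threshold", threshold_predict, [{"threshold": t} for t in (0.3, 0.5, 0.7)]),
]


def build_leaderboard(train, valid):
    """Score every (algorithm, hyperparameters) pair and rank by accuracy."""
    rows = []
    for name, predict, grid in SEARCH_SPACE:
        for params in grid:
            rows.append((accuracy(predict, train, valid, **params), name, params))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


train = make_data(200, seed=0)
valid = make_data(100, seed=1)
leaderboard = build_leaderboard(train, valid)
for score, name, params in leaderboard:
    print(f"{score:.2f}  {name}  {params}")
```

Real tools search far larger spaces and use smarter strategies than an exhaustive grid, but the output has the same shape: a ranked list of candidate models.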

H2O Driverless AI

  • Data source agnostic. It can ingest data from any data source including Hadoop, Snowflake, S3 object storage, Google BigQuery, etc.
  • Automatic Visualization generates plots, graphs, and charts that help you understand the shape of the data, outliers, missing values, and so on. This is where a data scientist can quickly spot things such as bias in the data. In a way, Automatic Visualization helps jump-start the EDA process.
  • Machine Learning Interpretability gives insights about what model was generated and which features were used to build the model. Every prediction made by a Driverless AI model can be explained to business users, so the system is viable even for regulated industries.
  • AutoML for Any Data: create world-class models for not only tabular data, but also text, image, video, and time-series data.
  • Automatic documentation gives one an in-depth explanation of the entire feature engineering process.
  • The entire process is done through a graphical user interface, making it easy for even a novice data scientist to be productive immediately.
  • Highly customizable: you can upload your own models, transformers, and scorers as custom recipes.
  • You keep the benefits of automation without losing the ability to influence the optimization.
  • Deployment (documentation): Driverless AI can be deployed everywhere including all clouds (Microsoft Azure, AWS, Google Cloud) and on-premises on any system. Each completed experiment generates a MOJO (Model Object Optimized). The MOJO scoring pipelines can be deployed from within the Driverless AI GUI or standalone for production purposes. From the GUI, there is direct access to deploy to AWS Lambda, SageMaker or a local REST server; there is also the possibility to download the model as a Plain Old Java Object (POJO).
  • Model monitoring for projects in production.
  • Documentation
  • Tutorials
  • Cost: Need to contact the vendor
  • 14-Day Free Access to the H2O AI Hybrid Cloud. 21-Day Free License Key for Driverless AI. The main difference is that the first uses H2O’s cloud environment, whereas the second lets you take the license key to any other cloud environment, such as AWS. Install the Driverless AI AWS Marketplace AMI.

Google Cloud AutoML

  • Google AutoML is composed of several products depending on the type of problem: AutoML Natural Language, AutoML Tables, AutoML Video Intelligence, and AutoML Vision. Recently, Google released Vertex AI, which brings all the AutoML products and the rest of Google’s AI products together under a single API, client library, and user interface.
  • Very tied to other Google products.
  • Easy to connect with the rest of the products in Vertex AI (train custom models, Explainable AI…)
  • Less configurable than Driverless AI.
  • Little visibility on the model generated makes it difficult to iterate on top of it.
  • Models are deployed to an endpoint, which can be created using the Google Cloud console, REST and the command line, Java, Node.js, or Python. The model can be deployed directly on Google Cloud or downloaded in a container (edge computing).
  • Cost: Pay for what you use. No need to pay for a license.
  • Documentation
  • How to build and deploy a model with Vertex AI

H2O-3

  • Open-source version of H2O.
  • In-memory, distributed, fast, and scalable machine learning and predictive analytics platform that allows you to build machine learning models on big data and provides easy productionalization of those models in an enterprise environment.
  • It makes development easier and faster, even for non-experts.
  • Supports the most widely used statistical & machine learning algorithms including gradient boosted machines, generalized linear models, deep learning and more.
  • H2O-3 also has an industry-leading AutoML functionality that automatically runs through all the algorithms and their hyperparameters to produce a leaderboard of the best models. In contrast to other open-source AutoML solutions, it is highly configurable.
  • Contains a model explainability interface where several explainability methods and visualizations are generated with a single function.
  • Productionalization: H2O-3 allows you to convert the models you have built to either a Plain Old Java Object (POJO) or a Model Object Optimized (MOJO). H2O-generated MOJO and POJO models are intended to be easily embeddable in any Java environment. The only compilation and runtime dependency for a generated model is the h2o-genmodel.jar file produced as the build output of these packages.
  • H2O Flow is an optional additional user interface within H2O-3. It is a web-based interactive environment that allows you to combine code execution, text, mathematics, plots, and rich media in a single document, similar to IPython Notebooks. This intuitive interface allows you to build your machine learning model without a single line of code. This eliminates the need to be familiar with the H2O SDK and allows anyone to build an ML model.
  • H2O is supported on many cloud environments (AWS, Azure…)
  • Documentation
  • Tutorial
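
For readers who prefer code over H2O Flow, the AutoML leaderboard and MOJO export described above can be sketched with the H2O-3 Python client. This is a minimal sketch, not the workflow used for this article: it assumes the `h2o` package and a Java runtime are installed, and the script name, the file `churn.csv`, and the target column `churn` are hypothetical.

```python
import sys


def feature_columns(all_columns, target):
    """Every column except the target is treated as a predictor."""
    return [c for c in all_columns if c != target]


def run_automl(csv_path, target, max_models=10, seed=1):
    """Train H2O AutoML on a CSV file and return the fitted AutoML object."""
    # Imported lazily so the helper above works without h2o installed.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # starts, or connects to, a local H2O cluster
    frame = h2o.import_file(csv_path)
    frame[target] = frame[target].asfactor()  # treat the target as categorical

    aml = H2OAutoML(max_models=max_models, seed=seed)
    aml.train(
        x=feature_columns(frame.columns, target),
        y=target,
        training_frame=frame,
    )
    return aml


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python automl_sketch.py churn.csv churn
    aml = run_automl(sys.argv[1], sys.argv[2])
    print(aml.leaderboard.head())  # best models first
    print("MOJO saved to", aml.leader.download_mojo(path="."))
```

Limiting the run with `max_models` keeps the experiment short; in practice a time budget via `max_runtime_secs` is a common alternative.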

My verdict

I believe that H2O-3 is the best open-source platform to democratise machine learning at the moment. Its complete scope and the H2O Flow web-based interface make it the first choice among open-source solutions. I was able to build a machine learning project for customer churn from scratch to deployment without a single line of code. I recommend reading this Step-By-Step Guide to AutoML with H2O Flow for a complete example.

Among enterprise solutions, H2O Driverless AI is the most complete, customizable, and agnostic tool out there. I easily generated a customer churn model that outperformed the one from H2O-3, while maintaining a high degree of control over and understanding of the modelling.

Aleix López Pascual
Senior Data Scientist @ Glovo | Competitions Expert @ Kaggle | Writer @ Medium | MSc in High Energy Physics, Astrophysics and Cosmology