6 Open Source Automated Machine Learning Tools Every Data Scientist Should Know

Shyam Sunder Kumar
Analytics Vidhya
4 min read · Aug 8, 2020

The rapid growth of machine learning applications in recent times has created demand for off-the-shelf machine learning methods and for user-friendly machine learning software that non-experts can use. Fortunately, such tools already exist.

Automated machine learning (AutoML) automates the end-to-end process of applying machine learning to real-world problems.

In a typical machine learning application, practitioners must apply the appropriate data preprocessing, feature engineering, feature extraction, and feature selection methods to make the dataset amenable to machine learning. Following those preprocessing steps, practitioners must then perform algorithm selection and hyperparameter optimization to maximize the predictive performance of their final machine learning model.
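As a rough illustration of the manual steps that AutoML tries to take over, here is a minimal sketch using only the Python standard library. It scales the features (preprocessing) and then picks the k of a k-nearest-neighbours classifier by leave-one-out accuracy (hyperparameter optimization). The toy dataset and the helper names `scale`, `knn_predict` and `loo_accuracy` are invented for this example and do not come from any of the tools below.

```python
from collections import Counter
from math import dist

# Toy dataset (hypothetical): two features on very different scales, two classes.
X = [(1.0, 200.0), (1.2, 220.0), (0.9, 210.0), (1.1, 190.0),
     (3.0, 800.0), (3.2, 820.0), (2.9, 790.0), (3.1, 810.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

def scale(rows):
    """Preprocessing step: min-max scale every column to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans)) for row in rows]

def knn_predict(train_X, train_y, point, k):
    """Majority vote among the k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy for a given k."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)

# Note: scaling the full dataset before validation leaks a little information;
# that is acceptable for a toy, but a real workflow would scale inside each fold.
X_scaled = scale(X)

# Hyperparameter optimization: score each candidate k and keep the best one.
scores = {k: loo_accuracy(X_scaled, y, k) for k in (1, 3, 7)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

With k = 7 every vote includes more points from the other class than from the correct one, so that setting fails completely, while small values of k classify every point correctly. An AutoML tool runs this kind of search over many preprocessing choices, model families and hyperparameters at once.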

Advantages of AutoML

  • Increased productivity for data scientists
  • Non-experts can also use machine learning without much hassle
  • Model accuracy and performance on par with traditional methods

Let’s look at some of the open source tools now!

1 — Auto-WEKA 2.0

Auto-WEKA was first released in 2013, and Auto-WEKA 2.0 followed in 2017.

Auto-WEKA 2.0 is designed for the most common use case: tabular data (a table with rows and columns).

Weka Data Mining Tutorial for First Time & Beginner Users

2 — Auto-sklearn

auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator. It leverages recent advances in Bayesian optimization, meta-learning and ensemble construction.

Benchmarking Automatic Machine Learning Frameworks

auto-sklearn is designed for the most common use case: tabular data (a table with rows and columns). It performs best on the classification datasets (see Benchmarking Automatic Machine Learning Frameworks for more details).
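Because auto-sklearn follows the scikit-learn estimator interface, using it looks much like using any other classifier. The sketch below assumes that auto-sklearn and scikit-learn are installed. The function name `fit_autosklearn_demo`, the dataset choice and the time budgets are arbitrary picks for illustration, not recommended settings.

```python
def fit_autosklearn_demo(time_budget_s=120):
    """Fit auto-sklearn on a small built-in dataset and return test accuracy."""
    # Third-party imports live inside the function so that this file can be
    # loaded even where auto-sklearn is not installed.
    import autosklearn.classification
    from sklearn.datasets import load_digits
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=time_budget_s,  # total search budget in seconds
        per_run_time_limit=time_budget_s // 4,  # cap for any single model fit
    )
    automl.fit(X_train, y_train)  # same fit/predict calls as a scikit-learn estimator
    return accuracy_score(y_test, automl.predict(X_test))
```

Calling `fit_autosklearn_demo()` runs the search for about two minutes and returns the accuracy on the held-out split. The score varies from run to run because the search is time-bounded.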

3 — Auto-Keras

Auto-Keras is an open source software library that provides easily accessible deep learning tools to domain experts with limited data science or machine learning background. Auto-Keras does this using a fully automated approach, leveraging innovations in Efficient Neural Architecture Search with Network Morphism.

Auto-Keras is a novel framework that enables Bayesian optimization to guide the network morphism for efficient neural architecture search by introducing a neural network kernel and a tree-structured acquisition function optimization algorithm.

Auto Keras (open source autoML) practice on jupyter notebook

Note: Network-morphism-based NAS is still computationally expensive because selecting the proper morph operation for existing architectures is an inefficient process.

4 — TPOT

TPOT is built on the scikit-learn library and follows the scikit-learn API closely. It can be used for regression and classification tasks. TPOT uses a genetic search algorithm to find the best parameters and model ensembles.

TPOT your Data Science Assistant

TPOT is designed for the most common use case: tabular data (a table with rows and columns). TPOT performs best on the regression datasets (see Benchmarking Automatic Machine Learning Frameworks for more details).
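To give a feel for what a genetic search does, here is a toy sketch in plain Python. It is not TPOT's implementation: the search space `SPACE`, the `fitness` function (a stand-in for a cross-validated score) and every helper name are made up for illustration. Each individual is one pipeline configuration, and each generation keeps the better half and refills the population with mutated crossovers.

```python
import random

# Hypothetical search space: each "individual" is one pipeline configuration.
SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "model": ["tree", "forest", "linear"],
    "depth": [2, 4, 8, 16],
}

def fitness(cfg):
    """Stand-in for a cross-validated score; a real tool would train a model here."""
    score = {"tree": 0.70, "forest": 0.80, "linear": 0.65}[cfg["model"]]
    score += 0.05 if cfg["scaler"] == "standard" else 0.0
    score += 0.02 * SPACE["depth"].index(cfg["depth"])
    return score

def random_individual(rng):
    return {key: rng.choice(options) for key, options in SPACE.items()}

def mutate(cfg, rng):
    """Copy the configuration and re-draw one randomly chosen gene."""
    child = dict(cfg)
    key = rng.choice(list(SPACE))
    child[key] = rng.choice(SPACE[key])
    return child

def crossover(a, b, rng):
    """Take each gene from one of the two parents at random."""
    return {key: rng.choice([a[key], b[key]]) for key in SPACE}

def evolve(generations=10, population_size=12, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged, then refill with offspring.
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(best, round(fitness(best), 2))
```

Because the best individuals survive each generation unchanged, the best score never gets worse as the search proceeds. In a real tool each fitness evaluation means training and validating a pipeline, which is where most of the running time goes.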

Coding Session using TPOT (17:40)

Note: TPOT also has specialized implementations for medical research.

5 — TransmogrifAI

TransmogrifAI is a library built on Scala and SparkML that can rapidly produce data-efficient models for heterogeneous structured data at massive scale. With just a few lines of code, a data scientist can automate data cleansing, feature engineering, and model selection to arrive at a performant model, which the data scientist can then explore and iterate on further.

Transmogrification is the process of transforming, often in a surprising or magical manner.

The TransmogrifAI Workflow (Source : Salesforce Engineering )

It automates the creation of machine learning models for each Salesforce customer in a multi-tenant way, so it scales to thousands of customers without needing data scientists to build and optimize each of those models. To learn how Einstein simplifies the creation of machine learning workflows, see:

Auto-Machine Learning: The Magic Behind Einstein

Also see Open Sourcing TransmogrifAI for more details.

6 — H2O AutoML

H2O’s AutoML can be used for automating the machine learning workflow, which includes automatic training and tuning of many models within a user-specified time limit. Stacked Ensembles are automatically trained on collections of individual models to produce highly predictive ensemble models which, in most cases, are the top performing models in the AutoML Leaderboard.

H2O Auto ML = Random Grid Search + Stacking
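The random grid search half of that equation can be sketched in plain Python. This is a toy illustration, not H2O's code: the grid `GRID`, the scoring function `toy_score` and the `max_models` budget are assumptions made for the example, and the stacking step is left out entirely.

```python
import itertools
import random

# Hypothetical hyperparameter grid for a single model family.
GRID = {
    "max_depth": [3, 5, 7, 9],
    "learn_rate": [0.01, 0.05, 0.1],
    "sample_rate": [0.6, 0.8, 1.0],
}

def toy_score(params):
    """Stand-in for a validation metric (higher is better); not a real model."""
    return (
        1.0
        - abs(params["max_depth"] - 7) * 0.02
        - abs(params["learn_rate"] - 0.05) * 1.0
        - abs(params["sample_rate"] - 0.8) * 0.1
    )

def random_grid_search(grid, score_fn, max_models, seed=1):
    """Score a random subset of the grid and return a sorted leaderboard."""
    keys = list(grid)
    all_combos = [dict(zip(keys, values)) for values in itertools.product(*grid.values())]
    rng = random.Random(seed)
    # Sampling without replacement: no combination is evaluated twice.
    sampled = rng.sample(all_combos, min(max_models, len(all_combos)))
    leaderboard = [(score_fn(p), p) for p in sampled]
    leaderboard.sort(key=lambda row: row[0], reverse=True)
    return leaderboard

leaderboard = random_grid_search(GRID, toy_score, max_models=10)
for score, params in leaderboard[:3]:
    print(f"{score:.3f}", params)
```

Only 10 of the 36 possible combinations are scored here, which is the point of a random search: the budget caps the cost regardless of how large the grid is. A stacking step would then train a second-level model on the predictions of the leaderboard entries.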

Intro to AutoML + Hands-on Lab — Erin LeDell, Machine Learning Scientist, H2O.ai

Final Note

This is by no means a complete list of AutoML tools (for a more detailed list, visit automated-machine-learning). I have also not discussed performance, or the high computational cost from which existing search algorithms usually suffer.

There you have it, your 6 Open Source Automated Machine Learning Toolkits.
