Automated Machine Learning

An Introduction to AutoML

Google, Amazon & H2O.ai AutoML

Andrew Gooday

Published in

Analytics Vidhya

15 min read · Jul 4, 2020

Building AI with Machine Learning

Perhaps the fastest way to become familiar with a machine learning workflow and start to get a sense of some of the difficulties posed is to take a look at Kaggle.

Kaggle, owned by Google, hosts a community of machine learning enthusiasts and professionals, along with lots of useful public data sets, free GPU compute, machine learning competitions with prize money and hosted machine learning ‘notebooks’.

Here’s an excellent example of a machine learning notebook — it combines Python code with readable graphs, comments and explanations.

Kaggle Notebook, by Janio Bachmann — extract showing data exploration

We’ll be referring back to this specific notebook as we test drive AutoML solutions, so you might want to create a copy and run through it — clicking on the image above will take you directly to the notebook. Even if you don’t know Python, you can read the comments and view the analysis.

Take a look at any such data science notebook for a given Kaggle challenge and it will likely include many of the following:

Data Exploration — tables, and graphs to assess the quality and ‘shape’ of the data. (see the above image)
Data Preparation — once the data has been visualised and explored, work to pre-process the data for machine learning can begin. This can include dealing with outliers, scaling, mapping categorical values to numerical ones and fixing errors.
Feature Engineering — creating new features from the original data set. Feature crosses, dimensionality reduction (for example via Principal Component Analysis), dropping features, consolidation and imputation of missing values. Some of these steps may even include use of machine learning.
Model Selection & Training — selecting the appropriate candidate models for the task and data, training them on the engineered data.
Scoring & Optimisation — selecting the appropriate scorer (e.g. AUC vs RMSE), scoring and tuning hyper-parameters. A model leaderboard, and selection and optimisation of the ‘winning’ model and transformations.
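
To make the choice of scorer a little more concrete, here is a minimal sketch in plain Python that computes both metrics from scratch. AUC suits a classifier that outputs scores for a yes/no target, whereas RMSE suits a model predicting a number. The function names and toy values are made up for illustration and are not taken from the Kaggle notebook.

```python
import math

def roc_auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

def rmse(actual, predicted):
    """Root mean squared error for numeric predictions."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Toy classification scores: higher means "more likely to be a yes".
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.8]
auc_value = roc_auc(labels, scores)

# Toy regression predictions.
rmse_value = rmse([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])

print(round(auc_value, 3), round(rmse_value, 3))
```

A higher AUC is better while a lower RMSE is better, which is one reason a leaderboard needs to know which scorer it is ranking by.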

However, Kaggle doesn’t provide the full picture. The business problem & use case has been defined for you, the data has been supplied and the focus is on creating a high scoring model, not deploying and maintaining a product inference solution.

The Machine Learning Process — End to End

So, in addition to what we’ve seen on Kaggle, we also need to consider these important machine learning elements:

Machine Learning Problem Framing — Ensuring that your use case or business requirement can be framed as a machine learning problem, and that it’s appropriate to use machine learning to address it.
Data Acquisition — You’ll need the right user opt-ins, the right systems and products to capture data, stream it to where it’s needed, store it in data lakes and move it to data warehouses. Until recently, machine learning needed big data. With transfer learning, the ability to build machine learning solutions to address new use cases with small data, perhaps only hundreds or a few thousand examples, is emerging. But you still need high quality, labelled data.
ML Experiments — Forms an iterative cycle which includes Data Engineering, Model Selection & Training, Scoring & Optimisation
Machine Learning Inference Pipeline — A production machine learning solution typically comprises more than the model. The data transformations created to feed the model need to be included as well.
Model Documentation — For many real world applications, interpretability & explainability are required to mitigate risk from unintended discrimination and bias, to meet legal requirements, to ensure trust in the service through understanding of model behaviour.
Model Deployment — Machine learning deployment can be difficult in general, but the use of containers and cloud endpoints makes deploying and scaling inference easier. It’s really important to note that the systems and type of compute used to create the models are not necessarily suited to supporting production inference, with its associated low latency and scale.
Model Management — The performance of your model and machine learning inference will degrade over time. Capturing the data needed to maintain and update the model, ensuring performance and data drift can be detected, and being able to test and deploy new models are all part of the machine learning lifecycle.
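
As a rough illustration of what detecting data drift can look like, here is a minimal sketch using the Population Stability Index (PSI). PSI is my choice for the example rather than something any of the services below are known to use, and the function name, bin count and age values are all made up.

```python
import math

def psi(expected, actual, bins=4):
    """Population Stability Index between a training sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins

    def fractions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the training range are clamped into the edge bins.
            idx = int((v - lo) / width) if width > 0 else 0
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical customer ages seen at training time.
training_ages = [25, 32, 41, 38, 29, 55, 47, 36, 60, 33]
# Live data that matches training, and live data where customers are older.
same_ages = list(training_ages)
older_ages = [age + 20 for age in training_ages]

psi_same = psi(training_ages, same_ages)
psi_shifted = psi(training_ages, older_ages)
print(psi_same, psi_shifted)
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the right threshold depends on the feature and the use case.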

End to end, the process of building a new model and moving it into production can involve many teams — Data Ops and Engineering, Data Science, Development & DevOps, Product.

Factor in best practices, the need to keep up with the latest research, models and feature engineering techniques, the resources involved and the amount of iteration, and the question arises: surely some degree of automation is available?

The answer is — yes. Automation varies in terms of scope and functionality, but ‘AutoML’ is addressing almost all of the areas above.

Defining AutoML

I’m going to look at AutoML services provided by Google, Amazon and H2O.ai, and use them to define AutoML. Then, in the next section, I’ll run the banking data, as used by the featured Kaggle notebook, through each system to provide some insight into how the different AutoML solutions vary.

Google’s AutoML

Google provide a range of AI services, but for the purposes of this discussion, we’ll focus on Google’s AutoML offering.

Google AutoML Video, Google AutoML Vision, Google AutoML Natural Language, Google AutoML Text & Doc Entity Extraction, Google AutoML Sentiment Analysis and Google AutoML Translation effectively allow you to extend the functionality of existing Google AI services, using your data. Transfer learning or neural architecture search runs behind the scenes to adapt inference to your use case, based on your data.

I’m going to refer to this as ‘Closed’ AutoML.

Google AutoML — An example of ‘Closed’ AutoML

Data Acquisition — You supply ‘clean’ data.
Basic Data Exploration — In Google AutoML, you can view fields from your source data, get some basic statistics on missing values, distinct values, ‘nullable’ fields, target correlation.
Basic Data Preparation — You can select the target field and drop columns from training via Google AutoML’s basic GUI. Services to help you label that data are available. You can control the train/validation/test split.
ML Experiments — You can initiate training from a GUI, but you cannot influence model selection. You are given no access to feature engineering, no visibility of training, optimisation, any ensembling or stacking that happens. Basic metrics for the final model/pipeline are made available.
ML Inference Pipeline. You cannot access the pipeline components & there is no explainability or other documentation generated.
Model Deployment. The resulting model can typically be accessed by RESTful API and may be available for deployment in a container. However, you don’t own the model & you can’t access the model.

All of that said, the results can still be impressive and achieved quickly.

Google’s AutoML services generally have a corresponding Google API, where you can’t customise the model with your data, but more features are available. As an example, compare AutoML Vision with the Vision API here.

Amazon’s AutoML

The following Amazon AI services fit my definition of ‘Closed’ AutoML. Amazon Comprehend & Amazon Translate have a ‘custom’ feature, allowing the automated building of a Natural Language Processing model based on your data in the former case, and handling of domain or brand specific language in the latter.

Amazon Rekognition offers an automated custom label service for image classification and object detection, providing a GUI interface for labelling your image data and some basic evaluation metrics. In my ‘Building a Simple AI Photo Analytics App on AWS’ article, I used Amazon Rekognition and this could easily be extended via AutoML and Rekognition’s custom labels.

However, Amazon also offer SageMaker Autopilot, which extends automation beyond the hyper-parameter tuning initially offered. When used via SageMaker Studio, a machine learning IDE which is not yet available in all AWS regions, Autopilot blends model development automation with direct access to auto-generated Python machine learning notebooks.

This increase in the ability to view and customise at various stages of the automated machine learning process provides an example of ‘Open’ AutoML.

SageMaker Autopilot — An example of ‘Open’ AutoML

Data Acquisition — You still supply ‘clean’ data.
Data Exploration — SageMaker Studio & Autopilot automatically creates a data exploration notebook which includes a dataset sample, column analysis, count statistics (for string type features), descriptive statistics (for numerical features) and some recommended manual check & improvement actions.
Data Preparation — SageMaker Studio & Autopilot builds a candidate generation notebook which can be used to view and modify transformations which have been automatically created for each candidate model. Data transformations such as imputation, one-hot encoding, principal component analysis and standard scaling are proposed.
Feature Engineering — SageMaker Studio & Autopilot’s candidate generation notebook can be modified to customise the experiment.
Selection & Training — This happens automatically. The candidate generation notebook can be used to step through the auto generated workflow if desired.
Scoring & Optimisation — You can select your scoring algorithm, and adjust the hyper-parameter ranges and optimisation via SageMaker’s candidate generation notebook.
ML Inference Pipeline — You can access SageMaker Autopilot’s model artifacts, data processor models, tuning details and notebooks, which are stored in S3.
Model Deployment — You have full control of model deployment & could export and use the model and pipeline wherever you need. Deployment to AWS is semi-automated by SageMaker.
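
To give a feel for the kind of data transformations mentioned under Data Preparation above, here is a minimal plain Python sketch of imputation, standard scaling and one-hot encoding. This is not the code Autopilot generates; the helper names and values are hypothetical, and median imputation is just one possible strategy.

```python
import statistics

def impute_median(values):
    """Replace missing values (None) with the median of the present values."""
    present = [v for v in values if v is not None]
    fill = statistics.median(present)
    return [fill if v is None else v for v in values]

def standard_scale(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def one_hot(values):
    """Map each category to a 0/1 vector, one position per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

# Made-up account balances with one missing value.
balances = [100.0, None, 300.0, 500.0]
filled = impute_median(balances)
scaled = standard_scale(filled)

# Made-up job categories.
categories, encoded = one_hot(["admin", "retired", "admin", "services"])

print(filled)
print([round(v, 3) for v in scaled])
print(categories, encoded)
```

In a real pipeline the median, mean and standard deviation would be learned from the training split only and then reused at inference time, which is why the transformations need to travel with the model.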

Amazon SageMaker Autopilot — ‘Open’ AutoML

Like Google AutoML Tables, Autopilot currently only works with structured data.

H2O.ai’s AutoML

With a mission to “democratize AI for everyone”, H2O.ai look to be very committed to AutoML.

Like SageMaker Autopilot, H2O.ai Driverless AI provides access to and control of each part of the automated machine learning process.

H2O.ai provide an intuitive and fully integrated user interface. It’s not quite an IDE, although support for Python and R clients provides control of automation from either environment.

Unlike both Google AutoML and SageMaker Autopilot, Driverless AI can run automation on NLP, Time Series and Structured/Tabular data. Driverless AI supports TensorFlow for deep learning models and can also run automation using your transformations, models or scoring ‘recipes’.

H2O.ai — Another example of ‘Open’ AutoML

Data Acquisition — You still supply ‘clean’ data, with connectivity options to Hadoop HDFS file systems, S3 buckets, Azure, Hive, JDBC etc.
Data Exploration — Extensive visualisation options for your source data are provided via ‘AutoViz’: spiky and skewed histograms, box plots, outliers, correlation graphs, parallel coordinate plots, radar plots, data & ‘missing’ heatmaps. Data set details contain summaries of types, statistics, missing values, unique values, and the first 20 rows of a column.
Data Preparation — Custom recipes can be written to create new data sets via the GUI. Test/train/validation splits can be determined and the target variable set.
Feature Engineering — Not only are features generated, but you can track their importance and use in real time as experiments progress.
Selection & Training — The range of candidate models varies based on your initial experiment settings — Accuracy, Time, Interpretability and Scorer. You can see exactly which models will be included in the experiment via the GUI, which also provides a real-time dashboard showing performance for each model.
Scoring & Optimisation — You can select your scoring algorithm.
ML Inference Pipeline — Low-latency scoring code is generated for Python, R, C++ and Java. Comprehensive, human-readable documentation is automatically generated from the experiment. In addition, a large amount of explainability and interpretability information is provided: “K-LIME, Shapley, Variable Importance, Decision Tree Surrogate, Partial Dependence, Individual Conditional Expectation, Sensitivity Analysis, NLP Tokens, NLP LOCO” plus support for LIME and Shapley reason codes.
Model Deployment — scoring pipelines are deployed manually.
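
Shapley values come up in the explainability list above, so here is a minimal sketch of the underlying idea in plain Python. It computes exact Shapley values for a tiny, made-up scoring function by averaging each feature's marginal contribution over every feature ordering. This is not how Driverless AI implements it; the model, feature names and values are hypothetical, and the brute-force approach only works for a handful of features.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)      # start from the baseline input
        previous = predict(current)
        for name in order:
            current[name] = instance[name]   # switch one feature to its real value
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_model(x):
    # Hypothetical scorer with an interaction between the two features.
    return 0.5 * x["duration"] + 0.2 * x["balance"] + 0.1 * x["duration"] * x["balance"]

instance = {"duration": 2.0, "balance": 3.0}
baseline = {"duration": 0.0, "balance": 0.0}

contributions = shapley_values(toy_model, instance, baseline)
print(contributions)
```

The contributions always add up to the difference between the prediction for the instance and the prediction for the baseline, which is what makes them usable as per-prediction reason codes.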

Exploring AutoML

I’ll now carry out a simple run through of each of the above AutoML solutions. I’ll be using a subset of the data file that Google use in their AutoML demo. Using this simple data set will allow direct comparison with Janio Bachmann’s Kaggle notebook, and hence ‘manual’ machine learning.

This is definitely not a benchmarking test, rather the purpose is to illustrate the differences between the various AutoML solutions.

It’s worth noting — this is a small dataset at 17 columns and fewer than 12,000 rows. There aren’t any missing values, and it’s nicely balanced in terms of the prediction target we’re using, the ‘deposit’ column, split nearly 50:50.

So it’s not the most difficult of data sets, and therefore a little atypical.

But this is okay — I’m only looking to highlight some of the differences between approaches and the potential of ‘AutoML’.

Google AutoML Tables

So let’s get started with Google AutoML tables, and I’ll stay very close to the Google demo.

Following the instructions here, but minus any need to do anything billing related, I created a project, a bucket and uploaded the csv file.

I selected ‘deposit’ as the target prediction variable. All other columns are ‘features’ which can be fed into the model to help make a prediction.

A visualisation of each feature’s data can be seen by simply clicking on it. The correlation of each feature to the target variable is shown. I can manually drop those columns with low correlation if I choose.

Google AutoML Tables — Selecting the Prediction Target
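
For a sense of what sits behind these per-column statistics, here is a minimal plain Python sketch that counts missing and distinct values and correlates a numeric feature with the yes/no target. The rows are made up, and Pearson correlation is an assumption on my part, since I don't know which measure Google uses.

```python
import math

# Made-up rows shaped loosely like the banking data; None marks a missing value.
rows = [
    {"job": "admin", "balance": 100.0, "deposit": "no"},
    {"job": "technician", "balance": 200.0, "deposit": "no"},
    {"job": "admin", "balance": None, "deposit": "yes"},
    {"job": "services", "balance": 1500.0, "deposit": "yes"},
    {"job": "technician", "balance": 2500.0, "deposit": "yes"},
    {"job": "retired", "balance": 3000.0, "deposit": "yes"},
]

def column_summary(rows, column):
    """Count missing and distinct values for one column."""
    values = [row[column] for row in rows]
    present = [v for v in values if v is not None]
    return {"missing": len(values) - len(present), "distinct": len(set(present))}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def target_correlation(rows, column, target="deposit"):
    """Correlate a numeric column with the target, skipping rows where it is missing."""
    pairs = [(row[column], 1.0 if row[target] == "yes" else 0.0)
             for row in rows if row[column] is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)

balance_summary = column_summary(rows, "balance")
job_summary = column_summary(rows, "job")
balance_corr = target_correlation(rows, "balance")
print(balance_summary, job_summary, round(balance_corr, 3))
```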

Google automatically decides to build a classification model, which is what we need. The goal is to work out whether the customer will make a deposit and it’s a binary choice, yes/no, hence we want to build a ‘binary classifier’.

Selecting ‘Train Model’ allows me to specify a budget limit of 1 node hour to run the work and uncheck any unwanted feature columns.

That’s it until I receive an email, which will confirm training is complete.

Google has automatically split the data into train (80%), validate (10%) and test (10%) which means the model can be tuned by the validation set and then checked with unseen data (the test set.)
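
The 80/10/10 split itself is simple to reproduce by hand. Here is a minimal sketch, assuming a plain random shuffle; Google may well split differently, and the function name and seed are made up.

```python
import random

def split_rows(rows, train=0.8, validate=0.1, seed=42):
    """Shuffle the rows, then cut them into train, validation and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = int(len(shuffled) * train)
    n_validate = int(len(shuffled) * validate)
    train_set = shuffled[:n_train]
    validation_set = shuffled[n_train:n_train + n_validate]
    test_set = shuffled[n_train + n_validate:]  # whatever is left over
    return train_set, validation_set, test_set

# Stand-in for 1,000 data rows.
rows = list(range(1000))
train_set, validation_set, test_set = split_rows(rows)
print(len(train_set), len(validation_set), len(test_set))
```

The important property is that the three sets never overlap, so the test score reflects rows the model has not seen during training or tuning.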

On the evaluate tab there’s lots of useful information — as this is a classification problem, there are ROC/AUC and Precision/Recall curves, confusion matrix and feature importance. The sort of things you’d likely find on a Kaggle notebook, used to determine how well the model has done.

In our case, the model seems to be doing well, but the definition of good depends on the cost of a false negative or a false positive, and therefore on the use case in question.
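
To show how the cost of each kind of error changes what ‘good’ means, here is a minimal sketch that builds a confusion matrix at a few decision thresholds and picks the cheapest one. The labels, scores, candidate thresholds and costs are all made up for illustration.

```python
def confusion(labels, scores, threshold):
    """Return (tp, fp, tn, fn) when predicting 'yes' for scores >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def total_cost(labels, scores, threshold, fp_cost, fn_cost):
    """Total cost of the mistakes made at a given threshold."""
    _, fp, _, fn = confusion(labels, scores, threshold)
    return fp * fp_cost + fn * fn_cost

def best_threshold(labels, scores, candidates, fp_cost, fn_cost):
    """Pick the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(labels, scores, t, fp_cost, fn_cost))

labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.45, 0.55, 0.3, 0.2, 0.6, 0.4]
candidates = [0.35, 0.5, 0.7]

matrix_at_half = confusion(labels, scores, 0.5)
# Missing a likely depositor is expensive: favour a low threshold.
threshold_when_fn_costly = best_threshold(labels, scores, candidates, fp_cost=1, fn_cost=10)
# Chasing an unlikely depositor is expensive: favour a high threshold.
threshold_when_fp_costly = best_threshold(labels, scores, candidates, fp_cost=10, fn_cost=1)
print(matrix_at_half, threshold_when_fn_costly, threshold_when_fp_costly)
```

The same model and the same scores lead to different operating points depending on the costs, which is why a single accuracy figure rarely tells the whole story.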

We can also see feature importance in the model’s classification decision — so there’s some basic explainability here too.

How does the model fare versus the Kaggle notebook?

Kaggle Notebook, by Janio Bachmann — showing model performance

Not bad — we’ve got a higher ROC/AUC score at 93.9. There’s also agreement on the 1st and 3rd most important features, although very little agreement on any others.

Kaggle Notebook, by Janio Bachmann — showing feature importance

And in terms of accuracy, we’re ahead too with 88.1% vs 84%

Kaggle Notebook, by Janio Bachmann — showing accuracy

We could now go ahead and run further tests, batch predictions, place the model online with one click, or export the model as a TensorFlow package to run in a Docker container.

In summary, I’ve used default values, very few data science skills, and some luck, in particular how well behaved the source data is, to create a production ready model that I can deploy any time I want.

But I don’t know what the model is — is it tree based? Is it an ensemble, a neural net? How does it make decisions? How would I tune or improve on the model?

Amazon SageMaker Autopilot

Which brings us on to recent updates to Amazon’s SageMaker. By using the SageMaker Autopilot service, I’ll be able to address some of the gaps above. I’ll know what data transformations have taken place, which models are in use, and be able to modify them.

The SageMaker Studio service is still not available in all regions, so I’ll launch in the US-East ‘Ohio’ region, using a US East based S3 bucket to upload the banking data file.

SageMaker Studio’s JupyterLab environment provides a machine learning IDE. Selecting ‘Create Amazon SageMaker Autopilot Experiment’, you’ll see there are a few differences from Google’s AutoML already.

Initiating an Autopilot Experiment in SageMaker Studio

The access to automatically generated machine learning notebooks and the resulting visibility is a big difference between Google and AWS’s approaches.

It’s unfortunate that we don’t have a GUI to select the target and feature columns. It also takes a long time to get an error message back: after I entered an invalid target name, the error arrived many minutes after initiating the run. One to address in future releases.

Autopilot goes through 4 stages: ‘Analyzing Data’, ‘Feature Engineering’, ‘Model Tuning’, ‘Completed’.

If you remember, I limited Google to 1 hour. This won’t be an entirely fair test because I didn’t limit Amazon in the same way.

Two hours after initiating Autopilot on the same bank data csv, feature engineering has completed, but model tuning is ongoing (there are 250 tuning jobs by default.)

Amazon SageMaker — Hyper Parameter Tuning

On completion, about 30 minutes later, you can take a look at the metrics. Go to experiments, control & click to ‘open in trial component list’ and add metrics. From here, a quick sort shows the best model provided a validation accuracy of 85.964%.

At first glance, this is bad. It’s better than the Kaggle notebook, but worse than Google AutoML, and it took far longer to complete. Plus we don’t see a precision/recall curve, a ROC score, an F1 score or a confusion matrix, so it’s not possible to immediately compare.

Amazon SageMaker — Autopilot Completion & Metrics

But a click on ‘open candidate generation notebook’ immediately shows the target objective and exactly which models and transformations are going to be used.

** tip — import and save the candidate definition and data exploration notebooks in Studio as I had difficulty finding them again later **

Autopilot’s auto-generated Python Notebook.

The hyper-parameter tuning ranges can also be seen, as can the transformation strategies for each model.

All of these can be adjusted, and you can step through the full process from the notebook by selecting import notebook and the data science kernel.
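
To illustrate what tuning over hyper-parameter ranges involves, here is a minimal random search sketch in plain Python. Random search is used here only because it is the simplest strategy to show, and SageMaker’s own tuner may work differently. The objective function stands in for a real training job, and the parameter names and ranges are hypothetical rather than taken from the generated notebook.

```python
import random

def random_search(objective, ranges, n_trials, seed=0):
    """Sample each hyper-parameter uniformly from its range and keep the best trial."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_objective(params):
    # Stand-in for validation accuracy; it peaks at learning_rate=0.1, max_depth=6.
    return (1.0
            - (params["learning_rate"] - 0.1) ** 2
            - 0.01 * (params["max_depth"] - 6.0) ** 2)

# Hypothetical tuning ranges, of the kind you could widen or narrow.
ranges = {"learning_rate": (0.01, 0.3), "max_depth": (2.0, 10.0)}

few_params, few_score = random_search(toy_objective, ranges, n_trials=10)
best_params, best_score = random_search(toy_objective, ranges, n_trials=250)
print(best_params, round(best_score, 4))
```

Each trial here is a cheap function call, whereas in Autopilot each of the 250 tuning jobs trains a model, which goes some way to explaining the run time.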

However, things get a little awkward if your Studio session times out, and I think there’s still some work to do to make this a truly stand-alone GUI.

H2O.ai Driverless AI

Moving from Google to AWS, we gained an increase in visibility and control at the cost of ease of use. Amazon’s offering in particular is a more developer friendly user interface.

Moving from Amazon on to H2O.ai, we have a much more polished, data scientist friendly user interface. I tried the cloud version, ‘aquarium’ (running on AWS), creating a new project and loading the same banking data set into it.

Data visualisations on the source banking data csv were immediately available via ‘AutoViz’.

Without reading the built-in instructions, or following any tutorials, I was able to create a train/test split in one click, then select the target ‘deposit’ column.

H2O.ai Driverless AI — Setting Up an Experiment

The training settings are intuitive and on the left you can see the various trade-offs in models, feature engineering strategies that correspond to your choices.

So it’s fair to say, this is a very intuitive interface.

It’s also a lot of fun to watch as it immediately kicks off at an AUC score of over 0.90, taking us straight to a similar performance to the Kaggle notebook.

After about 30 minutes, the experiment completed with final scores of AUC 0.9295, Accuracy 0.8646 and F1 0.8619. This is lower than achieved via Google AutoML, but higher than SageMaker’s accuracy and better than the Kaggle notebook.

This was by far my favourite interface, providing the simplicity of Google’s AutoML with the power of Amazon’s customisation.

Other AutoML Services

Microsoft Azure provides many customisable AI services: MCS Speech-to-Text, LUIS, QnA, Speech Translation, Form Recogniser, Anomaly Detector, Personaliser, Speaker Recognition, Custom Vision and Video Indexer.

I’d categorise all of the above as being ‘Closed’ AutoML.

Microsoft’s AutoML is an ‘Open’ AutoML solution which can handle classification, regression and time series forecasting.

IBM AutoAI is also an ‘Open’ AutoML solution, which includes support for auto generated Python notebooks.

In addition to established AI players, there are also newer players such as Kortical providing ‘Open’ AutoML.

Conclusion

AutoML is an emerging technology and as it improves it will have significant implications for how AI is consumed as the second wave of AI adoption reaches new industries and use cases.

AutoML will have an impact on how businesses recruit for AI, even beyond data and development teams. For example, in my last article, “AI Product Managers — Evolution or Revolution”, I include AutoML as contributing to how Product roles will change.

I’d expect to see the number of AutoML versions of cognitive APIs growing. I’m also hoping for improvements to the initial AutoML offerings from Microsoft, AWS and others.

The scope of AutoML may expand also, automating more of the deployment and in-life management on one side, integrating more seamlessly with data sources on the other.

‘Open’ AutoML looks more likely to appeal to those actively working in machine learning and also seems much more likely to address real-world business needs than ‘Closed’ AutoML.

It could be the case that if automation becomes sufficiently advanced, the need for the kind of visibility and control provided by ‘Open’ AutoML will lessen and we’ll see ‘Closed’ AutoML being adopted by enterprises.