Democratize Artificial Intelligence with Automated Machine Learning
This blog post is authored by Francesca Lazzeri (@frlazzeri)
Web Summit is the annual event that connects the technology community with all industries, both old and new, bringing together Fortune 500 companies, groundbreaking startups and world-class speakers. It has grown to become the largest technology conference in the world, with 70,000 attendees and over 1,200 speakers.
At Web Summit, I taught a workshop on Automated Machine Learning, a new Azure Machine Learning capability that automatically picks an algorithm for you and generates a model from it. Automated Machine Learning saves you time by using the parameters and criteria you provide to select the best algorithm for your model.
The Machine Learning Process
When companies identify a problem that can be solved through machine learning, they meet with their data scientists and analysts to create a predictive analytics solution.
Multiple teams get involved in building the solution:
- The data engineering team works on data acquisition and preparation.
- Data scientists focus on experimentation and optimization of models.
- DevOps teams own the development environment, the tooling, and the hosting of inference models in production.
After retrieving the data and running some pre-processing and cleaning, your team faces three big questions about how to build the most accurate model:
- Which features should they use?
- Which algorithm should they use?
- Which parameters should they use?
Understanding the predictive power of a set of features with respect to a dependent variable is a tricky problem, and there is no universal metric that tells you exactly that. Your data science team can use an approach called Correlation-based Feature Subset Selection, which evaluates a set of features based on its correlation with the dependent variable. Even so, this approach can take some time.
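To make the idea concrete, here is a minimal sketch of the merit score that Correlation-based Feature Subset Selection is usually described with: a subset scores well when its features correlate with the target but not with each other. The helper names (`pearson`, `cfs_merit`, `best_subset`) and the toy data are my own assumptions for illustration, and the exhaustive search is only practical for a handful of features.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def cfs_merit(features, target):
    """Merit of a subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(features)
    # Average absolute feature-to-target correlation.
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    # Average absolute feature-to-feature correlation (0 for a single feature).
    pairs = list(combinations(features, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def best_subset(named_features, target):
    """Score every non-empty subset and return the names of the best one."""
    names = list(named_features)
    candidates = (
        subset
        for size in range(1, len(names) + 1)
        for subset in combinations(names, size)
    )
    return max(
        candidates,
        key=lambda subset: cfs_merit([named_features[n] for n in subset], target),
    )


# Toy data (made up): "signal" tracks the target, "noise" is uncorrelated with it.
TARGET = [1, 2, 3, 4]
FEATURES = {"signal": [1, 2, 3, 4], "noise": [1, -1, -1, 1]}

if __name__ == "__main__":
    print(best_subset(FEATURES, TARGET))
```

Even on this tiny example the number of candidate subsets doubles with every extra feature, which is why this step alone can eat a lot of time.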
The answer to the question “Which Algorithm” is always “It depends.” It depends on the size, quality, and nature of the data. It depends on what you want to do with the answer. And it depends on how much time you have. Even the most experienced data scientists can’t tell which algorithm will perform best before trying them.
Finally, the question “Which Parameters” is also very challenging to answer. Setting parameters manually and searching for optimal values based on learning and experience can be very time-consuming.
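As a rough illustration of why this gets expensive, the sketch below runs an exhaustive grid search over two hyperparameters of a tiny k-nearest-neighbours classifier, scoring each combination with leave-one-out accuracy. Everything here (the classifier, the `DATA` points, the `GRID` values) is a made-up toy of mine, not part of the Automated Machine Learning service.

```python
from itertools import product


def knn_predict(train, x, k, weighted):
    """Classify x by an (optionally distance-weighted) vote of its k nearest neighbours."""
    neighbours = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    votes = {}
    for value, label in neighbours:
        weight = 1.0 / (abs(value - x) + 1e-9) if weighted else 1.0
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)


def loo_accuracy(data, k, weighted):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k, weighted) == label
    return hits / len(data)


def grid_search(data, grid):
    """Evaluate every combination in the grid; return (best_params, all_scores)."""
    names = list(grid)
    scores = {}
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores[values] = loo_accuracy(data, **params)
    best = max(scores, key=scores.get)
    return dict(zip(names, best)), scores


# Toy one-dimensional data set with two slightly overlapping classes (made up).
DATA = [(0.5, "a"), (1.0, "a"), (1.4, "a"), (2.1, "b"), (2.0, "a"),
        (3.0, "b"), (3.3, "b"), (3.9, "b"), (1.8, "b"), (4.2, "b")]
GRID = {"k": [1, 3, 5], "weighted": [False, True]}

best_params, scores = grid_search(DATA, GRID)

if __name__ == "__main__":
    print(best_params, scores)
```

Two hyperparameters with a few values each already mean six full evaluations; real models have many more knobs, and the grid grows multiplicatively with each one.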
Machine learning is iterative by nature: these steps are repeated over and over. The reason it is so complex is clear: a large amount of complex data is involved, and from it we try to extract meaningful predictive patterns and models.
Often the hardest part of solving a machine learning problem can be finding the right estimator for the job. Different estimators are better suited for different types of data and different problems. There is simply no substitute for understanding the principles of each algorithm and understanding the system that generated your data. Every machine learning algorithm has its own style or inductive bias. For a specific problem, several algorithms may be appropriate, and one algorithm may be a better fit than others. But it’s not always possible to know beforehand which is the best fit.
What if a developer or data scientist could access an automated service that identifies the best machine learning pipelines for their labeled data?
Automated Machine Learning is a new capability that does exactly that. It is now in preview, accessible through the Azure Machine Learning service. It empowers customers, with or without data science expertise, to identify an end-to-end machine learning pipeline for any problem.
Automated Machine Learning is based on a breakthrough from our Microsoft Research division. The approach combines ideas from collaborative filtering and Bayesian optimization. It’s essentially a recommender system for machine learning pipelines.
Most importantly, Automated Machine Learning is designed to not look at the customer’s data. Customer data and execution of the machine learning pipeline both live in the customer’s cloud subscription (or their local machine), which they have complete control of. Only the results of each pipeline run are sent back to the Automated Machine Learning service, which then makes a probabilistic choice of which pipelines should be tried next.
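The data flow described above can be sketched with a toy loop. To be clear, this is not the Microsoft Research algorithm: I am swapping in a simple softmax sampler over observed scores purely to show the boundary between the two sides. The suggester only ever sees pipeline names and scores, while the data stays in `run_locally`. All names and data here are hypothetical.

```python
import math
import random

# Hypothetical "pipelines": three threshold classifiers on a single number.
PIPELINES = {
    "threshold_at_1": lambda x: x > 1,
    "threshold_at_2": lambda x: x > 2,
    "threshold_at_3": lambda x: x > 3,
}

# Toy labeled data that never leaves the "customer" side (made up).
LOCAL_DATA = [(0.5, False), (1.5, False), (1.9, False),
              (2.2, True), (2.8, True), (3.5, True)]


class PipelineSuggester:
    """Stand-in for the remote service: it only sees (pipeline name, score) pairs."""

    def __init__(self, pipeline_names, seed=0, temperature=0.1):
        self.pipeline_names = list(pipeline_names)
        self.scores = {}  # pipeline name -> best score reported so far
        self.rng = random.Random(seed)
        self.temperature = temperature

    def suggest(self):
        # Try each pipeline once, then sample in proportion to exp(score / T).
        untried = [p for p in self.pipeline_names if p not in self.scores]
        if untried:
            return self.rng.choice(untried)
        weights = [math.exp(self.scores[p] / self.temperature)
                   for p in self.pipeline_names]
        return self.rng.choices(self.pipeline_names, weights=weights, k=1)[0]

    def report(self, name, score):
        self.scores[name] = max(score, self.scores.get(name, float("-inf")))


def run_locally(name, data):
    """Stand-in for training on the customer's compute; returns only an accuracy."""
    predict = PIPELINES[name]
    return sum(predict(x) == label for x, label in data) / len(data)


def search(iterations=10, seed=0):
    suggester = PipelineSuggester(PIPELINES, seed=seed)
    history = []
    for _ in range(iterations):
        name = suggester.suggest()
        score = run_locally(name, LOCAL_DATA)  # the data is only touched here
        suggester.report(name, score)          # only the score is sent back
        history.append((name, score))
    best = max(suggester.scores, key=suggester.scores.get)
    return best, history


if __name__ == "__main__":
    print(search())
```

The point of the design is visible in the signatures: `PipelineSuggester` has no parameter through which the data could reach it.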
How Automated Machine Learning Works
With Automated Machine Learning you can configure the type of machine learning problem you are trying to solve. Two categories of supervised learning are supported:
- Classification
- Regression
Then you need to specify the source and format for the training data. The data must be labeled, and can be stored on your development environment or in Azure Blob Storage. If the data is stored on your development environment, it must be in the same directory as your training scripts. This directory is copied to the compute target you select for training.
For simplicity, in this blog post I will show how to train a model to classify handwritten images of digits (0–9) from the MNIST dataset. But this time you don’t need to specify an algorithm or tune hyperparameters. Automated Machine Learning iterates over many combinations of algorithms and hyperparameters until it finds the best model based on your criterion. For more detailed information about this tutorial, please see the documentation linked at the end of this post.
To automatically train a model, first define configuration settings for the experiment and then run the experiment:
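A configuration along these lines is a reasonable sketch, assuming the preview-era Azure Machine Learning Python SDK (`azureml.train.automl.AutoMLConfig`). The helper names `automl_settings` and `build_config`, the placeholders `X`, `y` and `project_folder`, and every value below are my own illustrative assumptions. Parameter names have changed between SDK releases (later versions take `training_data` and `label_column_name` instead of `X` and `y`), so check the version you have installed.

```python
def automl_settings():
    """Experiment settings as plain keyword arguments (values are illustrative)."""
    return {
        "task": "classification",          # the other supported task is "regression"
        "primary_metric": "AUC_weighted",  # metric used to rank candidate models
        "iterations": 20,                  # how many pipelines to try
        "n_cross_validations": 3,          # folds used to score each pipeline
    }


def build_config(X, y, project_folder):
    """Create the AutoMLConfig object; requires the Azure ML SDK to be installed."""
    # Imported lazily so the settings above can be inspected without the SDK.
    from azureml.train.automl import AutoMLConfig

    return AutoMLConfig(X=X, y=y, path=project_folder, **automl_settings())
```

Keeping the settings in a plain dictionary makes it easy to log or tweak them before the experiment is created.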
Start the experiment to run locally. Define the compute target as local and set the output to true to view progress on the experiment.
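Submitting the run might look like the sketch below, assuming the Azure Machine Learning SDK v1 classes `Workspace` and `Experiment` and a workspace `config.json` on disk. The helper names and the experiment name `automl-digits` are hypothetical; `automl_config` stands for the configuration object defined in the previous step.

```python
def make_experiment(name="automl-digits"):
    """Create an Experiment in your workspace; requires the Azure ML SDK."""
    # Imported lazily so this file can be loaded without the SDK installed.
    from azureml.core import Experiment, Workspace

    workspace = Workspace.from_config()  # reads config.json describing the workspace
    return Experiment(workspace, name)


def submit_with_progress(experiment, automl_config):
    """Submit the run and stream progress for each iteration to the console."""
    return experiment.submit(automl_config, show_output=True)
```

With the SDK installed, `local_run = submit_with_progress(make_experiment(), automl_config)` would give you the run object used in the rest of this post.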
One of my favorite features is that you can use the Jupyter notebook widget to see a graph and a table of all results:
You can then retrieve all iterations and view the experiment history and individual metrics for each iteration run:
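Here is a small sketch of how the iteration history could be collected, assuming the run object exposes `get_children()`, and each child run exposes `get_properties()` (with an `iteration` entry) and `get_metrics()`, as the SDK v1 run classes do. The function names `collect_iteration_metrics` and `best_iteration` are my own.

```python
def collect_iteration_metrics(parent_run):
    """Map iteration number -> {metric name: value}, keeping only float metrics."""
    history = {}
    for child in parent_run.get_children():
        properties = child.get_properties()
        metrics = {
            name: value
            for name, value in child.get_metrics().items()
            if isinstance(value, float)
        }
        history[int(properties["iteration"])] = metrics
    return history


def best_iteration(history, metric):
    """Return the iteration with the highest value for the given metric."""
    return max(history, key=lambda i: history[i].get(metric, float("-inf")))
```

Calling `collect_iteration_metrics(local_run)` gives you a plain dictionary that is easy to load into a table and sort by iteration.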
You can use the local_run object to get the best model and register it into the workspace:
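A minimal sketch of that last step, assuming `local_run` is the automated ML run object from the SDK v1, whose `get_output()` returns the best run together with its fitted model and whose `register_model()` accepts `description` and `tags`. The wrapper name `pick_and_register` and the description text are hypothetical.

```python
def pick_and_register(local_run, description, tags=None):
    """Fetch the best run and model, then register the model in the workspace."""
    best_run, fitted_model = local_run.get_output()
    registered = local_run.register_model(description=description, tags=tags)
    return best_run, fitted_model, registered
```

For example, `pick_and_register(local_run, "AutoML digits classifier")` would return the best run, the fitted model you can call `predict` on, and the registered model entry.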
Microsoft is committed to democratizing Artificial Intelligence through our products. By making Automated Machine Learning available through the Azure Machine Learning service, we’re empowering data scientists with a powerful productivity tool.
- To learn more about Automated Machine Learning: https://aka.ms/AutomatedML
- For more information about the Automated Machine Learning tutorial: https://aka.ms/AutomatedMLDocs
- Our official Automated Machine Learning GitHub repo: https://github.com/Azure/MachineLearningNotebooks/tree/master/automl
- To learn more about Azure Machine Learning Services: https://aka.ms/AMLServices
- The Global AI Bootcamp, a free one-day event organized by local communities all over the world that are passionate about Artificial Intelligence on the Microsoft stack and Automated Machine Learning: https://www.globalaibootcamp.com/
- If you have any questions, please feel free to contact us at AskAutomatedML@microsoft.com