AutoML — Not Just for Experts Anymore

Adam Blum
@auger
Nov 5, 2018

Automated machine learning grew out of the work of experts who spent weeks on a prediction or classification problem, trying various algorithms and hyperparameters until they found an optimal result. AutoML automates that work so that nobody has to manually select each prediction algorithm and choose its hyperparameters. For a random forest, for example, those hyperparameters include the number of observations used for each tree, the splitting rule within each tree, and the number of trees.
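To make the random forest example concrete, here is a minimal sketch of automating that search with scikit-learn's `RandomizedSearchCV`. It is an illustration under stated assumptions, not Auger's method: the iris dataset, the candidate values, and the use of plain random search are all chosen for illustration. The three hyperparameters map onto the ones named above: `max_samples` (fraction of observations drawn for each tree), `criterion` (splitting rule), and `n_estimators` (number of trees). This sketch tunes a single algorithm; AutoML tools also search across algorithms.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Illustrative dataset (assumption): any tabular classification data would do.
X, y = load_iris(return_X_y=True)

# Illustrative search space (assumed values) for the three random forest
# hyperparameters discussed above.
param_distributions = {
    "max_samples": [0.5, 0.7, 0.9],    # fraction of observations per tree (needs bootstrap=True)
    "criterion": ["gini", "entropy"],  # splitting rule within each tree
    "n_estimators": [50, 100, 200],    # number of trees
}

# Plain random search: sample 8 of the 18 combinations and score each
# with 3-fold cross-validation, instead of trying them by hand.
search = RandomizedSearchCV(
    RandomForestClassifier(bootstrap=True, random_state=0),
    param_distributions=param_distributions,
    n_iter=8,
    cv=3,
    random_state=0,
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Each sampled configuration is one experiment an expert would otherwise set up, run, and compare manually.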

This was the motivation for projects like TPOT and auto-sklearn: tools to help data science experts be more productive. Auger.AI certainly shared this heritage. We are a team of machine learning experts who found that algorithm selection and tuning were dominating our efforts in applying machine learning to various problems. We built Auger as a “power tool for prediction” (like an auger for auguring, arg, sorry).

That said, since we launched Auger a few months ago, we have found that some of the biggest benefits go to novice machine learning users. These users just don’t have great instincts about which machine learning algorithms to use for their problem. Typically they have applied some statistical method to perform basic prediction, often linear regression inside a spreadsheet. They may have tried a newer algorithm such as a Support Vector Machine to eke out a little more accuracy. But selecting a state-of-the-art algorithm based on the characteristics of their own dataset (number of observations, number of features, distribution within those features) is not something they are equipped to do. Each algorithm has its own unique settings. It’s unreasonable to expect a novice user even to know what all of these settings are for every algorithm, let alone choose the right values for those settings for their unique problem.

As machine learning gets more widely deployed, it is no longer acceptable to have just a “good enough” prediction model. In most industries your competitors are using machine learning models. If, for example, you are competing with other websites in a particular market, your ML-driven homepage optimization (something Auger has been used for successfully) must be better than your competitors’ for you to get the most conversions and dominate the industry. In any zero-sum prediction race, such as forecasting options or futures to drive trading, the best machine learning model will win, and the suboptimal models on the other side of those trades will lose.

Automated machine learning tools allow the novice data scientist to converge quickly on a good algorithm and the best hyperparameter settings for that algorithm. One outstanding issue is that these tools are generally expected to be integrated into a larger machine learning pipeline by a trained developer. By contrast, Auger offers a friendly model manager for viewing and evaluating the features of your model. Just upload your CSV, click “Run”, and watch the resulting models appear on your Leaderboard.

There are other “machine learning model managers” (from Microsoft, Amazon and others), but none of them offer a “smart search” of machine learning algorithm choices and their options. A few AutoML players pair “grid search” with a visual model manager. Why grid search will always yield poorer results than some form of smart search is the subject of another blog post here.

If you are a novice data scientist trying to choose an algorithm for your predictive problem, you owe it to yourself to try an AutoML tool to make an intelligent choice, whether a developer-oriented tool such as TPOT or auto-sklearn, or one with a visual model manager such as Auger.AI.

Adam Blum
CTO of Empath— Technical co-founder, dad, author, ultramarathoner. Building ML products in four different decades…