Google AutoML Tables vs XGBoost
Google is the prototypical machine learning company. As a young and hungry startup, they first used “statistical modelling” to correct the spelling of search terms, and they’ve never looked back. These days, they even use machine learning to invent new machine learning, mechanising the discovery of deep neural network topologies that achieve state-of-the-art accuracy while often reducing total model size. If you’ve trained any deep neural networks at all, you’ve likely used Google’s TensorFlow framework — not bad for an advertising firm!
So, when such a legendary force in machine learning announced a new product — AutoML — it wasn’t a matter of if I would try it out, but when. My first opportunity came as my team reached our “feature complete” milestone on a machine-learning-based microservice we are developing for a non-profit.
The training set for that project is structured, tabular data — close to the sort of thing you might find stored as a table in a database. Since I already had code to train an effective model using XGBoost, this was a perfect chance to compare the “traditional” approach (models from open-source libraries, tuned using grid search) to the “cutting edge” — Google’s AutoML Tables.
Getting started was super easy — I just had to upload a CSV of my data to a Google Cloud Storage bucket, something AutoML Tables has UI support for, keeping the workflow simple.
Data import was slick and accurate, with numeric and categorical columns handled automagically. All I had to do was select the target column (“correct_location”), and I was ready to click the enticing blue “TRAIN MODEL” button. Data import will be even easier if you are already using GCP.
Model training comes in 1-hour increments and, at the time of writing, costs $19.32 per hour — this pays for a cluster of 92 virtual machines (n1-standard-4) which team up for an hour to get the best possible results.
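The cost gap between the two approaches is easy to work out from the prices quoted in this post (as of the time of writing — GCP prices change). A quick back-of-the-envelope sketch, using the 15-minute grid-search runtime reported below:

```python
# Back-of-the-envelope cost comparison, using prices quoted in this post
# (at the time of writing; GCP pricing changes).
automl_hourly = 19.32   # 92 x n1-standard-4 cluster, $/hour
automl_hours = 1.0      # training is billed in 1-hour increments
automl_cost = automl_hourly * automl_hours

vm_hourly = 0.04        # single n1-standard-1, $/hour
vm_hours = 0.25         # our grid search finished in ~15 minutes
vm_cost = vm_hourly * vm_hours

print(f"AutoML Tables: ${automl_cost:.2f}")
print(f"Grid-search VM: ${vm_cost:.2f}")
print(f"Cost ratio: {automl_cost / vm_cost:.0f}x")
```

Roughly three orders of magnitude of compute spend, before we even look at the results.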
An hour later, my model was ready, and a very presentable dashboard was full of good news — acceptable F1 score, AUC, and log loss. The model itself weighs in at a comparatively chunky 235MB.
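The dashboard’s headline numbers are all standard classification metrics, so they’re easy to reproduce on your own predictions. A small sketch using scikit-learn (the data here is a toy example, not our project’s — it just shows which function computes which score):

```python
import numpy as np
from sklearn.metrics import f1_score, log_loss, roc_auc_score

# Toy ground truth and predicted class-1 probabilities, for illustration only.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.9, 0.45, 0.2, 0.7, 0.3])
y_pred = (y_prob >= 0.5).astype(int)  # threshold probabilities into class labels

f1 = f1_score(y_true, y_pred)   # needs hard labels
auc = roc_auc_score(y_true, y_prob)  # needs probabilities
ll = log_loss(y_true, y_prob)   # needs probabilities; lower is better
print(f"F1: {f1:.3f}  AUC: {auc:.3f}  log loss: {ll:.3f}")
```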
While my 92-node cluster was doing its Google-fu, I also rented a tiny VM to run our traditional, artisanal, hand-crafted, non-Bayesian-optimised training code — a single n1-standard-1 machine, which is a single Haswell core with just under 4GB of memory, renting for just $0.04 per hour. Our code loaded the same CSV, then ran a grid search — our brute-force hyperparameter tuning — training 108 XGBoost models (gradient-boosted trees are highly recommended for structured or tabular data). Our 1-core minnow completed its work in about 15 minutes, crossing the finish line a healthy 45 minutes ahead of the cluster and costing us just $0.01 of compute. The finished model was about 1.3MB in size and scored a log loss of about 0.2, significantly better than the AutoML Tables result of 0.36 (for log loss, lower is better; a larger score indicates less accurate class probabilities).
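Our actual training code isn’t reproduced here, but the grid-search approach is simple enough to sketch in a few lines. This version uses scikit-learn’s `GridSearchCV` with the built-in `GradientBoostingClassifier` as a stand-in for XGBoost (the `xgboost` package exposes a drop-in `XGBClassifier`), on synthetic data standing in for our CSV; the grid values are illustrative, not our real 108-configuration search:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the project's tabular CSV data.
X, y = make_classification(n_samples=600, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Illustrative grid; our real search covered 108 XGBoost configurations.
grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    grid,
    scoring="neg_log_loss",  # GridSearchCV maximises, so log loss is negated
    cv=3,
)
search.fit(X_train, y_train)

test_loss = log_loss(y_test, search.predict_proba(X_test))
print(f"best params: {search.best_params_}")
print(f"held-out log loss: {test_loss:.3f}")
```

Brute force, yes — but on a small tabular dataset, an exhaustive grid over a few sensible hyperparameters finishes in minutes on one core.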
Of course, training a model isn’t much use unless you deploy it. AutoML Tables really shines in this area — with a few clicks, you can batch-process data from a Google Cloud bucket, or deploy your model as a REST service. There’s definitely a big difference between this and our hand-crafted, human-Dockerized Flask app.
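For contrast, our “hand-crafted” deployment is roughly this shape — a minimal Flask sketch, where the route name and payload format are illustrative and the stub below stands in for loading our real trained XGBoost model:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_proba(features):
    """Stub standing in for the real trained XGBoost model."""
    # Illustrative only: returns a fixed "probability" regardless of input.
    return 0.5

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"features": [1, 2, 3]}.
    payload = request.get_json(force=True)
    features = payload.get("features", [])
    return jsonify({"probability": predict_proba(features)})

# In production this runs behind a WSGI server (e.g. gunicorn) inside Docker,
# rather than via app.run().
```

Not difficult, but it’s real code you have to write, containerise, and operate — versus a few clicks in the AutoML Tables console.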
It’s an exciting time for machine learning engineering. For our specific customer, the cloud isn’t where we will deploy, but it’s inspiring to see how easily we could have done so. However, AutoML Tables, at least for this dataset, definitely disappointed in terms of model quality, model size, and training time, all distinctly worse than our fairly bare-bones baseline.