Faster AutoML with TPOT and RAPIDS

Nov 5, 2020


By Nick Becker, Dante Gama Dessavre, and John Zedlewski

TPOT, one of Python’s most popular Automated Machine Learning libraries, is now accelerated with RAPIDS cuML and DMLC XGBoost. As of the newly released version 0.11.6, TPOT users with access to an NVIDIA GPU can accelerate their AutoML pipelines with the new, natively supported “TPOT cuML” configuration by changing a single parameter in their code.

Yes, GPU-powered AutoML is faster. So, no surprises there. But it’s also more accurate and less expensive. GPU-accelerated modeling allows AutoML libraries like TPOT to find a higher accuracy model pipeline within a given time constraint.

In this post, we walk through a case study using TPOT with GPUs to dramatically speed up AutoML classification pipelines on two public datasets. In our experiments, GPU-accelerated TPOT was:

  • More accurate, finding pipelines with 2% and 1.3% higher accuracy, respectively, after eight hours
  • Faster, finding a pipeline with higher accuracy after one hour than the CPU configuration found after eight hours
  • Less expensive, since TPOT only needs to run for a fraction of the time to reach higher accuracy

To put 2% higher accuracy into perspective, imagine if your credit card company’s fraud detection models were leaving 2% accuracy (or recall) on the table. That’s potentially millions of dollars of fraud, headache, and stress per year that could have been avoided.

These are exciting results. Before we dive in, let’s talk about AutoML and TPOT.


Automated Machine Learning is an approach to predictive modeling that relies on algorithmic techniques to find the most powerful combination of feature engineering and machine learning models for a given problem. With AutoML, the more feature and model combinations you test, the better your results tend to be. However, this means training far more than a single model, which is much more compute-intensive than basic ML use cases and can make AutoML both time-consuming and expensive.

As a result, it’s common for data scientists and engineers to define a computational budget when using AutoML tools for a given experiment. For example, I might want to find the best model I can within eight hours of training time or $50 of cloud instance costs. Testing thousands of complex modeling pipelines on medium-to-large datasets becomes impractical on any reasonable time budget. These constraints often materially affect the quality of results, forcing data scientists to build and deploy inferior models.


TPOT is one of the most effective AutoML libraries in the computing world. An ideal data science assistant, TPOT offers a user-friendly interface, integrates with scikit-learn and Dask, and uses genetic algorithms to efficiently prune ineffective modeling pipelines. But it still faces the fundamental bottleneck of computationally expensive modeling pipelines.

Since these pipelines are compute-bound, GPUs have enormous potential to make these problems tractable.

Case Study: The Impact of GPU-Acceleration

To test the impact of GPU-acceleration, we ran a series of timeboxed experiments using 500,000-row samples from two canonical, publicly available machine learning classification datasets (Higgs Boson and Airline delays) [1]. TPOT provides timeboxing of experiments via the max_time_mins parameter, which stops TPOT from fitting any additional pipelines after the specified time limit. Note that the actual time elapsed varies, as TPOT on the CPU can take a long time to finish training pipelines that are already in progress if they are large.


To approximate real-world time constraints, the target experiment times ranged from 1 hour to 8 hours. We configured TPOT to use five-fold cross-validation and set a population size of 30 (large enough to almost fully saturate all of the available CPU resources, but not so large we’d be waiting days for informative results). We ran these experiments on a system with dual Intel Xeon Platinum 8168 CPUs and one NVIDIA V100 GPU.
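A timeboxed setup like the one described above might be expressed roughly as follows. This is a minimal sketch under assumptions, not the script used for these experiments: the helper names timeboxed_tpot_kwargs and run_experiment are hypothetical, and only the settings named in the text (max_time_mins, five-fold cv, population_size of 30, and the choice of config_dict) are filled in.

```python
def timeboxed_tpot_kwargs(hours, use_gpu):
    """Build TPOTClassifier keyword arguments for one timeboxed run.

    Hypothetical helper for illustration; every other TPOT setting is left
    at its default.
    """
    return {
        "max_time_mins": int(hours * 60),  # time budget for the search, in minutes
        "cv": 5,                           # five-fold cross-validation
        "population_size": 30,             # population size from the case study
        # "TPOT cuML" selects the GPU configuration; None keeps TPOT's default
        "config_dict": "TPOT cuML" if use_gpu else None,
    }


def run_experiment(X, y, hours, use_gpu):
    """Fit one timeboxed TPOT search (needs tpot, plus cuML and a GPU if use_gpu)."""
    # Imported lazily so the kwargs helper can be inspected without TPOT installed.
    from tpot import TPOTClassifier

    model = TPOTClassifier(**timeboxed_tpot_kwargs(hours, use_gpu))
    model.fit(X, y)
    return model


# Settings for the shortest and longest budgets mentioned in the text.
one_hour_gpu = timeboxed_tpot_kwargs(1, use_gpu=True)
eight_hour_cpu = timeboxed_tpot_kwargs(8, use_gpu=False)
print(one_hour_gpu)
print(eight_hour_cpu)
```

Calling run_experiment(X, y, hours=1, use_gpu=True) with your own training data would then launch a one-hour GPU search.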


Accelerated TPOT found pipelines with 2% higher accuracy on the Higgs Boson dataset and 1.3% higher accuracy on the Airlines dataset. These are huge increases in accuracy for the same time budget. For both dataset samples, the GPU-accelerated configuration achieved higher accuracy in one hour than the default configuration achieved in eight hours. Instead of spending the day waiting for results, data scientists can get started on their next experiment.

How is this possible? TPOT is now able to evaluate many more individual pipelines per experiment, helping it find a higher accuracy pipeline much faster. It’s not that TPOT entirely on CPUs wouldn’t have gotten there eventually, but that the computational and time cost involved make it impractical with even 500,000 rows.

The following graphs illustrate the scale of the impact. Accelerated TPOT is able to test up to 5x more pipelines in some of these experiments.
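To build intuition for why evaluating more candidates within a fixed budget helps, here is a toy sketch. It is an illustration only: it uses plain random search over made-up scores rather than TPOT's genetic algorithm, and the candidate counts (100 and 500) are hypothetical stand-ins for a 5x difference in throughput.

```python
import random


def best_score_found(n_pipelines, seed=12):
    """Return the best score among n_pipelines random draws.

    Each draw stands in for evaluating one candidate pipeline. The scores are
    synthetic and do not come from any real experiment.
    """
    rng = random.Random(seed)
    return max(rng.random() for _ in range(n_pipelines))


slow_best = best_score_found(100)  # hypothetical: candidates evaluated in the budget
fast_best = best_score_found(500)  # 5x as many candidates in the same budget

# With the same seed, the first 100 draws are a prefix of the 500 draws,
# so the larger search can never end up with a worse best score.
print(slow_best, fast_best)
```

The larger search always sees every candidate the smaller one saw, plus many more, which is the basic reason higher throughput translates into a better final result.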

But it’s not just about the raw number of pipelines evaluated. Because TPOT uses genetic algorithms to find quality pipelines, later generations usually involve the more complex pipelines that lead to better results. GPU-accelerated TPOT is able to evaluate so many more pipelines even while pipeline complexity is increasing. Looking at the final pipeline from the eight-hour experiment on the Airlines dataset makes the impact crystal clear.

The final pipeline for Default TPOT achieved 87.2% cross-validation accuracy. The final pipeline for GPU TPOT achieved 88.5% accuracy, a 1.3 percentage point increase. In line with expectations, the resulting GPU-accelerated pipeline is more complex, highlighting the impact of evaluating more pipelines.

TPOT Default Final Pipeline achieving 87.2% accuracy:


TPOT GPU Final Pipeline achieving 88.5% accuracy:

Pipeline(steps=[('zerocount-1', ZeroCount()),
('selectpercentile', SelectPercentile(percentile=43)),
('pca', PCA(iterated_power=5, random_state=12,
('zerocount-2', ZeroCount()),
('xgbclassifier', XGBClassifier(alpha=1, base_score=0.5, booster='gbtree', colsample_bylevel=1, colsampl…
importance_type='gain', interaction_constraints='',
learning_rate=0.5, max_delta_step=0, max_depth=9,
min_child_weight=3, missing=nan,
n_estimators=100, n_jobs=1, nthread=1,
num_parallel_tree=1, random_state=12, reg_alpha=1,
scale_pos_weight=1, subsample=1.0,
tree_method='gpu_hist', validate_parameters=1,

Cost Reduction

In these experiments, we got better results in one hour with accelerated TPOT compared to nine hours (the actual time elapsed) with TPOT Default. In terms of AWS cloud instance pricing equivalents, we’d want to compare one hour on a p3.2xlarge instance ($3.06 per hour) with nine hours on an m5.16xlarge instance ($3.072 per hour). In these two experiments, GPU-accelerated TPOT can be roughly 9x cheaper in total cost and return a better model.
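The arithmetic behind that comparison can be checked in a few lines. This sketch uses only the hourly prices and run times quoted above; the variable names are illustrative.

```python
GPU_RATE_USD_PER_HOUR = 3.06   # p3.2xlarge price quoted above
CPU_RATE_USD_PER_HOUR = 3.072  # m5.16xlarge price quoted above

gpu_cost = GPU_RATE_USD_PER_HOUR * 1  # one hour of accelerated TPOT
cpu_cost = CPU_RATE_USD_PER_HOUR * 9  # nine hours (actual elapsed) of default TPOT
cost_ratio = cpu_cost / gpu_cost

print(f"GPU: ${gpu_cost:.2f}, CPU: ${cpu_cost:.2f}, ratio: {cost_ratio:.1f}x")
```

Because the two hourly rates are nearly identical, the cost ratio is driven almost entirely by the 9x difference in run time.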

Getting Started

Getting started is easy. All you need to do is pass “TPOT cuML” to the config_dict argument of your TPOTClassifier or TPOTRegressor instead of leaving it as None. Make sure to leave n_jobs=1 (the default).

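from tpot import TPOTClassifier

tpot = TPOTClassifier(
    config_dict="TPOT cuML",  # GPU-accelerated configuration; n_jobs stays at its default of 1
)
tpot.fit(X, y)  # X, y: your training features and labels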

The TPOT GitHub repository also includes brief examples demonstrating classification and regression. If you have cuML installed in your environment and access to a GPU, the rest will take care of itself. We recommend installing XGBoost, too.

Oh yeah, and it scales with Dask to multiple GPUs and even multiple nodes if you need faster results or want to test more pipelines per experiment. Just spin up a Dask-CUDA cluster, install dask-ml via conda or pip, and pass use_dask=True to your TPOTClassifier or TPOTRegressor.
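On a single machine, the Dask setup could look something like the sketch below. It is written under assumptions rather than taken from the TPOT examples: the helper names dask_tpot_kwargs and fit_on_gpu_cluster are hypothetical, the cluster is a local one created with LocalCUDACluster from dask-cuda, and running the fit requires tpot, cuML, dask-ml, dask-cuda, and at least one GPU.

```python
def dask_tpot_kwargs():
    """Keyword arguments for a GPU-accelerated, Dask-backed TPOT estimator.

    Hypothetical helper; n_jobs is left at its default of 1.
    """
    return {"config_dict": "TPOT cuML", "use_dask": True}


def fit_on_gpu_cluster(X, y):
    """Fit TPOT on a local Dask-CUDA cluster (needs GPUs and the RAPIDS stack)."""
    # Imported lazily so the kwargs helper can be inspected on any machine.
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster
    from tpot import TPOTClassifier

    cluster = LocalCUDACluster()  # by default, one worker per visible GPU
    client = Client(cluster)      # becomes the active client that Dask-backed TPOT uses
    try:
        model = TPOTClassifier(**dask_tpot_kwargs())
        model.fit(X, y)
        return model
    finally:
        client.close()
        cluster.close()


print(dask_tpot_kwargs())
```

For a multi-node run, the same pattern would connect the Client to an existing scheduler address instead of creating a local cluster.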


In the past, the computational requirements for AutoML have made it impractical for many industries and scientific datasets. Going forward, this is no longer the case.

TPOT is already one of the most effective AutoML tools in the world, and we’re excited to bring GPU-acceleration to the data scientists and engineers who use TPOT — and those who are hearing about it for the first time here. With better, faster, and less expensive AutoML, you can be confident you’re building the best model with your dataset.

We’re only just getting started. Join the movement. File a feature request or contribute a pull request on GitHub for cuML, XGBoost, or TPOT. Want to get started with accelerated AutoML? Check out the RAPIDS Getting Started webpage, with links to help you download pre-built Docker containers or install directly via Conda. Then just pip install tpot and you’re off to the races.

— — — — — — — — — — — — — -