Optimizing Machine Learning Models with Hyperopt and RAPIDS on Databricks Cloud

Parallel coordinates plot from a hyperopt experiment

What is Hyperparameter Optimization?

Hyperparameter optimization is the process of searching over a model's training settings, such as a random forest's tree depth and number of trees, for the combination that scores best on a validation set. A search over two such parameters might evaluate combinations like:

max_depth=5, n_estimators=50
max_depth=5, n_estimators=100
max_depth=5, n_estimators=150
max_depth=14, n_estimators=500
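To make the cost of such a search concrete: a naive grid search trains one model per combination, so the number of training runs grows multiplicatively with each new hyperparameter. A minimal sketch (the candidate value grids below are illustrative, not from this post):

```python
from itertools import product

# Illustrative candidate values for two random forest hyperparameters
max_depths = [5, 8, 11, 14]
n_estimators_options = [50, 100, 150, 500]

# A naive grid search trains one model per combination
grid = list(product(max_depths, n_estimators_options))
print(len(grid))  # 4 x 4 = 16 full training runs
```

This multiplicative blow-up is why smarter search strategies like hyperopt's TPE, which choose the next trial based on results so far, are attractive when each training run is expensive.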

Getting started with RAPIDS and Hyperopt on Databricks

Instance types with GPU support
dbfs configure
dbfs cp src/rapids_install_cuml0.13_cuda10.0_ubuntu16.04.sh dbfs:/databricks/init_scripts/
Configuring the init script
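Once the script is on DBFS, attach it to the cluster as an init script. If you configure the cluster through the Databricks Clusters API rather than the UI, the relevant fragment looks roughly like this (a sketch; the destination path must match wherever you copied the script):

```json
{
  "init_scripts": [
    {
      "dbfs": {
        "destination": "dbfs:/databricks/init_scripts/rapids_install_cuml0.13_cuda10.0_ubuntu16.04.sh"
      }
    }
  ]
}
```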

Integrating with Hyperopt

  1. Define your objective function,
  2. Define your parameter space,
  3. Create a SparkTrials object, and
  4. Launch hyperopt with the hyperopt.fmin function.

1. Define your objective function

from hyperopt import STATUS_OK

def train_rapids(hyperopt_params):
    """Pseudo training function:
    1. Unpack hyperopt_params
    2. Train a new model with those hyperopt_params
    3. Compute accuracy as 'acc' on the validation set
    """
    # …. function body here ….
    # hyperopt minimizes the returned loss, so report the negated accuracy
    return {'loss': -acc, 'status': STATUS_OK}

2. Define your parameter space

search_space = [
    hyperopt.hp.uniform('max_depth', 5, 20),
    hyperopt.hp.uniform('max_features', 0., 1.0),
    hyperopt.hp.uniform('n_estimators', 150, 1000)
]
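Note that hp.uniform always yields floats, so integer-valued parameters such as max_depth and n_estimators must be cast inside the objective before they reach the model constructor. A small sketch of that unpacking step (the sampled tuple here is made up for illustration):

```python
# Example draw from the space above: (max_depth, max_features, n_estimators)
params = (12.7, 0.41, 487.9)  # illustrative values; hp.uniform returns floats

max_depth, max_features, n_estimators = params
# Cast the integer-valued hyperparameters before building the model
max_depth, n_estimators = int(max_depth), int(n_estimators)
print(max_depth, max_features, n_estimators)  # 12 0.41 487
```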

3. Create a SparkTrials object

import hyperopt
spark_trials = hyperopt.SparkTrials(parallelism=MAX_PARALLEL)
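The parallelism argument bounds how many trials run concurrently, typically one per GPU worker. A rough way to reason about wall-clock time is in sequential "waves" of trials (MAX_PARALLEL and max_evals below are illustrative numbers, not from this post):

```python
import math

MAX_PARALLEL = 4   # assumed: one concurrent trial per GPU worker
max_evals = 100    # total trials requested from fmin

# With MAX_PARALLEL concurrent trials, the run takes roughly this many
# sequential waves of training jobs
waves = math.ceil(max_evals / MAX_PARALLEL)
print(waves)  # 25
```

Higher parallelism shortens the wall-clock time, but it also means TPE proposes more trials before earlier results come back, so there is a tradeoff between speed and search quality.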

4. Launch hyperopt with the hyperopt.fmin function

results = hyperopt.fmin(fn=train_rapids,
                        space=search_space,
                        algo=hyperopt.tpe.suggest,
                        max_evals=MAX_EVALS,  # placeholder: total number of trials
                        trials=spark_trials)
Runs view sidebar
Graphing in the Experiment UI

Wrapping Up


