Managing and Deploying High-Performance Machine Learning Models on GPUs with RAPIDS and MLFlow

John Zedlewski
Jun 24, 2020

RAPIDS and cuML use NVIDIA GPUs to accelerate the training of machine learning models — by up to 45x in the case of random forests, over 100x for support vector machines, and up to 600x for k-nearest neighbors. These speedups can turn overnight jobs into interactive jobs, allow exploration of larger datasets, and allow users to save time on prototyping. But possibly the most exciting super-power of GPU-accelerated machine learning is that it enables data scientists to iterate much faster, trying dozens of model variants in the time it would have previously taken to train a single model.

Faster iteration leads to many more models, and that can lead to confusion and complexity if you track your efforts manually. MLFlow is an open source model lifecycle management framework that simplifies the process of tracking, comparing, and deploying models. It can integrate with most machine learning frameworks and several cloud platforms.

Figure 1: Components of MLflow

In this blog, we’ll walk through a simple RAPIDS cuML model and show how we can integrate it with MLFlow to make our lives easier.

After integrating with MLFlow, data scientists will often want to incorporate automated hyperparameter optimization (HPO) to tune models with no intervention. To learn more about building an HPO pipeline with hyperopt, RAPIDS, and MLFlow, see the follow-up post on the RAPIDS blog.

Starting with a Basic Model in RAPIDS cuML

RAPIDS cuML provides the same Python APIs as the popular scikit-learn library, but with a GPU-accelerated backend to speed up model training and inference. Similarly, cuDF provides an accelerated backend for Pandas APIs. For this example, we’ll build a trivial model using a dataset of flight departures. We’ll try to predict whether or not a flight will be delayed (ArrDelayBinary in the dataset) based on other metadata about the flight. We can do this using a Random Forest classifier, which will build a large number of varied decision trees to collectively vote on whether they expect a flight to be delayed. (For more on Random Forests, see this more detailed blog.)

# See the full example at:
# https://github.com/rapidsai/cloud-ml-examples/tree/master/mlflow_project/src
from cuml import RandomForestClassifier
from cuml.metrics import accuracy_score

# load_data loads data via cuDF and splits with cuml train_test_split
X_train, X_test, y_train, y_test = load_data(fpath)

mod = RandomForestClassifier(max_depth=max_depth,
                             max_features=max_features,
                             n_estimators=n_estimators)
mod.fit(X_train, y_train)
preds = mod.predict(X_test)
acc = accuracy_score(y_test, preds)

This looks almost identical to the scikit-learn-based code, with only imports being changed. One key difference is that it will run 10x-45x faster, depending on the dataset size and parameters.

While random forests can be solid classifiers right out of the box, they have several parameters that can be tuned to improve accuracy. How deep should the trees be allowed to grow? Deeper trees can express more complex functions but may lead to overfitting of the data. How many trees should we build? When we split a node in the tree, should we consider all of the features or a random subset? Like many data scientists, we may start by manually trying a few variants of the model.
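As a rough sketch of what manually trying a few variants can look like, the snippet below enumerates a small grid over the three parameters discussed above. The candidate values and the expand_grid helper are assumptions made for illustration, not recommendations, and the training call is left as a comment because it needs a GPU and the dataset.

```python
from itertools import product

# Hypothetical candidate values, chosen only for illustration
param_grid = {
    "max_depth": [8, 16],
    "max_features": [0.5, 1.0],
    "n_estimators": [50, 100],
}


def expand_grid(grid):
    """Return one dict of keyword arguments per combination in the grid."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


variants = expand_grid(param_grid)
for params in variants:
    # With cuML installed, each variant could be trained as:
    #   mod = RandomForestClassifier(**params)
    #   mod.fit(X_train, y_train)
    print(params)
```

Even this tiny grid produces eight models to compare, which is where tracking starts to matter.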

Tracking Models with MLFlow

Once we start experimenting with model variations, it’s easy to lose track of what we’ve tried. Are we sure we can reproduce that best model? What was the good value of n_estimators again? Even in a single work session, it can be easy to lose track, but the problem grows as more team members iterate on a model over days or months.

MLFlow is designed to help solve these lifecycle and tracking issues. It provides hooks to snapshot your training code, your metrics, and your trained models. You can query any of these results later and pull down the best model to deploy for production. MLFlow works with RAPIDS cuML out of the box. Because cuML estimators match the scikit-learn API, the features of the mlflow.sklearn module automatically work with cuML models as well.

We’ll want to encapsulate the training inside of an MLFlow run, then store the hyperparameters, the metrics, and the model itself. You can install MLFlow locally with a simple conda install -c conda-forge mlflow or pip install mlflow. Examples in this blog were tested with version 1.8.0.

import mlflow
import mlflow.sklearn

# Everything logged inside this block is attached to a single MLFlow run
with mlflow.start_run():
    X_train, X_test, y_train, y_test = load_data(fpath)

    mod = RandomForestClassifier(max_depth=max_depth,
                                 max_features=max_features,
                                 n_estimators=n_estimators)
    mod.fit(X_train, y_train)
    preds = mod.predict(X_test)
    acc = accuracy_score(y_test, preds)

    mlparams = {"max_depth": max_depth,
                "max_features": max_features,
                "n_estimators": n_estimators}
    mlflow.log_params(mlparams)
    mlflow.log_metric("accuracy", acc)
    mlflow.sklearn.log_model(mod, "saved_models")

See this repo for the full example, including startup code.

We’re inserting just a few additional lines. The mlflow.log_metric("accuracy", acc) call is a typical MLFlow addition — it stores an arbitrary key=value style metric for the model. You can log as many metrics as you want for a given training run. For instance, you may want accuracy, precision, and recall. We also recorded the parameters used with mlflow.log_params, and this will allow us to find the parameters used to train the best model in the future.
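To make the idea of logging several metrics concrete, here is a minimal sketch that computes accuracy, precision, and recall for binary labels in plain Python. The binary_metrics helper and the toy labels are assumptions for illustration. In a real script you would more likely use the metric functions from cuML or scikit-learn, and the logging call is shown only as a comment because it needs an active MLFlow run.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Fall back to 0.0 when there are no positive predictions or labels
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels for illustration only (1 = delayed, 0 = on time)
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)

# Inside an active run, the whole dict can be logged in one call:
#   mlflow.log_metrics(metrics)
```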

Maybe the most critical line is the call to mlflow.sklearn.log_model(mod, "saved_models"), which adds the trained model to MLFlow’s saved model repository. This is essentially a database of saved models so you can extract models again for further analysis or production deployment. It can be as simple as a local filesystem with past models, or a full-fledged model registry (see more on model registry options here). The mlflow.sklearn model logging API uses Python’s standard protocols to serialize the model via CloudPickle. cuML models support the same serialization and deserialization protocols so that they can be logged as well.

Before launching the training script, we’ll add an MLproject file in our directory. This is a simple YAML document describing the model, its parameters, and how to run it. The MLproject file provides a standardized entry point so that MLFlow tools know precisely how to run and configure the model training. This standardization becomes particularly valuable as your organization builds more and more models and shares them across multiple data scientists. In our sample MLproject file, we’ve defined both a “simple” MLFlow model and a more complex one that includes hyperparameter optimization.

Now we can run the model and specify its parameters with:

mlflow run . -b local -e simple \
    -P conda-env=$PWD/envs/conda.yaml \
    -P fpath=https://rapidsai-cloud-ml-sample-data.s3-us-west-2.amazonaws.com/airline_small.parquet \
    -P n_estimators=50 -P max_features=0.5 -P max_depth=10

All the key=value parameters passed with the -P flag will show up as command-line arguments to the underlying script.
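On the receiving end, the training script only needs an ordinary argument parser. The sketch below assumes that the entry point's command template forwards each parameter as a --name value flag. The flag names and default values here are hypothetical, and the real mapping is whatever the MLproject file defines.

```python
import argparse


def parse_args(argv):
    """Parse the training parameters used in this post's example."""
    parser = argparse.ArgumentParser(description="Train a random forest")
    parser.add_argument("--fpath", type=str, required=True)
    # Defaults below are illustrative assumptions
    parser.add_argument("--max_depth", type=int, default=16)
    parser.add_argument("--max_features", type=float, default=1.0)
    parser.add_argument("--n_estimators", type=int, default=100)
    return parser.parse_args(argv)


# Simulate the arguments the script would receive for two -P overrides
args = parse_args(["--fpath", "airline_small.parquet",
                   "--n_estimators", "50",
                   "--max_features", "0.5"])
print(args.n_estimators, args.max_features, args.max_depth)
```

Parameters that are not overridden with -P simply fall back to their defaults, as max_depth does here.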

After you’ve run a few example experiments, you can browse the results in the MLFlow UI. Each model shows the experiment metadata as well as key metrics. We can select the best model and view the parameters, command line, and conda environment used to train it. We can also choose a model that we’ll deploy to production.

To launch the UI, just run mlflow ui in your shell, and it will start a local server on port 5000 by default.

Browsing trained models and metrics in MLFlow UI
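The same selection can also be done in code rather than by eye. The sketch below picks the best run from a list of plain dictionaries. The run records, their accuracy values, and the best_run helper are made up for illustration. Against a real tracking server, mlflow.search_runs with an order_by clause would be the natural way to fetch this information.

```python
# Made-up run records standing in for results from a tracking server
runs = [
    {"run_id": "a1", "params": {"max_depth": 8, "n_estimators": 50}, "accuracy": 0.81},
    {"run_id": "b2", "params": {"max_depth": 16, "n_estimators": 100}, "accuracy": 0.86},
    {"run_id": "c3", "params": {"max_depth": 16, "n_estimators": 50}, "accuracy": 0.84},
]


def best_run(records, metric="accuracy"):
    """Return the record with the highest value for the given metric."""
    return max(records, key=lambda record: record[metric])


best = best_run(runs)
print(best["run_id"], best["params"])

# With MLFlow installed, a comparable query could look like:
#   mlflow.search_runs(order_by=["metrics.accuracy DESC"])
```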

Deploying the Model

Training, of course, is only half the battle in machine learning. We ultimately want to use this model to predict results from new input data. Often this step will be developed by a different engineer or a different team entirely. Anyone who has managed models across multiple groups has experienced the struggle of working with teams that are unfamiliar with each other’s APIs and model lifecycles. MLFlow supports various workflows to facilitate these use cases.

The first, a simple artifact-style storage and retrieval pattern, allows a user or group of users to save, search, and load models programmatically using the MLFlow tracking server. This enables a well-structured, consistent design process that is fully reproducible by other data scientists or by engineers tasked with moving these models into production. Because RAPIDS provides a scikit-learn-compatible API, we can use the mlflow.sklearn module to save and restore models.

Save a Model

import mlflow
import mlflow.sklearn
from cuml.ensemble import RandomForestClassifier

model_name = "rapids_rf_model"
model = RandomForestClassifier()

# Store an existing RAPIDS model.
# Here, conda_env allows us to attach a conda environment
# specification to a given model. It provides an explicit
# declaration of the environment where it was trained, and allows us
# to deploy it later. registered_model_name adds the model to the
# model registry (this requires a registry-capable tracking server).
mlflow.sklearn.log_model(model, "rapids_rf_model",
                         conda_env='conda.yaml',
                         registered_model_name=model_name)

# Retrieve and load a model
client = mlflow.tracking.MlflowClient()
model_data = \
    client.search_model_versions(f"name='{model_name}'")[0]
model = mlflow.sklearn.load_model(model_data.source)

# Make predictions
preds = model.predict(X)

The second workflow sets up a fully functional REST-style server for our model, managed by MLFlow. For this, we simply need to identify the model_uri that we would like to deploy and launch the REST service via the MLFlow CLI. This will cause MLFlow to create a conda environment matching the ‘conda.yaml’ we supplied along with our model when it was saved, load the model, and serve from its ‘predict’ method. This gives us a well-specified procedure for testing models via Continuous Integration (CI), as well as a potential path forward for integrated production deployment and updates.

mlflow models serve -m [PATH TO MODEL] -p [PORT] -h [HOST]

Once our model is up and running, we can query the REST service as follows:

import json
import requests

# host and port are the values passed to `mlflow models serve`
headers = {
    "Content-Type": "application/json",
    "format": "pandas-split"
}

# Some arbitrary sample data
data = {
    "columns": ["Year", "Month", "DayofMonth", "DayofWeek", "CRSDepTime", "CRSArrTime", "UniqueCarrier", "FlightNum", "ActualElapsedTime", "Origin", "Dest", "Distance", "Diverted"],
    "data": [[1987, 10, 1, 4, 1, 556, 0, 190, 247, 202, 162, 1846, 0]]
}

resp = requests.post(url=f"http://{host}:{port}/invocations",
                     data=json.dumps(data), headers=headers)
print(f'Classification: {"ON-Time" if resp.text == "[0.0]" else "LATE"}')
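When there is more than one row to score, it can help to build the pandas-split payload from row dictionaries and to decode the response as JSON instead of comparing raw strings. The following is a minimal sketch. The helper names to_pandas_split and parse_predictions are hypothetical, the row uses only a subset of the columns for brevity, and the response text is a stand-in shaped like the "[0.0]" reply shown above.

```python
import json


def to_pandas_split(records):
    """Convert a list of row dicts into a pandas 'split'-oriented payload."""
    if not records:
        raise ValueError("at least one record is required")
    columns = list(records[0])
    data = [[row[col] for col in columns] for row in records]
    return {"columns": columns, "data": data}


def parse_predictions(response_text):
    """Decode the JSON list of predictions returned by the server."""
    return json.loads(response_text)


# A shortened row for illustration; a real request needs every model feature
rows = [{"Year": 1987, "Month": 10, "Distance": 1846}]
payload = json.dumps(to_pandas_split(rows))
print(payload)

# Stand-in response text, shaped like the server reply in the example above
predictions = parse_predictions("[0.0, 1.0]")
labels = ["ON-TIME" if p == 0 else "LATE" for p in predictions]
print(labels)
```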

Wrapping Up and Additional Resources

This blog just scratches the surface of cuML and MLFlow. The code samples for this blog are found in the RAPIDS cloud-ml-examples repo. That repository contains several other examples of hyperparameter optimization and model management on platforms like Ray Tune, AWS SageMaker, Azure ML, and Google Cloud AI.

Also, John Zedlewski gave a talk on the integration of RAPIDS cuML and MLFlow at Spark+AI Summit on June 24, 2020, at 11 AM PDT. Get the details here.

As always, if you have feature requests or any problems, please raise an issue on the cloud-ml-examples repo or reach out to our RAPIDS Slack channel.
