Custom Images and Monitoring for R Models with Watson Machine Learning & Watson OpenScale

Carolyn Saplicki
IBM Data Science in Practice
5 min read · Jan 26, 2023

Co-authored by Carolyn Saplicki, Senior Data Scientist, and Courtney Branson, Advisory Data Scientist

Many businesses today use the R programming language to build their machine learning models. Currently, Watson Machine Learning (WML) does not natively support R model deployments. To use an R model on WML, you have to wrap your R script in a Python wrapper function. When new data arrives, the Python function calls the R script, which scores the data with your R model and returns the output to the Python function. We first learned about this capability of WML from Randy Phoa’s blog, Deploying R models in online mode on Cloud Pak for Data. In this blog, we discuss how Randy’s work can be extended by installing custom packages for a model and monitoring the model with Watson OpenScale.

WML Deployment

Python Function

The Python function used in this project had 3 steps:

  1. Download all required assets (R model, R Script) from the deployment space
  2. Install all required packages (ranger, etc.)
  3. Score Function:
  • Collect the json input from the payload
  • Send the input and other required assets to the R Script
  • Return the predicted output as a json
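
The three steps above can be sketched as a nested Python function, which is the general shape WML expects for a deployable function: an outer function that does one-time setup and returns an inner score function. This is a minimal sketch and not the code from our project. The names deployable_r_wrapper, score.R, model.rds and the run_r hook are hypothetical, and it assumes Rscript is on the path and that the R script prints its JSON result to standard output.

```python
def deployable_r_wrapper(script_path="score.R", model_path="model.rds", run_r=None):
    # Imports sit inside the outer function so the stored function is self-contained.
    import json
    import subprocess

    # Steps 1 and 2 (downloading assets, installing R packages) would run here,
    # once when the deployment starts rather than on every request.

    def call_rscript(input_json):
        # Assumption: the R script takes the model path and the JSON input as
        # command-line arguments and prints a JSON result to stdout.
        result = subprocess.run(
            ["Rscript", script_path, model_path, input_json],
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    # run_r is a hypothetical hook so the wrapper can be exercised without R.
    runner = run_r or call_rscript

    def score(payload):
        # Step 3a: collect the JSON input from the payload.
        input_json = json.dumps(payload["input_data"][0])
        # Step 3b: send the input and the asset paths to the R script.
        raw_output = runner(input_json)
        # Step 3c: return the predicted output as JSON (a Python dict).
        return json.loads(raw_output)

    return score
```

Passing a stand-in for run_r makes it possible to check the payload handling locally before an R runtime is available.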

R Script

The R Script utilized for this project:

  1. Accepts the input json payload
  2. Formats the data
  3. Loads the R model from the deployment space
  4. Computes a prediction for the payload
  5. Returns the output as a json

R Model

The R model we deploy online is a random forest model built primarily with the R packages caret and ranger. It also uses some custom packages that are not available within RStudio in Watson Studio. To train the random forest model successfully, we used a custom image in RStudio.

WML Custom Software Specification

The default software specification for Python functions does not come with R. For some R scripts, you can simply install R and other libraries. Doing so installs the current default R version, which at the time of writing was 3.5.0. This may be sufficient for some projects. However, to get our R script to work within WML with all of its dependencies, we had to use a custom software specification.

In this project, a custom image was necessary because we needed a specific version of R that was not offered in order to install our dependencies such as ranger, caret and custom packages. You can find more information on creating your own custom software specification here.

In order for the function to work, you must ensure the R scripts and any other necessary assets are available in the deployment space as shown below. The Python function downloads these assets and runs the R script, resulting in a model prediction output.
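
The download step might look like the sketch below. It assumes an already-authenticated WML Python client whose default space is set to the deployment space, and it assumes the client exposes data_assets.download(asset_id, filename). The helper name download_assets and the asset IDs are hypothetical.

```python
import os


def download_assets(client, asset_ids, target_dir="."):
    """Download each asset from the deployment space into target_dir.

    asset_ids maps a local filename (e.g. "score.R") to its asset ID.
    Returns a dict mapping each filename to the local path it was saved to.
    """
    os.makedirs(target_dir, exist_ok=True)
    local_paths = {}
    for filename, asset_id in asset_ids.items():
        local_path = os.path.join(target_dir, filename)
        # Assumed client call: data_assets.download(asset_id, filename).
        client.data_assets.download(asset_id, local_path)
        local_paths[filename] = local_path
    return local_paths
```

The returned paths can then be handed to the R script, so the wrapper does not need to hard-code where the assets were saved.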

Architecture Structure

Below is the architecture used to deploy a Python wrapper function for an R model on WML and to connect it to Watson OpenScale. We used a WML custom image to ensure all necessary packages were present during inference. We set up automatic payload logging so that all requests sent to the deployed Python wrapper function are logged and monitored in OpenScale.

If you score a large number of payloads at one time, or run explanations that send a large number of perturbations to the endpoint at once, you may run into an error when the payloads are passed to R.

This happens because each payload in the batch is passed to R as its own argument, and R limits how many arguments it can receive. When you send many payloads at one time, you may exceed that limit. While these limits can be overridden, doing so may require access and permissions that you do not have.

We were able to solve this problem by writing all payloads passed into the Python function to a temporary file. We then pass this file to the R script, which opens it and reads and scores the individual payloads. This way, R receives a single argument (the file path) that gives it access to all of the requested payloads, which allowed us to use the explainability monitor in Watson OpenScale.
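
A minimal sketch of this temporary-file handoff on the Python side is shown below. It is an illustration under stated assumptions, not our production code: the function name score_via_temp_file, the file names and the run_r hook are hypothetical, and the R script is assumed to read the JSON file whose path it receives and print a JSON result to standard output.

```python
import json
import os
import subprocess
import tempfile


def _call_rscript(args):
    # Assumption: Rscript is on the path and the script prints JSON to stdout.
    result = subprocess.run(
        ["Rscript", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def score_via_temp_file(payload, script_path="score.R", model_path="model.rds", run_r=None):
    """Write every payload row to one temp file and pass only its path to R."""
    runner = run_r or _call_rscript

    # delete=False keeps the file on disk after closing so R can open it by path.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
        json.dump(payload["input_data"][0], handle)
        payload_path = handle.name

    try:
        # One file-path argument, no matter how many rows are being scored.
        raw_output = runner([script_path, model_path, payload_path])
    finally:
        # Clean up the temp file whether or not scoring succeeded.
        os.remove(payload_path)

    return json.loads(raw_output)
```

Because the argument list stays the same size for one row or many thousands, the large batches of perturbations generated by the explainability monitor go through the same call as a single-row request.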

Watson OpenScale

Once the Python function has been successfully deployed to WML, you can configure Watson OpenScale to monitor the deployment. Fairness, quality, drift and explainability monitors can all be configured the same way as for a regular Python function that does not use an R model. An example of how to configure these monitors can be found here. For your Python function to be compatible with Watson OpenScale, you must ensure that your inputs and outputs are in the format outlined below. Example inputs/outputs can be found here.
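
As a rough illustration, the sketch below checks a request and response pair against the fields/values layout that WML scoring payloads generally use. It is an assumption that this matches the exact format required for your monitors, so treat the linked examples as the reference. The helper name check_scoring_format and the sample column names are hypothetical.

```python
def check_scoring_format(request, response):
    """Return a list of format problems; an empty list means none were found."""
    problems = []
    inputs = request.get("input_data", [])
    predictions = response.get("predictions", [])

    if not inputs:
        problems.append("request has no 'input_data' entries")
    if not predictions:
        problems.append("response has no 'predictions' entries")

    # Every row of values should have one entry per declared field.
    for name, blocks in (("input_data", inputs), ("predictions", predictions)):
        for block in blocks:
            fields = block.get("fields", [])
            for row in block.get("values", []):
                if len(row) != len(fields):
                    problems.append(f"{name}: row {row} does not match fields {fields}")

    # Each input row should produce exactly one prediction row.
    for req_block, resp_block in zip(inputs, predictions):
        if len(req_block.get("values", [])) != len(resp_block.get("values", [])):
            problems.append("number of prediction rows differs from number of input rows")

    return problems


# Hypothetical example: two input rows, two prediction rows.
request = {"input_data": [{"fields": ["age", "income"],
                           "values": [[30, 50000], [41, 72000]]}]}
response = {"predictions": [{"fields": ["prediction", "probability"],
                             "values": [["yes", [0.2, 0.8]], ["no", [0.9, 0.1]]]}]}
problems = check_scoring_format(request, response)
```

Running a check like this on the wrapper's output before configuring the monitors can catch a mismatch between what the R script returns and what the monitors expect.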

Note that the training data supplied in the monitor configurations must match the training data used for the model. Python functions have fewer safeguards than Python models regarding what can be included in a function connected to Watson OpenScale, so it is important to double-check your training data and the return format of your Python function.

#DataScience #MachineLearning #R #CloudPakforData #WatsonMachineLearning #WatsonOpenScale #RandomForest
