IBM Cloud Functions with R
Serverless computing is a useful way to execute code in the cloud. It combines cost savings, reduced management and configuration time, and ease of deployment and scaling. This means that a data scientist can work faster and smarter while building a machine learning pipeline, such as a prediction service. The Function-as-a-Service (FaaS) platform in IBM Cloud executes code on demand via HTTP-based API requests. IBM Cloud Functions is based on the Apache OpenWhisk open source project.
Today it is possible to create IBM Cloud Functions with any programming language. For some languages, such as Python, Node.js, and Go, the platform offers a default runtime, so creating a serverless function takes fewer steps. Many data scientists believe that development is impossible, or at least quite complex, for languages without this support. This article shows how to create an IBM Cloud Function with the R language.
Configuration
To run a function in a language that is not supported by IBM Cloud Functions, you will need to write an exec file. In the cloud, the function runs in a Docker container based on the openwhisk/dockerskeleton image. apk is the Alpine Linux package manager, so you are able to add Linux dependencies to the container. In this project, beyond R and the system dependencies, a package to handle JSON (e.g. jsonlite) is also necessary, because both the input and the output of the function should be in this format. The main structure of exec is: install the system dependencies and R packages, get the input and save it as a JSON file, and finally run an R script as an executable.
#!/bin/bash

# install R
apk update && apk add R R-dev R-doc build-base

# install package
R -e "install.packages('jsonlite', repos = 'http://cran.rstudio.com/')"

# run R script
chmod +x script.R # turn executable
echo "$@" > input.json # get input
./script.R # run script
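For instance, invoking the action with `--param s 8` makes OpenWhisk pass the parameters as a single JSON argument, so after the `echo "$@" > input.json` step the file should contain something like:

```json
{"s": 8}
```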
To begin with a minimal example of the configuration, I created a toy function to compute the area of a square, A = s².
The script.R
is set as:
#!/usr/bin/env Rscript

# get input
input <- jsonlite::fromJSON("input.json", flatten = FALSE)
input <- as.numeric(input)

# function
A <- function(s) s^2

# output (it is important to set the output as JSON)
jsonlite::toJSON(list(side = input, area = A(input)), auto_unbox = TRUE)
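Before deploying, the core logic can be sanity-checked locally. This is a minimal sketch in base R only (the deployed script uses jsonlite for the real JSON round-trip):

```r
# Toy function from the action: area of a square
A <- function(s) s^2

# Simulate the value that OpenWhisk would deliver via input.json
s <- 8

# Build the same JSON output by hand (the action uses jsonlite::toJSON)
out <- sprintf('{"side": %s, "area": %s}', s, A(s))
cat(out)  # {"side": 8, "area": 64}
```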
Deploy the function in IBM Cloud
It is necessary to package the exec and script.R files with zip. The --native argument means that the function will start in the default openwhisk/dockerskeleton container.
# login
ibmcloud login
ibmcloud target --cf

# zip
zip -r raction.zip exec script.R

# deploy
ibmcloud fn action create raction raction.zip --native
Testing the function
To test the deployed function, you can invoke it and pass the parameter that you want to submit:
ibmcloud fn action invoke raction --result --param s 8
Using a custom Docker to deploy
It is also possible to deploy your function with a custom container. You need to create a Dockerfile, then build the image and push it (to Docker Hub only).
# Dockerfile
FROM openwhisk/dockerskeleton
RUN apk update && apk add R R-dev R-doc build-base
RUN R -e "install.packages('jsonlite', repos = 'http://cran.rstudio.com/')"

# build
docker build -t th1460/raction .

# push
docker push th1460/raction
I modified the exec file because the steps to install R, the Linux dependencies, and the R libraries are now executed during the Docker build.
#!/bin/bash

# run R script
chmod +x script.R # turn executable
echo "$@" > input.json # set input
./script.R # run script
To deploy the function you need to indicate the Docker Hub repository, so replace --native with --docker th1460/raction.
ibmcloud fn action create raction raction.zip --docker th1460/raction
This approach reduced the time spent building dependencies when the function is requested: invoking the action with a prebuilt image pushed to Docker Hub was faster than the --native version, which installs everything at run time.
Prediction using Tidymodels
In this example, the IBM Cloud Function appears in a more complex case. I fitted a model on the Titanic dataset using the tidymodels library, in order to later deploy the model as an IBM Cloud Function.
I fitted a logistic regression to classify passengers as survived or not survived in the Titanic disaster. The selected features were Sex and Pclass. The details of the data preparation to obtain train_data are omitted in this article.
library(tidymodels)
library(magrittr)

lr_mod <- logistic_reg() %>% set_engine("glm")
lr_fit <- lr_mod %>% fit(Survived ~ Sex + Pclass, data = train_data)
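As a sketch of what the fitted model computes at prediction time: a logistic regression reduces to a linear predictor whose inverse-logit is the survival probability. The coefficients below are hypothetical placeholders, not the values estimated by lr_fit:

```r
# Hypothetical coefficients -- the real ones come from the fitted model above
intercept   <- 2.0
beta_male   <- -2.5   # effect of Sex = "male"
beta_class3 <- -1.0   # effect of Pclass = "3rd"

# Linear predictor for a male passenger in 3rd class
eta <- intercept + beta_male + beta_class3

# Inverse-logit turns it into a probability, equivalent to 1 / (1 + exp(-eta))
prob <- plogis(eta)
round(prob, 4)  # 0.1824
```

This is the same arithmetic that tidypredict reproduces later from the saved model file, without needing the original tidymodels object.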
After the fit, the model was saved as a yaml file in the local directory. Here it is necessary to use the yaml and tidypredict packages to parse the model into a yaml file.
yaml::write_yaml(tidypredict::parse_model(lr_fit), "R/my_model.yml")
Deployment
Configure the exec
(see the section: Using a custom Docker to deploy) and Dockerfile
:
FROM openwhisk/dockerskeleton
RUN apk update && apk add R R-dev R-doc build-base
RUN R -e "install.packages(c('jsonlite', 'tidypredict', 'yaml'), repos = 'http://cran.rstudio.com/')"
Configure a script.R to load the model and compute the predictions using tidypredict:
#!/usr/bin/env Rscript

# load model
loaded_model <- tidypredict::as_parsed_model(yaml::read_yaml("my_model.yml"))

# input
input <- jsonlite::fromJSON("input.json", flatten = FALSE)

# compute prediction
pred <- tidypredict::tidypredict_to_column(as.data.frame(input), loaded_model)

# output
jsonlite::stream_out(pred, verbose = FALSE)
To deploy the model follow the steps:
- Build the image and push it to Docker Hub to get better performance when the function is requested.
# docker
docker build -t th1460/titanic .
docker push th1460/titanic
- Package the files exec, script.R, and my_model.yml.
# login
ibmcloud login
ibmcloud target --cf

# zip
zip -r titanic.zip exec script.R my_model.yml
- Create the function declaring --web true to expose it as a web API.
# deploy
ibmcloud fn action create titanic titanic.zip --docker th1460/titanic --web true
To request a prediction from your API, you can make a POST request using either curl or functions from the httr package. The <APIHOST> and <NAMESPACE> can be found with ibmcloud fn action get <ACTION NAME> --url.
input <- list(Sex = "male", Pclass = "3rd")

"https://<APIHOST>/api/v1/web/<NAMESPACE>/default/titanic.json" %>%
    httr::POST(., body = input, encode = "json") %>%
    httr::content() %>%
    .[c("Sex", "Pclass", "fit")] %>%
    jsonlite::toJSON(pretty = TRUE, auto_unbox = TRUE)
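For completeness, a hypothetical curl equivalent of the httr request (replace <APIHOST> and <NAMESPACE> with the values returned by ibmcloud fn action get titanic --url):

```shell
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d '{"Sex": "male", "Pclass": "3rd"}' \
  "https://<APIHOST>/api/v1/web/<NAMESPACE>/default/titanic.json"
```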
Results
After the request, the output shows the parameters (Sex and Pclass) and the probability of surviving the Titanic disaster (fit).
{"Sex": "male", "Pclass": "3rd", "fit": 0.0979}
The request can also be made from the IBM Cloud page. In this example, the request took 964 ms to return the results.
Conclusion
IBM Cloud Functions is a great resource for any programming language. It is easy to configure, deploy, and scale. If this feature matches the needs of your project (availability, frequency of requests, etc.), it is an interesting choice for running a serverless function with R in production.
References
- Preparing apps in Docker images: https://cloud.ibm.com/docs/openwhisk?topic=openwhisk-prep#prep_docker
- Creating web actions: https://cloud.ibm.com/docs/openwhisk?topic=openwhisk-actions_web
- Serverless Functions in your favorite language with Openwhisk: https://medium.com/openwhisk/serverless-functions-in-your-favorite-language-with-openwhisk-f7c447558f42