IBM Cloud Functions with R
Serverless computing is a useful way to execute code in the cloud. It combines cost savings, reduced management and configuration time, and ease of deployment and scaling. This means that a data scientist can work faster and smarter while building a machine learning pipeline, such as a prediction service. The Function-as-a-Service (FaaS) platform in IBM Cloud executes code on demand via HTTP-based API requests. IBM Cloud Functions is based on the Apache OpenWhisk open source project.
Today it is possible to create IBM Cloud Functions with any programming language. For some languages, such as Python, Node.js, and Go, the platform offers a default runtime, so creating a serverless function takes fewer steps. Many data scientists believe that development is impossible, or at least quite complex, for languages without this support. This article shows how to create an IBM Cloud Function with the R language.
Configuration
To run a function in a language that is not supported by IBM Cloud Functions, you will need to write an exec file. In the cloud, the function runs in a Docker container based on the openwhisk/dockerskeleton image. apk is the Alpine Linux package manager, so you are able to add Linux dependencies to the container. In this project, beyond R and the system dependencies, a package to handle JSON (e.g. jsonlite) is also necessary, because both the input and the output of the function should be in this format. The main structure of exec is: install the system dependencies and R packages, get the input and save it as a JSON file, and finally run an R script as an executable.
#!/bin/bash

# install R
apk update && apk add R R-dev R-doc build-base

# install package
R -e "install.packages('jsonlite', repos = 'http://cran.rstudio.com/')"

# run R script
chmod +x script.R # turn executable
echo "$@" > input.json # get input
./script.R # run script
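For instance, invoking the action with `--param s 8` makes OpenWhisk pass the parameters as a single JSON argument, so after the `echo "$@" > input.json` step the file should contain something like:

```json
{"s": 8}
```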
To begin with a minimal example of the configuration, I created a toy function to compute the area of a square, A = s².
The script.R
is set as:
#!/usr/bin/env Rscript

# get input
input <- jsonlite::fromJSON("input.json", flatten = FALSE)
input <- as.numeric(input)

# function
A <- function(s) s^2

# output (it is important to set the output as JSON)
jsonlite::toJSON(list(side = input, area = A(input)), auto_unbox = TRUE)
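Before deploying, the core logic can be sanity-checked locally. This is a minimal sketch in base R only (the deployed script uses jsonlite for the real JSON round-trip):

```r
# Toy function from the action: area of a square
A <- function(s) s^2

# Simulate the value that OpenWhisk would deliver via input.json
s <- 8

# Build the same JSON output by hand (the action uses jsonlite::toJSON)
out <- sprintf('{"side": %s, "area": %s}', s, A(s))
cat(out)  # {"side": 8, "area": 64}
```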
Deploy the function in IBM Cloud
It is necessary to package the exec and script.R files with zip. The --native argument means that the function will start in the default openwhisk/dockerskeleton container.
# login
ibmcloud login
ibmcloud target --cf

# zip
zip -r raction.zip exec script.R

# deploy
ibmcloud fn action create raction raction.zip --native
Testing the function
To test the deployed function, you can invoke it and pass the parameter that you want to submit:
ibmcloud fn action invoke raction --result --param s 8
Using a custom Docker to deploy
It is also possible to deploy your function with a custom container. You need to create a Dockerfile, then build the image and push it (to Docker Hub only).
# Dockerfile
FROM openwhisk/dockerskeleton
RUN apk update && apk add R R-dev R-doc build-base
RUN R -e "install.packages('jsonlite', repos = 'http://cran.rstudio.com/')"

# build
docker build -t th1460/raction .

# push
docker push th1460/raction
I modified the exec file because the steps to install R, the Linux dependencies, and the R libraries are now executed during the Docker build.
#!/bin/bash

# run R script
chmod +x script.R # turn executable
echo "$@" > input.json # set input
./script.R # run script
To deploy the function you need to indicate the Docker Hub repository, so replace --native with --docker th1460/raction.
ibmcloud fn action create raction raction.zip --docker th1460/raction
This approach reduced the time spent building dependencies when the function is requested: invoking the action with a prebuilt image pushed to Docker Hub was faster than the --native version, which installs everything at run time.
Prediction using Tidymodels
In this example, the IBM Cloud Function appears in a more complex case. I fitted a model on the Titanic dataset using the tidymodels library, in order to later deploy the model as an IBM Cloud Function.
I fitted a logistic regression to classify passengers as survived or not survived in the Titanic disaster. The selected features were Sex and Pclass. The details of the data preparation to obtain train_data are omitted in this article.
library(tidymodels)
library(magrittr)

lr_mod <- logistic_reg() %>% set_engine("glm")
lr_fit <- lr_mod %>% fit(Survived ~ Sex + Pclass, data = train_data)
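As a sketch of what the fitted model computes at prediction time: a logistic regression reduces to a linear predictor whose inverse-logit is the survival probability. The coefficients below are hypothetical placeholders, not the values estimated by lr_fit:

```r
# Hypothetical coefficients -- the real ones come from the fitted model above
intercept   <- 2.0
beta_male   <- -2.5   # effect of Sex = "male"
beta_class3 <- -1.0   # effect of Pclass = "3rd"

# Linear predictor for a male passenger in 3rd class
eta <- intercept + beta_male + beta_class3

# Inverse-logit turns it into a probability, equivalent to 1 / (1 + exp(-eta))
prob <- plogis(eta)
round(prob, 4)  # 0.1824
```

This is the same arithmetic that tidypredict reproduces later from the saved model file, without needing the original tidymodels object.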
After the fit, the model was saved as a yaml file in the local directory. Here it is necessary to use the yaml and tidypredict packages to parse the model into a yaml file.
yaml::write_yaml(tidypredict::parse_model(lr_fit), "R/my_model.yml")
Deployment
Configure the exec
(see the section: Using a custom Docker to deploy) and Dockerfile
:
FROM openwhisk/dockerskeleton
RUN apk update && apk add R R-dev R-doc build-base
RUN R -e "install.packages(c('jsonlite', 'tidypredict', 'yaml'), repos = 'http://cran.rstudio.com/')"
Configure a script.R to load the model and compute the predictions using tidypredict:
#!/usr/bin/env Rscript

# load model
loaded_model <- tidypredict::as_parsed_model(yaml::read_yaml("my_model.yml"))

# input
input <- jsonlite::fromJSON("input.json", flatten = FALSE)

# compute prediction
pred <- tidypredict::tidypredict_to_column(as.data.frame(input), loaded_model)

# output
jsonlite::stream_out(pred, verbose = FALSE)
To deploy the model follow the steps:
- Build the image and push it to Docker Hub to get better performance when the function is requested.
# docker
docker build -t th1460/titanic .
docker push th1460/titanic
- Package the files exec, script.R, and my_model.yml.
# login
ibmcloud login
ibmcloud target --cf

# zip
zip -r titanic.zip exec script.R my_model.yml
- Create the function declaring --web true to expose it as a web API.
# deploy
ibmcloud fn action create titanic titanic.zip --docker th1460/titanic --web true
To request a prediction from your API, you can make a POST request using either curl or functions from the httr package. The <APIHOST> and <NAMESPACE> can be found with ibmcloud fn action get <ACTION NAME> --url.
input <- list(Sex = "male", Pclass = "3rd")

"https://<APIHOST>/api/v1/web/<NAMESPACE>/default/titanic.json" %>%
    httr::POST(., body = input, encode = "json") %>%
    httr::content() %>%
    .[c("Sex", "Pclass", "fit")] %>%
    jsonlite::toJSON(pretty = TRUE, auto_unbox = TRUE)
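For completeness, a hypothetical curl equivalent of the httr request (replace <APIHOST> and <NAMESPACE> with the values returned by ibmcloud fn action get titanic --url):

```shell
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d '{"Sex": "male", "Pclass": "3rd"}' \
  "https://<APIHOST>/api/v1/web/<NAMESPACE>/default/titanic.json"
```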
Results
After the request, the output shows the parameters (Sex and Pclass) and the probability of surviving the Titanic disaster (fit).
{"Sex": "male", "Pclass": "3rd", "fit": 0.0979}
The request can also be made from the IBM Cloud page. In this example, the request took 964 ms to return the results.
Conclusion
IBM Cloud Functions is a great resource for any programming language. It is easy to configure, deploy, and scale. If this feature matches the needs of your project (availability, frequency of requests, etc.), it is an interesting choice for running a serverless function with R in production.
References
- Preparing apps in Docker images: https://cloud.ibm.com/docs/openwhisk?topic=openwhisk-prep#prep_docker
- Creating web actions: https://cloud.ibm.com/docs/openwhisk?topic=openwhisk-actions_web
- Serverless Functions in your favorite language with Openwhisk: https://medium.com/openwhisk/serverless-functions-in-your-favorite-language-with-openwhisk-f7c447558f42