IBM Cloud Code Engine with R

Thiago Pires
IBM Data Science in Practice
6 min read · Apr 5, 2022

I have already written about how to configure and use IBM Cloud Functions with R. IBM Cloud Functions is a serverless computing resource that makes it easy to deploy scripts. However, in 2021 IBM announced Code Engine, a resource that improves on this serverless model. Code Engine abstracts away the infrastructure, allowing us to focus on business logic rather than becoming infrastructure experts. This approach is attractive for data scientists who want to deploy analytics apps or batch jobs. An advantage of Code Engine over IBM Cloud Functions is that it is not restricted to web calls or scheduled events: it can run any containerized application.

Code Engine uses the power of Kubernetes without the complexity.

Overview

Code Engine takes a Kubernetes container approach, built on the following collection of technologies:

  • Paketo: transforms the application source code into a container image
  • Tekton: creates a pipeline that builds the container image and pushes it to a registry
  • Knative: pulls and deploys images and controls scaling
  • Istio: handles the network layer and enables blue-green deployments during rolling updates

IBM Cloud Code Engine architecture (source: https://www.youtube.com/watch?v=0RkNDSrpr-E)

Both applications and batch jobs can be deployed in Code Engine, which makes the resource quite versatile: it sits between Function-as-a-Service and Platform-as-a-Service offerings. The figure below compares it with similar resources from other cloud providers.

Comparable resources from other cloud providers (source: https://www.youtube.com/watch?v=0RkNDSrpr-E)

To play with Code Engine in IBM Cloud you need a Pay-As-You-Go account, but it includes a free tier for testing.

Code Engine pricing

Case

To illustrate the use of Code Engine, two workloads were deployed: a scheduled batch job and an application. The batch job fetches the 2019 Novel Coronavirus (COVID-19) data from the repository https://github.com/CSSEGISandData/COVID-19 and inserts it into a Db2 database provisioned in IBM Cloud. The application is a Shiny dashboard that shows the number of cases on a map, with the data filtered to show only countries in South America.

Architecture for the COVID-19 analysis use case

The repository with the batch job, the application, and additional resources (Dockerfiles) is at https://github.com/th1460/ibmcloud-ce-with-r. The batch job is an R script and the application is built with R Shiny.
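For context, the core of such a batch job can be sketched in a few lines of R. This is a minimal sketch rather than the exact script in the repository: the JHU CSV path is real, but the target table name COVID19, the BLUDB database name, and the Db2 ODBC connection details are assumptions for illustration.

# A rough sketch of the batch job: download the JHU time series,
# filter South American countries, and write the result to Db2
library(DBI)

url <- paste0("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/",
              "csse_covid_19_data/csse_covid_19_time_series/",
              "time_series_covid19_confirmed_global.csv")

south_america <- c("Brazil", "Argentina", "Chile", "Colombia", "Peru",
                   "Uruguay", "Paraguay", "Bolivia", "Ecuador", "Venezuela")

confirmed <- readr::read_csv(url) |>
  dplyr::filter(`Country/Region` %in% south_america)

# Hypothetical Db2 ODBC connection; the driver name, port, and database
# ("BLUDB") depend on your Db2 on Cloud instance
con <- DBI::dbConnect(odbc::odbc(),
                      Driver = "DB2",
                      Hostname = Sys.getenv("DB2_HOST"),
                      Port = 50000,
                      Protocol = "TCPIP",
                      Database = "BLUDB",
                      UID = Sys.getenv("DB2_USER"),
                      PWD = Sys.getenv("DB2_PASSWORD"))

DBI::dbWriteTable(con, "COVID19", confirmed, overwrite = TRUE)
DBI::dbDisconnect(con)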

Set up resources in IBM Cloud

Everything in Code Engine can be done through the web console, but I will use the ibmcloud CLI to configure and deploy. Follow this tutorial to install the ibmcloud CLI: https://cloud.ibm.com/docs/cli?topic=cli-getting-started.

# Login
ibmcloud login
# Install code engine plugin
ibmcloud plugin install code-engine
# Set resource group
ibmcloud target -g Default
# Create a project
ibmcloud ce project create -n covid19
# If the project has already been created, select it with
ibmcloud ce project select -n covid19

Build and push images

To deploy a workload you can either submit an image from a registry or build from the source code in a repository. In this case we push an image to Docker Hub that will be used in the Code Engine deployment.

# Login to a registry
docker login
# Build job and app
docker build -t th1460/covid_update -f Dockerfile.update .
docker build -t th1460/covid_app -f Dockerfile.app .
# Push job and app
docker push th1460/covid_update
docker push th1460/covid_app

Job

The job is created with the mandatory arguments --name and --image. Resources such as memory and CPU are adjusted with the update option. The environment variables used to access Db2 can be set with either the create or update options.

# Create a job
ibmcloud ce job create --name covid-update --image docker.io/th1460/covid_update
# List jobs
ibmcloud ce job list
# Get information about a specific job
ibmcloud ce job get --name covid-update
# Update job
ibmcloud ce job update --name covid-update --memory 0.5G --cpu 0.125
ibmcloud ce job update --name covid-update \
  --env DB2_HOST=<db_host> \
  --env DB2_PASSWORD=<db_password> \
  --env DB2_USER=<db_user>

Cronjob

It is possible to associate a schedule with the job using the sub cron command. In this example the batch job runs every day at 3 AM. A nice page that helps with cron definitions is https://crontab.guru/.

# Create cronjob
ibmcloud ce sub cron create --name covid-update-cron --destination-type job --destination covid-update --schedule '0 3 * * *'
# Get information about the cronjob
ibmcloud ce subscription cron get -n covid-update-cron

App

Deploying the application follows the same logic as the batch job: just switch from job to application.

# Create application
ibmcloud ce application create --name covid-app --image docker.io/th1460/covid_app \
  --memory 4G --cpu 0.5 \
  --env DB2_HOST=<db2_host> \
  --env DB2_PASSWORD=<db2_password> \
  --env DB2_USER=<db2_user>
# Get information about the app
ibmcloud ce application get --name covid-app
# Get the app URL
ibmcloud ce application get --name covid-app --output url
The application is alive

Code Engine with Plumber

Plumber is an R package for creating web APIs. I will show how to create an API for prediction/inference, with a classification model deployed in Code Engine. The Titanic data (from the titanic package) was used to fit a model that predicts the probability of survival.

The first step was to prepare the data:

train_data <- titanic::titanic_train |>
  dplyr::select(Survived, Pclass, Sex) |>
  dplyr::mutate(Sex = as.factor(Sex),
                Pclass = factor(Pclass,
                                labels = c("1st", "2nd", "3rd")),
                Survived = factor(Survived,
                                  labels = c("Yes", "No")))

Followed by fitting the model:

lr_mod <- parsnip::logistic_reg() |>
  parsnip::set_engine("glm")

lr_fit <- lr_mod |>
  parsnip::fit(Survived ~ Sex + Pclass, data = train_data)
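As a quick sanity check (an optional step, not part of the deployment itself), the fitted coefficients can be inspected on the log-odds scale:

# Show the estimated coefficients and their p-values
broom::tidy(lr_fit)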

The model was saved in Cloud Object Storage using the pins package:

saved_lr_fit <- tidypredict::parse_model(lr_fit)

board <- pins::board_s3(bucket = Sys.getenv("COS_BUCKET"),
                        region = Sys.getenv("COS_REGION"),
                        access_key = Sys.getenv("COS_ACCESS_KEY_ID"),
                        secret_access_key = Sys.getenv("COS_SECRET_ACCESS_KEY"),
                        endpoint = Sys.getenv("COS_ENDPOINT"))

board |> pins::pin_write(saved_lr_fit, name = "my-model")

The pins package versions and shares the assets and artifacts of an analysis, e.g. data and models.
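For example, the saved model can be read back from the same board, and the stored versions of the pin can be listed:

# Read the parsed model back from Cloud Object Storage
model <- board |> pins::pin_read("my-model")
# List the stored versions of the pin
board |> pins::pin_versions("my-model")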

The plumber side of the script was set as:

library(plumber)

#* @apiTitle Prediction Survived in Titanic Disaster

#* Return the survival prediction
#* @param sex Sex
#* @param pclass Class
#* @get /predict
function(sex, pclass) {

  board <- pins::board_s3(bucket = Sys.getenv("COS_BUCKET"),
                          region = Sys.getenv("COS_REGION"),
                          access_key = Sys.getenv("COS_ACCESS_KEY_ID"),
                          secret_access_key = Sys.getenv("COS_SECRET_ACCESS_KEY"),
                          endpoint = Sys.getenv("COS_ENDPOINT"))

  model <- board |> pins::pin_read("my-model")
  input <- data.frame(Sex = sex, Pclass = pclass)
  pred <- tidypredict::tidypredict_to_column(input, model)
  return(pred)
}

Next we create a Dockerfile, then build and push the resulting image to Docker Hub. The image is based on the lightweight Linux distribution Alpine.

# Dockerfile
FROM alpine:3.14

ARG NOT_CRAN=true
ARG ARROW_R_DEV=true

RUN apk update && apk add R R-dev R-doc build-base libsodium-dev autoconf automake bash cmake g++ gcc make libxml2-dev
RUN set -ex && apk --no-cache add sudo
RUN sudo R -e "install.packages('arrow', repos = 'https://packagemanager.rstudio.com/all/__linux__/focal/latest')"
RUN sudo R -e "install.packages(c('paws.storage', 'plumber', 'pins', 'tidypredict'), repos = 'http://cran.rstudio.com/')"

COPY src/plumber.R .
COPY src/run.R .

EXPOSE 8080

CMD ["Rscript", "run.R"]

# Build
docker build -t th1460/plumber -f Dockerfile.plumber .
# Push image
docker push th1460/plumber

The run.R file contains:

plumber::pr_run(plumber::pr("plumber.R"), host = "0.0.0.0", port = 8080)
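Before deploying, the API can be smoke-tested locally. A minimal sketch, assuming the COS_* environment variables are set in the local session so the handler can reach the bucket:

# Serve the API locally (this call blocks the session)
plumber::pr_run(plumber::pr("plumber.R"), host = "0.0.0.0", port = 8080)

# From a second R session, query the local endpoint
"http://localhost:8080/predict?sex=female&pclass=2nd" |>
  httr::GET() |>
  httr::content()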

To deploy it to Code Engine:

# Create project
ibmcloud ce project create --name plumber
# Create application
ibmcloud ce application create --name plumber --image docker.io/th1460/plumber \
--memory 4G --cpu 0.5 \
--env COS_ACCESS_KEY_ID=<cos_access_key_id> \
--env COS_SECRET_ACCESS_KEY=<cos_secret_access_key> \
--env COS_ENDPOINT=<cos_endpoint> \
--env COS_BUCKET=<cos_bucket> \
--env COS_REGION=<cos_region>
# Get url from application
ibmcloud ce application get --name plumber --output url

The URL of the service is:

https://plumber.lnlpoiqaiyu.us-south.codeengine.appdomain.cloud/

Finally, we use the httr package to call the predict endpoint of the API with the arguments sex=male and pclass=1st:

"https://plumber.lnlpoiqaiyu.us-south.codeengine.appdomain.cloud/predict?sex=male&pclass=1st" |>        httr::GET() |>
httr::content()

The result shows that the predicted probability of survival for this profile is 0.4147:

[[1]]
[[1]]$Sex
[1] "male"

[[1]]$Pclass
[1] "1st"

[[1]]$fit
[1] 0.4147
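Under the hood, tidypredict_to_column() evaluates a plain R expression derived from the saved coefficients. If you are curious, that expression can be rendered from the parsed model (here model is the object returned by pins::pin_read() earlier):

# Render the pinned model as the R expression used for prediction
tidypredict::tidypredict_fit(model)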

Thanks to Troy Hernandez for his support in proofreading.

References

  1. IBM Cloud Code Engine is now generally available https://www.ibm.com/cloud/blog/announcements/ibm-cloud-code-engine-is-now-generally-available
  2. Serverless 2.0 with Code Engine https://developer.ibm.com/events/serverless-2-0-with-code-engine/
  3. IBM Cloud Code Engine https://www.ibm.com/cloud/code-engine
  4. IBM Cloud Code Engine: go cloud native quickly https://www.youtube.com/watch?v=0RkNDSrpr-E
