R can API and So Can You!

Empowering R Users From Analysis to Endpoint

[This post is the first of a series on R in production co-authored by Jacqueline Nolis. Here is part 2 (using Docker) and part 3 (production-quality API development in R)]

As a machine learning engineer who came up through software development, some of the most fascinating yet frustrating parts of my job come from integrating with data scientists. Because most data scientists haven’t worked on a team that creates software, it often feels like we speak different languages.

me, trying to talk to data scientists

When our team was tasked with creating a customer-facing deep learning model, I proposed making it into an API, which was met with a swath of data science confusion. An API is the textbook way to allow other T-Mobile software to leverage our model. The most engineering-savvy data scientist on the team kept referring to it as “R as a web server.” While technically true, to me this was an amusingly spot-on, living example of the typical data scientist resistance toward APIs. On the opposite side of the fence is software engineering, where everything is an API. Our APIs have APIs to check the health of dependent APIs. So how could APIs, something so fundamental to my job, be such a mystery to my data science comrades?

trying to keep track of microservices API flow like

Finally, a data scientist on the team explained it to me. APIs sound complicated! They’re these big scary things that engineers design, build, and maintain. No way a data scientist is qualified to do that! But here’s the thing: if you can sit at your computer and analyze a dataset in R, you can build an API. And with an API, you’ll empower other people to start using the model you’ve made. The analysis needed to make the model is the hard part! After that, an API is easy. In this post I’ll walk through what APIs are and detail how you can use plumber to create your own API in R.

Some notes on environment

This is an experiment! We’re going to talk about how to make APIs in R. The more people who show interest, the further we can expand our efforts. So, think of this information as a bit of a beta test. We’re putting our code out into the world in hopes that people will also be interested in supporting R APIs, using our code as a baseline. If enough people are interested, we’ll officially release this as an open source project. So in legalese:

  1. The code in this blog post consists of examples. They will have to be modified for your own environment.
  2. We will try to update our code as changes come out, but want to gauge interest first!
  3. Please modify our code as you see fit. Fork it! Show us what you can do. All we ask is that you honor any open source licenses and attributions, including those from us.
  4. You are responsible for respecting the data you use and ensuring best practices for cybersecurity. Before you throw a model into production, please let your internal IT and security teams review your code and data. They know more about your environment than I ever will, and we’re not in a position to provide general support. In legalese, “Use is AS IS and without warranty.”
  5. Your IT department should be aware of the open source licenses for some of these tools. Most are Apache and BSD; R and Rocker are GPL based. Note that different departments have different policies about open source.

With all that aside, now to the FUN STUFF!

Introduction to APIs

Technically, API stands for Application Program Interface, a name that’s both exactly what an API is and too vague to actually convey any meaning. But what are APIs really? They’re a simple way to set up a computer to pass information to other computers through the internet. 95% of the time, when people say API they mean a RESTful API, which is an API that uses HTTP to interact with other computers. HTTP is the same protocol your browser is using for you to read this article right now, as websites are just APIs! When you type in the URL of a website, a computer receives your request and sends back HTML. RESTful APIs are the same thing, but instead of HTML they send back text or data.

When you call an API you typically do one of two things. You either ask the other computer to send you a specific piece of information (like the weather for a city you requested), or you ask it to change the data it has stored (like adding a record to a table). Most simply, APIs are a way to call a function on another computer from your own.

APIs are used everywhere in software development. Let’s walk through a quick example. When you open the weather app on your phone, it probably:

  1. Gets your GPS coordinates
  2. Calls a location API with your GPS coordinates.
  3. Gets back the nearest city to you.
  4. Calls a weather.com API with your current city.
  5. Gets back the forecast for your location.
  6. Displays this information on your screen.

Because most APIs use HTTP, you can try calling them directly from your browser. For example, Open Notify has an API that tells you how many people are in space. When you hit the API by opening this web page, it tells you how many people are in space right now. See? It’s easy!
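You can make the same request from code instead of the browser. Here’s a minimal sketch of calling that Open Notify endpoint from R, assuming you have the httr and jsonlite packages installed and a working internet connection:

```r
# call the Open Notify "people in space" API from R
# (assumes the httr and jsonlite packages are installed)
library(httr)
library(jsonlite)

response <- GET("http://api.open-notify.org/astros.json")
result <- fromJSON(content(response, as = "text"))

# the number of people currently in space
result$number
```

This is all an API call is: an HTTP request to a URL, with the answer coming back as data instead of a web page.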

Using R and Plumber to Create an API

The easiest way to create an API in R is with the plumber library, a package that can convert existing R code into an API with just a few extra lines.

It’s easier to show than tell, so let’s start with a really simple R model. Here, we take the classic iris dataset and create a linear model that predicts petal length from petal width. We’ll pretend that this model is so interesting that it should be exposed to the world, allowing other people to run it on their own data.

First, we create a script that loads the data and trains the model.

make_model.R

# load the dataset
dataset <- iris
# create the model
model <- lm(Petal.Length ~ Petal.Width, data = dataset)

Now that we’ve trained the model we can try testing it. Set up a data frame containing one element, a petal width of one. Then run the model on our input data to get the result.

# example: run the model once
input_data <- data.frame(Petal.Width = 1)
predict(model, input_data)

The output is

>`[3.3135]`

That’s great! Unfortunately, to do this we had to manually alter the R code to contain 1. What if the people running our model don’t want to learn R? Or maybe don’t want to learn to code at all?

your second favorite plumber

In that case, we can set up an API so that each time someone passes in a parameter, the model runs once. In the rest controller, where all the logic for APIs lives, we first create the model by running the previous script. Then we define our API endpoints. Since we want a simple endpoint that takes in a petal width and returns a prediction, a simple GET endpoint will do.

When a GET request containing a petal width is received, the code must:

1. Convert the input (petal width) into a number so R can use it

2. Create a data frame containing just that petal width.

3. Run the model and return the prediction.

The following code does just that:

rest_controller.R

# make the model
source("make_model.R")

#* @get /predict_petal_length
get_predict_length <- function(petal_width){
  # convert the input to a number
  petal_width <- as.numeric(petal_width)
  # create the prediction data frame
  input_data <- data.frame(Petal.Width = petal_width)
  # create the prediction
  predict(model, input_data)
}
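Because the endpoint is still an ordinary R function, you can sanity-check it from the console before ever starting a server. This is a useful habit, since plumber hands query-string parameters to your function as character strings (the sketch below assumes rest_controller.R is in your working directory):

```r
# test the endpoint function directly, no server required
source("rest_controller.R")

# plumber passes query parameters as strings, so test with one
get_predict_length("1")
```

If this returns a prediction, the API logic works; everything left is just wiring up HTTP.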

Finally, we need to use plumber to set up our R code to accept HTTP requests and transform them into executable R code. This is the only part that’s new to you as an R user.

We simply:

1. Import plumber.

2. Show plumber where our endpoints are.

3. Start the API service on port 80.

Since HTTP defaults to port 80, binding our service to that port allows things we type into our browser to be executed by R on our computer.

main.R

library(plumber)
r <- plumb("rest_controller.R")
r$run(port=80, host="0.0.0.0")

Great! To see all of this code together, go to our GitHub repository. After cloning the project, run

install.packages('plumber')

and then run main.R in RStudio. You should see:

>`Starting server to listen on port 80`

To hit our API, point your browser at http://127.0.0.1, the address and port where plumber is listening. You should see:

>`{"error":["404 - Resource Not Found"]}`

This is expected! We’re pointing our browser at the right place, but we haven’t navigated to the endpoint we defined in rest_controller.R. Now, add “predict_petal_length” to the end of the navigation bar, which should result in http://127.0.0.1/predict_petal_length. You’ll see:

>`{"error":["500 - Internal server error"],"message":["Error in (function (petal_width) : argument \"petal_width\" is missing, with no default\n"]}`

Okay, so that didn’t work. The error message tells us exactly why, though! We’re missing the petal width, and the model can’t run on nothing. So let’s add our petal_width parameter. This is done by adding a ? to the end of the URL, followed by the parameter name, an equals sign, and the value of that parameter. Let’s pass a petal width of 1, so our browser points at http://127.0.0.1/predict_petal_length?petal_width=1

BOOM! In your browser you should see

>`[3.3135]`

This is the same output we got from RStudio. Congratulations! You’ve made your first API. Now you can have your R code run by typing things into your browser.
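And the browser is just one client. To see true machine-to-machine communication, here’s a sketch of hitting the running API from a second R session using httr (this assumes main.R is still running locally on port 80 and that httr is installed):

```r
# call the local plumber API from another R session
# (assumes main.R is running and the httr package is installed)
library(httr)

response <- GET(
  "http://127.0.0.1/predict_petal_length",
  query = list(petal_width = 1)
)

# the predicted petal length, parsed from the JSON response
content(response)
```

Any language that can make an HTTP request, not just R, could make this same call, which is the whole point of exposing the model as an API.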

But… what we’ve done is make an API that you can only hit from your own computer. That’s not super useful! APIs are best used to enable computers to talk to each other. You could set your computer up as a server that’s always listening on port 80, jump through a million security and networking hoops to let other people connect to it, and then just leave it on forever so other people can make nifty iris predictions, but that sounds exhausting, terrible, and like an utter waste of resources. You could go to Amazon AWS and get a virtual computer in the cloud that’s always on (a.k.a. an EC2 instance), leaving your laptop free to use. However, the programming setup alone is a huge hassle on your own computer, much less on a virtual machine you can’t touch.

So how can you go about exposing this brilliant model to be always available, accessible by other machines, and able to handle varying volumes of requests?

Enter: [Docker].