Using docker to deploy an R plumber API
Get your R code to run anywhere as a service
[This post is co-authored by Heather Nolis, and is part of a series on R in production. We recommend you start with part 1.]
In the previous blog post we learned how to use the R library plumber to create a RESTful API. We showed that by running a simple R script, we could have a web service running on our computer straight from RStudio. But what if you want to run the web service somewhere besides your computer? How do you get that code running somewhere else? And how can you do it in a way that’s quick to set up and easy to maintain?
[Before we begin, here is some legalese from our lawyers.]
The code in this blog consists of examples we ran using Docker 18.06; it may need to be modified for your own environment. We will make our best effort to update it, but we make no guarantees on that. Feel free to modify the code as you see fit. In fact, if you want to fork the code, feel free to do so. All we ask is that you honor any open source licenses and attributions, including ours. It goes without saying, but you are responsible for respecting the data you use and ensuring best practices for cybersecurity. In general, before using in production, please have your internal IT department (if one exists) review your code and your data. They know your environment better than we do, and we are not in a position to provide general support. In legalese, “Use is AS IS and without warranty.” Your IT department should also be aware of the open source licenses for some of these tools. Most are Apache and BSD. R and Rocker are GPL based. Note that different departments have different policies about open source.
The ☁️cloud☁️ and virtual machines
As a data scientist, you sometimes want to have code running in places that are not your computer. In our case, we want our R script to run an API continuously, regardless of whether our laptop runs out of battery.
One way we could set this up would be to create a virtual machine in a place like Amazon Web Services. A virtual machine is like a computer that you’ve used before, but it’s simulated: it doesn’t have its own processor, memory, or hardware. Instead, a physical computer runs the virtual machine and shares its resources with it. Virtual machines are great because many of them can run on a single server, and they can be easily started and stopped.
If you wanted to run the API on a virtual machine you would log onto AWS and:
- Start an EC2 instance (a virtual machine) with the right operating system and set the security group to allow HTTP traffic.
- Install the correct version of R.
- Install the right R libraries, some of which may come from GitHub.
- Put the R scripts for your plumber API on the instance.
- Open port 80 within the virtual machine’s firewall.
And that works! If you follow these steps, you will be able to hit the API endpoint you created. But there are a couple of downsides to this process:
- Starting the EC2 instance and installing all of these programs takes a ton of time. If the EC2 instance were accidentally deleted, you would have to do it all again. If you no longer worked at your company, no one would know exactly what steps you took to create the EC2 instance.
- Virtual machines take a lot of computing resources. They are entirely simulated computers, so they require an entire OS running and all the programs that come with it.
Both of these problems are fixed with docker.
Enter Docker
Docker is a way to make the process of configuring and running computers smoother. With docker, you can create a single document that specifies how to set up the computer. The document lets you run these steps at a moment’s notice. More formally, docker runs software in containers: lightweight, isolated, executable packages that behave much like small virtual machines but share the host computer’s operating system kernel instead of simulating a whole computer. A container is an instance of an image, which is a snapshot of a computer at a moment in time. Images can be full snapshots, or they can just be a small addition to an earlier image. A dockerfile is the specification document for how to build the image.
Or to put it another way:
- A dockerfile lists how to build the computer we’ll want to run (like the list of steps in the section above).
- An image is a file that’s a snapshot of that computer. Each step of the dockerfile constructs a new image, which is used as the starting place for the next step.
- A container is when you take an image file and begin running it. Running an image creates a container.
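To make the “each step creates a new image” idea concrete, here is a small sketch that lists the instructions of a dockerfile in build order, numbered the way the classic `docker build` output labels its steps (`Step 1/6 : FROM ...`). The function name `list_build_steps` is hypothetical, and it assumes every instruction starts at the beginning of a line in upper case while continuation lines do not. On an image you have actually built, `docker history <image>` shows the resulting layers.

```shell
#!/bin/sh
# Hypothetical helper: print each instruction keyword of a dockerfile,
# numbered like the "Step N/M" lines of the classic docker builder.
# Assumption: instructions begin a line in upper case; continuation lines
# (after a trailing backslash) are indented or lower case, so they are skipped.
list_build_steps() {
  awk '/^[A-Z]+[ \t]/ { steps[++n] = $1 }
       END { for (i = 1; i <= n; i++) printf "Step %d/%d : %s\n", i, n, steps[i] }' "${1:-dockerfile}"
}
```

Running `list_build_steps dockerfile` in a project folder prints one line per step, and each of those steps produces an intermediate image that the next step builds on.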
Docker containers are a great way to have computers set to exact specifications. Software developers love containers because they allow quick creation of many computers running the exact same code. Docker simplifies installation and maintenance of production-level code.
Docker containers are also smaller than virtual machines because they don’t each need a full simulated operating system, and because they can share components with each other. If we have many docker containers that all need Linux with R installed, then they can all share those core Linux and R components.
By sharing a docker image, you can immediately have someone else run your code! All the setup is taken care of, provided they have docker installed.
Using Docker for R and plumber
We’ll use the API we created in part 1 of this series as the code we want in our docker container. You can find the R code in our GitHub repository. Let’s try making a container out of it and running it on our own machine. In our case we’ll want to set up a computer following roughly the steps above:
- Start a computer with Linux.
- Install R on it.
- Install our R libraries.
- Transfer our R scripts to the computer.
Since we’re using docker, each of those steps will correspond to creating a new docker image. We’ll start with an image already set up with Linux. Then we’ll create an image that has R installed on top of Linux. After that, we’ll add our R libraries in a third image. Lastly, we’ll transfer our files into the final image.
A great thing about docker is that since so many people use it, you can often find images that other people have made that do what you want to do. In our case, the Rocker project has a ton of docker images that support R, including ones with R+RStudio, Shiny, and more. This means we don’t have to devise these images ourselves; we can use those images as starting points for ours. To start on our container, we’ll use the “r-ver” docker image, and thus the first two steps are done.
To use docker, you’ll first need to install it on the computer you’re using (docker provides installers for Windows and Mac). This is relatively painless, but it requires admin rights and may require you to change some operating system settings. Begin by starting docker on your computer. You’ll know it has started when the dashing-looking whale icon stops moving.
Once docker has started, go to the terminal (or PowerShell in Windows) and type:
docker pull rocker/r-ver:3.5.0
This downloads the Rocker image that has R version 3.5.0 installed on Linux. So you are downloading a compressed file that has a full computer’s information in it. It’ll take a few seconds, and while it runs docker prints one progress line per download.
Each line represents a sub-image (docker calls these layers), starting with one that has just the operating system and ending with one that has R installed. All put together, these create the full rocker/r-ver:3.5.0 image.
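The `3.5.0` after the colon is the image’s tag, and pinning it is what guarantees everyone gets the same version of R. If you leave the tag off, docker uses the tag `latest`, which points to different images over time. The hypothetical helper below splits an image reference into repository and tag to show the convention; it is a sketch and does not handle digest references such as `@sha256:...`.

```shell
#!/bin/sh
# Hypothetical helper: split an image reference into repository and tag.
# Docker falls back to the tag "latest" when no tag is given.
parse_image_ref() {
  # Look only at the last path component, so a registry port such as
  # localhost:5000/my-image is not mistaken for a tag.
  case "${1##*/}" in
    *:*) printf 'repository=%s tag=%s\n' "${1%:*}" "${1##*:}" ;;
    *)   printf 'repository=%s tag=%s\n' "$1" "latest" ;;
  esac
}
```

For example, `parse_image_ref rocker/r-ver:3.5.0` prints `repository=rocker/r-ver tag=3.5.0`, while `parse_image_ref rocker/r-ver` prints `repository=rocker/r-ver tag=latest`.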
Once complete you can run the following command:
docker run -ti rocker/r-ver:3.5.0
This runs a rocker container that is set up to run interactive R (the `-ti` flags give you an interactive terminal). You should see the familiar R command line interface. This works even if you don’t have R installed on your own computer! 😱 It’s all happening in this tiny container running inside your computer. If you quit the interactive R session, the container stops, and running the command again starts a brand-new container, so anything you did inside the old one is effectively lost. This makes sense since the container is not your computer; it’s just running on your computer.
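For one-off commands you don’t need an interactive session at all. As a sketch, this hypothetical wrapper, `r_in_container`, runs a single R expression through `Rscript` in a throwaway container; `--rm` tells docker to delete the container as soon as the command finishes. The `DOCKER` and `R_IMAGE` variables are assumptions added for illustration: setting `DOCKER=echo` prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper: evaluate one R expression in a disposable container.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the command instead.
DOCKER="${DOCKER:-docker}"
R_IMAGE="${R_IMAGE:-rocker/r-ver:3.5.0}"

r_in_container() {
  # --rm removes the container once Rscript exits, so nothing is left behind
  "$DOCKER" run --rm "$R_IMAGE" Rscript -e "$1"
}
```

Usage: `r_in_container 'sessionInfo()'` prints the R session details from inside the container, even on a machine without R.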
For our example, running R alone isn’t sufficient. We also need to install plumber, copy our files, set up our REST controller, and open the container to port 80 so that it can receive HTTP traffic. To build on top of this Rocker image, we need to write a dockerfile.
In the same directory as the plumber code from part 1, create a plain text file called `dockerfile`. By default, docker build looks in that folder for a plain text file with no extension named `dockerfile` (the capitalized `Dockerfile` is the more common spelling), so the name is important.
To customize our docker container, we first specify our base image. For us, that’s the rocker/r-ver:3.5.0 image. We do this by using the `FROM` command:
FROM rocker/r-ver:3.5.0
Then, we need to install the Linux packages that plumber requires, using Linux bash commands. To run these commands while setting up the image, we use the `RUN` command. We first refresh the list of available packages with apt-get update and then install two libraries, for SSL and curl, which aren’t included in the base image.
RUN apt-get update -qq && apt-get install -y \
libssl-dev \
libcurl4-gnutls-dev
Next, we install plumber, again using `RUN`, this time to execute one line of R code:
RUN R -e "install.packages('plumber')"
Now we have an image with all the required libraries and packages. But we still need to copy any R scripts that we use to run our web service! To make it easy, let’s just copy everything from the folder containing our `dockerfile` and the code from part 1 into the image. The `COPY` command takes a path in the folder with the `dockerfile` as its first argument and copies it to the path in the image given as its second argument. Here both are `/`, so the whole project folder lands in the root directory of the image:
COPY / /
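Copying the whole folder also copies things the API doesn’t need, such as your R history or a `.git` directory. Docker reads a file named `.dockerignore` in the build folder and leaves out anything matching its patterns. As a sketch, this hypothetical helper writes one; the entries are assumptions about a typical RStudio project, so adjust them to your own folder.

```shell
#!/bin/sh
# Hypothetical helper: write a .dockerignore into a project folder
# (default: current directory) so that `COPY / /` skips these paths.
write_dockerignore() {
  cat > "${1:-.}/.dockerignore" <<'EOF'
.git
.Rproj.user
.Rhistory
.RData
EOF
}
```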
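Next we declare port 80 (since that’s the port our plumber service listens on) using the `EXPOSE` command. `EXPOSE` mostly documents the port; it’s the `-p` flag on `docker run` that actually makes it reachable from outside the container: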
EXPOSE 80
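Lastly, we should specify what happens when the container starts. In part 1 we created a `main.R` file that starts the plumber web service in R, and we want that to happen in our container. So we set the `ENTRYPOINT` of the container to run the `main.R` script. (For requests from outside the container to reach the service, `main.R` should have plumber listen on host 0.0.0.0 rather than plumber’s default of 127.0.0.1.)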
ENTRYPOINT ["Rscript", "main.R"]
And that’s our dockerfile! It builds everything we need to run our API on any machine that has docker installed, regardless of the rest of its setup!
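Assembled in order, the steps above make up the complete dockerfile. To keep it copy-pasteable from a terminal, this sketch wraps it in a hypothetical shell function, `write_dockerfile`, that writes the file into a project folder using a quoted heredoc (the quotes around `EOF` stop the shell from expanding anything inside it).

```shell
#!/bin/sh
# Hypothetical helper: write the full dockerfile from this post into a
# project folder (default: current directory).
write_dockerfile() {
  cat > "${1:-.}/dockerfile" <<'EOF'
FROM rocker/r-ver:3.5.0

RUN apt-get update -qq && apt-get install -y \
  libssl-dev \
  libcurl4-gnutls-dev

RUN R -e "install.packages('plumber')"

COPY / /

EXPOSE 80

ENTRYPOINT ["Rscript", "main.R"]
EOF
}
```

Running `write_dockerfile .` in the folder with `main.R` and then building should give the same image as typing the file by hand.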
To build our image, open a terminal in the project folder and run:
docker build -t plumber-demo .
This builds the image from the current folder (the trailing `.`) and names it “plumber-demo”. Because installing plumber requires compiling several other R packages, this takes a few minutes. Don’t worry! Once the process is complete, you can run the container by using:
docker run --rm -p 80:80 plumber-demo
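Here `--rm` deletes the container once it stops, and `-p 80:80` connects port 80 on your computer to port 80 inside the container. Now you can query your API just like you did in part 1, by navigating to this address: http://127.0.0.1/predict_petal_length?petal_width=1. Anyone else can take your project and run your web service on their computer, regardless of their OS and installed programs, as long as they have docker.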
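The container needs a few seconds after `docker run` before plumber is listening, so a request sent immediately may fail. As a sketch, this hypothetical helper polls a URL with curl until it answers or a retry limit is reached; it assumes curl is installed, and the one-second interval and default of 30 tries are arbitrary choices.

```shell
#!/bin/sh
# Hypothetical helper: wait until an HTTP endpoint responds successfully.
# Returns 0 once the URL answers, 1 after the given number of tries.
# (Plain POSIX sh has no local variables, so url/tries/i are global here.)
wait_for_api() {
  url="$1"
  tries="${2:-30}"
  i=1
  while [ "$i" -le "$tries" ]; do
    # -f: treat HTTP errors as failures; -s: no progress output
    if curl -fs --max-time 2 "$url" > /dev/null 2>&1; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}
```

Usage: `wait_for_api "http://127.0.0.1/predict_petal_length?petal_width=1" && echo "API is up"`.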
Wow! When you’re ready, you can stop all running containers (not just this one) by using:
docker stop $(docker ps -q)
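Stopping every container is fine on a laptop, but it’s blunt. A gentler pattern is to give the container a name when you start it and stop it by that name. This sketch uses hypothetical helpers, `start_api` and `stop_api`, built on the real `docker run` flags `-d` (run in the background) and `--name`; the host port is a parameter, which helps if port 80 is already taken on your computer. As before, `DOCKER=echo` prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helpers: start and stop the API container by name.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the commands instead.
DOCKER="${DOCKER:-docker}"
API_IMAGE="${API_IMAGE:-plumber-demo}"
API_NAME="${API_NAME:-plumber-demo}"

start_api() {
  # -p HOST:CONTAINER maps a port on your computer to port 80 in the container
  "$DOCKER" run --rm -d --name "$API_NAME" -p "${1:-80}:80" "$API_IMAGE"
}

stop_api() {
  # because of --rm above, stopping the container also removes it
  "$DOCKER" stop "$API_NAME"
}
```

Usage: `start_api 8000`, then browse to http://127.0.0.1:8000/predict_petal_length?petal_width=1, and finish with `stop_api`.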
If you want to have this web service running on a server in the cloud, it only takes a few more steps. One way to do this is to use a tool like Amazon Elastic Container Service (ECS), which lets you take a docker image you’ve created and deploy it to AWS. At that point your container is running within AWS systems, and you can give it an endpoint that anyone can hit with a browser from anywhere. Wild!
With a better understanding of what docker is, why it’s super cool, and how it can be used to run R plumber APIs anywhere, you’re now empowered to make your own APIs. Take that, engineering team!
You should be able to create your own simple docker container and understand the basics of what is happening in a dockerfile.
There is still work left to do if you want to use R and plumber in an enterprise setting. In particular:
- You may need your container to support the keras and tensorflow libraries, which require Python.
- You’ll want to make the container as small as reasonably possible.
- You definitely want your web service to support encryption via HTTPS, which isn’t supported by plumber.
- You should make the dockerfile something that you never have to touch (for example, by moving things like the list of R libraries to install out of it).
In the final part of this series, we’ll talk about the r-tensorflow-api, a small docker container we crafted for T-Mobile to run neural networks in R as a RESTful API in an enterprise environment.
Though docker seems complicated and mysterious at first pass, it is just a way to write a program to build environments for you that guarantee all your dependencies are present. It makes sharing code smooth — something that is a pain point for the vast majority of data scientists.