Getting Started with Docker to do Data Science and deploy a Shiny App

Analytics Anonymous
4 min read · May 29, 2017

In this post, I will go over the basics of Docker containers, touch on using Docker for data science, and finally show how to deploy a simple Shiny app. I assume no prior knowledge of Docker and will show you how to install it on your laptop as well. In essence, the tutorial that follows is meant to be a self-contained and (hopefully) actionable getting-started guide!

Image source: https://blog.docker.com/2015/05/dockers-2nd-birthday-by-the-numbers/

Short Introduction

Docker is an open-source software container platform. It creates containers on top of an operating system using Linux kernel features, thereby virtualizing the operating system instead of the physical hardware (which makes containers more portable and efficient than virtual machines).

Why Docker?

  1. Easy to use: it’s “build once, run anywhere”, meaning you can build an application on your laptop and run it unmodified on any server or cloud.
  2. Fast: containers share the host OS kernel and take up fewer resources than VMs, so a container is lightweight and starts almost instantly!
  3. Rich in ecosystem: Docker Hub alone has hundreds of thousands of public images, community-created and readily available for use (see next section). There are other Docker registry hosting services too (e.g. Quay).
  4. Scalable: break your application down into multiple containers for modularity and link them together; scale easily by adding new containers or destroying unused ones independently.

Containerized Data Science

Thanks to the rich ecosystem, there are already several readily available images for the common components in data science pipelines. Here are some Docker images to help you quickly spin up your own data science pipeline:

So far so cool. Now let’s get our hands dirty and do some real work! We’ll start by installing Docker on our machine.

Installing Docker

Note: The instructions below are taken from a repository by DataKindSG: https://github.com/DataKind-SG/contain-yourself

… for Windows

Follow the setup instructions here: https://docs.docker.com/docker-for-windows/install/

Note: If your machine doesn’t meet the requirements for “Docker For Windows”, try setting up “Docker Toolbox”: https://docs.docker.com/toolbox/toolbox_install_windows/

… for Linux

Follow the setup instructions for your flavor of Linux here: https://docs.docker.com/engine/installation/linux/

… for macOS

Follow the setup instructions here: https://store.docker.com/editions/community/docker-ce-desktop-mac

Or, if you use Homebrew Cask:

$ brew install --cask docker
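Whichever route you took, it is worth confirming that the installation actually works before moving on. The sketch below is my own helper, not part of the DataKindSG instructions: the function name check_docker is hypothetical, and it relies only on the standard `docker --version` and `docker info` commands.

```shell
#!/bin/sh
# Sanity-check a fresh Docker installation.
# check_docker is a hypothetical helper name, not a Docker command.
check_docker() {
  # The docker CLI must be on PATH before anything else can work.
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found on PATH; revisit the install steps" >&2
    return 1
  fi

  # Print the client version.
  docker --version || return 1

  # 'docker info' talks to the daemon, so it fails when the daemon is down.
  if ! docker info >/dev/null 2>&1; then
    echo "docker CLI found, but the daemon is not reachable" >&2
    return 1
  fi

  echo "Docker looks ready"
}
```

After defining the function, `check_docker && docker run --rm hello-world` runs the check and, if it passes, starts Docker's small hello-world test image from Docker Hub.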

Building a shiny app

In order to build a Docker image for our Shiny app, we will need a Dockerfile (see the template below). Once the image is built from the Dockerfile, we can simply run it and enjoy our app!

To build our demo Dockerized Shiny app, here’s what you need to prepare along with the Dockerfile:

  1. Your Shiny source code (e.g. ui.R and server.R)
  2. The shiny server configuration file and script: shiny-server.conf and shiny-server.sh
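If you do not have these files yet, the sketch below scaffolds a minimal set so you have something to build. Everything it writes is an assumption for illustration, not my original files: the function name scaffold_shiny_build_dir is hypothetical, the demo app is a throwaway histogram, and the configuration and startup script follow the usual minimal Shiny Server setup (listen on port 3838, serve /srv/shiny-server, run as the shiny user that the Shiny Server package normally creates).

```shell
#!/bin/sh
# Scaffold the files that sit next to the Dockerfile.
# All file contents here are illustrative assumptions.
scaffold_shiny_build_dir() {
  target=${1:-.}
  mkdir -p "$target/myshinyapp" || return 1

  # ui.R: the last expression is the UI object.
  cat > "$target/myshinyapp/ui.R" <<'EOF'
library(shiny)
fluidPage(
  titlePanel("Hello from Docker"),
  sliderInput("n", "Number of points", min = 10, max = 500, value = 100),
  plotOutput("hist")
)
EOF

  # server.R: the last expression is the server function.
  cat > "$target/myshinyapp/server.R" <<'EOF'
library(shiny)
function(input, output) {
  output$hist <- renderPlot(hist(rnorm(input$n)))
}
EOF

  # shiny-server.conf: serve every app under /srv/shiny-server on port 3838.
  cat > "$target/shiny-server.conf" <<'EOF'
run_as shiny;
server {
  listen 3838;
  location / {
    site_dir /srv/shiny-server;
    log_dir /var/log/shiny-server;
    directory_index on;
  }
}
EOF

  # shiny-server.sh: prepare the log directory, then start the server.
  cat > "$target/shiny-server.sh" <<'EOF'
#!/bin/sh
mkdir -p /var/log/shiny-server
chown shiny:shiny /var/log/shiny-server
exec shiny-server >> /var/log/shiny-server.log 2>&1
EOF
  # The Dockerfile runs this script directly, so it must be executable.
  chmod +x "$target/shiny-server.sh"
}
```

Calling `scaffold_shiny_build_dir .` in an empty directory writes the four files there; replace the demo ui.R and server.R with your own app when you are ready.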

And here is a sample Dockerfile that you can use directly:

FROM r-base:latest

MAINTAINER Analytics Anonymous

# Proxy settings: leave empty unless you are behind a proxy
ENV http_proxy ""
ENV https_proxy ""

# Install system dependencies
RUN apt-get update && apt-get install -y \
    sudo \
    gdebi-core \
    pandoc \
    pandoc-citeproc \
    libcurl4-gnutls-dev \
    libcairo2-dev/unstable \
    libxt-dev \
    libssl-dev \
    gsl-bin \
    libgsl0-dev

# Download and install shiny server
RUN wget --no-verbose https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/VERSION -O "version.txt" && \
    VERSION=$(cat version.txt) && \
    wget --no-verbose "https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/shiny-server-$VERSION-amd64.deb" -O ss-latest.deb && \
    gdebi -n ss-latest.deb && \
    rm -f version.txt ss-latest.deb

# Install the R packages the app needs
RUN R -e "install.packages(c('shiny', 'shinydashboard', 'dplyr'), repos='http://cran.rstudio.com/')"

# Copy the server configuration and the app itself
COPY shiny-server.conf /etc/shiny-server/shiny-server.conf
COPY myshinyapp /srv/shiny-server/

EXPOSE 3838

# Copy the startup script and run it when the container starts
COPY shiny-server.sh /usr/bin/shiny-server.sh
CMD ["/usr/bin/shiny-server.sh"]

Briefly, the Dockerfile above does the following to the Docker image:

  • Install the required dependencies
  • Install shiny server
  • Install required R packages
  • Copy the shiny server configuration and startup script (shiny-server.conf and shiny-server.sh)

It assumes that the Shiny files ui.R and server.R are located in the ‘myshinyapp’ directory (you can change this); they are copied over to /srv/shiny-server in the Docker image. Note the two ENV lines: they are empty here, and you will need to set them to your proxy address if you are behind a corporate proxy.

To build and tag the image as anonymous/analytics:1.0.0, run this command in the directory where the Dockerfile is located:

$ docker build -t anonymous/analytics:1.0.0 .

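The dot (.) in the command above refers to the current working directory, which Docker uses as the build context: the COPY instructions in the Dockerfile can only see files inside it.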
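A build fails partway through if one of the files named in a COPY instruction is missing from that directory, so a quick pre-flight check can save a wait. The sketch below is a hypothetical helper (check_build_context is my own name) that assumes the file layout used in this post: the Dockerfile, shiny-server.conf, shiny-server.sh and a myshinyapp directory containing ui.R and server.R.

```shell
#!/bin/sh
# Verify that the build directory holds every file the Dockerfile copies.
# check_build_context is a hypothetical helper; the file list assumes the
# layout described in this post.
check_build_context() {
  context=${1:-.}
  missing=0
  for path in Dockerfile shiny-server.conf shiny-server.sh \
              myshinyapp/ui.R myshinyapp/server.R; do
    if [ ! -e "$context/$path" ]; then
      # Report each missing file, but keep checking the rest.
      echo "missing: $path" >&2
      missing=1
    fi
  done
  # Exit status 0 means everything was found.
  return "$missing"
}
```

With the function defined, `check_build_context . && docker build -t anonymous/analytics:1.0.0 .` only starts the build when all five files are present.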

After the image is built, you can run it as follows (e.g. to run an image named anonymous/analytics tagged 1.0.0):

$ docker run --rm -p 3838:3838 anonymous/analytics:1.0.0

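The --rm flag tells Docker to remove the container upon exit. The -p option maps a host port to a container port (in host:container order). If you have data to attach to the container (like me), you can put it in a directory (e.g. shinydata) and mount it with the -v flag. Give the host directory as an absolute path, because a bare name such as shinydata is treated as a named volume rather than a directory (the $(pwd) below works in Unix-like shells):

$ docker run -p 3838:3838 -v "$(pwd)/shinydata":/srv/shiny-server/data anonymous/analytics:1.0.0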
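Once the container is running, the app should answer on the host port you mapped. The sketch below polls it with curl until it responds. It is a hypothetical helper (wait_for_app is my own name), and the URL http://localhost:3838/ in the usage line is an assumption that holds when Docker runs directly on your machine; with Docker Toolbox you would use the IP address of the Docker virtual machine instead.

```shell
#!/bin/sh
# Poll a URL until it responds or the attempts run out.
# wait_for_app is a hypothetical helper, not a Docker command.
wait_for_app() {
  url=$1
  attempts=${2:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors, -sS keeps it quiet except for errors.
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      echo "app is up at $url"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "app did not respond at $url" >&2
  return 1
}
```

For example, `wait_for_app http://localhost:3838/ 30` waits up to roughly 30 seconds. If it gives up, `docker ps` shows whether the container is still running and `docker logs` followed by the container ID shows what it printed.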

Final Words

If you’ve made it this far, I hope you are now able to use Docker and deploy your Shiny Application. Next, perhaps you may want to:

Please give us a LIKE and your COMMENT too if you like this post!
