R Shiny in Production

Published in

Inloco Tech Blog

7 min read · Jun 11, 2019

R is a programming language initially developed to handle statistical computing and graphics. However, throughout the years, a lot of new features and packages have been added, offering the users more tools to create whole environments. One example of this is the package Shiny: a framework for building web applications using R. Shiny applications are a powerful and easy way to share and communicate your analysis and let people interact with it.

Many students in fields such as statistics, applied math, and computer science have heard about Shiny or have used it in their data projects. David Robinson discussed the growth of R in The Impressive Growth of R. That article shows the growth of Shiny in particular, as one of the most mentioned R packages on Stack Overflow.

The Impressive Growth of R by David Robinson.

Shiny allows anyone with basic R knowledge to start developing web applications, which can range from a simple display of your descriptive analysis to sophisticated, interactive dashboards. For example, you could show colleagues how your new machine learning algorithm performs by letting them upload an image and see the classification outcome, or create an app where people can explore demographic data for their country through interactive graphical analysis. Below is a simple app from the RStudio gallery:

The app shows the number of telephones by region of the world in the '50s.
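To give a feel for how little code such an app needs, here is a minimal sketch along the lines of that gallery app. It assumes the data comes from R's built-in WorldPhones dataset and that the shiny package is installed; it is an illustration, not the gallery's exact source.

```r
library(shiny)

# UI: a dropdown to pick a region and a placeholder for the plot
ui <- fluidPage(
  titlePanel("Telephones by region"),
  sidebarLayout(
    sidebarPanel(
      selectInput("region", "Region:", choices = colnames(datasets::WorldPhones))
    ),
    mainPanel(plotOutput("phonePlot"))
  )
)

# Server: redraw the bar plot whenever the selected region changes
server <- function(input, output) {
  output$phonePlot <- renderPlot({
    # WorldPhones stores counts in thousands, one column per region
    barplot(
      datasets::WorldPhones[, input$region] * 1000,
      main = input$region,
      xlab = "Year",
      ylab = "Number of telephones"
    )
  })
}

# Build the app object; call runApp(app) in an interactive session to start it
app <- shinyApp(ui, server)
```

Everything reactive lives in `renderPlot()`: Shiny re-runs that block each time `input$region` changes.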

What many R users don’t know is that it is possible to build enterprise applications with Shiny and put them in production - the framework can be quite underrated sometimes. But first, what does “in production” actually mean? A reasonable answer is:

“Production environment is a term used mostly by developers to describe the setting where software and other products are actually put into operation for their intended uses by end users. A production environment can be thought of as a real-time setting where programs are run and hardware setups are installed and relied on for organization or commercial daily operations.”
Source: www.techopedia.com

In summary, a production environment is used and relied on by real users, with real consequences if things go wrong. In this environment, we hope that everything works as intended for the user. There are some challenges to be faced when we talk about Shiny in production:

R has many users who come from backgrounds outside computer science, such as statistics. These backgrounds usually don’t cover building and deploying applications, which can create a significant bottleneck when that step is required;
Building production environments the way Software Engineers do is usually not a required skill for Data Scientists.

Here at In Loco, Data Scientists have ownership over the data they generate and over how it is presented to customers, including the technology stack. The knowledge exchange between Software Engineers and Data Scientists creates a vast ecosystem, makes the data products more performant, and lets them ship faster for customer validation. Data products deliver value to customers quickly, so a solid structure to support this environment is very important.

One of our current products is the OOH (Out Of Home) Planner, currently in alpha: a platform where the customer can analyze, plan, and measure the performance of an offline media campaign. We built the OOH Planner in Shiny and, as Data Scientists, we faced challenges and problems when deploying it to production. Below, we can see one feature of our OOH Planner product.

The platform helps you to understand which OOH points are more likely to engage your customers in one of your places.

Many tools can help you analyze the app’s performance, as well as its testing and responsiveness.

They help you test and understand how the application is working. By pointing to the parts of your code where the app takes too long to respond, they can save you the time you would otherwise spend reading the entire application code to find bugs and responsiveness bottlenecks.

Now let’s take a look at the architecture! We’ll focus on the database and on deployment.

Database

It is fair to say that the core of an R Shiny application is data. How the data is stored and how we access it are both critical to the performance of a Shiny app. When we develop starter apps in Shiny, the data usually lives in a .csv file. Even in this scenario, there are good practices that improve responsiveness, such as saving your data in a different format like RDS or feather. These formats store the data in binary form (RDS also compresses it by default), which typically makes reading it inside your application much faster than parsing a text file.
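As a minimal sketch of that practice, the conversion from CSV to RDS can be done once, outside the app, using only base R. The temporary file paths and the use of the built-in WorldPhones dataset as stand-in data are assumptions made for illustration.

```r
# Stand-in data: reshape the built-in WorldPhones matrix into a long data frame
phones <- as.data.frame(as.table(datasets::WorldPhones), stringsAsFactors = FALSE)
names(phones) <- c("year", "region", "n_phones")

# Hypothetical file locations; a real app would use paths such as data/...
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

# One-off conversion step, run outside the app
write.csv(phones, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
saveRDS(from_csv, rds_path)

# Inside the app, the serialized object is read back without any parsing
app_data <- readRDS(rds_path)
```

Because `readRDS()` restores the object exactly as it was saved, column types do not need to be guessed again on every app start.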

However, what if our application has to deal with large amounts of data in real time? In this situation, the workload should be taken out of Shiny. Being “taken out” means that the data must be ready and preprocessed for the application to access it. Two words in the last sentence are essential: preprocessed and access.

An example of a database structure for a shiny app. (Only displaying some databases supported by R)

An R process is single-threaded, which means that everything runs in sequence. Doing many calculations inside the app code can therefore give you a headache and leave your users tired of waiting for your system’s response.

The figure above shows an example of the structure that could be used in this case, along with some of the databases supported by R. Choose the database that best fits your app’s needs; we’ve previously posted an article about this: The right database for the job. With this in mind, we now query the database and get “ready” data in response to a user request, instead of making significant calculations inside the app.
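The idea of moving work out of the app can be sketched as a small preprocessing job that writes both the raw and the aggregated data to the database. This is only an illustration under assumptions: an in-memory SQLite database (via the DBI and RSQLite packages) stands in for the production database, and the table and column names are hypothetical.

```r
library(DBI)

# Assumption: in-memory SQLite stands in for the production database
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Reshape the built-in WorldPhones matrix into one row per year and region
raw <- as.data.frame(as.table(datasets::WorldPhones), stringsAsFactors = FALSE)
names(raw) <- c("year", "region", "n_phones")
raw$year <- as.integer(raw$year)

# The expensive step runs here, in a scheduled job, not inside the app
region_totals <- aggregate(n_phones ~ region, data = raw, FUN = sum)

dbWriteTable(con, "telephones", raw, overwrite = TRUE)
dbWriteTable(con, "region_totals", region_totals, overwrite = TRUE)

# What the app would run for a user who picked "Europe"
europe_total <- dbGetQuery(
  con,
  "SELECT region, n_phones FROM region_totals WHERE region = ?",
  params = list("Europe")
)

dbDisconnect(con)
```

The app only ever reads one precomputed row here, so the cost of the aggregation is paid once by the job rather than on every user request.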

Another thing to avoid is querying the database without a filter. Commonly, the data is loaded in full in the global.R file:

# Option 1: CSV (read_csv() comes from the readr package)
data <- read_csv("data/my_app_data.csv")

# Option 2: RDS
data <- readRDS("data/my_app_data.RDS")

If the database has millions of observations, bringing all of that data into the app will probably crash it. The query must be filtered by some input or action from the user. Below is an example for the Telephones by region app.

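# Assumes library(dplyr) (with dbplyr installed) and an open DBI connection `con`
barPlot_data <- reactive({
    tbl(con, "telephones") %>%
        filter(region == !!input$region) %>%  # !! evaluates the input locally before SQL translation
        collect()                             # run the query and return a data frame
})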

The reactive barPlot_data now returns a data frame that is ready for plotting, with no further calculation needed. You can create indexes in your database to boost performance even more, but filtering alone often keeps latency and responsiveness under control.
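As a rough sketch of the indexing idea, the column the app filters on can be indexed with a plain SQL statement sent through DBI. An in-memory SQLite database again stands in for the real one, and the index and column names are hypothetical; the exact syntax and benefit depend on the database you choose.

```r
library(DBI)

# Assumption: in-memory SQLite stands in for the production database
con <- dbConnect(RSQLite::SQLite(), ":memory:")

phones <- as.data.frame(as.table(datasets::WorldPhones), stringsAsFactors = FALSE)
names(phones) <- c("year", "region", "n_phones")
dbWriteTable(con, "telephones", phones)

# Index the column that the app filters on
dbExecute(con, "CREATE INDEX idx_telephones_region ON telephones (region)")

# The filtered query issued for a user who picked "Asia"
asia <- dbGetQuery(
  con,
  "SELECT year, n_phones FROM telephones WHERE region = ?",
  params = list("Asia")
)

# List the indexes that now exist (sqlite_master is specific to SQLite)
index_names <- dbGetQuery(con, "SELECT name FROM sqlite_master WHERE type = 'index'")$name

dbDisconnect(con)
```

On a table this small the index makes no measurable difference; it starts to matter when the filtered column has to be searched across millions of rows.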

Deploy

Reliability is our primary objective when we deploy services in production, and that includes consistency. Imagine you are working on an analysis in R and you send your code to a friend. Your friend runs exactly the same code on the same dataset but gets a slightly different result. The difference can have various causes, such as a different operating system or a different version of an R package. A solution to this problem is Docker!

Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. In a way,

“Docker is a bit like a virtual machine. But unlike a virtual machine, rather than creating a whole virtual operating system, Docker allows applications to use the same Linux kernel as the system that they’re running on and only requires applications be shipped with things not already running on the host computer. This gives a significant performance boost and reduces the size of the application.”
Source: https://opensource.com/resources/what-docker

Using Docker is a great way to deploy a Shiny application.

Shiny application inside a container, establishing a connection to an external database.

Shiny application and database both containerized.

Above are two examples of deploying Shiny apps with Docker. In both, the app runs inside an isolated environment, which avoids problems with packages, R versions, and many of the other issues you can face when running your code on another machine, be it a friend’s PC or an AWS EC2 instance.

In the first one, only the app is inside a container, and it connects to a PostgreSQL database outside the container. This is the recommended scenario for a production environment: because your data lives outside the container, you don’t need to worry about losing it when the container restarts, or about the extra infrastructure you would need to prevent that. For instance, the database may be shared by your whole team/chapter and fed by ETLs and jobs, so the data used by your service can be written there as well.
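One common way to point a containerized app at an external database is to pass the connection settings in as environment variables, so the same image can run against different databases. The sketch below uses only base R; the variable names (DB_HOST, DB_PORT, and so on) and default values are hypothetical, and the commented connection call assumes the RPostgres package.

```r
# Read connection settings from the environment, with local fallbacks.
# All variable names and defaults here are illustrative assumptions.
db_settings <- function() {
  list(
    host     = Sys.getenv("DB_HOST", unset = "localhost"),
    port     = as.integer(Sys.getenv("DB_PORT", unset = "5432")),
    dbname   = Sys.getenv("DB_NAME", unset = "analytics"),
    user     = Sys.getenv("DB_USER", unset = "shiny"),
    password = Sys.getenv("DB_PASSWORD", unset = "")
  )
}

# Simulate what `docker run -e DB_HOST=... -e DB_PORT=...` would provide
Sys.setenv(DB_HOST = "db.internal.example", DB_PORT = "5433")
settings <- db_settings()

# With RPostgres installed, the connection could then be opened as:
# con <- do.call(DBI::dbConnect, c(list(RPostgres::Postgres()), settings))
```

Keeping credentials out of the app code also means they are not baked into the image.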

In the second one, both the Shiny app and the database are containerized. This architecture speeds up the development flow, making it faster to try out different databases and to test your code and how your app queries the data. When you are finished and the container is removed, all the data will be lost unless you created a volume to store it on your host machine.

Shiny applications are powerful and can be used in production environments, letting Data Scientists develop data products from end to end. To achieve this, the interaction between Data Scientists and Software Engineers is essential. Models and data products need the expertise of both professionals to be released, creating services that are reliable and secure.

Are you interested?

If you are interested in building context-aware products through location, check out our opportunities. Also, we’d love to hear from you! Leave a comment and let us know what you would like us to talk about in the upcoming posts.

A special thanks to Raíza and Abel.

Written by Gabriel Teotonio