Scaling R Shiny Applications with Cloud-Native Solution

Siva Anne
IBM Data Science in Practice
Sep 1, 2020

Architectural Lessons from Data Science Engagements

The need to analyze data for quick insights has led to growth in the ecosystem of languages, tools, and frameworks for Data Science. While the Python-based ecosystem has taken the lead in recent years, R continues to be an essential tool and a popular language of choice for Data Science projects. Every language has its own set of benefits and limitations. Enterprises often operate under resource constraints and have a strong affinity to specific skills, so the real question is less about which language or tool is better and more about which option best suits the project requirements.

With deep-rooted origins in statistical analysis and access to a growing collection of more than 16k packages, R offers an exceptional edge in driving speedy data analysis coupled with powerful visualization of results. As part of the blog series on architectural lessons from Data Science engagements, this post shares details of a solution architecture that has helped scale R Shiny applications at an economical cost.

R Shiny for Interactive Dashboards

Organizations incur no license fees to build, use, or distribute R applications. Over the years, this open-source ecosystem of R has been the catalyst for its wide adoption in academia and enterprises alike. Shiny is an R package for quickly visualizing the results of R analysis. The Shiny distribution includes R Shiny Server, a Linux-based web server program to host Shiny apps as web applications.

Shiny apps follow a simple structure. An app is mostly contained in a single R script (app.R) that implements two essential parts: the UI and the server function. The UI owns the display, captures user input, and passes it to the server function. The server function calculates the output and hands it back to the UI for rendering. Shiny offers a family of built-in functions to build out UI widgets and server components. This simple structure makes it easy to build, distribute, and host Shiny applications.

Shiny inherently uses a reactive programming model. After initializing the application once, Shiny recalculates reactive R expressions only when a user changes an input value in the UI widgets. Triggered by user actions, the UI renders the recalculated output from the server function. The reactive programming model optimizes the application’s execution path, making Shiny a compelling choice for generating highly interactive visuals.
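The idea of recalculating only when an input changes can be illustrated outside Shiny. The sketch below is plain Python, not Shiny's API or its actual implementation; the class names `ReactiveValue` and `ReactiveExpr` are hypothetical and exist only for this illustration.

```python
class ReactiveValue:
    """Holds an input value and tracks the expressions that depend on it."""

    def __init__(self, value):
        self._value = value
        self._dependents = []

    def get(self):
        return self._value

    def set(self, value):
        # Invalidate dependents only when the value actually changes.
        if value != self._value:
            self._value = value
            for dep in self._dependents:
                dep.invalidate()


class ReactiveExpr:
    """Caches a computed result until one of its inputs changes."""

    def __init__(self, func, *inputs):
        self._func = func
        self._inputs = inputs
        self._cache = None
        self._valid = False
        self.runs = 0  # counts how often the computation actually ran
        for inp in inputs:
            inp._dependents.append(self)

    def invalidate(self):
        self._valid = False

    def get(self):
        if not self._valid:
            self._cache = self._func(*(i.get() for i in self._inputs))
            self._valid = True
            self.runs += 1
        return self._cache


bins = ReactiveValue(10)
histogram_title = ReactiveExpr(lambda b: f"Histogram with {b} bins", bins)

histogram_title.get()  # first read computes the value
histogram_title.get()  # second read is served from the cache
bins.set(20)           # a changed input invalidates the cache
histogram_title.get()  # the value is recomputed once
```

In this toy model, three reads and one input change trigger only two computations, which is the saving the reactive model aims for.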

Shiny makes it easy to visualize results as interactive dashboards. Sharing the application as-is requires users to have an R environment to execute the R scripts. Hosting the Shiny application on the R Shiny server opens up web-based access to a much larger group of users without requiring any client-side software.

Photo by Adrien WIESENBACH on Unsplash

Challenges Scaling R Shiny Applications

Typically, organizations kick-start multiple data science initiatives as exploratory projects seeking to derive new insights from data. Compelled by the time-to-value offered by R and Shiny, data scientists leverage the open-source stack for accelerated analysis and visualization of results. For an application that gains traction, the logical next step is to initiate a production rollout. The organization now has to scale the exploratory desktop analysis into a more visible and widely used enterprise application.

The open-source R kernel is single-threaded and, by default, drives only one CPU core, so a single instance of the R process cannot scale beyond a single CPU core. The R Shiny server launches one R process per web application to serve all the user requests. This back-end R process constrains the number of concurrent users a Shiny application can support. The inherent limitation of R therefore impacts the scalability of R Shiny applications.
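A rough way to see the effect of a single back-end process is to model requests that all arrive at once and are served one at a time. The sketch below is a simplified model in Python, not a measurement; the function name `completion_times`, the 0.5-second service time, and the request count are assumptions chosen for illustration.

```python
def completion_times(num_requests, seconds_per_request, num_processes=1):
    """Return when each request finishes if all arrive at t=0.

    Each process handles one request at a time, as a single-threaded
    worker would; each request goes to the process that frees up first.
    """
    free_at = [0.0] * num_processes
    finished = []
    for _ in range(num_requests):
        idx = free_at.index(min(free_at))  # earliest available process
        free_at[idx] += seconds_per_request
        finished.append(free_at[idx])
    return finished


one_process = completion_times(20, 0.5)
four_processes = completion_times(20, 0.5, num_processes=4)
print(max(one_process), max(four_processes))
```

Under these assumptions, the last of 20 requests waits 10 seconds with one process and 2.5 seconds with four, which is why adding processes (or pods) is the lever for concurrency.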

Commercial Offerings Scale R Shiny at High Price Point

Software vendors such as RStudio offer enhanced R products with enterprise-class features. Organizations looking to leverage their R investments at scale subscribe to these commercial offerings.

R Shiny Server Pro from RStudio is one such offering to scale Shiny applications with enterprise-class features. The server launches an R process for every user request. The multi-process implementation scales the Shiny application to as many concurrent users as the underlying system resources can support. Subscriptions to commercial offerings may incur significant licensing fees. At the time of writing, the listed price for R Shiny Server Pro is around $10k/year for 20 concurrent users. At that rate, scaling an R Shiny application to a thousand concurrent users would theoretically incur half a million dollars per year in licensing fees (50 blocks of 20 users at $10k each).
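The licensing estimate can be reproduced with a few lines of arithmetic. This sketch assumes the price scales linearly in blocks of 20 concurrent users, which is a simplification and not a statement of the vendor's actual pricing tiers; the function name is hypothetical.

```python
import math


def annual_license_cost(concurrent_users, price_per_block=10_000, users_per_block=20):
    """Estimate yearly license cost, assuming linear pricing per block of users."""
    blocks = math.ceil(concurrent_users / users_per_block)
    return blocks * price_per_block


print(annual_license_cost(1000))  # 500000
```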

Scaling R Shiny Applications with Cloud Native Architecture

Microservices-based cloud-native architectures have proven to be useful building blocks for engineering highly scalable software applications. The design builds self-contained services that are deployed in containers and managed by a container orchestrator. Docker and Kubernetes are the de facto standards for containers and container orchestration. Kubernetes manages multiple containers as pods and essentially functions as an operating system for cloud-native applications. With the ability to independently manage lifecycle, scalability, and resource allocation for each of the services, the architecture yields a highly agile software stack.

The R Shiny application can be scaled using a cloud-native architecture. Typically, the next steps are:

  • Package the application code, its dependencies, and R Shiny Server for runtime into a Linux Docker image.
  • Deploy the containerized application to a Kubernetes cluster and configure it to run multiple replica instances.
  • Configure a load balancer service to distribute the incoming requests across the collection of pods.

The application can be scaled to manage the load by dynamically increasing the number of pods.
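The deployment steps above can be sketched as Kubernetes manifests. The example below builds a Deployment and a LoadBalancer Service as Python dictionaries and prints them as JSON. The app name and image reference are hypothetical placeholders, and the replica count of 3 is an arbitrary choice; port 3838 is Shiny Server's default listening port.

```python
import json

APP_NAME = "shiny-dashboard"                        # hypothetical app name
IMAGE = "registry.example.com/shiny-dashboard:1.0"  # hypothetical image reference
SHINY_PORT = 3838                                   # Shiny Server's default port


def deployment(replicas):
    """Describe a Deployment that runs `replicas` copies of the app container."""
    labels = {"app": APP_NAME}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": APP_NAME},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": APP_NAME,
                        "image": IMAGE,
                        "ports": [{"containerPort": SHINY_PORT}],
                    }]
                },
            },
        },
    }


def service():
    """Describe a Service that spreads incoming traffic across the pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": APP_NAME},
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": APP_NAME},
            "ports": [{"port": 80, "targetPort": SHINY_PORT}],
        },
    }


manifest = {"apiVersion": "v1", "kind": "List", "items": [deployment(3), service()]}
print(json.dumps(manifest, indent=2))
```

Saved to a file, the output could be applied with `kubectl apply -f`. Scaling then amounts to changing the `replicas` value or running `kubectl scale` against the Deployment.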

Cloud-Native Solution with Managed Kubernetes on IBM Cloud

Managed services on cloud platforms are perceived as offering the best value. IBM Cloud Kubernetes Service (IKS) is a managed Kubernetes offering to deploy, scale, and manage containerized applications. Here is the solution architecture implemented using IKS clusters on IBM Cloud to scale the R Shiny dashboard.

Cloud-Native Solution for Scaling R Shiny Application on IBM Cloud

The Global Load Balancer (CIS) directs public traffic for the application’s domain URL to the two IKS clusters. The multi-node IKS clusters, provisioned in two regions with nodes spanning three zones, help ensure high availability. The Load Balancer service (VPC Load Balancer) configured for each IKS cluster directs incoming traffic to the Ingress ALBs (Application Load Balancers) across the zones. The Ingress ALB routes traffic to pods in the nodes using cookie-based session affinity, so all requests from a user session go to the same pod. The nodes are sized for every pod to have at most 2 CPU cores to accommodate the single-threaded constraints of R. Scaling the number of pods handles the growing load.
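Cookie-based session affinity can be pictured with a toy router. This is a simplified Python model of the idea, not how the IKS Ingress ALB is implemented; the class name `StickyRouter`, the pod names, and the round-robin choice for new sessions are all assumptions made for illustration.

```python
import itertools
import uuid


class StickyRouter:
    """Pins each session cookie to one pod; new sessions are spread round-robin."""

    def __init__(self, pods):
        self._next_pod = itertools.cycle(pods)
        self._session_to_pod = {}

    def route(self, cookie=None):
        """Return (cookie, pod) for a request.

        A request without a known cookie starts a new session and is
        assigned the next pod; later requests with that cookie return
        the same pod.
        """
        if cookie not in self._session_to_pod:
            cookie = uuid.uuid4().hex
            self._session_to_pod[cookie] = next(self._next_pod)
        return cookie, self._session_to_pod[cookie]


router = StickyRouter(["pod-a", "pod-b", "pod-c"])
cookie, first_pod = router.route()    # first request: a cookie is issued
_, second_pod = router.route(cookie)  # follow-up request carries the cookie
print(first_pod, second_pod)
```

Affinity matters for Shiny because the state of a user's session lives inside the R process that served the first request, so a follow-up request sent to a different pod would not find that state.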

IBM’s Cloud Pak for Data includes RStudio IDE tooling to drive end-to-end Data Science. RStudio is the development IDE in the solution. The custom Toolchain in IBM Cloud implements a CI/CD pipeline to automate the DevOps cycle between RStudio and the IKS clusters.

Cloud-Native Implementations Offer Cost-Efficient Solutions

The cloud-native implementation leverages open-source R to scale the Shiny application. The solution does not incur any licensing fees for R software. The managed Kubernetes (IKS) on IBM Cloud offers an attractive price point to build a highly scalable solution. The three-node IKS clusters across the two regions hosting 12 pods would cost less than $15K per year. For a moderately intensive Shiny dashboard, the configuration could support up to 1000 users.
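The figures quoted in this post can be put side by side as a per-user cost. The sketch below uses only numbers stated in the text: $15K per year as an upper bound for the cloud-native setup, $500k per year for the theoretical commercial estimate, and 1000 users. It assumes the 12 pods are the total across both clusters, which the text does not state explicitly.

```python
def cost_per_user(annual_cost, users):
    """Yearly cost divided evenly across users."""
    return annual_cost / users


USERS = 1000
TOTAL_PODS = 12  # assumption: 12 pods in total across both clusters

cloud_native = cost_per_user(15_000, USERS)  # upper bound: "less than $15K"
commercial = cost_per_user(500_000, USERS)   # theoretical estimate from the text
users_per_pod = USERS / TOTAL_PODS

print(f"cloud-native: under ${cloud_native:.0f} per user per year")
print(f"commercial:   about ${commercial:.0f} per user per year")
print(f"load:         about {users_per_pod:.0f} users per pod")
```

These are back-of-the-envelope numbers. The actual capacity per pod depends on how compute-intensive the dashboard is, which is why the text qualifies the estimate as applying to a moderately intensive dashboard.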

Summary

R and Shiny are popular tools to analyze data and visualize insights quickly. The single-threaded implementation of the open-source R kernel constrains the scalability of R Shiny applications. Scaling R Shiny applications with commercial offerings may incur high licensing costs. The cloud-native implementation offers a cost-efficient solution to scale R Shiny applications.
