Basic Builds :: How to update data in a Shiny App on RStudio Connect

Kelly O'Briant
Apr 19

TL;DR: Leverage R Markdown output files to create simple ETL processes on RStudio Connect.

Basic Builds is a series of articles providing code templates for data products published to RStudio Connect.

Introduction: Application data and how to find it

Building data products with open source R packages like shiny, rmarkdown and plumber can be a powerful way to both explore and communicate with data. This article discusses how I think about data and data storage in relation to application (“data product”) code in a production context.

I’ve built a Shiny application (or R Markdown report, or Flexdashboard, or …) that uses data from a CSV file that I have access to in my local development environment. RStudio Connect makes it easy to bundle up this application code and the small CSV file, reproduce information about my local environment, and run the application on a production server.

This works great, and it is a totally valid publishing workflow. But it might not be what I actually wanted to do. Building useful, robust data products also means having the vision and experience to anticipate the future.

Here are the questions that I always seek to answer when I’m developing an MVP/POC or “Version 1” of a data product:

  • Vision: What kind of impact am I looking to make with this application?
  • Experience: What do I know about my production environment that will influence how I engineer this application?

Start by thinking simply.

  • If this application is useful today, will it still be useful in a week, or a month?
  • Will the data that feeds it still be valid a week or month from now?
  • How will you deliver new data to the application?
  • Can the delivery of new data be automated in some way?

We know that bundling a small data file with the application code will work in production, but if that data needs to be updated (even as infrequently as once a month), it would be nice to automate the data update and app publication process.

The good news is, programmatic deployment (automated application publishing to RStudio Connect) is also a totally valid workflow. But before you run off and build that, I can share that in my experience, programmatic deployment is likely to be engineering overkill for this problem.

The Wonders of R Markdown Output Files — on RStudio Connect

If you’re trying to solve this automation problem and RStudio Connect is your production environment, the tools are already built-in; all you need is the experience to know how to piece the right parts together. This article is a how-to framework for creating basic, automated ETL processes with R Markdown output files on RStudio Connect. Before I get to the how-to, there are a few important things to know:

  1. RStudio Connect has a basic scheduling tool — the catch is that it’s only available for certain types of content. You can set up an R Markdown document to render on a schedule, but you can’t use Connect to schedule updates for content with a Shiny runtime.
  2. Don’t try to combine multiple content types into a single publishing bundle. Different documents need to be published to different content locations on Connect. This guide will show you how content published to different locations can still be linked into a pipeline.
  3. You should also know that Connect uses sandboxed processes to protect certain areas on the file system. This has significant development implications. On RStudio Connect, you can’t do something simple like write.csv() and expect the result of that action to be available in the same way it would be on your local file system. Luckily, there are some simple solutions — and they’re easy to learn.

Goal: Build a Shiny application backed by a data.csv file that gets automatically updated without having to re-deploy anything to RStudio Connect

Solution: Change where the data lives, and automate the data update process

Our original plan was to package up a CSV file and publish it along with the Shiny application. A better plan might be to leverage R Markdown scheduling and rsc_output_files so that our application reads the latest CSV file over HTTP.

To do this, I’ll need to create a second content item, an R Markdown document, and publish it to RStudio Connect.

Step 1: Create a new R Markdown document

Set up the YAML header to use rsc_output_files:
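Here is a minimal sketch of such a header, assuming the output file is named data.csv as in the goal above. The title and output format are placeholders of mine, not values from this project; the rmd_output_metadata and rsc_output_files keys are the ones described in the RStudio Connect user guide.

```yaml
---
title: "Scheduled data update"   # placeholder title
output: html_document            # placeholder output format
rmd_output_metadata:
  rsc_output_files:
    - "data.csv"                 # file the report will write and Connect will serve
---
```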

Step 2: Use R Markdown code chunks to extract and transform data

Every time this report is rendered, it will create a new random data frame. Creating dummy data is not representative of a typical ETL process. You’ll likely want to replace this section with code that pulls data from a database or API.
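As a rough sketch of what such a dummy-data chunk could look like (the column names, the row count, and the final_data variable name are illustrative assumptions, not the code from this project), the body of an R chunk might be:

```r
# Dummy "extract and transform" step: build a fresh random data frame on every render.
# Column names, sizes, and the variable name are illustrative assumptions.
n_rows <- 20

final_data <- data.frame(
  id = seq_len(n_rows),
  group = sample(c("a", "b", "c"), n_rows, replace = TRUE),
  value = round(rnorm(n_rows, mean = 50, sd = 10), 1),
  rendered_at = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),  # when this render ran
  stringsAsFactors = FALSE
)
```

In a real pipeline this is where a database query or API call would go, followed by whatever cleaning the app needs.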

Step 3: Write the data file

Add an R Markdown code chunk to write the final data frame to a CSV file.
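A minimal sketch of that chunk, assuming the transformed data lives in a data frame called final_data (a hypothetical name; a small stand-in is defined here so the snippet runs on its own) and that the file name matches the one declared in the YAML header:

```r
# Stand-in for the data frame produced by the transform step (hypothetical name and values)
final_data <- data.frame(id = 1:3, value = c(10.5, 20.1, 30.7))

# Write the file next to the report, using the same relative name that was
# declared under rsc_output_files, so Connect can pick it up as an output file
write.csv(final_data, "data.csv", row.names = FALSE)
```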

This creates the data.csv output file we specified in Step 1. There are actually two ways to specify output files, which you can read about in the RStudio Connect user guide: How to work with output files
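As I understand the user guide, the second way is to declare output files from R code instead of the YAML header, which can be handy when the file name is only known at render time. A small sketch, assuming the same data.csv file name:

```r
# Alternative to the YAML header: declare the output file from inside an R chunk.
# The file name is assumed to match whatever the report actually writes.
rmarkdown::output_metadata$set(rsc_output_files = list("data.csv"))
```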

Step 4: Add a download link (optional)

Make a download link to share the output file from your report using standard Markdown syntax, for example: [Download the data](data.csv)

Step 5: Publish

Publish the R Markdown document to RStudio Connect. Make sure to publish with source code.

Step 6: Configure Access and Scheduling

Once the R Markdown document is published to RStudio Connect, use the publisher settings tools to set access controls and the rendering schedule.

Note on Access: The CSV file will be subject to the same authorization as your report. For the most secure experience, keep access controls tight. Shiny applications can perform authenticated HTTP requests using a Connect API Key. Find examples of using httr to perform authenticated HTTP requests here: https://docs.rstudio.com/connect/user/cookbook.html#cookbook-configuring-your-scripts
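As a rough sketch of what such an authenticated request could look like with httr: the read_connect_csv helper, the CONNECT_API_KEY environment variable name, and the example URL are all assumptions of mine, not values from this project. The "Key" authorization header is the scheme used by the Connect API.

```r
# Sketch: read a protected output file from RStudio Connect using an API key.
# The helper name, environment variable name, and URL are illustrative assumptions.
read_connect_csv <- function(url, api_key = Sys.getenv("CONNECT_API_KEY")) {
  resp <- httr::GET(
    url,
    httr::add_headers(Authorization = paste("Key", api_key))
  )
  # Fail loudly on 401/403/404 instead of parsing an error page as CSV
  httr::stop_for_status(resp)
  read.csv(
    text = httr::content(resp, as = "text", encoding = "UTF-8"),
    stringsAsFactors = FALSE
  )
}

# Hypothetical usage; replace with the URL of data.csv under your published report
# app_data <- read_connect_csv("https://connect.example.com/content/42/data.csv")
```

Keeping the key in an environment variable, rather than in the app code, means it never ends up in the published bundle.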

Step 7: Update the Shiny application

Assuming that you aren’t using strict access controls, the update required to the Shiny application code might be quite minimal.

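In my case, it was simply a matter of changing the call that reads data.csv so that it points at the URL of the report’s output file on Connect instead of the local file bundled with the app.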
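A minimal sketch of that kind of change, assuming the app originally read the bundled file with read.csv(). The URL is a hypothetical placeholder, and the load_app_data wrapper is my own addition to keep the snippet self-contained:

```r
# Hypothetical placeholder; use the URL of data.csv under your published report
data_url <- "https://connect.example.com/content/42/data.csv"

# Before: read the copy of the file that was bundled and deployed with the app
# app_data <- read.csv("data.csv")

# After: read the report's latest output file over HTTP.
# read.csv() accepts a URL as well as a local path, so only the argument changes.
load_app_data <- function(url = data_url) {
  read.csv(url, stringsAsFactors = FALSE)
}
```

Calling load_app_data() in the app then returns whatever the most recent scheduled render wrote.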

While this solution is simple, there’s still a chance that a long-running R process might not show the most recent data. It’s likely a good idea to put a check in place, or use other Shiny tools like reactiveFileReader and reactivePoll to monitor for changes. Winston Chang created a nice gist example of that type of app workflow here: https://gist.github.com/wch/9652222
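One way such a check might look is sketched below with reactivePoll. It assumes the app UI contains a tableOutput("preview"), uses a placeholder URL, and assumes the server sends a Last-Modified header for the file; if it does not, the sketch falls back to re-reading the data on every poll.

```r
# Hypothetical placeholder; use the URL of data.csv under your published report
data_url <- "https://connect.example.com/content/42/data.csv"

server <- function(input, output, session) {
  latest_data <- shiny::reactivePoll(
    intervalMillis = 60 * 1000,  # check once a minute (arbitrary choice)
    session = session,
    # Cheap check: has the file's Last-Modified header changed?
    # Assumption: the server sends this header; otherwise always refresh.
    checkFunc = function() {
      stamp <- httr::headers(httr::HEAD(data_url))[["last-modified"]]
      if (is.null(stamp)) Sys.time() else stamp
    },
    # Expensive step: only runs when checkFunc returns a new value
    valueFunc = function() read.csv(data_url, stringsAsFactors = FALSE)
  )

  output$preview <- shiny::renderTable(head(latest_data()))
}
```

With strict access controls, both requests would also need the API key header described in the note on access.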

One more cool thing

Because output files are versioned along with the rendering of their report, they can be accessed from the History viewer tool.

Adding a download link (optional Step 4) is useful if you anticipate wanting to download those historical versions.

Conclusion: ETL on RStudio Connect made easy

I hope that learning about rsc_output_files will save you some time and energy going forward.

  • Accessing output files over HTTP can be a good alternative to setting up a shared read/write persistent storage location on RStudio Connect.
  • If output files aren’t a good solution for your use case, think about the benefits of using a database.
  • Finally, if a database is still out of the question, be sure to read this article before moving to a persistent storage solution on Connect.

Resources:

  • The R Markdown and Shiny application code framework for this project can be found here
  • More information on how to use output files on RStudio Connect can be found here