Basic Builds :: How to update data in a Shiny App on RStudio Connect
Note: This article (while still accurate) represents an outdated approach. You can find the latest best practices here.
Basic Builds is a series of articles providing code templates for data products published to RStudio Connect.
- Link to full code template (one Shiny app, one R Markdown document)
Introduction: Application data and how to find it
Building data products with open source R packages like `shiny`, `rmarkdown`, and `plumber` can be a powerful way to both explore and communicate with data. This article discusses how I think about data and data storage in relation to application ("data product") code in a production context.
I’ve built a Shiny application (or R Markdown report, or Flexdashboard, or …) that uses data from a CSV file that I have access to in my local development environment. RStudio Connect makes it easy to bundle up this application code and the small CSV file, reproduce information about my local environment, and run the application on a production server.
This works great, and it is a totally valid publishing workflow. But it might not be what I actually wanted to do. Building useful, robust data products means also having the vision and experience to anticipate the future.
Here are the questions that I always seek to answer when I’m developing an MVP/POC or “Version 1” of a data product:
- Vision: What kind of impact am I looking to make with this application?
- Experience: What do I know about my production environment that will influence how I engineer this application?
Start by thinking simply.
- If this application is useful today, will it still be useful in a week, or a month?
- Will the data that feeds it still be valid a week or month from now?
- How will you deliver new data to the application?
- Can the delivery of new data be automated in some way?
We know that bundling a small data file with the application code will work in production, but if that data needs to be updated (even as infrequently as once a month), it would be nice to have the data update and app publication process be automated.
The good news is that programmatic deployment (automated application publishing to RStudio Connect) is also a totally valid workflow. But before you run off and build that, I can share from my experience that programmatic deployment is likely engineering overkill for this problem.
The Wonders of R Markdown Output Files — on RStudio Connect
If you’re trying to solve this automation problem and RStudio Connect is your production environment, the tools are already built-in; all you need is the experience to know how to piece the right parts together. This article is a how-to framework for creating basic, automated, ETL processes with R Markdown output files on RStudio Connect. But before I get to the how-to, there are a few important things to know first:
- RStudio Connect has a basic scheduling tool — the catch is that it’s only available for certain types of content. You can set up an R Markdown document to render on a schedule, but you can’t use Connect to schedule updates for content with a Shiny runtime.
- Don’t try to combine multiple content types into a single publishing bundle. Different documents need to be published to different content locations on Connect. This guide will show you how content published to different locations can still be linked into a pipeline.
- You should also know that Connect uses sandboxed processes to protect certain areas on the file system. This has significant development implications. On RStudio Connect, you can't do something simple like `write.csv()` and expect the result of that action to be available in the same way it would be on your local file system. Luckily, there are some simple solutions — and they're easy to learn.
Goal: Build a Shiny application backed by a `data.csv` file that gets automatically updated without having to re-deploy anything to RStudio Connect
Solution: Change where the data lives, and automate the data update process
Our original plan was to package a CSV file up and publish it along with the Shiny application. A better plan might be to leverage R Markdown scheduling and `rsc_output_files` to have our application access the latest CSV file over HTTP.
To do this, I’ll need to create a second content item — an R Markdown document, to publish on RStudio Connect.
Step 1: Create a new R Markdown document
Set up the YAML header to use `rsc_output_files`:
```
---
title: "Output File Framework for R Markdown ETL on RStudio Connect"
output: html_document
rmd_output_metadata:
  rsc_output_files:
    - "data.csv"
---
```
Step 2: Use R Markdown code chunks to extract and transform data
```{r}
df <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50), d = rnorm(50), e = rnorm(50))
```
Every time this report is rendered, it will generate a new random data frame. Creating dummy data is not representative of a typical ETL process; you'll likely want to replace this section with code that pulls data from a database or an API.
- Best practices for working with databases can be found at db.rstudio.com
- The `httr` package is a good place to start when working with REST APIs and the HTTP protocol
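As an illustration, here is a minimal sketch of a database-backed extract step. The DSN name, table, and column names are placeholders, not part of the original example:

```r
# Hypothetical extract step: substitute your own connection details.
# Assumes an ODBC data source named "warehouse" with a "metrics" table.
library(DBI)

con <- DBI::dbConnect(odbc::odbc(), dsn = "warehouse")
df  <- DBI::dbGetQuery(con, "SELECT a, b, c, d, e FROM metrics")
DBI::dbDisconnect(con)
```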
Step 3: Write the data file
Add an R Markdown code chunk to write the final data frame to a CSV file.
```{r}
write.csv(df, "data.csv", row.names = FALSE)
```
This creates the `data.csv` output file we specified in Step 1. There are actually two ways to specify output files, which you can read about in the RStudio Connect user guide: How to work with output files
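The YAML header shown in Step 1 is one way; the other is to declare output files programmatically from a code chunk, along these lines:

```r
# Alternative to the YAML header: register the output file from a code
# chunk using rmarkdown's output metadata object.
rmarkdown::output_metadata$set(rsc_output_files = list("data.csv"))
```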
Step 4: Add a download link (optional)
Make a download link to share the output file from your report using standard Markdown syntax:
```
#### Here is the data generated from this report: [data.csv](data.csv)
```
Step 5: Publish
Publish the R Markdown document to RStudio Connect. Make sure to publish with source code.
Step 6: Configure Access and Scheduling
Once the R Markdown document is published to RStudio Connect, use the publisher settings tools to set access controls and the rendering schedule.
Note on Access: The CSV file will be subject to the same authorization as your report. For the most secure experience, keep access controls tight. Shiny applications can perform authenticated HTTP requests using a Connect API Key. Find examples of using `httr` to perform authenticated HTTP requests here: https://docs.rstudio.com/connect/user/cookbook.html#cookbook-configuring-your-scripts
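As a sketch, an authenticated download from the Shiny app might look like the following, assuming the API key is supplied through a `CONNECT_API_KEY` environment variable and using a placeholder content URL:

```r
library(httr)

# Placeholder URL; Connect API keys are sent in an Authorization header.
resp <- GET(
  "https://connect.example.com/content/2352/data.csv",
  add_headers(Authorization = paste("Key", Sys.getenv("CONNECT_API_KEY")))
)
stop_for_status(resp)
data <- read.csv(text = content(resp, as = "text", encoding = "UTF-8"))
```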
Step 7: Update the Shiny application
Assuming that you aren’t using strict access controls, the update required to the Shiny application code might be quite minimal.
In my case, it was simply changing this:
```r
data <- read.csv('data.csv')
```
To this:
```r
data <- read.csv('https://colorado.rstudio.com/rsc/content/2352/data.csv')
```
While this solution is simple, there's still a chance that a long-running R process might not show the most recent data. It's likely a good idea to put a check in place, or use other Shiny tools like `reactiveFileReader` and `reactivePoll` to monitor for changes. Winston Chang created a nice gist example of that type of app workflow here: https://gist.github.com/wch/9652222
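A minimal `reactivePoll` sketch, assuming a placeholder data URL and using the `Last-Modified` response header as the change signal:

```r
library(shiny)
library(httr)

csv_url <- "https://connect.example.com/content/2352/data.csv"  # placeholder

server <- function(input, output, session) {
  data <- reactivePoll(
    intervalMillis = 60000,  # check once a minute
    session = session,
    # Cheap check: HEAD request for the Last-Modified header
    checkFunc = function() {
      lm <- headers(HEAD(csv_url))[["last-modified"]]
      if (is.null(lm)) Sys.time() else lm
    },
    # Expensive read: only runs when checkFunc's value changes
    valueFunc = function() read.csv(csv_url)
  )

  output$table <- renderTable(data())
}
```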
One more cool thing
Because output files are versioned along with the rendering of their report, they can be accessed from the History viewer tool.
Adding a data link (optional Step 4) is useful if you anticipate wanting to download those historical versions.
Conclusion: ETL on RStudio Connect made easy
I hope that learning about `rsc_output_files` will save you some time and energy going forward.
- Accessing output files over HTTP can be a good alternative to setting up a shared read/write persistent storage location on RStudio Connect.
- If output files aren’t a good solution for your use case, think about the benefits of using a database.
- Finally, if a database is still out of the question, be sure to read this article before moving to a persistent storage solution on Connect.
Resources: