Simple Tools for Creating Websites with Reproducible Research Capabilities

(with R packages and Rails)

By Goran Gruić, Founder of Top Floor Labs


Scientific claims made in publications are often very hard to verify. They are usually based on complex data analyses, yet the findings are presented in shorter or longer summaries without the underlying data and often with a simplified description of the computation. Presented that way, a claim looks weaker than it may actually be. If you can’t test a claim yourself, you are left to decide whether to trust the author or not. Trust is good, but not when we are talking about the scientific method.

Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. — Reproducible Research — Johns Hopkins University | Coursera

In practice, an author who wants to make his research available to others has to put it on the web. That means investing a non-trivial amount of time and effort: finding the best way to organize files, datasets, etc., and maybe even developing a website for that purpose. The reader has problems too. He has to download the data, code and text separately, and then try to understand what the author did with each piece of code and data. It would certainly be better to have everything (code, data, text) linked together in one easy-to-follow document. That approach is often called literate statistical programming; more about the concept here.

So, how do we write a reproducible literate statistical program or analysis? Roger Peng gave an interesting lecture on this here. He says that, for a start, we should use statistical software whose operation can be coded. In other words, we should be able to perform the analysis by writing sequences of instructions that manipulate the data. It is also good to save the data in non-proprietary formats (csv, json …). That way the data isn’t “locked” to one software vendor and is available to everybody.
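As a small illustration of the advice about non-proprietary formats, here is a sketch in Ruby, the language of the Rails part of this article. It writes the same toy dataset to both CSV and JSON using only Ruby's csv and json libraries. The column names and values are made up for the example.

```ruby
require "csv"
require "json"

# A toy dataset (hypothetical values), one hash per observation.
ROWS = [
  { "id" => 1, "group" => "A", "value" => 0.42 },
  { "id" => 2, "group" => "B", "value" => 1.37 }
].freeze

# Write the rows as CSV, using the hash keys as the header line.
def write_csv(path, rows)
  CSV.open(path, "w") do |csv|
    csv << rows.first.keys
    rows.each { |row| csv << row.values }
  end
end

# Write the same rows as a pretty-printed JSON array.
def write_json(path, rows)
  File.write(path, JSON.pretty_generate(rows))
end
```

Both files can be read back in R with read.csv() and jsonlite::fromJSON(), or with any other tool. Neither needs a vendor-specific reader.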

In this article, our statistical software is going to be R.

Many users think of R as a statistics system. We prefer to think of it as an environment within which statistical techniques are implemented. R can be extended (easily) via packages. There are about eight packages supplied with the R distribution and many more are available through the CRAN family of Internet sites covering a very wide range of modern statistics. — What is R

The idea here is to do the analysis in R and present it in a format that can easily be converted to HTML (the language of the web) with additional R packages.

That format is going to be R Markdown. You can see what such documents look like at that link.

R Markdown is an authoring format that enables easy creation of dynamic documents, presentations, and reports from R. It combines the core syntax of markdown (an easy-to-write plain text format) with embedded R code chunks that are run so their output can be included in the final document. R Markdown documents are fully reproducible (they can be automatically regenerated whenever underlying R code or data changes). — R markdown
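To make the structure concrete: an R Markdown file is plain text in which each code chunk opens with a line of three backticks followed by {r} (optionally with a label, as in {r block1}) and closes with a line of three backticks. The Ruby sketch below is a hypothetical helper that pulls the R code out of such a document. It is not part of rmarkdown; on the R side, knitr::purl() is the tool for extracting code. It also ignores cases knitr supports, such as inline R expressions and indented chunks.

```ruby
# Hypothetical helper: returns the body of every {r} code chunk in an
# R Markdown string, in document order. Not a full knitr parser.
# `{3} matches the three-backtick fence; in Ruby, ^ and $ anchor to lines.
R_CHUNK = /^`{3}\{r[^}]*\}[ \t]*\n(.*?)^`{3}[ \t]*$/m

def r_chunks(rmd)
  rmd.scan(R_CHUNK).map { |match| match.first.chomp }
end
```

The Rails application built later in this article could use a check like this, for example to reject a submitted script that contains no R chunks.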

Once we have a working R environment (a recent version of R) with a couple of packages from CRAN (install.packages("rmarkdown")), all you need is a text editor and some basic knowledge of Markdown and HTML.

Create an example R Markdown document called input.Rmd and save it in a folder:

Example rmarkdown Document
==================================
Here is some text.
Here is some code
```{r}
set.seed(1)
x <- rnorm(100)
mean(x)
```

Then run this in the R console:

library(rmarkdown)
# replace the path with the folder that contains input.Rmd
setwd("path/to/folder")
rmarkdown::render('input.Rmd', output_file = 'output.html')

Now, you should have your HTML document in the same folder.
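The same render step can also run non-interactively, which helps when regenerating documents from a script or from another program. The Ruby sketch below assumes that the Rscript executable shipped with R is on the PATH. The helper builds an argument list and passes it to Open3 without going through a shell. The chdir option plays the role of setwd(). The file names are the ones used above.

```ruby
require "open3"

# Build the argv for rendering an .Rmd file with Rscript (no shell involved).
# String#dump wraps each name in double quotes, which is adequate for
# ordinary file names.
def render_argv(input, output)
  expr = "rmarkdown::render(#{input.dump}, output_file = #{output.dump})"
  ["Rscript", "-e", expr]
end

# Run the render in the folder that contains the input file.
# Returns [success?, combined stdout/stderr]. Needs R and rmarkdown installed.
def render_rmd(dir, input = "input.Rmd", output = "output.html")
  out, status = Open3.capture2e(*render_argv(input, output), chdir: dir)
  [status.success?, out]
end
```

For example, ok, log = render_rmd("path/to/folder") should leave output.html next to input.Rmd. If the render fails, the R error is in log.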

Ok, it’s nice, but it would be nicer if we could include those HTML files in a proper web application. For that, we’ll need some kind of interface to R that the web application can call. I personally like OpenCPU very much; its website explains why.

OpenCPU is a system for embedded scientific computing and reproducible research. The OpenCPU server provides a reliable and interoperable HTTP API for data analysis based on R. You can either use the public servers or host your own. — https://www.opencpu.org/
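OpenCPU’s HTTP API maps R objects onto URL paths:

- The R functions of an installed package live under /ocpu/library/{package}/R/{function}. An HTTP POST to that URL calls the function.
- The static files of an app package are served under /ocpu/library/{package}/www/.
- Each call creates a temporary session, and the files it produces can be fetched under that session’s files/ subpath.

The Ruby helpers below only assemble those URLs. The base address assumes the local server on port 5307 that is set up further down. The package, function and file names are the ones used later in this article.

```ruby
require "uri"

# Base URL of the OpenCPU API. Assumption: a local single-user server on
# port 5307. The trailing slash matters for URI.join.
OCPU_BASE = "http://localhost:5307/ocpu/"

# URL to POST to in order to call an R function from an installed package.
def ocpu_function_url(package, function, base: OCPU_BASE)
  URI.join(base, "library/#{package}/R/#{function}").to_s
end

# URL of the static web app (www/ folder) shipped inside an R package.
def ocpu_www_url(package, base: OCPU_BASE)
  URI.join(base, "library/#{package}/www/").to_s
end

# URL of a file produced in a session, given the session location
# returned by OpenCPU (something like ".../ocpu/tmp/<key>/").
def ocpu_session_file_url(location, filename)
  URI.join(location, "files/#{filename}").to_s
end
```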

We’ll keep working in our R console for now and install OpenCPU from CRAN:

install.packages('opencpu')

After a successful installation, we’ll start the OpenCPU server with

library('opencpu')

And the result will look like this:

The OpenCPU server starts on a random port. We’ll see later why it is useful to have it always start on the same port, let’s say 5307, so we’ll do exactly that:

opencpu$stop()
opencpu$start(5307)

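We now have our OpenCPU server running on port 5307. (The opencpu$stop() and opencpu$start() calls belong to the OpenCPU 1.x package. In OpenCPU 2.x, loading the package no longer starts a server; you start the single-user server with ocpu_start_server(port = 5307) instead.)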

OpenCPU uses standard R packaging to develop, ship and deploy web applications. Now we’ll install an OpenCPU application (which is an R package). It is basically a fork of this repository with small changes.

# run install.packages('devtools') first if devtools isn't installed yet
library('devtools')
devtools::install_github("ggruic/markdownapp")

Check if it works by entering http://localhost:5307/ocpu/library/markdownapp/www/ in the browser.
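You can also script this check, for example to run it before starting the Rails app later on. A plain request with Net::HTTP from Ruby’s standard library is enough. The sketch below uses the URL from the step above and treats any 2xx answer as success.

```ruby
require "net/http"
require "uri"

# The markdownapp page served by the local OpenCPU server (port 5307).
MARKDOWNAPP_URL = "http://localhost:5307/ocpu/library/markdownapp/www/"

# True if a GET on url returns a 2xx response; false if the server is down,
# unreachable, or answers with an error status.
def page_ok?(url = MARKDOWNAPP_URL, timeout: 5)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, open_timeout: timeout, read_timeout: timeout) do |http|
    http.request(Net::HTTP::Get.new(uri)).is_a?(Net::HTTPSuccess)
  end
rescue SystemCallError, SocketError, IOError, Net::OpenTimeout, Net::ReadTimeout
  false
end
```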

Now we have a basic web application. If we want more than that, we can integrate this solution into a web development framework.

We’ll use the Ruby on Rails framework, which is well known for its convention-over-configuration approach that allows rapid prototyping.

Check that you have a working installation of Ruby and Rails (version 4.x, say). If you do, you should be able to create a web application in minutes:

rails new rr
cd rr
bundle
rails generate controller home index

Open the app/views/home/index.html.erb file in your text editor. Delete all of the existing code in the file, and replace it with the following lines:

<h1>Hello, this is Reproducible Research Demo Website!</h1>
<a href="/rscripts">RR scripts</a>

Then open config/routes.rb in your editor and add the following line just before the last “end”:

root 'home#index'

Now run the Rails application by entering

rails server

Then check that it’s available in your browser at http://localhost:3000

The next step is to add the opencpu gem to the Gemfile. Open the Gemfile and add this line at the end:

gem 'opencpu'

Bundle your gems again (you can open another command prompt/terminal for this):

bundle

Use a scaffold to generate the basic CRUD actions (and a lot more):

rails generate scaffold Rscript title:string code:text
rake db:migrate

Now you can click the “RR scripts” link on the homepage, and then the “New Rscript” link at http://localhost:3000/rscripts.

Insert the code from our example R Markdown script and click “Create Rscript”.

You’ll get something like this.

That’s not what we need. We need R Markdown translated into Markdown and then Markdown into HTML.

Modify the “show” action in the rscripts controller (app/controllers/rscripts_controller.rb) to look like this:

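def show
  # @rscript is already loaded by the scaffold's before_action :set_rscript
  @client = OpenCPU.client
  # call rmdtext() from the markdownapp R package with the stored R Markdown text;
  # @mark.location is then used in the view to reach the session's output files
  @mark = @client.prepare :markdownapp, 'rmdtext', data: { text: @rscript.code }
end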
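A call like this is, at bottom, an HTTP POST to the function’s URL. The sketch below does the same with Net::HTTP alone, which can help when debugging the integration. It is not necessarily how the opencpu gem implements prepare. It relies on two OpenCPU behaviours:

- it accepts JSON-encoded arguments;
- it replies with a Location header pointing at the new session.

The helper name ocpu_call is hypothetical. The package and function names come from the action above.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: call an R function on an OpenCPU server and return the
# session location (the URL under which its output files live). No retries.
def ocpu_call(base, package, function, args, timeout: 30)
  uri = URI.join(base, "library/#{package}/R/#{function}")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(args)
  response = Net::HTTP.start(uri.host, uri.port, read_timeout: timeout) do |http|
    http.request(request)
  end
  raise "OpenCPU error #{response.code}: #{response.body}" unless response.is_a?(Net::HTTPSuccess)
  # Resolve the Location header against the request URL in case it is relative.
  URI.join(uri, response["Location"]).to_s
end
```

For example, ocpu_call("http://localhost:5307/ocpu/", "markdownapp", "rmdtext", { text: code }) should return a session URL. Appending files/output.html to it gives the rendered page.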

We are “reusing” R code from installed “markdownapp” R package.

Create a new file named opencpu.rb in the config/initializers folder. It tells Rails the location (URL) of the OpenCPU server:

# 5307 is the fixed port we started the OpenCPU server on
OCPU_SERVER_LINK = 'localhost:5307/ocpu'
OpenCPU.configure do |config|
  config.endpoint_url = 'http://' + OCPU_SERVER_LINK
  config.timeout = 30 # Timeout in seconds
  config.verify_ssl = false
end
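Hard-coding the address is fine for a local demo. If the OpenCPU server may run somewhere else in other environments, one option is to read the address from an environment variable and fall back to the local default. The variable name OPENCPU_URL below is a made-up convention; the gem does not read it by itself. The trailing slash is dropped to match the form used above.

```ruby
# Fallback: the local single-user OpenCPU server on port 5307.
DEFAULT_OCPU_URL = "http://localhost:5307/ocpu"

# Resolve the OpenCPU endpoint. Uses the (hypothetical) OPENCPU_URL variable
# when it is set and non-blank, otherwise DEFAULT_OCPU_URL.
def ocpu_endpoint(env = ENV)
  url = env.fetch("OPENCPU_URL", "").strip
  url = DEFAULT_OCPU_URL if url.empty?
  url.chomp("/")
end
```

In the initializer, this would become config.endpoint_url = ocpu_endpoint. If you point it at a remote HTTPS server, you will probably also want to set verify_ssl back to true.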

Add these lines at the end of app/views/rscripts/show.html.erb:

<div id="rmarkdown_target"></div>
<script>
  // once the page has loaded, embed the HTML rendered in the OpenCPU session
  $( document ).ready(function() {
    $("#rmarkdown_target").html('<iframe src="<%= @mark.location + 'files/output.html' %>" style="top: 0px; right: 0px; width: 800px; height: 1200px; border: 0; margin: 0; z-index: 999999;"></iframe>');
  });
</script>
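The JavaScript above injects the iframe only once the document-ready event fires. An alternative is to emit the iframe directly from the ERB template. This also keeps working when Turbolinks (on by default in Rails 4) navigates between pages without firing document-ready. The sketch below is a hypothetical view helper; the module and method names are illustrative. It escapes the session URL with ERB::Util.html_escape. Inside Rails, the returned plain string would need .html_safe (or you could build the tag with content_tag), because Rails escapes plain strings in views.

```ruby
require "erb"

# Hypothetical helper for app/helpers/rscripts_helper.rb: builds an <iframe>
# pointing at the rendered HTML inside an OpenCPU session.
module RscriptsHelper
  def rmarkdown_iframe(location, file: "output.html", width: 800, height: 1200)
    src = ERB::Util.html_escape(location.to_s + "files/" + file)
    %(<iframe src="#{src}" style="width: #{width}px; height: #{height}px; border: 0;"></iframe>)
  end
end
```

In show.html.erb this would be used as <%= rmarkdown_iframe(@mark.location).html_safe %>.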

Restart your rails server, refresh http://localhost:3000/rscripts/1 and you should see HTML with text, code and results; something like this:

That’s it. Now, only your imagination is the limit.

For example, if you add this code chunk via “New Rscript”

```{r block1}
library("networkD3")
src <- c("A", "A", "A", "A", "B", "B", "C", "C", "D")
target <- c("B", "C", "D", "J", "E", "F", "G", "H", "I")
networkData <- data.frame(src, target)
simpleNetwork(networkData)
```

(of course, make sure you have the networkD3 and htmlwidgets packages installed in R), you will see this interactive network graph:

Wow.

Etc, etc …

What are the next steps if you want to go further with this application?

  • OpenCPU server performance & security. Two implementations of OpenCPU are available: a single-user server that runs inside an interactive R session, and a cloud server that builds on Apache and Nginx. The single-user server is intended for development and local use only (that’s the one we’ve installed). Check the OpenCPU Server Manual for any questions.
  • Making your website prettier (hint: Bootstrap), more secure (hint: Devise), more … check here for ideas.

The code for this article is available on GitHub: the first repository for the Rails application and the second for the R Markdown package.
