Jupyter And R Markdown: Notebooks With R

Karlijn Willems
Published in DataCamp
Dec 1, 2016

Originally published at https://www.datacamp.com/community/blog/jupyter-notebook-r

When you want to get started working on data science problems, you might first want to consider setting up an interactive environment in which to work and to share your project's code with others.

Or let’s say you just want to communicate about the workflow and your analysis’ results.

You can probably see where this is going: notebooks are perfect for both situations. In both cases, you want to combine plain text with rich elements such as graphics, calculations, etc.

Today’s blog post will focus on the two notebooks that are popular with R users, namely, the Jupyter Notebook and the R Markdown Notebook. You’ll discover how to use these notebooks, how they compare to one another and what other alternatives exist.

R And The Jupyter Notebook

Contrary to what you might think, Jupyter doesn’t limit you to working solely with Python: the notebook application is language agnostic, which means that you can also work with other languages.

There are two general ways to get started on using R with Jupyter: by using a kernel or by setting up an R environment that has all the essential tools to get started on doing data science.

Running R in Jupyter With The R Kernel

As described above, the first way to run R is by using a kernel. If you want to have a complete list of all the available kernels in Jupyter, go here.

To work with R, you’ll need to install the IRkernel and register it with Jupyter before you can use R in the notebook environment.

First, you’ll need to install some packages. Make sure that you don’t do this in your RStudio console, but in a regular R terminal, otherwise you’ll get an error like this:

Error in IRkernel::installspec() : Jupyter or IPython 3.0 has to be installed but could neither run “jupyter” nor “ipython”, “ipython2” or “ipython3”. (Note that “ipython2” is just IPython for Python 2, but still may be IPython 3.0)

$ R
> install.packages(c('repr', 'IRdisplay', 'evaluate', 'crayon', 'pbdZMQ', 'devtools', 'uuid', 'digest'))

This command will prompt you to type in a number to select a CRAN mirror to install the necessary packages. Enter a number and the installation will continue.

> devtools::install_github('IRkernel/IRkernel')

Then, you still need to make the R kernel visible for Jupyter:

# Install IRkernel for the current user
> IRkernel::installspec()

# Or install IRkernel system-wide
> IRkernel::installspec(user = FALSE)
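The error quoted earlier comes down to R not being able to run a jupyter (or ipython) executable from the session you are in. A minimal sketch of a check you could run first, in the same R session you plan to use, might look like this. The helper name jupyter_on_path is made up for this illustration and is not part of IRkernel, and it only looks for jupyter, not for the ipython variants.

```r
# Hypothetical helper (not part of IRkernel): report whether this R session
# can find a `jupyter` executable on its PATH.
jupyter_on_path <- function() {
  # Sys.which() returns "" for commands it cannot locate
  nzchar(unname(Sys.which("jupyter")))
}

if (jupyter_on_path()) {
  message("jupyter found at: ", Sys.which("jupyter"))
} else {
  message("jupyter not found from this session; try a regular R terminal")
}
```

If the check fails in RStudio but succeeds in a terminal, that is a hint that the two sessions see different search paths.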

Now open up the notebook application with jupyter notebook. You'll see R appearing in the list of kernels when you create a new notebook.

Using An R Essentials Environment In Jupyter

The second option to quickly work with R is to install the R essentials in your current environment:

conda install -c r r-essentials

These “essentials” include the packages dplyr, shiny, ggplot2, tidyr, caret, and nnet. If you don’t want to install the essentials in your current environment, you can use the following command to create a new environment just for the R essentials:

conda create -n my-r-env -c r r-essentials

Now open up the notebook application to start working with R.

You might wonder what to do if you need additional packages for your data science project. After all, the essentials might be enough to get you started, but you might need other tools.

Well, you can either build a Conda R package by running, for example:

conda skeleton cran ldavis
conda build r-ldavis/

Or you can install the package from inside of R via install.packages() or devtools::install_github (to install packages from GitHub). You just have to make sure to add the new package to the correct R library used by Jupyter:

install.packages("LDAvis", lib = "/home/user/anaconda3/lib/R/library")
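If you are not sure which library the R behind your notebook uses, you can ask R itself from a notebook cell. The sketch below uses only base R. The helper name kernel_library is an assumption made for this example, and it assumes that the first entry of .libPaths() is the library you want to install into.

```r
# Hypothetical helper: return the first library directory of the running R
# session. Run it in a Jupyter cell to see the library that kernel uses.
kernel_library <- function() {
  .libPaths()[1]
}

lib_dir <- kernel_library()
print(lib_dir)

# file.access(..., mode = 2) returns 0 when the directory is writable
if (file.access(lib_dir, mode = 2) != 0) {
  message("No write access to ", lib_dir, "; installing there may fail")
}

# The path could then be passed on, for example:
# install.packages("<package>", lib = lib_dir)
```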

If you want to know more about kernels or about running R in a Docker environment, check out this page.

Adding Some R Magic To Jupyter

A huge advantage of working with notebooks is that they provide you with an interactive environment. That interactivity comes mainly from the so-called “magic commands”.

These commands allow you to switch from Python to command line instructions or to write code in another language such as R, Julia, Scala, …

If you want more details about magic commands, on how to set up a notebook, where to download the application, how you can run the notebook application (via Docker, pip install or with the Anaconda distribution) or other details, check out our Definitive Guide or the original article.

The R Notebook

Up until recently, Jupyter seems to have been a popular solution for R users, next to notebooks such as Apache Zeppelin or Beaker.

Also, other alternatives for reporting the results of data analyses, such as R Markdown, knitr or Sweave, have been hugely popular in the R community.

However, this might change with the recent release of the R or R Markdown Notebook by RStudio.

As you can see, the R Markdown Notebook comes out of a complex context, and it’s worth looking into the history of reproducible research in R to understand what drove its creation and development. Ultimately, you will also realize that this notebook is different from others.

If you want to know more about the history of R Notebooks, check out the original article.

R Markdown Versus Computational Notebooks

R Markdown is probably one of the most popular options in the R community to report on data analyses. It’s no surprise whatsoever that it is still a core component in the R Markdown Notebook.

R Markdown and notebooks do share some things: both deliver a reproducible workflow, weave code, output, and text together in a single document, support interactive widgets, and output to multiple formats. However, they differ in their emphases: R Markdown focuses on reproducible batch execution, plain text representation, version control, and production output, and it offers the same editor and tools that you use for R scripts.

On the other hand, the traditional computational notebooks focus on showing output inline with code, caching the output across sessions, and sharing code and output in a single file. Notebooks have an emphasis on an interactive execution model. They don’t use a plain text representation, but a structured data representation, such as JSON.

That all explains the purpose of RStudio’s notebook application: it combines all the advantages of R Markdown with the good things that computational notebooks have to offer.

That’s why R Markdown is a core component of the R Markdown Notebook: RStudio defines its notebook as “an R Markdown document with chunks that can be executed independently and interactively, with output visible immediately beneath the input”.

How To Work With R Notebooks

If you’ve ever worked with Jupyter or any other computational notebook, you’ll see that the workflow is very similar. One thing that might seem very different is the fact that now you’re not working with code cells anymore by default: you’re rather working with a sort of text editor in which you indicate your code chunks with R Markdown.

How To Install And Use The R Markdown Notebook

The first requirement to use the notebook is that you have the newest version of RStudio available on your PC. Since notebooks are a new feature of RStudio, they are only available in version 1.0 or higher of RStudio. So, it’s important to check if you have a correct version installed.

If you don’t have version 1.0 or higher of RStudio, you can download the latest version here.

Then, to make a new notebook, go to the File tab, select “New File”, and you’ll see the option to create a new R Markdown Notebook. If RStudio prompts you to update some packages, just accept the offer and eventually a new file will appear.

Tip: double-check whether you’re working with a notebook by looking at the top of your document. The output should be html_notebook.
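To see what that header looks like outside of RStudio, you could write a bare-bones notebook file by hand. The sketch below does this from R with base functions only. The title, file name, and chunk contents are placeholders chosen for this example, not something RStudio generates.

```r
# Write a minimal R Notebook source file to a temporary location.
# The title and chunk contents are placeholders for this example.
fence <- strrep("`", 3)  # three backticks open and close a code chunk

notebook_lines <- c(
  "---",
  'title: "My First Notebook"',
  "output: html_notebook",
  "---",
  "",
  "Some text written in R Markdown.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

notebook_path <- file.path(tempdir(), "first-notebook.Rmd")
writeLines(notebook_lines, notebook_path)

# The header (the lines between the two --- markers) names the output format
header <- readLines(notebook_path, n = 4)
print(header)
```

Opening the resulting .Rmd file in RStudio should show the same structure as a freshly created notebook: a header, some text, and a code chunk.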

You’ll see that the default text that appears in the document is in R Markdown. R Markdown should feel pretty familiar to you, but if you’re not yet quite proficient, you can always check out our Reporting With R Markdown course or go through the material that is provided by RStudio.

Starting a new R Notebook in RStudio

Note that you can always use the gear icon to adjust the notebook’s working space: you have the option to expand, collapse, and remove the output of your code, to change the preview options and to modify the output options.

This last option can come in handy if you want to change the syntax highlighting, apply another theme, adjust the default width and height of the figures appearing in your output, etc.

From there onwards, you can start inserting code chunks and text!

You can add code chunks in two ways: through the keyboard shortcut Ctrl + Alt + I or Cmd + Option + I, or with the insert button that you find in the toolbar.

What’s great about working with these R Markdown notebooks is the fact that you can follow up on the execution of your code chunks, thanks to the little green bar that appears on the left when you’re executing large code chunks or multiple code chunks at once. Also, note that there’s a progress bar on the bottom.

You can see the green progress bar appearing in the gif below:

Running an R Notebook in RStudio

Talking about code execution: there are multiple ways in which you can execute your R code chunks.

You can run a code chunk or run the next chunk, run all code chunks below and above; but you can also choose to restart R and run all chunks or to restart and to clear the output.

Note that when you execute the notebook’s code, you will also see the output appearing on your console! That might be a rather big difference for those who usually work with other computational notebooks such as Jupyter.

If an error occurs while the notebook’s code chunks are being executed, the execution will stop, and a red bar will appear alongside the piece of code that produced the error.

You can keep the execution from halting by adding error = TRUE to the chunk options, just like this:

```{r, error=TRUE}
iris <- read.csv(url("http://mlr.cs.umass.edu/ml/machine-leaning-databases/"), header = FALSE)
names(iris) <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species")
```

Note that the error will still appear, but that the notebook’s code execution won’t be halted!
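The effect of error = TRUE can be loosely compared to catching an error in plain R: the message is kept, but the code after it still runs. The following is only an analogy under that assumption, not how the notebook implements the option, and run_step is a made-up helper.

```r
# Hypothetical helper: evaluate a function and report an error instead of
# stopping the whole script.
run_step <- function(step) {
  tryCatch(
    step(),
    error = function(e) {
      message("Error: ", conditionMessage(e))
      NULL  # returned in place of the failed result
    }
  )
}

# The first step fails, yet the second step still runs afterwards
failed <- run_step(function() stop("could not read the data"))
total  <- run_step(function() sum(1:10))
print(total)
```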

How To Use R Markdown Notebook’s Magic

Just like with Jupyter, you can also work interactively with your R Markdown notebooks. It works a bit differently from Jupyter, as there are no real magic commands; to work with other languages, you need to add separate Bash, Stan, Python, SQL or Rcpp chunks to the notebook.

These options might seem quite limited, but that is compensated by the ease with which you can add these types of code chunks with the toolbar’s insert button.

Working with these code chunks is also easy: you can see an example of SQL chunks in this document, published by J.J. Allaire. For Bash commands, you just type the command. There’s no need for extra characters such as ‘!’ to signal that you're working in Bash, as there would be in Jupyter.

How To Output Your R Markdown Notebooks

Before you render the final version of a notebook, you might want to preview what you have been doing. There’s a handy feature that allows you to do this: you’ll find it in your toolbar.

Click on the “preview” button and the provisional version of your document will pop up on the right-hand side, in the “Viewer” tab.

By adding some lines to the first section on top of the notebook, you can adjust your output options, like this:

---
title: "Notebook with KNN Example"
output:
  pdf_document:
    highlight: tango
    toc: yes
  html_notebook:
    toc: yes
---

Note that rendering to PDF requires a LaTeX distribution. To see where you can get one, you can just try to knit, and the console output will point you to the sites where you can download what is missing.

Note that this is just one of the many options that you have to export a notebook: there’s also the possibility to render GitHub documents, Word documents, Beamer presentations, etc. These are the output options that you already had with regular R Markdown files. You can find more info here.
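Knitting can also be started from the R console instead of the toolbar. Here is a minimal sketch, assuming the rmarkdown package is installed and Pandoc is available (plus a LaTeX distribution for PDF output). The wrapper name render_as and the file name in the usage comment are hypothetical; rmarkdown::render() is the package's own rendering function.

```r
# Hypothetical wrapper around rmarkdown::render(): render an .Rmd file to the
# requested output format, or skip quietly if rmarkdown is not installed.
render_as <- function(rmd_path, format = "html_notebook") {
  stopifnot(file.exists(rmd_path))
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    message("rmarkdown is not installed; skipping render")
    return(invisible(NULL))
  }
  # Returns the path of the rendered file
  rmarkdown::render(rmd_path, output_format = format, quiet = TRUE)
}

# Usage (file name is a placeholder):
# render_as("my-notebook.Rmd", format = "pdf_document")
```

Passing the format as an argument means the same file can be rendered to several outputs without editing its header.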

Tips And Tricks To Work With R Notebook

Besides the general coding practices that you should keep in mind, such as documenting your code, grouping related code, and applying a consistent naming scheme with sensible name lengths, you can also use the following tips to make a notebook awesome for others to use and read.

  • Just like with computational notebooks, it might be handy to split large code chunks or code chunks that generate more than one output into multiple chunks. This way, you will improve the general user experience and increase the transparency of a notebook.
  • Make use of the keyboard shortcuts to speed up your work. You will find most of them in the toolbar, next to the commands that you want to perform.
  • Use the spellchecker in the toolbar to make sure your report’s vocabulary is correct.
  • Take advantage of the option to hide your code if a notebook is code-heavy. You can do this through code chunk options or in the HTML file of the notebook itself!

The R Notebook Versus The Jupyter Notebook

Besides the differences between the Jupyter and R Markdown notebooks that you have already read about above, there are a few more that are worth looking at.

Let’s compare Jupyter with the R Markdown Notebook!

There are four aspects that you will find interesting to consider: notebook sharing, code execution, version control, and project management.

Notebook Sharing

The source code for an R Markdown notebook is an .Rmd file. But when you save a notebook, an .nb.html file is created alongside it. This HTML file is an associated file that includes a copy of the R Markdown source code and the generated output.

That means that you need no special viewer to see the file, while you might need it to view notebooks that were made with the Jupyter application, which are simple JSON documents, or other computational notebooks that have structured format outputs. You can publish your R Markdown notebook on any web server, GitHub or as an email attachment.
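The pairing of source file and HTML file is purely a naming convention, so it is easy to work with from R. As a small sketch, the helper below derives the companion file's name from the source file's name. Both companion_html and the file name analysis.Rmd are made up for this example.

```r
# Hypothetical helper: derive the name of the .nb.html companion file
# that sits alongside a notebook's .Rmd source.
companion_html <- function(rmd_path) {
  sub("\\.Rmd$", ".nb.html", rmd_path, ignore.case = TRUE)
}

rmd_file  <- "analysis.Rmd"   # placeholder file name
html_file <- companion_html(rmd_file)
print(html_file)

# The companion file only exists after the notebook has been saved in RStudio
if (!file.exists(html_file)) {
  message(html_file, " not found; it is created when the notebook is saved")
}
```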

There also are APIs to render and parse R Markdown notebooks: this gives other frontend tools the ability to create notebook authoring modes for R Markdown. Or the APIs can be used to create conversion utilities to and from different notebook formats.

To share the notebooks you make in the Jupyter application, you can export the notebooks as slideshows, blogs, dashboards, etc. You can find more information in this tutorial. However, there are also the default options to generate Python scripts, HTML files, Markdown files, PDF files or reStructuredText files.

Code Execution

R Markdown Notebooks have options to run a code chunk or run the next chunk, and to run all code chunks below and above. In addition to these options, you can also choose to restart R and run all chunks or to restart and clear the output.

These options are interesting when you’re working with R because the R Markdown Notebook allows all R code pieces to share the same environment. However, this can prove to be a huge disadvantage if you’re working with non-R code pieces, as these don’t share environments.

All in all, these code execution options add a considerable amount of flexibility for users who have been struggling with the code execution options that Jupyter offers, even though these are not all that different: in the Jupyter application, you have the option to run a single cell, to run cells and to run all cells. You can also choose to clear the current or all outputs. The code environment is shared between code cells.

If you want to know more about version control and project management options in Jupyter and R Notebook, go to the original article.

Alternatives to Jupyter or R Markdown Notebooks

Apart from the notebooks that you can use as interactive data science environments which make it easy for you to share your code with colleagues, peers, and friends, there are also other alternatives to consider.

Because sometimes you don’t need a notebook, but a dashboard, an interactive learning platform or a book, for example.

You have already read about options such as Sweave and knitr in the second section. Some other options that are out there are:

  • Even though this blog post has covered R Markdown to some extent, you should know that you can do so much more with it. For example, you can build dashboards with flexdashboard.
  • Or you can use Bookdown to quickly publish HTML, PDF, ePub, and Kindle books with R Markdown.
  • Shiny is a tool that you can also use to create dashboards. To get started with Shiny, go to this page.
  • In an educational setting, DataCamp Light might also come in handy to create interactive tutorials on your blog or website. If you want to see DataCamp Light at work, go to this tutorial, for example.
