R Markdown: My Favorite Tool for Conducting and Sharing Analysis
Have you ever inherited some data analysis work from a colleague, only to immediately get lost in a mess of disjoint scripts, spreadsheets, notes, and slides? If you’re anything like me, you’ve probably been confused about which script to run first, which data files correspond to which graphs, or (in particularly dire situations) why you can’t reproduce your coworker’s results.
I’ve been in this unenviable position before (and, admittedly, have left others in that situation too — sorry!). But having experienced this particular conundrum a few times, I’ve come to learn the importance of clear documentation, which makes transitioning work from one analyst to another much, much easier.
But even though most data scientists would agree that comprehensive documentation is important, poorly documented analysis is still common within data science teams. Why? Two reasons come to mind.
Time: We often have limited time in which to do our work. Documenting our work usually takes a back seat to actually finishing it. We often plan to revisit our analysis later to clean it up, but this doesn’t always happen.
Lack of Standard: There isn’t a widely observed standard for documenting analysis work. Should we just comment our code well? Put instructions at the top of each file? Make a README and push everything to GitHub?
No perfect solution exists, as different projects may have different requirements (e.g., differing levels of complexity, team sizes, etc.). However, one tool has significantly improved my ability to document my work and share it with others.
Enter R Markdown. (Python users, don’t stop reading here. R Markdown supports Python!)
R Markdown is an open-source tool that combines code, documentation, and output into a single file. It allows you to connect text, code, graphs, and more, all in one interactive report that can be run and re-run with ease.
While commonly used within the RStudio IDE, R Markdown supports multiple languages, including R, Python, SQL, C++, Julia, and more. You can “knit” (render) an R Markdown report to a variety of document formats (including HTML, PDF, MS Word, MS PowerPoint, and more) for easy sharing. (For concrete examples of R Markdown documents, browse the R Markdown Gallery.)
R Markdown documents are reproducible by nature, so you can feel confident that your coworkers won’t get weird results when they re-run your analysis (assuming, of course, that they use the same data and package versions, and that your code doesn’t introduce any random numbers without seeds).
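To make the caveat about seeds concrete, here is a minimal sketch in base R. The seed value and variable names are arbitrary choices for illustration.

```r
# Fixing the seed makes "random" results identical on every knit.
set.seed(42)              # 42 is an arbitrary choice
first_draw <- rnorm(5)

set.seed(42)              # resetting the seed replays the same sequence
second_draw <- rnorm(5)

identical(first_draw, second_draw)

# Printing the session details at the end of a report records the R version
# and loaded packages, which helps a colleague match your environment.
sessionInfo()
```

Without the `set.seed()` calls, the two draws would almost certainly differ, and so would any result built on them.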
Here are some of the key benefits that R Markdown offers me:
- Within R Markdown, I can document my work as I write my code instead of creating separate documentation later. And by “document,” I don’t just mean adding comments to my code — R Markdown reports can include formatted text, tables, bulleted/numbered lists, equations, and images. This is a big time-saver, because documenting my work after the fact often requires a lot of time and mental horsepower — to document effectively, I have to remember the details of and reasons behind each choice I made in my analysis.
- Re-running my work is a breeze with R Markdown. If I receive some new data and want to update my results, I can simply re-knit my document, and my output updates accordingly. This is much faster and more consistent than having to re-run chunks of code, render and save graphs and tables, and then manually update separate PowerPoint slides.
- All of my work is contained in one file with R Markdown.
- Since I document as I go, it’s easier to keep my work organized. While doing my work, I think about the analysis and how it’ll look when it’s shared with others.
- The document runs from top to bottom when it “knits,” so I’m forced to keep my code in a logical order.
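The re-knit step mentioned above doesn’t have to go through the Knit button. As a rough sketch, the same thing can be done from the console or a script with `rmarkdown::render()`. The file name here is a placeholder, and the example assumes the rmarkdown package is installed.

```r
# Hypothetical path; point this at your own .Rmd file.
rmd_file <- "analysis.Rmd"

if (file.exists(rmd_file) && requireNamespace("rmarkdown", quietly = TRUE)) {
  # Runs every chunk from top to bottom and rewrites the output file.
  output_path <- rmarkdown::render(rmd_file)
  message("Report written to ", output_path)
} else {
  message("Nothing rendered: ", rmd_file, " not found or rmarkdown not installed.")
}
```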
Flexible Output Formats
- R Markdown lets me produce documents suitable for all readers, both technical and non-technical alike. I can show or hide any given code chunk in the report depending on my intended audience.
- I can include a hyperlinked table of contents and/or tabbed pages in HTML output files for quick navigation.
- I can create and reuse template documents with custom formatting. For example, I often start a new document from a template that automatically includes my company’s logo and fonts. The template saves me time and gives my reports a sleek look.
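Several of the options above map onto arguments of `rmarkdown::html_document()`. The sketch below shows one possible combination, not a recommended default. It assumes the rmarkdown package and pandoc are available, and the file name in the final comment is a placeholder.

```r
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  html_format <- rmarkdown::html_document(
    toc = TRUE,            # add a hyperlinked table of contents
    toc_float = TRUE,      # keep it visible at the side while scrolling
    code_folding = "hide"  # collapse code by default; readers can expand it
  )

  # Hide code in the rendered report entirely, e.g. for a non-technical audience.
  # Inside a document this line would normally sit in a setup chunk.
  knitr::opts_chunk$set(echo = FALSE)

  # To apply the format: rmarkdown::render("analysis.Rmd", output_format = html_format)
}
```

The same settings can instead be written in the YAML header at the top of the document. Tabbed sections are set in the document body by adding `{.tabset}` to a heading.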
As with anything, there are some downsides to using R Markdown. In my opinion, the benefits far outweigh the drawbacks, but the negatives I can think of are as follows:
- Though I think it’s relatively easy to use, there is a slight learning curve for beginners.
- It is tricky (but possible) to use R Markdown outside of the RStudio IDE. RStudio has many built-in features that help you run and knit your documents, so you’d miss out on a lot by using another IDE. People who are much more comfortable in other IDEs may not like being forced to use RStudio.
When to Use R Markdown
I typically use R Markdown for one-off analysis, like the initial exploratory phase when I first work with a new dataset. It’s a convenient way to learn more about the data with a full suite of tools at my disposal while simultaneously documenting my findings.
Routine reports are also well suited for R Markdown. I once created an R Markdown document that tested whether a given dataset would fit the assumptions of a certain inventory model. From then on, all I had to do was point the report to the new data and re-knit it with a single click.
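One way to set up this kind of routine report is with parameters, though that is not necessarily how the report described above was built. The sketch below writes a tiny parameterized document to a temporary folder and renders it against a new data file. The file names, the `data_path` parameter, and the data are all made up for illustration.

```r
# Chunk delimiter (three backticks), built with strrep() so this example nests cleanly.
fence <- strrep("`", 3)

report_lines <- c(
  "---",
  "title: \"Routine data check\"",
  "output: html_document",
  "params:",
  "  data_path: \"old_data.csv\"",   # default, overridden at render time
  "---",
  "",
  paste0(fence, "{r}"),
  "dat <- read.csv(params$data_path)",
  "summary(dat)",
  fence
)

# Write the document and some stand-in "new" data to a temporary folder.
work_dir <- tempfile("rmd_demo_")
dir.create(work_dir)
rmd_path <- file.path(work_dir, "routine_check.Rmd")
writeLines(report_lines, rmd_path)

new_data <- file.path(work_dir, "new_data.csv")
write.csv(data.frame(demand = c(12, 15, 9, 20)), new_data, row.names = FALSE)

# Re-knit against the new file without editing the document itself.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_path, params = list(data_path = new_data), quiet = TRUE)
}
```

Keeping the data path in `params` means the analysis code never changes between runs; only the value passed to `render()` does.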
If you’re working on a project related to app/software development, it likely doesn’t make sense to use R Markdown for your app’s infrastructure. But it can still be helpful for any one-off analyses you perform during development (perhaps to analyze app runtime under varying conditions, or a similar secondary task).
The information available on RStudio’s website can help you create your first R Markdown document in minutes. For help with more advanced features of R Markdown (e.g., knit parameters, interactive documents), I recommend the free online version of R Markdown: The Definitive Guide.
With so many benefits, R Markdown has become my standard tool for sharing analysis with others. It doesn’t mean I never touch Excel or PowerPoint, but it saves me lots of time, and its output has garnered nothing but praise from those who’ve seen my work.
If you’ve never used R Markdown, try it out for your next one-off project. Once you get comfortable with its features, it may very well become your new standard format for sharing analysis work.
Frequently Asked Questions
Which output format should I use?
Unless you have a specific reason otherwise, I recommend the standard HTML format. It supports more formatting options (collapsible code chunks, tabbed pages, floating tables of contents, etc.) than the other formats, and gives your output a polished look.
Also, you may face issues with page breaks when using some of the other output formats (like PDFs) — cleanly spreading your work across multiple pages can be a challenge.
Don’t Jupyter Notebooks do the same thing as R Markdown?
R Markdown and Jupyter notebooks have similar functionality. They both allow you to combine your code and documentation into a single file. However, there are some pros and cons for each tool.
- Formatting: With the multitude of formatting features built in to R Markdown, it’s the better choice for producing elegant, “client-friendly” reports. However, if your goal is just to document your code quickly so you can pass it off to a coworker, the extra formatting features may not be relevant to you.
- Git Tracking: R Markdown files are saved as plain text, so it’s easy to track their changes using Git. It’s comparatively difficult to track changes in Jupyter notebooks because they’re stored in JSON format.
- IDE Support: Most users run and edit R Markdown documents from RStudio, a full-fledged IDE. It includes many of the features that make IDEs so useful, including a variable explorer, a file browser, a package manager, auto-complete, linting, and more. While Jupyter notebooks can be integrated into an IDE (such as PyCharm), most users run them from their browser, which limits their functionality.
- Simpler UI: The user interface for Jupyter notebooks is a bit simpler than the user interface for R Markdown in RStudio. Beginners may find it easier to interact with Jupyter notebooks.
There may be other benefits unique to Jupyter notebooks of which I’m not aware. I’ve used Jupyter notebooks on a handful of occasions, but not nearly as many times as I’ve used R Markdown. However, when doing research for this article, I found many more posts from users describing the pros of R Markdown documents over Jupyter notebooks than vice versa.
How do I use Python or other languages in R Markdown?
The reticulate package allows you to use Python with R Markdown. Check out the documentation here.
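Inside an R Markdown document, Python code goes in a chunk whose header is `{python}` instead of `{r}`. R chunks can then read Python variables through the `py` object. As a small sketch of that bridge from the R side, assuming reticulate and a Python installation are available (the variable names are arbitrary):

```r
if (requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::py_available(initialize = TRUE)) {
  # Run a line of Python, then read the result back from R through `py`.
  reticulate::py_run_string("total = sum(range(5))")
  py_total <- reticulate::py$total
  print(py_total)
}
```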
For information regarding how to use other languages besides R or Python, this guide should be able to help you out.