rMarkdown tricks from the Kaggle trenches

Fran Pérez
Sep 4, 2017


rMarkdown is great because it helps to annotate the design process, blending the design’s documentation with development code and charts in a single file. Here are some tricks that improve my day-to-day work on my projects.

Save rMarkdown file environment

In my experimentation process, I often want to recover the exact model produced by the document (to keep working on it by hand, to compare it with another model, etc.). This is easy: just add a command at the end of the document that saves the R environment. This environment is the one generated by compiling the rMarkdown file, and it is not mixed with the “user” environment (usually, the R console).
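A minimal sketch of such a command, assuming a hypothetical file name `model_env.RData` and assuming the document is knitted in its own R session, so that the chunks run in that session's global environment:

```r
# Last chunk of the document.
# "model_env.RData" is a placeholder name; pick whatever suits your project.
# save.image() writes every object in the global environment to the file.
save.image(file = "model_env.RData")
```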

When I need to recover the document environment, I just need to type:
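For instance, assuming the environment was saved under the hypothetical name `model_env.RData` in the current working directory:

```r
# Run in the R console, from the folder where the document was compiled.
env_file <- "model_env.RData"  # placeholder name, must match the saved file

if (file.exists(env_file)) {
  # Restores every saved object into the current (console) environment.
  load(env_file)
} else {
  message("No saved environment found at: ", env_file)
}
```

Note that `load()` overwrites any console objects that share a name with the saved ones, so a fresh R session is the safest place to run it.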

Save rMarkdown file environment on error

The previous trick can be extended to the case where the document doesn’t compile (for any reason). When an unexpected error occurs (most of the time, accompanied by a very “unfriendly” error message), a “forced” way to debug is to save the environment the document produced up to the error, load this environment in the R console, and then manually type into the console the code of the chunk that provokes the error.

To save the document environment generated up to the point where the error is thrown, I add the following command at the beginning of the document:
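One way to sketch this is with R's `error` option, which registers a function to run when an uncaught error reaches the top level. The file name `error_env.RData` is hypothetical, and the sketch assumes the document is knitted in its own non-interactive R session, so that a failing chunk aborts the compilation and the error propagates to the top level:

```r
# First chunk of the document.
# "error_env.RData" is a placeholder name.
# When an uncaught error stops the compilation, dump the global
# environment as it was at that moment.
options(error = function() {
  save.image(file = "error_env.RData")
})
```

After a failed compilation, `load("error_env.RData")` in the console gives you the objects created before the error, ready for running the failing chunk by hand.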

Define your own “compilation flags”

One of the downsides of developing in rMarkdown is that the code tends to become one big, monolithic piece of spaghetti code. This takes some thought when you want to compile just a part of it: for example, when I need to run the document without a PCA optimization, or without the model training (because I only want to take a look at the data produced by the cleaning stage). To simulate these “compilation flags”, I define the flags as logical variables at the beginning of the document, and then I use them in the attributes of the chunks.
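A minimal sketch of the idea; the flag names `RUN_PCA` and `RUN_TRAIN` and the chunk labels are hypothetical. knitr evaluates chunk options as R expressions, so a logical variable can be used directly as an option value:

```r
# Setup chunk near the top of the document: one logical flag per stage.
RUN_PCA   <- FALSE  # skip the PCA optimization
RUN_TRAIN <- FALSE  # skip the model training

# Chunk headers further down then reference the flags, for example:
#   {r pca, eval=RUN_PCA, include=RUN_PCA}
#   {r train, eval=RUN_TRAIN, include=RUN_TRAIN}
```

Flipping a flag to `TRUE` at the top of the document brings the corresponding stage back on the next compilation.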

The eval attribute controls whether the chunk is executed, and the include attribute controls whether the chunk and its output appear in the document’s generated output.

Define CACHE chunks (and know when to dispose of them)

Compiling documents can become a tedious task, mainly because it takes much longer than expected. To reduce compilation time, I suggest turning on the cache attribute for the critical (slow) chunks. We can also declare cache dependencies between chunks by giving each chunk a name and referring to it (knitr’s dependson option), so in the end this mechanism can get quite complex.

The downside of this trick is that the cache sometimes gets stale or corrupted. When that happens, the document environment saved by the earlier trick may be incomplete (the file will only include what was produced by code updated since the last execution). For this reason, I recommend using this attribute at your own risk. And anytime you get a WTF error, just remove the cache and regenerate the rMarkdown document. To remove the cache, you can delete the folder whose name ends in _cache, located in your working folder. Another option is to click on the Knit dropdown and select Clear Knitr Cache…, or to execute the following command (replacing the folder name with the one used in your project):
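For example, assuming the document is called `my_document.Rmd`, so that its cache folder is `my_document_cache` (a placeholder name):

```r
# "my_document_cache" is a placeholder; use your own document's cache folder.
cache_dir <- "my_document_cache"

# Delete the folder and everything in it. If the folder does not exist,
# unlink() simply does nothing.
unlink(cache_dir, recursive = TRUE)
```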

If you want to dig further into rMarkdown options, I suggest taking a look at the knitr package help.
