My Fully Remote Research Workflow in VS Code

Prasanna Parasurama
7 min read · Aug 14, 2020

Since I started my Ph.D., I have been experimenting with various workflows, software, and tools for my research. After 2 years, I’ve (finally?) settled on a workflow that I’m happy with and have been using for the past few months. I’m writing this post to share my workflow, which hopefully serves as a helpful guide for the curious.

My Research Requirements

My research is data-heavy (sometimes using TBs of data that won’t fit on my laptop). So, at a minimum, my setup should support the following well:

  • Python — For data cleaning, wrangling, simulations, etc. This is my default programming language.
  • Spark — For big data needs
  • R — For statistical analyses
  • LaTeX — For writing manuscripts
  • Zotero — For research and reference management
  • A unified setup — i.e. one IDE for everything
  • A remote setup — i.e. everything on a remote virtual machine

Why a Unified Setup?

It’s easier and more efficient to work in a single environment than in several. For example, prior to my current setup, I used PyCharm for Python, RStudio Server for R, and Sublime Text (with LaTeXTools) for LaTeX. Even though each of them handled its own job really well, learning shortcuts for 3 separate environments became painful.

Why a Remote Setup?

By “remote setup”, I mean a setup where all the compute and storage happen on a remote virtual machine (the server), and my laptop (the client) is simply an interface to interact with the VM. The reasons I want a remote setup are:

  • No memory, storage, compute constraints — With a remote setup, I’m not constrained by my Macbook Pro’s measly 8GB of RAM. If I need more memory or compute on the VM, it’s super simple to add.
  • Safer — I don’t have to worry about my laptop dying and all of my work being lost. Swapping out the client should be nearly costless.
  • Data Backups are easier — I think it’s easier to do data backups on a remote machine on a continuous basis (I talk about this more towards the end of the post).

You may be wondering why I don’t just use VNC. The answer is that VNC does not feel native. I still like to use other apps (browsing, PowerPoint) and features on my MacBook (trackpad, various customizations, etc.). VNC takes away from the “unified” spirit of the setup.

Although the following guide describes a remote setup, it doesn’t have to be remote — all the benefits of the unified setup remain.

My Setup

Below I describe my setup and workflow for research, coding, and writing manuscripts. My laptop is a 4-year-old MacBook Pro, and my server is a VM running CentOS, provisioned by the university’s computing center. I assume some familiarity with Linux in the following guide.

Coding and Writing Manuscripts (VS Code)

VS Code (or Visual Studio Code) is the centerpiece of my research workflow for all coding and writing needs. It’s everything its older estranged sibling Visual Studio is not: free, open-source, fast, and lightweight, with tons of extensions and great community support.

Remote Development in VS Code

Download VS Code and the Remote Development extension pack, which allows for full remote development: all the files are stored, edited, and run on the remote server.

Follow the instructions here to set up a new remote project.

Python Extensions

We need 2 extensions for Python integration: the Python extension pack and the recently released Pylance extension, which adds additional Python language features. Installing an extension is super simple, as described here.

VS Code also has a Python interactive mode, which combines the best of Jupyter notebooks with an IDE.
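As a rough sketch of what interactive mode looks like in practice: in a regular .py file, the Python extension treats each "# %%" comment as the start of a cell that can be run on its own, with output shown in the interactive window. The toy data and variable names below are made up purely for illustration.

```python
# %% Load data (each "# %%" comment starts a new cell)
import statistics

# Hypothetical toy data standing in for a real dataset
wages = [52.0, 61.5, 48.0, 75.25, 66.0]

# %% Summarize (this cell can be re-run on its own after editing)
mean_wage = statistics.mean(wages)
median_wage = statistics.median(wages)
print(f"mean={mean_wage:.2f}, median={median_wage:.2f}")
```

Because the file is still plain Python, it runs unchanged as a script on the server and diffs cleanly in git, which is the main thing I would miss in a notebook.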

R Extensions

We need the VSCode-R and R-LSP extensions for VS Code, as well as the R languageserver package, which we can install from R:

install.packages("languageserver")

See here for more detailed instructions and customization.

Make sure to enable the Session Watcher feature in the VSCode-R extension for viewing plots, dataframes, etc.

LaTeX Extension

I use the LaTeX Workshop extension, which provides the full set of features one would expect in a LaTeX IDE (auto-complete, snippets, shortcuts, SyncTeX, etc.).

Other Useful Extensions

Research and Reference Management (Zotero)

I use Zotero for research and reference management. A few extensions make Zotero considerably more useful.

Zotero Connector

Zotero Connector is a Chrome extension (also available for other browsers) that lets us add research articles, reports, news, etc. directly from a web page to the library. Zotero will take care of filling in all the necessary bibliographical information automatically (in my experience this works about 95% of the time, which is much better than the alternative, Mendeley).

Better BibTeX for Zotero

Zotero can export a bibliography file, which can then be used for in-text citation in LaTeX. A pain point, however, is that every time a new item is added to the Zotero library (via the Zotero Connector, for example), we have to re-export the bibliography file. Better BibTeX for Zotero solves this problem by auto-exporting libraries when they change. This makes it incredibly easy to cite as we write.

Zotero VS Code Extension

If LaTeX is not your style, there is also a general VS Code Extension for Zotero.

SSHFS for Mounting Remote Drives

Often it’s helpful to have direct access to files on the remote machine in order to view or edit them outside of VS Code (for example, editing files in PowerPoint, Word, etc.). We can use OSX Fuse together with SSHFS to mount any remote folder directly on the laptop.

Download OSX Fuse and SSHFS from here and install them on your laptop. Then, run this command to mount the desired directory:

sshfs yourusername@your.remote.host:path/to/directory/on/remote /path/to/directory/on/local -o reconnect,idmap=user,allow_other,default_permissions

Version Control and Data Backups

I use Git (good starter tutorial) for version control of all my code and manuscripts. VS Code provides excellent support for Git, which is particularly useful on large collaborative projects.

One point of departure for version control in research from traditional software development is that it is good practice to “version control” large data files as well. There are options like Git LFS for version controlling large files, but I’m not a big fan for many reasons that I won’t go into in this post. Instead, I use rclone to create a full backup of the entire project at every major milestone (submissions, major revisions, etc.).

Rclone is a command-line program to sync remote files to cloud storage (Google Drive, Dropbox, S3, etc.). See here for instructions on how to install it on the remote server.

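I use Google Drive as my cloud storage (since I get “unlimited” storage from the university) and use rclone to copy the entire project folder for every major milestone. I also have a “current” backup that I sync with my project every so often. Note that both commands skip files that are already identical at the destination; the difference is that “sync” makes the destination match the source, deleting destination files that no longer exist in the project, whereas “copy” only adds or updates files and never deletes anything.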
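Here is a minimal sketch of how the two kinds of backup could be scripted. It is not my exact setup: the remote name "gdrive" (whatever name was chosen in rclone config), the folder layout under "backups/", and the project path are all placeholders. The helpers only build and print the rclone commands, so nothing is transferred until a command is actually run on the server.

```python
import shlex
from datetime import date
from typing import List

# Placeholder values (assumptions for illustration, not a real configuration)
REMOTE = "gdrive"                 # remote name created with `rclone config`
PROJECT_DIR = "/path/to/project"  # project folder on the remote server


def milestone_backup_cmd(milestone: str, today: date) -> List[str]:
    """Snapshot into a new dated folder; `rclone copy` never deletes at the destination."""
    dest = f"{REMOTE}:backups/{today.isoformat()}-{milestone}"
    return ["rclone", "copy", PROJECT_DIR, dest]


def current_backup_cmd(dry_run: bool = True) -> List[str]:
    """Rolling mirror; `rclone sync` makes the destination match the source, deletions included."""
    cmd = ["rclone", "sync", PROJECT_DIR, f"{REMOTE}:backups/current"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without changing anything
    return cmd


# Print the commands rather than executing them.
# To execute one for real: subprocess.run(cmd, check=True)
print(shlex.join(milestone_backup_cmd("submission", date(2020, 8, 14))))
print(shlex.join(current_backup_cmd()))
```

Since sync can delete files at the destination, defaulting to a dry run and reading its report before the real run seems like a sensible habit.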

Some Alternatives

RStudio Server

For those who prefer working in RStudio for R, RStudio Server is a good alternative that still preserves a remote setup. RStudio Server runs on the virtual machine (with all the benefits of remote development), and we use our laptop to access the interface via a web browser (it looks identical to desktop RStudio). See the instructions here for installation. Setting it up does require some familiarity with Linux and port forwarding.

Once installed, start RStudio Server on the remote machine (by default it is served on port 8787). To reach it from our laptop we need to forward this port, which we can do by adding the following LocalForward option to the ssh config (on macOS it’s in ~/.ssh/config). With this entry in place, and while an ssh session to yourVM is open, pointing the browser to localhost:1234 will bring up RStudio Server.

Host yourVM
HostName your.remote.host
User yourusername
ForwardAgent yes
ForwardX11Trusted yes
LocalForward 1234 localhost:8787

About Me: I’m a PhD student at NYU.
Website: parasurama.github.io | Twitter: prasanna_parasu
