Neovim Setups for Data Science

Published in Geek Culture, May 19, 2021

This is my first blog post since graduating, so I am marking it with one of my favorite photos, taken on a trip to Kirishima-shi in 2019. “Life was like a box of chocolates, you never know what you’re gonna get.” That describes my past experience exactly. I was fascinated by economics and game theory as an undergraduate, but I decided to pursue a PhD in Marketing at a business school. While preparing for an academic career, I fell in love with micro-econometrics and machine learning, and I made up my mind to explore the interaction between business strategies and consumer behavior in industry rather than at a university. I also enjoy sharing, so in recent years I have discussed my experience and findings with younger students (I am young too!) at the university. I should have started writing earlier; now I finally have the chance.

This post introduces Neovim setups for data science.

Data scientists need programming. However, our focus differs from that of other developers (full-stack, front-end, backend engineers, etc.) because our workflows are different. For example, data scientists usually have two main tasks: data cleaning and preprocessing, and setting up pipelines for model training. We therefore need languages suited to these tasks, of which R and Python are the most common choices. Lower-level languages such as C++ and Fortran are sometimes also needed when speed is the real bottleneck. Nowadays Julia looks like a potential solution to juggling multiple languages, and I have experimented with several of its early versions. Julia may well establish its own, even dominant, position in data science in the future, but it is also quite likely to coexist with R and Python in the long run. Data scientists therefore need to work with several specific languages across projects, which is why editor setups tailored to data science are useful in addition to those written for general developers.

Why use Neovim as my primary editor?

First, (Neo)vim key bindings are a charming feature for programming, especially for efficiency. Admittedly, the nearly vertical learning curve of (Neo)vim is mostly due to its complicated key bindings. However, familiarity with the basics is enough to edit files efficiently without reaching for the mouse.

Second, a general-purpose editor like Neovim naturally supports multiple languages. Since daily work inevitably involves several languages, data scientists can ideally keep the same editing patterns (e.g. key bindings) throughout their projects. Moreover, Neovim can easily be extended with plugins to meet the programming needs of each language. Some IDEs, like RStudio and PyCharm, are designed for specific languages, and switching frequently among editors and their settings is inconvenient. Neovim therefore satisfies the demands of data scientists well.

Third, I prefer the lean, terminal-based Neovim to modern Electron-based editors, especially for speed and smoothness. To be fair, Electron-based editors such as Atom and VS Code are cross-platform and also support multiple languages. Their plugin communities are large and active, and their GUIs are usually polished and attractive. However, Electron-based apps tend to be slow, and latency can be noticeable. Others seem to feel similarly; see, for example, “How Vim killed Atom and VSCode on my Machine” and “Why I Still Use Vim.”

Neovim Setups for Reference

This section introduces the key plugins and their related setups. My complete setup files are on GitHub: https://github.com/AlfredSAM/medium_blogs/tree/main/Neovim_Setups_for_Data_Science/nvim. Thanks to Olivier Roques, whose configuration served as a reference, I had a basic template to build these files from.

A few preparation steps come first. First of all, the latest version of Neovim needs to be installed. Readers may notice that this post recommends Neovim rather than Vim. Neovim is a refined version of Vim and one of the most popular Vim variants. At the time of writing, the nightly development version of Neovim is 0.5, which brings relatively big changes and benefits, so the setups here target 0.5. One of the biggest changes in 0.5 is that Lua is built in as a configuration language, so my main file is now init.lua instead of the init.vim used in earlier versions. However, not all the plugins I use have been rewritten in Lua, so some setups are still in Vimscript. The basic logic is as follows:

- init.lua sources plugins.vim, which installs the plugins;
- it then sources plugins_setup.vim, which configures some of them in Vimscript;
- finally, it sets up everything else in Lua.

To begin, install Neovim 0.5. I use a Mac for both work and entertainment, and with Homebrew the latest version is easy to get:

$ brew install --HEAD neovim

Linux users usually install from source. For Windows users, WSL2 is recommended, and installing from source there should not be a problem. The setups introduced here can easily be ported to Linux/Unix systems, even though I only use a Mac. After installation, check the version, and you should see something like:

$ nvim --version
NVIM v0.5.0-dev+nightly-139-gd16e9d8ed
Build type: Release
LuaJIT 2.1.0-beta3

As mentioned above, plugins are needed, so a plugin manager has to be chosen. I still use the classic vim-plug. For Neovim, just enter the following in Terminal or iTerm2:

sh -c 'curl -fLo "${XDG_DATA_HOME:-$HOME/.local/share}"/nvim/site/autoload/plug.vim --create-dirs \
       https://raw.githubusercontent.com/junegunn/vim-plug/master/plug.vim'
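If the same nvim folder is later copied to a new machine or a server, this download step can also be automated from init.lua. The following is a minimal sketch, assuming curl is available on the PATH. It only fetches plug.vim when the file is missing, and it uses the same location as the command above, because stdpath('data') resolves to ${XDG_DATA_HOME:-$HOME/.local/share}/nvim.

```lua
-- Bootstrap vim-plug if plug.vim is missing (sketch; assumes curl is installed).
-- stdpath('data') is ${XDG_DATA_HOME:-~/.local/share}/nvim, so this path
-- matches the one used by the curl command above.
local plug_path = vim.fn.stdpath('data') .. '/site/autoload/plug.vim'

if vim.fn.filereadable(plug_path) == 0 then
  -- Passing a list to system() runs curl directly, without a shell
  vim.fn.system({
    'curl', '-fLo', plug_path, '--create-dirs',
    'https://raw.githubusercontent.com/junegunn/vim-plug/master/plug.vim',
  })
  -- vim.v.shell_error holds the exit code of the last system() call
  if vim.v.shell_error ~= 0 then
    vim.api.nvim_err_writeln('Failed to download vim-plug')
  end
end
```

This only installs the plugin manager itself. :PlugInstall still has to be run once to fetch the plugins.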

After installing Neovim 0.5 and vim-plug, simply copy the whole folder https://github.com/AlfredSAM/medium_blogs/tree/main/Neovim_Setups_for_Data_Science/nvim into ~/.config, so that the configuration ends up in ~/.config/nvim. Then enter the following in Terminal or iTerm2

nvim

to launch Neovim. The first launch may show some errors, mainly because the plugins are not installed yet. Just press Enter to skip past the errors to the welcome screen, and then enter

:PlugInstall

and wait for the installation to finish. Now quit (:q!) and restart Neovim, and you should see something like the following screenshot:

Opening nvim: NvimTree appears on the left

This is the clean start screen when Neovim is opened in a directory, with NvimTree on the left for navigating files. The rest of this section introduces several typical plugins.
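Before the individual plugins, here is roughly how init.lua could tie the files together as described earlier. This is a sketch, not the repository's exact file. The file names plugins.vim and plugins_setup.vim come from the post. The use of stdpath('config') and the two example options are my own assumptions.

```lua
-- init.lua skeleton (sketch). plugins.vim presumably holds the
-- call plug#begin() ... call plug#end() block with the Plug lines below.
local config_dir = vim.fn.stdpath('config')  -- usually ~/.config/nvim

-- 1. Declare/install plugins with vim-plug (Vimscript)
vim.cmd('source ' .. vim.fn.fnameescape(config_dir .. '/plugins.vim'))

-- 2. Configure the plugins whose setup is still written in Vimscript
vim.cmd('source ' .. vim.fn.fnameescape(config_dir .. '/plugins_setup.vim'))

-- 3. Everything else in Lua, for example a couple of basic options
vim.o.termguicolors = true  -- 24-bit colors, which many Lua themes/statuslines expect
vim.wo.number = true        -- show line numbers in the current window
```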

General Setups for Editing

Plug 'glepnir/galaxyline.nvim' , {'branch': 'main'}
Plug 'kyazdani42/nvim-web-devicons' " for file icons
Plug 'kyazdani42/nvim-tree.lua'

NvimTree is a lightweight file explorer in the left-hand window, and galaxyline is a statusline plugin that shows basic information about the file at the bottom of the active buffer's window. Both are written in Lua and are very fast. nvim-web-devicons adds file icons to Neovim; it is not necessary but nice to have. However, the icons only display properly with a Nerd Font. I currently use font-fira-code-nerd-font from https://github.com/Homebrew/homebrew-cask-fonts/tree/master/Casks, which supports the icons as well as ligatures that make code look nicer. Beyond NvimTree, a smarter way to navigate files is also needed. fzf.vim finds files by name, and vim-ripgrep finds files by searching for pieces of code inside them.

Plug 'junegunn/fzf', { 'do': { -> fzf#install() } }
Plug 'junegunn/fzf.vim'
Plug 'jremmen/vim-ripgrep'
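To make these navigation tools quick to reach, they can be bound to keys. A sketch follows. The key choices are my own, picked to avoid the <space> mappings used in the LSP section, and the map helper is a small local function rather than anything provided by the plugins.

```lua
-- File-navigation key mappings (sketch; keys are arbitrary choices)
local function map(mode, lhs, rhs, opts)
  opts = vim.tbl_extend('force', {noremap = true}, opts or {})
  vim.api.nvim_set_keymap(mode, lhs, rhs, opts)
end

map('n', '<space>e', '<cmd>NvimTreeToggle<CR>')  -- show/hide the NvimTree explorer
map('n', '<C-p>', '<cmd>Files<CR>')              -- fzf.vim: fuzzy-find files by name
map('n', '<space>b', '<cmd>Buffers<CR>')         -- fzf.vim: switch between open buffers
-- Leaves ":Rg " on the command line so a search pattern can be typed;
-- :Rg {pattern} searches file contents with ripgrep
-- (both vim-ripgrep and fzf.vim provide a command with this name)
map('n', '<space>g', ':Rg ')
```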

Two other plugins handle indent guides (helpful for Python) and auto-closing of brackets:

Plug 'nathanaelkane/vim-indent-guides'
Plug 'Raimondi/delimitMate'
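Both plugins are controlled by global variables, which can be set from init.lua because plugin scripts are loaded after it. The values below are my own suggestions, not necessarily the author's.

```lua
-- vim-indent-guides options (sketch; values are suggestions)
vim.g.indent_guides_enable_on_vim_startup = 1  -- turn guides on automatically
vim.g.indent_guides_guide_size = 1             -- thin, one-column guides
vim.g.indent_guides_start_level = 2            -- skip the outermost indent level

-- delimitMate options
vim.g.delimitMate_expand_cr = 1     -- Enter between a pair like {} opens an indented line
vim.g.delimitMate_expand_space = 1  -- Space inside a pair pads both sides: ( | )
```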

Last but not least, vim-visual-multi mimics one of Sublime Text's key features: editing with multiple cursors. See its documentation for usage: https://github.com/mg979/vim-visual-multi/wiki

Plug 'mg979/vim-visual-multi', {'branch': 'master'}
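By default, vim-visual-multi uses <C-n> to select the word under the cursor and add the next match. If that key is wanted for something else, the plugin's README shows remapping it through g:VM_maps. A Lua sketch of that pattern follows. <C-d> is simply the README's example and replaces Vim's half-page scroll.

```lua
-- vim-visual-multi: move "find under cursor" from <C-n> to <C-d> (sketch).
-- VM reads g:VM_maps when it loads, so set this in init.lua.
vim.g.VM_maps = {
  ['Find Under'] = '<C-d>',          -- normal mode: select word, then add next match
  ['Find Subword Under'] = '<C-d>',  -- visual mode: same, for the current selection
}
```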

Git

Data scientists usually rely on Git to manage code history, so convenient Git commands inside the editor help. Neogit, a Magit clone for Neovim written in Lua, is worth a try. It requires plenary.nvim, hence the first line below.

Plug 'nvim-lua/plenary.nvim'
Plug 'TimUntersberger/neogit'
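A minimal Neogit setup only needs the setup call. A key to open its status buffer is convenient too. In the sketch below, <space>G is an arbitrary key chosen to avoid the LSP mappings, and all Neogit options are left at their defaults.

```lua
-- Neogit: default setup plus a key to open the status buffer (sketch)
local neogit = require('neogit')
neogit.setup {}  -- defaults; see Neogit's README for available options

vim.api.nvim_set_keymap('n', '<space>G', '<cmd>Neogit<CR>',
  {noremap = true, silent = true})
```

Inside the status buffer, Magit-style keys apply. For example, s stages the change under the cursor and c opens the commit popup.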

Syntax Highlighting

Starting with Neovim 0.5, a brand-new and powerful syntax highlighting tool, nvim-treesitter, is available. The list of supported languages is in the nvim-treesitter documentation. This plugin, together with the general editing plugins above, makes Neovim competitive with modern editors, especially Electron-based ones.

Traditional highlighting (left) vs Treesitter-based highlighting (right). Source: https://github.com/nvim-treesitter/nvim-treesitter/blob/master/assets/example-cpp.png

Plug 'nvim-treesitter/nvim-treesitter', {'do': ':TSUpdate'}

Every time this plugin is updated, the parsers for all installed languages are updated one by one, with progress shown below the status line. Just wait for this to complete, then restart Neovim for the updates to take effect.
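Installing the plugin alone does not switch highlighting over. The parsers have to be installed and the highlight module enabled. A sketch follows. The parser list is my assumption for an R/Python/Julia workflow, so check that your nvim-treesitter version ships each of them.

```lua
-- nvim-treesitter: install parsers and enable highlighting (sketch)
require('nvim-treesitter.configs').setup {
  -- Parsers to install automatically (assumed list; adjust as needed)
  ensure_installed = {'python', 'r', 'julia', 'bash', 'lua'},
  highlight = {
    enable = true,  -- use Tree-sitter instead of regex-based syntax highlighting
  },
}
```

Individual parsers can also be installed on demand with :TSInstall {language}.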

Language Server Protocol (LSP) integration

R, Python, and Julia each provide a package implementing the Language Server Protocol (LSP): languageserver, python-lsp-server, and LanguageServer.jl, respectively. With LSP support, editors can provide features such as diagnostics, auto-completion, continuous hints, and even automatic formatting. The servers themselves are installed within each language: pip install python-lsp-server for Python, install.packages("languageserver") in R, and Julia's package manager for LanguageServer.jl. The editor still needs an LSP client to make use of these servers. From version 0.5 onward, Neovim has a built-in language server client that supports LSP for multiple languages. We only need to install the following plugins and add simple setups for the languages in use.

Plug 'neovim/nvim-lspconfig'
Plug 'ojroques/nvim-lspfuzzy'
Plug 'nvim-lua/completion-nvim'

I mainly use R, Python, and Julia, so completion is wired up only for them; the other servers in the list below use default settings.

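-------------------- HELPERS -------------------------------
local fn = vim.fn  -- shorthand used by the pylsp root_dir below
-- Non-recursive key-mapping helper used by the mappings below
local function map(mode, lhs, rhs, opts)
  local options = {noremap = true}
  if opts then options = vim.tbl_extend('force', options, opts) end
  vim.api.nvim_set_keymap(mode, lhs, rhs, options)
end

-------------------- LSP -----------------------------------
local lsp = require('lspconfig')
local lspfuzzy = require('lspfuzzy')
-- completion-nvim hooks into every server it is attached to
local on_attach = require('completion').on_attach
-- The language servers themselves must be installed separately
for ls, cfg in pairs({
  bashls = {},
  ccls = {},
  jsonls = {},
  julials = {on_attach = on_attach},
  r_language_server = {on_attach = on_attach},
  pylsp = {root_dir = lsp.util.root_pattern('.git', fn.getcwd()), on_attach = on_attach},
}) do lsp[ls].setup(cfg) end
lspfuzzy.setup {}  -- display LSP results (references, symbols, ...) with fzf
-- Jump to the previous/next diagnostic
map('n', '<space>,', '<cmd>lua vim.lsp.diagnostic.goto_prev()<CR>')
map('n', '<space>;', '<cmd>lua vim.lsp.diagnostic.goto_next()<CR>')
-- Definition, formatting, hover docs, rename, references, document symbols
map('n', '<space>d', '<cmd>lua vim.lsp.buf.definition()<CR>')
map('n', '<space>f', '<cmd>lua vim.lsp.buf.formatting()<CR>')
map('n', '<space>h', '<cmd>lua vim.lsp.buf.hover()<CR>')
map('n', '<space>m', '<cmd>lua vim.lsp.buf.rename()<CR>')
map('n', '<space>r', '<cmd>lua vim.lsp.buf.references()<CR>')
map('n', '<space>s', '<cmd>lua vim.lsp.buf.document_symbol()<CR>')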
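completion-nvim, attached through on_attach above, works best with a few editor options. The sketch below follows the settings recommended in its README. The <Tab>/<S-Tab> mappings are a common convenience rather than something the post itself specifies.

```lua
-- Companion settings for completion-nvim (sketch)
vim.o.completeopt = 'menuone,noinsert,noselect'  -- show the menu, don't auto-insert/select
vim.o.shortmess = vim.o.shortmess .. 'c'          -- silence "match 1 of 2" style messages

-- <Tab>/<S-Tab> move through the popup menu, otherwise insert a normal Tab
local expr_opts = {noremap = true, expr = true}
vim.api.nvim_set_keymap('i', '<Tab>', 'pumvisible() ? "\\<C-n>" : "\\<Tab>"', expr_opts)
vim.api.nvim_set_keymap('i', '<S-Tab>', 'pumvisible() ? "\\<C-p>" : "\\<S-Tab>"', expr_opts)
```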

Interactive Modes for Data Scientists

Neovim can also provide interactive workflows for data scientists. One characteristic that distinguishes data scientists from other developers is that we mostly code interactively. This does not mean other developers never run code to test and debug it. But data scientists work with data at every step, so the tricky part is not whether the code is syntactically correct. What needs examining, as we go, is whether the computations and visualizations based on the data are right. This is also the main reason behind Jupyter, a browser-based environment for data science that displays code cells and their results interactively. However, like the Electron-based editors, Jupyter suffers from performance problems, so I only use it for teaching. Within Neovim, plugins are available for sending code to the corresponding console to run. For R files, Nvim-R is the classic and popular choice. With it, one can easily send lines or blocks of R code to an R console and check the results. Its documentation shows that, with this plugin, Neovim can behave like RStudio in most respects.

Plug 'jalvesaq/Nvim-R'
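Nvim-R works out of the box, but two settings are commonly adjusted: the local leader key that prefixes its mappings, and the automatic conversion of _ into <-. The sketch below uses my own preferences for both. The comments list some of Nvim-R's default mappings.

```lua
-- Nvim-R options (sketch; values are personal preferences)
vim.g.maplocalleader = ','  -- Nvim-R mappings start with <LocalLeader> (default '\')
vim.g.R_assign = 0          -- keep '_' literal instead of converting it to ' <- '

-- Default Nvim-R mappings in an R buffer:
--   <LocalLeader>rf  start an R console      <LocalLeader>rq  quit R
--   <LocalLeader>l   send the current line   <LocalLeader>pp  send the paragraph
--   <LocalLeader>ss  send the selection (visual mode)
--   <LocalLeader>aa  send the whole file
```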

For Python and Julia, I prefer a different approach. The essence of an interactive workflow is simply sending code to the corresponding console, so a plugin that makes it easy to connect buffers solves the problem. Terminals inside Neovim are also useful for shell commands, including those that launch language consoles (e.g. ipython). The ideal plugin should therefore make it easy to send code to a terminal. That led me to the following:

Plug 'kassio/neoterm'

Sending code from a Python file to an ipython console opened in Neoterm
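A workflow like the one in the screenshot can be set up with a few neoterm variables and its <Plug> mappings. The sketch below follows the gx/gxx example in neoterm's README. The split direction and the ipython flag are my own assumptions: --no-autoindent is meant to stop ipython from re-indenting pasted blocks.

```lua
-- neoterm options (sketch; values are assumptions)
vim.g.neoterm_default_mod = 'vertical'                 -- open terminals in a vertical split
vim.g.neoterm_autoscroll = 1                           -- keep the latest output in view
vim.g.neoterm_repl_python = 'ipython --no-autoindent'  -- REPL command for Python buffers

-- <Plug> mappings must be recursive, so noremap stays false (empty opts)
vim.api.nvim_set_keymap('n', 'gx', '<Plug>(neoterm-repl-send)', {})       -- send a motion, e.g. gxip
vim.api.nvim_set_keymap('x', 'gx', '<Plug>(neoterm-repl-send)', {})       -- send the visual selection
vim.api.nvim_set_keymap('n', 'gxx', '<Plug>(neoterm-repl-send-line)', {}) -- send the current line
```

For Julia, or any other console, :T {command} (for example :T julia) runs a command in a neoterm terminal.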

Concluding Remarks and Future Perspectives

In this post I have shared my Neovim setups for data science. I have tried to provide a minimal set of plugins for daily data science work as a reference, and anyone can revise the setup files to suit their needs. A few remarks are in order.

First, how efficiently one uses Neovim depends on both the editor itself and one's familiarity with Vim key bindings. Practice helps memorize the key bindings, but the beginning of the journey with Vim is the most painful part. Fortunately, many resources can help; for example, vimtutor is the basic tutorial available on Mac/Linux/Unix systems. One can also consult the documentation of the plugins above, as well as the reference setups in the folder, to see the current key bindings for specific plugins.

Second, all of the Neovim setups live in the nvim folder, which makes them portable. For example, one can keep the nvim folder in a Git repository and pull the same settings onto different machines. Moreover, powerful servers are now widely available for model training, and they usually run Linux without a GUI. Neovim's portability is an advantage for remote work: just copy the relevant files to the server to set it up. When working remotely, one can simply ssh into the server and use Neovim there as an IDE for multiple languages.

Two issues should also be noted. First, the discussion is based on the nightly development version of Neovim, so the settings may need revising in the future, probably as new packages written in Lua appear. Second, Neovim is by nature a command-line program, so some features do not work as smoothly. For example, to open files with Neovim on a Mac, I first have to navigate to the relevant directory in Terminal or iTerm2. Setting things up so that clicking a file opens it in Neovim is not straightforward. Also, Org mode in Emacs is popular and versatile for note-taking as an alternative to traditional Markdown. However, Org mode plugins for Neovim/Vim are not as satisfactory, and some advanced features may require more than a command-line interface. To compensate for these and other potential issues, I also use Sublime Text, another favorite editor of mine. A post on Sublime Text setups for data science is on the way.