Environment Setups for Data Scientists under Garuda Linux

Published in Geek Culture

15 min read · Jun 13, 2021

(The Sublime Text setup is deferred, since ST4 has only recently become available. I tried it as soon as it was open for download. However, the key plugin LSP will be substantially upgraded in the near future, so I will wait for that and then check it out. Some days ago I also raised a discussion here, and interested readers can check it. Meanwhile, the preliminary setup files can be found here.)

This post introduces environment setups for data scientists under Garuda Linux. The first reason to talk about setups in Linux is that many data scientists enjoy the user experience in Linux, and I hope to extend my experience from MacOS to Linux and share it with all of you. Secondly, I am fascinated by the related posts and videos (e.g. Introducing The Garuda Linux Community Distro Challenge!; My Garuda Linux Tweaks; Garuda Linux on a 2013 15" MBP; Is Garuda LINUX is worth using for daily usage????, etc.) about the stylish designs of its color themes, icons, modules, and layouts. Generally speaking, I always try to achieve the optimal balance among usability, efficiency, and beauty in my daily coding experience, and such a fashionable Linux system caught my attention. Moreover, the usability and efficiency of Linux systems have never disappointed me, so I decided to give it a try. The home page and the above posts and videos introduce the easy steps to install the system and briefly cover its technical aspects. In this post I mainly focus on the setups from the viewpoint of data scientists.

An episode about my experience with Linux can also be shared here. The first time I installed Linux was in the third year of my PhD. Even though I was in a business school, the development of data science motivated me to explore how to employ computational techniques to test hypotheses derived from economics or marketing theories. At that time, I was taking a course on empirical modeling in which the lecturer shared her early coding experience as a PhD student in the USA. It was an impressive story: her supervisor shared complete notes of code for the algorithms used in empirical models and explained them to her in great detail. I was deeply struck by such kindness and generosity, and that was also when I first heard of Fortran, a classical programming language for computation. I searched the web and found many materials about the usage of Fortran in Unix-like systems. Therefore, I tried installing Ubuntu on my computer. My early impression of Linux was good, especially once I got used to its way of managing packages and documents. However, the experience with Fortran was somewhat harsh. I like coding in data science, but I find the debugging process for Fortran bloody horrible. Unfortunately, I also had to touch C++ later on for Rcpp. In recent years, I have mainly used MacOS at the office and at home. MacOS is also a Unix-like system and offers a programming experience similar to that of Linux systems. Of course, compilation in MacOS does seem to bring unanticipated troubles regularly. For example, please check my other recent discussion here. On the other hand, daily work often requires Microsoft Office and potentially other software that is not easily substituted in Linux. However, if data scientists are not required to make too many PPTs, then Linux is always a good choice.

Basic system information is shared here. As illustrated by the home page, Garuda Linux is an Arch-based system. Among the different desktop editions, I chose Garuda KDE Dr460nized. Since installing Garuda Linux in a virtual machine is not recommended, I dug out an old computer that had been put aside for around 5 years for the trial. The basic hardware information is as follows:

CPU: Intel i7-6700HQ (8) @ 3.5GHz
GPU: NVIDIA GeForce GTX 1060 Mobile
Memory: 2.27GiB / 7.65 GiB

The files for this post are in https://github.com/AlfredSAM/medium_blogs/tree/main/Environment_Setups_Data_Science_Garuda_Linux, which also contains the detailed instructions.

Upgrade the System after the Installation

After installation (using a bootable USB), reboot and go into Garuda Linux for the first time. The Garuda assistant should show up and ask to refresh the mirror lists used for downloading packages. Then the system will ask whether to upgrade; press “yes” and wait for the upgrade process to complete. During the process, users are asked to input the password to authorize sudo, and to answer y at several prompts to move the process forward. After the upgrade, the system also suggests installing additional software. It is advised NOT to do this step at this moment and to reboot the system first. Since Garuda Linux is an Arch-based system that provides regular rolling updates, it is suggested to run the following before installing other software:

sudo pacman -Syu

Currently, Add/Remove Software no longer launches due to the changes in pacman 6.0. The fastest way to fix it is to reinstall the libalpm library:

sudo pacman -Syu libalpm12

Please reboot and check that it works; see https://forum.garudalinux.org/t/problem-with-updates/5288 for details. However, the GUI for Add/Remove Software is not essential, since Linux users generally should not mind using the terminal.

Installation of Brave Browser

Brave browser is preferred because of its built-in functions to block ads and trackers, although Firefox is already installed by default. The installation is straightforward:

sudo pacman -S brave-bin

After installation, set it as the default browser by adding the following line to ~/.profile, which records the environment variables loaded when a session starts:

export BROWSER=brave

Please be sure to install brave-bin instead of brave, since some hidden errors may occur otherwise. For example, I found that the blocking functions did not work with brave but worked normally with brave-bin. Please also check the discussion: https://forum.garudalinux.org/t/brave-crashing-every-few-minutes/1975.
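Lines such as export BROWSER=brave are appended to ~/.profile several times in this post, and running an append twice leaves duplicates behind. A minimal sketch of an append-only-if-missing helper follows; append_once is a hypothetical name of my own, and the demo writes to a scratch file instead of the real ~/.profile.

```shell
#!/bin/sh
# append_once FILE LINE: append LINE to FILE unless an identical line exists.
# (append_once is a hypothetical helper, not part of Garuda or any tool.)
append_once() {
    file=$1
    line=$2
    # Create the file if it is missing so grep has something to read.
    [ -e "$file" ] || : > "$file"
    # -F: fixed string, -x: match the whole line, -q: quiet.
    if ! grep -Fxq -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demo on a scratch file rather than the real ~/.profile.
demo_profile=$(mktemp)
append_once "$demo_profile" 'export BROWSER=brave'
append_once "$demo_profile" 'export BROWSER=brave'   # second call is a no-op
cat "$demo_profile"
```

For real use one would call append_once ~/.profile 'export BROWSER=brave' from a bash session, since the helper is written in POSIX shell syntax rather than fish syntax.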

Installation of Kitty

Kitty is my recommended terminal emulator for Garuda Linux as well as MacOS. Generally speaking, data science relies on coding and command-line work, so the demands on a terminal emulator are usually high. Please also check others’ comments on kitty: Kitty - Fast, Featureful, GPU Based Linux Terminal Emulator; Kitty Is A Fast And Feature Rich Terminal Emulator. Garuda Linux seems to encourage using konsole for shell commands. However, when coding in konsole I noticed some latency issues when using neovim. On the other hand, alacritty is a fast, cross-platform, OpenGL terminal emulator that is installed in Garuda by default. It seems to be a better choice than konsole, but it does not currently support ligatures, which I am really fond of for coding: https://github.com/alacritty/alacritty/issues/50. Therefore, kitty is a good choice. Kitty is also a fast, GPU-based terminal emulator with tmux-like features built in, and it supports ligatures well if the appropriate fonts are set:

sudo pacman -S kitty

Please also refer to my setup file kitty.conf, which should be put in the folder ~/.config/kitty/. I chose FiraCode Nerd Font Mono for ligatures, which can be installed using Add/Remove Software.

(Figure: installing FiraCode in Add/Remove Software)

One remark about the usage of kitty in MacOS concerns the vertical misalignment of nerd fonts, including FiraCode Nerd Font Mono, which occurs ONLY in MacOS. Please check https://github.com/kovidgoyal/kitty/issues/2022, where the discussion continued until recently. That was the main reason I changed back to iTerm2 before. However, for this post I tried the latest version of kitty in MacOS and found that the problem is fixed. Therefore, I changed back to kitty in MacOS and use the same kitty.conf shared with you.

Installation of Files Readers and LibreOffice

General office reading and editing tools are usually required for life and work, so several suggestions are given here. At the very beginning, only Okular is installed as a document viewer. However, qview is recommended for viewing pictures and Zathura for viewing PDF files. qview is a lightweight and fast image viewer, and it can be easily installed in Add/Remove Software.

As for the PDF reader, zathura is a good choice, especially for vim users:

sudo pacman -S zathura

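Note that zathura needs a backend plugin such as zathura-pdf-mupdf or zathura-pdf-poppler to open PDF files. These and other plugins that enhance its functions are described at https://wiki.archlinux.org/title/Zathura.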

Nowadays, Microsoft Office 365 and Google both provide online office apps, and users can always utilize them for document editing if such cloud services are purchased and allowed by the companies they work for. Similarly, individual usage depends on one's own purchase and trust in the cloud. Otherwise, LibreOffice is an alternative:

sudo pacman -S libreoffice

and select the stable version, libreoffice-still, when pacman prompts for a provider. TeX is always helpful for math writing, so the following extension is also installed:

sudo pacman -S libreoffice-extension-texmaths

Of course, compatibility problems between LibreOffice and other popular office tools always exist. Nowadays, WSL2 has improved substantially, so users may go back to Windows to benefit from both the Linux kernel and Microsoft Office.

Homebrew setups

In order to replicate my experience from MacOS, I also explored Homebrew in Linux. Homebrew in Linux is still at an early stage, and it seems that only a limited amount of software can be installed smoothly. The tricky point about installation is that the Homebrew instructions are mainly written for bash/zsh, but Garuda Linux uses fish by default. The merits described in Fish vs. Zsh vs. Bash and Why You Should Switch to Fish persuaded me to try fish. However, some minor revisions of the commands are needed to comply with the syntax of fish. The config file of fish is ~/.config/fish/config.fish. One can use micro, the default editor in Garuda Linux, to open this config file. Note that source ~/.profile appears in the file, which means that fish automatically loads the setups from ~/.profile. From the home page of Homebrew, the suggested installation command is

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

However, this command is NOT valid in fish. The revised code is

# enter a temporary bash session
bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# leave bash and return to fish
exit

as raised in Unable to install Homebrew + fish terminal (Mac). Just go into bash temporarily to execute the installation command and then come back to fish. After the installation, the following step is needed so that brew is initialized every time fish is opened:

echo 'eval (/home/linuxbrew/.linuxbrew/bin/brew shellenv)' >> ~/.profile

inspired by https://github.com/Homebrew/brew/issues/10114. In addition, it seems that Homebrew creates temporary files in /tmp, which may lead to permission errors. Therefore, one can add the following line to ~/.profile:

echo 'export HOMEBREW_TEMP=/var/tmp' >> ~/.profile

inspired by https://github.com/Linuxbrew/brew/issues/923. Now close and reopen kitty, and then input

brew update

If the process completes without errors, the installation of Homebrew is successful. Other related packages can now be installed to support Homebrew:

sudo pacman -S base-devel

base-devel is a package group that includes tools needed for building (compiling and linking). It is also suggested to install the latest version of gcc:

brew install gcc

Up to this point, users can use Homebrew as they do in MacOS. To be honest, Homebrew currently seems to target Mac users most of the time and provides limited support for Linux users. For example, compilation errors occur for some software, such as Julia. For now, I only use Homebrew to install Neovim and Transmission, and they seem to work. One can also check https://cli-ck.io/transmission-cli-user-guide/ for the usage of Transmission.
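After reopening the terminal, it can help to confirm which of the tools installed so far are actually reachable on PATH. The following is a small sketch under my own assumptions: have_cmd and report_tools are hypothetical helper names, and the result depends entirely on the machine it runs on.

```shell
#!/bin/sh
# have_cmd NAME: succeed if NAME resolves to a command on the current PATH.
# (have_cmd and report_tools are hypothetical helper names.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# report_tools NAME...: print one "ok"/"missing" line per tool and
# return non-zero if at least one tool is missing.
report_tools() {
    missing=0
    for tool in "$@"; do
        if have_cmd "$tool"; then
            printf 'ok       %s\n' "$tool"
        else
            printf 'missing  %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools this post has installed so far; the result depends on your machine.
report_tools pacman brew gcc || echo "some tools are not on PATH yet"
```

If brew is reported as missing right after installation, the brew shellenv line in ~/.profile is the first thing to check.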

Installation of Neovim

Installation of neovim is straightforward using Homebrew. One can also refer to my last post about using neovim as an IDE for data scientists: https://alfredfaisam.medium.com/neovim-setups-for-data-science-5ea251e3735f. Now use brew to install Neovim:

brew install --HEAD luajit
brew install --HEAD neovim

Then install vim-plug to manage plugins for neovim:

sh -c 'curl -fLo "${XDG_DATA_HOME:-$HOME/.local/share}"/nvim/site/autoload/plug.vim --create-dirs \
     https://raw.githubusercontent.com/junegunn/vim-plug/master/plug.vim'
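The curl command above relies on the shell expansion ${XDG_DATA_HOME:-$HOME/.local/share}, which falls back to ~/.local/share when XDG_DATA_HOME is unset or empty. As a hedged sketch, the snippet below resolves the same path and reports whether plug.vim is there; plug_vim_path is a hypothetical helper name.

```shell
#!/bin/sh
# plug_vim_path: print where the curl command above stores plug.vim.
# (plug_vim_path is a hypothetical helper name.)
plug_vim_path() {
    # ${VAR:-default} expands to default when VAR is unset or empty.
    printf '%s\n' "${XDG_DATA_HOME:-$HOME/.local/share}/nvim/site/autoload/plug.vim"
}

target=$(plug_vim_path)
if [ -f "$target" ]; then
    echo "vim-plug found at $target"
else
    echo "vim-plug not found at $target"
fi
```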

Change directory to ~/.config and copy my suggested config folder for Neovim from https://github.com/AlfredSAM/medium_blogs/tree/main/Neovim_Setups_for_Data_Science/nvim. In kitty, input nvim to open it for the first time; some errors may appear due to the missing plugins. Therefore, one can input

:PlugInstall

and wait for the installation to finish. Then use :q! to quit and reopen nvim. The latest treesitter files will be downloaded and compiled, with the progress shown at the bottom left of the nvim window. Wait for this to complete, and nvim is ready to use. As indicated at the beginning of this post, ST4 has only just become available and the important plugin LSP seems to have a substantial upgrade coming in the near future, so the detailed instructions will be shared later. However, one can refer to https://github.com/AlfredSAM/medium_blogs/blob/main/Environment_Setups_Data_Science_Garuda_Linux/Instructions.md for the installation of ST4 in Garuda Linux. Besides, tips about the plugins for ST4 are available now: https://github.com/AlfredSAM/medium_blogs/blob/main/Sublime-Text-4_Setups_for_Data%20Science/Sublime%20Backups.txt.

Miniconda Installation

conda is suggested for setting up the Python or R environments for data science. The sections above are just advice on the basic usage of Garuda, and here come the real tips about setups for data science. In general, conda is treated as a package management tool for Python users, just like pip. However, conda is also the preferred tool for environment management for both Python and R users. A typical data scientist usually has several projects coexisting, in which multiple languages may also be used. One can easily imagine that different projects may depend on different sets of packages even for the same language. Therefore, isolated environments are vital. By isolated environments I mean

Changing the package setup in one environment will NOT affect other environments.
The Python in one environment is NOT related to the Python in another environment. The same applies to R.
If some errors leave one environment in a mess, a solution that is always available is to completely remove that environment.

One can also check https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#managing-environments for more details about how to use conda to manage environments. On the other hand, my teammate Peter LO, also an expert in data science, has recently explored the Guix functional package manager and shares his experience about how to apply it for data scientists: https://peterloleungyau.github.io/post/guix_intro_1_motivation/. I believe that its management ideas and concepts are great and valuable, and other tools may draw on them in the future. Here I only cover conda, which should be straightforward for cross-platform users. In Garuda Linux, use kitty to go to ~/Downloads, and then download the installation package:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

and then install it:

bash Miniconda3-latest-Linux-x86_64.sh

Please note that at the end of the installation the program will ask whether to add init commands. However, even after answering yes to this prompt, one will still find that conda is NOT loaded when reopening kitty. The reason is that the init commands are only added for bash by default. One can check ~/.bashrc and should find that additional lines about conda init have been added at the end of this file. However, Garuda Linux uses fish by default, so these lines have no effect. One can input

/(your conda installation path)/bin/conda init fish

and then one can find the following lines are added to ~/.config/fish/config.fish:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
eval /(your conda installation path)/bin/conda "shell.fish" "hook" $argv | source
# <<< conda initialize <<<

Now reopen kitty and then input

conda update conda

and the program should run as normal. One can also use

/(your conda installation path)/bin/conda init zsh

to add conda init commands to ~/.zshrc since we may also use zsh when necessary.
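Since conda init writes to a different startup file for each shell, it can be handy to check which files already carry the conda block. The sketch below is only an illustration: conda_rc_file and has_conda_block are hypothetical helper names, and the file locations are the ones this post describes for Linux.

```shell
#!/bin/sh
# conda_rc_file SHELL_NAME: print the startup file that this post shows
# "conda init SHELL_NAME" editing on Linux. (hypothetical helper name.)
conda_rc_file() {
    case $1 in
        bash) printf '%s\n' "$HOME/.bashrc" ;;
        zsh)  printf '%s\n' "$HOME/.zshrc" ;;
        fish) printf '%s\n' "$HOME/.config/fish/config.fish" ;;
        *)    echo "unsupported shell: $1" >&2; return 1 ;;
    esac
}

# has_conda_block FILE: succeed if FILE contains the conda init marker.
has_conda_block() {
    [ -f "$1" ] && grep -q '>>> conda initialize >>>' "$1"
}

for sh_name in bash zsh fish; do
    rc=$(conda_rc_file "$sh_name")
    if has_conda_block "$rc"; then
        echo "$sh_name: conda block present in $rc"
    else
        echo "$sh_name: no conda block in $rc"
    fi
done
```

A shell reported without a conda block is one for which conda init has not been run yet.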

Setup Conda Environments for Python and R

It is natural to construct a separate environment for Python projects: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html . The (base) environment is usually kept clean so that any problems only affect the newly built environments. Removing a broken environment and building it again is therefore always a direct way to solve a problem. Follow the process for building a new environment for Python described on the website, and then activate the new environment for the following setups. For computational efficiency, numpy should be linked against an optimized BLAS implementation, and mkl is a common choice, especially on Intel CPUs. Therefore, activate the new Python environment and then use the following command to install numpy with mkl as the dependency:

conda install -c conda-forge numpy libblas=3.9.0=9_mkl

Other packages relying on numpy, like PyTorch, should also be optimized in terms of matrix computation. In order for the LSP of Python to work properly, the following plugins are also recommended:

pip install 'python-lsp-server[all]' python-lsp-black mypy-ls pyls-isort
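The create, activate, and remove cycle described at the start of this section can be summarized as a short command plan. The sketch below only prints the conda commands instead of running them, so it is safe to try anywhere; env_plan is a hypothetical helper, and the environment name ds_py and the default python=3.9 are placeholders of my own.

```shell
#!/bin/sh
# Print (not run) the conda commands for one environment's life cycle.
# env_plan is a hypothetical helper; "ds_py" and python=3.9 are placeholders.
env_plan() {
    name=$1
    python_version=${2:-3.9}
    cat <<EOF
conda create -n ${name} python=${python_version}
conda activate ${name}
conda install -c conda-forge numpy libblas=3.9.0=9_mkl
conda env export > ${name}.yml
conda deactivate
conda env remove -n ${name}
EOF
}

env_plan ds_py
```

The export step records the package tree as a yml file, so an environment that has been removed can be rebuilt later or shared with collaborators.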

On the other hand, a separate environment for R is also possible. For example, one can construct a new environment using the following R_4_mkl.yml:

name: R_4.0_mkl
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.8
  - conda-forge::r-base=4.1.0
  - conda-forge::libblas=3.9.0=9_mkl

It will construct an environment named R_4.0_mkl using R 4.1. One may notice that a Python interpreter of version 3.8 is also installed in this environment, so that Python packages can be set up within this environment besides R. The yml also requires R to use mkl for matrix computation. This is another benefit of using R in a conda environment: linking R to mkl for BLAS/LAPACK by hand can be tricky and inflexible, but conda makes this job easy. Similarly, after the setup one needs to activate the environment and then run install.packages("languageserver") in R for the LSP of R. Of course, conda also provides a bunch of R packages for installation: https://docs.anaconda.com/anaconda/user-guide/tasks/using-r-language/. The key benefit of doing so is that the tree of R packages is recorded and can be easily exported as a yml file for collaborators to replicate. However, it seems that compilation using install.packages() within R is required for optimized performance, especially in MacOS. For example, please check the discussion I raised about the installation of xgboost in MacOS: https://github.com/dmlc/xgboost/issues/7017.
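Because the file is called R_4_mkl.yml while the environment it creates is called R_4.0_mkl, it is easy to activate the wrong name. As a rough sketch, the snippet below reads the name: key from a scratch copy of the yml and prints the two follow-up commands without running them; yml_env_name is a hypothetical helper built on a simple sed match, not a full YAML parser.

```shell
#!/bin/sh
# yml_env_name FILE: print the value of the top-level "name:" key.
# (hypothetical helper; a simple sed match, not a full YAML parser.)
yml_env_name() {
    sed -n 's/^name:[[:space:]]*//p' "$1" | head -n 1
}

# Scratch copy of the environment file shown above.
yml=$(mktemp)
cat > "$yml" <<'EOF'
name: R_4.0_mkl
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.8
  - conda-forge::r-base=4.1.0
  - conda-forge::libblas=3.9.0=9_mkl
EOF

env_name=$(yml_env_name "$yml")
# Print the follow-up commands instead of running them.
echo "conda env create -f R_4_mkl.yml"
echo "conda activate $env_name"
```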

Installation of Julia

This section illustrates the installation of Julia in Garuda Linux. In MacOS, I use Homebrew to install Julia with brew install --cask julia. However, my attempt to install Julia using Homebrew in Garuda Linux failed with errors about compilation and cmake files. Therefore, conduct the installation by hand:

mkdir -p ~/opt
cd ~/opt
wget https://julialang-s3.julialang.org/bin/linux/x64/1.6/julia-1.6.1-linux-x86_64.tar.gz
tar -xvf julia-1.6.1-linux-x86_64.tar.gz

Run julia:

~/opt/julia-1.6.1/bin/julia

Furthermore, one can add the following to $PATH

echo 'export PATH="$HOME/opt/julia-1.6.1/bin:$HOME/.local/bin:$PATH"' >> ~/.profile

so that typing julia alone runs the program. Currently, conda does NOT provide an up-to-date julia package to help build an isolated environment for julia. On the other hand, julia has its own environment setups for different projects; please check https://towardsdatascience.com/how-to-setup-project-environments-in-julia-ec8ae73afe9c. In general, julia can set up a folder with its own separate environment. The packages from different julia environments DO NOT interact, so the isolation requirement is also satisfied. It is also suggested to install the LSP package of julia for each project. In the julia console, input

using Pkg
Pkg.add("LanguageServer")
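The PATH line added to ~/.profile earlier in this section takes effect at the next login. For the current session, a hedged sketch of a helper that prepends the Julia folder only when it is not already listed is shown below; path_prepend is a hypothetical name, and the folder matches the julia-1.6.1 archive unpacked above.

```shell
#!/bin/sh
# path_prepend DIR: put DIR at the front of PATH unless it is already listed.
# (path_prepend is a hypothetical helper name.)
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                    # already present, leave PATH alone
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

path_prepend "$HOME/opt/julia-1.6.1/bin"
path_prepend "$HOME/opt/julia-1.6.1/bin"   # second call changes nothing

if command -v julia >/dev/null 2>&1; then
    julia --version
else
    echo "julia is not on PATH; check the unpacked folder name"
fi
```

Like the other helpers here, this uses POSIX shell syntax, so it should be run from bash rather than pasted into fish.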

Summary

In this post, basic setups for Garuda Linux from the viewpoint of data scientists are shared with you. With the great help of the currently prosperous open-source tools, data scientists can lead a much easier life than before. One more remark concerns the usage of nvim with the environments configured above. To use a conda environment for some project or folder, use conda activate <env_name> to activate the target environment and then input nvim. nvim can then read the LSP settings for Python or R under that conda environment automatically. For julia, it is very similar, and there is no need to worry about which conda environment is in use, since julia is installed globally. Just go to the target folder containing the julia environment files, like Manifest.toml and Project.toml, and then input nvim. nvim can then also read the LSP setups for that julia environment, provided that julia LanguageServer is already installed for that environment or project.
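Before launching nvim it can be useful to confirm which environment it will pick up. The sketch below reports the active conda environment through the CONDA_DEFAULT_ENV variable that conda activate sets, and checks whether a folder contains a Project.toml; env_summary is a hypothetical helper name.

```shell
#!/bin/sh
# env_summary [DIR]: report the active conda environment and whether DIR
# looks like a Julia project. (hypothetical helper name.)
env_summary() {
    dir=${1:-.}
    # conda activate exports CONDA_DEFAULT_ENV with the environment name.
    echo "conda env: ${CONDA_DEFAULT_ENV:-none}"
    if [ -f "$dir/Project.toml" ]; then
        echo "julia project: yes"
    else
        echo "julia project: no"
    fi
}

env_summary .
```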