Deep Learning Setup in Arch Linux: From Start To Finish with PyTorch + TensorFlow + Nvidia CUDA + Anaconda

This article serves two purposes: 1) to give an almost complete step-by-step guide to installing Arch Linux with PyTorch + Nvidia CUDA/cuDNN + Anaconda + TensorFlow CPU (for tensorboard) for anyone looking into using Arch for deep learning, and 2) to give me a guide tailored to my machine, since I always end up searching for the same information when I install Arch!

Some troubleshooting can be found at the end of this article, but it only covers the issues I had to deal with during this installation, so it’s not set in stone, just what worked for me. Some of them aren’t that important, but I do have a slight OCD about my dmesg -l err,warn being relatively clean, i.e., no errors whatsoever and no warnings that suggest instability.

Obviously the proper way to install Arch is to follow the official guide as you learn a lot in the process by reading the material. The Arch community is very active and most questions/problems you might have will be answered in the forums.

OK now that we got all this out of the way let’s jump right in!


Pre-installation

The installation medium booted successfully and we are looking at the prompt in the virtual console.

Keyboard layout

Load a console keymap to match your keyboard layout. List available layouts using ls /usr/share/kbd/keymaps/**/*.map.gz. Load by running loadkeys <yourkeymap>.

Boot mode

Verify UEFI mode by checking if the directory is populated.

# ls /sys/firmware/efi/efivars
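
If you would rather script this check, here is a minimal Python sketch of the same idea. The function name is_uefi_boot is a hypothetical helper, not part of any Arch tool; it only tests that the directory exists and contains at least one entry.

```python
from pathlib import Path


def is_uefi_boot(efivars=Path("/sys/firmware/efi/efivars")):
    """Return True if the efivars directory exists and has at least one entry."""
    try:
        return efivars.is_dir() and any(efivars.iterdir())
    except OSError:
        # Unreadable or vanished directory: treat it as "not UEFI".
        return False


print("UEFI" if is_uefi_boot() else "not UEFI (or efivars is not available)")
```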

Let’s ping something

The install media provides netctl. The following will bring up a text menu to set up your wireless connection, assuming you’re using wireless.

# wifi-menu
# ping www.google.com

Set the clock

Update the system clock and check its status.

# timedatectl set-ntp true
# timedatectl status

Ready the disk

Partition the disk using parted. I want my system to have swap, root and an EFI partition, which is required anyway for UEFI boot. If parted complains about alignment you can either ignore it (don’t) or fix it by nudging the start and end values to round power-of-2 figures until the warnings stop. Run lsblk to find your drive.

# parted /dev/sda
mklabel gpt
mkpart ESP fat32 1MiB 513MiB
set 1 boot on
mkpart primary linux-swap 513MiB 4.5GiB
mkpart primary ext4 4.5GiB 100%
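
To double check the numbers in the layout above, the sizes can be worked out with a few lines of Python. This is only a sketch: the describe helper is hypothetical, and the sector figures assume 512-byte logical sectors, which may not match your drive.

```python
MIB_PER_GIB = 1024
SECTORS_PER_MIB = 2048  # assumption: 512-byte logical sectors

# (name, start MiB, end MiB), taken from the mkpart commands above
PARTITIONS = [
    ("ESP", 1, 513),
    ("swap", 513, 4.5 * MIB_PER_GIB),
]


def describe(name, start_mib, end_mib):
    """Summarise a partition's size and the sector it starts at."""
    size_mib = end_mib - start_mib
    start_sector = int(start_mib * SECTORS_PER_MIB)
    return "{}: {:g} MiB, starts at sector {}".format(name, size_mib, start_sector)


for part in PARTITIONS:
    print(describe(*part))
```

So the ESP is 512 MiB and the swap partition is 4095 MiB, and both start on a whole-MiB boundary.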

Enable swap and check using swapon --show.

# mkswap /dev/sda2
# swapon /dev/sda2

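Format the partitions. I had read that this isn’t needed when using parted, but mkpart only sets the partition’s file system type and does not create the file system, so format them explicitly.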

# mkfs.fat -F32 /dev/sda1
# mkfs.ext4 /dev/sda3

Mount it all

# mount /dev/sda3 /mnt
# mkdir /mnt/boot
# mount /dev/sda1 /mnt/boot

Installation

Pre-installation complete! We can now install Arch Linux.

Install base packages

# pacstrap /mnt base base-devel

Generate fstab

Generate /mnt/etc/fstab and check that it matches the docs.

# genfstab -U /mnt >> /mnt/etc/fstab
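
One way to eyeball the result is to list any entries that are not identified by UUID. The sketch below is a rough illustration, not an official check: the helper names are hypothetical, and the sample text uses placeholder UUIDs plus a deliberately non-UUID swap line so that there is something to report.

```python
def fstab_entries(text):
    """Yield (spec, mountpoint, fstype) for each non-comment fstab line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3:
            yield fields[0], fields[1], fields[2]


def non_uuid_entries(text):
    """Return the entries whose first field is not a UUID= specifier."""
    return [entry for entry in fstab_entries(text) if not entry[0].startswith("UUID=")]


# Illustrative sample only; the UUIDs are placeholders, not real values.
SAMPLE = """\
# /dev/sda3
UUID=<root-uuid>  /      ext4  rw,relatime  0 1
# /dev/sda1
UUID=<esp-uuid>   /boot  vfat  rw,relatime  0 2
/dev/sda2         none   swap  defaults     0 0
"""

print(non_uuid_entries(SAMPLE))
```

To run it against the real file, read /mnt/etc/fstab and pass its contents to non_uuid_entries.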

Chroot into the new system

# arch-chroot /mnt

Timezone and hwclock

# ln -sf /usr/share/zoneinfo/<Region>/<City> /etc/localtime
# hwclock --systohc

Locale

Uncomment en_US.UTF-8 UTF-8 in /etc/locale.gen, or whatever locale you want. Run locale-gen to generate the locales. Set LANG accordingly.

/etc/locale.conf
---
LANG=en_US.UTF-8
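
If you script your installs, the uncommenting step could look roughly like the sketch below. The enable_locale function is a hypothetical helper that works on the text of locale.gen; writing the result back to /etc/locale.gen and running locale-gen are left to you.

```python
def enable_locale(locale_gen_text, locale="en_US.UTF-8 UTF-8"):
    """Return locale.gen text with the given locale line uncommented."""
    out = []
    for line in locale_gen_text.splitlines():
        # Match only a commented-out line that is exactly the wanted locale.
        if line.startswith("#") and line.lstrip("#").strip() == locale:
            out.append(locale)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


sample = "#en_GB.UTF-8 UTF-8\n#en_US.UTF-8 UTF-8  \n#en_US ISO-8859-1\n"
print(enable_locale(sample), end="")
```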

If the keyboard layout was changed make changes permanent by editing /etc/vconsole.conf.

/etc/vconsole.conf
---
KEYMAP=uk

Hostname

/etc/hostname
---
myhostname

Add matching entry to hosts.

/etc/hosts
---
...
127.0.1.1 myhostname.localdomain myhostname

Network config

I like to use NetworkManager for managing my network connections. For details read the official docs.

# pacman -S networkmanager

Initramfs

Creating a new initramfs is not required but I’ve had issues in the past so I re-create it at this point.

# mkinitcpio -p linux

Root passwd

Set a root password by running passwd.

Boot loader

I prefer grub as my boot loader; on UEFI systems grub-install also needs efibootmgr. Given that I have an Intel CPU I also enable microcode updates by installing intel-ucode.

# pacman -S grub efibootmgr intel-ucode

Install the grub UEFI application.

# grub-install --target=x86_64-efi --efi-directory=/boot --bootloader-id=grub

Generate the config file. Microcode updates will be added automatically.

# grub-mkconfig -o /boot/grub/grub.cfg

Reboot

# exit
# umount -R /mnt
# reboot

Post-installation

Hopefully everything worked just fine and we are logged into a working Arch environment. Now to the post-installation adventure!

Users

It’s always a good idea to not run things as root. Add a new user.

# useradd -m -G wheel -s /bin/bash yoda
# passwd yoda

To grant sudo access run visudo and add yoda ALL=(ALL) ALL.

Trim support for SSD

Add support for SSD trim. For more info check the docs. Verify trim support by running lsblk -D; non-zero DISC-GRAN and DISC-MAX values indicate that the device supports TRIM.

The util-linux package provides fstrim.service and fstrim.timer. Enabling the timer will activate the service weekly.

# systemctl start fstrim.timer
# systemctl enable fstrim.timer
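
The lsblk check can also be scripted. The sketch below parses three-column output of the kind produced by lsblk -d -o NAME,DISC-GRAN,DISC-MAX and lists the devices with non-zero discard values. The helper names are hypothetical and the sample text is made up for illustration; it is not output from my machine.

```python
def is_nonzero(size):
    """True when an lsblk size such as '512B' or '2G' is not zero."""
    return float(size.rstrip("BKMGTP")) != 0


def trim_capable(lsblk_text):
    """Return device names whose DISC-GRAN and DISC-MAX are both non-zero."""
    devices = []
    for line in lsblk_text.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[0] == "NAME":
            continue  # skip the header and anything unexpected
        name, gran, max_size = fields
        if is_nonzero(gran) and is_nonzero(max_size):
            devices.append(name)
    return devices


# Made-up sample: sda supports discard, sdb does not.
SAMPLE = """\
NAME DISC-GRAN DISC-MAX
sda       512B       2G
sdb         0B       0B
"""

print(trim_capable(SAMPLE))
```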

Enable NetworkManager service

After enabling, run nmtui to connect via a text UI.

# systemctl start NetworkManager.service
# systemctl enable NetworkManager.service

Firewall

iptables is already installed. I like ufw. More info in the docs. Note that ufw enable only needs to be run once, right after the package is installed.

# pacman -S ufw
# ufw enable
# systemctl start ufw
# systemctl enable ufw

Xorg

Install Xorg display server if you want a desktop environment later.

# pacman -S xorg-server xorg-xinit xterm

The Deep Learning Stuff

How to install and configure Nvidia drivers, CUDA, cuDNN, Anaconda, TensorFlow and PyTorch with a sprinkle of troubleshooting at the end.

Nvidia drivers

To identify your card run lspci | grep -e VGA -e 3D. I have an Nvidia GeForce GTX 960. Install the drivers.

# pacman -S nvidia nvidia-utils

The nvidia package contains a file which blacklists the nouveau module, so rebooting is necessary. In addition there is now support for DRM kernel mode switching.

To enable it, add the nvidia-drm.modeset=1 kernel parameter in /etc/default/grub, and add nvidia, nvidia_modeset, nvidia_uvm and nvidia_drm to the MODULES setting in /etc/mkinitcpio.conf so they end up in your initramfs. The following will update grub and the initramfs, and also configure Xorg.

# grub-mkconfig -o /boot/grub/grub.cfg
# mkinitcpio -p linux
# nvidia-xconfig
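
For anyone automating the mkinitcpio.conf change, here is a rough Python sketch of adding the four modules. The add_modules helper is hypothetical, and it assumes the array form MODULES=(...); older configs that use MODULES="..." would need a different pattern.

```python
import re

NVIDIA_MODULES = ["nvidia", "nvidia_modeset", "nvidia_uvm", "nvidia_drm"]


def add_modules(conf_text, modules=NVIDIA_MODULES):
    """Append any missing modules to the first MODULES=(...) line."""
    def repl(match):
        current = match.group(1).split()
        merged = current + [m for m in modules if m not in current]
        return "MODULES=(" + " ".join(merged) + ")"

    return re.sub(r"^MODULES=\((.*?)\)", repl, conf_text, count=1, flags=re.MULTILINE)


print(add_modules("MODULES=()\nHOOKS=(base udev)\n"), end="")
```

Running it twice gives the same result, so it is safe to re-apply.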

To avoid the possibility of forgetting to update initramfs after updates I created a pacman hook as mentioned here.

/etc/pacman.d/hooks/nvidia.hook
---
[Trigger]
Operation=Install
Operation=Upgrade
Operation=Remove
Type=Package
Target=nvidia
[Action]
Depends=mkinitcpio
When=PostTransaction
Exec=/usr/bin/mkinitcpio -P
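
The hook file repeats the Operation key, which Python's configparser rejects by default, so a quick sanity check needs a small hand-rolled reader. The sketch below is a hypothetical helper, not how pacman itself parses hooks; it just collects every value per key so you can confirm the triggers and the command.

```python
from collections import defaultdict

HOOK = """\
[Trigger]
Operation=Install
Operation=Upgrade
Operation=Remove
Type=Package
Target=nvidia
[Action]
Depends=mkinitcpio
When=PostTransaction
Exec=/usr/bin/mkinitcpio -P
"""


def parse_hook(text):
    """Return {section: {key: [values, ...]}}, keeping repeated keys."""
    sections = defaultdict(lambda: defaultdict(list))
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            sections[current][key.strip()].append(value.strip())
    return sections


hook = parse_hook(HOOK)
print(hook["Trigger"]["Operation"], hook["Action"]["Exec"])
```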

*** see troubleshooting for resolution and framebuffer issues.

CUDA — cuDNN

Super easy, it just works out of the box. Read the docs if you want, but all you need is the following.

# pacman -S cuda cudnn

The installer suggests logging out and back in so that the CUDA binaries are added to the PATH. Once logged back in we can test the system by running the samples provided by Nvidia, namely deviceQuery.

The cuda package places files in /opt/cuda. Copy the samples/ folder to the home directory and run make. Run deviceQuery in samples/bin/. The output should end with Result = PASS.
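
If you want to check the result from a script instead of by eye, something like the sketch below would do. Both function names are hypothetical and the default binary path is an assumption; point it at wherever your deviceQuery build ended up.

```python
import subprocess


def device_query_passed(output):
    """True if the deviceQuery output contains the 'Result = PASS' line."""
    return any(line.strip() == "Result = PASS" for line in output.splitlines())


def run_device_query(binary="./deviceQuery"):
    """Run deviceQuery (path is an assumption) and report whether it passed."""
    result = subprocess.run(
        [binary],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return device_query_passed(result.stdout)
```

Call run_device_query() from the directory that contains the binary, or pass the full path.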

Anaconda

Follow the instructions on the website to download. If using terminal then do the following and just go through the steps. You can create a conda environment to host your deep learning stuff but I don’t bother with that.

# wget https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh
# bash Anaconda3-4.4.0-Linux-x86_64.sh
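
I don't do it above, but it is worth verifying the download before running it. The sketch below computes the SHA-256 of a file so you can compare it with the hash published for the installer, assuming one is listed for your version. The sha256_of helper is hypothetical; no expected hash is given here because it has to come from the official source.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, print(sha256_of("Anaconda3-4.4.0-Linux-x86_64.sh")) and compare the result with the published value by hand.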

TensorFlow

I install TensorFlow mainly for tensorboard visualization so the CPU version is enough for me. Great documentation at the website if you want to install the GPU version.

# pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.2.1-cp36-cp36m-linux_x86_64.whl

Check that it’s working by running the following python code.

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

PyTorch

Follow the instructions given at the PyTorch website for your installation.

# conda install pytorch torchvision -c pytorch

Since the package comes from the pytorch channel, we can easily update it in the future by running conda update pytorch torchvision -c pytorch. To test, I run the following Python code, which also shows whether the GPU can be accessed.

import torch
print(torch.cuda.is_available())
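
A slightly more talkative version of the same check is sketched below. The gpu_report function is a hypothetical helper; it falls back to a message instead of crashing when PyTorch is missing, and it names the first CUDA device when one is visible.

```python
def gpu_report():
    """Describe whether PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "PyTorch {} sees no CUDA device".format(torch.__version__)
    # Device index 0 is the first GPU PyTorch can see.
    return "PyTorch {} on {}".format(torch.__version__, torch.cuda.get_device_name(0))


print(gpu_report())
```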

…troubleshooting 1

Your system is not currently configured to drive a VGA console on the primary VGA device. The NVIDIA Linux graphics driver requires the use of a text-mode VGA console. Use of other console drivers including, but not limited to, vesafb, may result in corruption and stability problems, and is not supported.

This warning can be seen when running dmesg -l err,warn. I wanted it gone because I don’t want to deal with potential corruption and instability in the future. I also found that it was preventing me from having a high-res console (more on that later).

You can find info about this here. The NVIDIA driver does not provide an fbdev driver for a high-resolution console when the kernel uses its compiled-in vesafb module. However, the kernel’s compiled-in efifb module supports a high-resolution console with the nvidia driver on EFI systems. Following the instructions here didn’t solve it for me. After searching I found this reply by a moderator on Nvidia’s forums.

The message is a little misleading in UEFI mode. What it means is that the GPU was initialized to a graphical mode using the legacy VGA BIOS, regardless of whether the system was booted in UEFI mode or not. Typically this happens if the Compatibility Support Module (CSM) is enabled in the system BIOS. If you have an option to disable CSM in the SBIOS, please try that.

This solved my problem. In my BIOS I had to change the following: BIOS Config->Windows8->CSM Support->Never.

…troubleshooting 2

Low resolution console. The reason why this didn’t work for me was because of the problem with the CSM in BIOS. After fixing that, following the instructions in that post resulted in a high-res console. In a nutshell I changed the following and regenerated the grub config.

/etc/default/grub
---
GRUB_CMDLINE_LINUX="nouveau.modeset=0 rd.driver.blacklist=nouveau video=vesa:off rhgb quiet"
GRUB_GFXMODE=2560x1440x32 
GRUB_GFXPAYLOAD_LINUX=keep
GRUB_TERMINAL_OUTPUT="gfxterm"
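
Before regenerating the grub config it can help to confirm what the file actually sets. The sketch below reads the settings shown above and checks for a given kernel parameter. Both helper names are hypothetical, and the parser only handles simple KEY=value or KEY="value" lines, not arbitrary shell.

```python
import shlex

SAMPLE = """\
GRUB_CMDLINE_LINUX="nouveau.modeset=0 rd.driver.blacklist=nouveau video=vesa:off rhgb quiet"
GRUB_GFXMODE=2560x1440x32
GRUB_GFXPAYLOAD_LINUX=keep
GRUB_TERMINAL_OUTPUT="gfxterm"
"""


def parse_grub_defaults(text):
    """Return a dict of the simple KEY=value settings in the text."""
    settings = {}
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)
        if not tokens or "=" not in tokens[0]:
            continue
        key, _, value = tokens[0].partition("=")
        settings[key] = value
    return settings


def has_kernel_param(settings, param):
    """True if param appears in either GRUB_CMDLINE_LINUX variable."""
    for key in ("GRUB_CMDLINE_LINUX", "GRUB_CMDLINE_LINUX_DEFAULT"):
        if param in settings.get(key, "").split():
            return True
    return False


settings = parse_grub_defaults(SAMPLE)
print(settings["GRUB_GFXMODE"], has_kernel_param(settings, "nouveau.modeset=0"))
```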