Using NAIST server GPUs for deep learning — Anaconda with TensorFlow

Kibrom Desta
Published in inet-lab
Jan 8, 2021

Installing TensorFlow on a remote server can be a hassle, but Anaconda makes it much easier. I ran into some problems setting up this environment myself, but with help from the internet and Assist. Prof. Masatoshi Kakiuchi I managed to get everything working.

In this article, I will give you a step-by-step guide to setting up a TensorFlow environment and using it with NAIST’s GPU servers.

Here are the steps:

  1. Log in to the NAIST server with your mandara account: ssh mandara-id@h29grid-dev0
  2. Go to the working directory: cd ../../../work/mandara-id
  3. Download Anaconda for Linux from the official website: wget https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh
  4. Install Anaconda: bash Anaconda3-2020.11-Linux-x86_64.sh
Press Enter repeatedly to scroll through the license agreement (who actually reads it? :)), then type yes and press Enter when the acceptance prompt appears.
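As an aside, the installer also supports a non-interactive batch mode if you want to skip the prompts. This is just a sketch; the -b and -p flags are standard Anaconda installer options, and the prefix path is an example that assumes your work directory:

```shell
# Batch-mode install: -b accepts the license non-interactively,
# -p sets the install prefix (example path -- replace mandara-id).
installer=Anaconda3-2020.11-Linux-x86_64.sh
prefix=/work/mandara-id/anaconda3
echo "installing $installer to $prefix"
if [ -f "$installer" ]; then
    bash "$installer" -b -p "$prefix"
else
    echo "installer not found: $installer"
fi
```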

5. Select the folder where Anaconda will be installed. I like to keep mine in the working directory, so I will name my folder “anaconda3”.

6. Just before the installation finishes, Anaconda will ask whether you want to run conda init; answer either yes or no. It makes no difference either way, because students don’t have the administrative permissions needed to run conda init on NAIST’s servers.

7. Voila! We have finished the Anaconda installation with these simple steps. Next we will create a virtual environment and install TensorFlow in it. But first we need to change the permissions on the conda binary so that it can be used by all users. Here is how to do it:

chmod 755 /work/mandara-id/anaconda3/bin/conda
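To confirm the permission change took effect, you can check that the binary is executable and responds. This is an optional sanity check, using the same example path as above:

```shell
# Check that the conda binary is executable and prints its version.
conda_bin=/work/mandara-id/anaconda3/bin/conda
if [ -x "$conda_bin" ]; then
    "$conda_bin" --version
else
    echo "conda binary not found or not executable at $conda_bin"
fi
```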

Installing TensorFlow in an Anaconda virtual environment

We will create a new virtual environment for TensorFlow. Follow the steps below to do so:

  1. Create a virtual environment named ‘tf_gpu_cuda8’. We will use bash to create the environment and install TensorFlow into it. So, while you are still in your work directory, run the following commands:
bash
export PATH="/work/mandara-id/anaconda3/bin:$PATH"

The virtual environment name can be anything; I put cuda8 in my example because GPU TensorFlow uses CUDA, and a CUDA installation is needed for this to work. The first command below installs both TensorFlow and cudatoolkit; once the installation finishes, the second command activates the virtual environment.

conda create -n tf_gpu_cuda8 tensorflow-gpu cudatoolkit
conda activate tf_gpu_cuda8
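Before moving on, it is worth confirming that both packages actually landed in the new environment. A quick check, assuming conda is on your PATH as set above:

```shell
# List the TensorFlow and CUDA packages installed in the environment.
env_name=tf_gpu_cuda8
echo "checking packages in $env_name"
if command -v conda >/dev/null 2>&1; then
    conda list -n "$env_name" 2>/dev/null | grep -Ei 'tensorflow|cudatoolkit' \
        || echo "environment $env_name not found or packages missing"
else
    echo "conda not on PATH"
fi
```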

2. Our next step is to check that everything is installed and running. We also need to make sure NAIST’s GPUs are used whenever we submit jobs or run interactive jobs. Let’s see whether TensorFlow can access the GPU on the interactive server by first logging into it:

qlogin -q grid_intr.q 
bash
export PATH="/work/mandara-id/anaconda3/bin:$PATH"
conda activate tf_gpu_cuda8

Let’s run the following TensorFlow code in a Python shell:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

If running the above Python script prints “Num GPUs Available: 1”, then everything is set up and you are good to go.
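As a cross-check outside Python, nvidia-smi (if present on the node) should list the same GPU. This is an optional extra, not something the setup requires:

```shell
# Ask the NVIDIA driver directly which GPUs are visible on this node.
echo "GPU sanity check:"
gpu_query=name,memory.total
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu="$gpu_query" --format=csv
else
    echo "nvidia-smi not found (not on a GPU node)"
fi
```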

Running batch jobs in NAIST cluster servers

Running your programs on an interactive server is a bit easier than submitting batch jobs, but it falls short when processing takes more than 10 hours, since the interactive servers can only be used for a short time. If we want to use the NAIST servers for longer, say 1,000 hours as in grid_short.q, we need to submit batch jobs to a cluster node.

Follow these steps to submit a batch job on one of the cluster nodes:

  1. Create a folder for your programs and bash scripts and place it in your working directory; I will name it ‘batch_job’. The folder contains a bash script, check_tf.sh, and a Python script, check_tf.py. The contents of the two files are shown below:

check_tf.sh

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -V
export PATH="/work/mandara-id/anaconda3/bin:$PATH"
source activate tf_gpu_cuda8
cd /work/mandara-id/batch_job
python3 check_tf.py

check_tf.py

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

2. Submitting the batch job is done with the following command:

qsub -q grid_low.q check_tf.sh
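After submitting, you can watch the job with the standard Grid Engine commands; qstat and qdel are stock Grid Engine tools, though they only work on the cluster head node:

```shell
# Show my queued/running jobs; qdel <jobid> would cancel one.
status_cmd=qstat
echo "checking job status with $status_cmd"
if command -v "$status_cmd" >/dev/null 2>&1; then
    "$status_cmd" -u "$USER"
else
    echo "$status_cmd not available (run this on the cluster head node)"
fi
```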

Running the above command creates two files. The first, with an extension like *.sh.e*, contains any run-time errors from your program; the second, *.sh.o*, contains your script’s output if it runs without errors. My output files were:

check_tf.sh.e343445
check_tf.sh.o343445
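The numeric suffix is the job ID that qsub reported; Grid Engine composes the names as <script>.e<jobid> and <script>.o<jobid>. A small illustration, using the job ID from my run:

```shell
# How the two output filenames are composed (343445 was my job ID).
script=check_tf.sh
jobid=343445
stderr_file="${script}.e${jobid}"
stdout_file="${script}.o${jobid}"
echo "errors go to:  $stderr_file"
echo "output goes to: $stdout_file"
```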

If you follow these steps, you should get results like mine, and the GPU should be accessible. Congratulations, and happy research!

N.B.: Throughout this article, mandara-id refers to your mandara ID and should be replaced with your own.
