Run FeatureCloud applications with GPU acceleration

FeatureCloud supports GPU acceleration, and in this story we explain how to run your application with GPU access on the FeatureCloud platform. Before running applications with GPU acceleration, a few prerequisite steps are required. The first is enabling CUDA applications to run in Docker.

Setup Docker

  1. Install Docker
  2. Add your user to the docker group (log out and back in for the group change to take effect):
$ sudo usermod -aG docker $USER

Setup NVIDIA driver and runtime

  1. Install NVIDIA driver
  2. Verify the installation with nvidia-smi:
$ nvidia-smi
Mon Oct 17 14:24:41 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.02    Driver Version: 510.85.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeFo...           |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

  3. Install the NVIDIA container runtime:

$ curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add -
$ distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
$ sudo apt-get update
$ sudo apt-get install nvidia-container-runtime

  4. Restart Docker:

$ sudo systemctl stop docker
$ sudo systemctl start docker

Now you should be able to run CUDA applications in Docker. You can verify that the GPU is visible inside a container:

$ docker run --gpus all nvidia/cuda:10.2-cudnn7-devel nvidia-smi

Running FeatureCloud Controller with GPU access

FeatureCloud supports GPU acceleration for CUDA applications by starting the controller with GPU access and running applications that can utilize NVIDIA GPUs. Once a GPU is available on a client's machine, the client can start the controller with GPU access as follows:

$ featurecloud controller start --gpu=true

By default, the --gpu option is false. Once it is set to true, the controller's Docker container runs with access to all GPUs available on the local machine. If any of the prerequisites are missing, starting the controller with GPU access may raise an error like this:

$ featurecloud controller start --gpu=true
Downloading...: 3it [00:00, 705.24it/s]
Controller could not be started. Error: 500 Server Error for http+docker://localhost/v1.41/containers/eb542786d00377fe42d32af7f8f9b6bfa7ae280b84be14d88be2711e6a441adf/start: Internal Server Error ("failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown")

Such an error usually means the GPU is not accessible: either there is no physical GPU on the machine, or the NVIDIA driver or container runtime is not installed. Therefore, make sure you go through the prerequisites carefully!

Running FeatureCloud Deep Learning app using GPU

The Federated Deep Learning app can train different deep neural network architectures in a federated fashion on the FeatureCloud platform. The app is implemented with the PyTorch library and supports various architectural and training options, giving FeatureCloud users enough flexibility to experiment with diverse architectures. It also supports GPU acceleration, but clients are not obliged to use a GPU: depending on GPU availability, each client can decide at its own discretion whether to use the GPU by changing the gpu option in the config.yml file.
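As a sketch, a client's config.yml might enable GPU training with a flag like the following. Only the gpu option is taken from the description above; the surrounding keys and values are illustrative assumptions, so consult the app's documentation for the exact schema:

```yaml
# Illustrative config.yml fragment. Key names other than `gpu`
# are assumptions, not the app's actual schema.
fc_deep_networks:
  train: train.csv
  test: test.csv
  gpu: true   # set to false on clients without a CUDA-capable GPU
```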

The Docker image of the Deep Learning app is available in the FeatureCloud Docker repository under the name featurecloud.ai/fc_deep_networks with cpu and gpu tags. The latter is built with torch, torchvision, and torchaudio compiled against the CUDA 11.6 library to utilize the GPU. To use the GPU version, download the following version of the Deep Learning app:

$ featurecloud app download featurecloud.ai/fc_deep_networks:gpu

You can run the Deep Learning app in the test-bed or use it in a federated collaboration as part of a workflow.
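Inside a PyTorch-based app, the usual pattern for honoring such a GPU flag is to fall back to the CPU whenever CUDA is unavailable. Here is a minimal sketch of that pattern; pick_device is a hypothetical helper for illustration, not part of the Deep Learning app's actual API:

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch with CUDA support sees a GPU,
    otherwise fall back to "cpu"."""
    try:
        # Imported lazily so the sketch also runs where PyTorch is absent.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(device)
```

On a controller started with --gpu=true inside a CUDA-enabled container this resolves to "cuda"; on a CPU-only client it silently degrades to "cpu", which matches the idea that GPU use is optional per client.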
