Getting started with ML without many software installation dependencies
Part II is here
Part III is here
On a new computer, how do we start machine learning development? Too many settings. Too many packages. Python 2, Python 3… TensorFlow with GPU support, without GPU support. TensorFlow 2, TensorFlow 1.14.2, TensorFlow 1.15.2. Some ML training code will work only on TensorFlow 1.14 and may not support the 1.15 series. How do we handle all these scenarios without much worry? And what configuration does the NVIDIA GPU need before we get started?
For a seasoned ML practitioner, this is a cakewalk. For a self-learner who is just starting out, it is difficult to know where to begin. In this post, let me try to cover how to use the TensorFlow Docker image for various use cases, and what the bare minimum is that needs to be installed on the computer.
Here is the list of software to be installed:
- Docker
- NVIDIA GPU drivers
I always refer to How To Install and Use Docker on Ubuntu 18.04 for Docker installation. It has worked for me in all my trials.
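If you want the quick route instead of the full guide, Docker also publishes a convenience script. A minimal sketch, assuming an Ubuntu/Debian machine (the script URL is Docker's official one, but always review a remote script before running it):

```shell
# Download and run Docker's official convenience install script.
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Optional: allow running docker without sudo (takes effect after re-login).
sudo usermod -aG docker "$USER"

# Verify the installation.
docker run --rm hello-world
```

The `hello-world` container is the standard smoke test: if it prints its welcome message, the Docker daemon and client are working.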
To install the NVIDIA drivers, first check whether a GPU is present on the machine with the following commands:
$ lspci | grep -i nvidia
$ ubuntu-drivers devices
Once the GPU is listed, you can run autoinstall as below to install the recommended drivers.
$ sudo ubuntu-drivers autoinstall
The next step is to install the NVIDIA Container Toolkit. The installation can be verified with the command below.
$ docker run --gpus all --rm nvidia/cuda nvidia-smi
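The toolkit installation itself looks roughly like the following on Ubuntu. This is a sketch based on NVIDIA's packaged repositories; the repository URLs and package names have changed over time, so check NVIDIA's current install guide before copying:

```shell
# Add NVIDIA's package repository for the container toolkit
# (repository layout may differ in newer releases -- verify first).
distribution=$(. /etc/os-release; echo "$ID$VERSION_ID")
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list" | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install the toolkit and restart Docker so it picks up the NVIDIA runtime.
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

After the restart, the `docker run --gpus all ... nvidia-smi` check above should print the same GPU table you get on the host.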
TensorFlow on Docker
The details are given at https://www.tensorflow.org/install/docker. Once you have gone through that page, come back here.
We can select TensorFlow image tags as below:
tensorflow/tensorflow:<version>[-gpu][-py3]
For example, if we want TensorFlow version 1.15.2 with Python 3 and GPU support, we can use the tag tensorflow/tensorflow:1.15.2-gpu-py3. For v2.0.1 with GPU support, it will be tensorflow/tensorflow:2.0.1-gpu.
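The tag is just a string assembled from the version plus optional suffixes. A tiny shell sketch of the pattern (the variable names here are my own, not anything TensorFlow defines):

```shell
# Build a TensorFlow image tag from a version and optional suffixes.
# Set GPU/PY3 to any non-empty value to enable the corresponding suffix,
# or to the empty string to drop it.
VERSION="1.15.2"
GPU="yes"
PY3="yes"

# ${VAR:+-suffix} expands to "-suffix" only when VAR is non-empty.
TAG="tensorflow/tensorflow:${VERSION}${GPU:+-gpu}${PY3:+-py3}"
echo "$TAG"
```

With the values above this prints `tensorflow/tensorflow:1.15.2-gpu-py3`; clearing `PY3` yields `tensorflow/tensorflow:1.15.2-gpu`, and so on.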
While using the GPU, pass the flag --gpus all so the TensorFlow Docker container can access it.
A sample docker run command looks something like this:
docker run -it --gpus all --name <name> -v <algo folder>:/algo -v <data folder>:/data tensorflow/tensorflow:1.15.2-gpu-py3 /bin/bash
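Before starting longer work, it is worth confirming that TensorFlow inside the container actually sees the GPU. One way, using the TensorFlow 1.x API and the example tag from above:

```shell
# One-off check: does TensorFlow inside the container detect the GPU?
# tf.test.is_gpu_available() is the TensorFlow 1.x call; it prints True/False.
docker run --rm --gpus all tensorflow/tensorflow:1.15.2-gpu-py3 \
    python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
```

If this prints False, the usual suspects are a missing `--gpus all` flag, a CPU-only image tag, or a driver/toolkit mismatch on the host.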
This is where we get the advantages of Docker in the ML domain: use and throw, change versions as needed, keep multiple conflicting versions side by side, and so on.
In the post Further tweaks to improve ML training experience, I cover how to use the GPU fully for ML training.