GPU for Machine Learning (CUDA, cuDNN and tensorflow-gpu)

Omkar Manohar Dalvi
6 min read · Jul 5, 2023


Are you tired of seeing your CPU maxed out while training a machine learning model?

Like this for example 👇

CPU Utilization

Well, Nvidia has a solution for you 👍

Complex models require a decent amount of computing power. By default the CPU does this job unless you enable your GPU, which I will show you how to do. A CPU takes a long time to train a model, whereas a GPU will not only save you a lot of time but also leave the CPU free for other work. Long training runs can also wear down the CPU, and people tend to replace their graphics cards far more often than their CPUs.

Let's start.

1. Check whether your GPU supports CUDA.

Here’s the link to check: https://developer.nvidia.com/cuda-gpus

Under CUDA-Enabled GeForce and TITAN Products.

The list is quite long, so hopefully you will find your GPU there.
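
If your Nvidia driver is already installed, you can also check your GPU model from the command line with nvidia-smi (the tool ships with the driver). The header of its output shows the GPU name, the driver version, and the highest CUDA version that driver supports (which is not necessarily the toolkit version you have installed):

nvidia-smi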

2. Microsoft Visual Studio.

Here’s the link to download: https://visualstudio.microsoft.com/downloads/

Download the Community edition; the Professional or Enterprise editions your organization may provide will work just as well.

We only need the Visual Studio core editor, so uncheck all the workload boxes; but if you are working in a specific environment, go ahead and check the workloads you need.

Install.

3. Use the correct versions of TensorFlow, Python, CUDA and cuDNN.

Very Important

Note that you don't have to download anything in this step; it is only here to point you to the correct versions of each requirement.

Here’s the link: https://www.tensorflow.org/install/source_windows

Under GPU.

Different projects may require different TensorFlow versions, since functionality available in one version may be deprecated in another.

For your GPU to work you need matching versions of CUDA, cuDNN, TensorFlow and Python. For example, if I want tensorflow-2.7, I need CUDA 11.2, cuDNN 8.1 and a Python version between 3.7 and 3.9.
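
For reference, the relevant rows of the Windows GPU table looked roughly like this when I set things up (always confirm against the page linked above, since it changes with every release):

Version                  Python     cuDNN   CUDA
tensorflow_gpu-2.10.0    3.7-3.10   8.1     11.2
tensorflow_gpu-2.9.0     3.7-3.10   8.1     11.2
tensorflow_gpu-2.7.0     3.7-3.9    8.1     11.2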

4. Nvidia CUDA toolkit.

Here’s the link to install CUDA toolkit: https://developer.nvidia.com/cuda-toolkit-archive

Archive of versions

Please note that TensorFlow has not caught up with CUDA Toolkit 12.2.0 as of today. As you can see in the compatibility table, the latest tensorflow-gpu==2.10 requires CUDA 11.2, so any CUDA version newer than 11.2 will not enable TensorFlow with CUDA.

In this example I will download the latest CUDA 11.2 update, because my project requires tensorflow-gpu==2.9.

Link for CUDA 11.2 update 2: https://developer.nvidia.com/cuda-11.2.2-download-archive

Download and Install.
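
Once the installer finishes, you can confirm the toolkit is reachable by opening a new command prompt (this assumes a default installation):

nvcc --version

The output should report release 11.2.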

5. Nvidia cuDNN.

Here’s the link for cuDNN : https://developer.nvidia.com/rdp/cudnn-archive

For CUDA 11.2 I am downloading cuDNN 8.1.1 (the February 2021 release) because of its compatibility.

For my setup I downloaded the Windows zip, i.e. “cuDNN Library for Windows (x86)”.

Download.

6. cuDNN Setup.

Extract the cuDNN zip file to your C drive.

Copy the bin, include and lib folders from the extracted location.

Paste them into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2

If the system asks to copy and replace, agree to it.

Edit the Path environment variable for your user account and add the bin, libnvvp and lib paths from the CUDA installation folder (not the extracted zip folder). You can do the same for the system variables, but it usually isn't necessary, because the CUDA installer adds those paths to the system variables automatically.
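
On a default installation, the entries to add typically look like this (adjust the version folder if yours differs):

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\libnvvp
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\lib\x64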

Restart your PC for these changes to take effect.

7. Environment.

Open the Anaconda or Miniconda command prompt.

Create an environment using conda.

conda create --name <env_name> python=<version>

After the environment has been created, activate it with conda activate <env_name> and install a TensorFlow version that matches the compatibility table:

pip install tensorflow-gpu==<version>
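
For example, for my tensorflow-gpu==2.9 setup (the environment name tf_gpu is simply my choice):

conda create --name tf_gpu python=3.9
conda activate tf_gpu
pip install tensorflow-gpu==2.9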

8. Verification.

Open the Python interpreter inside the activated environment and type the following.

import tensorflow as tf

If you get an error here, your TensorFlow installation did not go right.

Now type.

print(tf.test.is_gpu_available())
print(tf.test.is_built_with_cuda())

Both statements should return True.

As below.

Now your GPU is enabled with CUDA.
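
Note that tf.test.is_gpu_available() is deprecated in recent TensorFlow releases; a minimal equivalent check with the current API looks like this:

import tensorflow as tf

# An empty list means TensorFlow cannot see any GPU.
print(tf.config.list_physical_devices("GPU"))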

9. How to use.

In order to use the environment where the GPU is enabled, you need to select that environment as the interpreter in your IDE. For example, in PyCharm, as shown below.

The environment we created lives locally, so we need to point the IDE at it.

PyCharm lists the available environments directly, so locate the environment, select it and create the project.

All conda environments live under C:\Users\<username>\miniconda3\envs

For any other IDE, just point its interpreter setting at the python executable inside that environment folder.
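
If you are not using an IDE at all, you can also run a script directly from the activated environment (your_script.py is just a placeholder for your own file):

conda activate tf_gpu
python your_script.py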

10. Results.

For testing, run this code.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import cifar10

# Let TensorFlow allocate GPU memory on demand instead of grabbing it all at once.
physical_devices = tf.config.list_physical_devices("GPU")
if physical_devices:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

# Load CIFAR-10 and scale pixel values to the [0, 1] range.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# A simple Sequential CNN, kept for reference; the functional model below is the one trained.
model = keras.Sequential(
    [
        keras.Input(shape=(32, 32, 3)),
        layers.Conv2D(32, 3, padding="valid", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(10),
    ]
)


def my_model():
    # The same CNN built with the functional API, with batch normalization added.
    inputs = keras.Input(shape=(32, 32, 3))
    x = layers.Conv2D(32, 3)(inputs)
    x = layers.BatchNormalization()(x)
    x = keras.activations.relu(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3)(x)
    x = layers.BatchNormalization()(x)
    x = keras.activations.relu(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(128, 3)(x)
    x = layers.BatchNormalization()(x)
    x = keras.activations.relu(x)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(10)(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model


model = my_model()
model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(learning_rate=3e-4),
    metrics=["accuracy"],
)

model.fit(x_train, y_train, batch_size=64, epochs=10, verbose=2)
model.evaluate(x_test, y_test, batch_size=64, verbose=2)

When you run the code, TensorFlow's startup log (usually printed in red) will mention your GPU.

In Task Manager, you will see your GPU being utilized.

My test script is fairly simple, so utilization stays modest; more complex models will push the GPU harder.
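
If you want TensorFlow to state explicitly which device each operation lands on, you can switch on device placement logging before creating any tensors (a minimal sketch):

import tensorflow as tf

# Log the device (CPU or GPU) chosen for every operation.
tf.debugging.set_log_device_placement(True)

# Ops created after this point report their placement, e.g. this matrix multiply.
a = tf.random.normal((1000, 1000))
b = tf.random.normal((1000, 1000))
c = tf.matmul(a, b)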

If either statement returns False, here are the likely reasons:

  1. Your CUDA installation was not correct, meaning the CUDA, cuDNN, Python and tensorflow-gpu versions are not compatible with each other. Compare the table at https://www.tensorflow.org/install/source_windows against the versions you actually installed (see the snippet after this list).
  2. Check the environment variables: the important .dll files live in the paths mentioned earlier, and if the paths are set incorrectly your system won't be able to locate them.
  3. Update your GPU drivers through Nvidia GeForce Experience.
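
To see which CUDA and cuDNN versions your installed TensorFlow build actually expects, a minimal sketch (assuming a tensorflow-gpu 2.x build):

import tensorflow as tf

# Prints a dictionary that includes keys such as cuda_version and cudnn_version.
print(tf.sysconfig.get_build_info())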

CUDA 12 and TensorFlow.

As mentioned earlier, TensorFlow hasn't caught up with CUDA 12 yet, but workarounds are possible; here are some links that can help:

  1. https://www.reddit.com/r/tensorflow/comments/11mlpji/does_tensorflow_not_work_with_cuda_120/
  2. https://github.com/tensorflow/tensorflow/issues/60206#issuecomment-1496626140
  3. https://github.com/tensorflow/tensorflow/issues/59413
