🎵 "Make friends, make friends, never, never break friends..." 🎵
Personally, I cannot stand a barrage of TensorFlow warning messages (W) or, even worse, error alerts (E) related to GPU devices every time I run my AI projects with this library. This article is about an issue many data scientists encounter: the challenging compatibility between TensorFlow and the CUDA tools. Specifically, I want to address and resolve the following error, warning, and info messages thrown during TensorFlow initialization:
###-------------------------------1------------------------------------
E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
###-------------------------------2------------------------------------
W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
###-------------------------------3------------------------------------
I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
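Each of these lines starts with a severity letter (I, W, or E), followed by the source file and line number that produced the message. As a minimal sketch, assuming the layout shown above (an optional timestamp, then the letter, then file:line]), the lines can be tallied by severity. The helper name parse_tf_log_line is my own and not part of TensorFlow.

```python
import re
from collections import Counter

# Assumed layout: optional timestamp, severity letter, "source:line]", message.
LOG_LINE = re.compile(
    r"^(?:\d{4}-\d{2}-\d{2} [\d:.]+: )?"
    r"(?P<level>[IWEF]) (?P<source>\S+?):(?P<line>\d+)\] (?P<message>.*)$"
)


def parse_tf_log_line(line):
    """Return a dict with level, source, line and message, or None if no match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


sample = [
    "E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] "
    "Unable to register cuDNN factory: Attempting to register factory "
    "for plugin cuDNN when one has already been registered",
    "W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] "
    "TF-TRT Warning: Could not find TensorRT",
]

# Count how many messages of each severity appear in the sample.
counts = Counter(parse_tf_log_line(entry)["level"] for entry in sample)
print(counts)
```

Lines that do not follow this layout (for example, the wrapped continuation of the CPU feature message) return None and would need to be filtered out before counting.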
In my last Deep Learning project, I worked with TensorFlow 2.10. After I had navigated through the dense forest of NVIDIA's official documentation for the CUDA Toolkit and cuDNN installations, TensorFlow recognized the GPU device, but I kept receiving warning/info messages about the absence of TensorRT and an unidentified NUMA node. However, I was working on a remote server, and these warnings did not appear to affect the overall process of DL model training, so I decided to ignore them.
Recently, I upgraded my personal PC with an NVIDIA GeForce RTX 3060. My PC runs Ubuntu 22.04.3 LTS, and I have the official NVIDIA drivers for my GPU installed. When I check the status of my GPU using the nvidia-smi command, here's what I observe:
[Image: output of the nvidia-smi command]
Here is a useful command that can verify whether you have a CUDA-capable GPU. If the output shows that your graphics card is from NVIDIA and it is listed in NVIDIA's CUDA GPUs, your GPU is CUDA-capable.
lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 228e (rev a1)
Installing TensorFlow latest version following the official guide
The official TensorFlow installation guide has changed significantly since my last visit. The familiar Anaconda-based instructions are no longer there. Having become accustomed to Conda, I decided to proceed with it nonetheless. This is what I found in the "Install TensorFlow with pip" guide.
At this stage, I took a misstep: I read the requirements for GPU use and was about to install CUDA Toolkit 11.8 and cuDNN 8.6.0, along with a compatible version of TensorRT. However, starting from TensorFlow version 2.14, there's a noteworthy change in the installation process for Linux users:
The tensorflow pip package has a new, optional installation method for Linux that installs necessary Nvidia CUDA libraries through pip. As long as the Nvidia driver is already installed on the system, you may now run pip install tensorflow[and-cuda] to install TensorFlow's Nvidia CUDA library dependencies in the Python environment. Aside from the Nvidia driver, no other pre-existing Nvidia CUDA packages are necessary.
I wanted to install the latest version of TensorFlow (2.15 at the time of writing), so I decided to proceed without manually pre-installing the CUDA Toolkit and cuDNN on my PC.
Moving on to the TensorFlow installation: I prefer using Anaconda for my Python projects due to its convenience. I began by creating a Conda environment based on Python 3.10. Following that, I executed the command from TensorFlow's official guide to install the latest version (2.15), along with its CUDA dependencies.
conda create -n tf-test-1 python=3.10
conda activate tf-test-1
python -m pip install tensorflow[and-cuda]
conda list
Upon inspecting the packages listed by conda list (the last command in the snippet above), I noticed that CUDA version 12.2, cuDNN 8.9.4, and TensorRT version 8.6.1 were automatically installed.
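The same inspection can be done from Python instead of scanning the conda list output by eye. This is a minimal sketch using only the standard library. The prefixes are my assumption about how the pip-installed NVIDIA packages are named, so adjust them if your environment differs.

```python
from importlib import metadata


def find_packages(prefixes=("nvidia-", "tensorrt", "tensorflow")):
    """Map installed distribution names matching the prefixes to their versions."""
    found = {}
    for dist in metadata.distributions():
        # Some broken installs have no Name field, so fall back to an empty string.
        name = (dist.metadata.get("Name") or "").lower()
        if name.startswith(prefixes):
            found[name] = dist.version
    return found


for name, version in sorted(find_packages().items()):
    print(f"{name}=={version}")
```

Run it inside the activated Conda environment. It only sees packages visible to that environment's Python interpreter.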
Here I want to mention one thing: the CUDA version displayed in the nvidia-smi output matched the version installed from the PyPI repository. But this is not always the case! The CUDA version shown in nvidia-smi and the CUDA Toolkit version are two distinct things. For further clarity, refer to the diagram below and read more about CUDA compatibility in NVIDIA's documentation.
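The version nvidia-smi reports is the highest CUDA version the installed driver supports, while the toolkit version is what the libraries in the environment were built for. As a rough rule of thumb, the toolkit should not be newer than what the driver reports. The sketch below encodes only that simplified rule. It deliberately ignores NVIDIA's minor-version and forward-compatibility options, and the function names are my own.

```python
def parse_version(text):
    """Turn '12.2' into (12, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def toolkit_fits_driver(driver_cuda, toolkit_cuda):
    """True when the toolkit is not newer than the CUDA version the driver reports."""
    return parse_version(toolkit_cuda) <= parse_version(driver_cuda)


print(toolkit_fits_driver("12.2", "12.2"))  # True: my setup, versions match
print(toolkit_fits_driver("12.2", "11.8"))  # True: older toolkit, newer driver
print(toolkit_fits_driver("11.8", "12.2"))  # False: toolkit newer than driver
```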
NB! Installing CUDA and cuDNN with the pip package manager makes them reside inside the Conda environment only. When I run the nvcc -V command globally, I see this: command 'nvcc' not found, but can be installed with sudo apt install nvidia-cuda-toolkit.
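A quick way to see which NVIDIA command-line tools the current shell can reach is to look them up on PATH. This is a small standard-library sketch. The helper name locate_tools is hypothetical.

```python
import shutil


def locate_tools(names=("nvidia-smi", "nvcc")):
    """Return the resolved path of each tool on PATH, or None when it is missing."""
    return {name: shutil.which(name) for name in names}


for tool, path in locate_tools().items():
    print(f"{tool}: {path or 'not found on PATH'}")
```

In a setup like mine, I would expect nvidia-smi to resolve (it ships with the driver) and nvcc to be missing, since the pip packages provide runtime libraries rather than a system-wide compiler.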
Finally, to assess how well TensorFlow 2.15 works with this setup, I ran the test code suggested in TensorFlow's official pip installation guide. Let's see the output:
(tf-test-1) alpony@alpony:~$ python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2023-12-24 14:10:34.169720: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-24 14:10:34.190123: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-24 14:10:34.190150: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-24 14:10:34.190697: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-12-24 14:10:34.193989: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-24 14:10:34.541756: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-12-24 14:10:34.771530: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-12-24 14:10:34.791450: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-12-24 14:10:34.791543: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Bam! Issues with registering cuDNN, cuFFT, and cuBLAS, and a glaring absence of TensorRT! However, it turns out that I am not alone in facing these problems.
Honestly, it was exactly this StackOverflow post that motivated me to write this article. After spending countless hours debugging and scouring the TensorFlow and NVIDIA forums and StackOverflow, I felt compelled to share my findings and a viable solution.
One interesting point raised in the discussion in the Issues section of TensorFlow's GitHub repository was that, with errors of this kind (unregistered cuDNN, cuFFT, cuBLAS), TensorFlow will run without cuDNN. To put this to the test, I ran commands from the TensorFlow test module:
The outputs of these test commands confirmed that TensorFlow was indeed utilizing CUDA, but the warnings about missing TensorRT were accurate: when I tried to check the version of the loaded tensorrt library, the Python session was aborted:
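Because an import that aborts takes the whole interpreter down with it, a check like this is safer in a child process. The sketch below is one way to do that with the standard library. The helper probe_module_version is my own and is not what I ran originally.

```python
import subprocess
import sys


def probe_module_version(module_name, timeout=60):
    """Import a module in a child interpreter and return its version string.

    Returns None when the import fails, times out, or the child process aborts.
    """
    code = (
        "import importlib; "
        f"m = importlib.import_module({module_name!r}); "
        "print(getattr(m, '__version__', 'unknown'))"
    )
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        # A non-zero code covers both ImportError and a hard abort of the child.
        return None
    lines = result.stdout.strip().splitlines()
    return lines[-1] if lines else None


print(probe_module_version("tensorrt"))
```

If the child crashes, the parent session survives and simply gets None back.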
Installing TensorFlow 2.13 and CUDA libraries manually inside Conda environment
Determined to resolve all the issues that TensorFlow throws upon each initialization, I turned to manual installation via Anaconda's Conda repositories, an approach I had used for earlier TensorFlow versions.
1. First, I manually installed the CUDA Toolkit and cuDNN from the conda-forge repository, ensuring compatibility as per NVIDIA's guidelines.
###############################################################################
# The available versions in the conda-forge repository
# (https://conda.anaconda.org/conda-forge/linux-64/):
# cudnn-(7.6.5, 8.0.5, 8.1.0, 8.2.0, 8.2.1, 8.3.2, 8.4.0, 8.4.1, 8.8.0)
# cudatoolkit-(9.2, 10.0, 10.1, 10.2, 11.0, 11.1, 11.2, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8)
###############################################################################
conda create -n tf-env-ultimate python=3.10
conda activate tf-env-ultimate
conda install -c conda-forge cudatoolkit=11.8 cudnn=8.8
NB! Setting LD_LIBRARY_PATH correctly is a crucial step in this process:
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
conda deactivate
2. Next, I installed TensorRT via pip. After reviewing the available versions, I opted for version 8.5.3.1, aligning with the CUDA package mentioned in TensorFlow's code.
##############################################################################
# Available versions of the tensorrt library:
# python -m pip index versions tensorrt
# Available versions: 8.6.1.post1, 8.6.1, 8.6.0, 8.5.3.1, 8.5.2.2, 8.5.1.7
##############################################################################
conda activate tf-env-ultimate
#checking that LD_LIBRARY_PATH is correct
echo $LD_LIBRARY_PATH
python -m pip install tensorrt==8.5.3.1
TENSORRT_PATH=$(dirname $(python -c "import tensorrt;print(tensorrt.__file__)"))
echo $TENSORRT_PATH
#linking tensorrt library files to LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/alpony/anaconda3/envs/tf-env-ultimate/lib/python3.10/site-packages/tensorrt
conda deactivate
3. Finally, I installed the TensorFlow library. I chose TensorFlow 2.13, as versions starting from 2.14 seemed to have the aforementioned issues with CUDA libraries.
conda activate tf-env-ultimate
echo $LD_LIBRARY_PATH
python -m pip install tensorflow==2.13
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
Voila! The issues with registering cuDNN, cuFFT, cuBLAS, and the not-found TensorRT are finally resolved:
(tf-env-ultimate) alpony@alpony:~$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2023-12-24 15:40:53.509643: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-24 15:40:53.531641: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-24 15:40:54.171272: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-12-24 15:40:54.186033: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-12-24 15:40:54.186132: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:995] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
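Since steps 1 and 2 both depend on LD_LIBRARY_PATH pointing at the right directories, it can be worth checking the variable programmatically rather than reading the echo output by eye. This is a minimal sketch under the assumption that the environment's lib directory is the entry you care about. Both helper names are hypothetical.

```python
import os


def library_path_entries(env=None):
    """Split LD_LIBRARY_PATH into a list of non-empty directory entries."""
    env = os.environ if env is None else env
    return [entry for entry in env.get("LD_LIBRARY_PATH", "").split(":") if entry]


def missing_library_dirs(expected, env=None):
    """Return the expected directories that are absent from LD_LIBRARY_PATH."""
    # normpath removes trailing slashes so '/x/lib/' and '/x/lib' compare equal.
    present = {os.path.normpath(entry) for entry in library_path_entries(env)}
    return [d for d in expected if os.path.normpath(d) not in present]


prefix = os.environ.get("CONDA_PREFIX", "")
expected = [os.path.join(prefix, "lib")] if prefix else []
print(missing_library_dirs(expected))
```

An empty list means every expected directory is present. The tensorrt directory from step 2 can be added to expected in the same way.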
But what about the NUMA node information? I found a partial solution detailed in the article "Fixing the 'successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero' problem" by Zukhriddin, which I followed. After implementing these steps, here's the outcome:
(tf-env-ultimate) alpony@alpony:~$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2023-12-24 16:45:10.414244: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-24 16:45:10.436806: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Voila! A couple of information prints (I) and that's all.
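The NUMA message refers to the numa_node file that the kernel exposes for each PCI device in sysfs, as described in the kernel documentation linked in the log line. The read-only sketch below lists that value for NVIDIA devices, identified by the PCI vendor ID 0x10de. It only inspects the current state and does not apply the fix from the referenced article. The function name is my own.

```python
import glob
import os

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID assigned to NVIDIA


def _read(path):
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def nvidia_numa_nodes(sysfs_root="/sys/bus/pci/devices"):
    """Map each NVIDIA PCI device address to its numa_node value."""
    nodes = {}
    for device_dir in sorted(glob.glob(os.path.join(sysfs_root, "*"))):
        if _read(os.path.join(device_dir, "vendor")) != NVIDIA_VENDOR_ID:
            continue
        address = os.path.basename(device_dir)
        nodes[address] = _read(os.path.join(device_dir, "numa_node"))
    return nodes


print(nvidia_numa_nodes())
```

A value of -1 for the GPU's address (01:00.0 in my lspci output) is what triggers the info message. On systems without sysfs the function simply returns an empty dictionary.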
Testing the performance of both Conda environments
Reaching the conclusion of this experience, I reflect on the hours invested in addressing those warnings and ask myself: was it truly worth the effort? To find an answer, I ran a test using a piece of Python code that builds a simple neural network and trains it on random data. The aim was to compare the performance (in time) of two TensorFlow setups: one from the official guide and the other the product of my blood and tears, that is, my manual installation.
Python code (inspired by this code published on StackOverflow):
import numpy as np
import tensorflow as tf
from timeit import default_timer as timer

# Simple fully connected network: two hidden layers and a 10-class softmax output
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(2048, activation='relu'))
model.add(tf.keras.layers.Dense(2048, activation='relu'))
model.add(tf.keras.layers.Dense(10, activation='softmax'))
model.compile(optimizer=tf.compat.v1.train.AdamOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

def load_data():
    data = np.load('data.npy')
    labels = np.load('labels.npy')
    return data, labels

# I used this code separately to generate the random dataset:
# import numpy as np
# seed_value = 42
# def random_one_hot_labels(shape, seed=None):
#     n, n_class = shape
#     np.random.seed(seed)
#     classes = np.random.randint(0, n_class, n)
#     tmp_labels = np.zeros((n, n_class))
#     tmp_labels[np.arange(n), classes] = 1
#     return tmp_labels
# def generate_and_save_data():
#     data = np.random.random((10000, 32))
#     labels = random_one_hot_labels((10000, 10), seed=seed_value)
#     np.save('data.npy', data)
#     np.save('labels.npy', labels)
# generate_and_save_data()

data, labels = load_data()

# Time three consecutive training runs of 100 epochs each
durations = []
for i in range(3):
    start = timer()
    model.fit(data, labels, epochs=100, batch_size=32)
    durations.append(timer() - start)
print(f"model.fit durations: {durations}")
The results, measured in terms of execution time (seconds), were revealing:
- Conda environment (tf-test-1) with TensorFlow 2.15 (automatic installation with CUDA libraries):
model.fit durations: [48.48324873100137, 48.309404147999885, 48.39296498600015]
(tf-test-1) alpony@alpony:~$
- Manually built Conda environment (tf-env-ultimate) with TensorFlow 2.13 and manually selected CUDA libraries:
model.fit durations: [46.51688312400074, 47.9981527480013, 49.01901237399943]
(tf-env-ultimate) alpony@alpony:~$
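To compare the two lists of timings at a glance, they can be reduced to a mean and a standard deviation. The sketch below uses the exact numbers printed above. With only three runs per environment, it is a rough indication rather than a rigorous benchmark.

```python
from statistics import mean, stdev

# Timings copied from the two outputs above (seconds per 100-epoch run).
durations = {
    "tf-test-1 (TensorFlow 2.15)": [
        48.48324873100137, 48.309404147999885, 48.39296498600015,
    ],
    "tf-env-ultimate (TensorFlow 2.13)": [
        46.51688312400074, 47.9981527480013, 49.01901237399943,
    ],
}

summary = {name: (mean(runs), stdev(runs)) for name, runs in durations.items()}
for name, (avg, spread) in summary.items():
    print(f"{name}: mean {avg:.2f} s, stdev {spread:.2f} s")
```

The means differ by roughly half a second, which is smaller than the run-to-run spread of the manual environment.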
The outcomes were closer than I anticipated. It seems that sometimes, allowing minor warnings to pass without deep analysis can be the more prudent approach. Of course, TensorFlow provides some testing tools to ensure its proper functioning (for example, a check whether TensorFlow is built with CUDA).
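As an illustration of those testing tools, the sketch below gathers a few facts in one place: whether the installed TensorFlow was built with CUDA, which CUDA and cuDNN versions it was built against, and which GPUs it can see. The wrapper function is hypothetical. The calls inside it (tf.test.is_built_with_cuda, tf.sysconfig.get_build_info, tf.config.list_physical_devices) are part of TensorFlow's public API, and the version keys may be absent on CPU-only builds.

```python
def tensorflow_gpu_report():
    """Collect basic CUDA facts from TensorFlow, or None if it is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # These keys are only populated for CUDA-enabled builds.
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        "gpus": [device.name for device in tf.config.list_physical_devices("GPU")],
    }


print(tensorflow_gpu_report())
```

If the report shows a CUDA build and a non-empty GPU list, the remaining log messages are likely cosmetic for ordinary training workloads.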
A friend of mine often cautions me against getting too absorbed in every single warning, jokingly suggesting it could lead to spending every waking hour in front of the PC.
By sharing my story, I hope to aid others who may find themselves in a similar situation, demonstrating that sometimes an easy-going approach can be equally effective.