Setting up PyCUDA on Ubuntu 18.04 for GPU programming with Python

Rajneesh Aggarwal
Published in leadkaro
May 14, 2019


Compute Unified Device Architecture (CUDA) is a popular parallel computing platform and programming model developed by NVIDIA. It is supported only on NVIDIA GPUs. Compared with a CPU, a GPU devotes less hardware to control logic and more to data computation, which is what makes it well suited to parallel workloads. CUDA lets a programmer specify which parts of a program execute on the CPU and which parts execute on the GPU.

CUDA programming makes use of GPUs, which have many small, simple processors that can work in parallel. CUDA is relatively easy to use and provides an extensive set of first-party accelerated mathematical and AI-related libraries.

PyCUDA lets you access NVIDIA’s CUDA parallel computation API from Python. Abstractions such as pycuda.compiler.SourceModule and pycuda.gpuarray.GPUArray make CUDA programming even more convenient than with NVIDIA’s C-based runtime. PyCUDA puts the full power of CUDA’s driver API at your disposal, and it also includes code for interoperability with OpenGL.
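As a rough illustration of these two abstractions, the sketch below doubles an array twice: once with a GPUArray expression and once with a small kernel compiled through SourceModule. The function name double_on_gpu and the kernel name doublify are made up for this example. The code assumes a working PyCUDA installation plus NumPy, so it can only be called once the setup described in the rest of this article is complete.

```python
def double_on_gpu(host_values):
    """Return (gpuarray_result, kernel_result), each holding host_values * 2.

    Imports are done lazily so that defining this function does not
    require a GPU; calling it does.
    """
    import numpy as np
    import pycuda.autoinit  # noqa: F401 (creates a CUDA context on import)
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    a = np.asarray(host_values, dtype=np.float32)

    # 1) GPUArray: NumPy-like arithmetic that executes on the device.
    a_gpu = gpuarray.to_gpu(a)
    doubled_by_gpuarray = (2 * a_gpu).get()  # a_gpu itself is unchanged

    # 2) SourceModule: compile a CUDA C kernel at runtime and launch it.
    mod = SourceModule("""
    __global__ void doublify(float *a, int n)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx < n) a[idx] *= 2.0f;
    }
    """)
    doublify = mod.get_function("doublify")
    threads = 128
    blocks = max(1, (a.size + threads - 1) // threads)
    doublify(a_gpu, np.int32(a.size), block=(threads, 1, 1), grid=(blocks, 1))
    doubled_by_kernel = a_gpu.get()  # the kernel modified a_gpu in place

    return doubled_by_gpuarray, doubled_by_kernel
```

On a correctly configured machine, calling double_on_gpu([1.0, 2.0, 3.0]) should return two arrays that both hold the doubled values. The GPUArray route needs no CUDA C at all, while the SourceModule route gives full control over the kernel.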

Setting up PyCUDA development environment

Setting up your Python environment for GPU programming can be a very delicate process. Setting up a development environment for CUDA has the following prerequisites:

* A CUDA-supported GPU
* An NVIDIA graphics card driver
* A standard C compiler
* A CUDA development kit

Since CUDA programming is supported only on NVIDIA GPUs, please check that an NVIDIA graphics card is installed on your machine using either of the following commands:

$ lspci | grep -e "NVIDIA"
or
$ sudo lshw -C video

The above commands should show information about your graphics card. If no NVIDIA graphics card is available, please don’t proceed further.
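The same check can be scripted from Python. The sketch below is one possible way to do it, not part of the original setup: the helper names find_nvidia_devices and detect_nvidia_gpus are made up for illustration, and the match is case-insensitive, unlike the grep command above.

```python
import subprocess


def find_nvidia_devices(lspci_output):
    """Return the lines of lspci output that mention NVIDIA (any letter case)."""
    return [line.strip() for line in lspci_output.splitlines()
            if "nvidia" in line.lower()]


def detect_nvidia_gpus():
    """Run lspci and return the matching lines ([] if lspci cannot be run)."""
    try:
        result = subprocess.run(["lspci"], capture_output=True,
                                text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    return find_nvidia_devices(result.stdout)
```

An empty list from detect_nvidia_gpus() means either that no NVIDIA device was listed or that lspci itself is unavailable, so it is only a first hint.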

If you have an NVIDIA GPU available, first install Anaconda and the NVIDIA driver for Ubuntu as follows:

# Download the Anaconda installer for Linux and run the following to
# install Anaconda for Python 3.7:
bash ~/Downloads/Anaconda3-2019.03-Linux-x86_64.sh
# Continue to follow the instructions of the Anaconda installer. Once
# finished, it will activate conda's base environment on startup
# of the Linux terminal. To stop the base environment from being
# activated on startup, use the following command:
conda config --set auto_activate_base false
# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

# Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-418
# Reboot, then check that the GPUs are visible using the command:
nvidia-smi
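If you prefer to confirm the driver from Python, the sketch below asks nvidia-smi for each GPU's name and driver version in CSV form and parses the result. The helper names parse_gpu_query and query_gpus are hypothetical, and the code assumes nvidia-smi is on your PATH after the reboot.

```python
import subprocess


def parse_gpu_query(csv_output):
    """Parse 'name, driver_version' rows into (name, driver_version) tuples."""
    gpus = []
    for line in csv_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last comma so the GPU name is kept whole.
        name, _, driver = line.rpartition(",")
        gpus.append((name.strip(), driver.strip()))
    return gpus


def query_gpus():
    """Return [(name, driver_version), ...], or [] if nvidia-smi cannot be run."""
    command = ["nvidia-smi", "--query-gpu=name,driver_version",
               "--format=csv,noheader"]
    try:
        result = subprocess.run(command, capture_output=True,
                                text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_query(result.stdout)
```

An empty list from query_gpus() usually means the driver is not loaded yet, in which case rebooting and rerunning nvidia-smi by hand is the next thing to try.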

Once nvidia-smi shows your GPU information, install the following CUDA dependencies:

$ sudo apt-get install build-essential binutils gdb

In the above command, build-essential is the package with the gcc and g++ compilers and other utilities such as make; binutils has some generally useful utilities, such as the ld linker; and gdb is the debugger.
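A quick way to confirm that these tools ended up on your PATH is sketched below. The helper name missing_tools and the BUILD_TOOLS list are assumptions made for this example; the lookup itself uses the standard library's shutil.which.

```python
import shutil

# Executables provided by the packages installed above.
BUILD_TOOLS = ["gcc", "g++", "make", "ld", "gdb"]


def missing_tools(names, which=shutil.which):
    """Return the tool names that cannot be found on PATH.

    The lookup function is a parameter so it can be replaced in tests.
    """
    return [name for name in names if which(name) is None]
```

Calling missing_tools(BUILD_TOOLS) should return an empty list when the apt-get command succeeded; any names it returns point to a package that did not install properly.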

Now install a few additional dependencies that allow you to run some of the graphical (OpenGL) code included with the CUDA Toolkit:

$ sudo apt-get install freeglut3 freeglut3-dev libxi-dev libxmu-dev

Now install nvcc, the command-line CUDA C compiler, which is analogous to the gcc compiler:

$ sudo apt install nvidia-cuda-toolkit

After the package has finished installing, you may have to configure your PATH and LD_LIBRARY_PATH environment variables so that your system can find the binary executables and library files needed for CUDA. The paths below assume the toolkit lives under /usr/local/cuda; adjust them if yours is installed elsewhere.

$ gedit ~/.bashrc
# Add the following at the end of the file:
export PATH="/usr/local/cuda/bin:${PATH}"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:${LD_LIBRARY_PATH}"
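After opening a new terminal, you can check that both variables picked up the CUDA directories. The sketch below is only an illustration: path_contains and cuda_paths_configured are hypothetical helper names, and /usr/local/cuda is assumed as the toolkit location, matching the exports above.

```python
import os


def path_contains(path_value, directory):
    """Return True if directory is one of the entries in a PATH-style string."""
    wanted = os.path.normpath(directory)
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)


def cuda_paths_configured(environ=os.environ, cuda_home="/usr/local/cuda"):
    """Report whether the CUDA bin and lib64 directories are configured."""
    return {
        "PATH": path_contains(environ.get("PATH", ""),
                              os.path.join(cuda_home, "bin")),
        "LD_LIBRARY_PATH": path_contains(environ.get("LD_LIBRARY_PATH", ""),
                                         os.path.join(cuda_home, "lib64")),
    }
```

Both values in the dictionary returned by cuda_paths_configured() should be True once the edited .bashrc has been loaded.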

Verify that the toolkit is installed correctly by running the following command, which prints the version information of the CUDA toolkit compiler:

$ nvcc --version
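If you want to read the toolkit release programmatically, for example to compare it with the driver you installed, the release number can be extracted from the nvcc output. This is a minimal sketch; the function names parse_nvcc_release and installed_nvcc_release are made up for this example, and the parser relies on the "release X.Y" phrase that nvcc prints.

```python
import re
import subprocess


def parse_nvcc_release(version_output):
    """Return (major, minor) from 'nvcc --version' output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_nvcc_release():
    """Run 'nvcc --version' and return (major, minor), or None on failure."""
    try:
        result = subprocess.run(["nvcc", "--version"], capture_output=True,
                                text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_nvcc_release(result.stdout)
```

A return value of None from installed_nvcc_release() suggests that nvcc is not on your PATH, which points back to the environment variables configured earlier.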

Install PyCUDA

Use the following commands to install PyCUDA along with its dependencies:

$ sudo apt-get install build-essential python-dev python-setuptools libboost-python-dev libboost-thread-dev -y
$ pip install pycuda
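Before running a full program, it can help to confirm that pip installed PyCUDA into the interpreter you are actually using, since Anaconda and the system Python keep separate packages. The sketch below uses importlib to look for a top-level module without importing it; the helper name is_installed is an assumption for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None
```

If is_installed("pycuda") returns False, check which pip you ran, for instance by comparing the output of "which pip" and "which python".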

Run the following program to check that everything is set up:

import pycuda
import pycuda.driver as drv

drv.init()

print('CUDA device query (PyCUDA version) \n')
print('Detected {} CUDA Capable device(s) \n'.format(drv.Device.count()))

for i in range(drv.Device.count()):

    gpu_device = drv.Device(i)
    print('Device {}: {}'.format(i, gpu_device.name()))
    compute_capability = float('%d.%d' % gpu_device.compute_capability())
    print('\t Compute Capability: {}'.format(compute_capability))
    print('\t Total Memory: {} megabytes'.format(gpu_device.total_memory()//(1024**2)))

    # The following will give us all remaining device attributes as seen
    # in the original deviceQuery.
    # We set up a dictionary as such so that we can easily index
    # the values using a string descriptor.

    device_attributes_tuples = gpu_device.get_attributes().items()
    device_attributes = {}

    for k, v in device_attributes_tuples:
        device_attributes[str(k)] = v

    num_mp = device_attributes['MULTIPROCESSOR_COUNT']

    # Cores per multiprocessor is not reported by the GPU!
    # We must use a lookup table based on compute capability.
    # See the following:
    # http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities

    cuda_cores_per_mp = {5.0: 128, 5.1: 128, 5.2: 128, 6.0: 64, 6.1: 128, 6.2: 128}.get(compute_capability)

    if cuda_cores_per_mp is None:
        # The table above does not cover this compute capability,
        # so report only the multiprocessor count.
        print('\t ({}) Multiprocessors'.format(num_mp))
    else:
        print('\t ({}) Multiprocessors, ({}) CUDA Cores / Multiprocessor: {} CUDA Cores'.format(num_mp, cuda_cores_per_mp, num_mp*cuda_cores_per_mp))

    device_attributes.pop('MULTIPROCESSOR_COUNT')

    for k in device_attributes.keys():
        print('\t {}: {}'.format(k, device_attributes[k]))

If everything is fine, you should get output very similar to the following:

CUDA device query (PyCUDA version) 

Detected 1 CUDA Capable device(s)

Device 0: GeForce GTX 1060
     Compute Capability: 6.1
     Total Memory: 6078 megabytes
     (10) Multiprocessors, (128) CUDA Cores / Multiprocessor: 1280 CUDA Cores
     ASYNC_ENGINE_COUNT: 2
     CAN_MAP_HOST_MEMORY: 1
     CLOCK_RATE: 1733000
     COMPUTE_CAPABILITY_MAJOR: 6
     COMPUTE_CAPABILITY_MINOR: 1
     COMPUTE_MODE: DEFAULT
     CONCURRENT_KERNELS: 1
     ....
     ....
     TEXTURE_PITCH_ALIGNMENT: 32
     TOTAL_CONSTANT_MEMORY: 65536
     UNIFIED_ADDRESSING: 1
     WARP_SIZE: 32
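The cores-per-multiprocessor table in the script covers only compute capabilities 5.0 to 6.2 and is keyed by a float. One possible alternative, sketched below, keys the same values by the (major, minor) tuple that compute_capability() returns and yields None for devices the table does not know about. The names CUDA_CORES_PER_MP and total_cuda_cores are hypothetical, and the table contains only the values from the script above.

```python
# Values copied from the lookup table in the device query script.
CUDA_CORES_PER_MP = {
    (5, 0): 128, (5, 1): 128, (5, 2): 128,
    (6, 0): 64, (6, 1): 128, (6, 2): 128,
}


def total_cuda_cores(compute_capability, num_mp, table=CUDA_CORES_PER_MP):
    """Return num_mp * cores per multiprocessor, or None if unknown.

    compute_capability is a (major, minor) pair such as (6, 1).
    """
    cores_per_mp = table.get(tuple(compute_capability))
    if cores_per_mp is None:
        return None
    return num_mp * cores_per_mp
```

For the GTX 1060 shown in the output, total_cuda_cores((6, 1), 10) gives 1280, matching the figure printed by the script. Extending the table for newer GPUs means adding entries taken from NVIDIA's compute capability documentation.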

Footnote

Congratulations! You made it to the end. Now, with all the drivers and compilers firmly in place, you can start the actual GPU programming with PyCUDA. Happy coding!

About me

I am a hands-on technology enthusiast with versatile skills. I have worked in highly scalable, agile environments across mobile, web, cloud, enterprise, and AI applications. I am an AWS Cloud certified architect and have successfully migrated many on-premise applications to the AWS cloud. If you need any help, let’s connect on LinkedIn.

Thanks for reading!
