An Introduction to GPU Programming With Python

Nathaniel Compton
IllumiDesk
Nov 20, 2017 · 3 min read

“We are living in a world awash in data. Accelerated interconnectivity, driven by the proliferation of internet-connected devices, has led to an explosion of data — big data. A race is now underway to develop new technologies and implement innovative methods that can handle the volume, variety, velocity, and veracity of big data and apply it smartly to provide decisive advantage and help solve major challenges facing companies and governments.”

Big Data: A Twenty-First Century Arms Race

As individuals and companies traverse the world of Big Data, it has become clear that one of the most valuable assets in the field is the power to program and develop as rapidly as possible. For many organizations, the ability to quickly prototype and ship a minimum viable product, and specifically the ability to do it before competitors, has been the difference between future success and failure.

GPU programming is a prime example of this kind of time and resource-saving tool. Whether you’re organizing mountains of weather data, modeling a bitcoin price predictor, or processing images for a classifier, the astounding benefits of GPU processing are already highly praised and well documented.

The barrier to entry for GPU programming remains a hurdle for many, however. Oftentimes, developers with Python experience in machine learning, deep learning, etc. find themselves stuck learning C++ or CUDA before they can even integrate a GPU into their workflow. That is, until relatively recently.

Several Python-based GPU programming libraries have recently been released to the public, and it is worth a second look at how and why many developers are now making this shift. To better understand these concepts, let’s dig into an example of GPU programming with PyCUDA, a library that wraps Nvidia’s CUDA API in Python.

#!/usr/bin/env python
import numpy
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# SourceModule compiles C code for CUDA
mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
""")

multiply_them = mod.get_function("multiply_them")

a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)
dest = numpy.zeros_like(a)

# Perform the computation
multiply_them(
    drv.Out(dest), drv.In(a), drv.In(b),
    block=(400, 1, 1), grid=(1, 1))

print(dest - a * b)

Upon running the code, you will simply see an array of zeros, confirming that the GPU result matches the CPU product. Here’s what’s happening in the background:

  • The Python library compiles the CUDA C source code and uploads it to the GPU
  • PyCUDA automatically allocates space on the device, copies the numpy arrays a and b over, launches a 400x1x1 single-block grid, and copies dest back
  • PyCUDA infers what type of cleanup is needed and performs it automatically

Additionally, Python libraries like PyCUDA are beneficial for several reasons:

  • Abstractions in Python make implementing CUDA easier and more convenient than before
  • They take advantage of the speed of the C++ running under the hood
  • CUDA’s complete feature set remains accessible through the Python wrapper
  • Garbage collection and exception handling work through familiar Python mechanisms

We recently announced the adoption of GPU architecture here at IllumiDesk. The improvements in modeling efficiency and speed have been tremendous. Hopefully, this example of accessing the power of GPU programming through Python will be a jumping-off point for your own projects.


Originally published at https://blog.illumidesk.com.
