Learn CUDA

Javier Abellán Abenza
Published in Neurosapiens · 3 min read · Jan 18, 2019

CPU vs GPU

The difference between the CPU and the GPU is that the GPU is specialized for compute-intensive, highly parallel computation. It is therefore designed so that more transistors are devoted to data processing rather than to data caching and flow control.

More specifically, the GPU is especially well suited to problems that can be expressed as data-parallel computations: the same program is executed on many data elements in parallel.
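
To make that concrete, here is a minimal sketch (not taken from the article; the names vecAdd, a, b, c and n are illustrative) of vector addition written both ways. The CPU version walks the elements one by one; the CUDA kernel runs the same body once per thread, one element per thread. The __global__ keyword and the thread-index variables are explained in the next sections.

    // Sequential CPU version: one element per loop iteration.
    void vecAddCPU(const float *a, const float *b, float *c, int n) {
        for (int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }

    // Data-parallel CUDA version: the same body runs once in each of N threads,
    // and each thread handles exactly one element.
    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique index of this thread
        if (i < n)                                      // guard threads past the end of the data
            c[i] = a[i] + b[i];
    }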

CUDA terminology

  • Host: the CPU and its RAM
  • Device: the GPU and its RAM
  • Kernels: special functions that can be called from host code (regular C code running on the CPU) but run on the device (GPU) N times in parallel, executed by N CUDA threads (a minimal launch sketch follows this list).
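
As a sketch of how these three pieces fit together, the host code below allocates device memory, copies the inputs from host RAM to device RAM, launches the kernel once per element, and copies the result back. It assumes the vecAdd kernel from the earlier sketch lives in the same .cu file; error checking is omitted for brevity.

    #include <cstdlib>
    #include <cuda_runtime.h>

    int main() {
        const int n = 1 << 20;                    // one million elements
        const size_t bytes = n * sizeof(float);

        // Host (CPU) memory
        float *h_a = (float *)malloc(bytes);
        float *h_b = (float *)malloc(bytes);
        float *h_c = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

        // Device (GPU) memory
        float *d_a, *d_b, *d_c;
        cudaMalloc((void **)&d_a, bytes);
        cudaMalloc((void **)&d_b, bytes);
        cudaMalloc((void **)&d_c, bytes);
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

        // Launch the kernel from host code: enough 256-thread blocks to cover n elements.
        const int threadsPerBlock = 256;
        const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        vecAdd<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);

        // Copy the result back from device RAM to host RAM.
        cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        free(h_a); free(h_b); free(h_c);
        return 0;
    }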

Kernels are executed N times in separate threads, which are, for convenience, grouped into a hierarchy of blocks and grids:

  • Thread block: the CUDA programming model follows a well-defined thread hierarchy in which threads are grouped during execution into so-called thread blocks. Thread blocks can be one-, two- or three-dimensional, which maps very naturally to vectors, matrices and volumes.
  • __global__: special CUDA C keyword used in a function signature to mark that function as a kernel (see the sketch after this list).
  • __host__: functions marked with this keyword can only be called from host code and will run on the host.
  • __device__: functions marked with this keyword can only be called from device code and will run on the device.
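
Here is a minimal sketch showing the three qualifiers side by side (the names square, squareAll and launchSquareAll are illustrative, not from the article). Note that __host__ is also what a plain, unqualified C function gets by default.

    // __device__: callable only from device code, runs on the device.
    __device__ float square(float x) {
        return x * x;
    }

    // __global__: a kernel, callable from host code, runs on the device.
    __global__ void squareAll(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index within the grid
        if (i < n)
            data[i] = square(data[i]);                  // calling a __device__ helper
    }

    // __host__: callable only from host code, runs on the host
    // (this is also the default when no qualifier is written).
    __host__ void launchSquareAll(float *d_data, int n) {
        const int threadsPerBlock = 128;                // a one-dimensional thread block;
                                                        // dim3 block(16, 16) would make it two-dimensional
        const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        squareAll<<<blocks, threadsPerBlock>>>(d_data, n);
    }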

CUDA Automatic Scalability

A CUDA program can run on every Nvidia card independently of its number of Streaming Multiprocessors (SMs). A multithreaded program is partitioned into blocks of threads that execute independently of each other, so a GPU with more multiprocessors will automatically execute the program in less time than a GPU with fewer multiprocessors.

3 key abstractions

  • hierarchy of thread groups
  • shared memories
  • barrier synchronization (all three are combined in the sketch below)
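
The sketch below combines the three abstractions in a block-level sum reduction, a standard CUDA pattern (the names blockSum, in and partial are illustrative). Each thread block cooperates through __shared__ memory and waits at __syncthreads() barriers so that no thread reads a value before another thread has written it.

    #define BLOCK_SIZE 256

    // Each thread block reduces BLOCK_SIZE consecutive input elements to one partial sum.
    // Launch with BLOCK_SIZE threads per block.
    __global__ void blockSum(const float *in, float *partial, int n) {
        __shared__ float cache[BLOCK_SIZE];             // shared memory, visible to the whole block

        int i = blockIdx.x * blockDim.x + threadIdx.x;  // thread hierarchy: grid -> block -> thread
        cache[threadIdx.x] = (i < n) ? in[i] : 0.0f;
        __syncthreads();                                // barrier: wait until every thread has stored its value

        // Tree reduction inside the block, halving the number of active threads each step.
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (threadIdx.x < stride)
                cache[threadIdx.x] += cache[threadIdx.x + stride];
            __syncthreads();                            // barrier before the next step reads updated values
        }

        if (threadIdx.x == 0)
            partial[blockIdx.x] = cache[0];             // one partial sum per thread block
    }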
