GP-GPU Computing via C#
Almost every year, the number of CPU cores in our devices grows to improve overall performance and user experience. An eight-core phone is nothing special nowadays. However, there is another kind of programmable unit that most programmers ignore. It has a multitude of computing cores, hundreds of them: the GPU, or graphics processing unit, which is responsible for drawing the user interface and handling 3D graphics in games. From the very beginning, it was a highly specialized device designed just for transforming and rendering the data it was given, and data flowed only one way: from CPU to GPU.
However, since the arrival of Nvidia CUDA (Compute Unified Device Architecture) in 2007 and OpenCL (Open Computing Language) in 2009, graphics processing units have become accessible for general-purpose, bidirectional computations (called General-Purpose GPU Programming or simply GPGPU).
Nick Mozgovoy, a .NET developer, sees access to the huge computational power of hundreds of GPU cores as a great opportunity, and here he examines the current state of the art in his domain.
First of all, what are CUDA and OpenCL, and what are the differences between them?
In general, they are APIs that allow a programmer to perform a specific set of computations on a GPU (or even on more exotic devices such as FPGAs). This means that instead of rendering the result on a display, the GPU returns it to the API caller.
There are considerable differences between the two technologies.
Firstly, CUDA is a proprietary framework developed and supported only by Nvidia, while OpenCL is an open standard rather than a complete solution or concrete implementation. Therefore, CUDA is available only on Nvidia devices, while any manufacturer may support OpenCL (by the way, Nvidia chips support it as well).
Secondly, CUDA is a GPU-specific technology (at least for now), while the OpenCL interface may be implemented by various kinds of devices (CPU, GPU, FPGA, DSP, etc.).
These differences have obvious consequences:
- CUDA tends to be slightly faster than OpenCL on Nvidia chips;
- With a single manufacturer (Nvidia), you can rely on the CUDA documentation being consistent with the implementation, which is not necessarily the case with OpenCL;
- OpenCL is the only way to go if you have to support hardware other than Nvidia chips.
HOW IT WORKS
Image: CUDA processing flow
Let us describe how GPGPU works using the scheme shown in Image 2:
- Prepare the data to be processed in RAM
- Copy the data to be processed into video RAM
- Instruct GPU to process the data
- Execute in parallel on each core
- Copy the result back to RAM
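The five steps above can be sketched without any GPU at all. The following is only a CPU-side simulation in Python, not CUDA or OpenCL code: the names `kernel` and `run_on_device` are hypothetical, a plain list stands in for video RAM, and a thread pool stands in for the GPU cores. It is meant to show the shape of the flow (copy in, run the same small function once per element, copy out), not real device performance.

```python
from concurrent.futures import ThreadPoolExecutor


def kernel(index, device_in, device_out):
    """Work done by one simulated core: handle exactly one element."""
    device_out[index] = device_in[index] * 2.0


def run_on_device(host_data, cores=4):
    # Step 1: the input has already been prepared in ordinary RAM (host_data).

    # Step 2: copy it into a separate buffer that stands in for video RAM.
    device_in = list(host_data)
    device_out = [0.0] * len(device_in)

    # Steps 3 and 4: launch the kernel once per element. The thread pool
    # plays the role of the GPU cores, all running the same function.
    with ThreadPoolExecutor(max_workers=cores) as pool:
        list(pool.map(lambda i: kernel(i, device_in, device_out),
                      range(len(device_in))))

    # Step 5: copy the result back into host memory.
    return list(device_out)
```

Calling `run_on_device([1.0, 2.0, 3.0])` returns `[2.0, 4.0, 6.0]` and leaves the input list untouched. Note that the simulated kernel only ever reads and writes the "device" buffers, which mirrors the fact that the data has to be copied explicitly in both directions.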
It should be noted that general-purpose GPU computations of this kind are subject to some restrictions:
- they cannot perform any I/O;
- they cannot directly reference data in the computer's main memory.
Even though the general scheme looks simple, the computation model and API are not that intuitive, especially since the native API itself is available only in C and C++.
I think this is the main reason why GPGPU programming is not really widespread yet.
GPGPU ON .NET PLATFORM
There is no native support for GPU programming on the .NET platform yet, so we have to rely on third-party solutions. Moreover, there are not many options to choose from, so let us briefly review the available alternatives among actively developed projects. Interestingly, most of them focus on Nvidia CUDA rather than OpenCL.