The evolution of a GPU: from gaming to computing
The above image shows how the computing power of CPUs and GPUs has increased over time and highlights how the computational capabilities of GPUs have grown faster than those of CPUs.
Today, the number of floating-point operations (FLOP) per second (FLOPS) performed by a GPU can easily reach tens of teraFLOPS, namely, tens of thousands of billions of FLOPS, something inconceivable 15 years ago. But how did it come to this?
CPUs and GPUs have evolved their computing capabilities along different lines.
Up to the 2000s, CPUs increased their computing power by increasing the clock speed. This was possible only up to a certain point, since faster clocks mean more heat to dissipate. Moreover, the laws of physics do not permit arbitrarily fast clocks. For these reasons, starting from the 2000s, it became preferable to increase parallelism with multicore solutions instead of speeding up clocks.
GPUs, by contrast, have mostly tied their fortunes to gaming. Initially, game graphics were 2D and iconic: a picture made of sprites only had to suggest what it was meant to look like, and the mind filled in the remaining information. A simple CPU could do the job. Gradually, the desire to make graphics more realistic increased the computational demands, and dedicated graphics accelerators became necessary, taking that role away from the CPU.
The first chip marketed as a GPU was released by NVIDIA in 1999 under the name GeForce 256. Today, powerful GPUs permit full 3D rendering in which the images are viewed from a locked perspective and the laws of physics are computed to make games hyperrealistic.
GPUs must be capable of shooting millions and millions of rays from a given source and of testing whether they intersect the discretization elements (primitives) of the scene, typically triangles. If a ray intersects a primitive, the primitive is seen from the camera. In addition, if a ball bounces in a game scene, the physical laws governing the elastic bounce must be solved. All these computational needs have made GPUs true desktop supercomputers.
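To make the ray-primitive test described above concrete, here is a minimal sketch of one common way to check a single ray against a single triangle, the Möller-Trumbore algorithm. The text does not say which algorithm GPUs or game engines actually use, so this choice is an assumption, and the function name `ray_hits_triangle` and the helper functions are hypothetical. A GPU would run a test of this kind for many rays in parallel; the sketch runs it once on the CPU.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    """Scalar product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Vector product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_hits_triangle(origin: Vec3, direction: Vec3,
                      v0: Vec3, v1: Vec3, v2: Vec3,
                      eps: float = 1e-9) -> Optional[float]:
    """Moller-Trumbore test (hypothetical helper, not from the article).

    Returns the parameter t such that origin + t * direction lies inside
    the triangle (v0, v1, v2), or None if the ray misses it.
    """
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle plane
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > eps else None  # hits behind the origin do not count


if __name__ == "__main__":
    triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    # A ray shot straight down onto the triangle, and one that passes beside it.
    print(ray_hits_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *triangle))
    print(ray_hits_triangle((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *triangle))
```

Each ray is independent of all the others, which is why this kind of workload maps so naturally onto hardware with many cores.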
Nowadays, commercial CPUs offer tens of cores, while GPUs offer up to thousands of cores.
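As a rough illustration of how core counts translate into FLOPS, a theoretical peak can be estimated as cores times clock frequency times floating-point operations per core per cycle. The sketch below uses made-up, purely hypothetical device figures (they do not describe any specific product), and the function name `peak_flops` is likewise an assumption.

```python
def peak_flops(cores: int, clock_hz: float, flop_per_cycle: int) -> float:
    """Theoretical peak throughput in FLOPS: cores x clock x FLOP per cycle."""
    return cores * clock_hz * flop_per_cycle


# Hypothetical figures, chosen only to illustrate orders of magnitude.
cpu_peak = peak_flops(cores=16, clock_hz=3.0e9, flop_per_cycle=16)
gpu_peak = peak_flops(cores=5000, clock_hz=1.5e9, flop_per_cycle=2)

if __name__ == "__main__":
    print(f"CPU peak: {cpu_peak / 1e12:.3f} TFLOPS")
    print(f"GPU peak: {gpu_peak / 1e12:.3f} TFLOPS")
```

A peak of this kind is an upper bound: real applications reach only a fraction of it, depending on memory bandwidth and on how well the problem parallelizes.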
With off-the-shelf supercomputers at hand, the idea of using them to accelerate scientific computing applications was born, giving rise to the era of General Purpose GPU (GPGPU) computing.
However, in the absence of a programming language that simplified the use of GPUs for scientific computing, the first attempts were carried out by a few willing pioneers who used DirectX and OpenGL, namely, the same APIs used for graphics and gaming, for their scientific computing purposes. It must be said that this approach to GPGPU remained the preserve of a small niche.
In October 2004, Stanford University released BrookGPU, a compiled programming language based on ANSI C and conceived to operate on GPUs like those produced by ATI and NVIDIA at that time. The idea was to hide the details of DirectX and OpenGL to simplify GPU programming. Although BrookGPU had the undoubted merit of popularizing GPGPU, its main limitation was compatibility. GPU driver optimizations and updates, while a good thing for gamers, could break Brook’s compatibility overnight, and this hindered its use for developing industrial-quality code intended for deployment. For some time, BrookGPU remained the niche of curious researchers and programmers.
Three years later, on June 23, 2007, CUDA (Compute Unified Device Architecture) was released. CUDA is a hardware architecture and programming model developed by NVIDIA, initially based on ANSI C but today fully compatible with C++. CUDA enables the parallel programming of GPUs on NVIDIA graphics cards and today, 14 years after its appearance, it has become a standard of GPGPU computing. CUDA hides many graphics operations within functions that are simple to invoke, and it permits the development of high-quality scientific code and of deployable industrial software. Being proprietary to NVIDIA, CUDA is limited to NVIDIA cards.
In November 2008, OpenCL (Open Computing Language), a framework based on ANSI C and C++, was also released. It enables parallel programming on a host of platforms, including multicore CPUs and GPUs. From the point of view of GPGPU, it allows the execution of parallel code on graphics processors from different vendors, including those developed by AMD and NVIDIA, enabling software portability.
Finally, the above image shows the Top500 list of the 10 best supercomputers in the world. The list shows how GPUs are catching on in large-scale supercomputing. Although in past years only a few large-scale supercomputers were based on GPUs, as of November 2020 we have reached 5 out of 10. The list also shows how many cores they have (from hundreds of thousands to millions, impressive!), their maximum achieved performance Rmax on Linpack routines, their theoretical peak performance Rpeak, and the power consumed.