From CPU to XPU: Intel Breaks Down the Barriers with oneAPI

Sajjad Hussain
Dec 23, 2020 · 11 min read

Computing chips can be divided by the kind of calculation they perform into four categories, abbreviated SVMS: scalar, vector, matrix, and spatial. These four types have carved up the entire computing chip market. Scalar calculations are handled by CPUs, vector calculations by GPUs, matrix calculations by ASICs, and spatial calculations by FPGAs. The four segments have long been separate from one another, and developers who have mastered two or more of them are rare. Building a hardware platform for, say, facial image analysis and deep learning therefore usually requires a large team of engineers working together.
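
To make the four workload shapes concrete, here is a minimal sketch in plain Python. It runs on an ordinary CPU and does not target any of the chips named above; the function names are hypothetical and chosen only for illustration. The point is the shape of each computation, not its speed.

```python
# Illustrative only: plain-Python stand-ins for the four workload shapes.
# Nothing here targets real hardware; all function names are hypothetical.

def scalar_add(a, b):
    """Scalar: one operation on one pair of values (CPU-style)."""
    return a + b


def vector_add(xs, ys):
    """Vector: the same operation applied across whole arrays (GPU-style)."""
    return [x + y for x, y in zip(xs, ys)]


def matrix_multiply(a, b):
    """Matrix: dense multiply-accumulate over 2-D blocks (ASIC-style)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def spatial_pipeline(stream, stages):
    """Spatial: data flows through a fixed chain of stages (FPGA-style)."""
    for stage in stages:
        stream = map(stage, stream)
    return list(stream)
```

For example, `spatial_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2])` pushes each value through both stages in order and returns `[4, 6, 8]`.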

Just as physicists pursue a unified theory, computing chip manufacturers keep searching for a complete cross-architecture solution that spans all four chip types and realizes the dream of heterogeneous multi-core processors. This has driven the industry’s top ten manufacturers to keep acquiring other chip makers to fill the gaps in their own portfolios. The most prominent recent examples are NVIDIA’s planned acquisition of ARM and AMD’s planned acquisition of Xilinx.

The road through acquisition and integration is bound to be a winding one, however, and only the manufacturers that get there first will count as the industry’s top players. After completing its acquisition of FPGA maker Altera, Intel wasted no time in launching its Odyssey program, putting true discrete graphics products back on the agenda after a gap of more than 20 years and giving rise to the Xe architecture family.

It would be superficial to think Intel merely hopes to enter the discrete GPU market, where NVIDIA and AMD have long been locked in fierce competition. In fact, the Xe architecture fills Intel’s own gap in vector computing. With it, Intel has secured the last important piece of the heterogeneous computing puzzle: full coverage of all four computing types, scalar (CPU), vector (GPU), matrix (ASIC), and spatial (FPGA).

Full coverage does not mean unification. Intel’s ideal is for a developer to address CPU, GPU, ASIC, and FPGA through a single entry point and a single platform. Three years ago, people would have found such an idea bold to the point of fanciful. However, the oneAPI industry initiative launched in November 2019 brought a oneAPI beta product that packages a complete set of development tools, including compilers, programming libraries, and analyzers, in a unified way. The dream of spanning four types of computing chips has become a reality.

With that, Intel has completed the first step from CPU to XPU. One company spans CPU, GPU, ASIC, and FPGA and deploys and controls them through one platform. This is the XPU+oneAPI concept of ultra-heterogeneous computing, and it once again sets a new benchmark for the industry.

Just recently, Intel held another XPU and software event to push the XPU+oneAPI concept a step further. It not only announced that the oneAPI Gold version will officially ship in December this year, but also launched the Intel Server GPU. Product iteration at this speed is a little dizzying. What changes will the new software and products bring?

oneAPI Gold: platform unification takes one step further

oneAPI Gold is the upgraded version of the oneAPI beta product, that is, oneAPI 1.0. It can be regarded as the real software key that connects Intel’s CPUs, GPUs, ASICs, and FPGAs, and it answers the resounding slogan from the launch of the oneAPI industry initiative: “No transistor left behind.”

oneAPI Gold unifies the software platform mainly through a unified software stack. Simply put, programmers who work in high-level languages can get their work done regardless of the underlying hardware, while developers responsible for low-level hardware optimization can also find the tools they need in oneAPI Gold. Intel has also stated that it will migrate the Intel Parallel Studio XE and Intel System Studio tool suites to oneAPI Gold products.

This means that everything in oneAPI Gold is built around the stack. The layer at the bottom of the stack, closest to the hardware, is divided into functional areas, including scheduler management, communication, device and memory management, tracing, debugging tools, and so on.

On the language side, the core of oneAPI is its direct programming language, Data Parallel C++ (DPC++). DPC++ is a single-source language: developers can write code for CPUs, GPUs, FPGAs, and other hardware accelerators in one source file. It is an open, cross-industry programming language. In practice, DPC++ is C++ with extensions and added SYCL support. It enables data-parallel and heterogeneous programming across CPUs and accelerators, simplifies programming, improves code reuse across different hardware, and still allows tuning for a specific accelerator.
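
The data-parallel idea can be sketched in a few lines. The sketch below is plain Python rather than DPC++, so it only mimics the shape of the model: one kernel function that is run once per index. The `parallel_for` helper and its thread pool are assumptions made for illustration and are not part of oneAPI or SYCL; real DPC++ code would instead submit a C++ kernel to a device queue.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_for(global_size, kernel, workers=4):
    """Run kernel(i) once for every index i in range(global_size).

    Hypothetical helper standing in for a device queue; not a SYCL call.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any exception from a kernel.
        list(pool.map(kernel, range(global_size)))


def add_vectors(a, b):
    """Element-wise a + b, written as a per-index kernel."""
    out = [0] * len(a)

    def kernel(i):
        # Each work-item writes only its own index, so no locking is needed.
        out[i] = a[i] + b[i]

    parallel_for(len(a), kernel)
    return out
```

Calling `add_vectors([1, 2, 3, 4], [10, 20, 30, 40])` returns `[11, 22, 33, 44]`. Python threads will not make this faster; the sketch only shows why a kernel that is independent per index can be handed to whichever device runs it.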

Although the barrier to entry is very low, not everyone wants to rewrite their code to a new specification. For this reason, Intel has also prepared a compiler that supports OpenMP and offers compatibility with standard C++, Python, MXNet, scikit-learn, NumPy, and XGBoost, maintaining continuity with existing code. This is one of the reasons many companies and institutions were able to support DPC++ from the outset.

oneAPI Gold also comes with a strong set of migration tools. Intel hopes to accommodate more XPUs through broad compatibility, even including NVIDIA CUDA code. Sound familiar? As early as 2015, AMD set out to create HIP, a portable interface for heterogeneous computing that lets AMD GPUs take code ported from the prevailing CUDA environment, and developed the HIPify tools, which can automatically convert CUDA code into HIP code.
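
The idea behind this kind of source-to-source conversion can be illustrated with a toy renamer. This is a minimal sketch, not how the HIPify tools are actually implemented: it only rewrites a handful of runtime identifiers whose HIP counterparts share the same suffix, and the `toy_hipify` name and the mapping table are assumptions made for the example.

```python
import re

# A few CUDA runtime names and their HIP counterparts.
# The real tools cover far more of the API than this toy table.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
}

# Word boundaries stop a short name from matching inside a longer identifier.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, CUDA_TO_HIP)) + r")\b")


def toy_hipify(source):
    """Rename known CUDA identifiers in a source string to HIP names."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(1)], source)
```

Given `cudaMalloc(&d, n); cudaFree(d);`, the function returns `hipMalloc(&d, n); hipFree(d);`. Identifiers that merely contain a CUDA name, such as `my_cudaMalloc_wrapper`, are left untouched. A real migration tool also has to deal with kernel launch syntax, headers, and library calls, which a renamer like this does not attempt.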

For oneAPI Gold, CUDA code porting is only a small part of its GPU porting story. In earlier years, porting code from the NVIDIA CUDA environment to AMD meant solving the problem of moving code from wide vector machines to narrow vector machines. Because Intel’s Xe architecture supports variable vector widths, CUDA code can be ported more smoothly to Xe series GPUs.

In other words, the oneAPI Gold platform is built from software and hardware that complement each other. The strengths of the Xe architecture make software migration easier, and unification at the software level lets Intel coordinate heterogeneous multi-core processors. For a new GPU family such as Xe in particular, efficient code migration from CUDA is a very good shortcut to gaining a place in HPC.

oneAPI Gold’s further strength is that it is not limited to the local machine. From launch, it has been supported by the Intel DevCloud cloud platform. An individual developer in particular may not need any local configuration or software installation: through the cloud, they can get access to as much hardware as possible and find the solutions that interest them.

The key point is that in December the oneAPI Gold version will be available both as a local installation package and as a free online version on the DevCloud platform. You can also pay for a commercial version that comes with worldwide support from Intel technical consulting engineers.

Once DevCloud is expanded, developers will also be able to run tests and workloads on Xe GPU hardware through the platform. The Intel Iris Xe MAX graphics already shipping in some products will likewise be accessible there, and Xe-LP can be used by specific developers.

oneAPI Gold is also positioned with very high goals. It is a unified, simplified programming model that not only streamlines development across multiple architectures but also generates native target code with no or negligible performance loss. At the same time, the oneAPI initiative is based on industry standards and open specifications, to support adoption across a broad industry ecosystem and to drive new progress in application development.

So far, Microsoft Azure, Google TensorFlow, and many institutions, companies, and universities have announced support for oneAPI. For example, the Beckman Institute for Advanced Science and Technology at the University of Illinois announced that it will establish a new oneAPI Center of Excellence (CoE). It is extending the life science application NAMD to other computing environments through the oneAPI programming model. NAMD can simulate large-scale biomolecular systems and is also being applied to the challenge of COVID-19.

Speed up the server with a graphics card

A GPU’s massively parallel processing capabilities make it well suited to handling many concurrent tasks. Intel has now introduced its first Intel Server GPU, aimed at Linux-based systems. In a typical dual-card system, the Intel Server GPU supports more than 100 concurrent users of Android cloud games, and this can be scaled up to 160 concurrent users.

This means that cloud service providers can expand graphics capacity independently, without changing the number of servers, so that each system supports more streams and subscribers at a lower total cost of ownership (TCO).
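
The effect on server count is simple arithmetic. The sketch below takes the quoted figures of roughly 100 and up to 160 concurrent users per dual-card system as inputs; the `systems_needed` helper and the 1,000-user workload are hypothetical and only illustrate the calculation.

```python
import math


def systems_needed(concurrent_users, users_per_system=100):
    """Dual-card systems required to serve a number of concurrent users."""
    if concurrent_users < 0 or users_per_system <= 0:
        raise ValueError("users must be non-negative and capacity positive")
    # Round up: a partly filled system still has to be deployed.
    return math.ceil(concurrent_users / users_per_system)
```

For 1,000 concurrent players, `systems_needed(1000)` gives 10 systems at 100 users each, while `systems_needed(1000, 160)` gives 7 at the expanded capacity. Actual capacity will depend on the game title and streaming settings.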

For players, the benefit is even more obvious. With the heavy processing done in the cloud, little of the phone’s local performance is consumed. Through game streaming, mobile players can go from a local 30 FPS experience to 120 FPS in one step, a qualitative leap and a win-win for service providers and players.

Developers can use the common API in the current Media SDK, which will also be migrated to the oneAPI Video Processing Library (oneVPL) next year. Intel is currently working with many software and service partners, including Gamestream, Tencent, and Ubitus, to bring the Intel Server GPU to market. Among them, Tencent has begun to use dual-card servers to run 100 game instances, including “Glory of the King”.

Interestingly, the Intel Server GPU is based on the same Xe-LP architecture as the Iris Xe MAX discrete GPU and will launch within this year. On the consumer side, the Xe-LP-based Iris Xe MAX is comparable to the MX350, and Intel has established partnerships with OEMs such as Acer, Asus, and Dell and begun shipping products to the notebook market.

In terms of specifications, Iris Xe MAX can be seen as an extension of Tiger Lake’s integrated GPU, with 96 EUs, 768 ALUs, and a frequency raised to 1650 MHz. The underlying DG1 GPU contains two identical Xe-LP media encoding blocks and a 128-bit video memory controller. Video memory capacity depends on product positioning: for example, the Intel Server GPU carries 8 GB of onboard LPDDR4X-4266, while the consumer version carries 4 GB.
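
Two derived figures follow from these numbers. The sketch below works them out under stated assumptions: that “4266” denotes 4266 mega-transfers per second, as is usual for LPDDR4X-4266, and that the full 128-bit bus is in use. The result is a theoretical peak, not a measured value, and the helper names are hypothetical.

```python
def alus_per_eu(total_alus, execution_units):
    """ALUs per execution unit implied by the quoted totals."""
    return total_alus // execution_units


def peak_bandwidth_gb_s(bus_width_bits, transfer_rate_mt_s):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mt_s * 1e6 / 1e9
```

With the quoted specifications, `alus_per_eu(768, 96)` gives 8 ALUs per EU, and `peak_bandwidth_gb_s(128, 4266)` gives about 68.3 GB/s.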

In addition, Xe MAX keeps pace with current formats, supporting H.264, H.265, and the newer AV1. Built on the 10nm SuperFin process, the chip is very small, something other manufacturers’ discrete graphics chips do not match. This is also why four of these GPUs can be packaged on H3C’s three-quarter-length, full-height x16 PCIe 3.0 expansion card.

Thanks to the broad adaptability of the Xe-LP architecture, Tiger Lake’s integrated GPU, the DG1 discrete GPU, and the SG1, which combines four DG1s, are all supported by the same code base. That code base is also being extended to cover more of what data centers commonly run, such as Linux. Intel has optimized its Linux drivers, focusing on code reuse across operating systems and paying closer attention to Linux 3D performance. It currently provides three fully validated and integrated release stacks.

Intel also announced that the Implicit SPMD Program Compiler (ISPC) will run on top of oneAPI’s Level Zero interface. Level Zero provides a set of hardware abstraction layer APIs for XPUs. Intel created it to give customers a low-level, direct-to-metal interface for programming across multiple hardware platforms.

Concluding remarks: “From cocoon to butterfly” through oneAPI+XPU

Whether we look at oneAPI Gold or at the Xe architecture in a discrete server GPU, we can see that Intel has set very high goals. The oneAPI beta launched in the fourth quarter of 2019 already contained dedicated toolkits for high-performance computing (HPC), deep learning, IoT, and vision and video. Within a year it is being upgraded to oneAPI Gold, adding commercial support and bringing in the Intel Parallel Studio XE and Intel System Studio tool suites, as Intel pursues its own “Moore’s Law” at the software level.

In fact, this chip manufacturer has been deeply involved in the developer ecosystem for more than 20 years. It employs more than 15,000 software engineers worldwide, supports software deployments for more than 10,000 important customers, and is the largest contributor to the Linux kernel, handling more than 500,000 lines of code per year. It optimizes for more than 100 operating systems and has an ecosystem of more than 20 million active developers. And these are just the tip of the iceberg.

The Xe architecture family is more than just a high-performance discrete graphics card. Its initial product positioning spans four areas at once: Xe-LP for consumer and low-power devices, Xe-HP for data centers, Xe-HPG optimized for gaming, and Xe-HPC for high-performance computing clusters and supercomputers.

Don’t forget that the Xe architecture family is only one important piece in realizing Intel’s ultra-heterogeneous computing concept, not the whole story. Intel’s real aim is to use oneAPI+XPU to build a system in which hardware and software work together, and one that is inclusive and open. oneAPI not only provides a common, open programming environment, it also works with multiple standards bodies, industry groups, and academic organizations on product specifications that enable interoperability and interchangeability.

Compared with software and hardware vendors that keep a closed ecosystem and build and sell everything themselves, Intel is taking a completely different path, using its leadership and influence to promote the progress of the industry as a whole. That is why the DPC++ language and libraries of the oneAPI specification are open to the public, and other hardware manufacturers are encouraged to join in and optimize for their hardware through oneAPI.

While the oneAPI software platform increases the appeal, XPU provides strong hardware support. With its CPUs, GPUs, ASICs, and FPGAs, Intel has built a working model of a complete software and hardware ecosystem. I believe this is also the direction chip manufacturers will champion over the next few decades.

There is no doubt that Intel is still letting technological ideals drive change, using technology, leadership, and influence to keep advancing the oneAPI+XPU ultra-heterogeneous computing concept and achieve the unification of software and hardware.
