Exploring GPU Architecture: Trends, Challenges, and Solutions for Performance, Security, and Reliability

Ravimeena
9 min read · Apr 6, 2023


Talk by Adwait Jog · Blog by RSM

Intro to GPUs

Graphics Processing Units, or GPUs, are specialized computer chips that are designed to handle and process large amounts of visual data quickly and efficiently.

They are primarily used to power the graphics and video processing of computers, gaming consoles, and other digital devices, and are essential for rendering complex images and videos in real-time.

GPUs are different from traditional CPUs (Central Processing Units) in that they are optimized for performing a large number of simple calculations simultaneously, whereas CPUs are designed to handle more complex calculations one at a time.

As a result of their specialized design, GPUs are ideal for tasks such as gaming, video editing, 3D modeling, scientific simulations, and machine learning. They can also be used for general-purpose computing tasks that require a lot of parallel processing power.
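To make the contrast concrete, here is a minimal CUDA vector-addition kernel (a standard illustration, not something from the talk): each of roughly a million threads performs a single trivial addition, which is exactly the kind of work a GPU spreads across its many cores at once.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread performs one simple addition; thousands run concurrently.
// A CPU would instead loop over the elements one (or a few) at a time.
__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                          // ~1M elements
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;      // enough blocks to cover n
    vectorAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);                    // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```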

Trends and GPUs

The demand for GPUs continues to grow rapidly, driven by trends such as the increasing adoption of AI and machine learning, the rise of gaming and esports, and the need for high-performance computing in scientific research and other fields.

As Jog said, the challenge facing the GPU industry is scalability, both in terms of scaling out to meet increasing demand and scaling up to build larger GPUs with more cores. As GPUs become more complex and powerful, it becomes increasingly difficult to maintain efficient and reliable operation at scale.

Security and reliability are also major challenges for GPUs, as they are for any computing system. Side-channel attacks and other vulnerabilities can compromise the integrity and confidentiality of data processed by GPUs, while faults and other issues can cause system failures and data loss.

To address these challenges, researchers are working on a range of approaches, including new cache and interconnect designs, memory hierarchy optimizations, and improvements to compute and memory utilization. They are also developing techniques to mitigate side-channel attacks and improve the reliability of GPU systems, such as fast and accurate reliability analysis and low-overhead fault protection.

As Jog noted, efficiency is another area of focus, as GPUs can be power-hungry and generate significant amounts of heat. Researchers are exploring ways to reduce energy consumption while maintaining high performance, such as through more efficient memory access and management.

Overall, the GPU industry faces a range of challenges as it continues to grow and evolve, but researchers are working hard to develop innovative solutions to address these challenges and improve the performance, efficiency, and reliability of GPU systems.

GPU Architecture

As we know, GPU architecture is designed to handle massive amounts of parallel processing for high-performance computing tasks, particularly for graphics and visual processing.

Discrete GPUs have their own processing power and memory separate from the CPU, which allows them to offload computationally intensive tasks and free up the CPU for other tasks.

At the heart of a GPU is the Streaming Multiprocessor (SM), which contains multiple Processing Elements (PEs) that can execute instructions in parallel. Each SM also includes a Load/Store (LD/ST) unit to handle data transfer to and from memory.
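These hardware parameters can be queried directly through the CUDA runtime; the short sketch below simply reads a few fields of `cudaDeviceProp` for device 0.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Report how many Streaming Multiprocessors (SMs) the installed GPU has,
// along with a couple of related limits.
int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // properties of device 0
    printf("GPU:                %s\n", prop.name);
    printf("SM count:           %d\n", prop.multiProcessorCount);
    printf("Max threads per SM: %d\n", prop.maxThreadsPerMultiProcessor);
    printf("Warp size:          %d\n", prop.warpSize);
    return 0;
}
```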

To improve memory access efficiency, GPUs use memory access coalescing, which combines multiple memory accesses into a single operation to reduce the number of memory transactions needed.
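The two toy kernels below sketch the difference (buffer contents and sizes are arbitrary; only the access pattern matters). In the first, consecutive threads in a warp touch consecutive addresses, so the hardware can merge the warp's loads into a few transactions; in the second, a stride keeps the addresses far apart and coalescing breaks down.

```cuda
#include <cuda_runtime.h>

// Coalesced: thread i reads element i, so a warp's 32 loads fall into
// one or two contiguous memory transactions.
__global__ void coalescedCopy(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: neighbouring threads read addresses `stride` elements apart,
// so their accesses cannot be merged and many more transactions are issued.
__global__ void stridedCopy(const float* in, float* out, int n, int stride) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n) out[i] = in[i];
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    coalescedCopy<<<(n + 255) / 256, 256>>>(in, out, n);
    stridedCopy<<<(n + 255) / 256, 256>>>(in, out, n, 32);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Profiling these two launches (for example with Nsight Compute) would show the strided version issuing far more memory transactions for the same amount of useful data.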

The Miss-Status Holding Register (MSHR) is a buffer in the GPU’s memory controller that holds data about memory access requests that have missed the cache. This helps to improve memory access efficiency by reducing the time spent waiting for data to be fetched from memory.

Overall, the GPU’s execution model is designed to maximize parallelism and minimize memory access latency, allowing for efficient and high-performance processing of large amounts of data.

GPU Security Research

GPUs have become an essential part of high-performance computing. However, they are also vulnerable to various security threats, including timing attacks. In a timing attack, an attacker measures the execution time of a cryptographic operation to extract sensitive information such as encryption keys.

Timing attacks on GPUs typically target the last round of AES execution, where the number of memory accesses that can be coalesced, and therefore the kernel's execution time, depends on the data and the secret key. Attackers can exploit this correlation to extract the encryption key. To mitigate these attacks, various solutions have been proposed.
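The sketch below shows only the measurement side of such an attack: timing a single kernel launch with CUDA events. The `aesLastRound` kernel is a hypothetical stub standing in for a real GPU AES implementation; an actual attack would time the real last round for many inputs and correlate the variation with key guesses.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder for a GPU AES last-round kernel; a real attack
// would target an actual AES implementation, not this stub.
__global__ void aesLastRound(const unsigned char* in, unsigned char* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];   // stand-in for the key-dependent table lookups
}

// Time one launch. A correlation timing attack collects many such
// measurements for different inputs and correlates the timing variation
// (caused by differing numbers of coalesced accesses) with key guesses.
float timeOneLaunch(const unsigned char* in, unsigned char* out, int n) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    aesLastRound<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;   // observed execution time for this input
}

int main() {
    const int n = 1 << 16;
    unsigned char *in, *out;
    cudaMalloc(&in,  n);
    cudaMalloc(&out, n);
    printf("observed launch time: %.4f ms\n", timeOneLaunch(in, out, n));
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```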

As Jog said, one solution is to disable coalescing altogether, which prevents correlation timing attacks but causes a significant performance degradation of up to 178%. Furthermore, the degradation increases with plaintext size, making this solution unsuitable for practical use.

Randomized coalescing is another solution that introduces randomness into the memory access pattern to prevent correlation timing attacks. However, this approach incurs a high performance overhead and still leaks information through caches and Miss-Status Holding Registers (MSHRs).

A software-based solution was proposed that addresses leakage in the AES application by redundantly managing data. This approach offers good security, but it is only applicable to the AES application.

To balance security and performance, a bucketing-based memory coalescing approach called BCoal was proposed. BCoal pads the number of coalesced memory accesses issued per warp up to one of a few predetermined bucket values, reducing the timing variance that correlation attacks rely on. By adjusting the number of buckets, BCoal can trade off security against performance. BCoal was found to be secure against timing attacks and efficient in practice, generating little additional DRAM traffic while keeping accesses close to optimally coalesced.
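The host-side sketch below illustrates only the bucketing idea; in BCoal this happens in the coalescing hardware, and the bucket boundaries shown here are illustrative rather than the ones evaluated in the paper.

```cuda
#include <cstdio>

// Conceptual sketch of bucketing: the number of coalesced accesses a warp
// generates is padded up to the next predefined bucket (by issuing dummy
// accesses), so an observer only ever sees a few possible values instead
// of the full, key-dependent range 1..32.
int bucketize(int coalescedAccesses, const int* buckets, int numBuckets) {
    for (int b = 0; b < numBuckets; ++b) {
        if (coalescedAccesses <= buckets[b]) {
            return buckets[b];           // pad up to this bucket boundary
        }
    }
    return buckets[numBuckets - 1];      // worst case: all accesses distinct
}

int main() {
    const int buckets[] = {8, 16, 32};   // illustrative bucket boundaries
    for (int real = 1; real <= 32; real += 7) {
        printf("real accesses: %2d -> issued: %2d\n",
               real, bucketize(real, buckets, 3));
    }
    return 0;
}
```

Fewer buckets mean less timing variation (better security) but more dummy traffic (worse performance), which is exactly the tradeoff the bucket count controls.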

Overall, GPU security research has led to the development of various solutions to mitigate timing attacks, and BCoal has shown promising results in achieving a tradeoff between security and performance. However, further research is necessary to ensure that these solutions are secure against sophisticated attacks and are practical for real-world use.

GPU Reliability Research

GPUs, like any other computing system, are susceptible to various types of faults and errors that can impact their performance and reliability. Soft errors, caused by high-energy radioactive particles like cosmic rays, can result in bit flips and affect the accuracy of computations. Permanent and other faults can also lead to crashes, hangs, and other issues.

One of the most significant concerns related to errors in GPUs is the risk of silent data corruption (SDC), where incorrect output is produced without any indication of a problem. SDCs can be particularly dangerous in critical applications, where the consequences of inaccurate results can be severe. For example, in healthcare applications, SDCs can result in incorrect diagnoses or treatment recommendations, potentially causing harm to patients. Long-running applications can also be severely impacted by errors in GPUs, as the accumulation of errors can lead to a degradation of performance over time.
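A tiny fault-injection sketch makes the "silent" part visible: flipping one bit of a result in unprotected GPU memory raises no error at all, and the program simply finishes with a wrong answer. The kernel, the element, and the bit position below are arbitrary choices for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void square(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * data[i];
}

// Emulate a soft error: flip one bit of one element, as a particle strike
// on unprotected memory might. No exception is raised and no error code is
// returned; the output is just silently wrong (an SDC).
__global__ void injectBitFlip(float* data, int index, int bit) {
    unsigned int* word = reinterpret_cast<unsigned int*>(&data[index]);
    *word ^= (1u << bit);
}

int main() {
    const int n = 1024;
    float* d;
    cudaMallocManaged(&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) d[i] = 2.0f;

    square<<<(n + 255) / 256, 256>>>(d, n);
    injectBitFlip<<<1, 1>>>(d, 0, 30);   // flip a high-order bit of element 0
    cudaDeviceSynchronize();

    printf("d[0] = %g (expected 4), d[1] = %g\n", d[0], d[1]);
    cudaFree(d);
    return 0;
}
```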

Risk Handling

ECC

To mitigate these risks, various solutions have been proposed, including fault-tolerant designs, error-correcting codes (ECC), and redundancy techniques. ECC is a widely used protection mechanism in GPUs, but it has its limitations: it can correct single-bit errors and detect double-bit errors, yet protecting against multi-bit errors this way can be prohibitively expensive.

Other techniques, such as duplication/triplication and selective re-computation, can provide additional protection. Duplication/triplication involves performing the same computation on multiple GPUs and comparing the results to detect errors. Selective re-computation involves identifying the parts of the computation that are affected by errors and re-computing only those parts.
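As a rough sketch of the duplication idea (shown here on a single GPU with a placeholder kernel, rather than across multiple GPUs as described above), the same work is done twice into separate buffers and the copies are compared; a mismatch signals an error, and a third copy with majority voting would also allow the faulty value to be corrected.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder for the real computation being protected.
__global__ void compute(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f + 1.0f;
}

// Compare the two redundant result buffers and count disagreements.
__global__ void compareCopies(const float* a, const float* b, int n,
                              int* mismatches) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && a[i] != b[i]) atomicAdd(mismatches, 1);
}

int main() {
    const int n = 1 << 16;
    float *in, *out1, *out2;
    int* mismatches;
    cudaMallocManaged(&in,   n * sizeof(float));
    cudaMallocManaged(&out1, n * sizeof(float));
    cudaMallocManaged(&out2, n * sizeof(float));
    cudaMallocManaged(&mismatches, sizeof(int));
    for (int i = 0; i < n; ++i) in[i] = (float)i;
    *mismatches = 0;

    int blocks = (n + 255) / 256;
    compute<<<blocks, 256>>>(in, out1, n);      // first copy of the work
    compute<<<blocks, 256>>>(in, out2, n);      // redundant second copy
    compareCopies<<<blocks, 256>>>(out1, out2, n, mismatches);
    cudaDeviceSynchronize();

    printf("mismatching elements: %d\n", *mismatches);
    cudaFree(in); cudaFree(out1); cudaFree(out2); cudaFree(mismatches);
    return 0;
}
```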

Checkpointing

Checkpointing is another technique used to provide fault tolerance in GPUs. Checkpointing involves periodically saving the state of the computation and restarting from the last saved state if an error occurs. This technique can be particularly useful for long-running computations where the accumulation of errors can lead to degraded performance over time.
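A minimal checkpoint/rollback loop might look like the sketch below. It glosses over real-world details (many GPU errors require a device reset rather than a simple retry, and checkpoints are normally written to disk), but it shows the pattern: save the state every few iterations, and on failure restore the last checkpoint instead of restarting from scratch. The `iterate` kernel is a stand-in for one step of a long-running computation.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Stand-in for one iteration of a long-running GPU computation.
__global__ void iterate(float* state, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) state[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    const int totalIters = 100, checkpointEvery = 10;
    float* d_state;
    std::vector<float> checkpoint(n, 0.0f);   // last known-good state (host copy)
    int lastGoodIter = 0;

    cudaMalloc(&d_state, n * sizeof(float));
    cudaMemset(d_state, 0, n * sizeof(float));

    for (int it = 1; it <= totalIters; ++it) {
        iterate<<<(n + 255) / 256, 256>>>(d_state, n);

        if (cudaDeviceSynchronize() != cudaSuccess) {
            // Error detected: roll back to the last checkpoint and redo only
            // the iterations since then, instead of restarting from zero.
            cudaMemcpy(d_state, checkpoint.data(), n * sizeof(float),
                       cudaMemcpyHostToDevice);
            it = lastGoodIter;
            continue;
        }
        if (it % checkpointEvery == 0) {
            // Periodically save the state so a later failure loses little work.
            cudaMemcpy(checkpoint.data(), d_state, n * sizeof(float),
                       cudaMemcpyDeviceToHost);
            lastGoodIter = it;
        }
    }
    printf("finished %d iterations\n", totalIters);
    cudaFree(d_state);
    return 0;
}
```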

However, even with these protection mechanisms in place, current protection may not be enough, especially as new techniques, such as low-voltage GPU caches, can lead to high error rates. This highlights the need for continued research and development of new protection mechanisms that can provide more comprehensive and cost-effective solutions for error detection and correction in GPUs.

When considering which fraction of memory to protect in GPUs, it is essential to analyze the application's memory access patterns. Typically, a small fraction of memory is highly accessed or highly shared, while the rest of the memory is accessed less frequently. This is known as the "hot memory" problem. Protecting the entire memory can be prohibitively expensive, so it is essential to identify the hot memory regions that are critical for the application's correctness and performance. In hot memory regions, the impact of faults is typically more severe, and the output can be highly sensitive to errors.

To identify hot memory regions, application profiling can be used to understand memory access patterns and the frequency of access to different memory locations. One approach is to analyze the source code and identify memory-intensive parts of the application that are likely to have hot memory regions. Profiling tools can then be used to gather data on memory access patterns and identify the specific memory locations that are highly accessed or highly shared.
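The sketch below fakes such a profile: a kernel counts how often each 4 KB region of an array is touched, using a deliberately skewed access pattern so that one region ends up "hot". Real profiling would instrument the actual application; the region size and the access pattern here are arbitrary choices.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define REGION_BYTES 4096   // granularity at which accesses are counted

// Count accesses per region while performing (dummy) reads of `data`.
// Most threads hammer the first region; a few spread across the array.
__global__ void countedAccess(const float* data, int n,
                              unsigned int* regionCounts, float* sink) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int idx = (i % 4 != 0) ? (i % (REGION_BYTES / 4)) : i;  // skewed pattern
        int region = idx / (REGION_BYTES / 4);                  // 4-byte floats
        atomicAdd(&regionCounts[region], 1u);
        sink[i] = data[idx];    // the access being profiled
    }
}

int main() {
    const int n = 1 << 20;
    const int numRegions = (n * 4 + REGION_BYTES - 1) / REGION_BYTES;
    float *data, *sink;
    unsigned int* counts;
    cudaMallocManaged(&data, n * sizeof(float));
    cudaMallocManaged(&sink, n * sizeof(float));
    cudaMallocManaged(&counts, numRegions * sizeof(unsigned int));
    cudaMemset(counts, 0, numRegions * sizeof(unsigned int));

    countedAccess<<<(n + 255) / 256, 256>>>(data, n, counts, sink);
    cudaDeviceSynchronize();

    for (int r = 0; r < 4; ++r)   // region 0 should dominate
        printf("region %d: %u accesses\n", r, counts[r]);
    cudaFree(data); cudaFree(sink); cudaFree(counts);
    return 0;
}
```

A region whose count dwarfs the others is the kind of hot, highly shared memory that is worth protecting first.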

P-BICG

The P-BICG (Parallel BiConjugate Gradient) algorithm is a memory-intensive application commonly used in scientific computing. Source code profiling can be used to identify the parts of the code that are memory-intensive and likely to have hot memory regions. Memory access pattern analysis can then be used to identify the specific memory locations that are highly accessed or highly shared, and to prioritize protection mechanisms for those regions.

Secure vs Reliable

When it comes to GPU protection mechanisms, there is a tradeoff between security and reliability. Replicating hot memory for detection and correction purposes can improve reliability by reducing the chances of silent data corruption (SDC) caused by faults or errors. However, replicating data also increases the risk of side-channel attacks, which can compromise security.

To balance security and reliability, it is important to implement a redundant data management approach that minimizes the overhead while improving both security and reliability. This can involve identifying the small fraction of memory that is highly accessed or highly shared and replicating only that portion of memory. This helps to minimize the risk of side-channel attacks while improving reliability by detecting and correcting errors.

In terms of implementation, GPUs' latency tolerance can help keep overheads low by allowing the replication mechanism to work in the background without impacting overall system performance. Additionally, source code profiling can help identify hot memory and determine the most effective replication strategy.

Reliability evaluation is important to measure the effectiveness of the protection mechanisms. In one study, a replication mechanism resulted in a 98.97% drop in SDC outcomes, indicating a significant improvement in reliability. However, it is also important to consider the overhead due to data replication and ensure that it does not negatively impact overall system performance.

In summary, redundant data management can help improve both security and reliability at a low overhead cost. By identifying hot memory and replicating only the necessary portions, it is possible to balance security and reliability while minimizing the risk of side-channel attacks and maintaining overall system performance.
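As a rough sketch of what such replicate-and-check logic could look like (this is not the mechanism evaluated in the study above), the kernel below reads a small hot region through a primary copy and a replica and counts any disagreements; the `hotWords` size and the injected corruption are illustrative assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Read the hot region through both copies. A mismatch means one copy was
// corrupted and the value can be flagged; with a third copy, a majority
// vote could also correct it on the fly.
__global__ void replicatedRead(const float* primary, const float* replica,
                               float* out, int hotWords, int* errors) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < hotWords) {
        float a = primary[i], b = replica[i];
        if (a != b) atomicAdd(errors, 1);   // corruption detected
        out[i] = a;
    }
}

int main() {
    const int hotWords = 1 << 12;           // small, highly shared region
    float *primary, *replica, *out;
    int* errors;
    cudaMallocManaged(&primary, hotWords * sizeof(float));
    cudaMallocManaged(&replica, hotWords * sizeof(float));
    cudaMallocManaged(&out,     hotWords * sizeof(float));
    cudaMallocManaged(&errors,  sizeof(int));
    for (int i = 0; i < hotWords; ++i) primary[i] = replica[i] = (float)i;
    *errors = 0;

    primary[7] = -1.0f;                     // emulate a silent corruption
    replicatedRead<<<(hotWords + 255) / 256, 256>>>(primary, replica, out,
                                                    hotWords, errors);
    cudaDeviceSynchronize();
    printf("detected corruptions: %d\n", *errors);
    cudaFree(primary); cudaFree(replica); cudaFree(out); cudaFree(errors);
    return 0;
}
```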

Conclusion

In conclusion, GPUs have become increasingly important in the computing industry due to their high performance and energy efficiency. However, as the demand for GPUs increases, so do the challenges in terms of scalability, security, and reliability. Soft errors caused by high-energy particles can lead to silent data corruption, crashes, and other issues, making it important to protect against faults in GPU memory. Selective replication of hot memory can improve reliability while keeping performance overhead low, and techniques such as bucketing-based memory coalescing (BCoal) can mitigate timing attacks. It is also essential to consider the tradeoffs between security and performance when selecting security solutions for GPUs. Overall, while GPUs offer numerous benefits, it is crucial to address the challenges and limitations associated with their use to ensure their continued success and relevance in the computing industry.

References:

  1. https://adwaitjog.github.io/
  2. https://epochai.org/blog/trends-in-gpu-price-performance
  3. https://www.lesswrong.com/posts/c6KFvQcZggQKZzxr9/trends-in-gpu-price-performance
