The Power of Parallel Programming: Why It’s a Necessity

From Moore’s Law to the need for multicore processors

Durganshu Mishra
7 min read · Oct 8, 2023
Photo by Rafael Pol on Unsplash

In the ever-evolving landscape of computing, one concept has become not just a trend but a necessity: parallel programming. The days of exponentially increasing clock speeds driving improvements in single-core performance, as predicted by Moore’s Law, are behind us. Instead, the future of computing lies in the realm of parallelism.

Moore’s Law is falling behind :(

Picture this: colossal amounts of data pouring in from every corner of our digital world, complex simulations demanding more computational power than ever, and artificial intelligence algorithms hungry for processing capabilities. In the face of these challenges, the single-core processor is no longer the hero of the story. To meet the demands of modern computing, we need to harness the power of parallel programming.

In this series of articles, we embark on a journey through the world of parallel programming, exploring its profound significance and practical applications. In this inaugural installment, we’ll lay the foundation by addressing two fundamental questions: Why is parallelism a necessity, and what are the tangible benefits it brings to the table?

As we delve into the heart of parallel programming, you’ll discover that it’s not just a choice for optimizing software performance; it’s a requirement in today’s digital landscape. By the end of this article, you’ll gain a deeper understanding of why parallelism is indispensable and how it revolutionizes the way we approach computing challenges. So, fasten your seatbelts as we uncover the transformative power of parallel programming.

By The Office on Giphy

A brief history of parallelism

Parallel computing, with its roots dating back to the early days of computing, has a storied history. In the early 1970s, the Illiac-IV marked a pivotal milestone when it was delivered to NASA Ames Research Center. Although its conceptual roots trace back to 1952, construction only commenced around 1966. The Illiac-IV featured a single control unit broadcasting instructions to 64 processing elements, each with its own arithmetic unit and local memory, effectively functioning as a mini-computer. It delivered remarkable computational prowess, achieving up to 200 MFLOP/s, a groundbreaking record for its time.

Illiac-IV in 1972: 200 MFLOP/s

Apple iPhone 8 in 2017: 297 MFLOP/s

Illiac-IV By Steve Jurvetson from Menlo Park, USA — Flickr, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=755549

Vector processing, a groundbreaking advancement in high-performance computing, took center stage with the introduction of the iconic Cray-1 supercomputer. While the Illiac-IV excelled at parallelism across processing elements, the Cray-1 harnessed the transformative potential of vector processing. Conceived by the visionary Seymour Cray in the mid-1970s, this pioneering machine used specialized hardware to execute operations on entire arrays of data with a single instruction. In addition to its scalar address and data registers, the Cray-1 featured 8 vector registers, each holding sixty-four 64-bit words, unlocking unprecedented computational power.

Cray-1 By Irid Escent — 20180227_132902, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=85791445

The Cray-1 reached a peak performance of an astounding 240 MFLOP/s, setting a new standard for high-performance computing. Cray Research’s subsequent models continued to push the boundaries, with companies like Hitachi and Fujitsu following suit with vector machines of their own. Remarkably, vector processing remains as relevant today as it was at its inception, serving as a foundation for modern computing architectures.
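To get a feel for the idea behind vector processing, here is a minimal sketch in Python. It uses NumPy purely as a stand-in for hardware vector units: the same multiplication is written once as an element-by-element loop and once as a single array-wide expression. (NumPy is an assumption of this sketch; the Cray-1, of course, exposed vector operations through its instruction set, not a library.)

```python
# A toy illustration of the vector-processing idea: operate on whole arrays
# at once instead of one element at a time. NumPy stands in for the hardware.
import time
import numpy as np

n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

# Scalar style: one multiply per loop iteration.
start = time.perf_counter()
c_loop = np.empty(n)
for i in range(n):
    c_loop[i] = a[i] * b[i]
loop_time = time.perf_counter() - start

# Vector style: one expression over the entire arrays, which the library
# dispatches to optimized (often SIMD) machine code.
start = time.perf_counter()
c_vec = a * b
vec_time = time.perf_counter() - start

assert np.allclose(c_loop, c_vec)
print(f"loop: {loop_time:.3f} s, vectorized: {vec_time:.5f} s")
```

On a typical machine, the vectorized version runs many times faster than the loop; that gap is precisely what vector machines were built to exploit.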

Since then, a continuous stream of innovations has transformed the landscape of computing. From the inception of Cluster Computing, which interconnected multiple PCs to amplify computational power, to the integration of external accelerators such as GPUs (Graphics Processing Units) and FPGAs (Field-Programmable Gate Arrays), parallel computing has undergone a profound and ongoing evolution.

The TOP500 list is a well-known ranking of the world’s most powerful supercomputers. It is published twice a year, typically in June and November, by a group of international experts in high-performance computing. The list ranks supercomputers based on their performance in a standardized benchmarking test known as the High-Performance Linpack (HPL) benchmark.

The Need for Parallelism

Moore’s Law and the End of Single-Core Performance

Gone are the days when simply upgrading your processor would speed up your application by 3x. For better or worse, single-thread performance is saturating, and people keep proclaiming that ‘Moore’s Law is dead!’ So, what exactly is Moore’s Law?

Gordon Moore, who co-founded Intel Corporation, observed that the number of transistors on a semiconductor chip was doubling approximately every two years, leading to a consistent increase in computing power while reducing the cost per transistor. Despite being widely referred to as a ‘law’, it was an empirical observation and prediction. Essentially, it states that:

Every 18 to 24 months, the number of transistors on a microchip will double, while the cost per transistor will halve.

Moore’s Law had profound implications for the technology industry, as it drove the development of increasingly powerful and smaller electronic devices. It became a guiding principle for the semiconductor industry and a source of inspiration for innovation and investment. For decades, semiconductor companies focused on increasing the number of transistors in their microprocessors, and performance gains followed almost in lockstep.

Source: karlrupp/microprocessor-trend-data: Data repository for my blog series on microprocessor trend data. (github.com)

However, several factors are contributing to the perception that Moore’s Law is coming to an end or at least slowing down.

A pivotal turning point occurred around 2005 when it became evident that the linear correlation between the increase in transistor count and single-threaded performance was no longer as pronounced. Simultaneously, CPU clock frequencies had already approached saturation levels, further underscoring the challenges in sustaining traditional performance scaling.

So, why is this happening?

Vendors have only limited space on a microprocessor, so they can fit only so many transistors. Until recently, advances in nanotechnology and manufacturing processes made it possible to keep shrinking transistors, packing more of them onto the same die area. However, as transistors shrink, they approach the fundamental limits of atomic and quantum physics. When transistors become too small, quantum effects such as tunneling start to dominate, making it challenging to maintain reliable and predictable behavior.

Similarly, further miniaturization of transistors requires increasingly complex manufacturing processes and materials. These challenges not only increase costs but also limit the rate of progress.

In addition to the economic constraints and feasibility of manufacturing processes, heat dissipation becomes a significant issue. High power densities lead to overheating and reduce chips’ reliability, lifespan, and performance.

One reason many people believe Moore’s Law is nearing its end is that doubling transistor density no longer means the cost per transistor is halved.

While this may be a matter of one’s perspective, what truly matters here is that alternatives must be found, and progress made.

Data-Intensive and Computationally Intensive Applications

Modern applications, including data analytics, AI, and simulations, crave immense computing power. Data analytics, for example, must swiftly process colossal datasets. This involves sorting, filtering, aggregating, and generating insights from vast amounts of information. Parallelism splits these tasks into smaller, concurrent chunks, hastening analysis and enabling rapid data-driven decisions.
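As a rough sketch of what “splitting into concurrent chunks” can look like in practice, the snippet below partitions a made-up dataset and aggregates each chunk in a separate worker process using Python’s standard multiprocessing module. The data, threshold, and function names are purely illustrative.

```python
# A minimal sketch of chunk-level parallelism for a data-analytics task:
# split the data, process each chunk in its own process, then combine results.
from multiprocessing import Pool
import random

def aggregate_chunk(chunk):
    """Filter and aggregate one chunk: sum of values above a threshold."""
    return sum(x for x in chunk if x > 0.5)

if __name__ == "__main__":
    data = [random.random() for _ in range(1_000_000)]  # stand-in dataset

    # Split the dataset into equally sized chunks, one per worker.
    n_workers = 4
    chunk_size = len(data) // n_workers
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # Each worker processes its chunk concurrently; results are combined at the end.
    with Pool(processes=n_workers) as pool:
        partial_sums = pool.map(aggregate_chunk, chunks)

    total = sum(partial_sums)
    print(f"Aggregated result: {total:.2f}")
```

The same split-process-combine pattern underlies far larger systems (MapReduce, Spark, and the like); only the scale and the machinery change.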

Similarly, training AI models, particularly deep learning neural networks, is computationally intensive. It involves processing large datasets and adjusting model parameters through numerous iterations to optimize performance. Parallelism expedites model convergence by distributing computation across multiple cores or GPUs, allowing larger, more precise models to be trained within feasible timeframes.

By Étienne Jacob on Colossal

Scientific and engineering simulations like weather forecasting and fluid dynamics involve intricate numerical calculations. These simulations often involve solving partial differential equations and running simulations over extended periods. Parallelism divides simulations into smaller solvable segments, slashing time and facilitating high-resolution modeling, thus advancing scientific research.
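To make the idea of “smaller solvable segments” concrete, here is a serial sketch of domain decomposition for a toy 1D heat-equation step: the grid is split into subdomains, and each subdomain only needs boundary (“halo”) values from its neighbors at every step. In a real parallel solver each subdomain would live on its own process or MPI rank; the grid size, number of subdomains, and coefficient here are illustrative choices.

```python
# Domain decomposition sketch: split a 1D grid into subdomains that exchange
# only their boundary values. The loop over subdomains runs serially here,
# but each update is independent and could run on its own process or rank.
import numpy as np

def step_subdomain(u, left_ghost, right_ghost, alpha=0.1):
    """Advance one subdomain by a single explicit time step of the 1D heat
    equation, using ghost values borrowed from neighboring subdomains."""
    padded = np.concatenate(([left_ghost], u, [right_ghost]))
    return u + alpha * (padded[:-2] - 2.0 * u + padded[2:])

# Whole domain: a heat spike in the middle of a cold rod.
u = np.zeros(100)
u[50] = 1.0

n_sub = 4
subdomains = np.array_split(u, n_sub)

for _ in range(100):  # time-stepping loop
    new_subdomains = []
    for i, sub in enumerate(subdomains):
        # Ghost cells: boundary values from neighboring subdomains
        # (fixed at 0.0 at the two ends of the rod).
        left = subdomains[i - 1][-1] if i > 0 else 0.0
        right = subdomains[i + 1][0] if i < n_sub - 1 else 0.0
        new_subdomains.append(step_subdomain(sub, left, right))
    subdomains = new_subdomains

u = np.concatenate(subdomains)
print(f"Total heat after 100 steps (approximately conserved): {u.sum():.3f}")
```

Because each subdomain depends only on a thin strip of neighboring values, the work scales out naturally: more subdomains, more processors, finer resolution.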

Big Data and Scalability

In recent years, there has been an unprecedented explosion of data generated from various sources, including social media, IoT devices, sensors, e-commerce, scientific research, and more. This deluge of data is commonly referred to as “big data,” characterized by its immense volume, high velocity, and diverse variety of formats. For instance, social media platforms generate billions of posts, images, and videos daily, while IoT devices continuously collect data on environmental conditions and machine performance. Genomic research and e-commerce platforms also contribute to this data tsunami.

How do we handle this insane amount of data?

Scalability is the capability of a system to handle increasing workloads and accommodate the ever-expanding datasets without compromising performance. In the context of big data, scalability is vital for several reasons. It allows organizations to manage growing data volumes effectively by horizontally scaling and adding more computing nodes to handle larger datasets without performance degradation. Scalable systems also ensure rapid processing to support real-time analytics and decision-making. Furthermore, they offer cost-efficiency by enabling the allocation of resources as needed, reducing upfront hardware costs, and optimizing resource utilization.

The combination of parallelism and scalability is essential to tackle the challenges posed by the influx of big data, empowering organizations to extract valuable insights and drive innovation.

This brings us to the end of this article. In this post, we briefly discussed how parallelism emerged as a natural choice for extracting maximum performance and why its relevance will keep increasing. In the upcoming posts, we’ll discuss the fundamentals of parallel programming, specifically the theoretical and programming aspects of parallel applications. Until then, keep parallelizing!
