Distributed Shared Memory Programming for Hadoop, MapReduce, and HPC Architectures

The shared memory distributed programming paradigm lets tasks communicate with each other by reading and writing a common address space, even when that memory is physically spread across several machines or storage locations. Multiple threads of a single process, or multiple tasks on different nodes, access the same locations, so their reads and writes must be synchronized. Synchronization enforces the order of read and write operations; without it, two tasks that write to the same address with the same key can corrupt the data, and careless use of synchronization can itself lead to deadlock. Locks, barriers, and semaphores are the usual tools for resolving this contention while keeping the data intact. A semaphore coordinates multiple distributed tasks through post and wait operations on a queue. A lock applies lock and unlock operations to a specific region of the shared address space to prevent simultaneous writes. A barrier holds each task at a given point until all tasks have reached that point. The MapReduce framework in Hadoop draws on shared memory distributed programming and supplies built-in functions for handling the shared state. MapReduce also provides a clear storage layer, so that tasks on any node of the cluster can access the same data, and it has internal functions for synchronization and for the barrier between the map and reduce phases. HDFS natively provides that storage layer, giving tasks access to shared data across the cluster.
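The three primitives described above can be sketched on a single machine with Python threads. This is only an illustration under stated assumptions: it uses Python's standard `threading` module, not Hadoop or MapReduce APIs, and the function name `word_count`, the limit of two concurrent mappers, and the sample input are all hypothetical.

```python
import threading


def word_count(chunks):
    """Count words in `chunks` with one thread per chunk and a shared dict."""
    counts = {}                                # shared address space
    lock = threading.Lock()                    # serializes writes to `counts`
    slots = threading.Semaphore(2)             # at most two tasks map at once
    barrier = threading.Barrier(len(chunks))   # separates map from reduce
    totals = [0] * len(chunks)                 # one result slot per task

    def task(index, chunk):
        # Map phase: each task adds its words to the shared dictionary.
        with slots:
            for word in chunk.split():
                with lock:
                    counts[word] = counts.get(word, 0) + 1
        # The semaphore is released before waiting, so no task is starved.
        barrier.wait()
        # Reduce phase: safe to read, because every task has finished mapping.
        totals[index] = sum(counts.values())

    threads = [threading.Thread(target=task, args=(i, c))
               for i, c in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts, totals
```

Calling `word_count(["a b a", "b c", "a"])` gives every task the same total of six words, because the barrier guarantees that no task starts reducing until all map-phase writes are complete. Removing the lock would leave the read-modify-write on `counts` unprotected.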

MPI (Message Passing Interface) is a standard for exchanging data through messages on distributed computing clusters. The idea of message passing dates back to the late 1980s, when CERN started building Internet applications for the organization on message-based, packetized networks. The first MPI standard, MPI-1, was published in 1994. However, MPI-1 had scalability issues, which MPI-2 improved in the following years. The High Performance Computing community embraced MPI-3 in 2013. Messages travel between communication partners over a transport such as TCP (Transmission Control Protocol) or UDP (User Datagram Protocol, in which each message is an independent datagram). MPI also specifies where a message is stored in memory on arrival, where the message came from, and the order in which messages are delivered. MPI supports large-scale parallel computing on heterogeneous architectures, with collective operations such as Scatter, Broadcast, and Gather, and it is portable across platforms and vendors.
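The semantics of Scatter, Broadcast, and Gather can be illustrated without an MPI installation. The sketch below is a simulation, not MPI: it assumes threads stand in for ranks and `queue.Queue` objects stand in for message channels, and the names `run_collectives`, `inboxes`, and `root_inbox` are hypothetical.

```python
import queue
import threading


def run_collectives(data, factor, size=4):
    """Scatter `data`, broadcast `factor`, and gather scaled partial sums."""
    inboxes = [queue.Queue() for _ in range(size)]  # one mailbox per rank
    root_inbox = queue.Queue()                      # where results return

    def worker(rank):
        # Every rank runs this same function on different data (SPMD).
        chunk = inboxes[rank].get()    # piece delivered by the scatter
        scale = inboxes[rank].get()    # value delivered by the broadcast
        root_inbox.put((rank, sum(x * scale for x in chunk)))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()

    for rank in range(size):
        inboxes[rank].put(data[rank::size])   # scatter: a different slice each
    for rank in range(size):
        inboxes[rank].put(factor)             # broadcast: same value to all

    results = [None] * size
    for _ in range(size):                     # gather: one reply per rank
        rank, partial = root_inbox.get()
        results[rank] = partial
    for t in threads:
        t.join()
    return results
```

Each reply carries the sender's rank, so the root can place results in rank order even though replies may arrive in any order. This mirrors the point that a message's origin is part of what the receiver knows.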

Distributed computing with MPI on High Performance Computing clusters in a cloud-computing environment can be challenging, especially for SPMD (Single Program, Multiple Data) jobs that run millions of parallel processes. Combining message passing distributed programming with shared memory distributed programming adds complexity, because the time spent transferring messages, as opposed to synchronizing them, can grow steeply with the number of processes. As an analogy, when a thousand people are in an amusement park, things operate smoothly, but with millions of people the queues for the rides and the capacity of the walkways become increasingly hard to manage. Similarly, contention for shared memory and shared resources reaches a critical point when the demand from simultaneous processes exceeds the available network bandwidth and the volume of data overloads the system. A system with so many concurrent processes is also hard to debug, especially when a single program runs as roughly a million processes. STAT (Stack Trace Analysis Tool) is one debugging tool that can help debug performance bottlenecks in a distributed cloud computing environment.

In contrast to shared memory distributed programming, where tasks share an address space for reads and writes across the disks or memory of a distributed cluster, message passing distributed programming communicates by exchanging messages. Its limitations, particularly for large-scale data processing in a cloud computing environment, are memory overload, data overload, and network latency. Message passing does not get native support from the system hardware, because the communication occurs through messages. The standard specification for this communication is the Message Passing Interface (MPI), which defines how messages are sent and received. Unlike shared memory distributed programming, message passing distributed programming requires the programmer to write the code that partitions the tasks and exchanges messages in a specific pattern. Because the data is spread across the cloud-computing environment, accessing remote data lakes can take far longer than accessing local, on-premise data, owing to network latency, and synchronization points can become bottlenecks in large-scale data processing. Neither model handles large-scale data natively in a cloud computing environment: message passing distributed programming requires pre-development effort to set up the exchange of messages, and shared memory distributed programming requires post-development effort for replication or migration of the data.
When a large number of users or tasks access data that resides in the critical sections of memory, contention and the risk of deadlock rise sharply, along with network latency. In both models, distributed programs can process data in synchronous or asynchronous mode. In asynchronous mode, data is posted immediately without waiting on any contention, which improves the performance of the distributed tasks. However, it can compromise the consistency of the data when transfers are excessive. Synchronous programs introduce barriers and locks, which delay the system. However, they can significantly improve the consistency and integrity of the data. Further development challenges for both shared memory distributed programming and message passing distributed programming stem from the heterogeneous cloud environment, which involves different operating systems, different network protocols, different hardware, and the interfacing of multiple programming languages with HDFS.
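The trade-off between the two modes can be made concrete with a small sketch. It assumes a single receiver thread that applies key/value updates from a mailbox; the class `Replica` and its methods `post_async` and `post_sync` are hypothetical names chosen for illustration, and the `ready` event exists only to stand in for a slow or busy receiver.

```python
import queue
import threading


class Replica:
    """A receiver that applies key/value updates taken from its mailbox."""

    def __init__(self):
        self.store = {}
        self.mailbox = queue.Queue()
        self.ready = threading.Event()   # receiver handles messages once set
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        self.ready.wait()                # simulate a receiver that is busy
        while True:
            key, value, ack = self.mailbox.get()
            self.store[key] = value      # apply the update
            if ack is not None:
                ack.set()                # tell a synchronous sender it is done
            self.mailbox.task_done()

    def post_async(self, key, value):
        # Asynchronous: enqueue and return at once; the update may lag behind.
        self.mailbox.put((key, value, None))

    def post_sync(self, key, value):
        # Synchronous: block until the receiver confirms the update is applied.
        ack = threading.Event()
        self.mailbox.put((key, value, ack))
        ack.wait()
```

After `post_async` returns, the sender has no guarantee that `store` reflects the update, which is the consistency risk described above. After `post_sync` returns, the update is visible, at the cost of the sender waiting for the receiver.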

Organizations adopting cloud computing for extreme-data workloads are looking to distributed shared memory programming as a way to migrate to the cloud. Cloud computing is a distributed environment that offers components at several levels, such as networking infrastructure, hardware, and software. Each offering has its own advantages and disadvantages with respect to control over the mix of CPU resources, execution of commands across different operating systems, service level definitions for hot and cold data storage, and access to applications. High performance computing clusters run both message passing distributed programming and shared memory distributed programming. However, buying time on supercomputers or leasing them is costly and works against economies of scale for organizations. Cloud options can therefore reduce the hardware an organization must own and the number of applications it must maintain for shared memory distributed programming, message passing distributed programming, and parallel programming.

Conventionally, any distributed shared memory program can introduce bottlenecks through multiple accesses to the critical sections of the memory address space, because large-scale data processing demands stability, consistency, and integrity of both the data and the application. The pre-development and post-development efforts can be significant for shared memory distributed programming and message passing distributed programming that access the operating system kernel, which increases the overall overhead on the system. Building asynchronous programs for distributed tasks can reduce performance bottlenecks and improve the scalability and consistency of the system, and such an architectural model can be beneficial for MapReduce. Distributed shared memory operates on uniform memory access (UMA) and non-uniform memory access (NUMA) architectures, which differ in the time a processor takes to access memory. Coherency, synchronization, and consistency are the core factors to consider when designing a distributed memory system for large-scale data processing in a cloud-computing environment. The right pre-development and post-development programming effort, together with fine-tuning of the memory parameters, can minimize the bottlenecks in the system.


