Parallel vs. Distributed Computing

Varun Singh Negi
Big Data Center of Excellence
Jul 31, 2019

There is a fine line, or rather an overlapping patch, between parallel and distributed computing: parallel computing can be viewed as a tightly coupled form of distributed computing.

Now the question arises: if the two are so similar, why do we need distributed systems at all? The answer is that there is a limit to how far a single system's configuration can be extended, whereas clustering many systems yields configurations far beyond that limit.

Distributed computing is in a sense the opposite of what we call "virtualization". In virtualization, one real system is made to look like many virtual systems, whereas in distributed computing, many real systems are made to look like one virtual system.

  • In parallel computing, all processors may have access to a shared memory and exchange information through it.
  • In distributed computing, each processor has its own private memory (distributed memory), and information is exchanged by passing messages between processors, as sketched in the example below.
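To make the contrast concrete, here is a minimal Python sketch using the standard multiprocessing module. It simulates both models on a single machine, so it only approximates a real distributed setup: the first half lets workers write into one shared array, while the second half exchanges results purely through messages on a queue.

```python
import multiprocessing as mp

def shared_worker(shared_arr, i):
    # Parallel-computing style: every worker writes into the same shared memory.
    shared_arr[i] = i * i

def message_worker(i, queue):
    # Distributed-computing style: each worker keeps its own data
    # and communicates only by sending messages.
    queue.put((i, i * i))

if __name__ == "__main__":
    # Shared memory: all four processes see the same array.
    arr = mp.Array("i", 4)
    procs = [mp.Process(target=shared_worker, args=(arr, i)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("shared memory:", list(arr))  # [0, 1, 4, 9]

    # Message passing: results travel over a queue instead of shared memory.
    q = mp.Queue()
    procs = [mp.Process(target=message_worker, args=(i, q)) for i in range(4)]
    for p in procs:
        p.start()
    results = dict(q.get() for _ in range(4))
    for p in procs:
        p.join()
    print("message passing:", [results[i] for i in range(4)])  # [0, 1, 4, 9]
```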

Distributed computing allows different users or computers to share information, and it can let an application on one machine leverage the processing power, memory, or storage of another machine. Distribution can enhance the performance of a stand-alone application, but that is often not the reason to distribute one; some applications, such as word processing, might not benefit from distribution at all.

In many cases, the problem itself demands distribution: if a company wishes to collect information across locations, distribution is a natural fit. In other cases, distribution is used to improve performance or availability. For example, if an application must run on a PC but needs to perform lengthy calculations, offloading those calculations to faster machines can speed it up.
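As a toy illustration of that last point, the sketch below offloads a lengthy calculation to a faster machine using Python's built-in xmlrpc modules; the host name fast-machine and the function heavy_sum are made up for the example.

```python
# On the fast machine: expose a lengthy calculation over the network.
from xmlrpc.server import SimpleXMLRPCServer

def heavy_sum(n):
    # Stand-in for an expensive computation; returned as a float
    # because XML-RPC cannot marshal integers this large.
    return float(sum(i * i for i in range(n)))

server = SimpleXMLRPCServer(("0.0.0.0", 8000))
server.register_function(heavy_sum)
server.serve_forever()
```

```python
# On the slow PC: call the remote function as if it were local.
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://fast-machine:8000")
print(proxy.heavy_sum(10_000_000))
```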

Distributed Computing and Big Data

Distributed computing is used in big data because the data is too large to be stored on a single system; instead, multiple systems, each with its own memory, hold and process it together.

Big Data can be defined as a huge dataset, or a collection of such datasets, that cannot be processed by traditional systems. Big Data has become a subject in itself, covering a range of tools, techniques, and frameworks rather than just the data. MapReduce is a framework for building applications that process huge volumes of data on large clusters of commodity hardware.

Why MapReduce?

Traditional systems tend to use a centralized server to store and retrieve data. Such a huge amount of data cannot be accommodated by standard database servers, and a centralized system becomes a bottleneck when many files must be processed simultaneously.

MapReduce resolves this bottleneck. It divides the task into small parts and processes each part independently by assigning it to a different system. Once all the parts have been processed and analyzed, the output of each computer is collected in a single location and combined into the output dataset for the given problem.
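To make that flow concrete, here is a minimal word-count sketch in Python. It simulates the map, shuffle, and reduce phases with a process pool on one machine; a real MapReduce framework such as Hadoop would run the same phases across a cluster.

```python
from collections import defaultdict
from multiprocessing import Pool

def map_phase(chunk):
    # Map: emit a (word, 1) pair for every word in this chunk of text.
    return [(word.lower(), 1) for word in chunk.split()]

def reduce_phase(item):
    # Reduce: sum all counts collected for a single word.
    word, counts = item
    return word, sum(counts)

if __name__ == "__main__":
    chunks = [
        "big data needs distributed computing",
        "mapreduce splits big data into parts",
        "each part is processed independently",
    ]
    with Pool() as pool:
        # Map phase: the chunks are processed in parallel.
        mapped = pool.map(map_phase, chunks)

        # Shuffle phase: group the emitted counts by word.
        groups = defaultdict(list)
        for pairs in mapped:
            for word, count in pairs:
                groups[word].append(count)

        # Reduce phase: aggregate the counts per word, again in parallel.
        counts = dict(pool.map(reduce_phase, groups.items()))

    print(counts)  # e.g. {'big': 2, 'data': 2, ...}
```

The key property is that map_phase and reduce_phase never share state; that independence is what lets a framework scatter them across many machines.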
