The Journey to Distributed Systems: Part 4 — Distributed Computing

Williams Adu · Published in The Andela Way · Aug 8, 2019
Photo by Lorenzo Herrera on Unsplash

Similar to all that we’ve seen in our journey to distributed systems, computation with one computer is easy. A single computer has been enough for most tasks for years. A single thread in a single process is usually a good enough tool for executing our day-to-day programs and tasks. When that is not enough, multithreading gives us concurrent execution on a single core, and the multi-core processors in modern computers let tasks truly run in parallel. But imagine we reach a stage where one computer isn’t enough, and we would like to distribute computation tasks across more than one computer — distributed computing. Whew! Don’t panic. Doing this today is much easier than it used to be.

First, let’s explore why we need distributed computing. On a usual day, we just give our processor some input, it processes it, and it returns an output. All computational devices work in this fashion, and it is enough until we need more computational resources. Imagine working on a very large graphics or video-rendering project, or on a complicated scientific problem. Often, the processing power of a single computer is not enough. In such cases, we can benefit from breaking the large computational task into pieces and distributing them across a number of computers. Remember, all the computers need to work in unison so that the desired output is produced.

At this point, you may have realized that a task that must be performed in a single large step is not a good fit for distributed computing. You will have difficulty distributing a task where each intermediate step requires the result of the previous step. The best-suited cases are tasks that can be broken down into multiple sub-tasks, each of which can be computed independently of the others, as in the sketch below. Now, the real meat: are you wondering how this is achieved?
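To make the distinction concrete, here is a toy sketch (my illustration, not from the original article) contrasting the two cases: summing a list splits into chunks whose partial sums are independent of each other, while an iterative recurrence cannot be split, because each step needs the previous step’s result.

```python
# A task that distributes well: each chunk's sum is independent
# of the others, so helpers could compute them in any order.
numbers = list(range(1_000_000))
chunks = [numbers[i:i + 250_000] for i in range(0, len(numbers), 250_000)]
partial_sums = [sum(chunk) for chunk in chunks]  # each could run on a different machine
total = sum(partial_sums)                        # the host combines the partial results

# A task that does not: every step needs the previous step's result,
# so there is nothing independent to hand out.
x = 1.0
for _ in range(1_000_000):
    x = 0.5 * x + 1.0  # step n depends on step n - 1
```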

Similar to the master-slave architecture, there is a host computer and a bunch of helper computers. The host computer is responsible for running the main program, creating the sub-tasks, and distributing them to the helper computers. The host then takes the results from all the helpers and generates the final output.
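As a rough single-machine sketch of that host/helper pattern (again my illustration, not code from the article), Python’s concurrent.futures can stand in for the helpers. In a real cluster the helpers would be separate machines reached over a network, but the shape of the program is the same: split, distribute, combine.

```python
from concurrent.futures import ProcessPoolExecutor

def sub_task(chunk):
    """The work a helper computer would do, e.g. summing its slice of the data."""
    return sum(chunk)

def host(numbers, n_helpers=4):
    """The host: split the job, hand the pieces to helpers, combine the results."""
    size = max(1, len(numbers) // n_helpers)
    chunks = [numbers[i:i + size] for i in range(0, len(numbers), size)]
    # ProcessPoolExecutor stands in for the helper machines here.
    with ProcessPoolExecutor(max_workers=n_helpers) as helpers:
        partial_results = list(helpers.map(sub_task, chunks))
    return sum(partial_results)  # generate the final output

if __name__ == "__main__":
    print(host(list(range(1_000_000))))  # 499999500000
```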

MapReduce

Does distributed computing sound a bit like MapReduce? In a way, it does, and MapReduce is a good way to visualize how distributed computing works. It expresses the whole computation through two functions — map and reduce. If you really want to investigate further, look into Hadoop and Spark.
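A word count is the classic illustration. The sketch below is plain Python (not Hadoop or Spark code): map emits a (word, 1) pair for every word, and reduce sums the counts per word. A real framework would run the map calls on different machines and shuffle the emitted pairs to reducers by key between the two phases.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    """Reduce: sum the counts for each distinct word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

documents = ["the quick brown fox", "the lazy dog", "the fox"]
# In a real cluster, each document's map runs on a different machine,
# and the emitted pairs are shuffled to the reducers by key.
all_pairs = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(all_pairs))  # {'the': 3, 'quick': 1, 'brown': 1, ...}
```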

Summary

Distributed computing can be fun and very effective for the right set of problems. Anyway, don’t forget that the CAP theorem still applies here.

Part 1 — Introduction

Part 2 — CAP theorem

Part 3 — Distributed Storage

Part 4 — Distributed Computing
