Published in Explore Coding

Distributed Computing and Apache Hadoop: Why? and How?

A dashboard with user statistics: it gives a glimpse of the vast scale of data generation in today’s Internet-era computer systems. Picture credits: https://unsplash.com/photos/JKUTrJ4vK00?utm_source=unsplash&utm_medium=referral&utm_content=creditShareLink
Amount of data generated from users before, and after the introduction of the commercial Internet

Distributed computing paradigm

Clients submitting their tasks to the distributed computing cluster

Apache Hadoop

Hadoop manages computing nodes through the ResourceManager node
Hadoop Distributed File System

How Hadoop operates

Clients submitting jobs
App master is created
App master talks to the HDFS
App master takes care of the job executions
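
The four steps above can be sketched as a toy, in-memory model. This is only an illustration, not Hadoop's API: the class names (`ToyHDFS`, `ToyAppMaster`, `ToyResourceManager`), the file path, and the node names are all hypothetical. It also simplifies heavily, since in a real cluster the app master runs in its own container on a worker node and requests further containers from the resource manager.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHDFS:
    """Hypothetical stand-in for HDFS: maps a file path to the nodes holding its blocks."""
    block_locations: dict

    def locate(self, path):
        return self.block_locations[path]


@dataclass
class ToyAppMaster:
    """Hypothetical per-job coordinator, created once a job is accepted."""
    job_name: str
    log: list = field(default_factory=list)

    def run(self, hdfs, path):
        # Step 3: ask the file system where the input blocks live.
        nodes = hdfs.locate(path)
        self.log.append(f"located {len(nodes)} blocks")
        # Step 4: run one task per block, on the node that holds that block.
        results = [f"{self.job_name}:task{i}@{node}" for i, node in enumerate(nodes)]
        self.log.append("all tasks finished")
        return results


class ToyResourceManager:
    """Hypothetical entry point that accepts jobs from clients."""

    def __init__(self, hdfs):
        self.hdfs = hdfs

    def submit(self, job_name, path):
        # Step 1: a client submits a job.
        # Step 2: an app master is created for that job alone.
        app_master = ToyAppMaster(job_name)
        return app_master.run(self.hdfs, path)


hdfs = ToyHDFS({"/logs/day1": ["node-a", "node-b", "node-a"]})
rm = ToyResourceManager(hdfs)
results = rm.submit("wordcount", "/logs/day1")
print(results)
```

Placing each task on the node that already stores its block mirrors the idea of moving computation to the data, which is why the app master consults the file system before starting any work.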

A few thoughts on job scheduling

First-in First-out (FIFO)

Capacity Scheduler

Fair Scheduler
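
The three policies named above can be contrasted with a small toy model. This is a sketch under simplifying assumptions, not the schedulers' actual implementation: jobs are reduced to a name and a demand in container slots, the function names are hypothetical, and real features such as queues borrowing idle capacity, weights, and preemption are left out.

```python
def fifo_allocate(jobs, capacity):
    """Serve jobs strictly in arrival order until the cluster is full."""
    allocation = {}
    for name, demand in jobs:
        granted = min(demand, capacity)
        allocation[name] = granted
        capacity -= granted
    return allocation


def fair_allocate(jobs, capacity):
    """Max-min fair sharing: split capacity evenly and re-share what small jobs do not need."""
    allocation = {name: 0 for name, _ in jobs}
    remaining = dict(jobs)  # unmet demand per job
    while capacity > 0 and remaining:
        share = max(capacity // len(remaining), 1)
        for name in list(remaining):
            granted = min(share, remaining[name], capacity)
            allocation[name] += granted
            remaining[name] -= granted
            capacity -= granted
            if remaining[name] == 0:
                del remaining[name]
            if capacity == 0:
                break
    return allocation


def capacity_allocate(queues, capacity):
    """Give each queue a fixed fraction of the cluster, then run FIFO inside it."""
    allocation = {}
    for fraction, jobs in queues:
        allocation.update(fifo_allocate(jobs, int(capacity * fraction)))
    return allocation


jobs = [("big", 8), ("small", 2), ("medium", 4)]
print("fifo:    ", fifo_allocate(jobs, 10))
print("fair:    ", fair_allocate(jobs, 10))
print("capacity:", capacity_allocate(
    [(0.5, [("big", 8)]), (0.5, [("small", 2), ("medium", 4)])], 10))
```

With ten slots, FIFO lets the first large job starve the last one, fair sharing gives every job some progress, and the capacity model caps the large job at its queue's half of the cluster.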

In the end



Stories from developers exploring different dimensions of programming. Kudos to https://www.freepik.com/vectors/man for the awesome publication icon.

Tharindu Bandara

Cloud and AI Researcher | Ex-Senior Full-Stack Engineer@WSO2 | IAM Specialist | Ph.D. Student@ University of Melbourne