
What is Apache Hadoop in Big Data

Bhanwar Saini · Published in CodeX · Feb 4, 2021


Apache Hadoop is an open-source software framework for the storage and large-scale processing of datasets on clusters of commodity hardware. Hadoop is an Apache top-level project, built and used by a global community of contributors and users, and is licensed under the Apache License 2.0.

Hadoop was created by Doug Cutting and Mike Cafarella in 2005. It was originally developed to support distributed processing for the Nutch search engine project. Cutting, who was working at Yahoo! at the time and is now Chief Architect of Cloudera, named the project after his son’s toy elephant. His son was two years old at the time and just beginning to talk; he called his beloved stuffed yellow elephant “Hadoop” (with the stress on the first syllable). Now 12, Cutting’s son often exclaims, “Why don’t you say my name, and why don’t I get royalties? I deserve to be famous for this!”

The Apache Hadoop framework is composed of the following modules:

  1. Hadoop Common: the libraries and utilities needed by the other Hadoop modules.
  2. Hadoop Distributed File System (HDFS): a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster (a short usage sketch follows this list).
  3. Hadoop YARN: a resource-management platform responsible for managing compute resources in the cluster and scheduling users’ applications.
  4. Hadoop MapReduce: a YARN-based programming model for large-scale data processing.
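
To make the HDFS module concrete, here is a minimal sketch of writing and reading a file through Hadoop’s Java FileSystem API. The NameNode address (hdfs://namenode:9000) and the file path are illustrative assumptions, not part of any particular cluster’s configuration; in practice, fs.defaultFS usually comes from core-site.xml.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; real clusters set this in core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/user/demo/hello.txt");

            // Write a small file. Behind the scenes, HDFS splits large files
            // into blocks and replicates each block across DataNodes.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("Hello, HDFS!".getBytes(StandardCharsets.UTF_8));
            }

            // Read the file back through the same FileSystem abstraction.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
                System.out.println(reader.readLine());
            }
        }
    }
}
```

The same FileSystem interface works against the local file system or HDFS depending on configuration, which is what lets Hadoop applications run unchanged across environments.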
