Hadoop Multi-Node Cluster Setup

This blog explains the process of setting up a Hadoop version 1 multi-node cluster. After going through it, you should be able to deploy a Hadoop v1 multi-node cluster effectively, with all the resources needed for it to function properly.

Harshit Dawar
The Startup
4 min read · Oct 20, 2020

In this Big Data world, using a distributed storage filesystem is a must. With data at such a huge scale, it is very difficult to cope without the help of a distributed filesystem.

Brief about Hadoop!

Hadoop is one of the most popular distributed filesystems, and it is widely adopted and used by many companies today. The hard reality is that although Hadoop is used at many companies, there is hardly any proper article that explains the exact procedure for installing it. Moreover, many people, and even teachers in colleges, say that if you are getting Hadoop installed through distributions like Cloudera or Hortonworks, you do not need to know the installation process. This way of teaching leads to zero understanding of Hadoop internals.

In my opinion, if you want to learn Hadoop and see its power in practice, you should be able to install it yourself and understand its properties, because only then will you be able to tweak how Hadoop works according to your needs.

Installing Hadoop Multi-Node Cluster!

Note: This blog explains the installation process on CentOS or any CentOS-based OS, such as Red Hat Linux or Amazon Linux. The software links mentioned here are compatible only with CentOS-based OSs, but the process of installing the Hadoop multi-node cluster is exactly the same on any other OS; the only difference is that you have to download the software compatible with that OS.

Software Required to install the cluster!

  1. JDK 8, HotSpot build (best compatible with Hadoop)
  2. Hadoop 1.2.1

After downloading the above software, install it on all the machines that will be part of the multi-node cluster, using the commands shown below.

Command to install JDK:
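A sketch of the install command, assuming the Oracle JDK 8 HotSpot RPM has been downloaded to the current directory (the exact filename depends on which JDK 8 update you download; the wildcard below is a placeholder):

```shell
# Install the downloaded JDK 8 RPM (filename is an example; use the one you downloaded)
rpm -ivh jdk-8u*-linux-x64.rpm

# Verify that Java is now available
java -version
```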

Command to install Hadoop:
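A sketch of the Hadoop install command, assuming the Hadoop 1.2.1 RPM from the Apache archive (on some systems the RPM dependency check complains about the JDK package name, in which case `--force` is commonly added):

```shell
# Install the Hadoop 1.2.1 RPM
rpm -ivh hadoop-1.2.1-1.x86_64.rpm

# Verify the installation
hadoop version
```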

This ends the installation process of the Hadoop cluster. You can verify the installation by running the “hadoop” command.

Configuring Hadoop Multi-Node Cluster!

After the installation of Hadoop, the directory “/etc/hadoop” will have been created.

Configuring Master Node!

Switch to this directory, then open the “hdfs-site.xml” file with any text editor you like.

For basic connectivity between the master and data nodes, add the property shown below between the “configuration” tags of the “hdfs-site.xml” file on the node that you want to act as the master node.

Property for the Master Node!
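In Hadoop 1.x, the namenode's metadata directory is configured with the `dfs.name.dir` property. A minimal sketch, assuming the “/namenode” directory used in this walkthrough:

```xml
<!-- hdfs-site.xml on the master node -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/namenode</value>
  </property>
</configuration>
```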

Now, create the directory “/namenode” on the master node. The directory name can be anything you like, but keep in mind that it must exactly match the name used in the configuration file.

One more property has to be added to another file, “core-site.xml”. Open the file and add the property shown below.

Property for the Master Node!
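In Hadoop 1.x, the filesystem address is set with the `fs.default.name` property. A sketch with the IP and port left as placeholders, as described below:

```xml
<!-- core-site.xml on the master node -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://<Master-Node-IP>:<port></value>
  </property>
</configuration>
```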

In the above property, write the public IP and port of the master node in place of the placeholders.

Now, format the namenode directory for the proper working of the cluster using the command given below.
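A sketch of the Hadoop 1.x format command, assuming the `hadoop` binary is on the PATH and is run on the master node:

```shell
# Format the namenode metadata directory (run once, on the master node)
hadoop namenode -format
```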

The last step of the master node configuration is to start the namenode service by running the command given below.
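In Hadoop 1.x, individual daemons are started with the `hadoop-daemon.sh` script shipped with the installation:

```shell
# Start the namenode daemon on the master node
hadoop-daemon.sh start namenode
```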

Whether the service has started can be verified with the “jps” command. If it has, “NameNode” will be listed in the jps output.

Configuring Worker/Slave/Data Node!

Switch to this directory, then open the “hdfs-site.xml” file with any text editor you like.

For basic connectivity between the master and data nodes, add the property shown below between the “configuration” tags of the “hdfs-site.xml” file on the node that you want to act as the worker/data node.

Property for the Worker Node!
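On the worker side, Hadoop 1.x configures the datanode's storage directory with the `dfs.data.dir` property. A minimal sketch, assuming the “/datanode” directory used in this walkthrough:

```xml
<!-- hdfs-site.xml on the worker/data node -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/datanode</value>
  </property>
</configuration>
```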

Now, create the directory “/datanode” on the worker node. The directory name can be anything you like, but keep in mind that it must exactly match the name used in the configuration file.

One more property has to be added to another file, “core-site.xml”. Open the file and add the property shown below.

Property for the Worker Node!
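The worker's `fs.default.name` property points at the master node, so the datanode knows where to register. A sketch with the IP and port left as placeholders, as described below:

```xml
<!-- core-site.xml on the worker/data node -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://<Master-Node-IP>:<port></value>
  </property>
</configuration>
```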

In the above property, write the public IP and port of the master node in place of the placeholders.

The last step of the worker node configuration is to start the worker/data node service by running the command given below.
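As on the master, the daemon is started with the `hadoop-daemon.sh` script shipped with Hadoop 1.x:

```shell
# Start the datanode daemon on the worker node
hadoop-daemon.sh start datanode
```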

Whether the service has started can be verified with the “jps” command. If it has, “DataNode” will be listed in the jps output.

Note: For the data node to connect to the master node successfully, either connectivity rules have to be added to the firewall or the firewall has to be disabled; only then will the master and slave nodes be able to communicate.
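One possible way to do this on a CentOS 7+ system using firewalld (the port number is a placeholder for whichever port you configured in core-site.xml; older CentOS releases use iptables via `service iptables stop` instead):

```shell
# Option 1: open the configured Hadoop port on the master node
firewall-cmd --permanent --add-port=<port>/tcp
firewall-cmd --reload

# Option 2: disable the firewall entirely (test/lab setups only)
systemctl stop firewalld
systemctl disable firewalld
```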

This concludes the aim of this article. If you have followed the steps above properly, you will now be able to run a Hadoop multi-node cluster successfully.

I hope my article explains everything related to the topic, with all the underlying concepts and explanations. Thank you so much for investing your time in reading my blog and boosting your knowledge. If you like my work, I request you to give this blog an applaud!
