A quick guide to Multi-Node Hadoop Cluster setup for Beginners

  • Hadoop-3.1.3
  • JAVA 8
  • SSH
  • At least 2 laptops/desktops connected over LAN/Wi-Fi
  • Two hostnames for the nodes: masternode and slave (install commands for the prerequisites above are sketched right after this list)
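On Ubuntu, the prerequisites can be installed along these lines. This is only a sketch, not taken from the article: the package names are the standard Ubuntu ones, the download URL is the usual Apache archive location for the 3.1.3 release, and /usr/local/hadoop is an assumed install path chosen to match the data directories used in the configuration later on.

sudo apt-get install openjdk-8-jdk openssh-server
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz
sudo tar -xzf hadoop-3.1.3.tar.gz -C /usr/local
sudo mv /usr/local/hadoop-3.1.3 /usr/local/hadoop    # assumed install path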
Set the hostname of each machine (masternode on one, slave on the other) and map both IPs in the hosts file; ip addr show gives each machine's IP address.

sudo vi /etc/hostname
sudo vi /etc/hosts

The hosts file on both nodes should contain entries for the two machines:

192.168.1.4 masternode
192.168.1.23 slave

Restart the SSH daemon after the changes:

service sshd restart
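To confirm the names resolve correctly, a quick check (run from each node, hitting the other) is:

ping -c 1 masternode
ping -c 1 slave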
  • Command to generate SSH key in masternode: ssh-keygen
  • It will ask for the file location where it should save the keys; I entered /home/username/.ssh/id_rsa.
  • It will ask for a passphrase; keep it empty for simplicity.
  • Next, append the newly generated public key to the authorized_keys file in your user's ~/.ssh directory. Command: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
  • Next, execute ssh localhost to check that the key is working.
  • Next, we need to publish the key to the slave node. Command: ssh-copy-id -i $HOME/.ssh/id_rsa.pub <username>@slave
  • The first time, it will prompt you for the password and then publish the key.
  • Execute ssh <username>@slave again to check that you can log in without a password. This is very important: without the public key working, the slave node cannot be added to the cluster later. The full sequence is summarized below.
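For reference, here is the whole key-exchange sequence in one place; a minimal sketch assuming the common username is hadoop (substitute your own):

ssh-keygen                                                 # accept the default key location, empty passphrase
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys    # authorize the key locally
ssh localhost                                              # should now log in without a password
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@slave          # publish the key to the slave (asks for the password once)
ssh hadoop@slave                                           # should also work without a password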
hdfs-site.xml (masternode):

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/data/nameNode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/data/dataNode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
hdfs-site.xml (slave):

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/data/datanode</value>
  </property>
</configuration>
yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
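Once the configuration files are in place on both nodes, the daemons are started from the masternode. A minimal sketch, assuming HADOOP_HOME points at the Hadoop 3.1.3 install directory and the slave hostname is listed in $HADOOP_HOME/etc/hadoop/workers:

$HADOOP_HOME/bin/hdfs namenode -format   # one-time, on masternode only
$HADOOP_HOME/sbin/start-dfs.sh           # NameNode on masternode, DataNodes on workers
$HADOOP_HOME/sbin/start-yarn.sh          # ResourceManager and NodeManagers
jps                                      # list the Hadoop daemons running on each node
$HADOOP_HOME/bin/hdfs dfsadmin -report   # confirm the slave's DataNode has joined

If the slave's DataNode does not show up in the report, the SSH key setup or the hosts entries are the usual suspects.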

Issues I faced

As with any project, small or big, it wasn't as smooth sailing as the documentation above makes it look. I initially set out to create a multi-node cluster using Windows machines, but the SSH server and client did not work out with the username convention that Windows follows. So I went with the next best option available to me: creating a cluster using Ubuntu on my old laptops. While setting up the configuration I faced a couple of major hurdles:

  • JDK version: I had installed OpenJDK 11, and it turns out there is an incompatibility between OpenJDK 11 and Hadoop 3.1.3; the NodeManager daemon was not coming up because of it. After googling around I settled on OpenJDK 8 (see the sketch after this list).
  • Username used for the Hadoop installation on the master and slave nodes: Initially I created two different usernames (master and slave) on masternode and slave, so the ssh command triggered by Hadoop's start-all scripts kept failing; it tried to use master@slave while no 'master' user existed on the slave node. This was resolved once I switched to the same username on both nodes.
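For the JDK issue, after installing OpenJDK 8, Hadoop also has to be pointed at it. A minimal sketch of the relevant line in $HADOOP_HOME/etc/hadoop/hadoop-env.sh, assuming Ubuntu's usual OpenJDK 8 location (the path may differ on your system):

# In $HADOOP_HOME/etc/hadoop/hadoop-env.sh, on both nodes:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64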
