Image source : http://www.kdnuggets.com/wp-content/uploads/big-data-visualization.jpg
Installation of Hadoop 2.7.3

Prerequisites for Hadoop :

Java
 
Install JDK 8 : sudo apt-get install openjdk-8-jdk
Set JAVA_HOME : sudo gedit /etc/environment
Put the below line at the end of the file :
JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
check : java -version
If it shows the version number, Java is installed successfully.

SSH

Install ssh : sudo apt-get install ssh
Generate public key:

ssh-keygen -t rsa -P ""

Add the generated public key to the authorized keys :

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
check : ssh localhost (it should log you in without asking for a password)
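The SSH steps above can also be run as one snippet that is safe to re-run (it skips key generation if a key already exists):

```shell
# Create the .ssh directory if needed, generate a passwordless RSA key
# only when one does not already exist, then authorize it for localhost.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    ssh-keygen -t rsa -P "" -f "$HOME/.ssh/id_rsa"
fi
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```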

Now, we'll start the installation of Hadoop.

Step 1 : Download Hadoop
http://hadoop.apache.org/releases.html

Step 2 : Unpack the downloaded file.

tar -xvzf [path to the downloaded file]

Step 3 : Move the extracted 'hadoop-2.7.3' folder to your home directory.

Step 4 : Add the Hadoop variables to the '.bashrc' file

Open the file : gedit ~/.bashrc
Paste the below lines at the end :
 
# Start of Hadoop Variables
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
export HADOOP_HOME=/home/lee/Project/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
# End of Hadoop Variables

Note : You have to check only 2 things here :
1. Check that JAVA_HOME matches your Java install location.
2. Check that HADOOP_HOME matches the folder where you moved hadoop-2.7.3.
Change them accordingly.

Note : Run the below command to load the Hadoop variables into your current shell.
source ~/.bashrc
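If you prefer not to paste by hand, the Step 4 edit can also be done non-interactively. The HADOOP_HOME path below follows my layout ($HOME/Project); adjust it to wherever your hadoop-2.7.3 lives:

```shell
# Append the Hadoop variables to ~/.bashrc in one shot.
# The quoted 'EOF' keeps $HADOOP_HOME etc. literal in the file;
# they expand later, when the file is sourced.
cat >> "$HOME/.bashrc" <<'EOF'
# Start of Hadoop Variables
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
export HADOOP_HOME=$HOME/Project/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
# End of Hadoop Variables
EOF
```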

Step 5 : Make the directories where Hadoop will store its data.

mkdir hadoop_store

Make a directory inside hadoop_store :

mkdir hdfs

Make 2 more directories inside hdfs :

mkdir namenode
mkdir datanode

Make a directory inside the hadoop-2.7.3 folder :

mkdir tmp
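All of Step 5's directories can be created in one go with mkdir -p. The base path below is my layout; change it to wherever your hadoop-2.7.3 lives:

```shell
# Base directory where hadoop-2.7.3 and hadoop_store live -- adjust as needed.
BASE="$HOME/Project"
mkdir -p "$BASE/hadoop_store/hdfs/namenode"
mkdir -p "$BASE/hadoop_store/hdfs/datanode"
mkdir -p "$BASE/hadoop-2.7.3/tmp"
```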

Step 6 : Modify Hadoop Config Files

We are going to modify following files:

hadoop-env.sh

hdfs-site.xml

core-site.xml

mapred-site.xml (copied from mapred-site.xml.template)
 
Note : All these files reside in hadoop-2.7.3/etc/hadoop
 
hadoop-env.sh :

open file : gedit hadoop-env.sh

Add the below line at the end of the file :

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/

hdfs-site.xml :

open file : gedit hdfs-site.xml

Add the below lines between the <configuration> </configuration> tags.

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:######NAMENODE_FOLDER_PATH######</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:######DATANODE_FOLDER_PATH######</value>
</property>

Note : Set the paths of your namenode and datanode folders in the <value> tags.
In my case, it's like this :
For namenode : <value>file:/home/lee/Project/hadoop_store/hdfs/namenode</value>
For datanode : <value>file:/home/lee/Project/hadoop_store/hdfs/datanode</value>
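Put together, the whole hdfs-site.xml looks like this (the paths are from my setup; core-site.xml and mapred-site.xml below follow the same declaration-plus-configuration shape):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/lee/Project/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/lee/Project/hadoop_store/hdfs/datanode</value>
  </property>
</configuration>
```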

core-site.xml :

open file : gedit core-site.xml
Add the below lines between the <configuration> </configuration> tags.

<property>
  <name>hadoop.tmp.dir</name>
  <value>######TMP_FOLDER_PATH######</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>

Note : Set the path of the tmp folder you made inside hadoop-2.7.3 in the first <value> tag.

mapred-site.xml :

This file doesn't exist by default, so first create it from the template :

cp mapred-site.xml.template mapred-site.xml
open file : gedit mapred-site.xml

Add the below lines between the <configuration> </configuration> tags.

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
</property>

Step 7 : Format and start Hadoop

  • Format the Hadoop filesystem

Before we start Hadoop for the first time, we need to format the Hadoop filesystem.

$ hdfs namenode -format
  • Start Hadoop
$ start-all.sh
  • Run jps to see the running processes. You should see NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager.
$ jps
  • Stop Hadoop
$ stop-all.sh

Visit http://localhost:50070 in a browser to see the NameNode web UI.

Hadoop installation completes here :)