Guide to install and run Hadoop on Windows
--
Hadoop is a software framework from the Apache Software Foundation that is used to store and process Big Data. In this article I’ve compiled the steps to install and run Hadoop on Windows.
Prerequisites:
Install Java Development Kit: https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html
Install Visual C++ and other runtimes: https://www.computerbase.de/downloads/systemtools/all-in-one-runtimes/
1. Download Hadoop:
Download Hadoop 3.1.0: https://mirrors.huaweicloud.com/apache/hadoop/core/hadoop-3.1.0/
Open WinRAR as Administrator
Extract the tar file
2. Setup System Environment variables:
Search for “environment” in the Start menu search bar
Click on Environment Variables
Click on New and create a new variable called HADOOP_HOME, then paste the path of the Hadoop folder (the one that contains bin, for example E:\hadoop-3.1.0) as the variable value
Click on New and create a new variable called JAVA_HOME, then paste the path of the JDK folder (the one that contains bin, not the bin folder itself) as the variable value
Select Path and click on Edit
Click on New and add the bin folders of Java and Hadoop (%JAVA_HOME%\bin and %HADOOP_HOME%\bin)
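Hadoop's scripts expect HADOOP_HOME and JAVA_HOME to name the installation folders themselves, with bin one level below, and pointing them at bin is an easy mistake to make. The following is a minimal Python sketch for checking this, not part of the original steps. It assumes Python 3 is installed, and the helper name check_home is hypothetical.

```python
import os
from pathlib import Path


def check_home(var_name, marker, env=None):
    """Return (ok, message) for a *_HOME-style variable.

    var_name: environment variable to inspect, e.g. "HADOOP_HOME".
    marker:   file expected inside <home>/bin, e.g. "hadoop.cmd".
    env:      mapping to read from; defaults to the real environment.
    """
    env = os.environ if env is None else env
    value = env.get(var_name)
    if not value:
        return False, f"{var_name} is not set"
    home = Path(value)
    if (home / "bin" / marker).is_file():
        return True, f"{var_name} looks fine: {home}"
    if (home / marker).is_file():
        # Common mistake: the variable points at the bin folder itself.
        return False, f"{var_name} points at the bin folder; use {home.parent}"
    return False, f"{var_name} is set, but bin/{marker} was not found under {home}"


if __name__ == "__main__":
    for name, marker in (("HADOOP_HOME", "hadoop.cmd"), ("JAVA_HOME", "java.exe")):
        print(check_home(name, marker)[1])
```

Run it from a newly opened Command Prompt, since windows that were already open do not see the new variables.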
3. Configurations:
Open the etc\hadoop folder in the Hadoop directory
Open core-site.xml using Notepad and add this property inside the configuration element of the file
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Open mapred-site.xml with Notepad and add this property inside the configuration element
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Create a folder called ‘data’ in the Hadoop directory
Create two folders called datanode and namenode inside the data folder
Open hdfs-site.xml using Notepad and copy in the configuration below
Note: The values of dfs.namenode.name.dir and dfs.datanode.data.dir must be the paths of the namenode and datanode folders you just created
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>E:\hadoop-3.1.0\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>E:\hadoop-3.1.0\data\datanode</value>
</property>
</configuration>
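The folder creation and the two path values can also be produced by a short script, which avoids typos between the folders on disk and the values in hdfs-site.xml. This is only a sketch under the assumption that Python 3 is available. The function name make_data_dirs is made up for this example.

```python
from pathlib import Path


def make_data_dirs(hadoop_dir):
    """Create <hadoop_dir>/data/namenode and <hadoop_dir>/data/datanode.

    Returns a dict mapping the hdfs-site.xml property names to the
    folder paths that should be used as their values.
    """
    data = Path(hadoop_dir) / "data"
    dirs = {
        "dfs.namenode.name.dir": data / "namenode",
        "dfs.datanode.data.dir": data / "datanode",
    }
    for folder in dirs.values():
        folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
    return {name: str(folder) for name, folder in dirs.items()}
```

For the layout used in this guide you would call make_data_dirs(r"E:\hadoop-3.1.0") and paste the returned paths into the two value elements.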
Open yarn-site.xml and add these properties to the configuration
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Open hadoop-env.cmd using Notepad and set JAVA_HOME to the path of your JDK folder
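A misspelled property name in these files is silently ignored by Hadoop, so it can be worth reading the settings back after editing. Below is a small sketch using Python's standard xml.etree.ElementTree module. The function names read_properties and missing_settings are hypothetical, and the expected values are simply the ones used in this guide.

```python
import xml.etree.ElementTree as ET


def read_properties(path):
    """Return {name: value} for every <property> in a Hadoop *-site.xml file."""
    root = ET.parse(path).getroot()
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_settings(props, expected):
    """List the expected settings that are absent or have another value."""
    return [name for name, value in expected.items() if props.get(name) != value]
```

For example, missing_settings(read_properties(r"E:\hadoop-3.1.0\etc\hadoop\core-site.xml"), {"fs.defaultFS": "hdfs://localhost:9000"}) should return an empty list when core-site.xml matches the configuration shown earlier.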
4. Install Windows OS specific files:
Download the bin folder from: https://github.com/s911415/apache-hadoop-3.1.0-winutils
Replace the bin folder currently in the Hadoop directory with the downloaded one
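To confirm that the replacement worked, you can check that the Windows-specific files are now in the bin folder. The sketch below assumes that winutils.exe and hadoop.dll are the files that matter most (the downloaded folder contains more than these two). The name missing_native_files is made up for illustration.

```python
from pathlib import Path

# Assumption: these two files are the ones Hadoop on Windows most often
# reports as missing; adjust the tuple if your setup needs others.
REQUIRED = ("winutils.exe", "hadoop.dll")


def missing_native_files(bin_dir, required=REQUIRED):
    """Return the names from `required` that are not present in bin_dir."""
    bin_dir = Path(bin_dir)
    return [name for name in required if not (bin_dir / name).is_file()]
```

An empty list from missing_native_files(r"E:\hadoop-3.1.0\bin") means both files were found.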
5. Verify:
Verify that Hadoop is installed by running the following command in a new Command Prompt window (so that the new environment variables are picked up):
hadoop version
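If you would rather run this check from a script, the first line of the command's output has the form "Hadoop 3.1.0" and is easy to parse. The following Python sketch is one possible way to do that, with the hypothetical helper names parse_version and installed_version. It assumes Python 3.7 or later and that hadoop is reachable through Path.

```python
import re
import subprocess


def parse_version(output):
    """Extract the version string from `hadoop version` output, or None."""
    match = re.search(r"^Hadoop\s+(\S+)", output, flags=re.MULTILINE)
    return match.group(1) if match else None


def installed_version():
    """Run `hadoop version` and return the parsed version, or None on failure."""
    try:
        # shell=True lets Windows resolve the hadoop.cmd script via Path.
        result = subprocess.run("hadoop version", shell=True,
                                capture_output=True, text=True)
    except OSError:
        return None
    return parse_version(result.stdout)
```

A return value of None from installed_version() usually means the command was not found or printed an error instead of a version.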
6. Format the namenode:
hdfs namenode -format
7. Change the directory to sbin folder:
cd E:\hadoop-3.1.0\sbin
8. Start datanode and namenode:
start-dfs.cmd
Two separate cmd windows will open for namenode and datanode
9. Start yarn:
start-yarn.cmd
Two separate cmd windows will open for yarn resource manager and yarn node manager
Note: If you get the error NoClassDefFoundError: org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorManager,
copy “hadoop-yarn-server-timelineservice-3.x.x” from ~\hadoop-3.x.x\share\hadoop\yarn\timelineservice to the ~\hadoop-3.x.x\share\hadoop\yarn folder.
If the error says that permissions are set incorrectly, open Command Prompt as Administrator and run the commands again.
10. Open Hadoop in the browser:
Address for namenode information:
localhost:9870
Address for nodemanager:
localhost:8042
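If a page does not load, it helps to know whether anything is listening on the port at all. Here is a minimal Python sketch that tries a TCP connection to each port mentioned in this guide. The function name is_listening and the labels in PORTS are my own, and a successful connection only shows that some process is bound to the port, not that it is healthy.

```python
import socket


def is_listening(port, host="localhost", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from this guide: fs.defaultFS, namenode page, nodemanager page.
PORTS = {
    "NameNode (fs.defaultFS)": 9000,
    "NameNode web UI": 9870,
    "NodeManager web UI": 8042,
}

if __name__ == "__main__":
    for label, port in PORTS.items():
        state = "up" if is_listening(port) else "not reachable"
        print(f"{label} on port {port}: {state}")
```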
Hadoop is now installed
Cheers✌