Step by Step Guide for Hadoop 3 Installation for Windows Machine on Huawei Cloud
Introduction
Hello all, I’m going to introduce a “Step by Step Guide for Hadoop 3 Installation for a Windows Machine on Huawei Cloud”. As a prerequisite, an ECS instance (with a Windows image) and a VPC (in the same region) should be set up on Huawei Cloud; after connecting remotely, we can start the installation process. Enjoy the reading.☕
What Is Hadoop?
Using straightforward programming concepts, the Apache Hadoop software library provides a framework for the distributed processing of massive data volumes across computer clusters. From a single server to thousands of devices, each providing local computing and storage, it is intended to scale up. The library itself is designed to identify and handle problems at the application layer rather than relying on hardware to provide high availability. As a result, a highly-available service is delivered on top of a cluster of computers, each of which may be prone to failures.
Advantages Of Using Hadoop
✅Scalable: Hadoop is a highly scalable storage platform because it can easily store and distribute very large datasets across many servers operating in parallel.
✅Economical: When compared to conventional database management systems, Hadoop is very economical.
✅Quick: Hadoop uses clusters to manage data, offering a special form of distributed file system-based storage. The special ability of Hadoop to map data on clusters allows for faster data processing.
✅Flexible: Hadoop gives businesses the ability to quickly and easily access and process data to produce the values they need, giving them the tools they need to gain insightful knowledge from a variety of concurrent data sources.
✅Fault tolerant: One of Hadoop’s greatest advantages is its fault tolerance, which it provides by replicating data to other cluster nodes. In the event of a failure, the replicated copy of the data can be used, so no data is lost.
So, how did we go about it?
The Hadoop 3 installation consists of the steps below:
Step 1: Install Java JDK 1.8.x.xxx
Step 2: Download Hadoop, extract it, and place it on the C drive
Step 3: Set Path in Environment Variables
Step 4: Config files under Hadoop directory
Step 5: Create the datanode and namenode folders under the data directory
Step 6: Edit HDFS and YARN files
Step 7: Set Java Home environment in Hadoop environment
Step 8: Complete the setup and run a test (start-all.cmd)
Let’s start 👇
Step 1: Make sure that Java is installed on your machine. If not, go to the Java website and download and install it from there:
https://www.oracle.com/java/technologies/downloads/#java8
After downloading and installing Java, go to the command prompt and check the Java version.
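From a command prompt, the check looks like this (the exact version string depends on the release you installed):

```shell
java -version
```

If Java is installed correctly, this prints the runtime version; for the Java 8 builds used in this guide, the first line starts with `java version "1.8.0_...`.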
Step 2: Open the Hadoop website and download a 3.x.x version. I use version 3.2.4 for this example.
https://dlcdn.apache.org/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz
After the download, copy the archive to the C drive and extract it there. (It takes almost 1 minute⏱)
Step 3: Set Path in Environment Variables for Java and Hadoop programs.
Step 3–1:
- Click Advanced System Settings
- Click Environment Variables
- Under User Variables, click the New button
- Add the Variable Name (HADOOP_HOME) and Variable Value (C:\hadoop-3.2.4)
- Select Path under User Variables and add the entry (%HADOOP_HOME%\bin)
Step 3–2:
- Click Advanced System Settings
- Click Environment Variables
- Under User Variables, click the New button
- Add the Variable Name (JAVA_HOME) and Variable Value (C:\Program Files\Java\jre1.8.0_271)
- Select Path under User Variables and add the entry (%JAVA_HOME%\bin)
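To confirm both variables are set, open a new command prompt (windows opened before the change do not pick it up) and echo them; the expected values are the ones entered above:

```shell
echo %HADOOP_HOME%
echo %JAVA_HOME%
```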
Step 4: We arrange and configure the Hadoop configuration files
Step 4–1: Configure core-site.xml document
Enter C:\hadoop-3.2.4\etc\hadoop, right-click the core-site.xml document, and click Edit. In the Notepad window, make the <configuration> section look like the code below and save it.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Step 4–2: Configure mapred-site.xml document
Enter C:\hadoop-3.2.4\etc\hadoop, right-click the mapred-site.xml document, and click Edit. In the Notepad window, make the <configuration> section look like the code below and save it.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Step 5: Create the datanode and namenode folders under the data directory
We will create three folders:
- Create folder “data” under “C:\hadoop-3.2.4”
- Create folder “datanode” under “C:\hadoop-3.2.4\data”
- Create folder “namenode” under “C:\hadoop-3.2.4\data”
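Instead of creating them by hand in Explorer, the same three folders can be created from a command prompt:

```shell
mkdir C:\hadoop-3.2.4\data
mkdir C:\hadoop-3.2.4\data\datanode
mkdir C:\hadoop-3.2.4\data\namenode
```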
Step 6: Edit HDFS and YARN files
Step 6–1: Configure hdfs-site.xml document
Enter C:\hadoop-3.2.4\etc\hadoop, right-click the hdfs-site.xml document, and click Edit. In the Notepad window, make the <configuration> section look like the code below and save it.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>hadoop-3.2.4/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>hadoop-3.2.4/data/datanode</value>
</property>
</configuration>
Step 6–2: Configure yarn-site.xml document
Enter C:\hadoop-3.2.4\etc\hadoop, right-click the yarn-site.xml document, and click Edit. In the Notepad window, make the <configuration> section look like the code below and save it.
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Step 7: Set Java Home environment in Hadoop environment
Step 7–1: Enter C:\hadoop-3.2.4\etc\hadoop, right-click the hadoop-env.cmd document, and click Edit. In the Notepad window, change the JAVA_HOME line as shown below and save it:
@rem The java implementation to use. Required.
set JAVA_HOME=C:\Progra~1\Java\jre1.8.0_271
@rem The jsvc implementation to use. Jsvc is required to run secure datanodes.
@rem set JSVC_HOME=%JSVC_HOME%
✅ C:\Progra~1\Java\jre1.8.0_271
❌ C:\Program Files\Java\jre1.8.0_271
If we instead set JAVA_HOME to C:\Program Files\Java\jre1.8.0_271 and then run:
hadoop version
we get the following error, because the JAVA_HOME path contains spaces:
JAVA_HOME is incorrectly set
Use “Progra~1” instead of “Program Files” and “Progra~2” instead of “Program Files (x86)”. After replacing “Program Files” with “Progra~1”, close and reopen PowerShell and try the same command again.
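If your Java directory has a different name, you can print the 8.3 short form of any path from a command prompt using cmd's `%~s` modifier (the path below is just the one used in this guide):

```shell
for %I in ("C:\Program Files\Java\jre1.8.0_271") do @echo %~sI
```

The short form it prints is what goes into hadoop-env.cmd.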
Step 7–2: Go to https://github.com/s911415/apache-hadoop-3.1.0-winutils and download the repository as a zip file.
After the download, extract the zip file, copy the bin folder, and replace the existing bin folder at C:\hadoop-3.2.4\bin. (Choose to replace the 4 existing items when prompted.)
Step 8: Complete the setup and run a test
Step 8–1: Enter C:\hadoop-3.2.4\bin, click the path bar inside the folder, type cmd, and press Enter. Then run:
hdfs namenode -format
When we run the command and see this result, it looks like everything is OK 🙂
Step 8–2: Enter C:\hadoop-3.2.4\sbin, click the path bar inside the folder, type cmd, and press Enter. Then run:
start-all.cmd
If this step succeeds, we will see 4 results:
- A Namenode cmd window opens and its logs start running
- A Datanode cmd window opens and its logs start running
- A ResourceManager cmd window opens and its logs start running
- A NodeManager cmd window opens and its logs start running
When we run the command and see this result, it looks like everything is OK again 🙂🙂
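As an extra check once the daemons are up (an optional sketch; `/test` is just an example directory name), we can ask HDFS for a cluster report and try a small filesystem operation from any command prompt:

```shell
hdfs dfsadmin -report
hdfs dfs -mkdir /test
hdfs dfs -ls /
```

The report should show one live datanode for this single-node setup, and the `ls` should list the /test directory we just created.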
Conclusion
On the Hadoop Web UI, there are three web user interfaces to be used:
- Name node web page: http://localhost:9870/dfshealth.html
- Data node web page: http://localhost:9864/datanode.html
- Yarn web page: http://localhost:8088/cluster
If you have any thoughts or suggestions, please feel free to comment, or you can reach me at guvezhakan@gmail.com; I will try to get back to you as soon as I can.
You can reach me through LinkedIn too.
Hit the clap button 👏👏👏 or share it ✍ if you like the post.