Step by Step Guide for Hadoop 3 Installation for Windows Machine on Huawei Cloud
Introduction
Hello all, I’m going to introduce a “Step by Step Guide for Hadoop 3 Installation for a Windows Machine on Huawei Cloud”. As a prerequisite, an ECS instance (with a Windows image) and a VPC (in the same region) should be set up on Huawei Cloud; after connecting remotely, we can start the installation process. Enjoy the reading.☕
What Is Hadoop?
Using straightforward programming concepts, the Apache Hadoop software library provides a framework for the distributed processing of massive data volumes across computer clusters. From a single server to thousands of devices, each providing local computing and storage, it is intended to scale up. The library itself is designed to identify and handle problems at the application layer rather than relying on hardware to provide high availability. As a result, a highly-available service is delivered on top of a cluster of computers, each of which may be prone to failures.
Advantages Of Using Hadoop
✅Scalable: Hadoop is a highly scalable storage platform because it can easily store and distribute very large datasets across many servers operating in parallel.
✅Economical: When compared to conventional database management systems, Hadoop is very economical.
✅Quick: Hadoop uses clusters to manage data, offering a special form of distributed file system-based storage. The special ability of Hadoop to map data on clusters allows for faster data processing.
✅Flexible: Hadoop gives businesses the ability to quickly and easily access and process data to produce the values they need, giving them the tools they need to gain insightful knowledge from a variety of concurrent data sources.
✅Fault tolerant: One of Hadoop’s greatest advantages is its fault tolerance, which it provides by replicating data to other cluster nodes. In the event of a failure, the replicated copy of the data can be used, so no data is lost.
So, how did we go about it?
The Hadoop 3 installation consists of the steps below:
Step 1: Install Java JDK 1.8.x.xxx
Step 2: Download Hadoop, extract it, and place it on the C drive
Step 3: Set Path in Environment Variables
Step 4: Config files under Hadoop directory
Step 5: Create the datanode and namenode folders under the data directory
Step 6: Edit HDFS and YARN files
Step 7: Set Java Home environment in Hadoop environment
Step 8: Complete the setup and run a test (start-all.cmd)
Let’s start 👇
Step 1: Make sure that Java is installed on your machine. If not, go to the Java website and download and install it from there:
https://www.oracle.com/java/technologies/downloads/#java8
After downloading and installing Java, go to the command prompt and check the Java version.
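From a command prompt, the check looks like this (the exact version string depends on the release you installed):

```shell
java -version
```

If Java is installed correctly, this prints the runtime version; for the Java 8 builds used in this guide, the first line starts with `java version "1.8.0_...`.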
Step 2: Open the Hadoop website and download a 3.x.x version. I use version 3.2.4 for this example.
https://dlcdn.apache.org/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz
After the download, copy the archive to the C drive and extract it there. (It takes almost 1 minute⏱)
Step 3: Set Path in Environment Variables for Java and Hadoop programs.
Step 3–1:
- Click Advanced System Settings
- Click Environment Variables
- Under User Variables, click the New button
- Add the Variable Name (HADOOP_HOME) and Variable Value (C:\hadoop-3.2.4)
- Select Path under User Variables and add the entry (%HADOOP_HOME%\bin)
Step 3–2:
- Click Advanced System Settings
- Click Environment Variables
- Under User Variables, click the New button
- Add the Variable Name (JAVA_HOME) and Variable Value (C:\Program Files\Java\jre1.8.0_271)
- Select Path under User Variables and add the entry (%JAVA_HOME%\bin)
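To confirm both variables are set, open a new command prompt (windows opened before the change do not pick it up) and echo them; the expected values are the ones entered above:

```shell
echo %HADOOP_HOME%
echo %JAVA_HOME%
```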
Step 4: We arrange and configure the Hadoop configuration files
Step 4–1: Configure core-site.xml document
Enter C:\hadoop-3.2.4\etc\hadoop, right-click the core-site.xml document, and click Edit. In the Notepad window, make the <configuration> section look like the code below and save it.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Step 4–2: Configure mapred-site.xml document
Enter C:\hadoop-3.2.4\etc\hadoop, right-click the mapred-site.xml document, and click Edit. In the Notepad window, make the <configuration> section look like the code below and save it.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Step 5: Create the datanode and namenode folders under the data directory
We will create three folders:
- Create folder “data” under “C:\hadoop-3.2.4”
- Create folder “datanode” under “C:\hadoop-3.2.4\data”
- Create folder “namenode” under “C:\hadoop-3.2.4\data”
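Instead of creating them by hand in Explorer, the same three folders can be created from a command prompt:

```shell
mkdir C:\hadoop-3.2.4\data
mkdir C:\hadoop-3.2.4\data\datanode
mkdir C:\hadoop-3.2.4\data\namenode
```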
Step 6: Edit HDFS and YARN files
Step 6–1: Configure hdfs-site.xml document
Enter C:\hadoop-3.2.4\etc\hadoop, right-click the hdfs-site.xml document, and click Edit. In the Notepad window, make the <configuration> section look like the code below and save it.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>hadoop-3.2.4/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>hadoop-3.2.4/data/datanode</value>
</property>
</configuration>
Step 6–2: Configure yarn-site.xml document
Enter C:\hadoop-3.2.4\etc\hadoop, right-click the yarn-site.xml document, and click Edit. In the Notepad window, make the <configuration> section look like the code below and save it.
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Step 7: Set Java Home environment in Hadoop environment
Step 7–1: Enter C:\hadoop-3.2.4\etc\hadoop, right-click the hadoop-env.cmd document, and click Edit. In the Notepad window, change the JAVA_HOME line as shown below and save it:
@rem The java implementation to use. Required.
set JAVA_HOME=C:\Progra~1\Java\jre1.8.0_271
@rem The jsvc implementation to use. Jsvc is required to run secure datanodes.
@rem set JSVC_HOME=%JSVC_HOME%
✅ C:\Progra~1\Java\jre1.8.0_271
❌ C:\Program Files\Java\jre1.8.0_271
If we instead set JAVA_HOME to C:\Program Files\Java\jre1.8.0_271 and then run:
hadoop version
we get the following error, because the JAVA_HOME path contains spaces:
JAVA_HOME is incorrectly set
Use “Progra~1” instead of “Program Files” and “Progra~2” instead of “Program Files (x86)”. After replacing “Program Files” with “Progra~1”, close and reopen PowerShell and try the same command again.
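If your Java directory has a different name, you can print the 8.3 short form of any path from a command prompt using cmd's `%~s` modifier (the path below is just the one used in this guide):

```shell
for %I in ("C:\Program Files\Java\jre1.8.0_271") do @echo %~sI
```

The short form it prints is what goes into hadoop-env.cmd.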
Step 7–2: Go to https://github.com/s911415/apache-hadoop-3.1.0-winutils and download the repository as a zip file.
After the download, extract the zip file, copy the bin folder, and replace the existing bin folder at C:\hadoop-3.2.4\bin. (Choose to replace the 4 existing items when prompted.)
Step 8: Complete the setup and run a test
Step 8–1: Enter C:\hadoop-3.2.4\bin, click the path bar inside the folder, type cmd, and press Enter. Then run:
hdfs namenode -format
When we run the command and see this result, it looks like everything is OK 🙂
Step 8–2: Enter C:\hadoop-3.2.4\sbin, click the path bar inside the folder, type cmd, and press Enter. Then run:
start-all.cmd
If this step succeeds, we will see 4 results:
- A Namenode cmd window opens and its logs start running
- A Datanode cmd window opens and its logs start running
- A ResourceManager cmd window opens and its logs start running
- A NodeManager cmd window opens and its logs start running
When we run the command and see this result, it looks like everything is OK again 🙂🙂
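As an extra check once the daemons are up (an optional sketch; `/test` is just an example directory name), we can ask HDFS for a cluster report and try a small filesystem operation from any command prompt:

```shell
hdfs dfsadmin -report
hdfs dfs -mkdir /test
hdfs dfs -ls /
```

The report should show one live datanode for this single-node setup, and the `ls` should list the /test directory we just created.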
Conclusion
On the Hadoop Web UI, there are three web user interfaces to be used:
- Name node web page: http://localhost:9870/dfshealth.html
- Data node web page: http://localhost:9864/datanode.html
- Yarn web page: http://localhost:8088/cluster
If you have any thoughts or suggestions, please feel free to comment, or you can reach me at guvezhakan@gmail.com; I will try to get back to you as soon as I can.
You can reach me through LinkedIn too.
Hit the clap button 👏👏👏 or share it ✍ if you like the post.