Hadoop on Windows!

Niraj Jain
Published in Analytics Vidhya
6 min read, Feb 23, 2021

Hello folks! I am back with a new agenda. As you can tell from the title, we are going to learn how to install Hadoop on Windows and run a few commands. I recently came across the Hadoop framework because it is part of my BDA (Big Data Analytics) course in the last semester of my engineering degree. Hadoop seems to work effortlessly on open-source operating systems such as Ubuntu, but I faced certain issues while installing it on my Windows 10 laptop, so I thought of giving a clear picture with to-the-point steps to install Hadoop without errors.

What is Hadoop?

Hadoop is a software framework built especially for handling big data, with data sets that can reach petabytes in size and beyond. Its design was inspired by Google's papers on the Google File System (GFS) and MapReduce.

The Hadoop framework has two main components:

  1. Hadoop Distributed File System(HDFS)
  2. MapReduce

HDFS is the main storage unit of Hadoop. It is concerned with storing and accessing files, relies on the principles of a distributed file system, and is based on a master-slave architecture: a NameNode (master) keeps track of the files, while DataNodes (slaves) store the actual data.

It splits the data into fixed-size blocks (128 MB by default) and replicates each block on several nodes to ensure fault tolerance, i.e. even if a node stops working, no harm comes to your data because a copy is always available elsewhere.
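To make the block arithmetic concrete, here is a minimal Python sketch. It assumes the 128 MB block size mentioned above and a replication factor of 3, which is the HDFS default but is not stated in this article. The function name block_layout is made up for illustration.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB block size, as described above
REPLICATION = 3                 # assumption: the HDFS default replication factor


def block_layout(file_size, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (number of blocks, size of the last block, total bytes stored)."""
    if file_size <= 0:
        return 0, 0, 0
    # Integer ceiling division: a partly filled last block still counts.
    blocks = (file_size + block_size - 1) // block_size
    last_block = file_size - (blocks - 1) * block_size
    # Each block is stored `replication` times; the last block only
    # takes up its real size, not a full 128 MB.
    total_stored = file_size * replication
    return blocks, last_block, total_stored


if __name__ == "__main__":
    mb = 1024 * 1024
    blocks, last, stored = block_layout(300 * mb)
    print(blocks, "blocks, last block", last // mb, "MB, stored", stored // mb, "MB")
```

Under these assumptions a 300 MB file is split into three blocks (128 MB, 128 MB and 44 MB) and takes up 900 MB of raw storage across the cluster.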

So let's jump into installing Hadoop!

The first step of installing Hadoop is not installing Hadoop! Haha, kidding, but you do need to make sure that the correct version of Java, i.e. Java 8, is installed on your machine.

If you have a higher version, install Java 8 from here: https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html

Then update your environment variables and run java -version to check that it is properly installed.
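If you prefer to check the version from a script, the following Python sketch parses the text printed by java -version. The helper names are hypothetical, and the sample version strings in the usage note are only examples of the usual output format.

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and older report themselves as 1.x, for example "1.8.0_281".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version`; note that it prints to stderr, not stdout."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

For example, java_major_version('java version "1.8.0_281"') gives 8, which is the version this guide needs. installed_java_major() raises FileNotFoundError if java is not on the Path at all.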

After installing Java, you can download Hadoop version 3.1.4 from here: https://archive.apache.org/dist/hadoop/common/hadoop-3.1.4/hadoop-3.1.4.tar.gz

After downloading, extract the archive. The extracted hadoop-3.1.4 folder contains subfolders such as bin, etc, sbin and share.

Now set up the environment variables:

Search for "env" in the Windows search bar.

Then click on Environment Variables and edit the system variables.

Copy the path of the Hadoop folder (the folder that contains bin) and create a variable named HADOOP_HOME with this path as its value.

Likewise, create a new user variable named JAVA_HOME whose value is the path of the Java (JDK) folder, i.e. the folder that contains bin.

Now we need to add the Hadoop bin directory and the Java bin directory to the Path system variable. Edit Path under the system variables.

Click on New and add the bin directory paths of Hadoop and Java.

Also add the path of the sbin folder, which is inside the Hadoop folder, to the Path system variable.
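A quick way to confirm that the Path entries are in place is a small check like the one below. This is only a sketch: the function name is made up, and the folders C:\hadoop-3.1.4 and C:\Java\jdk1.8.0_281 in the usage note are hypothetical locations, so substitute your own.

```python
import ntpath


def missing_path_entries(path_value, required_dirs):
    """Return the required directories that are absent from a Windows Path string."""
    def normalize(directory):
        # Windows paths are case-insensitive; also drop quotes and trailing backslashes.
        return ntpath.normcase(ntpath.normpath(directory.strip().strip('"')))

    present = {normalize(entry) for entry in path_value.split(";") if entry.strip()}
    return [d for d in required_dirs if normalize(d) not in present]
```

On the Windows machine itself you could call it as missing_path_entries(os.environ.get("PATH", ""), [r"C:\hadoop-3.1.4\bin", r"C:\hadoop-3.1.4\sbin", r"C:\Java\jdk1.8.0_281\bin"]) after importing os. An empty list means all three folders are on the Path. Open a new command prompt first, because already open windows do not see updated variables.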

Configurations

Now we need to edit some files located in the etc\hadoop directory inside the folder where we installed Hadoop. The files that need to be edited are core-site.xml, mapred-site.xml, hdfs-site.xml, yarn-site.xml and hadoop-env.cmd.

  1. Edit the file core-site.xml in the hadoop directory and add the required XML property inside the <configuration> element of the file.

*Please note: in the above file, use your own machine name or your machine's IP address in the property value.

  2. Edit mapred-site.xml and add the required property inside the <configuration> element.

  3. Create a folder named ‘data’ in the Hadoop directory, then create two folders named ‘datanode’ and ‘namenode’ inside this data directory.

  4. Edit the file hdfs-site.xml and add the properties for the namenode and datanode directories inside the <configuration> element.

Note: the namenode and datanode path values should be the paths of the namenode and datanode folders you just created.

  5. Edit yarn-site.xml and add the required property inside the <configuration> element.

  6. Edit hadoop-env.cmd and replace %JAVA_HOME% with the path of the Java folder where your JDK 1.8 is installed.

  • *Please note: this is where many people make a mistake. By default the Java JDK is installed in the Program Files folder, and because of the space in that name Hadoop fails to recognize the path. A simple fix is to move the JDK folder to a folder without spaces in its path (a new MyDrive folder in my case) and use this path both for the JAVA_HOME system variable and in the Path system variable.
  • Hadoop needs Windows-specific files that do not come with the default Hadoop download.
  • To include those files, replace the bin folder in the Hadoop directory with the bin folder provided in this GitHub repository:
  • https://github.com/s911415/apache-hadoop-3.1.0-winutils
  • Download it as a zip file, extract it and copy the bin folder inside it. If you want to keep the old bin folder, rename it to something like bin_old, then paste the copied bin folder into the Hadoop directory.
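The property snippets for steps 1, 2, 4 and 5 are not reproduced in this text, so here is a hedged Python sketch that prints typical single-node values for the four XML files. The property names are standard Hadoop settings, but the values are assumptions: localhost with port 9000, a replication factor of 1 because there is only one DataNode, and a hypothetical install folder C:\hadoop-3.1.4. The function names are made up for illustration.

```python
from pathlib import PureWindowsPath
from xml.sax.saxutils import escape


def render_site_xml(properties):
    """Render a dict of Hadoop properties as a *-site.xml document."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


def single_node_configs(hadoop_dir, host="localhost", port=9000):
    """Typical single-node values; host, port and replication are assumptions."""
    data = PureWindowsPath(hadoop_dir) / "data"

    def as_file_uri(path):
        # Simple file URI for a path without spaces, e.g. file:///C:/hadoop/data
        return "file:///" + path.as_posix()

    return {
        "core-site.xml": {"fs.defaultFS": f"hdfs://{host}:{port}"},
        "mapred-site.xml": {"mapreduce.framework.name": "yarn"},
        "hdfs-site.xml": {
            "dfs.replication": "1",
            "dfs.namenode.name.dir": as_file_uri(data / "namenode"),
            "dfs.datanode.data.dir": as_file_uri(data / "datanode"),
        },
        "yarn-site.xml": {"yarn.nodemanager.aux-services": "mapreduce_shuffle"},
    }


if __name__ == "__main__":
    # Hypothetical install folder; replace it with your own Hadoop directory.
    for filename, props in single_node_configs(r"C:\hadoop-3.1.4").items():
        print(f"--- {filename} ---")
        print(render_site_xml(props), end="")
```

Running the script prints one XML document per file. You can copy the property elements into the matching file under etc\hadoop, adjusting the host and the data folder paths to your machine.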

Now, finally, after such a Herculean task, we can test whether Hadoop is properly installed by simply typing the command hadoop version.

Yippee! Since it no longer throws an error, we can say that Hadoop is installed on our Windows machine.

Format the Namenode

Format the NameNode only once, when Hadoop is first installed. Do not format it again on a file system that is already in use, or it will delete all the data inside HDFS. Run this command:

hdfs namenode -format

Now let us start DFS and YARN. The .cmd files for this are available in sbin, and since we have already added sbin to the Path system variable, we do not need to go inside the sbin folder to run them.

This step starts the NameNode and DataNode, followed by the YARN daemons, with the start-dfs and start-yarn commands.

After running the above commands, four Apache Hadoop Distribution command windows will open, one each for the NameNode, DataNode, ResourceManager and NodeManager.

Now you can go to http://localhost:9870/ to see the status of your DFS, NameNode and DataNode.
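If the page does not load, it helps to know whether anything is listening on that port at all. The sketch below is a generic TCP check, not a Hadoop API; the only Hadoop-specific detail is port 9870, taken from the URL above.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 9870 is the NameNode web UI port from the URL above.
    print("NameNode web UI reachable:", is_port_open("localhost", 9870))
```

A result of False usually means the NameNode window has closed with an error, so its console output is the first place to look.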

Working with HDFS

I will create a demo .txt file in my local file system in order to put it into HDFS using the hdfs command-line tool.

First, I will create a directory named /user in HDFS, using a command such as hdfs dfs -mkdir /user.

As you can now see, a new folder /user has been created in HDFS.

Next, we will create a text file in our local file system so that we can put it into HDFS, for example with hdfs dfs -put RLabInst.txt /user.

As you can see, RLabInst.txt is now stored in HDFS.

To see what is inside the file, we can run a command such as hdfs dfs -cat /user/RLabInst.txt.
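The three steps above (create the directory, upload the file, print it) can also be scripted. The Python sketch below builds the hdfs dfs commands for the /user directory and the RLabInst.txt file used here. The helper names are hypothetical, the -p flag is my addition so that rerunning does not fail when /user already exists, and the run_commands part assumes hdfs is on the Path and the daemons are running.

```python
import ntpath
import shutil
import subprocess


def hdfs_demo_commands(local_file, hdfs_dir="/user"):
    """Build the mkdir, put and cat commands for one local file."""
    remote_file = f"{hdfs_dir.rstrip('/')}/{ntpath.basename(local_file)}"
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],      # create the HDFS directory
        ["hdfs", "dfs", "-put", local_file, hdfs_dir],  # upload the local file
        ["hdfs", "dfs", "-cat", remote_file],           # print the uploaded file
    ]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    hdfs = shutil.which("hdfs")  # on Windows this resolves hdfs.cmd through PATHEXT
    if hdfs is None:
        raise FileNotFoundError("hdfs not found on Path; check the environment variables")
    for command in commands:
        subprocess.run([hdfs] + command[1:], check=True)
```

Calling run_commands(hdfs_demo_commands("RLabInst.txt")) from the folder that contains the file would run all three steps in one go.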

So, congratulations! We have successfully installed Hadoop and run a few commands ;). There is still more to do in Hadoop, as we have only had a glimpse of it. You can find more commands here: https://hadoop.apache.org/docs/r3.1.0/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html

I hope you will now face no errors while installing Hadoop on Windows. If you do run into anything, you can always comment below. Thank you once again! 😃

Niraj Jain is a full stack developer, backend developer, speaker and writer. Learn, unlearn, relearn.