Hadoop Installation on Windows WSL 2 on Ubuntu 20.04 LTS (Single Node)
Finally its possible to have your own Linux OS up and running in your windows backyard. Feature is available now for some time for Windows 10 pro and Windows 11 Insider program.
My configuration: Windows 11(Insiders) + VSCode ( light weight…directly support remote development and debugging with WSL, also edit files on Obuntu from Windows), 16 GB RAM
What is WSL ? Windows subsystem for Linux (and types). Windows manages hardware requirements for WSL (just like installing Docker/VM…but in this case taken care by Windows OS for us)
Enable Windows Subsystem for Linux from Windows Features :
Download Ubunto 20.04 (or any choice of Linux) from Microsoft Store. Once downloaded, Open from start menu or from store itself or open command prompt and type ubuntu2004. First time open installs OS asking for username/password (I gave hadoop/hadoop).
Browse Obuntu files in windows:
\\wsl$
I am using VSCode to do below commands, but below can be done directly on your Ubuntu app command prompt. If using VSCode, install Remote-WSL extention (by Microsoft). Open Terminal in VSCode, type ubuntu2004. Type code . (code + space+ .). This will install VSCode remote extension on your Ubuntu for editing files inside ubuntu OS- >
Once WSL ubuntu prompt is available, execute these commands one by one :
(NOTE: wherever you find localhost text in below commands, replace with yours, for instance prompt shows in following format user@localhost (hadoop@…)…so replace with … where localhost is written.
Install prerequisite software's:
sudo apt-get update
sudo apt-get install -y openssh-client openssh-server vim ssh -y
sudo apt install openjdk-8-jdk openjdk-8-jre
Open .bashrc file
code ~/.bashrc OR sudo nano ~/.bashrc
and add below 2 lines (as shown in Image, save and close):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export JRE_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
If permission to save denied then close without save and run below code and repeat above steps:
sudo chown -R hadoop ~/.bashrc
Download hadoop.3.2.1.tar.gz, unzip and move to /usr/local, add permissions ->
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gztar -xzf hadoop-3.2.1.tar.gz
sudo mv hadoop-3.2.1 hadoop
sudo mv hadoop /usr/local
sudo chmod 777 /usr/local/hadoop
Open the file
code ~/.bashrc OR sudo nano ~/.bashrc
and add following lines in end of file (Save and close):
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Reload the changes done above, create some directories for HDFS namenode. datanode, logs:
source ~/.bashrc
mkdir -p $HADOOP_HOME/hdfs/namenode
mkdir -p $HADOOP_HOME/hdfs/datanode
mkdir $HADOOP_HOME/logs
To edit series HFDS configuration files, change directory to the folder and open hadoop-env.sh:
cd $HADOOP_HOME/etc/hadoop
code hadoop-env.sh OR sudo nano hadoop-env.sh
Add this line in end of file (save and close):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Edit core-site.xml:
code core-site.xml OR sudo nano core-site.xml
Edit configuration to make it look like this (save and close):
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000/</value>
</property>
</configuration>
Edit hdfs-site.xml:
code hdfs-site.xml OR sudo nano hdfs-site.xml
Edit configuration to make it look like this (Save and Close):
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop/hdfs/namenode</value>
<description>NameNode directory for namespace and transaction logs storage.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop/hdfs/datanode</value>
<description>DataNode directory</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
Edit mapred-site.xml:
code mapred-site.xml OR sudo nano mapred-site.xml
Edit configuration to make it look like this(Save and close):
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
</configuration>
Edit yarn-site.xml:
code yarn-site.xml OR sudo nano yarn-site.xml
Edit configuration to make it look like this(Save and Close):
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
</configuration>
Generate ssh key and add to authorized keys in Ubuntu:
cd ~ssh-keygen -t rsa -P " -f ~/.ssh/id_rsacat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keyschmod 600 ~/.ssh/authorized_keys
Open these 2 files
code /etc/ssh/ssh_config OR sudo nano /etc/ssh/ssh_configcode /etc/ssh/sshd_config OR sudo nano /etc/ssh/sshd_config
and add in last line (save and close):
Port 2222
Open this file:
code ~/.ssh/config OR sudo nano ~/.ssh/config
and add below lines (save and close)
Host *
StrictHostKeyChecking no
Prepare Namenode for HDFS and restart ssh service:
hdfs namenode -format
sudo /etc/init.d/ssh restart
Finally Start hadoop by below command:
start-all.sh
Check whether all running or not:
jps
Type this command to check whether hdfs is working properly:
hdfs dfs -mkdir /temp
hdfs dfs -ls /
See that above command lists the new directory temp created under HDFS
On Restart of system, if you cant seem to run hdfs commands in Ubuntu, then try this lines:
sudo service ssh restart
start-all.sh