Installing Hadoop on Ubuntu 20.04

tanut aran

Published in

CODEMONDAY

Jan 31, 2022

Below I summarize the installation process. This setup is good for experiments, NOT for production at all.

What you need to do

Install Java
Download Hadoop
Set environment
Edit Hadoop XML
start-dfs.sh
start-yarn.sh

If it succeeds, you will see:

localhost:8088 → the YARN ResourceManager web UI (with the Hadoop logo)
localhost:9870 → the HDFS NameNode web UI (cluster status)

Install Java

Update the package index and search for an available JDK.

If you are not familiar with Java, don't worry about the terminology; we only need the JDK.

sudo apt update
apt-cache search openjdk

Hadoop 3.3.x supports Java 8 and Java 11 (not newer releases such as 17), so I will install 11.

sudo apt install openjdk-11-jdk
java -version
javac -version
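The JAVA_HOME value used later, /usr/lib/jvm/java-11-openjdk-amd64/, is the usual OpenJDK 11 path on amd64. The directory name differs on other architectures. The sketch below uses a hypothetical helper name. It derives the JDK directory from whichever javac is on PATH by following the /etc/alternatives symlinks.

```shell
# Hypothetical helper: print the JDK directory behind the javac found on PATH.
# On Ubuntu, /usr/bin/javac -> /etc/alternatives/javac -> <JDK>/bin/javac.
java_home_from_javac() {
  local javac_bin
  javac_bin="$(command -v javac)" || return 1   # fail if no JDK is installed
  javac_bin="$(readlink -f "$javac_bin")"       # resolve every symlink
  dirname "$(dirname "$javac_bin")"             # strip the trailing /bin/javac
}
```

After installing openjdk-11-jdk, running java_home_from_javac should print something like /usr/lib/jvm/java-11-openjdk-amd64.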

Download Hadoop

Visit the link below. On the command line, download the archive with wget <link> and extract it into your home directory.

Choose the newest version (here, 3.3.1), then choose the .tar.gz file.

Apache Downloads: https://dlcdn.apache.org/hadoop/common/
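The download and extraction can be scripted. The sketch below uses hypothetical function names and assumes the mirror layout from the link above (hadoop-<version>/hadoop-<version>.tar.gz). If dlcdn does not have the file, it falls back to archive.apache.org, where Apache keeps releases that are no longer current.

```shell
# Hypothetical helpers for fetching and unpacking a Hadoop release tarball.
hadoop_tarball_url() {
  local version="$1"
  local mirror="${2:-https://dlcdn.apache.org/hadoop/common}"
  echo "${mirror}/hadoop-${version}/hadoop-${version}.tar.gz"
}

install_hadoop_tarball() {
  local version="$1" dest="${2:-$HOME}"
  local tgz="/tmp/hadoop-${version}.tar.gz"
  # Try the current-release mirror first, then the Apache archive for older versions.
  wget -O "$tgz" "$(hadoop_tarball_url "$version")" \
    || wget -O "$tgz" "$(hadoop_tarball_url "$version" https://archive.apache.org/dist/hadoop/common)" \
    || return 1
  tar -xzf "$tgz" -C "$dest"   # creates $dest/hadoop-<version>
}
```

For example, install_hadoop_tarball 3.3.1 extracts to ~/hadoop-3.3.1. For a user named hadoop, that is the HADOOP_HOME used below.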

Set Up Environment

Set the Hadoop variables in .bashrc, and add Hadoop's bin and sbin directories to PATH so its commands can be called conveniently:

export HADOOP_HOME=/home/hadoop/hadoop-3.3.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/

After reloading the file (source ~/.bashrc), you should be able to call the following binaries from anywhere:

hadoop
hdfs
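To confirm that the PATH change took effect, look each command up with command -v. The helper below is a hypothetical sketch. Besides hadoop and hdfs, it also checks the start scripts, because $HADOOP_HOME/sbin was added to PATH as well.

```shell
# Hypothetical helper: report which Hadoop commands are reachable on PATH.
# Returns non-zero if any of them is missing.
check_hadoop_commands() {
  local cmd missing=0
  for cmd in hadoop hdfs start-dfs.sh start-yarn.sh; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "ok:      $cmd -> $(command -v "$cmd")"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}
```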

Edit Hadoop XML and Start

The official Hadoop guide already covers this part well; the link is below.

A quick overview will make it easier to read:

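Hadoop will ssh to localhost, so you need to set up an SSH key
You need pseudo-distributed mode
Copy and paste the XML from the Hadoop guide
Format HDFS once before the first start: hdfs namenode -format
Start DFS: start-dfs.sh
Start YARN: start-yarn.sh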
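For the SSH step in the overview above, the official guide generates a passwordless key and authorizes it for localhost. The sketch below does the same, with two guards: it skips key generation if ~/.shh/id_rsa already exists, and it avoids adding the key twice. It assumes the OpenSSH client and server are installed. If ssh localhost is refused, install the server with sudo apt install openssh-server.

```shell
# Passwordless SSH to localhost: start-dfs.sh launches the HDFS daemons over ssh.
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"

# Generate an RSA key with an empty passphrase, only if none exists yet.
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -q -t rsa -P '' -f "$HOME/.ssh/id_rsa"

# Authorize the key for logins to this machine, without duplicating it.
touch "$HOME/.ssh/authorized_keys"
grep -qxF "$(cat "$HOME/.ssh/id_rsa.pub")" "$HOME/.ssh/authorized_keys" \
  || cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 0600 "$HOME/.ssh/authorized_keys"
```

Afterwards, ssh localhost should log you in without asking for a password.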

Hadoop: Setting up a Single Node Cluster (hadoop.apache.org)
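For pseudo-distributed mode, the two core files from the guide can also be written from the shell. The sketch below writes the guide's values: fs.defaultFS set to hdfs://localhost:9000 and dfs.replication set to 1. It assumes HADOOP_HOME is set as in ~/.bashrc above, falling back to ~/hadoop-3.3.1. It overwrites both files, which is fine for a fresh install. The YARN files (mapred-site.xml and yarn-site.xml) follow the same pattern; copy their values from the guide.

```shell
# Assumes HADOOP_HOME from ~/.bashrc; falls back to the path used in this post.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-3.3.1}"
CONF_DIR="$HADOOP_HOME/etc/hadoop"
mkdir -p "$CONF_DIR"   # already exists in a real install; harmless

# core-site.xml: make a local HDFS NameNode the default filesystem.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF

# hdfs-site.xml: a single node can hold only one replica of each block.
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
EOF
```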

Common error 1: JAVA_HOME not found

JAVA_HOME needs to be set in

etc/hadoop/hadoop-env.sh (inside $HADOOP_HOME)

NOT only in .bashrc!
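A likely reason: start-dfs.sh starts the daemons over ssh in a non-interactive shell. Ubuntu's default ~/.bashrc returns early for non-interactive shells, so it never reaches exports appended at its end. The sketch below appends the export to hadoop-env.sh once. The JDK path comes from the .bashrc step above and assumes OpenJDK 11 on amd64.

```shell
# Assumptions: HADOOP_HOME as exported in ~/.bashrc (fallback: ~/hadoop-3.3.1),
# and the OpenJDK 11 path used in the .bashrc step.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-3.3.1}"
ENV_FILE="$HADOOP_HOME/etc/hadoop/hadoop-env.sh"
JDK_DIR=/usr/lib/jvm/java-11-openjdk-amd64

mkdir -p "$(dirname "$ENV_FILE")"   # already present in a real install

# The stock file ships with "# export JAVA_HOME=" commented out; add a real export once.
if ! grep -q '^export JAVA_HOME=' "$ENV_FILE" 2>/dev/null; then
  echo "export JAVA_HOME=$JDK_DIR" >> "$ENV_FILE"
fi
```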

Common error 2: Cannot start YARN

Error when running start-yarn.sh:

resourcemanager is running as process 48888. Stop it first and ensure /tmp/hadoop-hadoop-resourcemanager.pid file is empty before retry

The ResourceManager is still running despite stop-dfs.sh, because stop-dfs.sh only stops the HDFS daemons and the ResourceManager belongs to YARN. So you need to stop ALL:

stop-all.sh
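The error message points at a pid file. The hypothetical helper below reads that file and checks whether the process is still alive. (jps, which ships with the JDK, also lists running Hadoop Java processes.) It assumes the default pid directory /tmp, which HADOOP_PID_DIR can override, and the naming pattern from the error message, hadoop-<user>-<daemon>.pid. It looks the PID up under /proc, so it is Linux-only.

```shell
# Hypothetical helper: is the daemon recorded in a Hadoop pid file still running?
# Assumes pid files named hadoop-<user>-<daemon>.pid in HADOOP_PID_DIR (default /tmp).
hadoop_daemon_status() {
  local daemon="$1" user="${USER:-$(id -un)}" pid_file pid
  pid_file="${HADOOP_PID_DIR:-/tmp}/hadoop-${user}-${daemon}.pid"
  if [ ! -s "$pid_file" ]; then
    echo "$daemon: no pid file"
    return
  fi
  pid="$(cat "$pid_file")"
  if [ -d "/proc/$pid" ]; then      # a live process has a /proc/<pid> directory
    echo "$daemon: running (pid $pid)"
  else
    echo "$daemon: stale pid file $pid_file"
  fi
}
```

After stop-all.sh, hadoop_daemon_status resourcemanager should no longer report the daemon as running.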

Note

Just leave the processes running as they are; it seems we do not need to manage them with systemctl or service as we usually do.

Check the status page

Finally, you should see the two status pages listed at the beginning of this post (localhost:9870 and localhost:8088).
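If the pages do not load, or you are on a headless server, first check from the shell whether anything is listening on the two ports. The sketch below uses a hypothetical helper name and relies on bash's /dev/tcp redirection, so it must run in bash. It only shows that something accepts connections on each port, not that the page renders correctly.

```shell
# Hypothetical helper: check whether something is listening on each given port.
# Uses bash's /dev/tcp pseudo-device, so no extra packages are needed.
check_hadoop_ui() {
  local port
  for port in "$@"; do
    if (exec 3<>"/dev/tcp/localhost/$port") 2>/dev/null; then
      echo "localhost:$port up"
    else
      echo "localhost:$port down"
    fi
  done
}
```

For example, check_hadoop_ui 9870 8088 prints one up/down line per web UI.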

Some tips if you deploy it on a remote server:

Use SSH port forwarding to bring the ports down to your local machine, then open them in your browser.

ssh -L 9870:localhost:9870 -nNT ubuntu@<your-server-ip>
ssh -L 8088:localhost:8088 -nNT ubuntu@<your-server-ip>
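Because ssh accepts several -L options, both ports can also be forwarded over a single connection. The helper name is hypothetical; replace ubuntu@<your-server-ip> with your own login.

```shell
# Hypothetical helper: forward both Hadoop web UIs over one SSH connection.
# -n: stdin from /dev/null, -N: run no remote command, -T: no terminal.
hadoop_ui_tunnel() {
  local target="$1"   # e.g. ubuntu@<your-server-ip>
  ssh -nNT -L 9870:localhost:9870 -L 8088:localhost:8088 "$target"
}
```

While hadoop_ui_tunnel ubuntu@<your-server-ip> is running, both localhost:9870 and localhost:8088 open in your local browser. Ctrl-C closes the tunnel.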

Hope this helps!