Part 1: How to Install a 3-Node Hadoop Cluster on Ubuntu 16

Mariano N. Lirussi
Hexacta Engineering
Aug 21, 2018

This guide provides a quick and easy way to install Cloudera Hadoop on local servers.

Before installing, we need to prepare the infrastructure on every node.

We assume that you have some basic knowledge of networking, Linux administration and Apache Hadoop.

1.- Requirements

We recommend complying with the following minimum hardware requirements:

Master: 6 CPU cores, 12 GB RAM, 80 GB disk

Node: 4 CPU cores, 4 GB RAM, 80 GB disk

In this case, we configured the network at each node as follows:

master.hexacta.com 10.0.5.1

node1.hexacta.com 10.0.5.2

node2.hexacta.com 10.0.5.3

Once the network is ready, we make sure each node has the latest updates and install ssh and rsync.

-:# apt-get update && apt-get upgrade -y
-:# apt install ssh rsync

2.- Config Hosts

Mapping the host names to their IP addresses correctly is a very important point. The same entries must be present in the /etc/hosts file on all nodes.

In our case, the file looks like this:

10.0.5.1 master.hexacta.com
10.0.5.2 node1.hexacta.com
10.0.5.3 node2.hexacta.com
127.0.0.1 localhost
127.0.1.1 localhost
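A mistyped entry in /etc/hosts is easy to miss, so it can help to check the file on each node before going further. The following is a minimal sketch, not part of the original procedure: the function names lookup_host and check_hosts are hypothetical, and the expected addresses are simply the ones used in this guide.

```shell
#!/bin/sh
# Sketch: verify that a hosts-format file maps the cluster names to the
# addresses used in this guide. Function names are illustrative only.

# lookup_host FILE NAME: print the first IP mapped to NAME in FILE.
lookup_host() {
  awk -v name="$2" '
    { sub(/#.*/, "") }   # ignore comments
    { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
  ' "$1"
}

# check_hosts FILE: report OK or MISMATCH per host; fail if any entry differs.
check_hosts() {
  status=0
  while read -r ip name; do
    found=$(lookup_host "$1" "$name")
    if [ "$found" = "$ip" ]; then
      echo "OK       $name -> $ip"
    else
      echo "MISMATCH $name: expected $ip, found ${found:-nothing}"
      status=1
    fi
  done <<EOF
10.0.5.1 master.hexacta.com
10.0.5.2 node1.hexacta.com
10.0.5.3 node2.hexacta.com
EOF
  return $status
}
```

After loading these functions, running check_hosts /etc/hosts on each node prints one line per host and returns a non-zero status if any entry is missing or different.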

3.- Create user and ssh-key

Now we must create the hadoop user on all nodes and give it sudo permissions. In addition, sudo should not ask this user for a password. First, we create the user:

-:# adduser hadoop

Then we add the user to the sudo group:

-:# adduser hadoop sudo

Next, so that sudo does not require a password, we edit the sudo configuration with the command:

-:# visudo

Within the configuration, we add the line for the hadoop user below the %sudo entry, as shown here:

# Allow members of group sudo to execute any command
%sudo ALL=(ALL:ALL) ALL
hadoop ALL=(ALL) NOPASSWD: ALL
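Before moving on, it may be worth confirming that the rule works on every node. The sketch below relies on sudo's -n (non-interactive) option, which makes sudo fail instead of prompting when a password would be required. The function names and the SUDO override variable are hypothetical and exist only so the check can be dry-run.

```shell
#!/bin/sh
# Sketch: check that the current user can run sudo without a password.
# SUDO is a hypothetical override for dry runs; it defaults to the real sudo.

# has_passwordless_sudo: succeed only if sudo runs a command without prompting.
has_passwordless_sudo() {
  ${SUDO:-sudo} -n true 2>/dev/null
}

# report_sudo: print a one-line result for the current user.
report_sudo() {
  if has_passwordless_sudo; then
    echo "passwordless sudo: OK"
  else
    echo "passwordless sudo: NOT configured"
  fi
}
```

Run report_sudo in a hadoop user session. Running sudo -k first discards any cached credentials, so that a recent successful sudo does not hide a missing rule, and visudo -c can be used as root to check the syntax of the sudoers file.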

Now we need to create an SSH key for the hadoop user so that the Master node can manage the Nodes remotely and securely.

On the Master, logged in as the hadoop user, we generate the key with the following command:

-:# ssh-keygen -b 4096

Then we copy the public key to the master and to each node we want to install.

-:# ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@master.hexacta.com
-:# ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@node1.hexacta.com
-:# ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@node2.hexacta.com
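To confirm that key-based login works for every host, one option is to attempt a connection with BatchMode enabled, which makes ssh fail rather than fall back to a password prompt. This is a sketch under assumptions: the function name check_ssh_keys, the HADOOP_HOSTS variable and the SSH override are hypothetical, and the host list is the one from this guide.

```shell
#!/bin/sh
# Sketch: verify passwordless SSH from the Master to every host in the cluster.
# SSH is a hypothetical override for dry runs; it defaults to the real ssh.

HADOOP_HOSTS="master.hexacta.com node1.hexacta.com node2.hexacta.com"

# check_ssh_keys: print OK or FAIL per host; return non-zero if any host fails.
check_ssh_keys() {
  failed=0
  for host in $HADOOP_HOSTS; do
    # BatchMode=yes disables password prompts, so only key login can succeed.
    if ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "hadoop@$host" true 2>/dev/null; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return $failed
}
```

Calling check_ssh_keys from the hadoop user session on the Master should print three OK lines. A FAIL line usually means ssh-copy-id needs to be repeated for that host.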

4.- Swappiness

To avoid swappiness errors or warnings during the installation, we lower the vm.swappiness kernel parameter.

We change it for the running system with the command:

-:# sysctl vm.swappiness=10

To persist this change across restarts, we append the configuration line to the end of /etc/sysctl.conf:

-:# echo 'vm.swappiness = 10' >> /etc/sysctl.conf

NOTE: Swappiness is a Linux kernel parameter that controls the relative weight given to swapping out of runtime memory, as opposed to dropping pages from the system page cache.
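Running the echo command more than once appends a duplicate line each time, and an older vm.swappiness line elsewhere in the file is left in place. As an alternative, the following sketch replaces any existing setting before appending the new one. The function name persist_sysctl is hypothetical, and the file path is passed as an argument so the function can be tried on a copy first.

```shell
#!/bin/sh
# Sketch: set KEY = VALUE in a sysctl-style file without creating duplicates.

# persist_sysctl FILE KEY VALUE
persist_sysctl() {
  file=$1 key=$2 value=$3
  touch "$file" || return 1
  tmp=$(mktemp) || return 1
  # Match an uncommented "KEY =" line; dots in KEY are escaped for the regex.
  pattern="^[[:space:]]*$(printf '%s' "$key" | sed 's/\./\\./g')[[:space:]]*="
  # Keep every other line, then append the new setting at the end.
  grep -v -E "$pattern" "$file" > "$tmp"
  printf '%s = %s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the file keeps its owner and permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

As root, persist_sysctl /etc/sysctl.conf vm.swappiness 10 followed by sysctl -p reloads the file, and cat /proc/sys/vm/swappiness shows the value currently in effect.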

So far, we have prepared the infrastructure needed to install any Hadoop distribution.

5.- Install Cloudera

Now let’s install Cloudera Manager, an administration tool that helps you administer the services on your Hadoop cluster. There is a free version and an Enterprise version. We used the free version to set up the whole cluster.

First, we need to download the Cloudera Manager installer (version 5.15.0 in this guide, the latest at the time of writing):

-:# wget http://archive.cloudera.com/cm5/installer/5.15.0/cloudera-manager-installer.bin

We have to change the installer permissions to be able to run it.

-:# chmod u+x cloudera-manager-installer.bin

Run the file with sudo to start the installation.

-:# sudo ./cloudera-manager-installer.bin

Cloudera-Manager-README:

This README gives useful details for the installation of Cloudera Manager, such as the Linux versions it supports. Let’s click on “Next”.

This is the Cloudera Standard License. Let’s click on “Next” after reading it.

We accept the license’s terms of use.

We click “Next” to accept the license of the Oracle Java SE Platform.

Accept Oracle License

We wait for the Cloudera Manager Server installation process to complete.

After the installation of Cloudera Manager is finished, we can continue with the second part of the cluster setup by going to http://master.hexacta.com:7180/ (in our example) and logging in with the user name admin and the password admin.

Installation successfully completed

Once the installation is complete, we open http://master.hexacta.com:7180/ and continue with the installation and configuration of the cluster from Cloudera Manager.
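The web interface may not answer immediately after the installer exits, since the server can take a while to start. A small polling loop such as the sketch below can be used to wait for it. The function name wait_for_url, the CURL override and the default of 30 attempts with a 10-second pause are assumptions for illustration, not values from Cloudera.

```shell
#!/bin/sh
# Sketch: poll a URL until it answers, or give up after a number of attempts.
# CURL is a hypothetical override for dry runs; it defaults to the real curl.

# wait_for_url URL [TRIES] [DELAY_SECONDS]
wait_for_url() {
  url=$1 tries=${2:-30} delay=${3:-10}
  n=1
  while [ "$n" -le "$tries" ]; do
    # -f treats HTTP errors (status 400 and above) as failures.
    if ${CURL:-curl} -s -f -o /dev/null --max-time 5 "$url"; then
      echo "reachable after $n attempt(s): $url"
      return 0
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  echo "gave up after $tries attempts: $url" >&2
  return 1
}
```

For the example in this guide, the call would be wait_for_url http://master.hexacta.com:7180/ and the login page can be opened once it reports that the URL is reachable.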

In Part 2, we will explain how to set up the Hadoop Cluster with Cloudera Manager.
