Part 2: Install and Set Up a 3-node Cloudera Hadoop Cluster

Mariano N. Lirussi
Hexacta Engineering
Sep 21, 2018

Let’s resume our installation of a Hadoop cluster. First, log in to the Cloudera Manager site we configured in the first part of the installation: http://master.hexacta.com:7180

Login with the default credentials

user: admin

pass: admin

It is important to read and accept the Cloudera Manager license.

Now we arrive at the point where we must select the Cloudera edition we want. For the purposes of this guide, we are going to select the option “Cloudera Express”.

The wizard then shows some useful information about the available versions and their requirements.

The time has come for us to specify the hosts that we are going to configure as nodes of our cluster.

This specification relates to the /etc/hosts configuration we set up in the first part of the installation.

This is why we only need to enter the IPs of the master and the nodes.
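As a reminder, the /etc/hosts entries from part 1 look something like this (the IP addresses and node names below are placeholders; use your own):

```
# Illustrative /etc/hosts entries (IPs and node names are placeholders)
192.168.0.10  master.hexacta.com  master
192.168.0.11  node1.hexacta.com   node1
192.168.0.12  node2.hexacta.com   node2
```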

Cloudera Manager now checks whether the hosts are ready for installation and running as needed. If all is well, we continue with the configuration of the nodes.
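Before letting Cloudera Manager scan the hosts, a quick manual sanity check can save time. A minimal sketch, assuming the host names from /etc/hosts (the names are illustrative):

```shell
# Verify each host name resolves before Cloudera Manager scans them.
# Host names are illustrative; use the ones from your /etc/hosts.
for h in master.hexacta.com node1.hexacta.com node2.hexacta.com; do
  if getent hosts "$h" >/dev/null; then
    echo "$h resolves"
  else
    echo "$h NOT found"
  fi
done
```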

In the following step, we select the repository method and any additional parcels we want beyond those already in the CDH suite. By default, we select the option ‘Use Parcels’.

To continue, we have to read and accept the Oracle JDK license in order to install and use it.

Next, we have the option to enable ‘single user mode’. In this case, we leave it disabled.

We have to configure the user and credentials that Cloudera Manager will use to manage the master and the nodes automatically. Let’s configure it with a user other than root: select ‘Another user’ and enter the user ‘hadoop’.

Select ‘All hosts accept same private key’ and upload the private key file (id_rsa) of the hadoop user that we created on the master in the first part of the installation. Enter the passphrase if we configured one.

In case you don’t want to use the SSH key file, there is also the option ‘All hosts accept same password’; in that case we complete the password field. Of course, the hadoop user must have the same password on all nodes.
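Either way, the credentials come from the hadoop user prepared in part 1. As a reminder, a minimal sketch of generating the key pair and distributing it (the node names are illustrative):

```shell
# Run as the hadoop user on the master. Skip ssh-keygen if the key
# from part 1 already exists (it would be overwritten otherwise).
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N ""
# Copy the public key into authorized_keys on each node (illustrative names);
# this asks for the hadoop user's password on each node once.
for node in node1.hexacta.com node2.hexacta.com; do
  ssh-copy-id "hadoop@$node"
done
```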

We leave the other parameters at their default values unless we need to change them.

On the next screen, we see the progress of the installation of the Cloudera Manager agents on each host.

Once the installation completes successfully on all cluster nodes, we continue.

Next, we watch the parcels being downloaded, distributed, and activated on each node.

If everything went well, we are in good shape. Cloudera Manager now inspects the installation on the nodes; this process might take some time. After this, we will have successfully finished installing our cluster on all the machines where we plan to run it.

Cluster Setup:

Great, now we start the Cluster Service Setup. Here we can select the option that best suits our needs, choosing whether we want any extra services besides those in Core Hadoop.

Once we know which services we are going to run on our cluster, we can configure their role assignments across our nodes. There are several ways to set these preferences, but for now we will skip them, since they can be configured later from Cloudera Manager.

Awesome!

Now we must configure and test the database connection. Some Core Hadoop services (such as Hive) need a PostgreSQL database to store their metadata. There are two ways to run the database:

  • An external PostgreSQL instance: this is the right option for a production Hadoop cluster installation.
  • The ‘Embedded Database’ option, which is the one we choose in this guide to continue without additional resources.

After selecting the Embedded Database, we test the connection to PostgreSQL. If it succeeds, we are ready to continue.

We are almost finished, but we still have to review and customize any additional parameters, such as the HDFS block size or the failed-volume tolerance, among others.

In general, most of these options can be changed after installation, so we continue without making changes.
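For instance, the HDFS block size corresponds to the dfs.blocksize property, which Cloudera Manager manages for us; the snippet below only illustrates the underlying hdfs-site.xml setting (134217728 bytes, i.e. 128 MiB, is the usual default):

```xml
<!-- Illustrative only: in Cloudera Manager this is the "HDFS Block Size"
     setting of the HDFS service, not a file edited by hand. -->
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>
</property>
```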

Very well done!

We now reach the installation and startup of each of the services we decided to install, with the preferences we selected.

If we get to this point, it means that we already have our Hadoop cluster installed, configured and running successfully.

Congratulations! On the dashboard screen, we have our Hadoop cluster with Cloudera Manager installed, working and ready to use. You may see some notifications, such as one about the embedded PostgreSQL database we selected during installation; for each notification, the system offers a description and documentation explaining how to address the issue.

Good luck and enjoy the Hadoop cluster.

See you next time.
