SaltStack: Install Ambari (HDP 2.4.2) and pre-configure Ambari client nodes

Hadoop cluster with seven Raspberry Pi (source: Adafruit’s blog)

What I like most about Ambari is that I can deploy and configure a Hadoop cluster (and the Hadoop ecosystem) with a few clicks. It also monitors all nodes and selects the best resources for YARN (if you have a huge cluster to store files, why wouldn’t you want to run processes there too?). But installing Ambari and pre-configuring the nodes can be tedious, and this is where SaltStack helps.

In upcoming posts I’ll write about SaltStack and security solutions for your Hadoop cluster: authentication through Kerberos, disk encryption with LUKS, and OpenVPN to access the applications’ web UIs.

Before proceeding, I strongly recommend using CentOS 7 as the OS, because I ran into issues with Debian 7.x (and Debian 7.11 is the last Debian 7 release that had support from the Debian team).

Step I: Configure the SaltStack environment

The cluster has this design: one server for Ambari, one server for the NameNode (NN) and four servers for DataNodes (DN).

This process uses JDK 8u60, which you should store in your SaltStack directory. For CentOS:

mkdir -p /srv/salt/rpm
wget -P /srv/salt/rpm \
  --header='Cookie: oraclelicense=accept-securebackup-cookie' \
  http://download.oracle.com/otn-pub/java/jdk/8u60-b27/jdk-8u60-linux-x64.rpm
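If the download gets redirected to a login or error page, wget can end up saving HTML under the .rpm name, and the Salt state would then fail much later on the minions. A minimal sketch of a sanity check, assuming only that a real RPM starts with the magic bytes ed ab ee db (the function name is my own):

```shell
# Return 0 if the file starts with the RPM magic bytes (ed ab ee db).
is_rpm() {
    local file="$1"
    [ -f "$file" ] || return 1
    # Dump the first four bytes as hex and strip spaces and newlines.
    local magic
    magic="$(head -c 4 "$file" | od -An -tx1 | tr -d ' \n')"
    [ "$magic" = "edabeedb" ]
}
```

For example: is_rpm /srv/salt/rpm/jdk-8u60-linux-x64.rpm || echo "this is not an RPM, check the download"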

If you want to install JDK 8u60 on Debian 7, see this Gist.

First, create an RSA key without a passphrase:

ssh-keygen -t rsa -b 2048 -C 'ambari-YYYYMMDD' -N ''
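The command above asks where to save the key and leaves the date in the comment for you to fill in. A possible non-interactive variant is sketched here; the function names and the key path you pass in are my own assumptions, not part of the original setup:

```shell
# Build the key comment from today's date, in the ambari-YYYYMMDD form.
ambari_key_comment() {
    printf 'ambari-%s\n' "$(date +%Y%m%d)"
}

# Generate a 2048-bit RSA key with an empty passphrase at the given path.
# The path is a hypothetical choice; an existing file is never overwritten.
make_ambari_key() {
    local key_path="$1"
    if [ -e "$key_path" ]; then
        echo "refusing to overwrite $key_path" >&2
        return 1
    fi
    ssh-keygen -t rsa -b 2048 -C "$(ambari_key_comment)" -N '' -f "$key_path"
}
```

For example, make_ambari_key /root/.ssh/ambari_rsa would write the private key to that path and the public key next to it with a .pub suffix.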

Then create the pillar file named /srv/pillar/ambari.sls

And finally, copy the following content to a file named /srv/salt/ambari/init.sls
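Salt only hands a pillar file to the minions that a pillar top file assigns it to. If your /srv/pillar/top.sls does not exist yet, something like the following sketch could create one; it assumes the default pillar root /srv/pillar and that every minion should see the ambari pillar, which may be broader than you want:

```shell
# Write a minimal pillar top file that assigns the "ambari" pillar to all minions.
# Assumes the default pillar root /srv/pillar; never overwrites an existing top.sls.
write_pillar_top() {
    local pillar_root="${1:-/srv/pillar}"
    local top_file="$pillar_root/top.sls"
    if [ -e "$top_file" ]; then
        echo "$top_file already exists; add 'ambari' to it by hand" >&2
        return 1
    fi
    mkdir -p "$pillar_root"
    cat > "$top_file" <<'EOF'
base:
  '*':
    - ambari
EOF
}
```

Afterwards, salt '*' saltutil.refresh_pillar followed by salt '*' pillar.items should show the values on each minion.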

Step II: Deploy the Ambari server and the pre-requisites for Ambari clients

Run

salt '*' state.sls ambari 
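Before applying the state to every node, it can be worth checking that all minions answer and doing a dry run with test=True, which reports what would change without changing it. A small wrapper could look like this sketch; the function name and the SALT override variable are my own, and salt's exit code does not necessarily reflect failed states, so still read the output:

```shell
# SALT can be overridden (for example with "echo") to preview the commands.
SALT="${SALT:-salt}"

deploy_ambari() {
    local target="${1:-*}"
    # 1. Check that the targeted minions answer; stop if salt exits non-zero.
    "$SALT" "$target" test.ping || return 1
    # 2. Dry run: report what the state would change without applying it.
    "$SALT" "$target" state.sls ambari test=True || return 1
    # 3. Apply the state for real.
    "$SALT" "$target" state.sls ambari
}
```

Calling deploy_ambari with no argument targets '*', the same as the command above.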

Step III: Setup the Ambari server

On the Ambari master server, run

ambari-server setup 

and select these options:

  • Customize user account for ambari-server daemon? NO
  • Custom JDK (Path to JAVA_HOME): 
    CentOS: /usr/java/jdk1.8.0_60
    Debian: /usr/lib/jvm/j2sdk1.8-oracle
  • Enter advanced database configuration? YES
  • Enter choice database: 1 (PostgreSQL (Embedded))
  • Proceed with configuring remote database connection properties? YES
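If you would rather not answer the prompts by hand, ambari-server setup also has a silent mode (-s) that accepts the defaults, plus a -j option for JAVA_HOME. The sketch below assumes the silent defaults (root daemon user, embedded PostgreSQL) are close enough to the answers listed above; I have not compared them prompt by prompt, and the function names are my own:

```shell
# JDK locations from the list above, keyed by distribution.
java_home_for() {
    case "$1" in
        centos) echo /usr/java/jdk1.8.0_60 ;;
        debian) echo /usr/lib/jvm/j2sdk1.8-oracle ;;
        *) echo "unknown distribution: $1" >&2; return 1 ;;
    esac
}

# Run the setup without prompts, passing the JDK path for the distribution.
silent_setup() {
    local java_home
    java_home="$(java_home_for "$1")" || return 1
    ambari-server setup -s -j "$java_home"
}
```

For example, silent_setup centos on the Ambari master.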

Finally, start Ambari:

/etc/init.d/ambari-server start 
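The server can take a little while before the web UI answers. This sketch polls it with curl; it assumes Ambari's default port 8080 on localhost and that curl is installed, and the function name, attempt count and delay are arbitrary choices of mine:

```shell
# Poll the Ambari web UI until it answers or the attempts run out.
wait_for_ambari() {
    local url="${1:-http://localhost:8080}"
    local attempts="${2:-30}"
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f makes curl fail on HTTP errors; the response body is discarded.
        if curl -fsS -o /dev/null "$url"; then
            echo "Ambari is answering at $url"
            return 0
        fi
        i=$((i + 1))
        sleep 2
    done
    echo "Ambari did not answer at $url" >&2
    return 1
}
```

If it never answers, ambari-server status tells you whether the daemon is running at all.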

Step IV: Adding a new DataNode

Just run

salt '*' state.sls ambari
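The new node's key has to be accepted on the Salt master before the state can reach it. The sketch below accepts the key and applies the state to that node only. This assumes the ambari state touches nothing but the node it runs on; if it also manages shared files across the cluster, keep the '*' target from the command above. The function name and override variables are my own:

```shell
# Both commands can be overridden (for example with "echo") to preview them.
SALT="${SALT:-salt}"
SALT_KEY="${SALT_KEY:-salt-key}"

add_datanode() {
    local minion_id="$1"
    if [ -z "$minion_id" ]; then
        echo "usage: add_datanode <minion-id>" >&2
        return 1
    fi
    # Accept the new minion's key on the master (-y skips the confirmation prompt).
    "$SALT_KEY" -y -a "$minion_id" || return 1
    # Apply the ambari state to the new node only.
    "$SALT" "$minion_id" state.sls ambari
}
```

Run salt-key -L first to see the exact minion ID waiting under the unaccepted keys.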