Setting up a local Hadoop cluster with Vagrant and Salt
First, download and install Vagrant. Vagrant needs a base box to start with, so let's add one:
vagrant box add hashicorp/precise64
This will download a Vagrant box from a public box repo (https://vagrantcloud.com/) and store it in a local repo so that subsequent Vagrant projects can reuse this box.
Adding Salt formulas
Salt formulas are community-developed Salt modules that you can drop into your project. In order to get Hadoop up and running, we need three formulas:
- oracle-java-7: installs Oracle Java 7
- hostsfiles: appends each minion's IP and hostname to /etc/hosts
- hadoop-formula: installs and manages the Hadoop services, e.g. hdfs, mapred, yarn, etc.
I have combined those three into a single project so that we can get started quickly, but you can mix and match them any way you like.
git clone git@github.com:hotienvu/hadoop-vagrant-salt.git
vagrant up
The Vagrantfile in the project root directory defines our cluster. Take a look inside and you will see that we create three machines with the following names:
- master: the Salt master. This instance orchestrates the installation and configuration of everything
- hadoop_master: serves as the NameNode as well as the SecondaryNameNode
- hadoop_slave_1: a DataNode
Run `vagrant up`. After it's done you should have three machines running locally at 10.10.1.11-13. You should be able to SSH into any of them:
vagrant ssh master
Notice that /srv/salt and /srv/pillar are created for you and synced with salt/roots and salt/pillar on the host machine. You can make changes to the Salt SLS files on the host machine and run state.highstate in the virtual machine, and vice versa.
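A typical edit-and-apply loop might look like this (a sketch, assuming the machine names from the Vagrantfile below):

```shell
# Edit an SLS file on the host, then re-apply the state tree from the
# master VM. "master" is the machine name defined in the Vagrantfile;
# salt '*' targets every registered minion.
vagrant ssh master -c "sudo salt '*' state.highstate"
```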
Let's look at how the Salt master is defined:
config.vm.define "master" do |master|
  master.vm.box = "hashicorp/precise64"
  master.vm.network "private_network", ip: SALT_MASTER_IP_ADDRESS
  master.vm.hostname = "salt"
  master.vm.provision "salt" do |salt|
    salt.master_config = "salt/etc/master"
    salt.install_master = true
    salt.no_minion = true
    salt.run_highstate = false
    salt.colorize = true
  end
end
The settings for the Salt master are declared in salt/etc/master. There is only one thing we want to add for now: tell the master to auto-accept keys every time a new minion registers itself:
auto_accept: True
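With auto-accept on, new minions should appear as accepted without any manual salt-key step. As a quick sanity check once the cluster is up, you can list the keys and ping the minions from the master:

```shell
# List accepted/pending minion keys, then ping every minion
vagrant ssh master -c "sudo salt-key -L && sudo salt '*' test.ping"
```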
As for the minions, the following settings are applied:
mine_functions:
  network.ip_addrs:
    - eth1
  network.interfaces: []
  grains.items: []
grains:
  roles:
    - hadoop_master # or hadoop_slave
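These settings let states target minions by role and let the Hadoop states look up each other's addresses through the Salt mine. A quick way to check both from the master (standard Salt commands; the targets assume the grains above):

```shell
# Ping only the minions carrying the hadoop_master role grain
vagrant ssh master -c "sudo salt -G 'roles:hadoop_master' test.ping"
# Ask the mine for every minion's eth1 address
vagrant ssh master -c "sudo salt '*' mine.get '*' network.ip_addrs"
```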
Note that we defer running highstate by setting salt.run_highstate = false. This is because we first want to add the master's IP address to each minion's /etc/hosts file; otherwise the minions won't be able to find the master.
script = "sudo echo '#{SALT_MASTER_IP_ADDRESS} salt' >> /etc/hosts; salt-call state.highstate"...
hs.vm.provision "shell", inline: script
There is still one important thing to be done. The hadoop_master needs to talk to the slaves over SSH. Since HDFS and YARN run under the "hdfs" and "yarn" users respectively, we need to add public keys to those users' authorized_keys files. Fortunately, hadoop-formula already takes care of this for you. All you need to do is create two DSA key pairs and copy them to salt/roots/hadoop/files/:
- dsa-hdfs/dsa-hdfs.pub
- dsa-yarn/dsa-yarn.pub
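ssh-keygen can generate these; a minimal sketch, assuming you run it from the project root and that the formula expects exactly the file names listed above:

```shell
# Generate passphrase-less DSA key pairs for the hdfs and yarn users
ssh-keygen -t dsa -N "" -f dsa-hdfs
ssh-keygen -t dsa -N "" -f dsa-yarn
# Copy both pairs to where the formula looks for them
cp dsa-hdfs dsa-hdfs.pub dsa-yarn dsa-yarn.pub salt/roots/hadoop/files/
```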
And you're done. Run `vagrant up` again. After it's done, SSH into hadoop_master, run the jps command as root, and you should see the Hadoop services up and running!
root@hadoop-master:/home/vagrant# jps
2134 Jps
1594 SecondaryNameNode
993 NameNode
1363 ResourceManager
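As a final smoke test you can try writing to HDFS. A sketch, assuming the formula put the Hadoop binaries on the PATH and that "hdfs" is the HDFS superuser:

```shell
# Create a directory in HDFS as the hdfs user and list the filesystem root
sudo -u hdfs hdfs dfs -mkdir -p /tmp/smoke-test
sudo -u hdfs hdfs dfs -ls /
```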