Getting Started With Kubernetes : Part 3: KubeSpray : On Premise Installation Guide
In this session, we will be talking about another popular solution for creating on premise kubernetes cluster.
Kubernetes Cluster Setup with Kubespray & Ansible
1. Generate ssh key on host machine and transfer the keys to master and nodes to establish password-less authentication.
Here,
Master : 172.27.XX.1XX
Node1 : 172.27.XX.4XX
Node2 : 172.27.XX.8XX
Command :
ssh-keygen
Transfer the keys to all master and nodes.
Command :
ssh-copy-id root@<ip-address>
Note : In case you do not know password, just copy and paste the keys.
2. Get the kubespray repo on your machine and untar it.
Command :
wget https://github.com/kubernetes-incubator/kubespray/archive/v2.7.0.tar.gztar -xzf v2.7.0.tar.gz
3. Install the required packages as mentioned in requirements.txt.
Note : pip needs to be installed on the machine as a pre-requisite.
sudo pip install -r requirements.txt
4. Create a folder for your cluster and copy the required files from “sample” folder.
mkdir inventory/mycluster cp -rfp inventory/sample/* inventory/mycluster
5. Remove the unnecessary files. [OPTIONAL STEP]
Command :
rm -rf inventory/sample inventory/local/
6. Update Ansible inventory file with inventory builder
declare -a IPS=(172.27.XX.1XX 172.27.XX.1XX 172.27.XX.1XX 172.27.XX.1XX)
CONFIG_FILE=inventory/mycluster/hosts.ini python3 contrib/inventory_builder/inventory.py ${IPS[@]}
7. Update your host.ini file as per your master-node architecture
[k8s-cluster:children]
kube-master
kube-node
[all]
master ansible_host=172.27.XX.1XX ip=172.27.XX.1XX
node1 ansible_host=172.27.XX.XX ip=172.27.XX.XX
node2 ansible_host=172.27.XX.XX ip=172.27.XX.XX
[kube-master]
master
[kube-node]
node1
node2
[etcd]
master
node1
node2
[calico-rr]
[vault]
node1
node2
8. Run ansible playbook to create the cluster.
ansible-playbook -i inventory/mycluster/hosts.ini cluster.yml
Addition of Node
Update hosts.ini [add a node in hosts.ini] and run below command.
ansible-playbook -i inventory/mycluster/hosts.ini scale.yml –flush-cache
ansible-playbook -i inventory/mycluster/hosts.ini cluster.yml
Deletion of Node
It supports two ways to select the nodes:
Use — extra-vars “node=<nodename>,<nodename2>” to select the node you want to delete.
ansible-playbook -i inventory/mycluster/hosts.ini remove-node.yml -b -v \ --private-key=~/.ssh/private_key \ --extra-vars "node=nodename,nodename2"
or
Use — limit nodename,nodename2 to select the node
ansible-playbook -i inventory/mycluster/hosts.ini remove-node.yml -b -v \ --private-key=~/.ssh/private_key \ --limit “nodename,nodename2" Ex. :
ansible-playbook -i inventory/mycluster/hosts.ini remove-node.yml -b -v --private-key=~/.ssh/private_key --extra-vars "node=node2"
Clean up
Normally running reset.yml will cater all the clean up but sometimes to completely reset everything we need to run both the command in below sequence.
ansible-playbook -i inventory/mycluster/hosts.ini remove-node.yml --flush-cache
ansible-playbook -i inventory/mycluster/hosts.ini reset.yml –flush-cache
Issues & Solutions
Issue 1 : Permission denied
Solution : You need to give all root/sudo access to the user which is used by ansible to use the playbook.
Issue 2 : Python not found.
Solution : You need to check on master if python is installed on different folder or not present.
If python is not present on master, then install.
If python is installed at different location, create a symlink.
ln -s <src_dir> <destination_dir>
Issue 3 : The conditional check ‘hostvars[item][‘cluster_id’] == cluster_id’ failed.
Solution : Remove nodes from [calico-rr] and re-run below playbooks in sequence.
ansible-playbook -i inventory/mycluster/hosts.ini remove-node.yml --flush-cache
ansible-playbook -i inventory/mycluster/hosts.ini reset.yml --flush-cache
Issue 4 : Failed to create ‘IPPool’ resource: resource already exists: IPPool(default-pool).
Solution : Run below playbooks in sequence.
ansible-playbook -i inventory/mycluster/hosts.ini remove-node.yml --flush-cache
ansible-playbook -i inventory/mycluster/hosts.ini reset.yml –flush-cache
Issue 5 : apt-get update”, “msg”: “W: The repository ‘https://download.docker.com/linux/ubuntu xenial Release’ does not have a Release file.
Solution : Run below playbooks in sequence.
ansible-playbook -i inventory/mycluster/hosts.ini remove-node.yml --flush-cache
ansible-playbook -i inventory/mycluster/hosts.ini reset.yml –flush-cache
Issue 6 : fatal: [node3]: FAILED! => {“msg”: “Timeout (12s) waiting for privilege escalation prompt: “}
Solution : Enable Internet access on the machine where error is occurring.
Issue 7 : E:Unable to parse package file /var/lib/apt/lists/partial/us.archive.ubuntu.com_ubuntu_dists_xenial-updates_universe_binary-i386_Packages.diff_Index (1), E:Unable to parse package file /var/lib/apt/lists/partial/us.archive.ubuntu.com_ubuntu_dists_xenial-backports_main_i18n_Translation-en.diff_Index (1)”}
Solution : Run below commands on the machine which is giving error.
sudo rm -r /var/lib/apt/lists/*
sudo apt-get update