AWS-Kubernetes cluster automation with Terraform and Kops


We have all read hundreds of hello-world tutorials whose goal is to introduce Kubernetes and show the power it can bring to development and operations teams, with scaling capabilities that allow faster development cycles.

Those tutorials are really useful for starting to play with the technology; then you learn it, and finally you wonder what the best practices are for running it in production.

A common pattern I’ve seen when deploying Kubernetes in production environments is keeping the underlying infrastructure behind corporate firewalls or a VPN.

A cluster is just a set of virtual machines and firewall rules that together make Kubernetes work. So, why would we need to expose the underlying infrastructure to the Internet? Of course, services deployed inside the cluster might be exposed with a load balancer, but there’s no need to expose the VMs where the Kubernetes masters and workers run.


Kops (Kubernetes Operations) is a tool to create, maintain and destroy Kubernetes clusters on AWS, GCE and VMware vSphere (for the purposes of this tutorial we will talk about AWS only). Kops has the concept of a private topology, where the cluster runs in an isolated network.
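As a quick illustration, a private-topology cluster definition can be generated without touching AWS by using a dry run. The cluster name, zones and state bucket below are hypothetical placeholders:

```shell
# Generate a private-topology cluster spec without creating anything.
# Names, zones and the state bucket are examples, not fixed conventions.
export KOPS_STATE_STORE=s3://my-kops-state-store

kops create cluster \
  --name mycluster.example.com \
  --zones eu-west-1a,eu-west-1b,eu-west-1c \
  --topology private \
  --networking weave \
  --dry-run -o yaml > cluster.yaml
```

The resulting YAML can then be edited and version-controlled before anything is actually provisioned.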

Kops can create the network for you, but then you need to connect it to your corporate VPC and modify the routing tables in all subnets. A better approach is to tell Kops which VPC, subnets and NAT gateways to use, so it does not create additional resources and everything is already connected to your corporate network.

You can export the existing cluster definition to Terraform files, but I found it easier and cleaner to use YAML files for the cluster and instance group definitions.
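For reference, both approaches look like this (the cluster name is a placeholder): `kops update cluster` can emit Terraform code, while `kops get` dumps the definitions as YAML:

```shell
# Option A: export the cluster as Terraform code (hypothetical cluster name)
kops update cluster mycluster.example.com --target=terraform --out=.

# Option B: dump the cluster and instance group definitions as YAML
kops get cluster mycluster.example.com -o yaml > cluster.yaml
kops get instancegroups --name mycluster.example.com -o yaml > instancegroups.yaml
```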


In order to save money and keep all steps automated, we have the constraint of being able to start from scratch on a daily basis for non-production environments. We use Terraform to manage the infrastructure, so it’s easy to create and destroy. But every time you re-create things you get new VPCs, subnets and NAT gateways, so you have to modify the cluster definition.

One nice feature Terraform has is template rendering. You can keep the cluster and instance group definitions as templates and let Terraform populate them with the right VPC, subnet and NAT gateway IDs.

The following snippet shows a Terraform template example that will be the source for the Kops cluster.yaml definition. The same applies to the workers and masters instance groups.

############################################
# NOTE: This file is managed by Terraform  #
# Don't make any manual modification to it #
############################################
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2017-08-07T00:00:00Z
  name: ${cluster_name}
spec:
  [...]
  configBase: s3://${kops_state_store}/${cluster_name}
  sshAccess:
  - ${vpc_cidr}
  - YOUR_CORPORATE_NET1
  - YOUR_CORPORATE_NET2
  - [...]
  kubernetesApiAccess:
  - ${vpc_cidr}
  - YOUR_CORPORATE_NET1
  - YOUR_CORPORATE_NET2
  - [...]
  kubernetesVersion: ${k8s_version}
  masterInternalName: api.internal.${cluster_name}
  masterPublicName: api.${cluster_name}
  networkCIDR: ${vpc_cidr}
  networkID: ${vpc_id}
  networking:
    weave: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  subnets:
  - cidr: ${priv_subnet_cidr_1}
    name: mysubnet-${priv_subnet_az_1}
    type: Private
    id: ${priv_subnet_id_1}
    zone: ${priv_subnet_az_1}
    egress: ${nat_gw_id_1}
  [...]

The snippet below shows how to declare, initialize and render a template resource:

data "template_file" "cluster" {
  template = "${file("${path.module}/templates/cluster.tpl")}"
  vars {
    kops_state_store = "${var.kops_state_store}"
    k8s_version      = "${var.k8s_version}"
    cluster_name     = "${var.cluster_name}"
    vpc_id           = "${var.vpc_id}"
    vpc_cidr         = "${var.vpc_cidr}"
    priv_subnet_id_1 = "${var.priv_subnets_id[0]}"
    nat_gw_id_1      = "${var.shared_nat_gw_id[0]}"
  }
}
[...]
// Rendering templates
resource "null_resource" "export-cluster-rendered-template" {
  triggers {
    template = "${data.template_file.cluster.rendered}"
  }
  provisioner "local-exec" {
    command = "cat > ${path.module}/manifests/${var.stage}/cluster.yaml <<EOL\n${data.template_file.cluster.rendered}EOL"
  }
}

That way, you can destroy and re-create everything from scratch without the hassle of manually editing all the definition files.
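Putting it together, a daily re-creation could be sketched roughly like this. The stage name and cluster name are examples from this setup, and it assumes a kops version that supports `kops replace --force`:

```shell
# Re-create the infrastructure; Terraform renders manifests/<stage>/cluster.yaml
terraform apply

# Load the freshly rendered definitions into the Kops state store
# (--force creates the state objects if they don't exist yet)
kops replace -f manifests/dev/cluster.yaml --force

# Apply the changes to AWS
kops update cluster mycluster.example.com --yes
```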


To have full automation and be able to destroy everything, there are a couple of extra things to do:

  • Kops requires an S3 bucket to store the cluster state. When you delete the cluster, Kops keeps that data in the bucket so you can recover from it. If you create the bucket with Terraform (as we do) with versioning enabled, you will not be able to delete the bucket because it is not empty. A simple bash script can help:
delete_cluster_config() {
  bucket=$1
  echo "Removing all versions from $bucket"
  versions=$(aws s3api list-object-versions --bucket $bucket | jq '.Versions')
  markers=$(aws s3api list-object-versions --bucket $bucket | jq '.DeleteMarkers')
  let count=$(echo $versions | jq 'length')-1
  if [ $count -gt -1 ]; then
    echo "removing files"
    for i in $(seq 0 $count); do
      key=$(echo $versions | jq .[$i].Key | sed -e 's/\"//g')
      versionId=$(echo $versions | jq .[$i].VersionId | sed -e 's/\"//g')
      cmd="aws s3api delete-object --bucket $bucket --key $key --version-id $versionId"
      echo $cmd
      $cmd
    done
  fi
  let count=$(echo $markers | jq 'length')-1
  if [ $count -gt -1 ]; then
    echo "removing delete markers"
    for i in $(seq 0 $count); do
      key=$(echo $markers | jq .[$i].Key | sed -e 's/\"//g')
      versionId=$(echo $markers | jq .[$i].VersionId | sed -e 's/\"//g')
      cmd="aws s3api delete-object --bucket $bucket --key $key --version-id $versionId"
      echo $cmd
      $cmd
    done
  fi
}
  • When you want Kops to use your subnets, you have to tag them with KubernetesCluster = mycluster.example.com and kubernetes.io/role/internal-elb = "" . In a multi-Availability-Zone setup, where you allocate one subnet per AZ, these tags let Elastic Load Balancers join all networks, with a NIC in every subnet. But when you want to destroy the cluster, Kops looks for the KubernetesCluster tag and tries to delete the tagged resources. Since there are dependencies, such as NAT gateways, that were created by Terraform and don’t have those tags, Kops fails to delete them, giving up after some minutes. So before running kops delete, you can run a bash script like the following:
ids=$(aws ec2 describe-subnets --filters "Name=tag:KubernetesCluster,Values=mycluster.example.com" | jq -r '.Subnets[].SubnetId')
echo "$ids" | xargs -I {} aws ec2 delete-tags --resources {} --tags "Key=KubernetesCluster,Value=mycluster.example.com"
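A complete teardown, combining both workarounds above, could then look like this sketch (the script name and bucket name are hypothetical):

```shell
# 1. Remove the KubernetesCluster tags from the subnets Terraform owns,
#    so Kops doesn't try (and fail) to delete them
./remove_k8s_subnet_tags.sh

# 2. Delete the cluster itself
kops delete cluster mycluster.example.com --yes

# 3. Empty the versioned state bucket so Terraform can delete it
delete_cluster_config my-kops-state-store

# 4. Destroy the remaining infrastructure
terraform destroy
```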

With all these steps we save money in pre-production environments, and we have a consistent and automated way to go to production.