Deploy native Kubernetes cluster via AWS CDK

Hallblazzar · Hallblazzar: Developer Journal · 7 min read · Aug 30, 2020

Preface

It has been a year since I became an AWS Cloud Support Engineer. Looking back on the past 365 days, I feel none of them was unreasonably wasted. The people I met and the cases/issues I worked on made me grow, just as Amazon’s philosophy says: “it’s always Day 1”.

But I have to say that I really didn’t have free time for side projects during this year. Challenges filled every single day: bleeding-edge technology, monthly goals, anxious users, and urgent response times. Working in Support is really different from working as a developer; under most circumstances, pressure from users doesn’t land directly on developers. I spent most of my time learning non-technical skills to handle users’ issues more smoothly, such as negotiating effectively and building trust with users.

Now that I’m getting a little more used to working in Support, I’ve started to have some free time for the interesting things I’d like to do. In this post, I’ll describe how I used the AWS CDK to design a script that deploys a native Kubernetes (K8s) cluster on AWS. In addition to CDK, I use kubeadm as the core of the automated deployment process. If you’d like to check the final script directly, see the repository on my GitHub, kubeadm-CDK.

About this topic

What information is included?

  • How I designed the CDK script and shell scripts to deploy a native K8s cluster, and how they work together.
  • Some technical issues (bombs💣) I hit while implementing them.
  • Future work.

What information is NOT included?

  • The basics of AWS CDK and AWS services. If you’d like to learn them, please refer to the official AWS documents and samples. Alternatively, if you buy an AWS Support plan, you can ask an AWS BD/TAM/SA for assistance, or create a support case for guidance (maybe I’ll be the one assisting you in that case 🤣).
  • The basics of K8s. If you’d like to learn them, please refer to the official Kubernetes documentation.
  • How to use CDK to deploy EKS. This post is about deploying a NATIVE K8s cluster 🤣.

1. How I designed the CDK script and shell scripts, and how they work.

In my opinion, to deploy an application or a service via CDK, I need to figure out:

  • What AWS services are required as infrastructure?
  • How to automate the deployment process on top of that infrastructure?

Infrastructure

Basically, for the first question, deploying a K8s cluster requires the following AWS resources:

  • EC2 instances to serve as master and worker nodes.
  • An isolated VPC and subnets, to ensure the EC2 instances won’t be affected by existing VPC-related configurations.

Based on the resources above, I also want to:

  • Secure the control plane so that administrators can only access it privately.
  • Open the fewest ports possible on both the control plane and the worker nodes: enough for them to work properly, while keeping the cluster away from unexpected traffic as much as possible.

Therefore, the following resources are additionally required:

  • An additional EC2 instance to serve as a bastion host. This bastion host should be the ONLY host that can access all ports of all hosts in the VPC.
  • Security groups to satisfy the network security requirements.

Once it is clear what resources are required, the basic deployment script can be constructed with CDK’s help.
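For illustration, here is a minimal sketch of what that infrastructure can look like in CDK for Python (a CDK v1-era sketch; construct names, instance types, and the subnet layout are my own placeholders, not necessarily what kubeadm-CDK uses):

```python
from aws_cdk import core, aws_ec2 as ec2


class KubeadmStack(core.Stack):
    """Sketch: isolated VPC, bastion host, and node security group."""

    def __init__(self, scope: core.Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Isolated VPC so the cluster isn't affected by existing VPC configurations.
        vpc = ec2.Vpc(self, "KubeadmVpc", max_azs=2)

        # The bastion is the ONLY host allowed to reach all ports of all nodes.
        bastion_sg = ec2.SecurityGroup(self, "BastionSG", vpc=vpc)
        node_sg = ec2.SecurityGroup(self, "NodeSG", vpc=vpc, allow_all_outbound=True)
        node_sg.add_ingress_rule(bastion_sg, ec2.Port.all_traffic(),
                                 "Bastion can reach every node port")

        # Ubuntu AMI lookup (requires an env-bound stack to resolve).
        ubuntu = ec2.MachineImage.lookup(
            name="ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*")

        # Bastion lives in a public subnet; master/worker nodes (created later,
        # with kubeadm user data) live in private subnets behind it.
        ec2.Instance(self, "Bastion",
                     vpc=vpc,
                     vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PUBLIC),
                     instance_type=ec2.InstanceType("t3.micro"),
                     machine_image=ubuntu,
                     security_group=bastion_sg)
```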

K8s cluster deployment

Now the problem is: how do I deploy the K8s cluster on the EC2 instances automatically?

There are many existing K8s cluster deployment approaches, for instance kubespray, Rancher, or Ubuntu Juju. After spending a lot of effort testing and surveying them for a long time, I finally decided to use kubeadm. The reasons are:

  • The tool is officially developed, maintained, and supported by the Kubernetes community.
  • It gives me more control over the installation than other tools, without requiring me to manage every detail as I would if I installed everything by myself.
  • It requires few dependencies and little dependency configuration.
  • I can simply use shell scripts to automate the deployment process with kubeadm. To install a K8s cluster via kubeadm, I just need to follow the instructions in the kubeadm installation guide and the cluster creation guide. Putting these instructions in a shell script and injecting it into EC2 user data looks simple.

Based on kubeadm’s workflow, the CDK script and shell scripts should run in the following order: CDK first creates the VPC, security groups, and bastion host; the master node’s user data then bootstraps the control plane with kubeadm init and installs the CNI plugin; finally, each worker node’s user data joins the cluster with kubeadm join.
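Continuing the sketch above, the injection step could look like this (the package and kubeadm commands follow the 2020-era kubeadm installation guide; the repo’s shell scripts are the source of truth):

```python
# Sketch: inject the kubeadm bootstrap commands into the master's user data.
# The worker user data would run "kubeadm join ..." with a token instead.
master_user_data = ec2.UserData.for_linux()
master_user_data.add_commands(
    "apt-get update && apt-get install -y apt-transport-https curl docker.io",
    "curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -",
    "echo 'deb https://apt.kubernetes.io/ kubernetes-xenial main'"
    " > /etc/apt/sources.list.d/kubernetes.list",
    "apt-get update && apt-get install -y kubelet kubeadm kubectl",
    # 10.244.0.0/16 is Flannel's default Pod CIDR.
    "kubeadm init --pod-network-cidr=10.244.0.0/16",
)

ec2.Instance(self, "Master",
             vpc=vpc,
             vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE),
             instance_type=ec2.InstanceType("t3.medium"),
             machine_image=ubuntu,
             security_group=node_sg,
             user_data=master_user_data)
```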

But things are never as simple as I expect…

2. Issues

💣Issue 1. Security group rule.

My first problem was that Pods on the Kubernetes cluster created by my script could not access the internet. In general, after bootstrapping a K8s cluster, the most important thing is to make sure of network connectivity; it determines whether applications and services can work properly. The problem was that if I created Ubuntu Pods on the cluster and ran commands like ping and curl in those Pods via kubectl exec, I could observe that the following traffic couldn’t be established:

  • Pod <-> Internet [pinging an IP address worked, but pinging a domain name failed]
  • Pod <-> Pod [pinging both cluster IPs and Service/Pod DNS names failed]
  • Pods were unable to resolve DNS records, or even to reach the CoreDNS Pods.

The first thing that came to mind was security groups. When I planned the security group rules, I simply followed the port tables in kubeadm’s document. But one thing the document didn’t mention was that arbitrary ports may be used for cross-worker-node traffic.

This relates to how K8s networking works. When Pods sitting on different worker nodes want to communicate with each other, the CNI plugin on each worker node translates the source and destination addresses between cluster IPs and host IPs so that packets can reach their destination. During that translation, the ports in the packets are retained. Therefore, if the worker nodes don’t expose all ports to each other, cross-worker-node traffic is blocked and connections between Pods cannot be established. That is why I added the function __attach_inter_worker_access_rule() to the security group settings, as sketched below.
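A minimal sketch of what such a rule can look like in CDK for Python (the actual body of __attach_inter_worker_access_rule() in kubeadm-CDK may differ):

```python
def __attach_inter_worker_access_rule(self, worker_sg: ec2.SecurityGroup) -> None:
    # Self-referencing rule: every member of the worker security group may
    # reach every port of every other member. Needed because the CNI plugin
    # rewrites Pod IPs to host IPs but keeps the original (arbitrary) ports.
    worker_sg.add_ingress_rule(
        peer=worker_sg,
        connection=ec2.Port.all_traffic(),
        description="Allow all inter-worker traffic",
    )
```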

After attaching the rule, it looked like all the network connectivity issues were gone. But the happiness didn’t last long: when I deployed the CDK script again, a different issue occurred.

💣Issue 2. Node taint issue.

The new circumstance was that network connectivity became unstable. The symptoms were the same as the ones I had encountered previously, but with different behavior:

  • For Pod <-> Pod connectivity, connections were intermittently lost; some Pods could connect to each other, but some couldn’t.
  • For DNS resolution, failures also occurred intermittently. Some Pods could resolve DNS records and connect to the CoreDNS Pods, but some couldn’t.

So I had two choices: use tcpdump to analyze packets, or figure out which configuration led to the situation. I chose the latter. The reason was that the clusters were constructed from scratch, so they shouldn’t be suffering from generic network issues; analyzing the process and settings I used was a better starting point.

My first decision was to try other CNI plugins. In the beginning, the one I used was Flannel, a simple and reliable solution. The issue forced me to try other options such as Weave Net and Calico, but they still didn’t solve it. Therefore, I thought the CNI plugin might not be the problem.

Then I started thinking about OS-level issues. The image I used to create the instances was Ubuntu 20.04 (Focal Fossa). Though I could successfully bootstrap the K8s cluster with it, according to the kubeadm installation instructions, Xenial (Ubuntu 16.04) looked like the release the kubeadm APT packages were built for. However, changing the OS version didn’t solve the issue either; I also tried Ubuntu 18.04 (Bionic Beaver), but the problem persisted.

As a result, I did the following things:

  • Tried different OS and CNI plugin combinations.
  • Tried adding different wait and delay conditions to the shell scripts for the worker and master nodes.

These attempts ate up almost all of my after-work time for a week! The main reason is that CDK is based on AWS CloudFormation, and using it to create and delete resources is significantly SLOW. In my case, it took up to half an hour just to create the EC2 instances (only the EC2 instances!!!!💀💀💀💀💀) (wait/delay condition time not counted!!!!💀💀💀💀💀).

Just as I was about to give up on kubeadm, it occurred to me that I should check which nodes hosted the Pods with network connectivity issues, and there I found the root cause: those Pods had been created on the master node. Based on my security group rules (following the kubeadm recommendation), the master node only allows traffic from worker nodes to reach its port 6443. Therefore, if a Pod was scheduled on the master node, it was expected to be unable to establish connections to CoreDNS and to Pods on worker nodes.
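In security group terms, the rule looks roughly like this (master_sg and worker_sg are illustrative names, not necessarily those used in the repo):

```python
# Per kubeadm's port table, the master accepts only the API server port
# from workers, so Pods landing on the master are cut off from the rest
# of the cluster network.
master_sg.add_ingress_rule(worker_sg, ec2.Port.tcp(6443),
                           "API server access from worker nodes")
```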

However, according to the kubeadm troubleshooting guide, the node-role.kubernetes.io/master:NoSchedule taint is applied to control-plane nodes by default, so Pods SHALL NOT be schedulable on master nodes. To make sure of that, I added the following settings to the kubeadm configuration YAML file:
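Roughly, they pin the control-plane taint explicitly during kubeadm init; a sketch assuming the v1beta2 kubeadm config API (the exact file in kubeadm-CDK may differ):

```yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
nodeRegistration:
  taints:
    # Explicitly keep the NoSchedule taint on the master node so ordinary
    # Pods are never scheduled onto it.
    - key: "node-role.kubernetes.io/master"
      effect: "NoSchedule"
```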

And it WORKED!!!! Such a great document!!!! Thanks, Kubernetes!!!!!

💣Additional Issue. Rook Ceph

To provide the cluster created by the script with persistent storage, I also added instructions to deploy Rook Ceph to the scripts. The problem was that no matter how much disk space I allocated to the EC2 instances, the ceph status command (run via rook-toolbox) always reported that no usable disk space was available.

After checking the logs of each Pod created by Rook Ceph, I found messages saying that no disk could be mounted. Therefore, I tried attaching an additional disk to the nodes in the CDK script, and it WORKED 💀. However, I think I solved the issue by dumb luck: using Rook Ceph as persistent storage for a K8s cluster requires a solid understanding of Ceph, which will be part of my future work. A rough sketch of the disk attachment follows.
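Something like this, in CDK for Python (the device name and volume size are illustrative; Rook’s Ceph OSDs generally need a raw, unformatted block device to claim):

```python
# Attach an extra, unformatted EBS volume that Rook Ceph can turn into an OSD.
ec2.Instance(self, "Worker",
             vpc=vpc,
             instance_type=ec2.InstanceType("t3.medium"),
             machine_image=ubuntu,
             security_group=node_sg,
             block_devices=[
                 ec2.BlockDevice(
                     device_name="/dev/sdf",
                     volume=ec2.BlockDeviceVolume.ebs(50),  # 50 GiB raw disk
                 )
             ])
```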

3. Future work.

As I mentioned in the preface, you can find the final scripts on my personal GitHub, kubeadm-CDK. If you encounter issues while deploying, please feel free to let me know via GitHub issues. I’d be glad to help.

Besides, here is what could be improved in the project:

  • Allow the script to deploy a control plane with multiple master nodes.
  • More CNI plugin options.
  • More persistent storage options (based on Rook).
  • A concrete list of IAM permissions required for deployment.

If you’re interested in the project, please also star it on my GitHub. Any advice and feedback are welcome!


Hallblazzar · Hallblazzar: Developer Journal

A developer driven by interest, specializing in networking, software/system architecture, and DevOps, and currently working my way into the world of Data Science. I spend my life enjoying the beauty of code and architecture, and dream of still passionately playing with code even at 80. GitHub: https://github.com/HallBlazzar Mail: hallblazzar@gmail.com