Kubernetes in AWS Production

Ryan Day
Wireless Registry Engineering
May 23, 2016

This blog post focuses on overcoming the gap between a proof-of-concept Kubernetes deployment and a deployment that has to operate within an existing AWS setup that includes various APIs, databases, and so forth.

Introduction

Our pre-k8s setup was fairly simple. We deployed one binary (per AWS instance) to handle all our data input (InputAPI), and one binary to handle processing and consumption of the data (OutputAPI). This approach works great for our Input API, as these binaries do not change often and we get fairly good CPU usage due to a high degree of concurrent processing.

In the beginning our Output API followed the same principle, but it quickly grew in scope and functionality. This got us looking at containers, and the final straw came when we realized we had to rebuild our graph processing nodes. The cleanest solution, we found, was to split the analysis problem into multiple concurrent services (many more services than AWS instances) that form a hash ring. In addition, we have started deploying quite a few microservices, ranging from batch DB fixes to various metric calculations. Finally, we wanted to release all of this without putting even more pressure on DevOps.

We decided to run with the following architecture:

  1. Kubernetes (Google, growing support base)
  2. Jessie host OS (Honestly, we cannot find a reason why CoreOS is better than Jessie at running containers)
  3. Alpine Linux containers (small footprint and security focused)
  4. AWS (as we already have all our DB and API instances there)

We chose Kubernetes to manage our services cluster. It has a large user base, lots of documentation, and the community seems awesome. Cementing the idea was the article about the history of containers at Google, "Borg, Omega, and Kubernetes". It is a great engineering read, and it clarifies many of the design decisions. This in turn makes it very easy to understand the good, the bad and the ugly aspects of Kubernetes.

Wireless Registry’s Kubernetes setup in AWS.

We picked Debian Jessie as the operating system for the master and the minions. For the containers, Alpine Linux won (see the oft-quoted comment). Alpine Linux is very small and strives to be secure. It provides a C library (musl) and BusyBox for the basic Unix utilities. We need very little to run a Go binary, and Alpine Linux provides exactly that.

Problem statements

The stock AWS deployment script (kube-up.sh) can, out of the box, set up a simple proof-of-concept cluster. Unfortunately, it takes additional effort to address the following issues:

  1. Deploying a Kubernetes cluster into an already existing AWS VPC setup.
  2. Allowing Kubernetes services to discover and access outside components (e.g., DBs, queues, etc) and vice versa.

Deploying a Kubernetes cluster

We started by digging through cluster/aws/options.md and cluster/aws/config-default.sh. These two files are a great way to understand how the cluster configuration really works. Note that there are environment variables in config-default.sh that are not documented anywhere else.

When deploying into an existing VPC, the first thing to do is plan the network layout: 1) choose a CIDR that does not overlap with any subnets already in the VPC; 2) ensure that DNS hostname support is enabled.

Note that the default Kubernetes deployment scripts do not expect an already-existing addressing scheme and can therefore break your network setup. For example, the default NON_MASQUERADE_CIDR will prevent your network from operating if your VPC uses the 10.0.0.0/8 network. In short, explicitly set a non-conflicting CIDR in the script:

export NON_MASQUERADE_CIDR="172.16.0.0/14"

Having set up the networking, edit the provisioning script to reflect it:

export SERVICE_CLUSTER_IP_RANGE="172.16.0.0/16"
export DNS_SERVER_IP="172.16.0.10"
export MASTER_IP_RANGE="172.17.0.0/24"
export CLUSTER_IP_RANGE="172.18.0.0/16"

And then run the script to deploy the robot army.

When scaling out your k8s cluster, the autoscaling group uses a launch configuration that was created by the initial kube-up.sh script. This launch configuration can be modified to provide a custom setup for your minions. Aside from the standard AWS configuration, you will notice a User Data section in the launch configuration. This is where Kubernetes stores a custom script that is executed when a minion is provisioned. The script is gzipped and must be uncompressed before you can work with it. Using the AWS CLI, grab a copy of all your launch configurations, locate your cluster's configuration, and copy the base64-encoded UserData field to another file:

aws autoscaling describe-launch-configurations > launch-configs.json
# Place the base64-encoded UserData in userdata-orig.b64
cat userdata-orig.b64 | base64 -d | zcat > userdata-orig.sh

The script is fairly simple, and from it you can see how the entire provisioning process for a minion kicks off.

Some unexpected deployment issues

  1. Alpine C libraries. Alpine Linux <= 3.3 uses musl libc <= 1.1.12, and this version of musl does not recognize the "search" and "domain" lines in /etc/resolv.conf. Kubelet is specifically configured to only provide hostnames and to rely on a search of the configured DNS domain name, so Kubernetes basically does not work with Alpine Linux <= 3.3. At this point in time, all containers must be based on Alpine:Edge.
  2. Kafka. Kafka has a setting for the advertised host name, and consumers must be able to resolve that hostname. Otherwise the connection is never fully established, consumers never see messages, and they fail to shut down correctly.
  3. DNS-based service discovery. Just about all of Kubernetes requires DNS resolution to work: your VPC has to add DNS names automatically for your nodes to communicate, and SkyDNS must register your cluster services for add-ons to work. Anytime there was an unexpected problem, DNS was usually at the root (see the quick check sketched below).
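
A quick way to confirm that cluster DNS answers at all is a check along these lines, run from inside a pod. This is a minimal sketch of ours, not code from our production setup, and it assumes the default cluster.local domain; the fully qualified kubernetes.default.svc.cluster.local name should resolve in any healthy cluster.

// dnscheck is a minimal in-pod sanity check for cluster DNS (SkyDNS).
// It assumes the default cluster.local domain; adjust the name if your
// cluster uses a different one.
package main

import (
	"fmt"
	"net"
	"os"
)

func main() {
	addrs, err := net.LookupHost("kubernetes.default.svc.cluster.local")
	if err != nil {
		fmt.Fprintf(os.Stderr, "cluster DNS lookup failed: %v\n", err)
		os.Exit(1)
	}
	fmt.Println("cluster DNS ok, kubernetes service at:", addrs)
}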

Service discovery

Kubernetes already comes with intra-cluster service discovery, whereby the master sets per-service environment variables in each deployed pod. Though basic, it works robustly in our current workflow. There is, however, no solution for a k8s service to discover a service outside the cluster, and vice versa.
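
As an illustration of the built-in mechanism, a pod that depends on a service named output-api (a hypothetical name for this sketch) can locate it through the variables the master injects:

// Locate a dependency through the environment variables Kubernetes injects
// for every service. The service name "output-api" is hypothetical;
// Kubernetes upper-cases it and turns dashes into underscores when
// building the variable names.
package main

import (
	"fmt"
	"net"
	"os"
)

func main() {
	host := os.Getenv("OUTPUT_API_SERVICE_HOST")
	port := os.Getenv("OUTPUT_API_SERVICE_PORT")
	if host == "" || port == "" {
		fmt.Fprintln(os.Stderr, "output-api variables not set; was this pod created before the service?")
		os.Exit(1)
	}
	fmt.Println("output-api is reachable at", net.JoinHostPort(host, port))
}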

One approach is to tightly couple the system to AWS (see this article for details). We opted not to do this and instead decided to reuse our Zookeeper service, which already stores token and configuration documents. The key idea is to keep a service discovery table in a Zookeeper document. The location of our Zookeeper cluster and the document name are provided to each Kubernetes artifact through its YAML configuration files.

Service discovery via ZooKeeper.

In more detail, the document is a list of services, defined as:

type Service struct {
	Name        string   `json:"name"`        // Unique name of the service
	Description string   `json:"description"` // Human-readable description
	Endpoints   []string `json:"endpoints"`   // List of hostname:port pairs
}

Following this approach, Kubernetes services and our legacy services can append themselves to the services document. Each Zookeeper client keeps a watch on the document to facilitate automated discovery. The only manual step we have is to register DBs and queues (Solr, C*, Kafka, etc.).
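
A minimal sketch of a client appending itself to the table and then watching it, assuming the samuel/go-zookeeper library and a placeholder znode path (/discovery/services); the ZooKeeper address and the service fields are illustrative, not our production values:

// Self-registration against the shared discovery document, plus a watch
// loop to notice other services coming and going. Assumes the znode
// already exists and holds a JSON array of Service entries.
package main

import (
	"encoding/json"
	"log"
	"time"

	"github.com/samuel/go-zookeeper/zk"
)

type Service struct {
	Name        string   `json:"name"`
	Description string   `json:"description"`
	Endpoints   []string `json:"endpoints"`
}

const path = "/discovery/services"

func main() {
	conn, _, err := zk.Connect([]string{"zk-1:2181"}, 5*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Read the current table together with its version so the write below
	// is an optimistic-concurrency update.
	data, stat, err := conn.Get(path)
	if err != nil {
		log.Fatal(err)
	}
	var services []Service
	if len(data) > 0 {
		if err := json.Unmarshal(data, &services); err != nil {
			log.Fatal(err)
		}
	}

	// Append ourselves and write the table back.
	services = append(services, Service{
		Name:        "graph-worker",
		Description: "hash-ring analysis worker",
		Endpoints:   []string{"graph-worker-1:8080"},
	})
	out, _ := json.Marshal(services)
	if _, err := conn.Set(path, out, stat.Version); err != nil {
		log.Fatal(err) // a real client would re-read and retry on a version conflict
	}

	// Keep a watch on the document so we notice table changes.
	for {
		_, _, events, err := conn.GetW(path)
		if err != nil {
			log.Fatal(err)
		}
		ev := <-events // ZooKeeper watches are one-shot, so re-arm after each event
		log.Println("service table changed:", ev.Type)
	}
}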

We first implemented the described approach by having each Kubernetes service announce itself. We have since moved to a simpler solution: a single Kubernetes service listens on the Kubernetes master's channel for new service announcements (see the API references for operations and watch events). Each received announcement is then added to or removed from the Zookeeper document.

A few notes on our k8s monitor

  • Long-poll connections will occasionally be dropped, so make sure to handle the dropped connection properly and reconnect to the API. For example, in the sketch after this list the client reconnects whenever the watch stream is closed with EOF.
  • Only announce services that have a cluster-external load balancer set up.
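
The original monitor code is not reproduced here; the following is a minimal sketch of the same idea against the raw watch endpoint. The API server address (the insecure local port on the master) and the LoadBalancer filter are assumptions of this sketch, not a transcript of our monitor.

// Watch the Kubernetes services endpoint and reconnect when the
// long-poll stream is dropped (io.EOF).
package main

import (
	"encoding/json"
	"io"
	"log"
	"net/http"
	"time"
)

type serviceEvent struct {
	Type   string `json:"type"` // ADDED, MODIFIED, DELETED
	Object struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Spec struct {
			Type string `json:"type"`
		} `json:"spec"`
	} `json:"object"`
}

func watchOnce(apiServer string) error {
	resp, err := http.Get(apiServer + "/api/v1/services?watch=true")
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	dec := json.NewDecoder(resp.Body)
	for {
		var ev serviceEvent
		if err := dec.Decode(&ev); err != nil {
			return err // io.EOF means the long poll was dropped
		}
		// Only services exposed through a cluster-external load balancer
		// get published to the ZooKeeper document.
		if ev.Object.Spec.Type != "LoadBalancer" {
			continue
		}
		log.Printf("%s service %q -> update ZooKeeper table", ev.Type, ev.Object.Metadata.Name)
	}
}

func main() {
	for {
		if err := watchOnce("http://127.0.0.1:8080"); err != nil && err != io.EOF {
			log.Println("watch error:", err)
		}
		time.Sleep(time.Second) // back off briefly, then reconnect
	}
}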

Thanks to Srdjan Marinovic for feedback and edits.
