Part 2: Critical decisions we made while migrating to Kubernetes

Private k8s API Endpoints and Helm

--

This post highlights some of the significant choices we made when migrating our infrastructure from AWS ECS and Convox to EKS. I’m part of the infrastructure team at the Haven Life Insurance Agency, where we have a rapidly growing engineering team, and a rapidly growing infrastructure.

In an effort to be more secure during our Kubernetes migration, we used private Kubernetes API endpoints. Because we’re a life insurance agency, security is paramount to us, and private endpoints gave us the strongest security posture available. When we were using the Convox API, it required API keys against a public endpoint; while API keys are a reasonable form of authentication, the risk of a key leaking was too great for us. So we were thrilled to move to private Kubernetes APIs, which removed both user API keys and public endpoints.

How we allowed our administrators to access the private endpoints

Private endpoints restricted much of our employees’ access, but that tradeoff bought us more security. However, our administrators still needed access to the clusters from their own machines. We achieved this by port forwarding the internal Kubernetes API endpoints through SSH tunnels, which required the following three changes on each administrator’s machine:

  1. A LocalForward entry for each cluster in .ssh/config that mapped the private k8s API endpoint to a port on the host.
  2. An entry in /etc/hosts to map the internal API endpoint DNS name to localhost.
  3. An update to the cluster.server entry in .kube/config to use the port specified in the SSH config (#1).
# LocalForward entry in ~/.ssh/config. Exposes the private EKS endpoint on port 30000 locally
LocalForward 30000 <private EKS endpoint>.eks.amazonaws.com:443

# /etc/hosts entry to map the private EKS endpoint DNS to localhost
127.0.0.1 <private EKS endpoint>.eks.amazonaws.com

# finally, update the local ~/.kube/config to use the port number specified in #1
clusters:
- cluster:
    certificate-authority-data: #####
    server: https://<private EKS endpoint>.eks.amazonaws.com:30000
  name: my-cluster
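
With those three pieces in place, reaching a cluster from an admin machine looks roughly like the following. This is a minimal sketch; the jump-host alias and cluster details are placeholders rather than our real values.

# Open the SSH tunnel that carries the LocalForward entry above.
# -N: no remote command, -f: background the connection once it is established.
ssh -N -f jump-host

# kubectl resolves the private endpoint to 127.0.0.1 via /etc/hosts and connects
# to local port 30000, which the tunnel forwards to the private EKS endpoint.
kubectl get nodes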

These changes were required for each of the clusters we run. This meant that each administrator needed to add and maintain identical changes to /etc/hosts and ~/.kube/config for each cluster on their local machine. To mitigate this, we encapsulated the config in a Docker image that the admins could use. Authentication to the cluster used AWS IAM (via aws-iam-authenticator), so we needed to pass the required AWS environment variables into the container when it was run like this:

docker run -it --rm \
-v `pwd`:/code \
-w /code \
-e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
-e AWS_DEFAULT_REGION=$AWS_DEFAULT_REGION \
-e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
-e AWS_SECURITY_TOKEN=$AWS_SECURITY_TOKEN \
-e AWS_SESSION_TOKEN=$AWS_SESSION_TOKEN \
filepath/<k8s tools image> bash
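
Since each admin runs the same command over and over, a small wrapper in their shell profile can save some typing. A minimal sketch, using a hypothetical function name and the same image path as above:

# Hypothetical convenience wrapper around the docker run command above.
# Passing -e VAR with no value forwards the variable from the host environment.
k8s-tools() {
  docker run -it --rm \
    -v "$(pwd)":/code -w /code \
    -e AWS_ACCESS_KEY_ID -e AWS_DEFAULT_REGION -e AWS_SECRET_ACCESS_KEY \
    -e AWS_SECURITY_TOKEN -e AWS_SESSION_TOKEN \
    filepath/<k8s tools image> bash
}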

The final piece of the puzzle was to execute a script at container startup that substituted the host IP into the /etc/hosts file in the container. We did this by having a static /etc/hosts template with entries like this:

DFM_HOST_ADDR      <private k8s API name>.<region>.eks.amazonaws.com

And then a script that replaced the DFM_HOST_ADDR string with the runtime Docker for Mac host address:

#!/bin/bash -ex
# Resolve the Docker for Mac host address from inside the container
dfm_host_addr=$(dig +short docker.for.mac.host.internal)
# Substitute the placeholder in the template and append the result to /etc/hosts
cat /tmp/hosts.tmplt | sed "s/DFM_HOST_ADDR/$dfm_host_addr/g" >> /etc/hosts

Using private k8s API endpoints certainly makes access to the clusters more complicated, but it was a tradeoff we knew was necessary.

Helm and a Single Config Repository

Another significant choice we made while migrating from Convox to Kubernetes was to use Helm (a tool for templating Kubernetes config), and to migrate to a single config repository for all applications. Prior to our migration to Kubernetes, we had config files spread across app teams’ repositories, and some config in a common repository. This old distributed setup caused two main problems:

  1. Inconsistency: config values, file layout, and tooling versions (in this case docker-compose) varied from repository to repository.
  2. Duplication: there were many cases where complete application configuration sections were copied from other apps. Without a means of visualizing the config as a whole, it was easy for the level of duplication to continue growing.

We started our Helm journey by creating half a dozen basic charts (for example, charts for applications, jobs and cron jobs, namespaces, and Ingress) as we migrated the first few apps. As the migration progressed and our knowledge grew, we incrementally folded more of the common configuration into these charts, which in turn yielded much more consistency between applications. Helm also gave us an easy means of publishing and versioning these base charts. As the majority of our apps transitioned to the k8s platform, we used that versioning to safely roll out major or breaking changes to the base charts.
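
As a rough sketch of what publishing and pinning a versioned base chart can look like (chart names, versions, and the repository URL below are placeholders, not our actual setup):

# Package a new version of a base chart and regenerate the chart repository index
helm package charts/base-app --version 1.2.0
helm repo index . --url https://charts.example.internal

# Application charts declare the base chart (and its version) as a dependency,
# so a breaking change in a base chart can be rolled out app by app
helm dependency update apps/my-app
helm upgrade --install my-app apps/my-app --namespace my-namespace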

Even with the per-resource-type configuration abstracted into the base Helm charts, there remained a significant volume of configuration (mainly environment variables) for each application. We debated between a single config repository and config files in each application’s repository. There are pros and cons to both strategies, but we found that a single repository for all configuration made it easier to maintain consistency during our migration. As we added new apps to the config repo, we could see when duplication was creeping in and factor the configuration out into reusable common blocks, which left us with config that was significantly DRYer. We also took the opportunity to move any application config that had leaked into environment variables over time back into the application config files where it belongs.
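
The reuse pattern itself is simple: common blocks live in shared values files in the config repo, and each application layers its own values on top. A minimal sketch, with hypothetical file names and layout:

# Later -f files override earlier ones, so an application release becomes
# "shared defaults plus app-specific overrides" instead of a full copy of the config
helm upgrade --install my-app charts/base-app \
  -f config/common/values.yaml \
  -f config/my-app/values.yaml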

One downside of the centralized config was that it reduced application developers’ visibility into and awareness of their environment config (because it was no longer co-located in the application’s repository). The increased “distance” between the code and its config (cross-repo and cross-team) did lead to occasions where required config changes were missed or miscommunicated.

On balance, we feel that the advantages of using a single repo (DRY-ness and consistency) outweighed these cons during our migration to k8s. Whether or not we move the config back to the individual repos now that the migration is complete is still to be decided.
