Kubernetes all the things Vol. 2

Milan Šťovíček
Careship Engineering
4 min read · May 4, 2021

At Careship, we successfully use Kubernetes to boost our performance. Here is how we didn’t let it use us. (Spoiler: we moved to Amazon’s managed service.)

Photo by Paul Teysen on Unsplash

The heart of our infrastructure

We’ve been running all our services on self-managed Kubernetes since 2019 (we described our old setup in https://medium.com/careship-engineering/k8s-all-the-things-b0cf7359aebb). Overall it’s been a great experience. The whole team got used to the operational tasks very quickly. We added and improved deployment pipelines in Jenkins and monitor our clusters with Prometheus. We do it quite well, and we like it.

We no longer have a team member dedicated to running our infrastructure, so our clusters gradually became outdated and the unplanned maintenance burden kept growing.

Kubernetes is the heart of our infrastructure, and we had to find a way to get it back on track.

The plan to move away from a self-managed cluster

We are a small team of 5 engineers, and we don’t have a dedicated system engineer. We’ve been looking at Amazon’s Elastic Kubernetes Service (EKS) for some time. The migration from a self-managed cluster to the managed service finally made it to the top of the backlog. We regrouped and drew up a high-level plan:

  1. Launch an EKS cluster for in-house systems
  2. Set up user permissions and deploy stateless workloads
  3. Adapt the stateful workloads and move them to EKS
  4. Change DNS configuration and test everything
  5. Repeat the process for the production cluster

Migration and obstacles

We manage our infrastructure with Terraform. To define and spin up the new cluster, we used https://github.com/terraform-aws-modules/terraform-aws-eks. The module is well documented and simple to use. There were a few hidden obstacles, though.

EKS has specific requirements for VPC subnets. We wanted to keep our worker nodes in private subnets and expose only a load balancer in the public one. We used an existing VPC (Amazon Virtual Private Cloud), and it took a bit of work to change the tags on our existing subnets so that EKS could use them.
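For reference, these are the tags EKS looks for when deciding where it can place load balancers, sketched here as plain key/value pairs; the cluster name is a placeholder:

```yaml
# Tag sets for the subnets of an EKS cluster named "careship-eks" (placeholder name).
public_subnet_tags:
  kubernetes.io/cluster/careship-eks: shared   # subnet may be used by this cluster
  kubernetes.io/role/elb: "1"                  # internet-facing load balancers go here
private_subnet_tags:
  kubernetes.io/cluster/careship-eks: shared
  kubernetes.io/role/internal-elb: "1"         # internal load balancers go here
```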

Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. Traffic routing is controlled by rules defined on the Ingress resource. (Source: https://kubernetes.io/docs/concepts/services-networking/ingress/)

Ingress isn’t part of EKS, and we had to install it ourselves. We used NGINX Ingress in our self-managed clusters, so it was a simple choice. The documentation provides a Kubernetes manifest YAML file with all the necessary resources. We copied the YAML file into the repository where we store all our other resources and installed it via Ansible and kubectl. We made one change to the original manifest: we replaced the ingress-nginx-controller Deployment with a DaemonSet. Running an ingress-controller pod on every node of the cluster makes the setup more stable when worker nodes are replaced.
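To illustrate the change, the controller object ends up looking roughly like this; it is a heavily trimmed sketch of the upstream manifest, and the image tag is only an example:

```yaml
apiVersion: apps/v1
kind: DaemonSet            # the upstream manifest ships this as a Deployment
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  # a DaemonSet has no "replicas" field and uses "updateStrategy"
  # instead of the Deployment's "strategy"
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/component: controller
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/component: controller
    spec:
      serviceAccountName: ingress-nginx
      containers:
        - name: controller
          image: k8s.gcr.io/ingress-nginx/controller:v0.46.0   # example tag
          ports:
            - name: http
              containerPort: 80
            - name: https
              containerPort: 443
```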

Besides other Kubernetes resources, the manifest describes a Service of type “LoadBalancer”. That’s the signal for EKS to create a network load balancer. This makes things simpler because we no longer have to create a load balancer and connect target groups ourselves, as we did in the self-managed cluster. But now we have a Network Load Balancer instead of an Application Load Balancer.
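That Service looks roughly like the following sketch; the annotation is what requests an NLB, and the other fields are trimmed down to the essentials:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
  annotations:
    # ask the AWS cloud provider for a Network Load Balancer
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: http
    - name: https
      port: 443
      targetPort: https
```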

A Network Load Balancer works on OSI layer 4 and cannot route traffic by host or path the way an Application Load Balancer can. Fortunately, the routing is defined as Ingress configuration and handled inside the cluster. We only had to migrate two rules: the HTTP-to-HTTPS redirect and the careship.de-to-www.careship.de redirect.
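One way to express both rules with the NGINX Ingress controller is through annotations on the Ingress resource. A sketch of that approach, where the backend service name and TLS secret are placeholders:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: www-careship
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/ssl-redirect: "true"           # HTTP -> HTTPS
    nginx.ingress.kubernetes.io/from-to-www-redirect: "true"   # careship.de -> www.careship.de
spec:
  tls:
    - hosts:
        - www.careship.de
      secretName: careship-de-tls   # placeholder secret name
  rules:
    - host: www.careship.de
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend      # placeholder service name
                port:
                  number: 80
```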

We use AWS Route 53 to manage DNS records, and it was effortless to create a new subdomain and point it at the newly created load balancer. That allowed us to test the new cluster in an isolated environment and keep in-house operations (continuous integration, release processes and testing) unaffected. Once the new cluster was up and running at the end of the workload migration, we simply switched the DNS records.

Once the EKS cluster was up and running, we started moving workloads, i.e. Kubernetes services and jobs. It went well for workloads that use Amazon Relational Database Service (RDS) or a network file system. It didn’t go so well for pods that store files on a worker node and expect to always run on that same node. We solved this with Amazon Elastic File System (EFS), an NFS-compatible service that backs the Persistent Volumes defined in Kubernetes.
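For illustration, one way to back a PersistentVolume with EFS is static provisioning through the AWS EFS CSI driver. In this sketch the driver, the file system ID and the names are assumptions rather than our exact setup:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-files
spec:
  capacity:
    storage: 5Gi                      # required by the API, not enforced by EFS
  accessModes:
    - ReadWriteMany                   # EFS can be mounted by many pods at once
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc            # only used to match the claim below
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0123456789abcdef # placeholder EFS file system ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-files
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
```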

We manage Kubernetes resources as YAML files with Ansible. Our Ansible playbooks generate a folder structure with YAML files describing all Kubernetes deployments, jobs, services, etc., and Jenkins uses the kubectl client to apply the changes to the cluster.
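A minimal sketch of that pattern; the directory layout, variable names and kubectl context here are illustrative, not our actual playbook:

```yaml
# Render manifests from templates, then apply them with kubectl.
- name: Ensure the manifest directory for the service exists
  file:
    path: "{{ manifests_dir }}/{{ service_name }}"
    state: directory

- name: Render the Kubernetes manifests from templates
  template:
    src: deployment.yml.j2
    dest: "{{ manifests_dir }}/{{ service_name }}/deployment.yml"

- name: Apply the rendered manifests with kubectl
  command: >
    kubectl --context {{ kube_context }}
    apply -f {{ manifests_dir }}/{{ service_name }}/
```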

Once we were familiar with the EKS setup, had decoupled all workloads from worker nodes and deployed them in our in-house cluster, it was straightforward to repeat the process for our production cluster.

We are still struggling to set up the right permissions to see our EKS clusters in the AWS Console. It’s not pressing at the moment, however, since we can manage everything from the terminal. We will come back to it sometime in the future.
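For anyone hitting the same wall: as far as we understand it, console visibility comes down to the IAM users or roles having the relevant eks:DescribeCluster/ListClusters permissions and being mapped in the cluster’s aws-auth ConfigMap, roughly like this sketch (account ID, role and group are placeholders):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # careful: the node instance role mapping created by EKS must be kept here,
    # otherwise worker nodes lose access to the cluster
    - rolearn: arn:aws:iam::123456789012:role/engineers   # placeholder role
      username: engineers
      groups:
        - system:masters   # or a more restricted RBAC group
```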

Closing thoughts

Thankfully, we run vanilla Kubernetes without any special features or setup. That made the migration from a self-managed Kubernetes cluster to Amazon Elastic Kubernetes Service (EKS) relatively simple. We migrated all our workloads, internal as well as customer-facing applications, without any downtime. It took us a little over one week of engineering time.

Our EKS clusters will now receive regular updates automatically and can easily scale with us. And because Kubernetes is such a widely adopted system nowadays, we feel confident that if anything bad happens, we can spin up another cluster, move our data and be up and running again in a short time.

We are a small team, and we can’t manage every part of our system ourselves. However, it’s not our intention to run everything on managed services. Some components we can run more cheaply ourselves; others we can optimize better for our use cases. It is an ever-changing equation of cost, time and team performance.
