Building production-grade EKS clusters using Terraform

Ciriaco López
Devgurus
3 min read · Feb 3, 2022

At Devgurus (A DMI Company), we build MACH-oriented (Microservices, API-First, Cloud-Native, Headless) solutions, mostly running on containers. For that reason, we usually spin up Kubernetes clusters to deploy our workloads in a repeatable, scalable way.

One of the struggles we had when building Kubernetes clusters on AWS is the baseline configuration for production-grade workloads. Unlike, for instance, GKE, Amazon EKS doesn’t offer a complete, managed Kubernetes experience. Instead, it offers a managed control plane, plus a set of separate building blocks that we can use to create a working cluster.

So, why create another Terraform module?

Since at Devgurus we’re constantly spinning up new AWS EKS clusters with different configurations and standards, we decided to automate the setup of these components into a complete Terraform module that takes you from zero to a fully functional, production-grade, batteries-included Kubernetes cluster.

We started with the excellent, well-known community Terraform EKS base module, and then added our own elements on top, including:

  • Cluster Autoscaler Helm installation, with the required minimal-privilege IAM roles
  • Metrics Server, which is required for the Horizontal Pod Autoscaler to work
  • Strong security group settings to allow only authorized traffic from the control plane to the nodes
  • All networking-related resources (VPC creation, NAT Gateway, and Internet Gateway setup…)
  • IAM Roles for Service Accounts (IRSA), a way to map IAM Roles to Kubernetes Service Accounts for secure, keyless pod access to AWS resources (see the sketch after this list)
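
To make this more concrete, here’s a minimal sketch (not our module’s exact code) of how a piece like the Cluster Autoscaler can be wired up with the Helm Terraform provider, using an IRSA annotation on its service account for keyless AWS access. The cluster name and role ARN below are illustrative:

    # Sketch only: chart values, cluster name, and role ARN are illustrative.
    resource "helm_release" "cluster_autoscaler" {
      name       = "cluster-autoscaler"
      repository = "https://kubernetes.github.io/autoscaler"
      chart      = "cluster-autoscaler"
      namespace  = "kube-system"

      set {
        name  = "autoDiscovery.clusterName"
        value = "my-production-cluster"
      }

      set {
        # IRSA: annotating the chart's service account with an IAM role
        # gives the autoscaler keyless access to the EC2 Auto Scaling APIs.
        name  = "rbac.serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
        value = "arn:aws:iam::123456789012:role/cluster-autoscaler"
      }
    }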

Using this module, we would end up with an infrastructure like this:

AWS Infrastructure to be created by our Terraform Module

So, how can we leverage it?

We’ve published the module here, and the usage is very simple:
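
As a rough sketch, a call to the module can look like the snippet below. The source address and most input names are illustrative placeholders (self_managed_node_groups is the block described next), so check the published module for the real interface:

    # Illustrative only: the module source and most inputs are placeholders.
    module "eks" {
      source = "<published-module-source>"  # use the address from the link above

      cluster_name    = "my-production-cluster"  # hypothetical input
      cluster_version = "1.21"                   # hypothetical input

      # One entry per node pool; mix compute profiles as needed.
      self_managed_node_groups = {
        standard = {
          instance_type = "m5.large"
          min_size      = 2
          max_size      = 10
        }
        gpu = {
          instance_type = "g4dn.xlarge"
          min_size      = 0
          max_size      = 4
        }
      }
    }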

In the self_managed_node_groups block, you can add as many node pools as you need, each with different compute settings (maybe you need a pool with GPUs mixed in with some standard compute nodes).

On running spot instance clusters

For most non-production environments (and even production, if your workloads can handle node terminations!), it is a good idea to enable spot instances.

Kubernetes makes it really easy to run a “cattle, not pets” strategy, treating all servers as disposable at any given time, without prior notice.

On top of pushing you to make your workloads more resilient and capable of handling node failures, the main advantage of spot instances is their cost savings: in some cases, you can save up to 70% compared to using on-demand instances!
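
Exactly how spot capacity is requested depends on the module’s inputs, but under the hood it comes down to a launch template setting like this (a sketch; the instance type and price cap are illustrative):

    # Sketch: request spot capacity for a worker pool via its launch template.
    resource "aws_launch_template" "spot_workers" {
      name_prefix   = "eks-spot-workers-"
      instance_type = "m5.large"

      instance_market_options {
        market_type = "spot"

        spot_options {
          # Optional price cap; if omitted, you pay the current spot price,
          # capped at the on-demand price.
          max_price = "0.05"
        }
      }
    }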

Wrapping up

This module is the result of months of work with different eCommerce sites, leveraging our scalable, cloud-first solutions to achieve high levels of throughput and scalability. Here’s one example from one of our customers:

Looks good, doesn’t it?

There’s still lots of work to do on this module, including:

  • Monitoring and detailed metrics in CloudWatch by default
  • Advanced, zero-trust security settings
  • Service Mesh integration (we use Istio on some of our workloads, but we manage it separately)

As always, this module is free to use for any workload, but if you need help setting up, scaling, and operating your microservices environments, feel free to reach out to us at https://dminc.com/contact/ so we can assess your situation and assist you.

Ciriaco López
Devgurus

DevOps Engineer at DMI (Digital Management, LLC)